Reddit sells user data to undisclosed AI company

Jim Love

1 year ago

An AI company has reportedly struck a $60 million annual deal with Reddit to use the platform’s user content for training its AI models. As Reddit prepares for an initial public offering (IPO), this deal, disclosed to potential investors, represents a significant move by the company to increase its value.

The unnamed AI company has secured rights to train its AI models on Reddit’s vast trove of user-generated content. Traditionally, AI firms like OpenAI have trained their language models on publicly scraped web data, a practice that now faces greater scrutiny and is pushing companies toward formal licensing agreements.

The trend of seeking explicit permissions for AI training is growing. Apple, for instance, has been in talks with various media companies to license news article archives to train its own AI models, with proposed deals worth millions.

Using user-generated content for AI training raises legal and ethical questions. While Reddit’s terms may permit this use, some users may object to having their posts used this way.

The deal follows Reddit’s contentious decision to restrict access to its API, which affected client apps like Apollo. The move, aimed at maximizing revenue, led to widespread protests from moderators and users and to aggressive responses from Reddit management, including the removal of moderators who continued to protest.

The API restrictions particularly impacted disabled moderators who relied on third-party apps for their accessibility features.

In response to the API changes, some subreddits limited access to existing members or labeled their content as NSFW to disrupt Reddit’s ad sales.

The agreement marks a pivotal moment in Reddit’s business strategy and highlights the intersection of user-generated content with the burgeoning field of AI, raising questions about privacy, consent, and the commercial use of such data.

Sources include: 9to5Mac