Reddit sues AI startup claiming chatbot trained on stolen data

SAN FRANCISCO (CN) — Nothing is certain in life — except death, taxes and the inevitability that someday, something you posted online will be used to train an AI chatbot.

Social media giant Reddit on Wednesday sued Anthropic, an artificial intelligence startup, over claims that the company violated Reddit’s privacy policy and user agreements by scraping data from user posts to train its flagship artificial intelligence assistant, Claude.

Reddit accused the company — which claims to prioritize honesty, privacy and high public trust in its AI development — of “corporate cognitive dissonance,” stating Anthropic breached the website’s terms of service by training its large language model on users’ personal data.

“Reddit brings this action to stop Anthropic — who tells the world that it does not intend to train its models with stolen data — from doing just that,” the company said in its lawsuit.

Anthropic, headquartered in San Francisco, was founded in 2021 by siblings Dario and Daniela Amodei, who previously worked at OpenAI. The organization claims to focus on improving the safety and reliability of AI models, and has already attracted multibillion-dollar investments from companies like Amazon and Google.

In its 26-page complaint, Reddit claims that the AI company has been scraping user data since 2021 while also refusing to enter into a licensing negotiation with it over the user data at issue.

Although tech companies leveraging user data to train AI models isn’t a new practice, Reddit is suing Anthropic for not going through the channels it set up specifically to license its users’ content. By formally partnering with companies, Reddit said it can ensure tech companies abide by its licensing terms, which it claims better protect users’ interests and privacy.

“Other giants in the AI space understand and respect Reddit’s rules,” Reddit said of its licensing practices.

In February 2024, Google cut a deal with Reddit to the tune of $60 million per year to license its users’ content to the Alphabet company for AI training data.

In May 2024, OpenAI entered into a similar partnership with Reddit for user data to train its AI.

Reddit is seeking a jury trial for punitive and compensatory damages, as well as a court order prohibiting Anthropic from using any Reddit data or content for training its products going forward.

Reddit also makes unjust enrichment claims, arguing that Anthropic is reaping “significant profit” from technology made possible by its content.

In July 2024, Anthropic responded to accusations by Reddit CEO Steve Huffman that it has been unlawfully exploiting Reddit content by stating that it doesn’t train its AI models on Reddit anymore.

“Reddit has been on our block list for web crawling since mid-May, and we haven’t added any URLs from Reddit to our crawler since then,” Anthropic spokesperson Jennifer Martinez told The Verge.

However, in its lawsuit, Reddit claims this isn’t true.

“Despite Anthropic stating that ClaudeBot and other Anthropic agents no longer access Reddit, in the subsequent months, Reddit logs indicated that Anthropic’s bots accessed or attempted to access the Reddit platform over 100,000 times,” the company said.

This case was filed in San Francisco Superior Court on June 4.

Categories / Business, Consumers, Technology

Subscribe to our free newsletters

Our weekly newsletter Closing Arguments offers the latest about ongoing trials, major litigation and rulings in courthouses around the U.S. and the world, while the monthly Under the Lights dishes the legal dirt from Hollywood, sports, Big Tech and the arts.