Friday, June 28, 2024
Courthouse News Service

OpenAI sued by Center for Investigative Reporting over AI plagiarism

The newsroom joins a growing list of news organizations that have sued OpenAI and Microsoft, accusing them of illegally profiting off journalists' labor.

(CN) — The United States' oldest nonprofit newsroom sued ChatGPT maker OpenAI and Microsoft in federal court in New York Thursday, accusing the tech companies of using copyrighted content to train their artificial intelligence programs.

"According to the award-winning website Copyleaks, nearly 60% of the responses provided by defendants’ GPT-3.5 product contained some form of plagiarized content, and over 45% contained text that was identical to pre-existing content," the Center for Investigative Reporting claimed in its 33-page complaint.

Besides alleging that ChatGPT's responses include plagiarized work, the center says the defendants violated federal copyright law by training ChatGPT and Microsoft's Copilot AI system on the work of the newsroom's journalists.

“OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material,” Monika Bauerlein, CEO of the Center for Investigative Reporting, said in a prepared statement.

Microsoft has invested more than $13 billion in OpenAI since 2019, with one valuation putting OpenAI's 2024 worth at $86 billion. The newsroom says Microsoft provided OpenAI with "the bespoke supercomputing infrastructure" needed to train ChatGPT and develop Copilot, with both companies standing to profit off the nonprofit's labor.

The newsroom outlines seven counts against seven OpenAI affiliates and Microsoft, and asks the court to force the defendants to remove copyrighted works from AI training set corpuses like WebText and WebText2. The center claims the defendants added the copyrighted works to their training sets by using algorithms called "Dragnet" and "Newspaper."

Dragnet is designed to copy article content while omitting "author, title, copyright notices, and footers," and the center claims that while Newspaper allows a user to extract title and author information, the defendants chose not to do so in order to maintain consistency with Dragnet standards.

"The absence of author, title, copyright notice, and terms of use information from the copies of plaintiff’s articles generated by applying the Dragnet and Newspaper codes — codes OpenAI has admitted to have intentionally used when assembling WebText — further corroborates that the OpenAI Defendants intentionally removed author, title, copyright notice, and terms of information from Plaintiff’s copyright-protected news articles," the center wrote in its complaint.

The Center for Investigative Reporting was founded in 1977. Earlier this year it merged with another progressive nonprofit journalism outlet, Mother Jones, founded in 1976 and operated by the Foundation for National Progress. The merged newsroom now produces content for several multimedia platforms, including the eponymous Mother Jones magazine and the Reveal radio show and podcast.

The combined outlets argue that Microsoft and OpenAI's plagiarism of their work threatens their bottom line. Unless the practice stops, they claim, they will not be able to retain as many reporters or conduct as many journalistic investigations.

The newsroom is not the only entity to voice these concerns. The Authors Guild, the New York Times and eight newspapers owned by MediaNews Group and Tribune Publishing — including the Chicago Tribune — have all filed similar federal copyright lawsuits against OpenAI and Microsoft since last November.

“Developers like OpenAI have garnered billions in investment and revenue because of AI products fundamentally created with and trained on copyright-protected material,” said Matt Topic, one of the Center's attorneys with the Chicago-based civil rights law firm Loevy & Loevy, in a prepared statement. “Publishers are rightfully concerned about AI summaries damaging the market for their journalism, and we intend for this lawsuit to stop this latest example of ‘move fast and break things’ tech companies doing what they want without regard for the rights to others and without compensation.”

A spokesperson for OpenAI responded to news of the lawsuit in an email late Thursday evening.

"We are working collaboratively with the news industry and partnering with global news publishers to display their content in our products like ChatGPT, including summaries, quotes, and attribution, to drive traffic back to the original articles," the spokesperson said. "A component of the partnerships is the ability to leverage publisher content using various machine learning and training techniques to help us optimize the display of that content and make it more useful to users."

Microsoft did not respond to a request for comment.

Follow @djbyrnes1
Categories / Courts, Media, Technology
