AI training, copyright issues headline U.S. Senate hearing
U.S. senators blasted companies, including Meta and Anthropic, for training AI models on copyrighted content, including pirated books and other materials.
Anthropic, OpenAI, Meta, Microsoft, Google and Perplexity are among the companies being sued for training generative AI models on copyrighted material. As the lawsuits pile up, Congress is becoming concerned.
Sen. Josh Hawley (R-Mo.) described generative AI (GenAI) vendors' model training on copyrighted material as "the largest intellectual property theft in American history," during a hearing held by the U.S. Senate Subcommittee on Crime and Counterterrorism on Wednesday. Hawley said AI companies have stolen "massive amounts of copyrighted material from illegal online repositories."
Indeed, thirteen authors, including Sarah Silverman and Ta-Nehisi Coates, sued Meta in 2023 for copyright infringement, alleging that Meta infringed on their copyrighted works when it trained its Llama AI models. A federal judge dismissed their claims in June, ruling that the training qualified as fair use. Other authors are still pursuing a similar case against OpenAI, and a comparable case against Anthropic also went in the vendor's favor last month. However, the judge in the Meta case left open the possibility that the authors could pursue further claims, because Meta acquired the books from shadow libraries, which provide pirated copies of copyrighted material.
The New York Times, Chicago Tribune, Denver Post and other newspapers sued Microsoft and OpenAI in 2024 for copyright infringement, alleging the companies used millions of articles to train their AI tools. Multiple other authors, musicians and creators have also sued AI companies for using their copyrighted material.
"AI companies are training their models on stolen material," Hawley said. "We have got to do something to protect the people of this country."
Balancing innovation with protecting intellectual property
Hawley said during the hearing that "enough is enough, it's time to enforce the law."
"I'm all for innovation but not at the price of illegality," he said. "I'm all for innovation, but not at the price of destroying the intellectual property of the average man and woman in this country. We have laws for a reason. Those laws ought to be enforced, and big tech should not be above the law."
Subcommittee Ranking Member Sen. Dick Durbin (D-Ill.) shared Hawley's concerns about AI's effect on intellectual property rights. He said during the hearing that it's a "critical topic we can't overlook."
"While AI can be an incredible tool that unlocks further creativity, writers, artists, musicians, and others are rightfully concerned about what technology means to them personally," he said. "Should AI companies be able to use their materials freely as 'fair use,' or should they receive compensation when their works are used to train AI models?"
Michael Smith, professor of IT and marketing at Carnegie Mellon University, said during the hearing that AI companies today are making arguments similar to those made by companies in the early days of the internet -- that enforcement of copyright law will stifle innovation.
Smith said a vibrant technology economy depends on a vibrant creative economy. He said the U.S. found a way to license music for streaming in the early 2000s, and that approach could be replicated with GenAI.
"On our current path, we risk killing the goose -- or in this case, the authors, musicians, coders and filmmakers -- who laid the golden eggs that are key to the present and future value of generative AI output," Smith said.
U.S. courts are struggling to address the issue of AI training on copyrighted material, a behavior that's "deeply problematic and troubling," Bhamati Viswanathan, professor of law at New England Law School, said at the hearing.
She added that the courts have yet to reach a consensus on fair use. The U.S. Copyright Office issued a report earlier this year concluding that not all GenAI models' use of copyrighted material in training can be considered fair use.
The Copyright Office suggested that a licensing regime may eventually arise to deal with the issue of AI model training and compensating creators for the use of copyrighted material. Viswanathan reiterated the need to enforce licensing to protect content creators.
"We all believe in innovation, we believe generative AI has potential," she said. "But you cannot compromise the livelihood of creators … simply by saying we need new technologies to flourish."
Author testifies on AI copyright infringement
David Baldacci, an author who testified during the hearing, said his son asked ChatGPT to write a plot that read like a David Baldacci novel. The tool produced three pages containing elements of "pretty much every book I'd ever written."
"That's when I found out that the AI community had taken most of my novels without permission and fed them into the machine learning system," he said.
Baldacci said AI companies claim it would be difficult to license the work of individual creators, but the models need those works to generate their insights. He said while AI companies claim the technology is transformational, "billions of people have been transformed by books."
"I'm sure there are aspects of AI that will also transform the world," he said. "But if you want to bet on which side is more transformational for all of us, I will bet on books every single time."
Makenzie Holland is a senior news writer covering big tech and federal regulation. Prior to joining Informa TechTarget, she was a general assignment reporter for the Wilmington StarNews and a crime and education reporter at the Wabash Plain Dealer.