The U.S. Copyright Office's report concluding that not all copyrighted material used in training generative AI models can be considered fair use will likely affect copyright lawsuits against generative AI vendors.
The office's prepublication report on copyright and generative AI training, released on May 10, said that while training models for research and analysis applications is considered fair, training models on copyrighted material to create competing content falls outside the scope of fair use. The report is the last in a three-part series on copyright and AI from the federal agency. Although a final version of the third report has not yet been published, no major changes are expected to its analysis or conclusions.
In copyright law, the fair use doctrine permits limited use of copyrighted material without the copyright holder's permission to allow for free expression. For AI model training, whether a use is considered fair depends on its purpose and on the copyrighted works used, both of which, the report noted, "can affect the market."
The report pointed out that a model deployed for purposes such as analysis or research has a vastly different market outcome than a model making commercial use of copyrighted work to produce content that then competes with the original copyrighted material in the market. Such use "goes beyond established fair use boundaries," the report stated.
Shortly after the report was released, The Washington Post and others reported that President Donald Trump fired Shira Perlmutter, director of the U.S. Copyright Office. The move came days after Trump fired Carla Hayden, head of the U.S. Library of Congress. The Copyright Office is part of the Library of Congress.

The Copyright Office's assessment of fair use is not unexpected; rather, it is a nuanced and thoughtful approach to the issue, said Louis Tompros, a lecturer on law at Harvard Law School.
"Fair use is a classically case-by-case analysis, and it has been a classically case-by-case analysis in copyright law since it came into being," Tompros said. "It is ultimately the statute that tries to strike a balance between the importance of copyright and protecting authors' rights and the importance of the First Amendment and allowing for free expression."
Tompros said the Copyright Office's report will "surely be cited in most, if not all, of these copyright cases."
Courts looking at AI training and copyright issues will be making case-by-case assessments depending on the training methodology, the particular use of copyrighted material and the markets for those works used in training, he said.
The report will likely be cited by both sides as authors and AI companies seek to make the case for or against fair use, which will further emphasize the need for a thorough case-by-case analysis, Tompros said.
"The authors are going to say, to the extent that AI companies suggest that all AI training is fair use, the Copyright Office disagrees," he said. "The AI companies are going to say, to the extent that authors suggest that AI training is not transformative and cannot be fair use, the Copyright Office disagrees."
Licensing agreements
To address use cases that fall outside the fair use doctrine, the Copyright Office report suggested licensing agreements for AI training.
"Effective licensing options can ensure that innovation continues to advance without undermining intellectual property rights," the report said.
However, the report recommended letting the market continue to develop before any government intervention, such as rules or regulations.
Indeed, Hodan Omaar, a senior policy manager for AI policy at the Center for Data Innovation, said the report's overall message indicates that existing copyright law -- including fair use -- is a workable tool for technology like generative AI.
"It acknowledges that some training uses might not qualify, but that's not new or surprising. Fair use has always depended on context," she said. "In the end, the report doesn't call for any new laws."
The problem with most of the generative AI tools that exist now is that when they were being developed, there was no market for AI training data, Tompros said.
"When there's no such thing as a market for AI training data, using data in AI training doesn't affect the market and is therefore much more likely to be fair use," Tompros said. "What we're seeing now is a developing set of markets for particular types of AI training data."
News organizations, musicians and authors are now looking to license their works for AI training, Tompros said. As that market develops, using material without permission becomes considerably less likely to qualify as fair use. As a result, he said, licensing is likely to become an established path for obtaining AI training data.
Tompros pointed to the rise of Napster and online music sharing in the late 1990s and early 2000s. While some saw it as the end of the music industry, a market eventually developed for legitimate digital music distribution in which license fees were paid, leading to products like iTunes and Spotify.
The report doesn't "rush to redesign the system," instead recognizing that licensing could be the appropriate path as those markets develop, Omaar said.
"Letting the market evolve first gives the system room to adapt organically," she said. "And if that doesn't work, if creators can't get paid or developers can't get access, then there's room to revisit it."
Makenzie Holland is a senior news writer covering big tech and federal regulation. Prior to joining Informa TechTarget, she was a general assignment reporter for the Wilmington StarNews and a crime and education reporter at the Wabash Plain Dealer.