
AI, copyright and fair use: What you need to know

As AI technology advances, U.S. and international copyright laws are struggling to keep pace, raising legal and ethical questions about ownership and AI-generated content.

AI innovators and U.S. copyright law are on a collision course, with neither side finding satisfactory technological or legal solutions to enable AI-generated products to move forward.

Major software and hardware vendors have poured billions of research and development dollars into AI technology, producing a steady stream of tools that create new business opportunities for corporate users. That's the good news.

The bad news, in the opinion of a growing number of developers and market researchers, is that copyright law has not kept pace with AI development. The current copyright law framework, the Copyright Act of 1976, dates back to a time when mainframes and minicomputers ruled the IT world, and the internet and AI were more the subjects of science fiction than computer science.

"The U.S. government never does anything quickly in matters like this," said Jack Gold, founder and analyst with J. Gold Associates, LLC. "Call it inertia, but the regulatory system is built around taking things slowly to ensure they get things right. Unfortunately, in the case of AI, issues out there need a much quicker response."

One of the biggest concerns among developers and user organizations, especially media companies, involves generative AI programs, such as OpenAI's ChatGPT. These programs are trained on a vast array of written materials, including potentially copyright-protected works, such as newspapers and books. The technology can then generate diverse types of content, including images, videos, music, speech, text, software code and product designs, based on that training data.

This training process raises ethical, artistic and copyright concerns. These issues have drawn the attention of the U.S. Copyright Office and courts, which are now determining how to address the problem. The key questions are: Can generative AI outputs be copyrighted, and how might generative AI tools infringe on the copyrights of other works?

The landscape of U.S. AI copyright battles

Decisions on copyright protection for AI outputs currently hinge on the idea of "authorship." The Copyright Act of 1976 grants copyrights only to original works, but does not clearly define who or what qualifies as an "author." So far, the Copyright Office only recognizes copyrights in creations made by humans, a stance supported by several court decisions.

One example is Stephen Thaler's 2022 lawsuit against the Copyright Office. Thaler challenged the human authorship requirement after the Copyright Office denied his application to register a visual artwork that he claimed was created "autonomously" by his AI program, Creativity Machine. Thaler argued that the Copyright Act does not mandate human authorship. But in August 2023, a federal district court ruled in favor of the Copyright Office.

In another case, decided in September 2023, Jason Allen asked the U.S. Copyright Office Review Board to reconsider its refusal to register his artwork entitled Théâtre D'opéra Spatial, which won the 2022 Colorado State Fair art competition. The piece was created using Midjourney, a generative AI tool that produces images in response to text prompts.

Allen laid out his creative process in crafting the text prompts and making various revisions to the AI-generated images. But the Copyright Office, applying its most recent guidelines, determined that the final Midjourney image lacked sufficient human authorship, despite Allen's claims that hundreds of rounds of image generation were necessary to produce the finished work.

In its ruling, the Copyright Office acknowledged Allen's visual edits and stated that generated images "could have a sufficient amount of authorship" to be registered. However, it added that more information was needed to determine whether those edits met the threshold for copyrightability.

According to an analysis from the international law firm Perkins Coie, this ruling suggests that the Copyright Office is focusing on whether the machine was merely an assisting tool or whether the elements of authorship were conceived by the AI, as well as on how much creative control the artist had over the work. The Office appears to be separating the AI-generated components from the final work instead of analyzing the work as a whole.

In December 2023, the New York Times sued OpenAI and Microsoft -- OpenAI's largest investor -- for using millions of the newspaper's articles to train OpenAI's models. The Times feared that OpenAI's generative AI tools would repurpose its reporting and display that content on AI platforms, such as the ChatGPT interface. If the court rules that OpenAI illegally used the Times' stories, OpenAI could be forced to discard its large language model (LLM) and build a new one from scratch.

Adding momentum to the Times' suit, eight other major newspapers, including the Chicago Tribune and New York Daily News, filed a lawsuit on April 30, 2024, claiming that OpenAI "misused" reporters' work to train its generative AI systems.

A spokesperson for the Copyright Office declined to comment on the lawsuit filed by the eight newspapers, citing the agency's policy not to discuss ongoing litigation. However, she added that the Copyright Office is researching and preparing a report to update its policies regarding such suits, to be published this spring. Last June, the agency issued guidance outlining its current practices for registering works that contain AI-generated material.

International approaches to AI and copyright

Although the U.S. Copyright Office and courts currently insist on human contributions when deciding whether works qualify for copyright protection, AI innovators might find overseas copyright laws less rigid.

In Li v. Liu, decided by the Beijing Internet Court in November 2023, the court ruled that an AI-generated image was copyrightable and found the defendant liable for copyright infringement. The court held that a copyrightable work must show sufficient human contribution, and it recognized the plaintiff's intellectual input throughout the image generation process. This input included choosing the AI tool, Stable Diffusion, and designing the prompts, which the court deemed sufficient to merit copyright protection.

The EU's approach to AI and copyright resembles that of the U.S., requiring a certain level of human contribution. The main legal framework for copyright in the EU is the Copyright Directive, which addresses copyright laws across EU member states. The Directive was updated in 2019 to protect original literary works, including those generated by AI algorithms, if they meet the criteria for originality and creativity.

But the updated regulations do not clearly define who owns the copyright for AI-created works, leaving the question of ownership up to existing copyright laws, which generally attribute ownership to the person who created the work.

The EU is developing a new framework to grant copyright protection to AI-generated works and create a new category of "AI authorship" that could be owned by the developer or user of an AI system, rather than the person who created the work. This updated framework is still under review, however, and will not go into effect until later this year.

Determining whether training AI models constitutes fair use

While copyright law and generative AI communities remain at odds -- particularly over the issue of copyrighting AI products with no human intervention -- some believe a possible solution resides in what Bob Sutor, vice president for emerging technologies at the Futurum Group, calls "the fuzzy middle."

Sutor suggested a scenario where an author comes up with a series of random numbers and then uses those numbers to create a piece of art. "I could have a series of random numbers in my head and, leveraging only AI technologies, then figure out a way to create a piece of art," Sutor said. "I'm not copying anyone else, and so wouldn't that be copyrightable? It would be the same as creating something entirely invented by a computer. However, most cases today tend to revolve around generative AI."

Sutor cautioned that these are "very early days" for generative AI and predicted that other issues will surface that require U.S. copyright laws to evolve at a faster pace. "The natural progression of creativity and intellectual property rights will have to be reexamined generation after generation," he said.

Much of the "fuzzy middle" ground touches on questions of fair use, codified in Section 107 of the Copyright Act. Fair use allows copyrighted work to be used under certain conditions without the owner's permission; it is intended to soften the sometimes-rigid application of copyright law and encourage creativity. Applied to AI, fair use could let developers build on previous works without taking away existing owners' rights to control and benefit from their original works.

Some litigants have argued that using copyrighted works to train AI programs should be considered fair use when weighed against the four statutory factors in Section 107:

  • The purpose and character of the use -- including whether the use is commercial or for nonprofit educational purposes.
  • The nature of the copyrighted work.
  • The amount and substantiality of the portion used relative to the copyrighted work as a whole.
  • The effect of the use upon the potential market or value of the copyrighted work.

Despite these ground rules, it remains unclear exactly how much material can be lifted from a copyrighted work before it is no longer considered fair use.

"Fair use might say, if you are writing a review of a book, you can pull out paragraphs for the review," Sutor said. "But how many are you allowed to pull out before it is classified as derivative or not derivative and becomes a new composition? This same issue has come up in music forever."

Evaluating the issue of fair use in OpenAI's model training

OpenAI has acknowledged that its programs are trained on publicly available data sets, including copyrighted works. This process involves making copies of the data to be analyzed. However, creating such copies without permission could infringe on copyright holders' rights. OpenAI argues that its training processes constitute fair use and do not involve infringement.

Specifically, OpenAI contends it is protected under Section 107's fair use provisions because the copies are not made available to the public and are used solely for training its programs. The company cited Authors Guild, Inc. v. Google, Inc., in which the U.S. Court of Appeals for the Second Circuit held that Google's copying of entire books to create a searchable database displaying excerpts of those books constituted fair use.

In 2023, comedian Sarah Silverman and writers Christopher Golden and Richard Kadrey filed suit in a California federal court, alleging that OpenAI infringed on their copyrighted books by using them to create and distribute derivative works without permission. The plaintiffs argue that OpenAI's LLMs are themselves derivative works because they rely heavily on creative content from the authors' books.

A second lawsuit with similar complaints claims that OpenAI used copyrighted books to train an AI model that infringes on those copyrights. The suit seeks class action certification for all U.S. residents who hold U.S. copyrights on any work that OpenAI used for training. The judge ordered the two suits to be combined to save time and avoid "judicial waste."

What lies ahead for AI copyright

In the U.S., a bipartisan group of senators informally known as the Senate AI Gang released a 31-page report on May 16 after a year of research, including publicly held seminars and dozens of conversations with tech CEOs and academic researchers. This report is designed to serve as a roadmap for regulating AI.

While that roadmap calls for billions of dollars in AI research funding, it fails to concretely address several important issues facing the Copyright Office and AI innovators -- namely, the future of copyright law, regulation of AI models and training data, and issues surrounding open source AI. Some see this as the government sidestepping the thornier AI issues.

"The Senate is punting on these issues, just kicking it back to the agencies … to resolve," said Frank Dzubeck, president of Communications Network Architects, Inc., in Washington, D.C. "Nobody is willing to stick their necks out. They [Congress] are going to wait to see what the Europeans do with AI. But with the number of legal cases building up, it's now a snowball about to roll downhill, and people are jumping out of the way trying to avoid getting hit."

Ed Scannell is a freelance writer and journalist based in Needham, Mass. He reports on a wide range of technologies and issues related to corporate IT.
