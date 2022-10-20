Accusations that GitHub Copilot steals code have intensified debate in the tech industry about what constitutes fair use of intellectual property, and raised the question of who is responsible when AI suggestions include unattributed but licensed code.

GitHub Copilot, released in June, translates natural language to suggestions for lines of code, which can range from boilerplate code to complex algorithms. The Codex artificial intelligence model behind GitHub Copilot is a natural language processing (NLP) model trained on tens of millions of public repositories of code, including the majority of Python code stored on GitHub. Although Codex's developer, OpenAI, believes that the model is an instance of transformative fair use, legal investigations and the court of public opinion are beginning to challenge that notion.

Developers have raised questions this year about whether AI pair programmers produce code that can qualify as transformative fair use or if they infringe on copyrights. But this week saw that change from words to action when a team of class-action litigators at Joseph Saveri Law Firm in San Francisco launched an investigation into a potential lawsuit against GitHub Copilot.

"Open source software creators, users and owners have serious concerns regarding Microsoft's new Copilot auto-coding product," the team stated on the law firm's website. "Microsoft is profiting from others' work by disregarding the conditions of the underlying open source licenses and other legal requirements."

The potential legal quagmire has one developer ill at ease.

"It is deeply concerning to me because how this plays out is going to determine a lot about which machine learning models get generated -- which will directly impact the usefulness of them," said Chris Riley, senior manager of developer relations at marketing tech firm HubSpot. "For example, if Microsoft loses [a lawsuit], that will open the door to sue OpenAI."

In turn, if OpenAI is sued, then other tools that use the technology, such as content creation tool Jasper, might be off-limits -- which will have an unknown effect on current projects, Riley said.

But the potential for lawsuits isn't limited to product creators. Copilot users may be breaking copyright law if they use copyrighted material, said attorney Aron Solomon, head of strategy and chief legal analyst at Esquire Digital.

"If the user of the Microsoft product is aware that they are knowingly using copyrighted material, it's the same as if any of us knowingly use copyrighted material, absent a transformative use," he said. "Transformative fair use of code would either have to alter the code itself or it transforms what the code does."

The Copilot FAQ states that "GitHub does not own the suggestions GitHub Copilot generates. The code you write with GitHub Copilot's help belongs to you, and you are responsible for it."

Thus, developers should take steps to avoid legal problems down the road, Solomon said. They should do their due diligence, perhaps by pasting suggested code snippets into Google to check that there's no copyright attached.

"Or at least that very little of their code is subject to copyright," Solomon said. "It's like you or me using an image in a piece we write. If I Google 'Mona Lisa' and just save the first image I find, I'm pretty sure someone has rights to it. Very different from me Googling 'Creative Commons Mona Lisa' then going through the affirmative steps to make sure I attribute correctly."

In a tweet on October 16, 2022, Tim Davis (@DocSparse) wrote: "@github copilot, with 'public code' blocked, emits large chunks of my copyrighted code, with no attribution, no LGPL license. For example, the simple prompt 'sparse matrix transpose, cs_' produces my cs_transpose in CSparse. My code on left, github on right. Not OK." (pic.twitter.com/sqpOThi8nf)