GitLab’s partnership with Google for generative AI looks to assuage enterprise concerns about data privacy, but many open questions about licensing and other risks remain.

The partnership news this week comes amid a tidal wave of IT products that incorporate generative AI built on large language models (LLMs). Such tools have held a high profile over the last year, from the availability in June 2022 of GitHub’s Copilot feature, based on OpenAI’s Codex project, to OpenAI’s public release of ChatGPT in November. A ChatGPT API followed in March, prompting software vendors, including infrastructure-as-code and DevOps tool makers, to integrate it into their products.

The GPT API can include enterprise licensing to ensure customer data is stored privately and not used to train OpenAI’s models. GitLab’s partnership with Google, by contrast, doesn’t require users’ data to leave the GitLab cloud at all, according to David DeSanto, chief product officer at GitLab.

“It means that we can see end to end, how the data is being sent and responded to and stored, as opposed to other third-party services where it's more of a black box,” he said.

GitLab plans to lean on Google’s generative AI expertise to make an experimental “Explain this Vulnerability” feature production-ready this year. Other experimental and beta-stage generative AI features disclosed by GitLab since September 2022 include suggested reviewers, code suggestions, vulnerability guidance, a value streams dashboard, license policies and license compliance scans, secrets leak prevention, and security policy enforcement. Further features slated for initial release this year include dependency lists, container and dependency scanning, management tools for compliance frameworks and SBOM ingestion, according to a company blog post.

DeSanto stopped short of saying GitLab plans to move all of these projects under the Google partnership, but didn’t rule that possibility out. GitLab also has a commercial partnership with OpenAI, and DeSanto declined to comment on whether Google’s Generative AI for Vertex AI, launched in March, is superior technically to OpenAI’s GPT.

“We've got a lot of models that make up our code suggestions,” he said. “We realized that if we just used one model, like most of our competitors, we would not be as effective in suggesting the right code.”

Experts urge generative AI transparency as legal precedents pend

GitHub, OpenAI and Microsoft are the targets of a lawsuit alleging copyright violations in code generated by GitHub’s Copilot tool, based on OpenAI’s Codex. The outcome of that case is expected to answer major open questions regarding the licensing and copyright of AI-generated code.

Until then, large enterprises will stay mostly on the sidelines, said Ricardo Torres, chief engineer of open source and cloud native and associate technical fellow at aircraft manufacturer Boeing.

“If they use [code licensed under a General Public License] to train the model, and that ends up in a customer's product, even though they didn't steal data, [the customer may have] been infected by their training data,” Torres said. “Even open source foundations are concerned about this -- if they start taking in AI-generated code, and this code was GPL licensed, for example, that's a viral license, and it could infect other things.”

GitLab’s DeSanto declined to comment on the GitHub / OpenAI lawsuit. The company is cautious about generative AI licensing issues, said another company official.

“We’re taking great care to help prevent and limit customer exposure to license poisoning,” wrote Taylor McCaslin, GitLab's group manager of product for data science, in an email sent via a company spokesperson. “This includes care and caution with selecting AI foundation models that power our features, filtering and excluding training data sets, and controls for how customers interact with these features.”

Enterprises shouldn’t dismiss the innovation happening in generative AI, but it’s mandatory that they require vendors to supply them with information about how their data is being used in relation to such systems, and demand transparency in how LLMs arrive at their results, said Andy Thurai, an analyst at Constellation Research.

“Every organization is liable for their data and their customer’s data. Just because you pass it to some AI [provider] doesn’t mean you pass on the ownership and liability,” he said. “Enterprises need to demand to understand the model, algorithm, transparency, ethics, bias mitigation, etc. before they start using any AI solution.”