New Nvidia, GitHub AI coding assistants expand devs' options

GitHub Copilot Enterprise and StarCoder2 LLMs, both released this week, will add to an array of AI coding assistants. But caution, especially with security, is still warranted.

By Beth Pariseau, Senior News Writer

Published: 28 Feb 2024

Updates from GitHub and a consortium composed of Nvidia, HuggingFace and ServiceNow will bring fresh options to an already wide selection of AI coding assistants for developers. But experts urge caution in adopting them amid ongoing security and copyright concerns.

GitHub Copilot Enterprise, a new tier of the popular GitHub Copilot AI coding assistant, became generally available this week at $39 per user per month for users of GitHub's Enterprise Cloud. This version offers customization for organizations using Copilot, generating chat answers, code completion and pull request difference analysis based on a specific codebase. An add-on that will offer fine-tuned AI models is coming soon, according to a GitHub blog post.

"Coding copilots are a solid use case for improving developer efficiency that many enterprises are considering, experimenting with and implementing," said Andy Thurai, an analyst at Constellation Research. "GitHub Copilot, … backed by Microsoft, has an early adopter advantage because of its integration with Visual Studio."

GitHub Copilot is already among the most widely used AI coding assistants available, according to a 2023 survey of 800 engineering professionals. The survey, conducted by software supply chain security vendor Sonatype, found that 97% of DevOps and SecOps leader respondents currently employ generative AI to some degree in their workflows. Of that 97%, a majority reported using two or more tools daily. Topping the list of most-used tools at 86% was ChatGPT, followed by GitHub Copilot at 70%.

As such, it will be difficult for competitors to unseat GitHub Copilot, Thurai said.

"Microsoft has complete control within the plugin to Visual Studio software," he said. "The additional cost of Copilot plugins is so minimally incremental that most enterprises have already opted to use that as a default practice."

Security caveats remain for AI coding assistants

With GitHub Copilot Enterprise, GitHub claims "enterprise-grade security, safety and privacy," which includes excluding organizations' data from model training by default. As with Copilot Business, Copilot Enterprise includes intellectual property indemnity for customers. IP indemnity is meant to assuage concerns about ongoing lawsuits against Microsoft, GitHub and large language model (LLM) partner OpenAI that claim their AI models were trained on copyrighted data. Microsoft and GitHub have pledged to cover any costs paying customers might incur depending on the outcome of those lawsuits.

Despite that indemnity, Sonatype's survey report sounded a note of caution about AI coding assistants due to copyright concerns.

"The copyright issues around the training sets and outputs of generative AI aren't going away anytime soon," the report read. "Overall, the devil is in the details, and the legal challenges are likely to help democratize the AI landscape."

Meanwhile, even this new high-end Copilot tier -- and any AI coding assistant, regardless of vendor -- comes with significant caveats for now, particularly around security. Recent research by cybersecurity vendor Snyk showed that AI coding assistants, including GitHub Copilot, are prone to reproducing security vulnerabilities and bad practices from a customer's existing codebase.

LLMs are being refined rapidly, but still sometimes "make stuff up," according to Thurai. "Which means you have to avoid that by either fine-tuning the model, [adding] RAG [retrieval augmented generation] and [doing] other things to make it better."

GitHub offers Dependabot, a free tool that discovers vulnerable software dependencies in codebases, and requires two-factor authentication for all GitHub contributors. A GitHub Advanced Security license available for $49 per active code committer per month comes with code and secrets scanning, custom Dependabot auto-triage rules, and dependency reviews. Numerous third-party tools to scan and remediate security vulnerabilities in AI-generated code are also available.

"Regardless of the tool used, teams cannot and should not depend on any single tool to guarantee the security of their software," a GitHub spokesperson wrote to TechTarget Editorial in response to the Snyk report.

As enterprises move forward with AI coding assistants, Sonatype's survey findings indicate these concerns feed lingering skepticism among some DevSecOps pros.

"A striking 75% of both [DevOps and SecOps leads] cited feeling pressured from leadership to adopt AI technologies, recognizing their potential to bolster productivity despite security concerns," the report read.

StarCoder2 offers lightweight LLM, opt-in data

ChatGPT and GitHub Copilot's early dominance notwithstanding, competitors abound, including Meta's Code Llama, Stability AI's StableCode, Amazon CodeWhisperer, and IBM's WatsonX code assistant. Soon, domain-specific AI code assistants will also be built using the StarCoder family of LLMs, which reached version 2 this week.

The industry group behind StarCoder2 -- enterprise workflow vendor ServiceNow, open source AI clearinghouse HuggingFace and AI chipmaker Nvidia -- claims the updated trio of LLMs will address multiple security and legal concerns around AI coding assistants. These models will be cheaper to run than existing models, can easily be fine-tuned to provide better quality answers based on specific codebases, and address ongoing concerns about data sourcing and privacy, according to Nvidia officials.

"What you would call a frontier model, GPT-4 class models, is probably several hundred billion, maybe even up to a trillion parameters," said Jonathan Cohen, vice president of applied research at Nvidia. "[But] there's this emerging class of models that's in the five to 15 billion parameter range, … and what's nice about them is they fit very comfortably on a single GPU. … You don't need a special server with many GPUs or super-fast interconnects because you're going to split it across many nodes."

StarCoder2 models will come in three sizes in that smaller range: a 3-billion-parameter version trained by ServiceNow, a 7-billion-parameter model trained by HuggingFace and a 15-billion-parameter model trained by Nvidia. These smaller models were also trained for longer on a dataset seven times the size of the first generation of StarCoder models, improving their accuracy, Cohen said.
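To put Cohen's single-GPU point in rough numbers, the back-of-the-envelope sketch below estimates how much memory a model's weights alone would occupy. The bytes-per-parameter values, the 80 GB GPU capacity and the function names are illustrative assumptions, not figures from Nvidia or the StarCoder2 team, and the estimate ignores activations, caches and framework overhead.

```python
# Rough weight-memory estimate only. It ignores activations, KV cache and
# framework overhead, so real memory needs will be higher.

# Assumed storage precisions (bytes per parameter); not from the article.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Nominal parameter counts, in billions, as reported in the article.
STARCODER2_SIZES_B = {"3B": 3, "7B": 7, "15B": 15}

# Hypothetical single-GPU memory capacity in decimal gigabytes (assumption).
ASSUMED_GPU_MEMORY_GB = 80


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


def fits_on_gpu(params_billions: float, precision: str,
                gpu_memory_gb: float = ASSUMED_GPU_MEMORY_GB) -> bool:
    """True if the weights alone are smaller than the assumed GPU memory."""
    return weight_memory_gb(params_billions, precision) < gpu_memory_gb


if __name__ == "__main__":
    for name, size in STARCODER2_SIZES_B.items():
        gb = weight_memory_gb(size, "fp16")
        print(f"StarCoder2 {name}: about {gb:.0f} GB of weights at fp16, "
              f"fits on one GPU: {fits_on_gpu(size, 'fp16')}")
    # A hypothetical trillion-parameter model, per Cohen's upper estimate.
    print("1T params at fp16 fits on one GPU:", fits_on_gpu(1000, "fp16"))
```

Under these assumptions, all three StarCoder2 sizes come in below the assumed capacity at 16-bit precision, while a trillion-parameter model would need its weights split across many devices.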

An Nvidia blog post this week also touted that StarCoder2 LLMs were trained "using responsibly sourced data under license from the digital commons of Software Heritage."

That approach might appeal to enterprises hesitant to use Copilot due to copyright concerns, Thurai said.

"Another important factor is that [StarCoder2] is trained in 619 programming languages," he said. "This can help programmers come up to speed on pretty much any language."

StarCoder2 is likely to find a home among vendors that have their own domain-specific language or offer software platforms, to create custom AI coding assistants, Cohen predicted. ServiceNow already made a domain-specific Now LLM available in September based on the first version of StarCoder.

"Workflow generation in addition to code generation coming from a workflow-rich company [such as] ServiceNow could be a value-add" for forthcoming AI coding assistants built using StarCoder2, Thurai said.

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out on X, formerly known as Twitter, @PariseauTT.
