Gemma
Gemma is a collection of lightweight open source generative AI (GenAI) models designed mainly for developers and researchers. Gemma was created by the Google DeepMind research lab that also developed closed source Gemini, Google's generative AI chatbots. Google makes Gemma available in several sizes and for use with popular developer tools and Google Cloud services.
The name Gemma comes from the Latin word for precious stone. Google released Gemma on Feb. 21, 2024, with two models: Gemma 2B and Gemma 7B. These are text-to-text, decoder large language models (LLMs) with pretrained and instruction-tuned variants. Gemma 2B has a neural network of 2 billion parameters and Gemma 7B has a neural network of 7 billion parameters. Gemma is not as large and powerful as popular AI models, such as OpenAI's ChatGPT-4 and Google's Gemini Ultra and Pro chatbots -- which have trillions of parameters. However, Gemma's compact lightweight models can run on laptop or desktop computers because they have faster inference speeds and lower computational demands.
Gemma also runs across mobile devices and public clouds. Nvidia worked with Google to optimize Gemma to run on its graphics processing units (GPUs). Because of this wide support for platforms and hardware, Gemma can run on GPUs, central processing units or Google Clouds' Tensor Processing Units (TPUs).
Google allows commercial usage and distribution of Gemma and plans to expand the Gemma family.
How is Gemma different from other AI models?
Gemma has several distinct differences from popular AI chatbots, including Google's Gemini. Gemma stands out for being open and lightweight. Gemini and ChatGPT are closed models, and neither is lightweight enough to run on laptops. Because ChatGPT and Gemini are closed, developers cannot customize their code as they can with the open source Gemma.
This article is part of
What is gen AI? Generative AI explained
Gemma is not Google's first open AI model, but it is more advanced in its training and performance compared to older models Bert and T5. OpenAI, the developer of ChatGPT, has yet to release any open source models.
Google also has pretrained and instruction-tuned Gemma models to run on laptops and workstations. Similar to Gemma, Meta's Llama 2 is an open source AI model that can run on laptops. Llama 2 is more of a business tool than Gemma but is also available to developers through Hugging Face and other platforms. Gemma is generally considered better at scientific tasks while Llama 2 is better for general-purpose tasks.
Other open source AI models include Bionic GPT, GPT-Neo, Mistral AI, Hugging Face Falcon 180B, Bloom, Databricks Dolly and Cerebras-GPT. Some of these are much larger than Gemma, and others are mostly developed for specific use cases or vertical markets.
Another difference between Gemma and Gemini is the type of transformer it uses to change an input sequence to an output sequence. Models can use a decoder transformer, encoder transformer or a hybrid of the two.
Decoders generate outputs in the form of new texts, such as answers to user queries. These are different than encoder models that process inputs and understand their context. While decoder models are used for generative AI, encoder models handle tasks such as classifying text, answering questions and analyzing texts for emotional tone.
Gemma and ChatGPT use a decoder transformer. Because they are decoder-only, Gemma and ChatGPT work for text-to-text LLMs but not for images and videos. Google Gemini uses both a decoder and encoder architecture. That architecture facilitates Gemini's multimodal capability, enabling it to support voice and images as well as text in both user prompts and its responses.
What is Gemma used for?
Developers can use Gemma to build their own AI applications, such as chatbots, text summarization tools and other retrieval-augmented generation applications. Because it is lightweight, Gemma is a good fit for real-time GenAI applications that require low latency, such as streaming text.
Gemma is available through popular developers' tools, including Colab and Kaggle notebooks and frameworks such as Hugging Face Transformers, JAX, Keras 3.0 and PyTorch.
Gemma models can be deployed on Google Cloud's Vertex AI machine learning platform and Google Kubernetes Engine (GKE). Google Vertex AI lets application builders optimize Gemma for specific use cases, such as text generation summarization and Q&A. Running Gemma on GKE enables developers to build their own fine-tuned models in portable containers.
Gemma is optimized to run across popular AI hardware, including Nvidia GPUs and Google Cloud TPUs. Nvidia collaborated with Google to support Gemma through the Nvidia TensorRT-LLM open source library for optimizing LLM inference and Nvidia GPUs running in the data center, in the cloud and locally on workstations and PCs.
Gemma has been pretrained on large data sets. This saves developers the cost and time of building data sets from scratch and gives them a foundation that they can customize to build their applications. Pretrained models can help build AI apps in areas such as natural language processing (NLP), speech AI, computer vision, healthcare, cybersecurity and creative arts.
Google said Gemma was trained on a diverse set of English-language web text documents to expose it to a range of linguistic styles, topics and vocabulary. Google also trained Gemma in programming language code and mathematical text to help it generate code and answer code-related and mathematical questions.
Who can use Gemma?
Although Gemma can be used by anyone, it is designed mainly for developers. Because it is open sourced, lightweight and widely available through developer platforms and hardware devices, Gemma is said to "democratize AI."
However, there are risks to making open AI models for commercial use. Bad actors can use AI to develop applications that infringe on privacy or spread disinformation or toxic content.
Google has taken steps to address those dangers with Gemma. It released a Responsible Generative AI Toolkit for Gemma with best practices for using open AI responsibly. The toolkit provides guidance for setting safety policies for tuning, classifying, and evaluating models and a Learning Interpretability Tool to help developers understand natural language processing (NLP) model behavior. It also includes a methodology for building robust safety classifiers.
When launching Gemma, Google said it was built "to assist developers and researchers in building AI responsibly." Gemma's terms of use prohibit offensive, illegal or unethical applications.
Google also claims Gemma is pretrained by DeepMind to omit harmful, illegal and biased content, as well as personal and sensitive information. It also released its model documentation detailing its capabilities, limitations and biases.
Developers and researchers have free access to Gemma in Kaggle and Colab, an as-a-service Jupyter Notebook version. First-time Google Cloud users can receive $300 in credits when using Gemma, and researchers can apply for up to $500,000 in Google Cloud credits for their Gemma projects.
Recent updates to Gemma
In April 2024, Google released Gemma 1.1, which introduced performance improvements and bug fixes, and announced the addition of two pretrained variants to the Gemma family of products: one for coding, and one designed for inference and research purposes.
CodeGemma and RecurrentGemma
CodeGemma offers code completion and generation tasks, along with instruction-following capabilities. Google cited a number of advantages to using this model, including the following:
- Its ability to generate code, even large sections, locally or when using cloud resources.
- Enhanced accuracy related to being "trained on 500 billion tokens of primarily English-language data."
- Its multilanguage proficiency, as CodeGemma understands and can work with a number of programming languages, including Python, JavaScript, Java, Kotlin and C++ among others.
The open sourced, lightweight model is available in three sizes: a 7B pretrained variant for code completion and code generation tasks; a 7B instruction-tuned variant for code chat and instruction-following; and a 2B pretrained variant for fast code completion that fits on a computer.
RecurrentGemma uses recurrent neural networks and local attention to optimize memory usage. Google said that while the model is similar in performance to the Gemma 2B model, its "unique architecture" has lower memory requirements than other models. This means it can generate longer samples on devices with limited memory, such as single GPUs or CPUs.
Google also highlighted the model's ability to handle higher batch sizes, resulting in faster generation, and touted its non-transformer architecture as a breakthrough in deep learning research.
Both CodeGemma and RecurrentGemma are built with JAX and are compatible with JAX, PyTorch, Hugging Face Transformers and Gemma.cpp.
CodeGemma is also compatible with Keras, Nvidia NeMo, TensorRT-LLM, Optimum-Nvidia, MediaPipe and available on Vertex AI. RecurrentGemma will add support for these products soon.
PaliGemma and Gemma 2
In May 2024, Google released PaliGemma, a lightweight vision language model (VLM) based on open components such as the SigLIP vision model and Gemma language model. It was inspired by Pali-3 and is best used to add captions for images and short videos, visual question and answering, understanding image text, detecting objects and object segmentation.
PaliGemma is available on GitHub, Hugging Face models, Kaggle, Vertex AI Model Garden and Ai.nvidia.com accelerated with TensorRT-LLM. Integration is available through JAX and Hugging Face Transformers.
According to Google, the next generation of Gemma should launch in June and will add a bigger model, Gemma 27B.