What is the Google Gemini AI model (formerly Bard)?

By

Cameron Hashemi-Pour, Former Site Editor
Sean Michael Kerner
Andy Patrizio

Published: Jan 08, 2025

Google Gemini -- formerly known as Bard -- is an artificial intelligence (AI) chatbot tool designed by Google to simulate human conversations using natural language processing (NLP) and machine learning. In addition to supplementing Google Search, Gemini can be integrated into websites, messaging platforms or applications to provide realistic, natural language responses to user questions.

Google Gemini is a family of multimodal AI large language models (LLMs) that have capabilities in language, audio, code and video understanding.

Gemini 1.0 was announced on Dec. 6, 2023, and built by Alphabet's Google DeepMind business unit, which is focused on advanced AI research and development. Google co-founder Sergey Brin is credited with helping to develop the Gemini LLMs, alongside other Google staff.

At its release, Gemini was the most advanced set of LLMs at Google, powering Bard before Bard's renaming and superseding the company's Pathways Language Model 2 (PaLM 2). As was the case with PaLM 2, Gemini was integrated into multiple Google technologies to provide generative AI capabilities.

On Dec. 11, 2024, Google released an updated version of its LLM with Gemini 2.0 Flash, an experimental version incorporated in Google AI Studio and the Vertex AI Gemini application programming interface (API).

Gemini integrates NLP capabilities that let it understand and process language, including both user queries and the data supplied with them. It can also understand and recognize images, enabling it to parse complex visuals, such as charts and figures, without the need for external optical character recognition (OCR). It also has broad multilingual capabilities for translation tasks and functionality across different languages.

Unlike prior AI models from Google, Gemini is natively multimodal, meaning it's trained end to end on data sets spanning multiple data types. As a multimodal model, Gemini enables cross-modal reasoning abilities. That means Gemini can reason across a sequence of different input data types, including audio, images and text. For example, Gemini can understand handwritten notes, graphs and diagrams to solve complex problems. The Gemini architecture supports directly ingesting text, images, audio waveforms and video frames as interleaved sequences.
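To make the idea of an interleaved multimodal sequence more concrete, here is a minimal sketch of how an application might represent a prompt that mixes text, image and audio parts in order. The `Part` class, the `build_prompt` helper and the placeholder byte strings are hypothetical names invented for illustration; they are not Gemini's internal representation or any Google API.

```python
from dataclasses import dataclass
from typing import List, Union

# Modalities named in the article; the set itself is an illustrative assumption.
ALLOWED_MODALITIES = {"text", "image", "audio", "video"}


@dataclass(frozen=True)
class Part:
    """One element of an interleaved prompt (hypothetical structure)."""
    modality: str
    payload: Union[str, bytes]


def build_prompt(*parts: Part) -> List[Part]:
    """Validate the parts and keep them in the order the user supplied them."""
    for part in parts:
        if part.modality not in ALLOWED_MODALITIES:
            raise ValueError(f"unsupported modality: {part.modality}")
    return list(parts)


def modality_order(prompt: List[Part]) -> List[str]:
    """Return the sequence of modalities, showing how inputs are interleaved."""
    return [part.modality for part in prompt]


# Placeholder bytes stand in for real image and audio data.
prompt = build_prompt(
    Part("text", "What does this chart show?"),
    Part("image", b"<image bytes>"),
    Part("text", "Does the narration in this clip agree with it?"),
    Part("audio", b"<audio bytes>"),
)
print(modality_order(prompt))
```

The point of the sketch is only that order is preserved across data types, so a question can refer to the image or audio clip that sits next to it in the sequence.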

How does Google Gemini work?

Google Gemini is first trained on a massive corpus of data. After training, the model uses several neural network techniques to understand content, answer questions, generate text and produce outputs.

Specifically, the Gemini LLMs use a transformer model-based neural network architecture. The Gemini architecture has been enhanced to process lengthy contextual sequences across different data types, including text, audio and video. Google DeepMind uses efficient attention mechanisms in the transformer decoder to help the models process long contexts, spanning different modalities.
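The article does not describe which efficient attention mechanisms Gemini uses, and those details are not reproduced here. As a point of reference, the sketch below shows the standard textbook form of causal scaled dot-product attention used in transformer decoders, written in plain Python. The function names and the tiny example matrices are assumptions made for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Textbook scaled dot-product attention with a decoder-style causal mask.

    Position i may only attend to positions 0..i, which is what lets a
    decoder generate a sequence one token at a time.
    """
    dim = len(q[0])
    output = []
    for i, query in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(query, k[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        output.append([
            sum(w * v[j][col] for j, w in enumerate(weights))
            for col in range(len(v[0]))
        ])
    return output


# Two positions with 2-dimensional vectors (illustrative values).
q = k = v = [[1.0, 0.0], [0.0, 1.0]]
result = causal_attention(q, k, v)
print(result)
```

In this form, the cost grows with the square of the sequence length, which is why long-context models rely on more efficient variants.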

Gemini models have been trained on diverse multimodal and multilingual data sets of text, images, audio and video with Google DeepMind using advanced data filtering to optimize training. As different Gemini models are deployed in support of specific Google services, there's a process of targeted fine-tuning that can be used to further optimize a model for a use case. During both the training and inference phases, Gemini benefits from the use of Google's latest tensor processing unit chips, Trillium, the sixth generation of Google Cloud TPU. Trillium TPUs provide improved performance, reduced latency and lower costs compared with the TPU v5. They're also more energy efficient than the previous version.

[Figure: list of tasks Google Gemini can perform. Caption: Google Gemini can be applied pragmatically to complete various tasks.]

A key challenge for LLMs is the risk of bias and potentially toxic content. According to Google, Gemini underwent extensive safety testing and mitigation around risks such as bias and toxicity to help provide a degree of LLM safety. To further ensure Gemini works as it should, the models were tested against academic benchmarks spanning language, image, audio, video and code domains. Google has assured the public it adheres to a list of AI principles.

At launch on Dec. 6, 2023, Google said Gemini would comprise a series of different model sizes, each designed for a specific set of use cases and deployment environments. The Ultra model is the top end and is designed for highly complex tasks. The Pro model is designed for performance and deployment at scale. As of Dec. 13, 2023, Google enabled access to Gemini Pro in Google Cloud Vertex AI and Google AI Studio. For code, a version of Gemini is used to power the Google AlphaCode 2 generative AI coding technology.

The Nano model is targeted at on-device use cases. There are two different versions of Gemini Nano: The Nano-1 model has 1.8 billion parameters, while Nano-2 has 3.25 billion parameters. Among the places where Nano is being embedded is the Google Pixel 9 smartphone.
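The parameter counts above give a rough sense of why Nano suits on-device use. The following back-of-the-envelope sketch estimates the memory needed just to store the model weights at different numeric precisions. The bit widths are assumptions chosen for illustration, since the article does not state the precision Nano ships with, and the estimate ignores activations and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts come from the article; the precisions are assumptions.
NANO_MODELS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}

for name, params in NANO_MODELS.items():
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit weights: about {size:.2f} GB")
```

Under these assumptions, halving the bits per weight halves the storage requirement, which is the usual reason on-device models are quantized.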

When was Google Bard first released?

Google initially announced Bard, its AI-powered chatbot, on Feb. 6, 2023, with a vague release date. It opened access to Bard on March 21, 2023, inviting users to join a waitlist. On May 10, 2023, Google removed the waitlist and made Bard available in more than 180 countries and territories. Almost precisely a year after its initial announcement, Bard was renamed Gemini.

Many believed that Google felt the pressure of ChatGPT's success and positive press, leading the company to rush Bard out before it was ready. For example, during a live demo by Google and Alphabet CEO Sundar Pichai, Bard responded to a query with a wrong answer.

In the demo, a user asked Bard the question: "What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?" In Bard's response, it mentioned that the telescope "took the very first pictures of a planet outside of our own solar system." Astronomers quickly took to social media to point out that the first image of an exoplanet was taken by an earthbound observatory in 2004, making Bard's answer incorrect. The next day, Google lost $100 billion in market value -- a decline attributed to the embarrassing mistake.

Why did Google rename Bard to Gemini and when did it happen?

Bard was renamed Gemini on Feb. 8, 2024. Gemini was already the LLM powering Bard. Some believe rebranding the platform as Gemini might have been done to draw attention away from the Bard moniker and the criticism the chatbot faced when it was first released. It also simplified Google's AI effort and focused on the success of the Gemini LLM.

The name change also made sense from a marketing perspective, as Google aims to expand its AI services. It's a way for Google to increase awareness of its advanced LLM offering as AI democratization and advancements show no signs of slowing.

Who can use Google Gemini?

Gemini is widely available around the world. Gemini Pro is available in more than 230 countries and territories. Gemini Advanced, a service that provides access to Google's most advanced AI models, is available in more than 150 countries and territories. However, there are age limits in place to comply with laws and regulations that exist to govern AI.

Users must be at least 18 years old and have a personal Google account. However, age restrictions vary for the Gemini web app. Users in Europe must be 18 or older. In other countries where the platform is available, the minimum age is 13 unless otherwise specified by local laws. Also, users younger than 18 can only use the Gemini web app in English.

Is Gemini free to use?

Google Gemini is available at no charge to users who are 18 years or older and have a personal Google account, a Google Workspace account with Gemini access, a Google AI Studio account or a school account. The Gemini API also has a free tier.

The most advanced version of Gemini is available through the Gemini Advanced option, which is available to users with a Google Workspace account. Google offers a free one-month trial of the Advanced option; it costs $20 per month after the free trial. Users sign up for Gemini Advanced through a Google One AI Premium subscription, which also includes Google Workspace features and 2 TB of storage. Google Workspace offers two Gemini add-on plans: Gemini Business is available for $20 per user, per month, and Gemini Enterprise is $30 per user, per month.

What can you use Gemini for? Use cases and applications

The Google Gemini models are used in many different ways, including text, image, audio and video understanding. The multimodal nature of Gemini also enables these different types of input to be combined for generating output.

Use cases

Businesses can use Gemini to perform various tasks that include the following:

Text summarization. Gemini models can summarize content from different types of data.
Text generation. Gemini can generate text based on user prompts. That text can also be driven by a Q&A-type chatbot interface.
Text translation. The Gemini models have broad multilingual capabilities, enabling translation and understanding of more than 100 languages.
Image understanding. Gemini can parse complex visuals, such as charts, figures and diagrams, without external OCR tools. It can be used for image captioning and visual Q&A capabilities.
Audio processing. Gemini has support for speech recognition across more than 100 languages and audio translation tasks.
Video understanding. Gemini can process and understand video clip frames to answer questions and generate descriptions.
Multimodal reasoning. A key strength of Gemini is its use of multimodal AI reasoning, where different types of data can be mixed for a prompt to generate an output.
Code analysis and generation. Gemini can understand, explain and generate code in popular programming languages, including Python, Java, C++ and Go.

Applications

Google developed Gemini as a foundation model to be widely integrated across various Google services. It's also available for developers to use in building their own applications. Applications that use Gemini include the following:

AlphaCode 2. Google DeepMind's AlphaCode 2 code generation tool makes use of a customized version of Gemini Pro.
Google Pixel. The Google-built Pixel 8 Pro smartphone was the first device engineered to run Gemini Nano. Gemini powers new features in existing Google apps, such as summarization in Recorder and Smart Reply in Gboard for messaging apps.
Android. The Pixel 8 Pro was the first Android smartphone to benefit from Gemini. Android developers can build with Gemini Nano through the Android operating system's AICore system capability.
Vertex AI. Google Cloud's Vertex AI service, which provides foundation models that developers can use to build applications, also provides access to Gemini Pro.
Google AI Studio. Developers can build prototypes and apps with Gemini using the Google AI Studio web-based tool.
Search. Google has experimented with using Gemini in its AI Overview to reduce latency and improve quality.
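For developers, calling Gemini typically means sending a prompt to the Gemini API over HTTPS. The sketch below builds such a request with Python's standard library without sending it. The endpoint path, the `x-goog-api-key` header, the request body shape and the `gemini-1.5-flash` model ID reflect the publicly documented REST interface as understood at the time of writing and should be treated as assumptions to verify against the current documentation. `YOUR_API_KEY` is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the Gemini REST API; check the current docs before use.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str, api_key: str,
                  model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build (but do not send) a text-only generateContent request."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate in a response payload."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


request = build_request("Summarize this paragraph in one sentence.",
                        api_key="YOUR_API_KEY")
print(request.full_url)
```

With a valid key, the request could be sent using `urllib.request.urlopen(request)` and the decoded JSON passed to `extract_text`. Google also publishes client libraries that wrap these steps.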

What are Gemini's limitations?

A few limitations might cause hesitation among potential end users. These include the following:

Training data. Like all AI chatbots, Gemini must learn to give correct answers. To do this, the models must be trained on information that is accurate and not misleading. However, they also must be able to identify incorrect or misleading information when it comes their way.
Bias and potential harm. AI training is an endless, compute-intensive process because there's always new information to learn. Across all Gemini models, Google has claimed it has followed responsible development practices, including extensive evaluation to help limit the risk of bias and potential harm.
Originality and creativity. There are limits on how original and creative the content Gemini produces can be. This is particularly the case with the free version, which has had trouble producing adequate output for complicated prompts that involve multiple steps and nuances. The free version is based on the Gemini Pro LLM, which is more limited in capabilities; the paid versions of the platform offer access to more advanced features.

What are the concerns about Gemini?

One concern about Gemini revolves around its potential to present biased or false information to users. Any bias inherent in the training data fed to Gemini could lead to issues. For example, as is the case with all advanced AI software, training data that excludes certain groups within a given population will lead to skewed outputs.

Gemini's propensity to generate hallucinations and other fabrications and pass them along to users as truthful is also a concern. This has been one of the biggest risks with ChatGPT responses since its inception, as it is with other advanced AI tools. In addition, because Gemini doesn't always understand context, its responses might not be relevant to the prompts and queries users provide.

What languages is Gemini available in?

Gemini can be used in 46 languages. It can translate text-based inputs into different languages with almost humanlike accuracy. Google plans to expand Gemini's language understanding capabilities and make it ubiquitous. However, there are important factors to consider, such as bans on LLM-generated content or ongoing regulatory efforts in various countries that could limit or prevent future use of Gemini.

Gemini offers other functionality across different languages in addition to translation. For example, it's capable of mathematical reasoning and summarization in multiple languages. It can also generate captions for an image in different languages.

Is image generation available in Gemini?

Upon Gemini's release, Google touted its ability to generate images the same way as other generative AI tools, such as DALL-E, Midjourney and Stable Diffusion. Gemini currently uses Google's Imagen 3 text-to-image model, which gives the tool image generation capabilities.

Gemini's outputs range from simple to complex, depending on end-user inputs. Users provide descriptive prompts to elicit specific images. Users follow a simple step-by-step process to enter a prompt, view the image Gemini generated, edit it and save it for later use.

From late February 2024 to late August 2024, Gemini's image generation feature was halted to undergo retooling after generated images were shown to depict factual inaccuracies. Google improved the image generation feature and upgraded it to Imagen 3.

Gemini vs. GPT-3 and GPT-4

Google Gemini is a direct competitor to the GPT-3 and GPT-4 models from OpenAI. The following table compares some key features of Google Gemini and OpenAI products.

Feature	Gemini	GPT-3 and GPT-4
Developer	Google DeepMind	OpenAI
Chatbot interface	Gemini; formerly Bard	ChatGPT
Modality	Multimodal; trained on text, images, audio and video	GPT-3 was built as a text-only language model; GPT-4 is multimodal
Model variations	Size-based variations, including Ultra, Pro and Nano	Optimizations for size, including GPT-3.5 Turbo and GPT-4o
Context window length	Gemini 1.5 Pro has a 2 million-token context window	GPT-4o has a 128,000-token context window
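
The context window figures in the table can be turned into a quick feasibility check. The sketch below estimates whether a document fits in each model's window. The ratio of four characters per token is a rough rule of thumb for English text and is an assumption, not a property of either model's tokenizer. The helper names are hypothetical.

```python
import math

# Window sizes are taken from the comparison table above.
CONTEXT_WINDOWS = {"Gemini 1.5 Pro": 2_000_000, "GPT-4o": 128_000}

# Rough assumption for English text; real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits(text: str, model: str, reserve_for_output: int = 0) -> bool:
    """Check whether the text plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]


document = "x" * 1_000_000  # stand-in for a document of one million characters
for model in CONTEXT_WINDOWS:
    print(model, fits(document, model, reserve_for_output=2_000))
```

Under this estimate, the one-million-character document comes to about 250,000 tokens, which fits the larger window but not the smaller one.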

Google Gemini vs. ChatGPT

Both Gemini and ChatGPT are AI chatbots, also known as AI assistants, designed for interaction with people through NLP and machine learning. Both use an underlying LLM for generating and creating conversational text.

ChatGPT uses generative AI to produce original content. For example, users can ask it to write a thesis on the advantages of AI. Gemini uses generative AI as well. Both are geared to make search more natural and helpful as well as synthesize new information in their answers.

In January 2023, Microsoft signed a deal reportedly worth $10 billion with OpenAI to license and incorporate ChatGPT into its Bing search engine to provide more conversational search results, similar to Google Bard at the time. That opened the door for other search engines to license ChatGPT, whereas Gemini supports only Google.

Another similarity between the two chatbots is their potential to generate plagiarized content and their limited ability to control this issue. Neither Gemini nor ChatGPT has built-in plagiarism detection features that users can rely on to verify that outputs are original. However, separate tools exist to detect plagiarism in AI-generated content, so users have other options. Gemini is able to cite other content in its responses and link to sources. Gemini's double-check function provides URLs to the sources of information it draws from to generate content based on a prompt.

Alternatives to Google Gemini

Gemini didn't spring up in a vacuum. AI chatbots have been around for a while, in less versatile forms. Multiple startup companies have similar chatbot technologies but without the spotlight ChatGPT has received.

Examples of Gemini chatbot competitors that generate original text or code, as mentioned by Audrey Chee-Read, principal analyst at Forrester Research, as well as by other industry experts, include the following:

Chatsonic. Marketed as a "ChatGPT alternative with superpowers," Writesonic's Chatsonic is an AI chatbot powered by Google Search with an AI-based text generator that lets users discuss topics in real time to create text or images.
Claude. Anthropic's Claude is an AI-driven chatbot named after the underlying LLM powering it. It has undergone rigorous testing to ensure it's adhering to ethical AI standards and not producing offensive or factually inaccurate output.
Copy.ai. Copy.ai was originally built to aid sales and marketing teams. It generates original text, such as social media posts, blogs, emails and other types of content, and it also automates workflow tasks.
GitHub Copilot. GitHub Copilot specializes in code generation for developers. The aim is to simplify the otherwise tedious software development tasks involved in producing modern software. While it isn't meant for text generation, it serves as a viable alternative to ChatGPT or Gemini for code generation.
Jasper Chat. Jasper AI's Jasper Chat is a conversational AI tool focused on generating text. It's aimed at companies looking to create brand-relevant content and have conversations with customers. It enables content creators to specify search engine optimization keywords and tone of voice in their prompts.
Microsoft Bing. Microsoft and its partnership with OpenAI offer exactly what Google does with Gemini: AI-powered search that recognizes natural language queries and gives natural language responses. When a user makes a search query, they receive the standard Bing search results and an answer generated by GPT-4, as well as the ability to interact with the AI regarding its response.
SpinBot. This generative AI tool specializes in original text generation as well as rewriting content and avoiding plagiarism. It handles other simple tasks to aid professionals in writing assignments, such as proofreading.
YouChat. This AI chatbot is from the You.com search engine. YouChat answers questions and provides citations for its answers so that users can review the sources and fact-check its responses.

Gemini's history

Gemini, under its original Bard name, was initially designed in March 2023 around search. It aimed to let users search with natural language queries rather than keywords. Its AI was trained around natural-sounding conversational queries and responses. Instead of giving a list of answers, Bard provided responses with context. Bard AI was designed to help with follow-up questions -- something new to search. It also had a share-conversation function and a double-check function that helped users fact-check generated results.

Bard was integrated with several Google apps and services, including YouTube, Maps, Hotels, Flights, Gmail, Docs and Drive. These integrations let users apply the AI tool to their personal content.

The first version of Bard used a light version of Google's LaMDA conversation technology that required less computing power, letting it scale to more concurrent users. Subsequent use of the PaLM 2 language model made Bard more visual in its responses to user queries. Bard also incorporated Google Lens, letting users upload images in addition to written prompts. The Gemini language model was added later, enabling more advanced reasoning, planning and understanding.

As part of the initial launch of Gemini on Dec. 6, 2023, Google announced Gemini Ultra, Pro and Nano; however, it didn't make Ultra available at the same time as Pro and Nano. Initially, Ultra was only available to select customers, developers, partners and experts; it was fully released in February 2024.

Google is now incorporating Gemini across the Google portfolio, including the Chrome browser and the Google Ads platform, providing new ways for advertisers to connect with and engage users.

Recent updates to Google Gemini

In May 2024, Google announced enhancements to Gemini 1.5 Pro at the Google I/O conference. Upgrades included performance improvements in translation, coding and reasoning features. The upgraded Gemini 1.5 Pro also improved image and video understanding, including the ability to directly process voice inputs using native audio understanding. The model's context window was increased to 2 million tokens, enabling it to remember much more information when responding to prompts.

Also released in May 2024 was Gemini 1.5 Flash, a smaller model with a sub-second average first-token latency and a 1 million-token context window. In addition to the core model upgrades, Google announced new features to the Gemini API in May, including the following:

Video frame extraction, which lets users upload a video to generate content.
Parallel function calling, which lets users engage in more than one function call at a time.
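
To illustrate the application side of parallel function calling, here is a minimal sketch in which a model response asks for several functions at once and the application runs them concurrently before returning the results. The shape of the `requested_calls` list, the registry and the two stub functions are hypothetical and are not the Gemini API's actual schema.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


# Hypothetical local tools the application exposes to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def get_local_time(city: str) -> str:
    return f"12:00 in {city}"


REGISTRY: Dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "get_local_time": get_local_time,
}


def run_calls(calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Execute every requested call concurrently and keep the request order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(REGISTRY[c["name"]], **c["args"]) for c in calls]
        return [
            {"name": c["name"], "result": f.result()}
            for c, f in zip(calls, futures)
        ]


# Assumed shape of a model response that requests two calls in one turn.
requested_calls = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "get_local_time", "args": {"city": "Tokyo"}},
]
results = run_calls(requested_calls)
print(results)
```

The benefit is that independent lookups do not have to wait on one another across separate model turns.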

In June 2024, Google added context caching to ensure users only have to send parts of a prompt to a model once.
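
The idea behind context caching can be sketched with a small in-memory stand-in: a large, reusable part of the prompt is stored once, and later requests send only a short handle plus the new question. The `PrefixCache` class and its methods are hypothetical and only mimic the concept. They do not represent the Gemini API's caching interface.

```python
import hashlib
from typing import Dict


class PrefixCache:
    """Toy server-side store for reusable prompt content (illustrative only)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, content: str) -> str:
        """Store the content once and return a short handle for later requests."""
        handle = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        self._store[handle] = content
        return handle

    def build_prompt(self, handle: str, question: str) -> str:
        """Rebuild the full prompt from the cached content and a new question."""
        return self._store[handle] + "\n\n" + question


cache = PrefixCache()
manual = "Long product manual text. " * 1000  # stand-in for a large document
handle = cache.put(manual)  # the large content is sent a single time

questions = ["How do I reset the device?", "What is the warranty period?"]
# Characters sent per follow-up request: the handle plus the question only.
payload_sizes = [len(handle) + len(q) for q in questions]
print(payload_sizes, "versus", len(manual), "characters without caching")
```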

Google introduced Gemini 2.0 Flash on Dec. 11, 2024, in an experimental preview through the Vertex AI Gemini API and Google AI Studio. According to Google, Gemini 2.0 Flash runs at twice the speed of 1.5 Pro and has new capabilities, such as multimodal input and output, and long context understanding. Other new features include native image generation and text-to-speech audio output. A new Multimodal Live API supports real-time audio and video streaming input, along with native tool use and improved latency. Google plans to roll this new model out to a wider audience in January 2025.
