Browse Definitions :
Will AI replace jobs? 9 job types that might be affected Pros and cons of AI-generated content

16 of the best large language models

Large language models have been affecting search for years and have been brought to the forefront by ChatGPT and other chatbots.

Large language models are the dynamite behind the generative AI boom of 2023. However, they've been around for a while.

LLMs are black box AI systems that use deep learning on extremely large datasets to understand and generate new text. Modern LLMs began taking shape in 2014 when the attention mechanism -- a machine learning technique designed to mimic human cognitive attention -- was introduced in a research paper titled "Neural Machine Translation by Jointly Learning to Align and Translate." In 2017, that attention mechanism was honed with the introduction of the transformer model in another paper, "Attention Is All You Need."

Some of the most well-known language models today are based on the transformer model, including the generative pre-trained transformer series of LLMs and bidirectional encoder representations from transformers (BERT).

ChatGPT, which runs on a set of language models from OpenAI, attracted more than 100 million users just two months after its release in 2022. Since then, many competing models have been released. Some belong to big companies such as Google and Microsoft; others are open source.

Constant developments in the field can be difficult to keep track of. Here are some of the most influential models, both past and present. Included in it are models that paved the way for today's leaders as well as those that could have a significant effect in the future.

Top current LLMs

Below are some of the most relevant large language models today. They do natural language processing and influence the architecture of future models.


BERT is a family of LLMs that Google introduced in 2018. BERT is a transformer-based model that can convert sequences of data to other sequences of data. BERT's architecture is a stack of transformer encoders and features 342 million parameters. BERT was pre-trained on a large corpus of data then fine-tuned to perform specific tasks along with natural language inference and sentence text similarity. It was used to improve query understanding in the 2019 iteration of Google search.


The Claude LLM focuses on constitutional AI, which shapes AI outputs guided by a set of principles that help the AI assistant it powers helpful, harmless and accurate. Claude was created by the company Anthropic and powers its two main product offerings: Claude Instant and Claude 2. Claude 2 excels at complex reasoning according to Anthropic.


Cohere is an enterprise LLM that can be custom-trained and fine-tuned to a specific company’s use case. The company that created the Cohere LLM was founded by one of the authors of Attention Is All You Need. One of Cohere’s strengths is that it is not tied to one single cloud -- unlike OpenAI, which is bound to Microsoft Azure.


Ernie is Baidu’s large language model which powers the Ernie 4.0 chatbot. The bot was released in August 2023 and has garnered more than 45 million users. Ernie is rumored to have 10 trillion parameters. The bot works best in Mandarin but is capable in other languages.

Falcon 40B

Falcon 40B is a transformer-based, causal decoder-only model developed by the Technology Innovation Institute. It is open source and was trained on English data. The model is available in two smaller variants as well: Falcon 1B and Falcon 7B (1 billion and 7 billion parameters). Amazon has made Falcon 40B available on Amazon SageMaker. It is also available for free on GitHub.


Just three days into its public release in November 2022, Galactica was Meta's LLM designed specifically for scientists. It was trained on a collection of academic material -- 48 million papers, lecture notes, textbooks and websites. As most models do, it produced AI "hallucinations" that members of the scientific community deemed unsafe because they sounded authoritative. This made them hard to detect quickly and were generated in a domain that requires little margin for error.


GPT-3 is OpenAI's large language model with more than 175 billion parameters, released in 2020. GPT-3 uses a decoder-only transformer architecture. In September 2022, Microsoft announced it had exclusive use of GPT-3's underlying model. GPT-3 is 10 times larger than its predecessor. GPT-3's training data includes Common Crawl, WebText2, Books1, Books2 and Wikipedia.

GPT-3 is the last of the GPT series of models in which OpenAI made the parameter counts publicly available. The GPT series was first introduced in 2018 with OpenAI's paper "Improving Language Understanding by Generative Pre-Training."


GPT-3.5 is an upgraded version of GPT-3 with fewer parameters. GPT-3.5 was fine-tuned using reinforcement learning from human feedback. GPT-3.5 is the version of GPT that powers ChatGPT. There are several models, with GPT-3.5 turbo being the most capable, according to OpenAI. GPT-3.5's training data extends to September 2021.

It was also integrated into the Bing search engine but has since been replaced with GPT-4.


GPT-4 is the largest model in OpenAI's GPT series, released in 2023. Like the others, it's a transformer-based model. Unlike the others, its parameter count has not been released to the public, though there are rumors that the model has more than 170 trillion. OpenAI describes GPT-4 as a multimodal model, meaning it can process and generate both language and images as opposed to being limited to only language. GPT-4 also introduced a system message, which lets users specify tone of voice and task.

GPT-4 demonstrated human-level performance in multiple academic exams. At the model's release, some speculated that GPT-4 came close to artificial general intelligence (AGI), which means it is as smart or smarter than a human. GPT-4 powers Microsoft Bing search, is available in ChatGPT Plus and will eventually be integrated into Microsoft Office products.


Lamda (Language Model for Dialogue Applications) is a family of LLMs developed by Google Brain announced in 2021. Lamda used a decoder-only transformer language model and was pre-trained on a large corpus of text. In 2022, LaMDA gained widespread attention when then-Google engineer Blake Lemoine went public with claims that the program was sentient. It was built on the Seq2Seq architecture.


Large Language Model Meta AI (Llama) is Meta's LLM released in 2023. The largest version is 65 billion parameters in size. Llama was originally released to approved researchers and developers but is now open source. Llama comes in smaller sizes that require less computing power to use, test and experiment with.

Llama uses a transformer architecture and was trained on a variety of public data sources, including webpages from CommonCrawl, GitHub, Wikipedia and Project Gutenberg. Llama was effectively leaked and spawned many descendants, including Vicuna and Orca.


Orca was developed by Microsoft and has 13 billion parameters, meaning it's small enough to run on a laptop. It aims to improve on advancements made by other open source models by imitating the reasoning procedures achieved by LLMs. Orca achieves the same performance as GPT-4 with significantly fewer parameters and is on par with GPT-3.5 for many tasks. Orca is built on top of the 13 billion parameter version of LLaMA.


The Pathways Language Model is a 540 billion parameter transformer-based model from Google powering its AI chatbot Bard. It was trained across multiple TPU 4 Pods -- Google's custom hardware for machine learning. Palm specializes in reasoning tasks such as coding, math, classification and question answering. Palm also excels at decomposing complex tasks into simpler subtasks.

PaLM gets its name from a Google research initiative to build Pathways, ultimately creating a single model that serves as a foundation for multiple use cases. There are several fine-tuned versions of Palm, including Med-Palm 2 for life sciences and medical information as well as Sec-Palm for cybersecurity deployments to speed up threat analysis.


Phi-1 is a transformer-based language model from Microsoft. At just 1.3 billion parameters, Phi-1 was trained for four days on a collection of textbook-quality data. Phi-1 is an example of a trend toward smaller models trained on better quality data and synthetic data.

"We'll probably see a lot more creative scaling down work: prioritizing data quality and diversity over quantity, a lot more synthetic data generation, and small but highly capable expert models," wrote Andrej Karpathy, former director of AI at Tesla and OpenAI employee, in a tweet.

Phi-1 specializes in Python coding and has fewer general capabilities because of its smaller size.


StableLM is a series of open source language models developed by Stability AI, the company behind image generator Stable Diffusion. There are 3 billion and 7 billion parameter models available and 15 billion, 30 billion, 65 billion and 175 billion parameter models in progress at time of writing. StableLM aims to be transparent, accessible and supportive.

Vicuna 33B

Vicuna is another influential open source LLM derived from Llama. It was developed by LMSYS and was fine-tuned using data from It is smaller and less capable that GPT-4 according to several benchmarks, but does well for a model of its size. Vicuna has only 33 billion parameters, whereas GPT-4 has trillions.

LLM precursors

Although LLMs are a recent phenomenon, their precursors go back decades. Learn how recent precursor Seq2Seq and distant precursor ELIZA set the stage for modern LLMs.


Seq2Seq is a deep learning approach used for machine translation, image captioning and natural language processing. It was developed by Google and underlies some of their modern LLMs, including LaMDA. Seq2Seq also underlies AlexaTM 20B, Amazon's large language model. It uses a mix of encoders and decoders.


Eliza was an early natural language processing program created in 1966. It is one of the earliest examples of a language model. Eliza simulated conversation using pattern matching and substitution. Eliza, running a certain script, could parody the interaction between a patient and therapist by applying weights to certain keywords and responding to the user accordingly. The creator of Eliza, Joshua Weizenbaum, wrote a book on the limits of computation and artificial intelligence.

Next Steps

Generative AI challenges that businesses should consider

Generative AI ethics: 8 biggest concerns

Generative AI landscape: Potential future trends

Generative models: VAEs, GANs, diffusion, transformers, NeRFs

AI content generators to explore

Dig Deeper on Data analytics and AI

  • firewall as a service (FWaaS)

    Firewall as a service (FWaaS), also known as a cloud firewall, is a service that provides cloud-based network traffic analysis ...

  • private 5G

    Private 5G is a wireless network technology that delivers 5G cellular connectivity for private network use cases.

  • NFVi (network functions virtualization infrastructure)

    NFVi (network functions virtualization infrastructure) encompasses all of the networking hardware and software needed to support ...

  • cybersecurity

    Cybersecurity is the practice of protecting internet-connected systems such as hardware, software and data from cyberthreats.

  • Advanced Encryption Standard (AES)

    The Advanced Encryption Standard (AES) is a symmetric block cipher chosen by the U.S. government to protect classified ...

  • operational risk

    Operational risk is the risk of losses caused by flawed or failed processes, policies, systems or events that disrupt business ...

  • Risk Management Framework (RMF)

    The Risk Management Framework (RMF) is a template and guideline used by companies to identify, eliminate and minimize risks.

  • robotic process automation (RPA)

    Robotic process automation (RPA) is a technology that mimics the way humans interact with software to perform high-volume, ...

  • spatial computing

    Spatial computing broadly characterizes the processes and tools used to capture, process and interact with three-dimensional (3D)...

  • OKRs (Objectives and Key Results)

    OKRs (Objectives and Key Results) encourage companies to set, communicate and monitor organizational goals and results in an ...

  • cognitive diversity

    Cognitive diversity is the inclusion of people who have different styles of problem-solving and can offer unique perspectives ...

  • reference checking software

    Reference checking software is programming that automates the process of contacting and questioning the references of job ...

Customer Experience
  • martech (marketing technology)

    Martech (marketing technology) refers to the integration of software tools, platforms, and applications designed to streamline ...

  • transactional marketing

    Transactional marketing is a business strategy that focuses on single, point-of-sale transactions.

  • customer profiling

    Customer profiling is the detailed and systematic process of constructing a clear portrait of a company's ideal customer by ...