Compare large language models vs. generative AI
While large language models like ChatGPT grab headlines, the generative AI landscape is far more diverse, spanning models that are changing how we create images, audio and video.
For many people, the phrase generative AI brings to mind large language models (LLMs) like OpenAI's ChatGPT. Although LLMs are an important part of the generative AI landscape, they're only one piece of a bigger picture.
LLMs are a specific type of generative AI model specialized for linguistic tasks, such as text generation, question answering and summarization. Generative AI, a broader category, encompasses a much wider variety of model architectures and data types. In short, LLMs are a form of generative AI, but not all generative AI models are LLMs.
What is generative AI?
The term generative AI refers to AI systems that can create new content, such as text, images, audio, video, visual art, conversation and code.
Generative AI models create content by learning from large training data sets using machine learning (ML) algorithms and techniques. For example, a generative AI model tasked with creating new music would learn from a training data set containing a large collection of music. By employing ML and deep learning techniques and relying on its recognition of patterns in music data, the AI system could then create music based on user requests.
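As a rough illustration of the learn-then-generate idea in the music example above, the sketch below uses a simple Markov chain rather than deep learning. It is a stand-in chosen for brevity, not how production music models work. The training melodies and the function names are hypothetical.

```python
import random
from collections import defaultdict

def train_transitions(melodies):
    """Record which notes were observed to follow each note."""
    transitions = defaultdict(list)
    for melody in melodies:
        for current_note, next_note in zip(melody, melody[1:]):
            transitions[current_note].append(next_note)
    return transitions

def generate_melody(transitions, start_note, length, seed=0):
    """Sample a new melody by following learned note-to-note transitions."""
    rng = random.Random(seed)
    melody = [start_note]
    while len(melody) < length:
        candidates = transitions.get(melody[-1])
        if not candidates:  # dead end: no observed continuation
            break
        melody.append(rng.choice(candidates))
    return melody

# Hypothetical training set: two short melodies written as note names.
TRAINING_MELODIES = [
    ["C", "D", "E", "C", "E", "G"],
    ["E", "G", "E", "D", "C", "D"],
]

transitions = train_transitions(TRAINING_MELODIES)
new_melody = generate_melody(transitions, start_note="C", length=8)
print(new_melody)
```

The generated melody is new as a whole, but every step in it reuses a note-to-note pattern seen in the training data. Deep learning models capture far longer-range and subtler patterns.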
Types of generative AI models
Generative AI models are built on several types of ML algorithms, each with different capabilities and features. The following are some of the most common:
- Generative adversarial networks (GANs). Introduced in 2014, GANs are ML models in which two neural networks compete. The first network (the generator) creates synthetic data, while the second (the discriminator) receives data and labels it as either AI-generated or real. Through a feedback loop that penalizes the discriminator for each misclassification and the generator for each sample the discriminator correctly flags as fake, the GAN learns how to generate increasingly realistic content.
- Variational autoencoders (VAEs). Also introduced in 2014, VAEs use neural networks to both encode and decode data. The encoder compresses data into a condensed representation, and the decoder then uses this condensed form to reconstruct the input data. Training the two together teaches the model a compact representation of its data, and decoding new points sampled from that representation produces new data. VAEs can complete a variety of content generation tasks.
- Diffusion models. Created in 2015, diffusion models are popular for image generation. These models work by gradually adding noise to input data over several steps to create a random noise distribution, then reversing this process to generate new data samples from that noise. Many image generation services, like OpenAI's Dall-E and Midjourney, apply diffusion techniques along with other ML algorithms to create highly detailed outputs.
- Transformers. Introduced in 2017 to improve language translation, transformers revolutionized the field of natural language processing (NLP) through their use of self-attention mechanisms. These mechanisms enable transformers to process large volumes of unlabeled text to find patterns and relationships among words or sub-words in the data set. Transformers opened the door for large-scale generative AI models, especially LLMs, many of which rely on transformers to generate contextually relevant text.
- Neural radiance fields (NeRFs). Introduced in 2020, NeRFs employ ML and artificial neural networks to generate 3D content from 2D images. By analyzing 2D images of a scene from various angles, NeRFs can infer the scene's 3D structure, enabling them to produce photorealistic 3D content. NeRFs show potential to advance multiple fields, such as robotics and virtual reality.
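To make the diffusion item above more concrete, here is a minimal sketch of the noise-adding (forward) half of the process only. It assumes one common formulation with a linearly increasing noise schedule. The schedule values, function names and four-value toy "image" are illustrative assumptions. The reverse step, which requires a trained neural network, is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(betas):
    """Cumulative product of (1 - beta): how much original signal survives."""
    fractions, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        fractions.append(running)
    return fractions

def add_noise(sample, alpha_bar, rng):
    """Jump straight to a noised version of `sample` at one timestep."""
    keep, noise_scale = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * x + noise_scale * rng.gauss(0.0, 1.0) for x in sample]

rng = random.Random(0)
alpha_bars = signal_fraction(linear_beta_schedule(num_steps=1000))
pixels = [0.9, -0.5, 0.1, 0.7]  # toy "image" of four pixel values

# Early steps keep most of the image; late steps are almost pure noise.
for t in (0, 499, 999):
    noisy = add_noise(pixels, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(x, 2) for x in noisy])
```

A diffusion model is trained to undo this corruption one step at a time, which is what lets it start from random noise and arrive at a new sample.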
Generative AI examples and use cases
Common examples of generative AI include versatile chatbots like OpenAI's ChatGPT and Google Gemini (formerly Bard); image-generating platforms like Midjourney and Dall-E; code generation tools like GitHub Copilot and Amazon CodeWhisperer; and audio generation tools like AudioPaLM and Microsoft Vall-E.
In line with the large number of models and tools under its umbrella, generative AI has many use cases. Organizations can use generative AI to create marketing and promotional images, personalize output for users, translate language, compile research, summarize meeting notes and much more. Choosing the right generative AI tool comes down to matching its capabilities with the organization's objectives.
What are large language models?
LLMs are a type of generative AI that deals specifically with text-based content. Traditional LLMs use deep learning algorithms and rely on massive data sets to understand text input and generate new text output, such as song lyrics, social media blurbs, short stories and summaries.
LLMs belong to a class of AI models called foundation models. As the term suggests, LLMs form the fundamental architecture for much of AI language comprehension and generation. Many generative AI platforms, including ChatGPT, rely on LLMs to produce realistic output.
The LLM evolution
In 1966, the Eliza chatbot debuted at MIT. While not a modern language model, Eliza was an early example of NLP; the program engaged in dialogue with users by recognizing keywords in their natural-language input and choosing a reply from a set of preprogrammed responses.
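The keyword-and-canned-response approach described above can be sketched in a few lines. This is only an illustration of the general technique, not a reconstruction of Eliza itself. The rules and replies below are hypothetical.

```python
import re

# Hypothetical keyword rules; the original Eliza script was far richer.
RULES = [
    ("mother", "Tell me more about your family."),
    ("sad", "I am sorry to hear you are sad."),
    ("always", "Can you think of a specific example?"),
]
DEFAULT_REPLY = "Please go on."

def reply(user_input):
    """Return the canned response for the first rule whose keyword appears."""
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    for keyword, response in RULES:
        if keyword in words:
            return response
    return DEFAULT_REPLY

print(reply("My mother never listens to me."))
print(reply("The weather is nice today."))
```

Nothing here is learned from data: the program can respond only to keywords its author anticipated, which is the core difference from a modern language model.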
After the first AI winter -- the period between 1974 and 1980 when AI funding lagged -- the 1980s saw a resurgence of interest in NLP. Advancements in areas such as part-of-speech tagging and machine translation helped researchers better understand the structure of language, laying the groundwork for the development of small language models. Improvements in ML techniques, GPUs and other AI-related technology in the years that followed enabled developers to create more intricate language models that could handle more complex tasks.
The 2010s brought further exploration of generative AI models' capabilities, as deep learning, GANs and transformers scaled the ability of generative AI -- LLMs included -- to analyze large amounts of training data and improve its content-creation abilities. By 2018, major tech companies had begun releasing transformer-based language models that could handle vast amounts of training data (hence the name large language models).
Google's BERT and OpenAI's GPT-1 were among the first LLMs. An LLM arms race has since ensued, with updates and new versions rolling out nearly constantly since the public launch of ChatGPT in late 2022. Recent LLMs like GPT-4 offer multimodal capabilities, meaning that the model can work with other media, such as images and audio, along with language.
LLM examples and use cases
LLM examples include OpenAI's GPT-3.5 and GPT-4, Google's PaLM and Gemini models, and Meta's Llama series of open source models.
LLMs have many use cases and benefits. Organizations can use traditional LLMs for text generation, translation, summarization, content classification, rephrasing text, sentiment analysis and conversational chatbots. Newer multimodal LLMs widen that scope, with models such as GPT-4 making it possible for the LLM to handle use cases that involve images, such as describing or answering questions about a picture.
Multimodality and LLMs
The emerging category of multimodal AI blurs the lines between LLMs and other types of generative AI. Multimodal generative models expand on the capabilities of traditional LLMs by adding the ability to understand other data types: Rather than solely handling text, multimodal models can also interpret and generate data formats such as images and audio. For example, users can now upload images to ChatGPT that the model can then incorporate into its text-based dialogues, as shown in the screenshot below.
LLMs vs. generative AI: How are they different?
LLMs differ from other types of generative AI in a few key ways, including their capabilities, model architectures, training data and limitations.
Capabilities
Common LLM capabilities include the following:
- Text generation. LLMs can produce coherent, context-aware text based on a user's input, from marketing collateral to fiction passages to software code.
- Translation. LLMs can translate text from one language to another, although they typically fare worse than purpose-built translation models and struggle with less common languages.
- Question answering. Although their ability to provide factual answers is limited, LLMs can explain concepts by simplifying terminology or using analogies, offer advice on certain topics, and answer many natural-language questions.
- Summarization. LLMs can summarize and identify key arguments in lengthy passages of text. Google's Gemini 1.5 Pro, for example, can analyze up to a million tokens in one go -- equivalent to roughly 750,000 words or nine average-length novels.
- Dialogue. LLMs can simulate conversation by providing responses in a back-and-forth dialogue, making them ideal for chatbots and virtual assistants.
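The token-to-word conversion in the summarization item above can be checked with quick arithmetic. The ratio of 0.75 words per token is a rough rule of thumb for English implied by the figures quoted. Actual ratios vary by tokenizer and language, and the function name is hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English, assumed here

def tokens_to_words(token_count, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many words a given number of tokens represents."""
    return token_count * words_per_token

context_words = tokens_to_words(1_000_000)
words_per_novel = context_words / 9  # nine novels, per the estimate above
print(f"{context_words:,.0f} words, about {words_per_novel:,.0f} words per novel")
```

Under that assumption, the comparison treats an average-length novel as a little over 80,000 words.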
Generative AI, in contrast, is a much broader category. Its capabilities include the following:
- Image generation. Models like Midjourney and Dall-E can produce images based on users' textual prompts. Some, such as Adobe Firefly, can also edit portions of human-created images -- for example, generating a new background for a portrait.
- Video generation. A newer category in the generative AI landscape, models like OpenAI's Sora can generate realistic or animated video clips based on users' prompts.
- Audio generation. These models can generate music, speech and other types of audio. For example, Eleven Labs' voice generator can produce spoken audio from users' textual input, and Google's Lyria model can generate instrumental and vocal music.
- Data synthesis. Generative models can create artificial data that mimics -- and can be used in place of -- real-world data. While synthetic data can present problems if relied on too heavily, it's useful for training ML models when real data is hard to come by or particularly sensitive. For example, a team training a medical model could use synthetic data to avoid or minimize the use of personal health information.
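As a deliberately simple illustration of the data synthesis item above, the sketch below fits a normal distribution to a handful of measurements and samples artificial values from it. Real synthetic-data generators model far richer structure. The heart-rate values and function names are hypothetical.

```python
import random
import statistics

def fit_gaussian(values):
    """Estimate the mean and standard deviation of the real measurements."""
    return statistics.mean(values), statistics.stdev(values)

def sample_synthetic(mean, stdev, count, seed=0):
    """Draw synthetic values from the fitted distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(count)]

# Hypothetical "real" data: resting heart rates in beats per minute.
real_heart_rates = [62, 71, 68, 75, 80, 66, 73, 69, 77, 64]

mean, stdev = fit_gaussian(real_heart_rates)
synthetic_heart_rates = sample_synthetic(mean, stdev, count=1000)
print(round(mean, 1), round(statistics.mean(synthetic_heart_rates), 1))
```

The synthetic values follow the same overall distribution as the originals without reproducing any individual record, which is the property that makes synthetic data attractive for sensitive domains.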
Model architecture
The underlying algorithms used to build LLMs have some differences from those used in other types of generative AI models.
Most of today's LLMs rely on transformers for their core architecture. Transformers' use of attention mechanisms makes them well suited to understanding long passages of text, as they can develop a model of the relationships among words and their relative importance. Notably, transformers aren't unique to LLMs; they can also be used in other types of generative AI models, such as image generators.
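The attention mechanism described above can be sketched as scaled dot-product self-attention. This simplified version uses each token's embedding directly as its query, key and value, whereas real transformers apply learned projections and multiple attention heads. The tokens and two-dimensional embeddings are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(embeddings):
    """Scaled dot-product self-attention with queries = keys = values."""
    scale = math.sqrt(len(embeddings[0]))
    outputs, all_weights = [], []
    for query in embeddings:
        # Score this token against every token, then normalize the scores.
        weights = softmax([dot(query, key) / scale for key in embeddings])
        # Blend all token vectors according to those weights.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(len(query))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for three tokens.
tokens = ["river", "bank", "money"]
embeddings = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each row of weights shows how much one token draws on every other token, which is the "relative importance" the model learns to assign.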
However, there are some model architectures used for non-language generative AI models that aren't used in LLMs. One noteworthy example is convolutional neural networks (CNNs), which are primarily used in image processing. CNNs are specialized for analyzing images to decipher notable features, from edges and textures to entire objects and scenes.
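To show how a convolution picks out a feature such as an edge, the sketch below slides a small hand-written kernel over a toy image. In a trained CNN, the kernel values are learned rather than written by hand. The image, kernel and function name are illustrative assumptions.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum elementwise products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# Toy 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-written vertical-edge kernel; a trained CNN would learn its kernels.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # [[27, 27], [27, 27]]
```

Every window in this image straddles the dark-to-bright boundary, so every output value is large. On a uniformly colored image, the same kernel would produce all zeros.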
Model training
Training data and model architecture are closely linked, as the nature of a model's training data affects the choice of algorithm.
As their name suggests, LLMs are trained on vast language data sets. While the data used to train LLMs typically comes from a wide range of sources -- from novels to news articles to Reddit posts -- it's ultimately all text. Training data for other generative AI models, in contrast, can vary widely -- it might include images, audio files or video clips, depending on the model's purpose.
Due to these differences in data types, the training process differs for LLMs versus other types of generative AI. For example, the data preparation stages for an LLM and an image generator involve different preprocessing and normalization techniques. The scope of training data could also differ: An LLM's data set should be very broad to ensure that it learns the fundamental patterns of human language, whereas a generative model with a narrow purpose would need a more targeted training set.
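The difference in data preparation can be illustrated with two small preprocessing routines, one for text and one for image pixels. Both are simplified sketches: production LLMs use subword tokenizers rather than word splitting, and image pipelines typically also resize and augment. The function names and sample inputs are hypothetical.

```python
import re

def preprocess_text(text):
    """Lowercase and split into word tokens (real LLMs use subword tokenizers)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(tokens):
    """Map each distinct token to an integer ID in first-seen order."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

def preprocess_image(pixels):
    """Scale 8-bit pixel values from [0, 255] to [-1.0, 1.0]."""
    return [[value / 127.5 - 1.0 for value in row] for row in pixels]

# Text becomes a sequence of integer token IDs.
tokens = preprocess_text("The cat sat on the mat.")
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
print(token_ids)  # [0, 1, 2, 3, 0, 4]

# An image becomes a grid of normalized floating-point values.
normalized = preprocess_image([[0, 255], [51, 204]])
print(normalized)
```

Text ends up as discrete IDs drawn from a vocabulary, while images end up as continuous values in a fixed range. That difference is one reason the two kinds of models need different input layers and preparation steps.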
Challenges and limitations
Training any generative AI model, including an LLM, entails certain challenges, including how to handle bias and the difficulty of acquiring sufficiently large data sets. However, LLMs also face some unique problems and limitations.
One significant challenge is the complexity of text compared with other types of data. Think about the range of human language available online: everything from dense technical writing to Elizabethan poetry to Instagram captions. That's not to mention more basic language issues, like learning how to interpret an odd idiom or use a word with multiple context-dependent meanings. Even advanced LLMs sometimes struggle to grasp these subtleties, leading to hallucinations or inappropriate responses.
Another challenge is maintaining coherence over long stretches. Compared with other types of generative AI models, LLMs are often asked to analyze longer prompts and produce more complex responses. LLMs can generate high-quality short passages and understand concise prompts with relative ease, but the longer the input and desired output, the likelier the model is to struggle with logic and internal consistency.
These limitations are especially dangerous because errors aren't always as obvious with LLMs as with other types of generative AI; LLMs' output can sound fluent and seem confident even when inaccurate. You're likely to notice if an image generator produces a picture of a person with eight fingers on each hand or a coffee cup floating over a table, for instance, but you might not pick up on a factual error in an LLM's well-written summary of a complex scientific concept you know little about.
Lev Craig covers AI and machine learning as the site editor for TechTarget Enterprise AI. Craig graduated from Harvard University and has previously written about enterprise IT, software development and cybersecurity.
Olivia Wisbey is the associate site editor for TechTarget Enterprise AI. She graduated from Colgate University, where she served as a peer writing consultant at the university's Writing and Speaking Center.