
What GPT model limitations mean for the future of GenAI

With concerns like limited reasoning and hallucination risk, will AI developers turn away from the GPT model and toward alternative model types?

It's not hyperbolic to say that generative pre-trained transformer models are a revolutionary breakthrough in machine learning.

GPT models are faster and more flexible than other types of AI models, such as those based on recurrent neural network (RNN) architectures. Without the development of GPT-type models starting nearly a decade ago, AI -- especially generative AI -- as we know it today would not exist.

Despite their advantages, GPT models have some significant limitations. Issues such as hallucinations, difficulty performing logical reasoning and context window constraints make GPT models (and the transformer architecture they are based on) a poor option for certain use cases.

This prompts some important questions. What are GPT models incapable of doing well? Which limitations can AI researchers solve over time? And which ones present impassable obstacles that engineers can only mitigate by turning to alternative types of models?

To help answer these questions, let's consider how GPT models work, their current limitations and what it all means for the future of GPT models.

What are GPT models?

GPT models are a type of machine learning model that uses transformer architecture and is trained to generate new content.

A unique feature of GPT models is that they process data in parallel. This means they can accept input that includes multiple components -- such as a sentence with multiple words -- and analyze all components simultaneously. The result is faster processing and the ability to interpret complex input efficiently. Parallel processing differentiates GPT-style models from other types of models, such as RNNs, which primarily process data sequentially.
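To make the contrast concrete, here is a toy sketch in plain NumPy with made-up sizes. The first loop processes token vectors one step at a time, the way an RNN must, while the final line handles every position in a single matrix operation, the way a transformer layer can.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8                      # toy sizes: 6 tokens, 8-dimensional embeddings
tokens = rng.normal(size=(seq_len, d_model))

# RNN-style: each step depends on the previous hidden state,
# so the loop cannot run across positions in parallel.
W_h = rng.normal(size=(d_model, d_model))
W_x = rng.normal(size=(d_model, d_model))
h = np.zeros(d_model)
rnn_outputs = []
for x in tokens:                             # strictly one token after another
    h = np.tanh(W_h @ h + W_x @ x)
    rnn_outputs.append(h)

# Transformer-style: a single matrix operation touches every position at once,
# which is why the whole sequence can be processed in parallel on a GPU.
W_proj = rng.normal(size=(d_model, d_model))
transformer_outputs = tokens @ W_proj        # all 6 tokens handled together
```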

Many of the key concepts behind GPT models and transformer architecture date back decades. However, researchers did not begin implementing usable GPT models until the late 2010s. That is when transformer-based models such as OpenAI's GPT-1 and Google's BERT, both released in 2018, appeared. In subsequent years, improvements upon these and similar models led to the introduction of production-ready generative AI technology, such as ChatGPT, in the early 2020s.

Although OpenAI includes the label GPT in most of its model names, the term GPT can refer more generally to any type of model that uses transformer architecture, employs pretraining -- an unsupervised learning process -- and generates content. Most of today's prominent large language models (LLMs) possess these characteristics and are GPT models in the broad sense of the term.

Note also that while some LLMs are GPT models, not all are. GPT models are a subset of LLM technology.

GPT model limitations

GPT models offer many benefits, but there are also significant limitations inherent to the transformer architecture.

1. Hallucinations

Hallucinations -- when generative AI models output incorrect information -- can stem from multiple factors. Some, such as insufficient training data or poorly written prompts, can be mitigated through simple measures -- such as feeding a model more training data or engineering better prompts.

At a more fundamental level, however, GPT model hallucinations result from how the transformer architecture manages context windows. A context window is the number of data components -- called tokens -- that a model can evaluate at once. Context windows in GPT models are limited because the models process tokens in parallel: self-attention compares every token with every other token, so compute and memory requirements grow quickly as the window expands.

If context windows are too small, a model might lack sufficient context to generate an accurate response to a query, leading to a hallucination.
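As a rough illustration, imagine a deliberately tiny, hypothetical window of eight tokens. Whatever falls outside the window never reaches the model, so the response is generated from incomplete context.

```python
# Hypothetical, deliberately tiny context window to illustrate the idea.
CONTEXT_WINDOW = 8  # real models allow thousands to millions of tokens

prompt_tokens = ("the report filed in March said revenue fell , "
                 "but the revised May figures showed growth").split()

# Tokens beyond the window are dropped before the model ever sees them.
visible = prompt_tokens[-CONTEXT_WINDOW:]
dropped = prompt_tokens[:-CONTEXT_WINDOW]

print("model sees:", visible)        # only the most recent tokens survive
print("model never sees:", dropped)  # the earlier context is simply lost
```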

Hallucinations can also occur due to the role that attention mechanisms play in transformer architecture. Models use attention mechanisms to determine which components within input data are most relevant to focus on when generating output. In some cases, a model might over-attend, which means it focuses on input components of little or no relevance, leading to output that doesn't make sense in the context of the input.
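The mechanism itself is a small computation. The sketch below is a minimal, single-head version of scaled dot-product attention in NumPy; the softmax weights it produces determine how strongly each token influences the output, and misplaced weight is exactly the failure described above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: the weights decide how much each token attends to the others."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity between every pair of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V, weights                        # output is a weighted mix of the values

rng = np.random.default_rng(1)
seq_len, d_k = 5, 4                                    # toy sizes
Q = K = V = rng.normal(size=(seq_len, d_k))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # row i shows which tokens token i is focusing on
```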

2. Trouble processing large volumes of data

Context window constraints can also make it difficult for models to interpret long input strings, such as a multipage document. Because GPT models can only process a limited context at a given time, they often need to break lengthy inputs into different parts. They process each one separately and then attempt to combine the results.

At best, this approach is inefficient, taking more time and compute resources than processing all the data at once. At worst, it can lose important information, because the model drops some data when it stitches its separate passes back into one unified output.

For example, if you feed an entire book into a GPT model and ask for a summary, it might forget the names of major characters who appear in some chapters but not others because it breaks the input into chunks and processes them independently.
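In practice, the workaround tends to look something like the following sketch. The summarize function here is purely a hypothetical stand-in for a call to the model; the point is that each chunk is summarized in isolation, so details that span chunks can fall through the cracks.

```python
def summarize(text: str) -> str:
    """Stand-in for a call to a GPT model; hypothetical, for illustration only."""
    return text[:60] + "..."          # pretend this is a model-written summary

def summarize_long_document(document: str, chunk_size: int = 2000) -> str:
    # Split the document into pieces that fit within the context window.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

    # Each chunk is summarized in isolation: the model has no memory of the others,
    # so a character introduced in chunk 1 may be "forgotten" by chunk 7.
    partial_summaries = [summarize(chunk) for chunk in chunks]

    # Finally, summarize the summaries to produce one unified output.
    return summarize("\n".join(partial_summaries))
```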

Services such as ChatGPT can generate book summaries, but that ability is often only possible because the underlying model was trained on data that included book summaries, not because it was trained on entire books and can effectively summarize them.

3. Limited reasoning abilities

Reasoning is the ability to draw conclusions based on logic rather than pattern recognition. GPT models can't perform actual reasoning because, like most machine learning models, they can only identify patterns and dependencies within their training data. They can't reason their way through situations involving information that isn't represented in that data.

For example, consider a prompt like, "What day of the week was August 24, 1572?" Very few humans can immediately answer this question, but most could use logic and math to work backward from the current date and calculate the correct response.

A GPT model, however, wouldn't be able to use logic. It would only know the answer if the date in question and its corresponding weekday happened to be in the model's training data, which is unlikely for dates of no historical significance. Incidentally, August 24, 1572 -- which is a significant date because it is when the Saint Bartholomew's Day Massacre took place in Paris -- was a Sunday, according to the Linux calendar utility cal. ChatGPT told me it was a Sunday, and Gemini said it was a Thursday.
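A few lines of code can settle the question deterministically. The sketch below applies Zeller's congruence for the Julian calendar, which is how dates in 1572 were actually recorded and what cal typically assumes for that era; Python's own datetime module applies modern Gregorian rules to every date, which is one likely source of a Thursday answer.

```python
import datetime

def julian_day_of_week(year: int, month: int, day: int) -> str:
    """Zeller's congruence for the Julian calendar, used for dates before the 1582 reform."""
    if month < 3:                     # Zeller treats Jan/Feb as months 13/14 of the prior year
        month += 12
        year -= 1
    K, J = year % 100, year // 100
    h = (day + (13 * (month + 1)) // 5 + K + K // 4 + 5 + 6 * J) % 7
    return ["Saturday", "Sunday", "Monday", "Tuesday",
            "Wednesday", "Thursday", "Friday"][h]

print(julian_day_of_week(1572, 8, 24))   # Sunday, matching cal

gregorian = datetime.date(1572, 8, 24)   # datetime uses proleptic Gregorian rules
print(["Monday", "Tuesday", "Wednesday", "Thursday",
       "Friday", "Saturday", "Sunday"][gregorian.weekday()])  # Thursday
```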

The inability to reason is also why GPT models famously struggle to solve complex math problems -- a challenge in fields such as finance, where accurate calculations are critical. Models can typically solve simple arithmetic because the questions and answers are in their training data. But a conditional word problem -- for instance, one that requires evaluating how a business's cash flow forecast might change based on fluctuating operational costs and revenue -- would be challenging for GPT-style models to answer accurately.
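For contrast, the same kind of conditional calculation is trivial to make exact in ordinary code. The scenario and numbers below are invented purely for illustration; each step is simple arithmetic that a program evaluates deterministically, whereas a GPT model has to pattern-match its way to every intermediate value.

```python
def cash_flow_forecast(revenue, operating_costs, growth_rate, cost_inflation, quarters=4):
    """Deterministic multi-step arithmetic of the kind GPT models often get wrong."""
    flows = []
    for _ in range(quarters):
        flows.append(revenue - operating_costs)
        revenue *= 1 + growth_rate              # revenue changes quarter over quarter
        operating_costs *= 1 + cost_inflation   # costs change on a different schedule
    return flows

# Invented scenario: 3% revenue growth vs. 5% cost inflation, so margins erode each quarter.
print([round(f, 2) for f in cash_flow_forecast(100_000, 80_000, 0.03, 0.05)])
```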

Certain GPT reasoning models, such as OpenAI's o3, attempt to address this challenge by performing more extensive analysis of the input before generating a response. For example, a reasoning model might break a complex math problem into multiple steps, solve each one independently and then combine the results.

However, this is not true reasoning because the model is ultimately still pattern-matching by comparing different parts of the problem with patterns it recognizes. It's not using logic to understand information it has never encountered before.

The future of GPT models

Because GPT model limitations stem from the fundamental characteristics of the transformer architecture, they are not challenges that researchers can solve by simply tossing more computing power or training data at their models. Even modifying internal model algorithms is not likely to yield major improvements.

Researchers have two main options to advance generative AI technology beyond its current state. One is to continue using models based on the transformer architecture but improve how they work. For example, developers could redesign attention mechanisms to reduce over-attention concerns. They could also employ techniques such as caching to mitigate some of the challenges related to context window constraints.
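One widely used example is key-value (KV) caching, which stores the attention keys and values of tokens the model has already processed so they are not recomputed every time a new token is generated. A minimal NumPy sketch of the idea, with toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model = 8
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Keys and values for every token processed so far, computed once and reused.
cached_keys, cached_values = [], []

def attend_to_new_token(x):
    """Process one new token, reusing the cached keys/values of all earlier tokens."""
    cached_keys.append(W_k @ x)                  # computed once, never recomputed
    cached_values.append(W_v @ x)
    K = np.stack(cached_keys)                    # the history comes almost entirely from the cache
    V = np.stack(cached_values)
    scores = K @ (W_q @ x) / np.sqrt(d_model)    # the new token attends over the full history
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for token in rng.normal(size=(5, d_model)):      # tokens arrive one at a time, as in generation
    output = attend_to_new_token(token)
```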

These measures wouldn't completely overcome GPT model limitations, but they would reduce their effect. That could potentially help make the transformer architecture more viable for use cases that it currently does not support well.

The other option is to think beyond the transformer architecture entirely -- either by revisiting older types of architectures that are no longer as popular, such as RNNs, or creating entirely new types of models.

Because GPT models have proven so successful over the past several years, relatively little research on transformer alternatives has taken place. But there are some intriguing projects and proofs of concept. One example is Megalodon, which offers the parallelization benefits of GPT models without requiring as many compute resources or being subject to tight context window limitations.

State-space models, such as Mamba, also provide the flexibility of GPT models while requiring fewer compute resources.
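At their core, state-space models replace attention with a learned linear recurrence: a fixed-size hidden state is updated once per token, so memory use stays constant no matter how long the input grows. The sketch below is heavily simplified; real models such as Mamba make these matrices input-dependent and compute the recurrence with far more efficient algorithms.

```python
import numpy as np

rng = np.random.default_rng(3)
d_state, d_in = 16, 8
A = rng.normal(scale=0.1, size=(d_state, d_state))   # state transition
B = rng.normal(size=(d_state, d_in))                 # input projection
C = rng.normal(size=(d_in, d_state))                 # output projection

def ssm(inputs):
    """x_t -> h_t = A h_{t-1} + B x_t,  y_t = C h_t: fixed-size state, any sequence length."""
    h = np.zeros(d_state)
    outputs = []
    for x in inputs:
        h = A @ h + B @ x
        outputs.append(C @ h)
    return np.stack(outputs)

y = ssm(rng.normal(size=(1000, d_in)))   # memory cost does not grow with sequence length
```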

For now, it's a safe bet that GPT models and the transformer architecture will continue dominating the AI market. But eventually, AI developers might lean into alternative types of models that deliver superior results in areas such as hallucination risks, context window management and the ability to reason.

Chris Tozzi is a freelance writer, research adviser, and professor of IT and society who has previously worked as a journalist and Linux systems administrator.
