
An explanation of large language models

In this video, TechTarget editor Sabrina Polin talks about the benefits and challenges of large language models.

Humans need language to communicate, so it makes sense that AI does, too.

A large language model -- or LLM -- is a type of AI algorithm that uses deep learning techniques and massive data sets to understand, generate and predict new content.

Language models aren't new -- the first AI language model can be traced back to 1966 -- but large language models train on a significantly larger pool of data, which translates into a significant increase in the model's capabilities.

So, just how large are large language models?

Well, there's no universally accepted figure for how large an LLM training data set is, but it's typically in the petabyte range. For context, a single petabyte is equivalent to 1 million gigabytes; the human brain is believed to store about 2.5 petabytes of memory data.
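
To make that scale concrete, here's a quick back-of-the-envelope conversion in Python, using decimal units where 1 petabyte = 1,000 terabytes = 1,000,000 gigabytes:

    # 1 PB expressed in GB, using decimal (SI) units
    petabyte_in_gb = 1000 ** 2               # 1 PB -> 1,000,000 GB
    brain_estimate_pb = 2.5                  # rough brain-capacity estimate cited above
    print(f"1 PB = {petabyte_in_gb:,} GB")
    print(f"{brain_estimate_pb} PB = {brain_estimate_pb * petabyte_in_gb:,.0f} GB")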

LLM training consists of multiple steps, usually starting with unsupervised learning, where the model begins to derive relationships between words and concepts; the model is then fine-tuned with supervised learning. The training data passes through a transformer architecture, which enables the LLM to recognize relationships and connections between words using a self-attention mechanism.
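
The self-attention step can be sketched in a few lines of code. Below is a minimal, illustrative NumPy implementation of scaled dot-product self-attention -- a simplified sketch, not any production model's code -- where the query (Q), key (K) and value (V) matrices are assumed to be learned projections of the token embeddings:

    import numpy as np

    def self_attention(Q, K, V):
        # Q, K, V: (seq_len, d) arrays; in a real transformer these are
        # learned linear projections of the token embeddings.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # how strongly each token attends to every other
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                   # each output mixes all value vectors by attention weight

    # Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 8))
    output = self_attention(tokens, tokens, tokens)  # self-attention: Q, K, V from the same sequence
    print(output.shape)                              # (4, 8)

Each output row is a context-aware blend of every token in the sequence, which is what lets the model capture relationships between words regardless of how far apart they sit.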

Once the LLM is trained, it can serve as the foundation for a range of AI uses, including the following:

  • Generate text.
  • Translate languages.
  • Summarize or rewrite content.
  • Organize content.
  • Analyze the sentiment of content, such as humor or tone.
  • And converse naturally with a user, unlike older generations of AI chatbot technologies.
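
As an illustration of the first item on that list, here's a minimal text-generation sketch. It assumes the open source Hugging Face Transformers library and the small GPT-2 model purely as an example -- any LLM exposed through a similar interface would work the same way:

    # Assumes: pip install transformers torch
    from transformers import pipeline

    # Load a text-generation pipeline with GPT-2, a small openly
    # available LLM, chosen here only for illustration.
    generator = pipeline("text-generation", model="gpt2")

    result = generator(
        "Large language models are",
        max_new_tokens=30,        # cap the length of the continuation
        num_return_sequences=1,
    )
    print(result[0]["generated_text"])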

LLMs can be particularly useful as a foundation for customized uses for both businesses and individuals. They're fast, accurate, flexible and easy to train. However, users should exercise caution: LLMs come with a number of challenges, including the following:

  • The cost of deployment and operation.
  • Bias, depending on the data the model was trained on.
  • AI hallucinations, where the model produces a plausible-sounding response that isn't grounded in its training data.
  • Troubleshooting complexity.
  • And glitch tokens -- words or inputs maliciously designed to make the LLM malfunction.

Sabrina Polin is a managing editor of video content for the Learning Content team. She plans and develops video content for TechTarget's editorial YouTube channel, Eye on Tech. Previously, Sabrina was a reporter for the Products Content team.
