Foundation models will form the basis of generative AI's future in the enterprise.
Large language models (LLMs) fall into a category called foundation models. Language models take language input and generate synthesized output. Foundation models work with multiple data types. They are multimodal, meaning they work in other modes besides language.
This enables businesses to draw new connections across data types and expand the range of tasks that AI can be used for. As a starting point, a company can use foundation models to create custom generative AI models, using a tool such as LangChain, with features tailored to its use case.
The GPT-n class of LLMs has become a prime example of this. The release of powerful LLMs like GPT-4 spurred discussions of artificial general intelligence -- essentially, the idea that AI could do anything. Since GPT-4's release, numerous applications powered by GPT models have been created.
GPT-4 and other foundation models are trained on a broad corpus of unlabeled data and can be adapted to many tasks. That's what makes them foundation models.
What is a foundation model?
Foundation models are a new paradigm in AI system development. AI was previously trained on task-specific data to perform a narrow range of functions.
A foundation model is a large-scale machine learning model trained on a broad data set that can be adapted and fine-tuned for a wide variety of applications and downstream tasks. Foundation models are known for their generality and adaptability.
GPT-4, DALL-E 2 and Bidirectional Encoder Representations from Transformers (BERT) are all foundation models. The term was coined in a 2021 paper called "On the Opportunities and Risks of Foundation Models" by authors at the Stanford Center for Research on Foundation Models and the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
The authors of the paper stated: "While many of the iconic foundation models at the time of writing are language models, the term language model is simply too narrow for our purpose: as we describe, the scope of foundation models goes well beyond language."
Characteristics of foundation models
The main traits of foundation models include the following:
- Scale. Three ingredients enable foundation models to scale and make them powerful:
  - Hardware improvements. GPUs, the chips that power foundation models, have significantly increased in throughput and memory.
  - Transformer model architecture. Transformers are the machine learning model architecture that powers many language models, like BERT and GPT-4.
  - Data availability. A large amount of data is available for these models to train on and learn from. Foundation models need large quantities of unstructured data to train.
- Traditional training. Foundation models use traditional machine learning training methods, such as a combination of unsupervised and supervised learning, or reinforcement learning from human feedback.
- Transfer learning. Transfer learning takes knowledge learned from one task and applies it to another. Models are first trained on surrogate tasks and then fine-tuned to a specific one. Pre-training is the type of transfer learning used in the GPT-n series of language models -- it's what the P stands for.
- Emergence. Emergence means that model behavior is implicitly induced rather than explicitly constructed. The model produces results that are not directly related to any one mechanism in the model.
- Homogenization. Homogenization means that a wide range of applications could be powered by a single generic learning algorithm. The same underlying method is used in many domains. The Stanford HAI paper stated that almost all state-of-the-art natural language processing (NLP) models are adapted from one of only a few foundation models.
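The transfer learning trait above can be illustrated with a toy example. The following is a minimal sketch, not how GPT-style models are actually trained: it assumes tiny linear models, and the function names (`pretrain`, `fine_tune_head`) and the data are hypothetical. It shows the general pattern of learning weights on a data-rich surrogate task, freezing them, and then fitting only a small task-specific head on a handful of downstream examples.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pretrain(examples, dim, epochs=50, lr=0.1):
    """Learn a weight vector on the broad surrogate task with plain SGD."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            err = dot(w, x) - y  # prediction error on this example
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fine_tune_head(w_frozen, examples, epochs=500, lr=0.05):
    """Keep the pretrained weights frozen; fit only a scale and a bias."""
    scale, bias = 1.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            feature = dot(w_frozen, x)  # reused feature, never updated here
            err = scale * feature + bias - y
            scale -= lr * err * feature
            bias -= lr * err
    return scale, bias


# Surrogate task (assumed toy data): many examples, target y = 2*x0 - x1.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
broad_data = [([a, b], 2.0 * a - b) for a in grid for b in grid]
w = pretrain(broad_data, dim=2)

# Downstream task: only five examples, target y = 3*(2*x0 - x1) + 0.5.
points = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5], [-0.5, 1.0]]
small_data = [(x, 3.0 * (2.0 * x[0] - x[1]) + 0.5) for x in points]
scale, bias = fine_tune_head(w, small_data)

print([round(v, 3) for v in w], round(scale, 3), round(bias, 3))
```

The downstream step only has two parameters to learn because the frozen weights already capture most of the structure, which is why five examples are enough in this toy setting.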
The name foundation model underscores the fundamental incompleteness of the models, according to the paper. They are the foundation for specific spinoff models that are trained to accomplish a narrower, more specialized set of tasks. The authors of the Stanford HAI paper stated: "We also chose the term 'foundation' to connote the significance of architectural stability, safety, and security: poorly constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications."
Examples of foundation model applications
Foundation models are fine-tuned to create apps. GPT-3 and GPT-4 have become the basis for many applications in the short time they've been around, with ChatGPT being the most notable. A paper from researchers at OpenAI, OpenResearch and the University of Pennsylvania posited that GPTs -- the AI model -- exhibit qualities of general-purpose technologies. General-purpose technologies, such as the steam engine, printing press and GPTs, are characterized by widespread proliferation, continuous improvement and the generation of complementary innovations. These complementary technologies can work with, support or build on top of the GPT.
The paper's findings showed that, with access to an LLM -- a type of foundation model -- about 15% of all worker tasks in the U.S. could be completed significantly faster at the same level of quality.
One example of a foundation model is Microsoft's Florence. It is used to provide production-ready computer vision services in Azure AI Vision. The application uses the model to analyze images, read text and detect faces with pre-built image tagging.
Sweden is attempting to build a foundational LLM for all major languages in the Nordic region: Danish, Swedish, Icelandic, Norwegian and Faroese. It would be used primarily by the public sector. The Swedish consortium running the project has gained access to the supercomputer Berzelius, along with hardware and software help from Nvidia. The model is still in development, but early versions are available on Hugging Face.
Hugging Face is an open source repository of many LLMs, sort of like a GitHub for AI. It provides tools that enable users to build, train and deploy machine learning models.
How are foundation models used?
Foundation models serve as the base for more specific applications. A business can take a foundation model, train it on its own data and fine-tune it to a specific task or a set of domain-specific tasks.
Several platforms, including Amazon SageMaker, IBM Watsonx, Google Cloud Vertex AI and Microsoft Azure AI, provide organizations with a service for building, training and deploying AI models.
For example, an organization could use one of these platforms to take a model from Hugging Face, train the model on its proprietary data and then refine the model's output using prompt engineering.
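Prompt engineering, mentioned above, often amounts to structuring the text sent to a model. The sketch below shows one common pattern, a few-shot prompt that places worked examples ahead of the real query. The `build_few_shot_prompt` helper, the `Input:`/`Output:` labels and the sample support tickets are all assumptions for illustration. No particular model or platform requires this layout, and sending the prompt to a model is outside the sketch.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples and a final query into one prompt."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical domain-specific examples an organization might supply.
examples = [
    ("Invoice is overdue by 30 days", "billing"),
    ("Cannot log in to the portal", "access"),
]
prompt = build_few_shot_prompt(
    "Classify each support ticket into a category.",
    examples,
    "Password reset email never arrived",
)
print(prompt)
```

Keeping the examples in a list makes it straightforward to swap in data from a different domain without touching the template.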
Opportunities and risks of foundation models
Foundation models are multimodal because they have multiple capabilities, including language, audio and vision.
Because of their general adaptability, foundation models could provide numerous opportunities and use cases in a variety of different industries, including the following:
- Healthcare. Foundation models show promise for generative tasks in healthcare, like drug discovery. An IBM foundation model -- Controlled Generation of Molecules (CogMol) -- recently generated a set of new COVID-19 antivirals using a common architecture called a variational autoencoder. IBM's MoLFormer-XL is another foundation model currently being used by Moderna to design messenger RNA medicines.
- Law. Law uses generative tasks that foundation models could help with. However, they currently lack the reasoning ability to generate truthful documents. If they could be developed to show provenance and guarantee factuality, then they would be beneficial in this field.
- Education. Education is a complex domain that requires nuanced human interaction to understand students' goals and learning styles. There are many individual data streams in education that together are too limited to train foundation models. Still, foundation models could be broadly applicable to generative tasks, like problem generation.
Despite their broad potential, foundation models pose many risks, including the following:
- Bias. Because foundation models stem from only a core few technologies, inherent biases in those few models might spread through every AI application based on them.
- System. Computer systems are a key bottleneck for scaling model size and data quantity. Training foundation models is expensive and computationally intensive, and it may require a prohibitively large amount of memory.
- Data availability. Foundation models need access to large amounts of training data to function. If that data is cut off or restricted, they don't have the fuel to function.
- Security. Foundation models represent a single point of failure, which makes them a viable target for cyber attackers.
- Environment. It takes a large environmental toll to train and run large foundation models, like GPT-4.
Foundation model research
"On the Opportunities and Risks of Foundation Models" is just one of the influential research papers about foundation models. AI research is being published at a significant clip. Here are some other foundational AI research papers to know about:
- "Attention Is All You Need." This paper introduced the transformer architecture, which became a new standard in AI systems using NLP.
- "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." This paper introduced BERT, which became a widely used pre-trained language model.
- "Language Models are Few-Shot Learners." This paper introduced GPT-3, which laid the groundwork for ChatGPT. GPT-3 could perform a wide range of NLP tasks with little to no task-specific training.
- "DALL-E: Creating Images from Text." This paper was the basis of DALL-E, an AI that generates images from natural language input.
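For readers curious about what the transformer paper in the list above introduced, its core operation is scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V. The following is a minimal sketch of that formula in plain Python, assuming a single attention head with no masking and no learned projection matrices. The function names and sample vectors are illustrative, and real implementations use optimized tensor libraries.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of query, key and value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


# Assumed toy inputs: the query matches the first key more closely,
# so the first value vector receives more weight.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
result = attention(queries, keys, values)
print(result)
```

Each output row is a weighted mix of the value vectors, with weights that sum to 1, so tokens that are more relevant to the query contribute more to the result.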