GPT-3
What is GPT-3?
GPT-3, or the third-generation Generative Pre-trained Transformer, is a neural network machine learning model trained using internet data to generate any type of text. Developed by OpenAI, it requires a small amount of input text to generate large volumes of relevant and sophisticated machine-generated text.
GPT-3's deep learning neural network is a model with 175 billion machine learning parameters. To put things into scale, the largest trained language model before GPT-3 was Microsoft's Turing Natural Language Generation (NLG) model, which had 17 billion parameters. At its release in 2020, GPT-3 was the largest language model that had been publicly described. As a result, GPT-3 was better than any prior model at producing text that is convincing enough to seem like a human could have written it.
GPT-3 and other language processing models like it are commonly referred to as large language models.
What can GPT-3 do?
GPT-3 processes text input to perform a variety of natural language tasks. It uses both natural language generation and natural language processing to understand and generate natural human language text. Generating content understandable to humans has historically been a challenge for machines, which do not grasp the complexities and nuances of language. GPT-3 has been used to create articles, poetry, stories, news reports and dialogue, producing large amounts of copy from a small amount of input text.
GPT-3 can create anything with a text structure -- not just human language text. It can also generate text summarizations and even programming code.
GPT-3 examples
One of the most notable examples of GPT-3's implementation is the ChatGPT language model. ChatGPT is a variant of the GPT-3 model optimized for human dialogue, meaning it can ask follow-up questions, admit mistakes it has made and challenge incorrect premises. ChatGPT was made free to the public during its research preview to collect user feedback. ChatGPT was designed in part to reduce the possibility of harmful or deceitful responses.
Another common example is Dall-E. Dall-E is an AI image generating neural network built on a 12 billion-parameter version of GPT-3. Dall-E was trained on a data set of text-image pairs and can generate images from user-submitted text prompts. ChatGPT and Dall-E were developed by OpenAI.
Because programming code is a form of text, GPT-3 can also create workable code from only a few snippets of example code, and that code can often be run without error. Using a bit of suggested text, one developer has combined the user interface prototyping tool Figma with GPT-3 to create websites by describing them in a sentence or two. GPT-3 has even been used to clone websites by providing a URL as suggested text. Developers are using GPT-3 in several ways, from generating code snippets and regular expressions to producing plots and charts from text descriptions, Excel functions and other development applications.
GPT-3 can also be used in the healthcare space. One 2022 study explored GPT-3's ability to aid in the diagnosis of neurodegenerative diseases, like dementia, by detecting common symptoms, such as language impairment, in patient speech.
GPT-3 can also do the following:
- create memes, quizzes, recipes, comic strips, blog posts and advertising copy;
- write music, jokes and social media posts;
- automate conversational tasks, responding to any text that a person types into the computer with a new piece of text appropriate to the context;
- translate text into programmatic commands;
- translate programmatic commands into text;
- perform sentiment analysis;
- extract information from contracts;
- generate a hexadecimal color based on a text description;
- write boilerplate code;
- find bugs in existing code;
- mock up websites;
- generate simplified summarizations of text;
- translate between programming languages; and
- be misused for malicious prompt engineering and phishing attacks.
How does GPT-3 work?
GPT-3 is a language prediction model. This means that it is a neural network machine learning model that takes input text and predicts the text most likely to follow it. This is accomplished by training the system on a vast body of internet text to spot patterns, in a process called generative pre-training. GPT-3 was trained on several data sets, each with different weights, including Common Crawl, WebText2 and Wikipedia.
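The idea of predicting text from patterns seen in training can be illustrated at a very small scale. The sketch below is not how GPT-3 is implemented: it swaps the neural network for a simple bigram counter, and the corpus and function names are made up for illustration. It only shows the general loop of learning which words tend to follow which, then extending a prompt one predicted word at a time.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram(corpus: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(follows: Dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedily extend `start` one predicted word at a time."""
    output = [start]
    for _ in range(max_words):
        nxt = predict_next(follows, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)


# Hypothetical toy corpus standing in for "internet text".
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))
print(generate(model, "on", max_words=2))
```

A real large language model replaces the count table with billions of learned weights and looks at the whole preceding context rather than a single word, but the generate-by-repeated-prediction loop is the same in spirit.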
After pre-training, a model such as ChatGPT, a variant of GPT-3, is further trained through a supervised phase and then a reinforcement phase. In the supervised phase, a team of trainers asks the language model a question with a correct output in mind. If the model answers incorrectly, the trainers supply the right answer and the model is adjusted toward it. In the reinforcement phase, the model gives several answers, which trainers rank from best to worst, and those rankings are used to steer the model toward preferred responses.
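The ranking step can be made concrete with a small sketch. One common way to use a best-to-worst ranking, assumed here rather than taken from this article, is to break it into pairwise comparisons of the form (preferred, rejected), which can then serve as training signal. The answer strings and the function name are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def ranking_to_pairs(ranked_answers: List[str]) -> List[Tuple[str, str]]:
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair is always ranked above the second.
    return list(combinations(ranked_answers, 2))


# Hypothetical trainer ranking, best answer first.
ranked = ["answer A", "answer B", "answer C"]
pairs = ranking_to_pairs(ranked)
for preferred, rejected in pairs:
    print(f"{preferred!r} preferred over {rejected!r}")
```

A ranking of K answers yields K*(K-1)/2 comparisons, so a single ranking of a few answers supplies several training signals at once.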
GPT-3 has 175 billion machine learning parameters and is significantly larger than earlier large language models, such as Bidirectional Encoder Representations from Transformers (BERT) and Turing NLG. Parameters are the numeric weights a model learns during training; together they determine how well it performs a task such as generating text. Large language model performance generally improves as more data and parameters are added to the model.
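To make the scale concrete, the parameter count of a GPT-style transformer can be roughly estimated from its shape. The sketch below assumes the configuration reported for the largest GPT-3 model (96 layers, hidden width 12,288, a vocabulary of 50,257 tokens and a 2,048-token context), none of which is stated in this article. It uses a common approximation of 12 × width² weights per layer that ignores biases and layer normalization, so the result is an estimate rather than an exact count.

```python
def estimate_transformer_params(
    n_layers: int, d_model: int, vocab_size: int, context_len: int
) -> int:
    """Rough parameter count for a decoder-only transformer."""
    # Per layer: about 4*d^2 attention weights plus 8*d^2 feed-forward
    # weights (assuming a feed-forward hidden size of 4*d).
    per_layer = 12 * d_model ** 2
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + context_len * d_model
    return n_layers * per_layer + embeddings


# Assumed GPT-3 configuration (not given in this article).
total = estimate_transformer_params(
    n_layers=96, d_model=12288, vocab_size=50257, context_len=2048
)
print(f"Estimated parameters: {total / 1e9:.1f} billion")
```

The estimate lands close to the 175 billion figure quoted above, which shows that almost all of the parameters sit in the stacked layers rather than in the embeddings.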
When a user provides text input, the system analyzes the language and uses a text predictor based on its training to create the most likely output. The model can be fine-tuned, but even without much additional tuning or training, the model generates high-quality output text that feels similar to what humans would produce.
What are the benefits of GPT-3?
Whenever a large amount of text needs to be generated by a machine from a small amount of text input, GPT-3 provides a good solution. Large language models, like GPT-3, are able to provide decent outputs given only a handful of examples.
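Supplying a handful of examples usually means placing them directly in the input text, followed by the new item to complete. The sketch below assembles such a prompt as a plain string. The task, the "Text:" and "Sentiment:" labels, the example reviews and the function name are all hypothetical, and no particular API is assumed for sending the prompt to a model.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], query: str
) -> str:
    """Assemble an instruction, worked examples and a new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End with the unanswered query so the model's most likely
    # continuation is the missing label.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples for a sentiment task.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text.",
    examples,
    "Setup was quick and painless.",
)
print(prompt)
```

Because the examples live in the prompt, changing the task only requires changing the text, not retraining the model.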
GPT-3 also has a wide range of artificial intelligence applications. It is task-agnostic, meaning it can perform many different tasks without fine-tuning.
As with any automation, GPT-3 can handle quick, repetitive tasks, enabling humans to handle more complex tasks that require a higher degree of critical thinking. There are many situations where it is not practical or efficient to enlist a human to generate text output, or there might be a need for automatic text generation that seems human. For example, customer service centers can use GPT-3 to answer customer questions or support chatbots; sales teams can use it to connect with potential customers. Marketing teams can write copy using GPT-3. This type of content requires fast production and is low risk, meaning that if there is a mistake in the copy, the consequences are relatively minor.
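Another benefit of GPT-3 is accessibility. It is offered through a cloud API, so applications on a consumer laptop or smartphone can use it without hosting the model. The model itself, with 175 billion parameters, is far too large to run on such devices.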
What are the risks and limitations of GPT-3?
While GPT-3 is remarkably large and powerful, it has several limitations and risks associated with its usage.
Limitations
- Pre-training. GPT-3 is not constantly learning. It has been pre-trained, meaning it doesn't have an ongoing long-term memory that learns from each interaction.
- Limited input size. Transformer architectures -- including GPT-3 -- have a limited input size. A user cannot supply an arbitrarily long prompt, which can limit certain applications. GPT-3's context window is 2,048 tokens, shared between the prompt and the generated output.
- Slow inference time. Because of its size, GPT-3 can take a long time to generate results.
- Lack of explainability. GPT-3 shares a problem many neural networks face: it is difficult to explain and interpret why certain inputs result in specific outputs.
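The input-size limitation above is typically handled in application code by trimming the prompt to a token budget before sending it. The sketch below assumes the 2,048-token limit is shared between the prompt and the generated output, and it approximates tokens by splitting on whitespace. Real tokenizers split text differently, so the counts here are only a rough stand-in, and the function and constant names are hypothetical.

```python
CONTEXT_WINDOW = 2048  # token limit quoted above


def fit_prompt(
    text: str, max_output_tokens: int, context_window: int = CONTEXT_WINDOW
) -> str:
    """Trim `text` so the prompt plus the reserved output fits the window."""
    budget = context_window - max_output_tokens
    if budget <= 0:
        raise ValueError("max_output_tokens leaves no room for a prompt")
    # Whitespace split is a crude approximation of real tokenization.
    tokens = text.split()
    if len(tokens) <= budget:
        return text
    # Keep the most recent tokens, dropping the oldest text first.
    return " ".join(tokens[-budget:])


long_text = " ".join(f"w{i}" for i in range(3000))
trimmed = fit_prompt(long_text, max_output_tokens=256)
print(len(trimmed.split()))
```

Keeping the end of the text is one possible design choice, suited to dialogue where recent turns matter most. Summarizing or chunking the input are alternatives when the beginning of the text is also important.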
Risks
- Mimicry. Language models such as GPT-3 are becoming increasingly accurate, and machine-generated content may become difficult to distinguish from that written by a human. This may pose some copyright and plagiarism issues.
- Accuracy. Despite its proficiency in imitating the format of human-generated text, GPT-3 struggles with factual accuracy in many applications.
- Bias. Language models are prone to machine learning bias. Since the model was trained on internet text, it has the potential to learn and exhibit many of the biases that humans exhibit online. For example, two researchers at the Middlebury Institute of International Studies at Monterey found that GPT-2 -- GPT-3's predecessor -- is adept at generating radical text, such as discourses that imitate conspiracy theorists and white supremacists. This presents the opportunity to amplify and automate hate speech, as well as inadvertently generate it. ChatGPT -- powered by a variant of GPT-3 -- aims to reduce the likelihood of this happening through more intensive training and user feedback.
History of GPT-3
Formed in 2015 as a nonprofit, OpenAI developed GPT-3 as one of its research projects. It aimed to tackle the larger goals of promoting and developing "friendly AI" in a way that benefits humanity as a whole.
The first version of GPT was released in 2018 and contained 117 million parameters. The second version of the model, GPT-2, was released in 2019 with around 1.5 billion parameters. GPT-3, released in 2020, surpassed its predecessor by a huge margin with 175 billion parameters -- more than 100 times its predecessor and 10 times more than comparable programs.
Earlier pre-trained models -- such as BERT -- demonstrated the viability of pre-training on large amounts of text and showed the power that neural networks have to handle language tasks that previously seemed unachievable.
OpenAI released access to the model incrementally to see how it would be used and to avoid potential problems. The model was released during a beta period that required users to apply to use the model, initially at no cost. However, the beta period ended in October 2020, and the company released a pricing model based on a tiered credit-based system that ranges from a free access level for 100,000 credits or three months of access to hundreds of dollars per month for larger-scale access. Microsoft invested $1 billion in OpenAI in 2019, and in September 2020 it became the exclusive licensee of the GPT-3 model. This means that Microsoft has sole access to GPT-3's underlying model, while other users reach it through OpenAI's API.
ChatGPT launched in November 2022 and was free for public use during its research phase. This brought GPT-3 more mainstream attention than it previously had, giving many nontechnical users an opportunity to try the technology. GPT-4 was released in March of 2023 and is rumored to have significantly more parameters than GPT-3.
Future of GPT-3
There are many open source efforts underway to provide a free and non-licensed model as a counterweight to Microsoft's exclusive license. New language models are published frequently on Hugging Face's platform.
It is unclear exactly how GPT-3 will develop in the future, but it is likely that it will continue to find real-world uses and be embedded in various generative AI applications. Many applications already use GPT-3 through OpenAI's API. Where possible, GPT-4 is expected to replace GPT-3 in those applications.