What is an AI accelerator? What is agentic AI? Complete guide
X

OpenAI o1 explained: Everything you need to know

OpenAI's o1 models, launched in December 2024, enhance reasoning in AI and excel in complex tasks, such as generating and debugging code.

OpenAI has emerged to be one of the primary leaders of the generative AI era. The company's ChatGPT is among the most popular and widely used instances of generative AI, powered by its GPT family of large language models, or LLMs. As of December 2024, the primary models used by ChatGPT are GPT-4o and GPT-3.5.

For multiple weeks in August and into September 2024, reports circulated about a new model from OpenAI -- codenamed "Strawberry." Initially, it was not clear whether Strawberry was the successor to GPT-4o or something else.

On Sept. 12, 2024, the suspense behind Strawberry lifted with the initial launch of OpenAI o1 models, including o1-preview and o1-mini. On Dec. 5, 2024, as part of its "12 Days of OpenAI" event, the company made the  o1 model generally available, alongside the introduction of the o1 pro mode offering.

What is OpenAI o1?

OpenAI o1 is a family of LLMs from OpenAI that have been optimized with enhanced reasoning functionality.

The o1 models were initially intended to be preview models, designed to provide users -- as well as OpenAI -- with a different type of LLM experience than the GPT-4o model. As is the case with all OpenAI's LLMs, o1 is a transformer model. It can be used to summarize content, generate new content, answer questions and write application code.

As opposed to OpenAI's prior models, the o1 models are designed to reason better. That is, instead of just providing a response as quickly as possible and using the basic transformer approach of weights and understanding what word or words belong together, o1 "thinks" about what the right approach is to solve a problem. The process of reasoning about a given problem in response to a user query is intended to provide a potentially more accurate response to certain types of complex queries. Unlike previous models, the o1 series spends more time processing information before responding. The o1 models are targeted at tackling hard problems that require multistep reasoning and complex problem-solving strategies.

The basic strategy taken by OpenAI for reasoning is chain-of-thought prompting, where a model reasons step by step through a problem in an iterative approach. The development of o1 involved advanced training techniques, such as reinforcement learning.

The initial launch in September 2024 included two models:

  • OpenAI o1-preview -- excels at tackling sophisticated problems.
  • OpenAI o1-mini -- provides a smaller, more cost-efficient version of o1.

In December 2024, OpenAI graduated the o1-preview to become just o1 and introduced the o1 pro mode as part of the $200 ChatGPT Pro service tier.

The o1 model family

There are three models in the OpenAI o1 model family, and each model is designed to meet a specific target use case.

o1

The full o1 model is the graduated version of the original o1-preview release. According to OpenAI, the release version introduces significant improvements, including a 34% reduction in major errors on difficult problems. It also includes the ability to analyze and respond to uploaded images.

o1-mini

The o1-mini model is a small version of the primary o1 model, optimized for speed and efficiency while maintaining strong performance metrics. According to OpenAI, o1-mini does particularly well at coding tasks, making it a good choice for developers and programmers who need quick, reliable responses.

o1 pro mode

The o1 pro mode is the most powerful iteration of the OpenAI reasoning model family. This premium version uses additional computing power to improve performance across multiple challenging benchmarks. According to OpenAI, o1 pro mode had an 86% pass rate on American Invitational Mathematics Examination (AIME) 2024 math competitions, compared to 78% for standard o1.

Some queries can take more time than ChatGPT users have grown to expect. To help manage expectations, the o1 pro mode also provides a progress bar and a notification system for long-running queries to keep users updated.

But all that power comes at a cost. The o1 pro mode is exclusively available through OpenAI's high-end ChatGPT Pro subscription, which costs $200 per month. 

What can OpenAI o1 do?

OpenAI o1 can perform many tasks like any of OpenAI's other GPT models -- such as answering questions, summarizing content and generating new content.

As an advanced reasoning model, o1 is particularly well-suited for certain tasks and use cases, including the following:

  • Enhanced reasoning. The o1 models are optimized for complex reasoning tasks, especially in STEM (science, technology, engineering and mathematics).
  • Brainstorming and ideation. The model's advanced reasoning abilities make it useful for generating creative ideas and solutions in various contexts.
  • Scientific research. The o1 models are ideal for different types of scientific research tasks. For example, o1 can annotate cell sequencing data and handle complex mathematical formulas needed in fields such as quantum optics.
  • Coding. The o1 models are effective at generating and debugging code, performing well in coding benchmarks such as HumanEval and Codeforces, according to OpenAI. The models are also effective in helping build and execute multi-step workflows for developers.
  • Mathematics. According to OpenAI, o1 excels in math-related benchmarks, outscoring the company's prior models. On the American Invitational Mathematics Examination (AIME) benchmark, o1 pro mode scored 86%, while standard o1 scored 78%. The model's math capabilities could potentially be used to help generate complex mathematical formulas for physicists.
  • Self-fact-checking. The o1 models can self-fact-check, improving the accuracy of its responses.
  • Image analysis capabilities. The o1 models provide advanced image analysis capabilities, letting users upload images and receive detailed responses. For example, users can upload photos of objects such as birdhouses and receive building instructions, or submit sketches for data center designs and receive detailed technical feedback.

How to use OpenAI o1

There are several ways users and organizations can use the o1 models.

  • ChatGPT Plus, Team Enterprise and Education users. The o1 and o1-mini models are available directly for users of ChatGPT Plus, Team, Enterprise and Education subscribers. Users can select the model manually in the model picker.
  • ChatGPT Pro users. The ChatGPT Pro tier at $200 a month is the initial exclusive home to the o1 pro model. ChatGPT Pro also includes a grant program providing free access to leading medical researchers, with initial grants awarded to researchers at institutions including Boston Children's Hospital, Berkeley Lab and The Jackson Laboratory.
  • API developers. Developers can access o1 and o1-mini through OpenAI's API.
  • Third-party services. Multiple third-party services have made the models available, including Microsoft Azure AI Studio and GitHub Models.

What are the limitations of OpenAI o1

As a new type of LLM, there are several limitations to the OpenAI o1 model, including the following:

  • Feature gaps. The o1 models lack web browsing, though it is a planned future capability.
  • API restrictions. At launch, there are a variety of restrictions on the API limiting the models. OpenAI has announced plans to expand o1's API functionality to include enhanced features such as function calling and structured outputs in future updates.
  • Response time. OpenAI users have come to expect rapid responses with little delay. But the o1 models are initially slower than previous models due to more thorough reasoning processes.
  • Cost. For API users OpenAI o1 is more expensive than previous models -- including GPT-4o.

How OpenAI o1 improves safety

As part of the o1 models release, OpenAI also publicly released a System Card, which is a document that describes the safety evaluations and risk assessments that were done during model development. It details how the models were evaluated using OpenAI's framework for assessing risks in areas such as cybersecurity, persuasion and model autonomy.

  • Chain-of-thought reasoning. The o1 models use large-scale reinforcement learning to perform complex reasoning before responding. This lets them refine the generation process and recognize mistakes. As a result, they can better follow specific guidelines and model policies, improving their ability to provide safe and appropriate content.
  • Advanced jailbreak resistance. The o1 models demonstrate significant improvements in resisting jailbreaks. On the Strong Reject benchmark, which tests resistance against common attacks from literature, o1 and o1-mini achieve better scores than GPT-4o.
  • Improved content policy adherence. On the Challenging Refusal Evaluation, which tests the model's ability to refuse unsafe content across categories such as harassment, hate speech and illicit activities, o1 achieves a not-unsafe score of 0.92, which is superior to GPT-4o's 0.713.
  • Enhanced bias mitigation. On the Bias Benchmark for QA evaluation, which tests for demographic fairness, o1 selects the correct answer 94% of the time on unambiguous questions, compared to GPT-4o's 72%. The models also show improved performance on evaluations measuring the use of race, gender and age in decision-making, with o1 generally outperforming GPT-4o.
  • Legible safety monitoring. The chain-of-thought summaries provided by o1 models offer a new approach for safety monitoring. In an analysis of 100,000 synthetic prompts, only 0.17% of o1's responses were flagged as deceptive, with most of these being forms of hallucination rather than intentional deception.

GPT-4o vs. OpenAI o1

The following chart provides a comparison of OpenAI's GPT-4o and o1 models, showing a number of differences across them.

Feature GPT-4o o1 models
Release date May 13, 2024 Dec. 5, 2024
Model variants Single model Three variants: o1, o1-mini and o1 pro
Reasoning capabilities Good performance Enhanced reasoning, especially in STEM fields
Performance benchmarks 13% on Mathematics Olympiad 86% on Mathematics Olympiad, PhD-level accuracy in STEM
Multimodal capabilities Handles text, images, audio and video Handles text and images
Context window 128K tokens 128K tokens
Speed Twice as fast as previous models Slower due to more reasoning processes
Availability Widely available across OpenAI products Limited access for specific users
Features Includes web browsing, file uploads Lacks some features from GPT-4o, such as web browsing
Safety and alignment Focused on safety measures Improved safety measures, higher resistance to jailbreaking

Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He has pulled Token Ring, configured NetWare and been known to compile his own Linux kernel. He consults with industry and media organizations on technology issues.

Dig Deeper on Data analytics and AI