GPT-4o vs. GPT-4: How do they compare?

GPT-4o, OpenAI's latest model, promises improved multimodal capabilities and increased efficiency. Explore the differences between GPT-4o and its predecessor, GPT-4.

OpenAI's latest release, GPT-4o, builds on the foundation set by the company's previous models with significant updates, including enhanced multimodal capabilities and faster performance.

Since OpenAI first launched ChatGPT in late 2022, the chatbot interface and its underlying models have already undergone several major changes. GPT-4o was released in May 2024 and is the successor to GPT-4, which launched in March 2023.

GPT-4 and GPT-4o (that's the letter "o," for omni) are advanced generative AI models that OpenAI developed for use within the ChatGPT interface. Both models are trained to generate natural-sounding text in response to users' prompts, and they can engage in interactive, back-and-forth conversations, retaining memory and context to inform future responses.

TechTarget Editorial compared these products by testing the models within ChatGPT; reading informational materials and technical documentation from OpenAI; and analyzing user reviews on Reddit, tech blogs and the OpenAI developer forum.

Differences between GPT-4o and GPT-4

In many ways, GPT-4o and GPT-4 are similar. Both are advanced OpenAI models with vision and audio capabilities and the ability to recall information and analyze uploaded documents. Each has a 128,000-token context window and a knowledge cutoff date in late 2023 (October for GPT-4o, December for GPT-4).

But GPT-4o and GPT-4 also differ significantly in several areas: multimodal capabilities; performance and efficiency; pricing; and language support.

[Figure: Timeline of key milestones in OpenAI's history from 2015 to 2023, highlighting major releases and corporate changes.]
The release of GPT-4o is among the most significant developments at OpenAI since the launch of GPT-4 in 2023.


Multimodal capabilities

Multimodal AI models are capable of processing multiple data types, such as text, images and audio. In a sense, both GPT-4 and GPT-4o are multimodal: In the ChatGPT interface, users can create and upload images and use voice chat regardless of whether they're using GPT-4 or GPT-4o. However, the two models approach multimodality very differently -- it's one of the biggest differentiators between GPT-4o and GPT-4.

GPT-4 is a large language model (LLM) primarily designed for text processing, meaning that it lacks built-in support for handling images, audio and video. Within the ChatGPT web interface, GPT-4 must call on other OpenAI models, such as the image generator Dall-E or the speech recognition model Whisper, to process non-text input.

GPT-4o, in contrast, was designed for multimodality from the ground up, hence the "omni" in its name. "We trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network," OpenAI representatives wrote in a blog post announcing the launch.

This native multimodality makes GPT-4o faster than GPT-4 on tasks involving multiple types of data, such as image analysis. In OpenAI's demo of GPT-4o on May 13, 2024, for example, company leaders used GPT-4o to analyze live video of a user solving a math problem and provide real-time voice feedback.

Controversy over GPT-4o's voice capabilities

The demo during OpenAI's livestreamed GPT-4o launch featured a voice called Sky, which listeners and Scarlett Johansson both noted sounded strikingly similar to Johansson's AI assistant character in the film Her. OpenAI CEO Sam Altman himself tweeted the single word "her" during the demo.

Subsequently, Johansson said she had retained legal counsel and revealed that Altman had previously asked to use her voice in ChatGPT, a request she declined. In response, OpenAI paused the use of the Sky voice, although Altman said in a statement that Sky was never intended to resemble Johansson.

The incident highlights growing concerns over the ethical use of voice likenesses and artists' rights in the generative AI era. Rep. Nancy Mace, chairwoman of the House Subcommittee on Cybersecurity, Information Technology and Government Innovation, recently invited Johansson to testify before the committee about the Sky voice and the broader issue of deepfakes.

Performance and efficiency

GPT-4o is also designed to be quicker and more computationally efficient than GPT-4 across the board, not just for multimodal queries. According to OpenAI, GPT-4o is twice as fast as the most recent version of GPT-4.

When TechTarget Editorial timed the two models in testing, GPT-4o's responses were indeed generally quicker than GPT-4's -- although not quite double the speed -- and similar in quality. The following table compares GPT-4o and GPT-4's response times to five sample prompts using the ChatGPT web app.

Prompt | GPT-4o | GPT-4
Generate a 500-word essay on how quantum computing could change the IT industry. | 23 seconds | 33 seconds
Develop an itinerary for a three-day trip to Traverse City, Michigan. | 28 seconds | 48 seconds
Print "hello world" in C. | 4 seconds | 7 seconds
Write alt text for the attached image [a photo of an oriole]. | 2 seconds | 3 seconds
Summarize the attached document [a 22-page neuroscience journal article] in five key bullet points. | 16 seconds | 19 seconds

OpenAI's testing indicates that GPT-4o outperforms GPT-4 on major benchmarks, including simple math, language comprehension and vision understanding. OpenAI has also said that GPT-4o has stronger contextual understanding than GPT-4, enabling it to better grasp idioms, metaphors and cultural references.

Reports from actual users are more mixed. As of publication time, GPT-4o is the top-rated model on the crowdsourced LLM evaluation platform LMSYS Chatbot Arena, both overall and in specific categories such as coding and responding to difficult queries. But other users call GPT-4o "overhyped," reporting that it performs worse than GPT-4 on tasks such as coding, classification and reasoning.

Unfortunately, each type of evidence -- self-reported benchmarks from model developers, crowdsourced human evaluations and unverified anecdotes -- has its own limitations. For developers building LLM apps and users integrating generative AI into their workflows, deciding which model is the best fit might ultimately require experimenting with both over time and in various contexts. Some developers, for example, say that they switch back and forth between GPT-4 and GPT-4o depending on the task at hand.


Pricing

One advantage of GPT-4o's improved computational efficiency is its lower pricing. For developers using OpenAI's API, GPT-4o is by far the more cost-effective option. It's available at a rate of $5 per million input tokens and $15 per million output tokens, while GPT-4 costs $30 per million input tokens and $60 per million output tokens. Even GPT-4 Turbo, designed to be faster and cheaper than GPT-4, is more expensive than GPT-4o at $10 per million input tokens and $30 per million output tokens.
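To make the gap concrete, the per-token rates above can be plugged into a simple cost estimate. This is a minimal sketch using a hypothetical workload; the rates are those quoted at the time of writing, and actual billing depends on OpenAI's current price list:

```python
# Per-million-token API rates quoted above, in US dollars.
RATES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4": {"input": 30.00, "output": 60.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in dollars for a given token volume."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# Hypothetical workload: 100,000 input tokens and 50,000 output tokens.
for model in RATES:
    print(f"{model}: ${estimate_cost(model, 100_000, 50_000):.2f}")
# gpt-4o: $1.25
# gpt-4: $6.00
# gpt-4-turbo: $2.50
```

At this workload, GPT-4o comes in at roughly a fifth of GPT-4's cost and half of GPT-4 Turbo's.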

For web app users, the difference is even more significant. Moving forward, GPT-4o will power the free version of ChatGPT, replacing GPT-3.5. This gives free users access to multimodality, higher-quality text responses, voice chat and custom GPTs -- a no-code option for building personalized chatbots -- which were previously only available to paying customers. GPT-4 will remain available only to those on a paid plan, including ChatGPT Plus, Team and Enterprise, which start at $20 per month.

However, this rollout is still in progress, and some users might not yet have access to GPT-4o. As of a test on June 10, 2024, GPT-3.5 was still the default for free users without a ChatGPT account.

Moreover, free and paid users will have different levels of access to each model. Free users will face message limits for GPT-4o, and after hitting those caps, they'll be switched to GPT-3.5. ChatGPT Plus users will have higher message limits than free users, and those on a Team or Enterprise plan will have even fewer restrictions.

Language support

GPT-4o also offers significantly better support for non-English languages compared with GPT-4. In particular, OpenAI has improved tokenization for languages that don't use a Western alphabet, such as Hindi, Chinese and Korean. The new tokenizer more efficiently compresses non-English text, with the aim of handling prompts in those languages in a cheaper, quicker way.

This change addresses a longstanding issue in natural language processing, in which models have historically been optimized for Western languages at the expense of languages spoken in other regions. Handling more languages with greater accuracy and fluency makes GPT-4o more effective for global applications and opens up access to groups that may not have been able to engage with models as fully before.

But the improved language support isn't without challenges. Just days after OpenAI released GPT-4o, researchers noticed that many Chinese tokens included inappropriate phrases related to pornography and gambling. Model developers might have included these problematic tokens due to inadequate data cleaning, potentially degrading the model's comprehension and risking security breaches and hallucinations.

Is GPT-4o better than GPT-4?

In most cases, GPT-4o is indeed better than GPT-4. OpenAI now describes GPT-4o as its flagship model, and its improved speed, lower costs and multimodal capabilities will be appealing to many users.

That said, some users may still prefer GPT-4, especially in business contexts. Because GPT-4 has been available for over a year now, it's well tested and already familiar to many developers and businesses. That kind of stability can be crucial for critical and widely used applications, where reliability might be a higher priority than having the lowest costs or the latest features.

In addition, although GPT-4o will generally be more cost-effective for new deployments, IT teams looking to manage existing setups might find it more economical to continue using GPT-4. Transitioning to a new model comes with its own costs, particularly for systems tightly integrated with GPT-4 where switching models could involve significant infrastructure or workflow changes.

In addition, GPT-4o's multimodal capabilities might differ for API versus web users, at least for now. In a May 2024 post in the OpenAI Developer Forum, an OpenAI product manager explained that GPT-4o does not yet support image generation or audio through the API. Consequently, enterprises primarily using OpenAI's APIs might not find GPT-4o compelling enough to make the switch until its multimodal capabilities become generally available through the API.

What does the introduction of GPT-4o mean for ChatGPT users?

The introduction of GPT-4o as the new default version of ChatGPT will lead to some major changes for users. One of the most significant is the availability of multimodal capabilities, as mentioned previously. Moving forward, all users will be able to interact with ChatGPT using text, images, audio and video, as well as create custom GPTs -- functionalities that were previously limited or unavailable.

These advancements might make the Plus subscription less appealing to some users, as many formerly premium features are now accessible in the free tier. That said, paid plans still offer benefits such as higher usage caps and faster response times, which could be a deciding factor for heavy users or businesses that need reliability in consistent, high-volume interactions.

Even amid the GPT-4o excitement, many in the AI community are already looking ahead to GPT-5, expected later this summer. Enterprise customers received demos of the new model this spring, sources told Business Insider, and OpenAI has teased forthcoming capabilities such as autonomous AI agents.

Lev Craig covers AI and machine learning as the site editor for TechTarget Editorial's Enterprise AI site. Craig graduated from Harvard University with a bachelor's degree in English and has previously written about enterprise IT, software development and cybersecurity.
