What is Dall-E and how does it work?

Dall-E is a generative artificial intelligence (AI) technology that enables users to create images by submitting text-based prompts. Behind the scenes, Dall-E uses advanced text-to-graphic technologies to turn plain words into pictures. Dall-E is a trained neural network that can generate entirely new images in a variety of styles based on the user's prompt.

The name Dall-E is an homage to the two core themes of the technology, hinting at the goal of merging art and AI. The first part (Dall) is intended to evoke the Spanish surrealist artist Salvador Dalí, and the second part (E) refers to Wall-E, the fictional robot from the Disney-Pixar film of the same name. The combination of the two names reflects the technology's abstract and somewhat surreal illustrative power.

AI vendor OpenAI developed Dall-E and launched the initial release in January 2021. The technology used deep learning models alongside the GPT-3 large language model (LLM) as a base for understanding natural language user prompts and generating new images.

Dall-E is an evolution of a project that OpenAI first introduced in June 2020. Originally called Image GPT, the project represented an initial attempt at demonstrating how a neural network could be used to create high-quality images. Dall-E extended the initial concept of Image GPT by enabling users to generate new images with text prompts, much like how GPT-3 can generate new text in response to natural language text prompts.

The Dall-E technology fits into a category of AI that is sometimes referred to as generative design. It competes against similar technologies, such as Stable Diffusion and Midjourney.

How does Dall-E work?

Dall-E uses several technologies to generate images, including natural language processing, LLMs and diffusion processing.

The original Dall-E was built using a subset of the GPT-3 LLM. However, instead of the full 175 billion parameters that GPT-3 provides, Dall-E used only 12 billion, an approach designed to optimize image generation. Like the GPT-3 LLM, Dall-E uses a transformer neural network -- also called a transformer -- to enable the model to create and understand connections between different concepts.

The original method used in Dall-E to implement text-to-image generation was described in the research paper "Zero-Shot Text-to-Image Generation," published in February 2021. Zero-shot refers to a model's ability to carry out a task it was not explicitly trained on, such as generating an entirely new image, by drawing on prior knowledge and related concepts.

To help evaluate whether the Dall-E model generated images that matched their prompts, OpenAI also built the Contrastive Language-Image Pre-training (CLIP) model, which was trained on 400 million image-text pairs. OpenAI used CLIP to assess Dall-E's output by analyzing which caption is most suitable for a generated image.
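The caption-matching idea can be illustrated with a minimal sketch. It assumes that an image and each candidate caption have already been turned into embedding vectors, and it ranks the captions by cosine similarity to the image. The three-dimensional vectors and the function names are hypothetical, chosen only for illustration; this is not OpenAI's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(image_embedding, caption_embeddings):
    """Return caption names sorted from best to worst match for the image."""
    scores = {
        name: cosine_similarity(image_embedding, embedding)
        for name, embedding in caption_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy embeddings; a real model produces much larger vectors.
image = [0.9, 0.1, 0.0]
captions = {
    "caption_a": [0.8, 0.2, 0.1],
    "caption_b": [0.1, 0.9, 0.3],
    "caption_c": [0.5, 0.5, 0.0],
}
ranking = rank_captions(image, captions)
print(ranking)
```

The first entry in the ranking is the caption whose embedding points in the direction closest to the image embedding.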

The original Dall-E generated images from text using a technology known as a discrete variational autoencoder (dVAE). The dVAE was loosely based on research conducted by Alphabet's DeepMind division on the vector quantized variational autoencoder (VQ-VAE).
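To give a rough sense of what "discrete" means here, the sketch below maps small image-patch vectors to integer tokens by finding the nearest entry in a codebook. This nearest-neighbor lookup is the vector quantization idea from the VQ-VAE research, used here as a simplified stand-in; it is not the dVAE training procedure. The codebook, the patch values and the function names are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(patches, codebook):
    """Map each patch vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(patch, codebook[i]))
        for patch in patches
    ]

def reconstruct(tokens, codebook):
    """Replace each token with its codebook vector (a lossy approximation)."""
    return [codebook[token] for token in tokens]

# Hypothetical 2-dimensional codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9], [0.95, 1.1]]

tokens = quantize(patches, codebook)
approximation = reconstruct(tokens, codebook)
print(tokens)
```

Once an image is expressed as a sequence of integer tokens, a transformer can model it in much the same way it models a sequence of text tokens.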

The move to Dall-E 2

In April 2022, OpenAI introduced Dall-E 2, which provided users with a series of enhanced capabilities. It also improved on the methods used to generate images, resulting in a platform that could deliver more polished and photorealistic images. One of the most important changes was the move toward a diffusion model that drew on CLIP to generate higher-quality images.

Compared to the dVAE used in the original Dall-E, the diffusion model could generate higher-quality images. OpenAI claimed that Dall-E 2 could create images at four times the resolution of those produced by its predecessor. Dall-E 2 also featured improvements in speed and image sizes, enabling users to generate bigger images at a faster rate.
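The general idea behind a diffusion model can be sketched in a few lines. The example assumes a common textbook formulation in which a clean signal is blended with Gaussian noise according to a schedule, and generation works by estimating and removing that noise. The linear schedule, step count and function names are assumptions for illustration, and the true noise stands in for the neural network that would normally predict it; the article does not describe Dall-E 2's actual settings.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, noise, alpha_bar):
    """Forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    keep, mix = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * x + mix * n for x, n in zip(x0, noise)]

def remove_noise(xt, predicted_noise, alpha_bar):
    """Invert the forward step, given an estimate of the noise that was added."""
    keep, mix = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - mix * n) / keep for x, n in zip(xt, predicted_noise)]

rng = random.Random(0)                 # fixed seed so the run is repeatable
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -0.25, 1.0, 0.0]            # hypothetical "pixel" values
noise = [rng.gauss(0.0, 1.0) for _ in x0]

xt = add_noise(x0, noise, alpha_bars[500])
# A trained network would predict the noise; the true noise is used here.
recovered = remove_noise(xt, noise, alpha_bars[500])
```

With a perfect noise estimate, the original values are recovered exactly, which is why the quality of the noise-prediction network largely determines the quality of the generated image.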

Dall-E 2 also expanded the ability to customize an image and apply different styles. In Dall-E 2, for instance, a prompt could specify that an image be drawn as pixel art or as an oil painting. Dall-E 2 also introduced the concept of outpainting, which enabled users to create an image as an extension -- or outpainting -- of an original image.
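Conceptually, outpainting starts by placing the original image on a larger canvas and marking the surrounding area as the region to be generated. The sketch below shows only that preparation step, using nested lists as a stand-in for pixel data. The function name and the mask convention (True means "generate here") are hypothetical and do not describe how Dall-E 2 is implemented.

```python
def make_outpaint_canvas(image, pad, fill=None):
    """Center `image` (a list of pixel rows) on a canvas enlarged by `pad` per side.

    Returns (canvas, mask), where mask[r][c] is True for every cell that
    holds no original pixel and would need to be generated.
    """
    height, width = len(image), len(image[0])
    new_height, new_width = height + 2 * pad, width + 2 * pad
    canvas = [[fill] * new_width for _ in range(new_height)]
    mask = [[True] * new_width for _ in range(new_height)]
    for r in range(height):
        for c in range(width):
            canvas[r + pad][c + pad] = image[r][c]
            mask[r + pad][c + pad] = False
    return canvas, mask

original = [[1, 2], [3, 4]]            # hypothetical 2x2 "image"
canvas, mask = make_outpaint_canvas(original, pad=1)
cells_to_generate = sum(cell for row in mask for cell in row)
```

An image model would then fill only the masked cells, using the untouched original pixels as context.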

The introduction of Dall-E 3

OpenAI released Dall-E 3 in October 2023. Dall-E 3 builds on and improves Dall-E 2, offering better image quality and prompt fidelity. Dall-E 3 is also natively integrated into ChatGPT, unlike its predecessor. Now, any user can create AI-generated images from the ChatGPT prompt. However, the free ChatGPT version limits users to only two images per day. Developers can also access Dall-E 3 services through the OpenAI application programming interface (API), enabling them to embed Dall-E 3 functionality directly into their applications.
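As a rough illustration of API access, the sketch below builds an image-generation request using only the Python standard library. It assumes the images endpoint, the `dall-e-3` model name and the parameter names that OpenAI documented around the time of writing, all of which may change; the helper function names are hypothetical. Constructing the request needs no network access, and only `generate_image_url` would actually contact the service.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"
DALLE3_SIZES = {"1024x1024", "1024x1792", "1792x1024"}
DALLE3_QUALITIES = {"standard", "hd"}

def build_payload(prompt, size="1024x1024", quality="standard"):
    """Validate the options and return the JSON body for a Dall-E 3 request."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    return {"model": "dall-e-3", "prompt": prompt, "n": 1,
            "size": size, "quality": quality}

def build_request(payload, api_key):
    """Wrap the payload in an authenticated POST request (nothing is sent yet)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

def generate_image_url(prompt, api_key):
    """Send the request; needs network access and a valid API key."""
    request = build_request(build_payload(prompt), api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["data"][0]["url"]      # assumes the default URL response format

# Build (but do not send) a landscape request with a placeholder key.
payload = build_payload("A watercolor lighthouse at dawn", size="1792x1024")
request = build_request(payload, api_key="YOUR_API_KEY")
```

Separating payload construction from sending keeps the validation logic testable without spending API credits.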

Dall-E 3 comes with significant improvements to text-to-image generation. Users can generate images more easily through simple conversation, and Dall-E 3 renders them more faithfully. Dall-E 3 can process extensive prompts without getting confused and render intricate details in a wide range of styles. It can also understand more nuanced instructions. In addition, ChatGPT automatically refines a user's prompt, tailoring the original prompt to achieve more precise results. Users can also ask for revisions directly within the same chat as the first image request.

The images themselves are also superior to those of Dall-E 2. They respond more accurately to prompts, and the details are crisper, more precise and more visually refined. Dall-E 3 can also generate images in both landscape and portrait aspect ratios. In addition, Dall-E 3 can add text to an image much more effectively than Dall-E 2, although text capabilities are still somewhat unpredictable.

OpenAI has added several safeguards to Dall-E 3 to limit its ability to generate adult, violent or hateful content. For example, Dall-E 3 does not return an image if a prompt includes harmful biases or the name of a public figure. OpenAI has also taken steps to improve demographic representation within generated images. In addition, Dall-E 3 declines any requests that ask for the style of a living artist. Artists can also decline to have their art used to train models.

After the release of Dall-E 3, OpenAI stopped accepting new Dall-E 2 customers. This also means that new customers cannot purchase Dall-E 2 credits, although previously purchased credits remain valid.

What are the benefits of Dall-E?

Potential benefits of Dall-E include the following:

  • Speed. Dall-E can generate images in a short time, often less than a minute. A user can create a detailed, high-quality image with only a single text prompt.
  • Customization. With the right text prompt, a user can create a highly customized image of nearly anything that can be imagined -- though within the limitations on adult, violent or hateful content.
  • Accessibility. Because Dall-E 3 is accessible through ChatGPT using natural language, Dall-E is available to a wide range of users. It does not require any extensive training or specific programming skills.
  • Refinement. A user can refine an image through subsequent prompts in the same chat session as the original prompt. The user can also use Dall-E's generated prompt when launching a new chat session. Dall-E also suggests prompts for refining the image after creating the initial image.
  • Flexibility. Dall-E can analyze an image submitted by the user and, from this, generate a new image based on the user's prompt.

What are the limitations of Dall-E?

While Dall-E has plenty of benefits, it does come with several important concerns:

  • Copyright. In the past, there was concern about the copyright on images created by Dall-E, as well as whether it was trained on copyrighted images. With Dall-E 3, OpenAI has taken multiple steps to address some of these concerns, but the effectiveness of those steps remains unclear.
  • Image legitimacy. Some question the legitimacy and ethics of AI-generated art and whether it displaces humans. This controversy will continue for the foreseeable future; there are no clear answers to the concerns. However, OpenAI is researching ways to identify when an image was created with AI.
  • Data set. Even though Dall-E was trained using a large data set, a vast amount of image and descriptive data is still untapped. As such, a user prompt might fail to generate an intended image because the model lacks the foundational information.
  • Realism. Although Dall-E 3 has dramatically improved the quality of the generated images, some images might not appear realistic enough for some users.
  • Context. To get the right image, a user must submit a clearly defined prompt. If the prompt is too generic or lacks context, the image generated by Dall-E might be inaccurate. Even subsequent clarification prompts might not result in the expected image.
  • Bias. Although OpenAI is taking steps to reduce bias in Dall-E images, the risk for bias can still exist around issues such as race, class, gender, belief systems or country of origin.

Dall-E use cases

As a generative AI technology, Dall-E 3 offers a wide range of potential use cases for both individuals and organizations:

  • Creative inspiration. The technology can be used to help inspire artists or other individuals to create something new. Dall-E can also be used to support an existing creative process.
  • Entertainment. Images created by Dall-E can potentially be used in books or games. Dall-E can go beyond traditional computer-generated imagery because prompts make it easier to create graphics.
  • Education. Teachers and educators can use Dall-E to generate images to help explain different concepts.
  • Advertising and marketing. The ability to create entirely unique and novel images can be useful for advertising and marketing.
  • Product design. A product designer can use Dall-E to visualize something new, which can be significantly faster than using traditional computer-aided design technologies.
  • Art. Dall-E can be used by anyone to create new art to be enjoyed and displayed.
  • Fashion design. As a supplement to existing tools, Dall-E can potentially help fashion designers devise new concepts.

[Image: A Dall-E generated image. Caption: Dall-E can generate images based on a user's text prompt.]

How much does Dall-E cost?

Dall-E 3 is now embedded in ChatGPT and is available to users with a paid ChatGPT subscription plan, including Plus, Team and Enterprise. The plans start at $20 per user per month. Individuals using the free version of ChatGPT can generate only two Dall-E images per day. OpenAI is no longer accepting new Dall-E 2 customers.

Dall-E 3 is also available to Microsoft Copilot users. Microsoft does not limit the number of images a user can generate each day. Instead, the company limits the number of boosts available to each subscription plan. A boost speeds up image generation, and one is used each time the generator creates an image. The free plan offers only 15 boosts per day. The number increases with paid subscriptions.

Developers can also access Dall-E 2 and Dall-E 3 capabilities through the OpenAI API. The API makes it possible for them to incorporate Dall-E capabilities directly into their applications. This table shows OpenAI's current pricing for the API's Dall-E service.

Model     Quality   Resolution             Price
Dall-E 3  Standard  1024×1024              $0.040 per image
Dall-E 3  Standard  1024×1792, 1792×1024   $0.080 per image
Dall-E 3  HD        1024×1024              $0.080 per image
Dall-E 3  HD        1024×1792, 1792×1024   $0.120 per image
Dall-E 2  -         1024×1024              $0.020 per image
Dall-E 2  -         512×512                $0.018 per image
Dall-E 2  -         256×256                $0.016 per image

The Dall-E 2 rates apply only to existing customers. All prices here are subject to change. OpenAI maintains a pricing page on its website.
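The per-image rates above can be turned into a simple cost estimate. The sketch below copies the listed prices into a lookup table and multiplies by the number of images. The key names and the `estimate_cost` function are hypothetical, and the prices are the ones listed here, which are subject to change.

```python
from decimal import Decimal

# (model, quality, resolution) -> price per image in U.S. dollars.
# Dall-E 2 has no quality tier, so its quality key is None.
PRICES = {
    ("dall-e-3", "standard", "1024x1024"): Decimal("0.040"),
    ("dall-e-3", "standard", "1024x1792"): Decimal("0.080"),
    ("dall-e-3", "standard", "1792x1024"): Decimal("0.080"),
    ("dall-e-3", "hd", "1024x1024"): Decimal("0.080"),
    ("dall-e-3", "hd", "1024x1792"): Decimal("0.120"),
    ("dall-e-3", "hd", "1792x1024"): Decimal("0.120"),
    ("dall-e-2", None, "1024x1024"): Decimal("0.020"),
    ("dall-e-2", None, "512x512"): Decimal("0.018"),
    ("dall-e-2", None, "256x256"): Decimal("0.016"),
}

def estimate_cost(model, resolution, count, quality=None):
    """Return the total price for `count` images, or raise KeyError if unlisted."""
    return PRICES[(model, quality, resolution)] * count

standard_batch = estimate_cost("dall-e-3", "1024x1024", 250, quality="standard")
hd_batch = estimate_cost("dall-e-3", "1792x1024", 250, quality="hd")
legacy_batch = estimate_cost("dall-e-2", "512x512", 1000)
print(standard_batch, hd_batch, legacy_batch)
```

Using `Decimal` rather than floating-point numbers avoids rounding drift when fractional-cent prices are multiplied across large batches.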


This was last updated in November 2024
