What is Dall-E?

Dall-E is a generative AI technology that enables users to create new images from text prompts. Functionally, Dall-E is a neural network that can generate entirely new images in a wide range of styles, as specified by the user's prompt.

The name Dall-E is an homage to the two core themes of the technology, hinting at the goal of merging art and AI. The first part (DALL) is intended to evoke the famous Spanish surrealist artist Salvador Dali, while the second part (E) refers to the fictional Disney robot Wall-E. The combination of the two names reflects the abstract and somewhat surreal illustrative power of the technology, which is automated by a machine.

Dall-E was developed by AI vendor OpenAI and first launched in January 2021. The technology uses deep learning models alongside the GPT-3 large language model as a base to understand natural language user prompts and generate new images.

Dall-E is an evolution of a concept that OpenAI first began to talk about in June 2020. Originally called Image GPT, it was an initial attempt at demonstrating how a neural network can be used to create new high-quality images. With Dall-E, OpenAI extended the initial concept of Image GPT to enable users to generate new images from a text prompt, much like how GPT-3 can generate new text in response to natural language text prompts.

The Dall-E technology fits into a category of AI that is sometimes referred to as generative design and competes against other similar technologies including Stable Diffusion and Midjourney.

How does Dall-E work?

Dall-E works by using a number of technologies including natural language processing (NLP), large language models (LLMs) and diffusion processing.

Dall-E was built using a subset of the GPT-3 LLM. Instead of the full 175 billion parameters that GPT-3 provides, Dall-E uses only 12 billion parameters in an approach that was designed to be optimized for image generation. Just like the GPT-3 LLM, Dall-E also makes use of a transformer neural network -- also simply referred to as a transformer -- to enable the model to create and understand connections between different concepts.

Technically, the approach that enables Dall-E was originally detailed by OpenAI researchers as Zero-Shot Text-to-Image Generation and explained in a 20-page research paper released in February 2021. Zero-shot is an AI approach where a model can execute a task it was not explicitly trained on, such as generating an entirely new image, by using prior knowledge and related concepts.

To help verify that the Dall-E model was able to correctly generate images, OpenAI also built the CLIP (Contrastive Language-Image Pre-training) model, which was trained on 400 million images paired with text captions. OpenAI used CLIP to help evaluate Dall-E's output by analyzing which caption is most suitable for a generated image.
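The caption-matching idea can be illustrated with a minimal sketch. It assumes that an image and each candidate caption have already been turned into embedding vectors, and it ranks captions by cosine similarity to the image. The three-dimensional vectors, the captions and the function names are made up for illustration; this is not OpenAI's code, and real CLIP embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return (caption, score) pairs sorted from best to worst match."""
    scored = [
        (caption, cosine_similarity(image_embedding, embedding))
        for caption, embedding in caption_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: in a real system these would come from
# an image encoder and a text encoder, not be typed in by hand.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "an armchair shaped like an avocado": [0.8, 0.2, 0.1],
    "a city skyline at night": [0.1, 0.9, 0.3],
    "a bowl of soup": [0.2, 0.3, 0.9],
}

ranking = rank_captions(image_embedding, caption_embeddings)
best_caption = ranking[0][0]
print(best_caption)
```

The caption whose embedding points in the direction closest to the image embedding ranks first, which is the sense in which a caption is "most suitable" for an image.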

The first iteration of Dall-E (Dall-E 1) generated images from text using a technology known as a Discrete Variational Autoencoder (dVAE), which was based in part on research conducted by Alphabet's DeepMind division on the Vector Quantized Variational Autoencoder.
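To give a feel for what "discrete" means here, the following sketch shows the vector quantization step in the style of a Vector Quantized Variational Autoencoder: each continuous feature vector is replaced by the index of its nearest entry in a codebook, so an image becomes a sequence of integer tokens. The tiny codebook and feature values are invented for illustration, and this is a simplified stand-in rather than the dVAE training procedure OpenAI used.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to the given vector."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )


def quantize(vectors, codebook):
    """Map each continuous vector to a discrete token (a codebook index)."""
    return [nearest_code(v, codebook) for v in vectors]


# Hypothetical 4-entry codebook and per-patch features from an encoder.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_features = [[0.1, 0.2], [0.9, 0.1], [0.8, 0.95], [0.2, 0.7]]

tokens = quantize(patch_features, codebook)
# A decoder would start from the codebook vectors the tokens point to.
reconstructed = [codebook[t] for t in tokens]
print(tokens)
```

Once an image is expressed as discrete tokens, a transformer can model image tokens and text tokens as one sequence.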

Dall-E 2 improved on the methods used for the first generation to create higher-quality and more photorealistic images. One of the key changes in Dall-E 2 is the use of a diffusion model that integrates data from the CLIP model to help generate a higher-quality image.
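As a rough illustration of what a diffusion model works with, the sketch below implements only the forward (noising) half of a generic diffusion process: an image is gradually blended with Gaussian noise according to a schedule. A trained model learns to reverse these steps, starting from noise and guided by the text prompt. The linear schedule, step count and function names are assumptions taken from common textbook formulations, not details of Dall-E 2 itself.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative fraction of the original signal kept at each step."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        # Linearly increasing noise amount per step (assumed schedule).
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Blend pixel values with Gaussian noise for one timestep."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(num_steps=1000)
pixels = [0.5, -0.25, 1.0, 0.0]  # toy "image" with four pixel values

slightly_noisy = add_noise(pixels, alpha_bars[0], rng)   # almost unchanged
almost_noise = add_noise(pixels, alpha_bars[-1], rng)    # nearly pure noise
print(alpha_bars[0], alpha_bars[-1])
```

By the final step almost none of the original signal remains, which is why generation can begin from random noise and denoise step by step toward an image.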

Dall-E use cases

As a generative AI technology, Dall-E has a wide range of possible use cases for individuals and organizations, including the following:

  • Creative inspiration. The technology can be used to help inspire a creative person to create something new. It can also be used as a supplement to an existing creative process.
  • Entertainment. Images created by Dall-E could potentially be used in books or games. Dall-E can go beyond the capabilities of traditional computer-generated imagery (CGI) in that the prompt system makes it easier to create graphics.
  • Education. Teachers and educators can use Dall-E to generate images to explain different concepts.
  • Advertising and marketing. The ability to create entirely unique and novel images can be useful for advertising and marketing.
  • Product design. A product designer can use Dall-E to visualize something new, just with the use of text, in an approach that can be significantly faster than using traditional computer-aided design (CAD) technologies.
  • Art. Dall-E can be used by anyone to create new art to be enjoyed and even displayed.
  • Fashion design. As a supplement to existing tools, Dall-E can potentially be useful to help fashion designers come up with new items.

What are the benefits of Dall-E?

Dall-E can provide numerous potential benefits including the following:

  • Speed. In a very short period of time, often less than a minute, Dall-E can produce an image from a simple text prompt.
  • Customization. Based on a text prompt, a user can create a highly customized image of nearly anything that can be imagined.
  • Accessibility. Since it just requires natural language text, Dall-E is relatively accessible to users and does not require any extensive training or specific programming skills.
  • Extensibility. Dall-E can help an individual to extend an existing image, by remixing it or allowing it to be re-imagined in a new way.
  • Iteration. New and existing images can be iterated quickly with Dall-E, allowing users to generate multiple iterations.

What are the limitations on Dall-E?

While Dall-E has plenty of benefits, the capabilities of the technology are not limitless. There are several limitations on Dall-E:

  • Copyright. The issue of copyright on images created by Dall-E, as well as whether it was trained on copyrighted images, remains a concern.
  • Legitimacy of generated art. Some also question the legitimacy and ethics of AI-generated art and whether it displaces human artists.
  • Data set. Even though Dall-E was trained using a large data set, far more images and descriptions exist than the model has seen. As such, a user prompt might not generate the intended image because the model lacks the foundational information.
  • Realism. Though Dall-E 2 has dramatically improved the image quality of the generated images, some images can still have a quality that doesn't make them look real enough for some users.
  • Context. To get the right image, a user must have a clearly defined prompt. If the prompt is too generic and lacks any context, the image generated by Dall-E may be inaccurate.

How much does Dall-E cost?

Dall-E can be used both by individuals and developers who can choose to embed the technology via an API into their own offerings.

For those using Dall-E directly on the OpenAI site, the company has built out a credit system to help meter usage. Currently, free credits are granted to early adopters of Dall-E who signed up before April 6, 2023. These free credits replenish on a monthly basis and expire a month after they are granted. A credit is consumed for each request that is made to generate or customize an image with Dall-E. New users can purchase credits. As of April 2023, 115 credits cost $15. Paid credits expire a year after purchase.

For developers using the API, OpenAI bills on a cost-per-image basis. The cost varies based on the image size. In April 2023, a 256x256 image cost $0.016, a 512x512 image cost $0.018 and a 1024x1024 image cost $0.020.
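The arithmetic behind these prices can be sketched as a small calculator. It uses only the April 2023 figures quoted above, which might no longer be current, and the function and constant names are made up for illustration. It does not call the OpenAI API.

```python
from decimal import Decimal

# April 2023 API prices per generated image, keyed by image size.
PRICE_PER_IMAGE = {
    "256x256": Decimal("0.016"),
    "512x512": Decimal("0.018"),
    "1024x1024": Decimal("0.020"),
}

# April 2023 credit pack for the OpenAI site: 115 credits for $15.
CREDIT_PACK_PRICE = Decimal("15")
CREDIT_PACK_SIZE = 115


def api_cost(size, image_count):
    """Total API cost in dollars for a number of images of one size."""
    if size not in PRICE_PER_IMAGE:
        raise ValueError(f"unsupported image size: {size}")
    return PRICE_PER_IMAGE[size] * image_count


def cost_per_credit():
    """Dollar price of one credit, rounded to four decimal places."""
    return (CREDIT_PACK_PRICE / CREDIT_PACK_SIZE).quantize(Decimal("0.0001"))


print(api_cost("1024x1024", 500))
print(cost_per_credit())
```

Using `Decimal` rather than floating-point numbers keeps the currency arithmetic exact.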

OpenAI also provides volume discounts via its enterprise sales organization. The most up-to-date pricing can be found on its pricing page.

Dall-E vs. Dall-E 2

Dall-E 2 represents an evolution of the original Dall-E engine, providing users with a series of enhanced capabilities.

Dall-E 1 was announced in January 2021, while Dall-E 2 came out in April 2022. With the original Dall-E, OpenAI used a dVAE to generate images. Dall-E 2 uses a diffusion model that can generate higher quality images. OpenAI claims that Dall-E 2 images can have four times the resolution of images created with Dall-E. Dall-E 2 also benefits from speed and image size capability improvement over its predecessor, enabling users to generate bigger images at a faster rate.

The ability to customize an image using different styles was also significantly expanded with the Dall-E 2 model. For example, a prompt can specify that an image be drawn as pixel art or as an oil painting. Dall-E 2 also introduced the concept of outpainting, which allows users to extend an original image beyond its existing borders.

This was last updated in April 2023
