
generative adversarial network (GAN)

What is a generative adversarial network (GAN)?

A generative adversarial network (GAN) is a machine learning (ML) model in which two neural networks compete with each other, using deep learning methods, to become more accurate in their predictions. GANs typically run unsupervised and learn through an adversarial zero-sum game, in which one network's gain equals the other network's loss.

The two neural networks that make up a GAN are referred to as the generator and the discriminator. In image applications, the generator is typically a deconvolutional (transposed-convolution) neural network and the discriminator is a convolutional neural network. The goal of the generator is to artificially manufacture outputs that could easily be mistaken for real data. The goal of the discriminator is to identify which of the outputs it receives have been artificially created.

Essentially, the generator creates part of the data that the discriminator trains on. While the generator is trained to produce false data, the discriminator network is taught to distinguish between the generator's manufactured data and true examples. If the discriminator readily recognizes the fake data that the generator produces -- such as an image that isn't a human face -- the generator suffers a penalty. As the feedback loop between the adversarial networks continues, the generator begins to produce higher-quality and more believable output, and the discriminator becomes better at flagging data that has been artificially created. For instance, a generative adversarial network can be trained to create realistic-looking images of human faces that don't belong to any real person.
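
The competition between the two networks is commonly written as a minimax objective. The notation below follows the standard GAN formulation and is not taken from this article: D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for random noise z, p_data is the distribution of real data and p_z is the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make this value large by scoring real data near 1 and generated data near 0, while the generator tries to make it small by pushing D(G(z)) toward 1.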

How GANs work

The name GAN can be broken down into the following three parts:

  • Generative. This describes how data is generated in terms of a probabilistic model.
  • Adversarial. A model is trained in an adversarial setting.
  • Networks. Deep neural networks serve as the artificial intelligence (AI) algorithms that are trained.

The first step in establishing a GAN is to identify the desired end output and gather an initial training data set based on those parameters. The generator is then fed random input and trained until it acquires basic accuracy in producing outputs.

Next, the generated samples or images are fed into the discriminator along with actual data points from the original data set. After the generator and discriminator models have processed the data, optimization with backpropagation starts. The discriminator filters through the information and returns a probability between 0 and 1 to represent each image's authenticity -- 1 correlates with real images and 0 correlates with fake. These values are then checked for success, and the process is repeated until the desired outcome is reached.
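
To make the scoring step concrete, the following is a minimal sketch of how the discriminator's probabilities could be turned into losses for backpropagation. It assumes binary cross-entropy and the commonly used "non-saturating" generator loss, neither of which is specified in this article, and the function names and sample probabilities are hypothetical.

```python
import math

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one prediction; label is 1 (real) or 0 (fake)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(probs_real, probs_fake):
    """Average loss with target 1 for real samples and target 0 for generated ones."""
    losses = [bce(p, 1) for p in probs_real] + [bce(p, 0) for p in probs_fake]
    return sum(losses) / len(losses)

def generator_loss(probs_fake):
    """Non-saturating loss: the generator wants its fakes scored as real (target 1)."""
    return sum(bce(p, 1) for p in probs_fake) / len(probs_fake)

# Hypothetical discriminator outputs for a batch of real and generated images.
probs_real = [0.9, 0.8, 0.95]
probs_fake = [0.1, 0.3, 0.2]
print("discriminator loss:", discriminator_loss(probs_real, probs_fake))
print("generator loss:", generator_loss(probs_fake))
```

In this example the discriminator is doing well, so its loss is low while the generator's loss is high. This is the penalty that pushes the generator to improve.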

A GAN typically takes the following steps:

  1. The generator accepts random numbers as input and outputs an image.
  2. The discriminator receives this generated image in addition to a stream of images from the real, ground-truth data set.
  3. The discriminator takes in both real and fake images and outputs probabilities -- a value between 0 and 1 -- where 1 indicates a prediction of authenticity and 0 indicates a fake.

This creates a double feedback loop where the discriminator is in a feedback loop with the ground truth of the images and the generator is in a feedback loop with the discriminator.
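
The double feedback loop can be sketched with a deliberately tiny example that uses single numbers instead of images. Everything here is an illustrative assumption rather than a method described in this article: the generator is the line G(z) = a*z + b, the discriminator is the logistic function D(x) = sigmoid(w*x + c), the "real" data comes from a normal distribution centered on 4, and the gradients are derived by hand for the losses -log D(x_real) - log(1 - D(x_fake)) and -log D(x_fake).

```python
import math
import random

def sigmoid(s):
    s = max(-60.0, min(60.0, s))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-s))

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 1.0)  # one sample from the ground-truth data
        z = rng.gauss(0.0, 1.0)       # random numbers fed to the generator
        x_fake = a * z + b            # step 1: the generator produces a sample

        # Steps 2 and 3: the discriminator scores the real and the fake sample.
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)

        # Feedback loop 1: discriminator versus ground truth.
        # Gradient step on -log(d_real) - log(1 - d_fake).
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Feedback loop 2: generator versus discriminator.
        # Gradient step on -log(D(G(z))) using the updated discriminator.
        d_fake = sigmoid(w * x_fake + c)
        a -= lr * (d_fake - 1.0) * w * z
        b -= lr * (d_fake - 1.0) * w
    return a, b, w, c

a, b, w, c = train_toy_gan()
print("generator parameters:", a, b)
print("discriminator parameters:", w, c)
```

This sketch is not tuned to converge. GAN training often oscillates, and a discriminator this simple can do little more than compare averages, so the result should be read as an illustration of the update order rather than a working generative model.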

[Figure: How a GAN works.]

Types of GANs

GANs come in a variety of forms and can be used for various tasks. The following are the most common GAN types:

  • Vanilla GAN. This is the simplest of all GANs. It consists of a generator and a discriminator, both implemented as straightforward multi-layer perceptrons, and its algorithm tries to optimize the underlying mathematical objective using stochastic gradient descent, a method that updates the model from one example or a small batch of examples at a time rather than from the entire data set at once. The discriminator seeks to determine the likelihood that its input is real, while the generator tries to capture the distribution of the data.
  • Conditional GAN. By applying class labels, this kind of GAN enables the conditioning of the network with new and specific information. As a result, during GAN training, the network receives the images with their actual labels, such as "rose," "sunflower" or "tulip" to help it learn how to distinguish between them.
  • Deep convolutional GAN. This GAN uses deep convolutional neural networks in both the generator and the discriminator to generate higher-resolution images. Convolutions are a technique for drawing out important information from the data. They function particularly well with images, enabling the network to quickly absorb the essential details.
  • CycleGAN. This is a widely used GAN architecture for learning how to transform between images of various styles. For instance, a network can be taught how to alter an image from winter to summer or to turn an image of a horse into a zebra. One frequently cited application of this kind of image translation is FaceApp, which alters human faces to appear in different age groups.
  • StyleGAN. Researchers from Nvidia released StyleGAN in December 2018 and proposed significant improvements to the original generator architecture. StyleGAN can produce photorealistic, high-quality photos of faces, and users can modify the model to alter the appearance of the images that are produced.
  • Super resolution GAN. With this type of GAN, a low-resolution image can be changed into a more detailed one. Super-resolution GANs increase the image resolution by filling in blurry spots.
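
As an illustration of the conditional GAN described in the list above, the sketch below shows one common way to condition both networks on a class label: encoding the label as a one-hot vector and appending it to each network's input. The article does not specify a mechanism, so the concatenation approach, the function names and the noise size are assumptions.

```python
import random

LABELS = ["rose", "sunflower", "tulip"]

def one_hot(label, labels=LABELS):
    """Encode a class label as a list with a single 1.0 at the label's position."""
    return [1.0 if name == label else 0.0 for name in labels]

def generator_input(label, noise_dim=4, rng=random):
    """Random noise followed by the label, so the generator knows what to draw."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)

def discriminator_input(image_pixels, label):
    """Image values followed by the label the image is supposed to match."""
    return list(image_pixels) + one_hot(label)

print(one_hot("tulip"))
print(len(generator_input("rose")))
print(discriminator_input([0.2, 0.5], "sunflower"))
```

Because the discriminator also sees the label, it can reject an image that looks realistic but does not match the requested class, which is what teaches the generator to respect the label.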

Popular use cases for GANs

GANs are becoming a popular ML model for online retail sales because of their ability to understand and recreate visual content with increasingly remarkable accuracy. They can be used for a variety of tasks, including anomaly detection, data augmentation, picture synthesis, and text-to-image and image-to-image translation.

Common use cases of GANs include the following:

  • Filling in images from an outline.
  • Generating a realistic image from text.
  • Producing photorealistic depictions of product prototypes.
  • Converting black and white imagery into color.
  • Translating sketches or semantic images into photos, which is especially useful in the healthcare industry for diagnoses.

In video production, GANs can be used to perform the following:

  • Model patterns of human behavior and movement within a frame.
  • Predict subsequent video frames.
  • Create a deepfake.

Other use cases of GANs include text-to-speech for the generation of realistic speech sounds. Furthermore, GAN-based generative AI models can generate text for blogs, articles and product descriptions. These AI-generated texts can be used for a variety of purposes, including advertising, social media content, research and communication.

GAN examples

GANs are used to generate a wide range of data types, including images, music and text. The following are popular real-world examples of GAN:

  • Generating human faces. GANs can produce accurate representations of human faces. For example, StyleGAN2 from Nvidia can produce excellent, photorealistic images of people that don't exist. These pictures are so lifelike that many people believe they're actual individuals.
  • Developing new fashion designs. GANs can be used to create new fashion designs that reflect existing ones. For instance, clothing retailer H&M used GANs to create new apparel designs for its merchandise.
  • Creating realistic animal images. GANs can also generate realistic images of animals. For example, BigGAN, a GAN model developed by Google researchers, can produce high-quality images of animals such as birds and dogs.
  • Video game character creation. GANs can be used to create new characters for video games. For example, Nvidia created new characters using GANs for the well-known video game Final Fantasy XV.
  • Generating realistic three-dimensional (3D) objects. GANs are also capable of producing realistic 3D objects. For example, researchers at Massachusetts Institute of Technology have used GANs to create 3D models of chairs and other furniture that appear to have been designed by people. These models can be applied to architectural visualization or video games.


This was last updated in March 2023
