What is Fréchet inception distance (FID)?

By George Lawton

Published: Nov 21, 2024

Fréchet inception distance (FID) is a metric for quantifying the realism and diversity of images generated by generative adversarial networks (GANs). Realistic means, for example, that generated images of people look like real images of people. Diverse means the generated images vary as much as the real images do, rather than repeating a few similar outputs.

FID is generally used for analyzing images and not text, sounds or other modalities. Other related metrics are being developed for these domains.

FID is used for assessing sets of images generated by GANs, the impact of neural network model changes on realism and the relative merits of multiple GAN models for generating images. Because it compares the statistics of a whole set of generated images with those of a set of real images, it captures visual quality and diversity within a single metric. A lower score indicates that the generated images are more like the real images. For example, a model that often produces people with extra fingers or eyes in the wrong place would typically receive a worse (higher) score than one that does not.

First introduced in 2017, FID is one of the most widely used automated measures for improving GANs for image generation. However, it has limitations that developers must consider. Furthermore, it does not seem to work as well for other generative AI techniques, such as diffusion models like Stable Diffusion, variational autoencoders or transformers.

Combining Fréchet distance and inception

The name Fréchet inception distance combines two terms: the Fréchet distance and Google's Inception model.

[Figure: Generative adversarial network training method]

The Fréchet distance quantifies the similarity of two curves. Introduced in 1906 by Maurice Fréchet, it is often explained as the shortest leash that lets a person walk a dog when the person and the dog each follow their own curved path, moving forward at any speed but never backtracking. The same calculation is useful for problems in handwriting recognition, robotics, geographic information systems and protein structure analysis. FID applies a related version of the idea to two probability distributions of image features rather than to two curves.
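To make the leash intuition concrete, here is a minimal Python sketch of the discrete Fréchet distance, a common dynamic-programming approximation that only considers the sampled points of each curve. The function name `discrete_frechet` and the sample paths are illustrative, not part of FID or any particular library. FID itself does not use this curve version; it uses a closed-form Fréchet distance between Gaussian distributions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Fréchet distance between two polylines given as point lists."""
    if not p or not q:
        raise ValueError("both curves need at least one point")
    n, m = len(p), len(q)
    # ca[i][j] = shortest leash that lets the walker reach p[i] and the dog reach q[j]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # leash length needed at this pair of positions
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)  # only the dog can have moved
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)  # only the walker can have moved
            else:
                # walker, dog or both advance one step; nobody moves backward
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]

# Hypothetical paths: the dog walks a parallel line one unit away from the walker
walker = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
dog = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(walker, dog))  # 1.0
```

Because the two paths stay exactly one unit apart when walked in step, a leash of length 1 is enough, and the sketch returns 1.0.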

The Inception-v3 model used in FID belongs to a family of convolutional neural network architectures that Google introduced in 2014 with GoogLeNet, first described in a research paper titled "Going deeper with convolutions." Inception-v3 followed in a 2015 paper. These models transform raw imagery into a latent space that represents the mathematical properties of images at multiple scales and at different locations within the image. For example, this could help align images of a cat in a latent space used for analysis, whether the image is zoomed in on the face or paw or whether the cat is located at the top or the bottom of the image.
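The location-independence idea in the cat example can be illustrated with a toy Python and NumPy sketch. It is not Inception-v3: it assumes a single hand-picked 3x3 filter and a hypothetical pattern. It shows how sliding the same filter over the whole image and then averaging its responses (global average pooling, which Inception-v3 applies before its classifier) yields the same feature value whether the pattern appears near the top or the bottom of the image. Real networks are only approximately location-independent, because of padding, strides and image borders.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'valid' 2D cross-correlation, the operation a CNN layer applies."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pooled_response(image: np.ndarray, kernel: np.ndarray) -> float:
    """ReLU, then global average pooling: one location-free number per filter."""
    feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)
    return float(feature_map.mean())

# Hypothetical 3x3 pattern and a filter that responds to it
pattern = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
kernel = pattern.copy()

top = np.zeros((12, 12))
top[2:5, 4:7] = pattern       # pattern near the top, away from the border
bottom = np.zeros((12, 12))
bottom[7:10, 4:7] = pattern   # same pattern near the bottom

print(np.array_equal(top, bottom))                                      # False: the pixels differ
print(pooled_response(top, kernel) == pooled_response(bottom, kernel))  # True: the pooled feature matches
```

The raw pixel arrays differ, but the pooled feature does not, which is the property that lets features of a cat at the top and a cat at the bottom land near each other.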

The original inception models were developed for the 2014 ImageNet Large-Scale Visual Recognition Challenge. Each inception module runs convolutions of several sizes side by side, so a single layer captures both local detail and broader context, and it uses 1x1 convolutions to reduce the number of channels before the more expensive operations. This lets deep neural networks be trained with less computation. Google explored variations, including Inception-v2, Inception-v3, Inception-v4 and Inception-ResNet.
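The computational savings from 1x1 convolutions can be checked with simple arithmetic. The Python sketch below counts the weights of a 5x5 convolution applied directly versus one applied after a 1x1 "bottleneck" that first shrinks the channel count. The channel numbers are illustrative assumptions, not a description of a specific Inception-v3 layer.

```python
def conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Number of weights in a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

# Hypothetical channel counts: input channels, bottleneck width, output channels
C_IN, C_MID, C_OUT = 192, 16, 32

# 5x5 convolution applied directly to all input channels
direct = conv_weights(5, C_IN, C_OUT)
# 1x1 convolution down to C_MID channels, then the 5x5 convolution
bottleneck = conv_weights(1, C_IN, C_MID) + conv_weights(5, C_MID, C_OUT)

print(direct, bottleneck, round(direct / bottleneck, 1))  # 153600 15872 9.7
```

With these numbers the bottleneck path needs roughly a tenth of the weights, and the multiply-add count per output pixel shrinks by the same factor.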

These inception models are also used to extract features for other computer vision tasks, such as object detection. Although newer models exist, Inception-v3 remains the standard feature extractor for FID, which keeps scores comparable across studies.

Fréchet inception distance vs. inception score

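Ian Goodfellow and a research team at the University of Montreal first introduced GANs in a 2014 paper. In a GAN, a generator network competes with a discriminator network, and that competition drives improvements in image quality. Measuring that quality automatically, however, remained a challenge.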

In 2016, Goodfellow worked with researchers at OpenAI to improve GAN training and introduced the inception score, a metric for evaluating the quality and diversity of generated images. The score runs each generated image through the Inception model to obtain a probability distribution over image classes. It then calculates the Kullback-Leibler (KL) divergence, which measures how much one probability distribution differs from another, between each image's class distribution and the average class distribution across all generated images. Confident predictions for individual images suggest recognizable, high-quality images, while a wide spread of predicted classes across the set suggests diversity.
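The inception score is commonly written as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the classifier's predicted class distribution for image x and p(y) is its average over all generated images. The minimal NumPy sketch below assumes the softmax outputs have already been computed; running the Inception model itself is not shown, and the function name is illustrative. Reference implementations typically also average the score over several splits of the data, which this sketch omits.

```python
import numpy as np

def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS = exp(mean over images of KL(p(y|x) || p(y))).

    probs: (N, C) array; row i is the classifier's softmax output for image i.
    eps avoids log(0) for classes with zero probability.
    """
    probs = np.asarray(probs, dtype=np.float64)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    # KL divergence of each image's distribution from the marginal
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Hypothetical softmax outputs for 4 images over 4 classes
confident_and_varied = np.eye(4)                   # each image clearly a different class
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))  # every image looks like the same class
print(inception_score(confident_and_varied))  # close to 4, the number of classes
print(inception_score(collapsed))             # 1.0
```

Confident, evenly spread predictions reach the maximum score, equal to the number of classes, while a set in which every image is classified the same way scores 1. Note that no real images enter the calculation.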

However, the inception score has limitations: it does not always agree with human judgment, and it can be affected by image size. It also never compares generated images with real ones. Consequently, researchers continued to explore better techniques for assessing GAN image quality.

A research team at Johannes Kepler University Linz introduced FID in a 2017 paper. The paper explored a better way of training the GAN generator and discriminator networks by updating them on different time scales, using separate learning rates. The researchers reported that FID captures the similarity of generated images to real ones better than the inception score does. Since then, FID has remained the most popular approach for assessing GAN image quality.

How is the FID measured?

FID is measured by comparing how real and generated images are represented in an intermediate latent space. In that space, images are described by low-level features, such as edges and lines, and higher-order features, such as the shapes of eyes or paws. FID is calculated using the following steps:

Preprocess the images. Make the real and generated images compatible using basic processing. This typically includes resizing them to the input size Inception-v3 expects, 299x299 pixels, and normalizing pixel values.
Extract feature representations. Pass the real and generated images through the Inception-v3 model with its classification layer removed. This transforms the raw pixels into numerical vectors, typically 2,048 values per image from the final pooling layer, that represent aspects of the images, such as lines, edges and higher-order shapes.
Calculate statistics. Compute the mean vector and covariance matrix of the features across each set of images: one pair for the real images and one for the generated images.
Compute the Fréchet distance. Treat each set of features as a multivariate Gaussian distribution and compute the Fréchet distance between the two distributions from their means and covariance matrices.
Obtain the FID. The resulting distance is the FID. Lower numbers indicate that the generated images are statistically more similar to the real images.
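The calculation steps above can be sketched in Python with NumPy, assuming the Inception-v3 features have already been extracted into arrays of shape (number of images, feature dimension), for example 2,048 values per image. For two Gaussians with means mu_r and mu_g and covariances Sigma_r and Sigma_g, the Fréchet distance has the closed form

FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2))

The function names are hypothetical, and feature extraction is not shown. Widely used implementations compute the matrix square root with a general routine such as SciPy's sqrtm; this sketch uses an equivalent eigenvalue-based trace to stay NumPy-only, so published scores should still come from an established reference implementation.

```python
import numpy as np

def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by rounding
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = np.asarray(mu1) - np.asarray(mu2)
    s1_half = _sqrt_psd(sigma1)
    # Tr((S1 S2)^(1/2)) equals the trace of the square root of the symmetric
    # matrix S1^(1/2) S2 S1^(1/2), because both products share eigenvalues.
    middle = s1_half @ sigma2 @ s1_half
    middle = (middle + middle.T) / 2.0  # remove rounding asymmetry
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)

def fid_from_features(real_feats, gen_feats) -> float:
    """Calculate the statistics of each feature set, then their Fréchet distance."""
    real = np.asarray(real_feats, dtype=np.float64)
    gen = np.asarray(gen_feats, dtype=np.float64)
    mu_r, mu_g = real.mean(axis=0), gen.mean(axis=0)
    # atleast_2d keeps the single-feature case as a 1x1 matrix
    sigma_r = np.atleast_2d(np.cov(real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(gen, rowvar=False))
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)

# Hypothetical stand-in features: random vectors instead of Inception-v3 outputs
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))
print(fid_from_features(real, real))        # approximately 0
print(fid_from_features(real, real + 1.0))  # approximately 8 (mean shift of 1 in 8 dimensions)
```

Identical feature sets score near 0, and shifting the generated features away from the real ones raises the score, matching the "lower is better" reading of FID.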

What is FID used for?

The primary use of FID is to evaluate the quality of images generated by GAN models. It provides a simple metric for assessing a batch of generated images or tuning the models used to generate them. Uses of FID include the following:

GAN evaluation. FID provides a metric for assessing how well a particular GAN model is performing in terms of generating realistic and diverse images. This can help compare different models or compare the performance of a model during training.
Model selection. FID can help compare the performance of GAN model variations or architectures.
Tuning hyperparameters. FID can assess the impact of changing hyperparameters on GAN model performance to guide adjustments toward better configurations.
Novelty detection. FID can help flag sets of images whose feature statistics differ sharply from a reference set, which could indicate novel examples.
Research. FID gives researchers a simple, standardized way to compare the merits of different GAN models.

What are the limitations of FID?

FID is widely used to evaluate the quality of images generated by GANs. However, it is not designed for other types of media, such as music or text, and it may be less suitable for other kinds of neural network architectures. In addition, several other limitations should be taken into consideration:

Use of pre-trained models. FID uses an Inception-v3 model pre-trained on ImageNet, which could introduce bias based on how the model was trained. This could be an issue if the training data differs substantially from the domain of the generated images. For example, a model whose training data is dominated by cats and other animals may not represent buildings as well.
Insensitivity. FID may miss some aspects of image quality, such as fine-grained details or textures. As a result, certain kinds of image imperfections might not be caught by FID scores.
Requirement for consistent preprocessing. All images, both real and generated, need to be scaled, cropped and normalized in the same way. Differences in preprocessing can change FID scores.
Subjectivity. FID scores do not necessarily capture all aspects of human perception and preferences. It's important to include human evaluators as part of the process to refine GAN models.
Overfitting. Exclusive focus on FID could lead to models that achieve good (low) scores but do not look realistic. Consequently, developers need to perform human analysis to weed out problematic models.

The future of FID

FID continues to be a popular metric for assessing the performance of GAN image-generating models. At the same time, other kinds of generative models, such as diffusion models like Stable Diffusion and transformers, are growing in popularity for image generation. These techniques may require new types of metrics for assessing image quality.

Meanwhile, researchers will likely refine FID to make it more robust. For example, replacing Inception-v3 with newer feature extractors, or with models fine-tuned on data from a specific domain, could help reduce bias.

Other evaluation metrics may also emerge that represent qualities such as diversity and realism separately, rather than rolling them into a single number as FID does.

For now, FID provides a relatively simple way to capture the quality of GAN models in a single metric. Like other metrics, it can guide research until better ones come along.
