What is image-to-image translation?

By Nick Barney, Technology Writer

Published: Nov 21, 2024

Image-to-image translation is a generative artificial intelligence (AI) technique that translates a source image into a target image while preserving certain visual properties of the original image. It uses machine learning and deep learning models such as generative adversarial networks (GANs); conditional GANs, or cGANs; and convolutional neural networks (CNNs) to learn complex mapping functions between input and output images.

Image-to-image translation allows images to be converted from one form to another while retaining essential features. The goal is to learn a mapping between a source domain and a target domain and then generate realistic images in the style a designer chooses. This approach enables tasks such as style transfer, colorization and super-resolution, a technique that improves the resolution of an image.

Image-to-image translation has a diverse set of applications in art, image enhancement, data augmentation and computer vision, a field closely related to machine vision. For instance, it can change a daytime photo to a nighttime one, convert a satellite image into a map and enhance medical images to support more accurate diagnoses.

Figure: Example of image-to-image translation changing the background of a stock photo. Image-to-image translation makes changes to an image while maintaining its original properties, though the results aren't always perfect.

How does image-to-image translation work?

Image processing systems that use image-to-image translation typically follow these basic steps:

Define image domains. The process begins by defining the image domains, which represent the types of input and output images the system will handle, such as daytime and nighttime photos or satellite images and maps. The choice of domains depends on the task, which can include style transfer, super-resolution and semantic segmentation.
Train the system. In the supervised case, a data set containing paired examples of input and target images -- sometimes called ground truth target images -- is used to train the system so that it can learn the required mapping between the two domains.
Combine the generator and discriminator. In a GAN, a generator network and a discriminator network are trained together. The generator network takes an input image from the source domain and generates an output image that belongs to the target domain. Meanwhile, the discriminator network learns to distinguish real images in the target domain from synthesized images produced by the generator. A loss function measures the difference between the generated output and the ground truth target image.
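
The generator and discriminator objectives in the last step can be made concrete with a small sketch. The following is a minimal illustration in plain Python, not a training loop: images are flat lists of pixel values between 0 and 1, the discriminator's output is assumed to be a probability that an image is real, and the function names and the `l1_weight` value are illustrative assumptions rather than part of any particular library.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy for one discriminator output `prob` and a 0/1 label."""
    eps = 1e-12  # keep log() away from zero
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def l1_distance(generated, target):
    """Mean absolute difference between two equally sized flat pixel lists."""
    if len(generated) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)


def discriminator_loss(prob_real, prob_fake):
    """The discriminator wants real images scored 1 and generated images scored 0."""
    return binary_cross_entropy(prob_real, 1) + binary_cross_entropy(prob_fake, 0)


def generator_loss(prob_fake, generated, target, l1_weight=100.0):
    """The generator wants its output scored as real and close to the ground truth."""
    adversarial = binary_cross_entropy(prob_fake, 1)
    return adversarial + l1_weight * l1_distance(generated, target)


# Toy 4-pixel images; the probabilities stand in for discriminator outputs.
target = [0.0, 0.5, 1.0, 0.5]
generated = [0.1, 0.5, 0.8, 0.5]
print("discriminator loss:", discriminator_loss(prob_real=0.9, prob_fake=0.2))
print("generator loss:", generator_loss(0.2, generated, target))
```

In this sketch the two losses pull in opposite directions on `prob_fake`: the discriminator is rewarded for scoring generated images low, while the generator is rewarded for pushing that score up and for staying close to the paired target.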

A critical aspect of image-to-image translation is ensuring the model generalizes well to previously unseen images. In unsupervised settings, a cycle consistency constraint encourages an image that is translated from one domain to another and then back to return to its original form. Deep learning architectures, such as U-Net and other CNNs, are also commonly used because they can capture complex spatial relationships in images. During training, batch normalization and optimization algorithms are used to stabilize and speed up convergence.
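
Batch normalization, mentioned above as a way to stabilize training, can be sketched for a single feature across a batch. This is a simplified, assumed version in plain Python: deep learning frameworks typically apply it per channel and learn `gamma` and `beta` during training, and the function name here is hypothetical.

```python
import math


def batch_normalize(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch to roughly zero mean and unit variance.

    `gamma` and `beta` rescale and shift the result; `eps` avoids division by zero.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(variance + eps)
    return [gamma * (v - mean) / scale + beta for v in values]


# Activations for one feature across a batch of four images.
activations = [1.0, 2.0, 3.0, 4.0]
print(batch_normalize(activations))
```

Keeping activations in a consistent range from batch to batch is what lets the optimizer take steadier steps.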

Supervised vs. unsupervised image-to-image translation

The two main approaches to image-to-image translation are supervised and unsupervised learning.

Supervised learning

Supervised methods rely on paired training data, where each input image has a corresponding target image. Using this approach, the system learns the direct mapping between the two domains. However, obtaining paired data can be challenging and time-consuming, especially when dealing with complex image transformations.
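
As a rough illustration of what "paired" means in practice, the sketch below matches input and target files that share a file name. The directory layout, file names and the `pair_by_stem` helper are all hypothetical; real data sets organize their pairs in different ways.

```python
from pathlib import PurePosixPath


def pair_by_stem(input_paths, target_paths):
    """Match input and target files that share a file name stem.

    Returns (pairs, unmatched), where unmatched lists inputs with no target.
    """
    targets = {PurePosixPath(p).stem: p for p in target_paths}
    pairs, unmatched = [], []
    for path in input_paths:
        stem = PurePosixPath(path).stem
        if stem in targets:
            pairs.append((path, targets[stem]))
        else:
            unmatched.append(path)
    return pairs, unmatched


# Hypothetical satellite-to-map data set with one missing target.
inputs = ["satellite/0001.png", "satellite/0002.png", "satellite/0003.png"]
targets = ["map/0001.png", "map/0003.png"]
pairs, unmatched = pair_by_stem(inputs, targets)
print(pairs)
print(unmatched)
```

Inputs that end up in `unmatched` cannot be used for supervised training, which is one reason collecting paired data is costly.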

Unsupervised learning

Unsupervised methods tackle the image-to-image translation problem without paired training examples. One prominent unsupervised approach is CycleGAN, which introduces the concept of cycle consistency. This involves two mappings: from the source domain to the target domain and vice versa. CycleGAN trains both mappings so that an image translated to the target domain and back again closely matches the original source image.
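
Cycle consistency can be sketched with two toy mappings standing in for the learned networks. In the example below, `brighten` and `darken` are hypothetical placeholders for the source-to-target and target-to-source generators, images are flat lists of pixel values between 0 and 1, and the loss is a simple mean absolute difference chosen for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(source_img, target_img, forward, backward):
    """Penalize round trips that fail to return to the starting image."""
    source_cycled = backward(forward(source_img))  # source -> target -> source
    target_cycled = forward(backward(target_img))  # target -> source -> target
    return mean_abs_diff(source_cycled, source_img) + mean_abs_diff(target_cycled, target_img)


def clamp(value):
    return min(max(value, 0.0), 1.0)


def brighten(img):
    """Toy stand-in for a night-to-day generator."""
    return [clamp(p + 0.25) for p in img]


def darken(img):
    """Toy stand-in for a day-to-night generator."""
    return [clamp(p - 0.25) for p in img]


night = [0.0, 0.25, 0.5]
day = [0.5, 0.75, 1.0]
print(cycle_consistency_loss(night, day, brighten, darken))
```

Because `darken` exactly undoes `brighten` on these values, the round trips reproduce the starting images and the loss is zero; a backward mapping that did not invert the forward one would produce a positive loss.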

AI models for image translation

Image-to-image translation and generative AI in general are touted for being cost-effective, but they're also criticized for lacking creativity. It's essential to research the various AI models that have been developed to handle image-to-image translation tasks, as each comes with its own unique benefits and drawbacks. Research groups such as Gartner also urge users and generative AI developers to look for trust and transparency when choosing and designing models.

Some of the most popular models include the following:

StarGAN. This is a scalable, single-model image translation approach, designed to perform image translation for multiple domains. Unlike traditional methods that require building separate models for each pair of image domains, StarGAN consolidates the translation process into a unified framework. This model introduces a novel architecture that can effectively learn mappings between different image domains, enabling versatile and efficient image translation.
CycleGAN. This is an unsupervised image-to-image translation model that has gained significant attention in the research community. It addresses the challenge of training on unpaired images by using the concept of cycle consistency. By incorporating a cycle consistency loss, which encourages the translated image to map back to the original source image, CycleGAN can perform a variety of image transformations without the need for paired examples.
Pix2Pix GAN. This GAN is a conditional generative model that learns a mapping from an input image, optionally combined with a noise vector, to an output image, rather than generating images from random noise alone. This conditional approach enables more controlled and precise translations. The generator uses a U-Net architecture, which connects an encoder and a decoder with skip connections so that fine, pixel-level detail from the input carries over to the output and enables high-quality image generation.
Unsupervised image-to-image translation (UNIT). The UNIT model focuses on unsupervised image translation and aims to learn mappings between different image domains without paired training data. UNIT uses an autoencoder-like architecture that combines variational autoencoders with GANs and assumes the two domains share a latent space, and its training objective encourages the preservation of content representations during translation. This approach enables the model to generate visually appealing and semantically consistent images across different domains.
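
To illustrate how a single model such as StarGAN can serve multiple domains, the sketch below attaches a one-hot target-domain label to every pixel of an input image as extra channels, so that one generator can be told which domain to translate into. This is a simplified illustration in plain Python, and the `attach_domain_label` helper and the nested-list image layout are assumptions made for the example.

```python
def attach_domain_label(image, domain_index, num_domains):
    """Append a one-hot domain label to every pixel as extra channels.

    `image` is a list of rows; each row is a list of pixels; each pixel is a
    list of channel values. The input image is left unmodified.
    """
    if not 0 <= domain_index < num_domains:
        raise ValueError("domain_index out of range")
    one_hot = [1.0 if i == domain_index else 0.0 for i in range(num_domains)]
    return [[pixel + one_hot for pixel in row] for row in image]


# A 1x2 RGB image conditioned on domain 1 out of 3 possible target domains.
image = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
conditioned = attach_domain_label(image, domain_index=1, num_domains=3)
print(conditioned[0][0])
```

Changing `domain_index` changes only the label channels, which is how one set of generator weights can be reused across many domain pairs instead of training a separate model for each.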

