variational autoencoder (VAE)
What is a variational autoencoder (VAE)?
A variational autoencoder (VAE) is a generative AI algorithm that uses deep learning to generate new content, detect anomalies and remove noise.
VAEs first appeared in 2013, around the same time as other generative AI algorithms such as generative adversarial networks (GANs) and diffusion models, but earlier than large language models such as BERT, the GPT family and the Pathways Language Model (PaLM).
VAEs are well suited to generating synthetic time series data for training other AI algorithms. They are widely used in signal analysis to interpret IoT data feeds, biological signals such as EEG, and financial data feeds.
VAEs can also generate text, images and video. For these kinds of content, however, they more often complement other models, such as GANs, transformers and latent diffusion models like Stable Diffusion.
Like GANs, VAEs combine two neural networks, but the two networks play different roles. In a VAE, the encoder learns better ways of encoding raw data into a latent space, while the decoder learns better ways of transforming these latent representations into new content. In a GAN, one network (the generator) learns to produce increasingly convincing fake content, while the other (the discriminator) learns to detect fake content.
History of autoencoders
Autoencoders trace their history back to the 1980s and research into improving neural networks. The most popular neural networks at the time, perceptrons and multilayer perceptrons, used a supervised learning approach, which required labeled training data.
In the early 1990s, researchers began to explore ways to train neural networks using unlabeled data. This streamlined the development of certain applications and enabled new use cases. One line of research focused on pairing neural networks to encode and decode data more efficiently. These networks became known as autoencoders because they learn to encode their own input: the input doubles as the training target, so no labels are required.
The simplest autoencoder consists of two networks trained together. An encoder maps input data into a compressed latent representation, and a decoder reconstructs the original data from that latent representation. These early networks could compress data and reduce noise in it.
In the early 2000s, researchers began exploring autoencoders with many hidden neurons, sometimes more than the input itself has. These networks were constrained so that only a small subset of neurons is active for any given input. Researchers called them sparse autoencoders because each representation of the data uses only a sparse subset of neurons. The sparsity constraint helped reduce overfitting, which otherwise limits how well a network adapts to new data. Sparse autoencoders also improved interpretability, because individual active neurons tend to correspond to distinct features in the underlying data.
Starting around 2010, researchers applied deep learning approaches to build autoencoders with multiple hidden layers, which can learn more complex representations of the data. Further research produced specialized variants. Denoising autoencoders remove noise, and contractive autoencoders improve robustness and generalization.
In 2013, Diederik P. Kingma and Max Welling introduced VAEs in a paper called "Auto-Encoding Variational Bayes." Their key innovation was to apply variational inference to autoencoders. Instead of mapping each input to a single point, the encoder learns a probability distribution over latent representations. A reparameterization trick then makes sampling from that distribution compatible with gradient-based training. The original paper showed how the technique could generate realistic-looking faces and handwritten digits. Researchers subsequently developed various refinements to improve the performance of VAEs.
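For readers who want the math behind "variational inference" here, the following uses the notation commonly adopted for VAEs. These symbols are not defined elsewhere in this article:

- an encoder q_φ(z|x),
- a decoder p_θ(x|z),
- a prior p(z).

The training objective is the evidence lower bound (ELBO) on the log-likelihood of the data. The reparameterization trick writes each latent sample as a deterministic function of the encoder's outputs plus independent noise ε.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
  \le \log p_\theta(x),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x) \odot \varepsilon,\quad \varepsilon \sim \mathcal{N}(0, I)
```

The first term rewards accurate reconstruction. The second keeps each input's latent distribution close to the prior.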
Autoencoders vs. variational autoencoders
Autoencoders are an older neural network architecture that excels at automatically learning more efficient representations of raw data for various machine learning and AI applications. Plain, or vanilla, autoencoders are useful for building codecs that compress data and for detecting anomalies. However, they are designed only to store and reconstruct the original data more efficiently, not to generate new content.
The key innovation of VAEs was a probabilistic model that can generate new content similar to, yet different from, the original content. In a VAE, the intermediate (latent) layer represents each input as a probability distribution rather than a single point. Training encourages nearby points in this latent space to decode to similar outputs, so the latent space is smooth and continuous. As a result, sampling a new point, or moving between two encoded points, produces plausible variations, such as faces or handwritten digits with smoothly varying features.
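Generating new content from a trained VAE comes down to decoding points from the latent space. The following Python sketch makes a few assumptions:

- The latent space is 2-D, and the prior is a standard normal distribution.
- `decode` is a hypothetical function. The linear map inside it is only a placeholder for a trained decoder network.

It shows two common uses: sampling fresh latent vectors from the prior, and interpolating between two latent codes.

```python
import random


def decode(z):
    """Hypothetical decoder: maps a 2-D latent vector to a 3-D output.
    A real VAE decoder is a trained neural network; this linear map is a stand-in."""
    return [0.8 * z[0] - 0.2 * z[1], 0.1 * z[0] + 0.9 * z[1], 0.5 * (z[0] + z[1])]


def sample_new(n, latent_dim=2, rng=random):
    """Generate n new outputs by decoding latent vectors drawn from the N(0, I) prior."""
    return [decode([rng.gauss(0.0, 1.0) for _ in range(latent_dim)]) for _ in range(n)]


def interpolate(z_a, z_b, steps=5):
    """Decode evenly spaced points on the straight line between two latent codes."""
    out = []
    for i in range(steps):
        t = i / (steps - 1)
        z = [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]
        out.append(decode(z))
    return out


rng = random.Random(0)
new_samples = sample_new(3, rng=rng)          # three brand-new outputs
path = interpolate([-1.0, 0.5], [1.0, -0.5])  # five outputs morphing from one code to another
```

In a real VAE, the smoothness of this path comes from training. The placeholder decoder here is smooth by construction.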
Early applications of autoencoders included dimensionality reduction and feature learning. Dimensionality reduction involves finding a way of representing a data set more efficiently using fewer variables. Feature learning is the process of automatically discovering, from raw data, the representations or features that are most useful for a particular machine learning problem.
Over the years, researchers have integrated autoencoders into other AI and machine learning algorithms to improve precision and performance. Autoencoders are used for image classification, object detection and noise removal. They are also used for source separation tasks similar to independent component analysis, such as isolating one voice at a cocktail party or separating the vocals and instruments in a music track.
Types of autoencoders
There are several types of basic autoencoders, including the following:
- Sparse autoencoders. These are one of the oldest and most popular approaches. They are suitable for feature extraction, dimensionality reduction, anomaly detection and transfer learning. They add a penalty to the training loss that encourages the network to activate only a small subset of its intermediate neurons for any given input. Because many neurons stay inactive for each input, the network has the flexibility to identify and learn a more efficient representation of the data.
- Denoising autoencoders. These are trained to reconstruct the original, clean data from a deliberately corrupted copy of it, so they learn to remove noise. They are often used to clean up low-light images, recognize speech and preprocess IoT data.
- Contractive autoencoders. These add a penalty that keeps the learned representation from changing much in response to small changes in the input data. This makes the representation more robust and helps the model generalize to unseen data. Researchers use them to improve the interpretability of neural network models by highlighting the most salient features in the data set responsible for results.
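The three variants above differ mainly in what they add to the basic reconstruction objective. The following Python sketch uses only the standard library and makes these assumptions:

- The encoder is a single sigmoid hidden layer with weights `W` and biases `c`.
- All function names are hypothetical.
- The L1 activation penalty is just one common way to impose sparsity. A KL-divergence sparsity target is another.

It shows the extra loss term or input corruption each variant uses, not a complete training loop.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def encode(W, c, x):
    """Hidden activations h = sigmoid(W x + c) for a one-layer encoder."""
    return [sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + c_j)
            for row, c_j in zip(W, c)]


def sparsity_penalty(h, weight=1e-3):
    """Sparse AE: L1 penalty that pushes hidden activations toward zero."""
    return weight * sum(abs(h_j) for h_j in h)


def corrupt(x, drop_prob=0.3, rng=random):
    """Denoising AE: masking noise. The reconstruction of corrupt(x)
    is compared against the clean x in the loss."""
    return [0.0 if rng.random() < drop_prob else x_i for x_i in x]


def contractive_penalty(W, h, weight=1e-3):
    """Contractive AE: squared Frobenius norm of the Jacobian dh/dx.
    For sigmoid units, dh_j/dx_i = h_j * (1 - h_j) * W[j][i]."""
    return weight * sum((h_j * (1.0 - h_j)) ** 2 * sum(w * w for w in row)
                        for h_j, row in zip(h, W))


# Example: 2 hidden units, 3 inputs (values are arbitrary).
rng = random.Random(0)
W_demo = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
c_demo = [0.0, 0.1]
x_demo = [1.0, 0.5, -0.5]
h_demo = encode(W_demo, c_demo, corrupt(x_demo, rng=rng))  # denoising: encode a corrupted copy
extra_loss = sparsity_penalty(h_demo) + contractive_penalty(W_demo, h_demo)
```

In practice, each term is added to the reconstruction loss, and `weight` controls how strongly the regularizer shapes the representation.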
How do VAEs work in neural networks?
Both VAEs and autoencoders are trained by using gradient descent to minimize a reconstruction loss function, which measures how much the network's output differs from its input. This optimization algorithm adjusts the weights of the neural network connections in response to feedback about the network's performance. Weight configurations that produce a lower loss, meaning reconstructions closer to the original input, are favored, while configurations that produce a higher loss are penalized. This training process lets the autoencoder capture the underlying structure of the training data in the network's weights.
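To make the training loop concrete, here is a minimal sketch. It assumes a toy linear autoencoder that compresses 2-D points to a single latent number and decodes them back. The data, learning rate, step count and function name are illustrative choices. The gradients are derived by hand for the mean squared reconstruction loss rather than computed by a deep learning framework.

```python
def train_linear_autoencoder(data, steps=1000, lr=0.05):
    """Gradient descent on the mean squared reconstruction loss of a
    2-D -> 1-D -> 2-D linear autoencoder. Returns (a, b, losses)."""
    a = [0.1, 0.1]  # encoder weights: z = a . x
    b = [0.1, 0.1]  # decoder weights: x_hat = b * z
    n = len(data)
    losses = []
    for _ in range(steps):
        grad_a = [0.0, 0.0]
        grad_b = [0.0, 0.0]
        loss = 0.0
        for x in data:
            z = a[0] * x[0] + a[1] * x[1]           # encode to one number
            r = [b[0] * z - x[0], b[1] * z - x[1]]  # reconstruction error
            loss += (r[0] ** 2 + r[1] ** 2) / n
            rb = r[0] * b[0] + r[1] * b[1]
            for k in range(2):
                grad_b[k] += 2.0 * r[k] * z / n     # dLoss/db_k
                grad_a[k] += 2.0 * rb * x[k] / n    # dLoss/da_k
        losses.append(loss)
        # Step against the gradient: lower-loss weights are favored.
        a = [a_k - lr * g for a_k, g in zip(a, grad_a)]
        b = [b_k - lr * g for b_k, g in zip(b, grad_b)]
    return a, b, losses


# Toy data lying on the line x2 = 2 * x1, so one latent number suffices.
data = [(-1.0, -2.0), (-0.5, -1.0), (0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]
a, b, losses = train_linear_autoencoder(data)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

Because the toy data lie on a line, one latent number is enough to reconstruct them, and the loss falls close to zero. Real autoencoders use nonlinear layers and automatic differentiation.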
A traditional autoencoder encodes each input as a single, deterministic point in the latent space. In contrast, a VAE's encoder outputs the parameters of a probability distribution for each input, typically the mean and variance of a Gaussian, and the latent representation is sampled from that distribution. VAE training also adds a Kullback-Leibler (KL) divergence term to the reconstruction loss. The KL divergence measures how much the learned distribution differs from a predetermined prior distribution.
This prior distribution is typically chosen in advance, most commonly a standard normal distribution, although it can also be learned from data. Once a VAE or an autoencoder is trained, the resulting neural network can be deployed as an inference engine for processing new input.
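One common way to implement these ideas is sketched below in Python. It is an illustration, not the original paper's code, and it makes these assumptions:

- The encoder outputs a mean `mu` and a log-variance `log_var` for each latent dimension.
- The prior is a standard normal distribution, so the KL term has a closed form.
- Function names and the example encoder outputs are hypothetical.

The reparameterization trick draws a latent sample as `mu + sigma * eps`, so gradients can flow through `mu` and `log_var`.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1).
    The randomness lives in eps, so z is differentiable w.r.t. mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction loss plus KL term. Squared error is a common simplification;
    the exact weighting between the two terms depends on the assumed decoder."""
    recon = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, log_var)


# Hypothetical encoder outputs for one input with a 2-D latent space.
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, log_var, rng=random.Random(0))
print(kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when `mu` is 0 and the variance is 1 in every dimension. Any departure from the prior adds to the loss, which is what keeps the latent space organized.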
In a classic autoencoder, the intermediate latent space represents each input as a single point. The network reconstructs inputs that resemble its training data well but reconstructs anomalous inputs poorly. A high reconstruction error therefore flags an anomaly, which makes the classic autoencoder a good anomaly detector. In a VAE, new content is generated in one of two ways. The VAE can decode points sampled from the latent space, or it can decode slight perturbations of an input's latent representation. Either way, the result reflects the patterns found in the training data.
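Anomaly detection with a trained autoencoder usually comes down to thresholding reconstruction error. The sketch below is illustrative only, with two assumptions:

- `reconstruct` is a hypothetical stand-in for a trained autoencoder. Here it is a fixed projection onto the line x2 = 2 * x1.
- The threshold is set at the mean plus three standard deviations of the errors on normal data. This is one simple heuristic among many.

```python
import math
import statistics

U = (1 / math.sqrt(5), 2 / math.sqrt(5))  # hypothetical learned latent direction


def reconstruct(x):
    """Stand-in for a trained autoencoder: encode to one number, decode back."""
    z = x[0] * U[0] + x[1] * U[1]  # encoder
    return (z * U[0], z * U[1])    # decoder


def reconstruction_error(x):
    x_hat = reconstruct(x)
    return sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))


def fit_threshold(normal_data, k=3.0):
    """Threshold = mean + k * std of reconstruction errors on normal data."""
    errors = [reconstruction_error(x) for x in normal_data]
    return statistics.mean(errors) + k * statistics.pstdev(errors)


def is_anomaly(x, threshold):
    return reconstruction_error(x) > threshold


normal = [(1.0, 2.1), (0.5, 0.9), (-1.0, -2.0), (2.0, 4.1), (-0.3, -0.5)]
threshold = fit_threshold(normal)
print(is_anomaly((1.5, 3.0), threshold))   # False: close to the learned pattern
print(is_anomaly((2.0, -1.0), threshold))  # True: poorly reconstructed
```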
Future of VAEs
Both autoencoders and VAEs continue to evolve. Researchers are exploring better latent space representations that could make the learned representations more expressive. This could enhance the performance and interpretability of autoencoders and VAEs.
Considerable research also explores how to combine both techniques with other generative AI algorithms to better represent the signals and patterns found in raw data. Moreover, both techniques could play a role in labeling or otherwise processing data to improve the training process for other AI and machine learning algorithms.
VAEs are also likely to see continued adoption in applications such as synthetic data generation, data augmentation and data preparation in manufacturing, energy, healthcare, finance and robotics. Future innovations may also improve the performance and output quality of VAEs and extend them to more types of content.