Autoencoders are a common tool for preparing data for neural network algorithms, but developers need to be mindful of the challenges that come with using them skillfully.
Autoencoders are auxiliary neural networks that work alongside machine learning models to help with data cleansing, denoising, feature extraction and dimensionality reduction.
An autoencoder is made up of two neural networks: an encoder and a decoder. The encoder compresses data into a smaller representation (the bottleneck layer) that the decoder can then expand back into an approximation of the original input. Autoencoders distill inputs into the most compact representation needed to re-create a similar output. This process removes noise from data, transforms raw files into clean machine learning data and detects anomalies.
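The encoder-bottleneck-decoder structure described above can be sketched in a few lines. This is a minimal illustration, not a production recipe: the data is synthetic, the layers are linear, and all sizes and hyperparameters are assumptions chosen for clarity.

```python
import numpy as np

# Minimal sketch of an autoencoder: an 8-feature input is compressed
# through a 3-unit bottleneck and reconstructed. Linear layers and
# plain gradient descent keep the example short; real autoencoders
# typically use nonlinear activations and an ML framework.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # 200 samples, 8 input features

W_enc = rng.normal(scale=0.1, size=(8, 3))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(3, 8))   # decoder weights

lr = 0.1
for _ in range(1000):
    Z = X @ W_enc                      # encode: compress to bottleneck
    X_hat = Z @ W_dec                  # decode: reconstruct the input
    err = X_hat - X                    # reconstruction error
    # Gradients of mean squared reconstruction error
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(f"final reconstruction MSE: {mse:.4f}")
```

Because the 3-unit bottleneck cannot hold all 8 dimensions of random data, the reconstruction is approximate by design; that forced compression is what makes autoencoders useful for denoising and dimensionality reduction.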
While autoencoders have data-cleansing power, they are not a one-size-fits-all tool, and they come with a number of potential pitfalls in practice. Data scientists using autoencoders for machine learning should look out for these eight specific problems.
1. Insufficient training data
Autoencoders are an unsupervised technique that learns from the data itself rather than from labels created by humans. This often means that autoencoders need a considerable amount of clean data to generate useful results. They can deliver mixed results if the data set is not large enough, is not clean or is too noisy.
"To maintain a robust autoencoder, you need a large representative data set and to recognize that training a robust autoencoder will take time," said Pat Ryan, chief architect at SPR, a digital tech consultancy.
2. Training the wrong use case
Not only do autoencoders need a substantial amount of training data, they also need relevant data. Like many algorithms, autoencoders are data-specific, and data scientists must consider the different categories represented in a data set to get the best results.
"If one trains an autoencoder in a compression context on pictures of dogs, it will not generalize well to an application requiring data compression on pictures of cars," said Nathan White, lead consultant of data science and machine learning at AIM Consulting Group.
It is vital to make sure the available data matches the business or research goal; otherwise, valuable time will be wasted on the training and model-building processes. In some cases, it may be useful to segment the data first using other unsupervised techniques before feeding each segment into a different autoencoder.
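The segment-first workflow suggested above can be sketched with a crude clustering pass followed by one model per segment. Everything here is illustrative: the synthetic two-population data, the two-segment count, and the use of a truncated SVD as a stand-in for a trained linear autoencoder on each segment.

```python
import numpy as np

# Hypothetical sketch: split a mixed data set into segments with a
# simple 2-means clustering loop, then fit a separate low-rank model
# (rank-k truncated SVD, which a linear autoencoder converges to) on
# each segment instead of one model on everything.

rng = np.random.default_rng(1)
# Two distinct populations, e.g. two very different product categories
a = rng.normal(loc=0.0, size=(100, 6))
b = rng.normal(loc=5.0, size=(100, 6))
X = np.vstack([a, b])

# --- crude 2-means segmentation (seeded with one point per cluster) ---
centers = X[[0, -1]].copy()
for _ in range(10):
    dists = ((X[:, None] - centers[None]) ** 2).sum(-1)
    labels = np.argmin(dists, axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# --- one low-rank reconstruction model per segment ---
def fit_rank_k(data, k=2):
    mu = data.mean(axis=0)
    U, S, Vt = np.linalg.svd(data - mu, full_matrices=False)
    recon = mu + (U[:, :k] * S[:k]) @ Vt[:k]
    return float(np.mean((recon - data) ** 2))

per_segment = [fit_rank_k(X[labels == k]) for k in range(2)]
print("per-segment reconstruction MSE:", per_segment)
```

In a real pipeline, each segment would get its own trained autoencoder, so no single model is forced to generalize across unrelated categories such as dogs and cars.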
3. Too lossy
Additionally, autoencoders are lossy, which limits their use in applications where compression degradation significantly affects system performance. White said there is no way to eliminate the degradation entirely, but developers can contain the loss by aggressively pruning the problem space. In this case, the autoencoder focuses on compressing only the data relevant to the problem being solved. For example, in a predictive analytics application, the resulting encodings would be scored on how well they align with predictions related to common business problems in a domain.
4. Imperfect decoding
Related to lossiness, the decoding process is never perfect. In some circumstances, Ryan said, it becomes a business decision how much loss is tolerable in the reconstructed output. This can be important in applications such as anomaly detection, where data scientists need to continually monitor the model's performance and update it with new samples. He stressed that anomalies are not necessarily problems and sometimes represent new business opportunities.
5. Misunderstanding important variables
The biggest challenge with autoencoders is understanding the variables that are relevant to a project or model, said Russ Felker, CTO of GlobalTranz, a logistics service and freight management provider.
Developing a good autoencoder can be a process of trial and error, and, over time, data scientists can lose sight of which factors are influencing the results.
Felker recommended thinking about autoencoders as a business and technology partnership to ensure there is a clear and deep understanding of the business application. For example, implementing an image recognition algorithm might be easy in a small-scale application, but it can be a very different process in a different business context. Data scientists need to work with business teams to figure out the application, perform appropriate tests and determine the value of the application.
6. Better alternatives
Data scientists must evaluate data characteristics to decide whether a data set is a fit for autoencoders, said CG Venkatesh, global head of data science, AI, machine learning and cognitive practice at Larsen and Toubro Infotech Ltd., a global IT services provider. While autoencoders are attractive, some use cases, such as image compression, are better served by other techniques. Alternatively, data scientists can implement autoencoders as part of a pipeline with complementary techniques: if there is a large number of variables, an autoencoder can reduce the dimensionality of the data before it is processed by other algorithms.
Venkatesh recommended doing trial runs with various alternatives to get a sense of whether to use autoencoders or explore how they might work alongside other techniques. If autoencoders show promise, then data scientists can optimize them for a specific use case.
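The pipeline pattern described in this section (compress first, then hand the compact codes to a downstream model) can be sketched as follows. The data, the 5-dimensional code size, the truncated SVD standing in for a trained linear autoencoder, and the nearest-centroid classifier are all illustrative assumptions.

```python
import numpy as np

# Dimension reduction ahead of another algorithm: 50 raw features are
# compressed to a 5-D code, and a simple nearest-centroid classifier
# then operates on the codes instead of the raw features.

rng = np.random.default_rng(5)
# Two labeled classes in 50-D space, separated along the first 3 axes
X0 = rng.normal(size=(100, 50))
X0[:, :3] += 3.0
X1 = rng.normal(size=(100, 50))
X1[:, :3] -= 3.0
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Compression step (truncated SVD as a linear-autoencoder stand-in)
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
codes = (X - mu) @ Vt[:5].T               # 50 features -> 5-D codes

# Downstream model works on the codes, not the raw features
centroids = np.array([codes[y == c].mean(axis=0) for c in (0, 1)])
dists = ((codes[:, None] - centroids[None]) ** 2).sum(-1)
pred = np.argmin(dists, axis=1)
accuracy = float((pred == y).mean())
print(f"training accuracy on 5-D codes: {accuracy:.2f}")
```

A trial run like this makes the trade-off concrete: if the downstream model performs about as well on the codes as on the raw features, the autoencoder is earning its place in the pipeline.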
7. Algorithms become too specialized
Because autoencoders learn to reproduce the features of the data they are trained on, they can become overly specialized and perform poorly on new data. The network can simply memorize the inputs it was trained on without necessarily understanding the conceptual relations between the features, said Sriram Narasimhan, vice president for AI and analytics at Cognizant. This problem can be mitigated by introducing loss regularization using contractive autoencoder architectures. Another approach is to introduce a small amount of random noise during training to improve the robustness of the algorithm.
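The noise-injection idea mentioned above can be sketched by corrupting the encoder's input on every training step while still scoring reconstruction against the clean input, so the network cannot simply memorize its training points. The linear architecture, noise scale and other hyperparameters are assumptions chosen to keep the sketch short.

```python
import numpy as np

# Denoising-style training sketch: add Gaussian noise to the input,
# but compute the error against the CLEAN input, which discourages
# rote memorization of individual training samples.

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))

W_enc = rng.normal(scale=0.1, size=(8, 3))
W_dec = rng.normal(scale=0.1, size=(3, 8))

lr, noise_scale = 0.1, 0.2
for _ in range(1000):
    X_noisy = X + noise_scale * rng.normal(size=X.shape)  # corrupt input
    Z = X_noisy @ W_enc
    X_hat = Z @ W_dec
    err = X_hat - X                    # score against the clean input
    grad_dec = Z.T @ err / len(X)
    grad_enc = X_noisy.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(f"clean-input reconstruction MSE: {mse:.4f}")
```

Because every epoch sees a different corrupted version of each sample, the network is pushed toward representations that capture structure shared across samples rather than the quirks of any one input.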
8. Bottleneck layer is too narrow
A typical autoencoder encodes the original input through multiple layers of progressively fewer neurons; the narrowest of these is called the bottleneck layer. One danger is that if the bottleneck layer is too narrow, the resulting model may miss dimensions that are important for the problem. This can be avoided by testing reconstruction accuracy for varying sizes of the bottleneck layer, Narasimhan said. Narrow layers can also make it difficult to interpret the dimensions embedded in the data. When this becomes a problem, he recommended widening the bottleneck layer, even if there is a minor trade-off in reconstruction loss.
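The width sweep Narasimhan describes can be sketched by measuring reconstruction error across candidate bottleneck sizes and looking for the elbow where the curve flattens. Since a linear autoencoder converges to a truncated SVD, the SVD stands in for retraining at each width; the synthetic data with three true latent factors is an assumption for illustration.

```python
import numpy as np

# Bottleneck-width sweep: 10 observed features are generated from 3
# underlying factors plus noise, so reconstruction error should drop
# sharply up to width 3 and then flatten out.

rng = np.random.default_rng(4)
latent = rng.normal(size=(300, 3))          # 3 "real" underlying factors
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 10))

mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)

errors = {}
for k in range(1, 7):                       # candidate bottleneck widths
    recon = mu + (U[:, :k] * S[:k]) @ Vt[:k]
    errors[k] = float(np.mean((recon - X) ** 2))

for k, e in errors.items():
    print(f"bottleneck width {k}: reconstruction MSE {e:.4f}")
```

Plotted or printed, the error curve makes the trade-off visible: widths below the true dimensionality lose important structure, while widths above it buy little accuracy and cost interpretability.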
When is an autoencoder wrong for you?
Autoencoders excel at helping data science teams focus on the most important features of model development. They can also help to fill in the gaps for imperfect data sets, especially when teams are working with multiple systems and process variability. But Felker suggested teams consider other approaches when they run into the following pitfalls:
- Difficulties scaling, since autoencoders may need to be specific to the data set that is used as an input.
- Low investment, when data science teams are skipping the step of becoming well versed and detail-oriented about how the autoencoder fits into the data preparation pipeline.
- Data loss, since autoencoders might eliminate important information in input data because a human is not scrubbing it with intuition.
- High sensitivity, since autoencoders can be more sensitive to input errors than manual approaches.
- Time constraints, since there may be no appreciable difference in the output or speed using an autoencoder.
- Complexity, as an autoencoder is an added layer of difficulty and management that might not be needed.
- Irrelevance, since autoencoders pack in as much information as densely as possible, and that information may not be relevant to what you need.