What is automated machine learning (AutoML)?
Automated machine learning (AutoML) is the process of applying machine learning (ML) models to real-world problems using automation. More specifically, it automates the selection, composition and parameterization of ML models. Automating the machine learning process makes it more user-friendly and can produce results faster, and often more accurately, than models that are selected and tuned by hand.
AutoML software platforms make machine learning more user-friendly and give organizations without a specialized data scientist or ML expert access to machine learning. These platforms can be acquired from a third-party vendor, accessed through open source repositories such as GitHub or built in house.
How does the AutoML process work?
AutoML is typically a platform or open source library that simplifies each step in the machine learning process, from handling a raw data set to deploying a practical ML model. In traditional machine learning, models are developed by hand, and each step in the process must be handled separately.
AutoML automatically searches for and applies a well-suited type of machine learning algorithm for a given task. For deep learning models, it relies on two concepts in particular:
- Neural architecture search. This automates the design of neural networks. It helps AutoML models discover new architectures for problems that require them.
- Transfer learning. Pre-trained models apply what they've learned to new data sets. Transfer learning helps AutoML apply existing architectures to new problems that require it.
Users with minimal machine learning and deep learning knowledge can then interface with the models through a relatively accessible programming language such as Python.
More specifically, here are some steps in the machine learning process that AutoML can automate, listed roughly in the order they occur:
- Raw data processing.
- Feature engineering and feature selection.
- Model selection.
- Hyperparameter optimization and parameter optimization.
- Deployment with consideration for business and technology constraints.
- Evaluation metric selection.
- Monitoring and problem checking.
- Analysis of results.
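The model selection and hyperparameter optimization steps above can be illustrated with a minimal sketch. It is not how any particular AutoML platform works; it only shows the underlying idea of trying several candidate models and settings automatically and keeping the one that scores best on held-out data. The toy data set, the two candidate models and every name in the code (make_data, SEARCH_SPACE, auto_select and so on) are assumptions made up for this illustration, and the search is a plain exhaustive grid rather than the more sophisticated strategies real tools use.

```python
import random
from collections import Counter


def make_data(n=200, seed=0):
    """Hypothetical toy data: 2-D points labeled 1 when x + y > 1, else 0."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [((x, y), int(x + y > 1.0)) for x, y in points]


def knn_predict(train, point, k):
    """Candidate model 1: majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def stump_predict(train, point, feature, threshold):
    """Candidate model 2: predict 1 when a single feature exceeds a threshold."""
    return int(point[feature] > threshold)


def accuracy(predict, train, valid, **params):
    """Evaluation metric: fraction of validation points predicted correctly."""
    correct = sum(predict(train, x, **params) == y for x, y in valid)
    return correct / len(valid)


# Each candidate model comes with a grid of hyperparameter settings to try.
# Odd values of k avoid tied votes on a two-class problem.
SEARCH_SPACE = {
    "knn": (knn_predict, [{"k": k} for k in (1, 3, 5, 9)]),
    "stump": (
        stump_predict,
        [{"feature": f, "threshold": t} for f in (0, 1) for t in (0.25, 0.5, 0.75)],
    ),
}


def auto_select(train, valid):
    """Try every model and setting; return (score, model name, params) of the best."""
    best = None
    for name, (predict, grid) in SEARCH_SPACE.items():
        for params in grid:
            score = accuracy(predict, train, valid, **params)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


data = make_data()
train, valid = data[:150], data[150:]  # simple holdout split
best_score, best_model, best_params = auto_select(train, valid)
print(best_model, best_params, round(best_score, 3))
```

A real AutoML system would typically also automate data preparation and feature engineering, use cross-validation instead of a single holdout split, and search the space more efficiently than by exhaustively trying every combination.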
Why is AutoML important?
AutoML is important because it represents a milestone in machine learning and artificial intelligence (AI). AI and ML have been subject to the "black box" criticism -- meaning that machine learning algorithms can be difficult to reverse engineer. Although these algorithms can produce results efficiently, it can be difficult to trace how a given output was reached. This, in turn, makes it challenging to choose the correct model for a given problem, because it is hard to predict how a black-box model will behave.
AutoML helps to make machine learning less of a black box by making it more accessible. It automates the parts of the ML process that apply the algorithm to real-world scenarios. A human performing this task would need an understanding of the algorithm's internal logic and how it relates to those scenarios. AutoML, however, learns and makes choices that would be too time-consuming or resource-intensive for humans to make efficiently at scale.
AutoML has also made it possible to fine-tune the end-to-end machine learning process -- or machine learning pipeline -- through meta learning.
On a wider scale, AutoML also represents a step toward artificial general intelligence.
Pros and cons of AutoML
The main benefits of AutoML are as follows:
- Efficiency. It speeds up and simplifies the machine learning process and reduces training time of ML models.
- Cost savings. Having a faster, more efficient machine learning process means a company can save money by devoting less of its budget to maintaining that process.
- Accessibility. Having a simpler process allows companies to save money on training staff or hiring experts. It also makes machine learning a viable possibility for a wider range of companies.
- Performance. AutoML algorithms tend to be more efficient than hand-coded models.
The main challenge of AutoML is the temptation to view it as a replacement for human knowledge.
Like most automation, AutoML is designed to perform rote tasks efficiently with accuracy and precision, freeing up employees to focus on more complex or novel tasks. Things that AutoML automates -- such as monitoring, analysis and problem detection -- are rote tasks that are faster if automated. A human should still be involved to assess and supervise the model, but no longer needs to participate in the ML process step by step. AutoML should help, not replace, data scientists and other employees.
Another challenge is that AutoML is a relatively new field, and some of the most popular tools are not yet fully developed.
Different ways to use AutoML
AutoML shares common use cases with traditional machine learning. Some of these include the following:
- Fraud detection in finance, where it improves the accuracy and precision of fraud detection models.
- Research and development in healthcare, where it can analyze large data sets and draw insights.
- Image recognition, which is useful for facial recognition.
- Risk assessment and management in banking, finance and insurance.
- Cybersecurity, where it can be used for risk assessment, monitoring and testing.
- Customer support, where it can be used for sentiment analysis in chatbots as well as increasing efficiency in customer support teams.
- Malware and spam detection, where it can be used to identify adaptive cyberthreats.
- Agriculture, where it can be used to expedite the quality testing process.
- Marketing, where it can be used for predictive analytics, improving engagement rates and making behavioral marketing campaigns on social media more efficient.
- Entertainment, where it can be used as a content selection engine.
- Retail, where it can be used to improve profits and reduce waste and inventory carryover.
AutoML tool features
The following are some popular AutoML platforms:
- Google AutoML, Google's proprietary, cloud-based automated machine learning platform.
- Azure Automated Machine Learning, a proprietary, cloud-based platform.
- AutoKeras, an open source software library developed by the Data Lab at Texas A&M University.
- Auto-sklearn, an open source AutoML toolkit built on top of Scikit-learn, an open source, commercially usable collection of simple machine learning tools in Python.
Auto-sklearn and Azure are generally considered cheaper because they are usually less resource-intensive than the other two platforms. They rely strongly on known architectures and data they've already seen, meaning that they don't need the whole data set to work. They use classification and regression techniques to do this.
Google AutoML and AutoKeras, by contrast, are more adept at creating new models, but also more resource-intensive, as they normally require the whole data set. They use recurrent neural networks, convolutional neural networks and long short-term memory networks.