Realizing a return on investment for data science projects often relies on data scientists' ability to fail quickly and then recover to deliver finished projects in a timely fashion. However, many of these projects take too much time and don't succeed.
Dennis Michael Sawyers, lead data scientist at SI Capital, explains in Automated Machine Learning with Microsoft Azure why companies aren't seeing the benefits of machine learning (ML) and AI projects and how they can address this issue.
ML project development has its share of difficulties
Companies often equate ML projects with regular software development projects, but the two require different approaches. One major difference is that with ML, you don't know in advance which data you're going to need, because candidate data sets must be tested to determine the correct ones.
Managers who lack a data science background often try to follow methods and timelines that are inappropriate for ML projects. Hard timelines don't work because data scientists face numerous uncertainties, and projects tend to fail when managers don't allow adequate time and support.
Automated ML software can speed up the process
In the book, Sawyers elaborates on how new automated ML platforms mitigate these common ML-related issues, using AutoML from Microsoft Azure as the foremost example.
These platforms provide the tools needed to surmount many of the hurdles data scientists encounter when building ML models. Specifically, they can automatically transform data, build models and tune each ML algorithm's settings, known as hyperparameters. The goal is to automate much of the work data scientists do, allowing them to fail and succeed more quickly.
For example, once a cleansed, error-free data set in an easy-to-understand format is loaded, this software transforms it automatically: It fills in null values, one-hot encodes categorical values and scales data. This is all done via intelligent "feature engineering" -- the process of altering data and making it suitable for machine learning algorithms. To train models, these new platforms use the most up-to-date algorithms.
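
To make the feature engineering step concrete, here is a minimal Python sketch of three transformations a data scientist would otherwise code by hand: filling in null values, one-hot encoding categorical values and scaling data. It is an illustration only, not how AutoML implements them. The function names (`fill_nulls`, `one_hot`, `min_max_scale`) and the sample columns are hypothetical, and mean imputation and min-max scaling are assumed choices among several possible ones.

```python
from statistics import mean


def fill_nulls(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn a categorical column into 0/1 indicator rows, one slot per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def min_max_scale(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample columns, invented for this illustration.
ages = [20, None, 40]
colors = ["red", "blue", "red"]

filled_ages = fill_nulls(ages)
scaled_ages = min_max_scale(filled_ages)
categories, encoded_colors = one_hot(colors)

print(filled_ages)
print(scaled_ages)
print(categories, encoded_colors)
```

Each of these steps is simple on its own. The time goes into deciding which ones to apply to which columns, and that decision is what an automated platform takes over.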
The hope is that these new automated ML platforms will be widely used in the future, allowing data scientists to complete ML projects faster. In this excerpt from the first chapter, Sawyers outlines why businesses should embrace the general concept of automated machine learning.
Let's look at some of the advantages of AutoML:
- AutoML transforms data automatically: Once you have a cleansed, error-free dataset in an easy-to-understand format, you can simply load that data into AutoML. You do not need to fill in null values, one-hot encode categorical values, scale data, remove outliers, or worry about balancing datasets except in extreme cases. This is all done via AutoML's intelligent feature engineering. There are even data guardrails that automatically detect any problems in your dataset that may lead to a poorly built model.
- AutoML trains models with the best algorithms: After you load your data into AutoML, it will start training models using the most up-to-date algorithms. Depending on your settings and the size of your compute, AutoML will train these models in parallel using the Azure cloud. At the end of your run, AutoML will even build complex ensemble models combining the results of your highest performing models.
- AutoML tunes hyperparameters for you: As you use AutoML on Azure, you will notice that it will often create models using the same algorithms over and over again. You may notice that while early on in the run it was trying a wide range of algorithms, by the end of the run, it's focusing on only one or two. This is because it is testing out different hyperparameters. While it may not find the absolute best set of hyperparameters on any given run, it is likely to deliver a high-performing, well-tuned model.
- AutoML has super-fast development: Models built using AutoML on Azure can be deployed to a REST API endpoint in just a few clicks. The accompanying script details the data schema that you need to pass through to the endpoint. Once you have created the REST API, you can deploy it anywhere to easily score data and store results in a database of your choice.
- AutoML has in-built explainability: Recently, Microsoft has focused on responsible AI. A key element of responsible AI is being able to explain how your machine learning model is making decisions. AutoML-generated models come with a dashboard showing the importance of the different features used by your model. This is available for all of the models you train with AutoML unless you turn on the option to use black-box deep learning algorithms. Even individual data points can be explained, greatly helping your model to earn the trust and acceptance of business end users.
- AutoML enables data scientists to iterate faster: Through intelligent feature engineering, parallel model training, and automatic hyperparameter tuning, AutoML lets data scientists fail faster and succeed faster. If you cannot get decent performance with AutoML, you know that you need to add more data. Conversely, if you do achieve great performance with AutoML, you can either choose to deploy the model as is or use AutoML as a baseline to compare against your hand-coded models. At this point in time, it's expected that the best data scientists will be able to manually build models that outperform AutoML in some cases.
- AutoML enables non-data scientists to do data science: Traditional machine learning has a high barrier to entry. You have to be an expert at statistics, computer programming, and data engineering to succeed in data science, and those are just the hard skills. AutoML, on the other hand, can be performed by anyone who knows how to shape data. With a bit of SQL and database knowledge, you can harness the power of AI and build and deploy machine learning models that deliver business value fast.
- AutoML is the wave of the future: Just as AI has evolved from a buzzword to a practice, the way that machine learning solutions get created needs to evolve from research projects to well-oiled machines. AutoML is a key piece of that well-oiled machine, and AutoML on Azure has many features that will empower you to fail and succeed faster. From data transformation to deployment to end user acceptance, AutoML makes machine learning easier and more accessible than ever before.
- AutoML is widely available: Microsoft's AutoML is not only available on Azure but can also be used inside Power BI, ML.NET, SQL Server, Synapse, and HDInsight. As it matures further, expect it to be incorporated into more and more Azure and non-Azure Microsoft services.
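
The hyperparameter tuning point in the list above can be illustrated with a small, self-contained Python sketch. It fits the same algorithm repeatedly with different settings and keeps the configuration that scores best on held-out data. This is a toy random search and not AutoML's actual search strategy, which the excerpt does not describe. The one-feature ridge model, the function names, the search range and the data points are all assumptions made for illustration.

```python
import random


def fit_ridge_slope(xs, ys, alpha):
    """Closed-form slope for y ~ w * x with an L2 penalty of strength alpha."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mean_squared_error(w, xs, ys):
    """Average squared prediction error of the slope w on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search(train, valid, n_trials=20, seed=0):
    """Try n_trials random alpha values and return the best (error, alpha) pair."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same trials
    trials = []
    for _ in range(n_trials):
        # Sample alpha log-uniformly between 1e-3 and 1e2 (an assumed range).
        alpha = 10 ** rng.uniform(-3, 2)
        w = fit_ridge_slope(train[0], train[1], alpha)
        trials.append((mean_squared_error(w, valid[0], valid[1]), alpha))
    return min(trials), trials


# Hypothetical training and validation data, invented for this illustration.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.1, 11.9])

(best_error, best_alpha), trials = random_search(train, valid)
print(f"best alpha: {best_alpha:.4f}, validation error: {best_error:.4f}")
```

As the excerpt notes for AutoML itself, a search like this is not guaranteed to find the absolute best setting on any given run. It only returns the best of the settings it tried.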