Two years ago, I discussed the state of machine learning and how the complexity of its frameworks was preventing wider adoption of AI in the field. Today, there is still a shortage of people able to work with such low-level tools to help companies create modern AI systems.
The main place the industry is beginning to see visual access to machine learning is in its early incorporation into business intelligence (BI) systems. However, those applications fold canned models into the same interface as other, non-AI BI models. What is still needed is the ability to help programmers and analysts build custom models based on machine learning. There has been some movement on this front and, despite its slow pace, the progress is real.
Assessing automated machine learning
In the current state of machine learning, automated machine learning is the general term for frameworks that attempt to automate pieces of the machine learning process. The industry uses the term machine learning loosely: much of modern statistical analysis is considered machine learning by some, while much of what others describe as machine learning is really deep learning. In the context of automated machine learning, the term typically refers to the more complex models used in deep learning, along with some complex non-deep-learning techniques such as random forests. The technology is still in the early adopter phase and the definition of machine learning is fluid. Regardless of the definition, however, leveraging it remains complex.
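At its core, what these tools automate is a search: try candidate models and hyperparameters, score each on held-out data, and keep the best. The sketch below illustrates that loop with a toy threshold classifier on a single numeric feature; the data and "models" are illustrative stand-ins, not any real AutoML library.

```python
# Minimal sketch of the model-selection loop that automated machine
# learning tools wrap. The "models" are toy threshold classifiers --
# hypothetical stand-ins for illustration only.

# Held-out validation data: (feature_value, label) pairs.
validation = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.1, 0), (0.8, 1)]

def make_threshold_model(threshold):
    """Return a classifier predicting 1 when the feature exceeds threshold."""
    return lambda x: 1 if x > threshold else 0

def accuracy(model, data):
    return sum(1 for x, y in data if model(x) == y) / len(data)

# The search space: candidate hyperparameter values to try.
candidates = [0.1, 0.3, 0.5, 0.7]

best_threshold, best_score = None, -1.0
for t in candidates:
    score = accuracy(make_threshold_model(t), validation)
    if score > best_score:
        best_threshold, best_score = t, score

print(best_threshold, best_score)
```

Real automated machine learning systems search far larger spaces of model families and hyperparameters, but the shape of the loop is the same.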
The most complex problems in the automated machine learning process come earlier in the flow than the actual machine learning training. The first remains, as it has throughout the history of computing, the collection and cleaning of data. Without the right data to use in training, no system, machine learning or otherwise, can provide accurate responses.
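A minimal sketch of what that cleaning step looks like in practice, assuming hypothetical field names: incomplete records are dropped and fields are coerced to consistent types before any training run can begin.

```python
# Sketch of a pre-training cleaning pass. Field names ("age", "income")
# are hypothetical; real pipelines handle many more cases.

raw_records = [
    {"age": "34", "income": "52000"},
    {"age": "", "income": "48000"},   # missing age -> dropped
    {"age": "41", "income": "61000"},
    {"age": "29", "income": None},    # missing income -> dropped
]

def clean(records):
    cleaned = []
    for rec in records:
        if not rec.get("age") or not rec.get("income"):
            continue  # discard incomplete rows rather than guess values
        cleaned.append({"age": int(rec["age"]), "income": int(rec["income"])})
    return cleaned

training_rows = clean(raw_records)
print(len(training_rows))  # 2 usable rows out of 4
```

Even this toy version shows why the step resists full automation: the decision to drop a row, impute a value, or fix it at the source is a judgment call about the business meaning of the data.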
The other key issue is feature engineering: defining which features of the data the machine learning system needs in order to perform accurate analysis.
As long as there has been data, analysts have had to define which features of the data should be used to turn it into information for each type of analysis. What's new in machine learning is the added complexity of the data some models need and the sheer volume of data many machine learning applications require. AI can be used to analyze the data and create an initial set of features, saving the development team significant time; assisting the analyst in finding and detailing the needed features is a critical benefit being added.
In the RDBMS world, feature engineering means deciding what information from the full data set is needed and which columns matter for which analysis. In the machine learning world, it can be more complex, especially in the freeform domains of vision and language. In a visual system, for instance, color, edges and several other features must be defined in order to understand an image. The problem is that automated machine learning focuses on the later stages: mainly on training the machine learning system, with a bit of attention to runtime.
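The kind of work automated machine learning currently leaves to the analyst can be sketched as follows: turning a raw business record into the numeric features a model can consume. The field names and derived features here are illustrative assumptions, not any particular product's output.

```python
from datetime import datetime

# Sketch of manual feature engineering on a hypothetical transaction
# record: raw fields become numeric features a model can train on.

def engineer_features(record):
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        "amount": record["amount"],
        "hour_of_day": ts.hour,                      # time of day often predicts behavior
        "is_weekend": 1 if ts.weekday() >= 5 else 0, # weekday() is 5/6 on Sat/Sun
        "desc_length": len(record["description"]),   # crude text-derived feature
    }

raw = {"timestamp": "2021-06-05T14:30:00", "amount": 42.5,
       "description": "coffee shop"}
features = engineer_features(raw)
print(features)
```

Each derived feature encodes a hypothesis about what matters to the analysis, which is exactly the judgment the tools described here are trying to help automate.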
A new approach
Throughout the history of computing, programmers and analysts have de-emphasized data collection, and expecting people who label themselves "data scientists" to get their hands dirty is not realistic. Simplifying that critical task is important. While there are tools that can simplify data collection, many haven't looked at data sources from a machine learning point of view.
"Machine learning is a technology with significant potential for solving a variety of business problems," said Ryohei Fujimaki, Ph.D., Founder & CEO of dotData. "The challenge is to build a platform that better helps the people on the technical and the business sides better communicate in order to build systems that provide actionable information."
This is where a new approach is required. Companies are working to advance past automated machine learning, incorporating its strengths into a visual tool that can help with the earlier processes in the development flow. These new frameworks attempt to increase the automation of the earlier steps and then tie them to automated machine learning, moving the frameworks forward and making these typically expert-level technologies more approachable.
NoSQL is dead, and frameworks are evolving
In answer to some of the NoSQL claims of the previous decade, relatively new companies are focusing on access to data via SQL. SQL is the access language of business, and it's needed. Those on the NoSQL bandwagon were the people who didn't grasp the difference between a data source and a query language.
Firms focused on helping businesses have been starting with large amounts of data in relational databases, hence the use of SQL. They are also looking to a future of working with columnar and other databases for less structured data.
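The reason SQL matters to these pipelines can be sketched briefly: the feature set for a model is usually assembled with a query against a relational store before any training happens. The table and column names below are hypothetical; the example uses Python's built-in sqlite3 module so it runs anywhere.

```python
import sqlite3

# Sketch of assembling per-customer features with SQL before training.
# Table and column names are hypothetical.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 20.0), (1, 35.0), (2, 12.5)])

# Aggregate order counts and spend directly in SQL -- the kind of
# feature a business analyst can express without leaving the database.
rows = conn.execute(
    "SELECT customer_id, COUNT(*), SUM(total) "
    "FROM orders GROUP BY customer_id ORDER BY customer_id"
).fetchall()

print(rows)  # [(1, 2, 55.0), (2, 1, 12.5)]
conn.close()
```

The same query pattern scales from a toy in-memory table to the large relational stores where business data actually lives.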
That some companies focused on machine learning and newer technologies have realized SQL matters is a good thing. AI and machine learning are moving out of the sandboxes of academia and a few companies, and are beginning to show the early maturity necessary to spread throughout software.
Companies such as PowerSoft and Gupta drove the move of regular programming from third-generation to fourth-generation languages, helping far more people create applications that solve real-world problems. Companies like dotData are looking at how to move the framework model forward in the same way. While the early versions still require very technical people to be involved, they appear to have an architecture that can, potentially, lift the UX needed to leverage machine learning to a level where business analysts can use it.