The darling of artificial intelligence -- the technology referred to most by vendors and media outlets -- is machine learning. In fact, the technology is so popular today that some companies use the terms machine learning and AI interchangeably.
But the technologies are not the same. While artificial intelligence can refer to anything from software bots to actual robots, machine learning specifically refers to "anything that uses statistical models to infer patterns from data," said James Kobielus, an analyst at SiliconAngle Media Inc.'s research outfit Wikibon.
A looming question for CIOs of companies that are planning to make machine learning a competitive differentiator may be how to support its advanced techniques. Rather than reinvent the wheel, CIOs can take a page out of the Wayfair LLC playbook. To enable enterprise machine learning, Ben Clark, the e-tailer's chief architect, worries less about data quality and more about data accessibility.
The rise of machine learning
Even at companies where there is no formal data science practice, machine learning is no stranger. Kobielus said it's everywhere -- underpinning chatbots and conversational user interfaces, analyzing IT operations logs and detecting fraud. "I think, in a lot of ways, CIOs have a lot of machine learning under their broad scope," he said.
And its presence is poised to expand -- greatly. A forecast from market research firm MarketsandMarkets estimated that the machine learning market will grow from $1.41 billion in 2017 to $8.81 billion by 2022. Another forecast from Grand View Research estimated that the market for deep learning, a subset of machine learning, will be worth $10.2 billion by 2025.
But, in many enterprise deployments of machine learning, the technology is a feature that lives in the background of an external application offering. While investing in tools that incorporate machine learning will certainly help companies stay current, it won't equate to a strategic differentiation in the market, according to Brent Leland, co-founder and partner at Cimphoni Consulting LLC and former CIO at Trek Bicycle Co.
"For the majority of companies, I think that's going to be their exposure to machine learning," he said. "But that's not going to give you a competitive advantage."
The good news is that the time is ripe to build an enterprise machine learning competency. Gartner's 2018 survey of more than 3,000 CIOs and senior IT executives reported that only 4% of companies have invested in and deployed artificial intelligence. More than three times as many respondents, 14%, reported no interest in the technology to date.
If companies want to bet on enterprise machine learning, Leland said IT has a part to play as an enabler but, in most cases, it won't be tasked with the heavy lifting. "If you really look at what it takes to do machine learning -- to do really robust algorithms -- that's not IT, that's math," he said.
Kobielus agreed, saying that, "traditionally, CIOs have not needed to grapple with the data science and the advanced analytics with which we associate machine learning. It's not traditionally been a core piece of what you put on someone's desktop to help them do their jobs better."
Still, Kobielus continued, if machine learning is going to become a core capability for the enterprise, it will require an incredible amount of data and produce a sprawling portfolio of machine learning models and variations of models that need to be managed. "[Machine learning] doesn't magically spring up from the earth," he said. "Or, rather, these are assets, just like code and metadata are, that need to be architected, built, optimized and debugged."
Indeed, Kobielus said the modern application development team is adopting a kind of "unified pipeline" approach spanning development, debugging, testing and deployment into production that relies on the expertise of coders, data scientists and even domain experts.
"What I'm getting at is the [machine learning] pipeline is a collaborative process that's ongoing in something called DevOps and that these are, increasingly, real-time pipelines," he said. "[Machine learning] models are being built all of the time, they're being tested all of the time, and old models are being decommissioned all of the time in favor of new ones."
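One minimal way to picture that continuous build-test-decommission cycle is a champion/challenger loop. The sketch below is illustrative only -- the model names, scores and promotion threshold are invented stand-ins, not Wikibon's or any particular team's practice:

```python
# Hedged sketch of a continuous model lifecycle: new "challenger"
# models are scored against the current "champion"; winners are
# promoted and the displaced model is decommissioned.
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    score: float            # e.g., accuracy from a holdout validation run
    retired: bool = False

@dataclass
class Registry:
    champion: Model
    history: list = field(default_factory=list)

    def evaluate(self, challenger: Model, min_gain: float = 0.01) -> bool:
        """Promote the challenger only if it beats the champion by min_gain."""
        if challenger.score >= self.champion.score + min_gain:
            self.champion.retired = True      # decommission the old model
            self.history.append(self.champion)
            self.champion = challenger        # promote the new one
            return True
        return False

registry = Registry(champion=Model("fraud-v1", score=0.82))
registry.evaluate(Model("fraud-v2", score=0.84))  # promoted
registry.evaluate(Model("fraud-v3", score=0.83))  # rejected: no real gain
print(registry.champion.name)  # fraud-v2
```

In a real pipeline the scoring step would be an automated validation run and the registry a shared service, but the shape of the loop -- build, test, promote or discard, retire -- is the same.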
Underneath that pipeline, and the foundation on which enterprise machine learning is built, is often a data lake, Kobielus said. He advised CIOs to build and maintain data repositories where structured, unstructured and streaming data can be accessed, combined and interrogated in a raw form. "Whether it's on Hadoop or SQL [Server] as a secondary, data lakes are critically important as a best practice," he said.
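The schema-on-read idea behind a data lake -- landing raw records as-is and interrogating them at query time, rather than forcing them through ETL first -- can be sketched in miniature. The files and fields below are invented for illustration; a production lake would sit on Hadoop or a similar platform rather than a temp directory:

```python
# Miniature schema-on-read sketch: heterogeneous raw JSON-lines
# records are landed untouched and queried ad hoc, with no upfront
# ETL step. File names and fields are illustrative.
import json
import tempfile
from pathlib import Path

lake = Path(tempfile.mkdtemp())

# "Land" raw records from two differently shaped sources as-is.
(lake / "clickstream.jsonl").write_text(
    '{"user": "a1", "page": "/home"}\n{"user": "b2", "page": "/cart"}\n')
(lake / "orders.jsonl").write_text(
    '{"user": "b2", "total": 59.90, "currency": "USD"}\n')

def scan(lake_dir: Path):
    """Schema applied at read time: yield whatever fields each record has."""
    for path in lake_dir.glob("*.jsonl"):
        for line in path.read_text().splitlines():
            yield json.loads(line)

# An ad hoc question, answered without a pre-built warehouse table:
buyers = {r["user"] for r in scan(lake) if "total" in r}
print(buyers)  # {'b2'}
```

The design choice being illustrated is that structure is imposed by each query, not by the ingestion process -- which is what lets analysts start asking questions as soon as they have them.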
Data accessibility is crucial
Wayfair's Clark considers access to data one of two essential ways that he supports those who experiment with advanced techniques such as machine learning.
"We have a lot of effort going into this general area of making sure that the people who want to do analysis of this type as part of a data science or machine learning effort are able to start asking their questions as soon as they have them and don't have to commission a kind of big [extract, transform and load] process in order to get going," he said.
Clark, who started the company's data science initiative seven years ago, oversees three main platforms: a traditional SQL Server data warehouse for online transaction processing (OLTP), a massively parallel Vertica database and a distributed computing Hadoop framework. "We have a lot of data flowing from OLTP systems or other systems into Hadoop and Vertica for the purposes of making [the data] readily available for this type of work," he said.
He said the three platforms, together, are "a pretty good set" of basic infrastructure and data platform building blocks needed to enable the company's data scientists and analytics practitioners.
The second way he supports data scientists is not by ensuring the data they're working with is clean or even high quality. Instead, he makes sure that the tools needed to poke and prod the raw, unsanitized data are available for use. "The data scientists who I know mostly feel that when they're exploring the data in its messy state is when they have some of their most important insights," he said.
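A first pass at poking and prodding unsanitized data often amounts to profiling it -- counting which fields are present and what types show up -- before any cleaning happens. A stdlib-only sketch (the records are invented to show typical messiness):

```python
# Hedged sketch: profiling messy, unsanitized records before any
# cleanup, counting field presence and value types so the mess
# itself becomes visible. Records are invented for illustration.
from collections import Counter

raw_records = [
    {"sku": "A-100", "price": "19.99", "qty": 2},    # price as a string
    {"sku": "A-100", "price": 19.99},                # qty missing
    {"sku": None, "price": 19.99, "qty": "two"},     # qty as a word
]

field_presence = Counter()
field_types = Counter()
for rec in raw_records:
    for key, value in rec.items():
        field_presence[key] += 1
        field_types[(key, type(value).__name__)] += 1

print(field_presence)  # which fields go missing, and how often
print(field_types)     # which fields carry inconsistent types
```

Surveying the raw data this way is exactly the kind of exploration Clark describes: the inconsistencies themselves -- a price stored as a string, a quantity spelled out -- are often where the insights start.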
He called it a paradoxical situation where highly skilled data practitioners who are hired to build predictive models and perform other advanced analytics techniques are also asked to perform janitorial-type work on the data itself.
"So, they wade through it, they waste a little time, but then they're clever about creating tools for themselves, including artificially intelligent tools," he said, "but they do that from a position of understanding the messy data sets that they get."
His job, he said, is to give them the tools they need to do that work as efficiently and as effectively as possible.