As 2018 gets rolling, it appears that various aspects of big data are morphing into machine learning and AI. The changes that machine learning models bring to big data analytics are not readily apparent.
To sort through recent developments, including data science and DevOps, reporter Jack Vaughan caught up with James Kobielus, lead analyst for AI, data science, deep learning and application development at SiliconAngle/Wikibon. He had just finished a serious round of predicting 2018 with colleagues when we knocked on the door.
AI was asleep for a few years. Was it just waiting for big data to come along?
James Kobielus: Well, AI has been around a while. And, for much of that time, rule-based expert systems were at the core of it. That meant fixed rules that had to be written by subject matter experts.
What's happened in the last 10 years is that AI in the broad sense -- both in research and in the commercialization of the technology -- has shifted away from fixed, declarative, rule-based systems toward statistical, probabilistic, data-driven systems.
That is what machine learning models are about. Machine learning is the core of modern AI. It's all about using algorithms to infer correlations and patterns in data sets. That's for doing things like predictive analysis, speech recognition and so forth.
Much of the excitement more recently has been from neural networks -- statistical algorithms that in many ways are built to emulate the neural interconnections in our brains. Those too have been around since the 1950s, with a research focus.
In the last 10 years, [neural networks] have become much more powerful. One of the things that has made them much more powerful is there is much more data.
Much of that is unstructured data coming from the real world, meaning things like social media, for customer sentiment. That has come about as things like Facebook, LinkedIn and Twitter have become parts of our life. And there is value in being able to get inside your customer's head.
The frontier of that is deep learning; it's machine learning with more processing layers, more neural layers, able to infer higher level abstractions of the data.
Machine learning is exciting. At the same time, something could go wrong. What challenges will data analytics managers face when moving to these new technologies?
Kobielus: First of all, the fact is that this is tough stuff. It is complex stuff to develop and to get right. Any organization needs a group of developers who have mastered the tools and the skills of data science.
Data scientists are the ones that build, train and test these models against actual data -- that is, to determine if a model predicts what it is supposed to predict. It's not enough to build the algorithms; you have to train them to make sure they are fit for the purpose for which they have been built. And training is tough work.
You have to prepare the data -- that's no easy feat. Three-quarters of the effort in building out AI involves acquiring and preparing the data to do the training and so forth. The data sets are huge, and they run on distributed clusters. Often, Hadoop and NoSQL are involved. It costs money to deploy all that.
Conceivably, you might outsource much of this infrastructure to your cloud provider, be it [Amazon Web Services], Microsoft Azure, IBM Cloud or whatever it may be. Once again, it is not cheap. Clearly, you need senior management buy-in to get the budget to hire the people and to acquire the technology to do this.
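To make the build, train and test loop from this answer concrete, here is a minimal Python sketch that holds out part of a data set and checks whether a model predicts what it is supposed to predict. Everything in it is an illustrative assumption rather than anything Kobielus described: the data is synthetic, the "model" is a toy one-feature threshold, and the function names (`train_test_split`, `fit_threshold`, `accuracy`) are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """Toy 'training': place a threshold midway between the class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Share of rows where the thresholded prediction matches the label."""
    hits = sum(1 for x, y in rows if (x > threshold) == (y == 1))
    return hits / len(rows)


# Synthetic, made-up data: the label is 1 when the feature exceeds 5.
rng = random.Random(42)
data = [(x, 1 if x > 5 else 0) for x in (rng.uniform(0, 10) for _ in range(200))]

train, test = train_test_split(data)
model = fit_threshold(train)           # fit on the training rows only
test_accuracy = accuracy(model, test)  # score on rows the model never saw
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The point of the split is that the score comes from rows the model was not fitted on; a real project would swap in an actual learning algorithm and real, prepared data.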
And these are not the types of projects that get done once, and that is it -- machine learning models have to be regularly revisited, right? And, isn't that where DevOps is coming into greater play?
Kobielus: Yes, you have to keep re-evaluating and retraining the AI models you have deployed. Just because you have built and trained them once, and they've worked at predicting the phenomenon you are looking at, doesn't mean they are going to work forever.
You encounter what is called model decay -- it's been experienced by data scientists forever. Models become less predictive over time. That's simply because the world changes. The model behind predicting an item a customer may have clicked on three years ago in your e-commerce portal may not be as predictive anymore. There may be other variables predictive of response rate. So you end up retraining and redeploying.
And that demands an orientation toward AI in a DevOps workflow. To do all that is not trivial. That is, you need to create a workflow that is very operational. It means always being sure you have the best training data and the best-fit AI and machine learning models.
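The model decay check described above might be sketched as follows in Python. This is a hedged illustration under stated assumptions, not a description of any particular platform: the prediction and outcome lists are made up, the 0.05 tolerance is arbitrary, and `needs_retraining` is a hypothetical helper name.

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that matched what actually happened."""
    hits = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return hits / len(outcomes)


def needs_retraining(baseline_acc, recent_acc, tolerance=0.05):
    """Flag decay when recent accuracy falls more than `tolerance`
    below the accuracy measured when the model was deployed."""
    return (baseline_acc - recent_acc) > tolerance


# Made-up example: the same model's click predictions, scored against
# outcomes observed at deployment time and against recent outcomes.
predictions      = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
outcomes_at_ship = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
outcomes_recent  = [0, 1, 1, 0, 1, 0, 0, 0, 0, 1]

baseline = accuracy(predictions, outcomes_at_ship)
recent = accuracy(predictions, outcomes_recent)

if needs_retraining(baseline, recent):
    print("accuracy dropped; schedule retraining and redeployment")
```

In an operational workflow, a check like this would run on a schedule against fresh outcome data, and a positive result would trigger the retrain-and-redeploy step rather than a printed message.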