With the proliferation of vendors, doing a machine learning platform comparison can be a dizzying process. But most of the features prized by users and experts come down to the platform's flexibility.
"We just want to be able to pick the right tool for the right job," said Chris Robison, lead data scientist at Overstock.com.
In a webinar hosted by Databricks, the San Francisco-based company offering managed Spark products, Robison described how his team uses Databricks' software to score site visitors on their propensity to purchase. This involves first taking raw web log data and attaching features to it. The data is organized by individual user sessions, which are then ordered sequentially. This builds a picture of the actions that lead up to a purchase. Once the data is structured, the team trains machine learning algorithms to classify actions associated with a purchase.
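The workflow Robison outlines (attach features to raw web logs, group events into per-user sessions, order them in time, then train a classifier on purchase behavior) can be sketched in plain Python. This is a toy illustration under stated assumptions, not Overstock's or Databricks' code. The log records, action names, 30-minute session timeout and the hypothetical helpers `sessionize`, `featurize`, `train_logistic` and `score` are all invented for the example. The article does not name the algorithm the team uses, so logistic regression fitted by batch gradient descent stands in for "a classifier."

```python
import math
from collections import defaultdict

# Hypothetical raw web-log records: (user_id, unix_timestamp_seconds, action)
RAW_LOGS = [
    ("u1", 0, "view_home"), ("u1", 60, "view_product"),
    ("u1", 120, "add_to_cart"), ("u1", 180, "purchase"),
    ("u2", 0, "view_home"), ("u2", 30, "view_home"),
    ("u1", 10_000, "view_home"), ("u1", 10_060, "view_product"),
    ("u3", 5, "view_product"), ("u3", 65, "add_to_cart"), ("u3", 125, "purchase"),
    ("u2", 9_000, "view_product"),
]

ACTIONS = ["view_home", "view_product", "add_to_cart"]  # feature columns; "purchase" is the label
SESSION_GAP = 30 * 60  # assumption: 30 minutes of inactivity ends a session


def sessionize(logs, gap=SESSION_GAP):
    """Group events by user, order them by time, and split on inactivity gaps."""
    by_user = defaultdict(list)
    for user, ts, action in logs:
        by_user[user].append((ts, action))
    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's actions sequentially
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > gap:  # long pause: close the session
                sessions.append((user, current))
                current = []
            current.append(event)
        sessions.append((user, current))
    return sessions


def featurize(session):
    """Count actions before the first purchase; label is 1 if the session converts."""
    counts = dict.fromkeys(ACTIONS, 0)
    label = 0
    for _, action in session:
        if action == "purchase":
            label = 1
            break  # ignore later events so the label does not leak into features
        if action in counts:
            counts[action] += 1
    return [counts[a] for a in ACTIONS], label


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log loss (a stand-in for any classifier)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x):
    """Propensity to purchase, between 0 and 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    sessions = sessionize(RAW_LOGS)
    X, y = zip(*(featurize(s) for _, s in sessions))
    w, b = train_logistic(list(X), list(y))
    for (user, events), x in zip(sessions, X):
        print(user, "session starting at", events[0][0], "score", round(score(w, b, x), 2))
```

In a production setting the same steps would typically run on a distributed engine such as Spark rather than in-memory lists; the toy version only shows the order of operations.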
The process requires several steps, covering everything from data preparation to model building and deployment. Robison said having a data platform that can handle all of those steps helps move projects from proof of concept to production.
Another feature that helps the team deploy models is the platform's support for multiple programming languages. Robison said team members use several languages, including R, Python and Scala. The choice depends on which language a data scientist is most comfortable with and which is best suited to a given stage of the model-building lifecycle. He called it a major plus that Databricks supports multiple versions of all these languages and that notebooks can switch between versions and languages.
"There's no silver bullet for any one of these tasks, so why not try all of them," Robison said.
Of course, Databricks isn't the only machine learning platform offering these features. In the webinar, Forrester analyst Mike Gualtieri said he's currently tracking 47 different vendors offering products in this space. He described 10 characteristics and features to look for when doing a machine learning platform comparison:
- data preparation features;
- bank of pre-written algorithms;
- support for open source programming languages;
- a workbench-style interface;
- collaboration features that allow project sharing;
- deployment capabilities;
- model management tools to track the effectiveness of models in production;
- pre-written tools for common business problems, like customer churn modeling; and
- the ability of the vendor to execute on their promises.
"It's getting confusing out there," Gualtieri said. "There are a lot of tiny vendors in this space, but you have to look at the ability of the company to execute on their vision."
Databricks offering aims to unify machine learning frameworks