Tools fill gaps in predictive modeling and machine learning
Machine learning platforms can bring the lone data scientist into the overall workflow. Updates to tooling also let that data scientist use familiar development interfaces.
For many companies, what comes after big data is operational business models that use big data to repeatedly update existing products or create new ones.
But it's not easy to achieve success with predictive modeling, machine learning and decision-making. So, data science and related tools vendors continue to build out their platforms to confront the challenges the new machine learning models present.
Recent platform updates look to fill gaps in the predictive modeling and machine learning lifecycles by enabling developers, data analysts, data scientists and others to more easily participate in the process.
Subset of data science today
Machine learning has become a popular subset of data science today, according to Gartner. The firm's recent research suggests it requires -- depending on the organization -- a breadth of tooling to support the lone data scientist, corporate data science teams, application developers, analysts and even citizen data scientists.
Machine learning updates to data science platforms will be a big part of the predictive and prescriptive analytics mix. In its 2018 Magic Quadrant for Data Science and Machine Learning Platforms, Gartner estimated that this mix will attract 40% of enterprises' net-new investment in analytics and business intelligence by 2020.
Gartner listed Anaconda, Dataiku, Domino Data Lab, H2O.ai, IBM, SAS Institute and others as competitors in the fields of data science and machine learning platforms.
Predictive modeling for data science
Moving beyond the solo data scientist and bringing machine learning to predictive analytics for other members of data science teams is a big push for Domino Data Lab.
On March 20, the company updated its flagship platform with activity feeds that allow team members to view contextual data describing data scientists' work, as well as an experiment manager that acts as a single system of record for model-building activity.
Also new is Domino Datasets. This is a data store that organizes resources to streamline repetitive data preparation and preprocessing for model development.
The Domino Data Lab platform enables data scientists, developers and data end users to each have their own view of model-building projects, according to Josh Poduska, chief data scientist at the company, based in San Francisco. With Domino Datasets, data can be associated with specific projects, but still exist on its own. That reduces reworking in the model development process as users try various tests and tweaks, he said.
Domino Datasets makes it easier to share results with colleagues, according to user Luiz Scheinkman, principal software engineer at Numenta, a neuroscience technology research firm in Redwood City, Calif. That sharing is important if machine learning is to move beyond the narrow confines of the expert data scientist.
The main advantage of Domino Datasets, Scheinkman said, is that it shrinks the time needed to train staff to mount learning models in the cloud.
This reduces steps and allows various team members to "store large amounts of data, create multiple versions, preprocess it doing machine learning and then run it on the cloud," he said.
Machine learning in the enterprise
Dataiku is also updating its data science platform to promote better machine learning in the enterprise. Enhancements to the Dataiku machine learning platform unveiled on March 6 focus on supporting customized coding environments, project duplication tooling and data governance policy tracking, according to Jed Dougherty, lead data scientist at Dataiku, based in New York.
Dougherty said he sees momentum for machine learning and AI in enterprises, driven in part by fear of being left behind as the new technology quickly matures.
"Large companies have educated workers who are potential data science users, but they don't yet have the skills. Now, enterprise-level AI can help you leverage the skills of your data analysts," he said. "You don't need a Ph.D. in AI to do that."
Dougherty said companies like Dataiku are trying to make user interfaces for machine learning and predictive modeling more visual, "so someone who knows the business very well can leverage the algorithms."
But expanding into visual interfaces doesn't mean top coders -- people whom Dougherty called "strong users" -- will be left out. In the 5.1 version of Dataiku, he said, coders are able to work in their familiar development environments, such as RStudio and PyCharm, while still using Dataiku resources.
Data science platforms fill lifecycle gaps. The platforms give everyone a single place to work and, at their best, can help organizations build out data science teams, Dougherty said.
At the same time, he indicated, the platforms can bring the individual data scientist into the overall workflow. Machine learning and predictive analytics, ultimately, are best achieved when everyone is working together.