As enterprises establish data-centric cultures for decision-making and planning, data scientists continue to grow in importance to businesses around the world. But organizations can't hire data scientists fast enough, as the field of qualified candidates remains highly constrained.
To cope with this data scientist shortage, enterprises are taking a variety of approaches to get as much as they can from the few data professionals they can find and retain.
Much of the work typically done by data scientists is focused on data management and operational tasks like identifying data sources, merging data sets and validating data quality. These tasks are not the high-value work that data scientists are generally hired to do. That's changing as more automation efforts come into the enterprise.
"Model development, as well as model operationalization, can be significantly simplified by automation," said Ryohei Fujimaki, CEO and founder of DotData, an automated machine learning (ML) software company based in San Mateo, Calif. "New data science automation platforms will enable enterprises to deploy, operate and maintain data science processes in production with minimal efforts, helping companies maximize their AI and ML investments and their current data team."
According to Matthew Baird, founder and CTO of AtScale, an automated data engineering software company, some of the most promising developments in data science automation are in the area of autonomous data engineering, which automates data management and handling tasks.
"Such advances come in the form of 'just-in-time' data engineering -- automation that essentially acts like the perfect data engineering team if they had all that knowledge and complete input to data handling," Baird said, "including understanding how to best leverage underlying data structures of various databases, their unique network characteristics, data location, native security setup and policies."
Emphasizing self-service analytics
All this added data management and modeling automation is meant not only to get the most out of senior data scientists but also to democratize data resources for citizen data scientists. Scaling out data exploration with self-service analytics is another popular method of dealing with the data scientist shortage.
"The combination of autonomous data engineering advances and the increasing enablement of citizen analysts via self-service analytics are freeing valuable data science and data engineering resources to focus on higher-value activities such as building the next in machine learning or artificial intelligence models," Baird said.
Creating cross-functional teams
At the same time, enterprises are bumping into the limits of self-service analytics tooling and automation.
"Every tool that simplifies data science also limits the flexibility and options of the users, which means that certain complex tasks requiring customization are impossible," said Chris Nicholson, founder and CEO of Pathmind, a deep learning software company. Nicholson believes this reality has led many companies to explore new team strategies to get more out of their limited data experts.
"Many companies respond to the scarcity of data scientists by creating cross-functional data science teams that work with many business units across an organization or by hiring external consultants," Nicholson said. "Often what limits the value of data science in an organization is not the scarcity of data scientists themselves but the data that the organization gathers and how it lets people access and process that data."
Cross-functional teams can help companies get around fragmented data silos, Nicholson said. Those silos are created by technical and internal political hurdles, which can be overcome when the right stakeholders work together on the same teams.
This can also alleviate a common problem that looks like a data scientist shortage but is even more fundamental -- namely that too many data science projects look unmanageable because they have no clear path to business value.
"Too many projects are wild goose chases where you throw a bunch of data to the data scientists and say, 'See what you can make of this,'" said Sten Vesterli, principal consultant at More Than Code, an IT consulting firm based in Denmark. "We've seen more than 80% of all data science projects fail to move from the lab into production code, and companies need to allocate their data scientist to the most high-value business goal."
Defining data science roles better
One of the big issues impeding effective recruitment of data scientists is that enterprises are making the data science title and role far too broad, said Amy Hodler, director of graph analytics and AI programs at graph database company Neo4j.
"This makes it difficult to find the right fit for any organization and means new employees have a harder time understanding and aligning to business objectives," Hodler said.
She believes that in the coming year, many organizations will start diversifying their data science-related titles, creating specialized subcategories of the role with more tightly focused job requirements.
Hodler also believes the market will start responding to the data scientist shortage this year with more internal training of existing employees who exhibit any potential or desire to pivot into data science. This is going to be a hit-or-miss tactic, as organizations will have to be strategic about the specific skills they nurture in their budding data scientists, she said.
"A long-view mindset is required to clearly evaluate and define required skill sets in a way that balances not just the tools/approaches that are hot today but also investing in core concepts that can be built upon for years ahead," Hodler said. "Pairing junior and senior data scientists will become crucial to evolving and retaining these employees for the next few years."