How to increase the success rate of data science projects
Most data science initiatives don't produce value due to obstacles like poor data quality, but CNN senior lead of product analytics Bryce Macher has a plan for success.
The vast majority of data science projects are doomed to fail.
But they don't have to be, according to Bryce Macher, senior lead of product analytics at CNN.
In 2016, Gartner estimated that 60% of big data science projects fail to go into production and generate any value. A year later, Gartner analyst Nick Heudecker said 60% was far too conservative an estimate and the real number is around 85%.
Since then, despite advances in augmented intelligence and machine learning, Gartner has not changed its estimates. Meanwhile, a 2018 IDG study found that even with the application of AI capabilities, only one in three data science projects succeed.
Among the myriad obstacles preventing organizations from having success with data science projects are siloed data, a shortage of data science skills, poor data quality, projects started without a clear goal and the absence of a data culture within the organization.
Macher, however, said the obstacles can be overcome.
Speaking on Aug. 17 on the opening day of Ai4, a virtual conference on artificial intelligence, Macher laid out a blueprint for how to overcome the overwhelming failure rate of data science projects and give them a chance to succeed.
"The focus, whether we talk about individual tactics or overall strategies, is really on data products," Macher said. "The data-product approach is about building internal products that deliver data to decision-makers, whether they are business decision-makers or customer decision-makers. That's the number one way to prevent some of the failure of data science projects."
He added that four strategies organizations should employ to build that focus on data products (applications and tools that lead to business processes and decisions) are:
- understanding that data science starts at data strategy;
- building a data science infrastructure for applications and not experiments;
- focusing on early growth applications of data science; and
- hiring data scientists who can actually do the data science.
Data science starts at strategy
One strategic step organizations can take to improve the chances of data science projects delivering value is to hire data scientists long before any project begins.
Organizations, however, often make the mistake of gathering their data, building up their data infrastructure and developing data operations strategies before bringing in data scientists, according to Macher.
"Because data science is a little more intense than analytics, for example, organizations end up with data strategies that don't account for the needs of data science," he said.
Data sourcing, he noted, is crucial to a data scientist. Having one on board when data is first gathered and curated makes that data easier to work with when the time comes to use it for data science.
"Placing a data scientist at the key moment of data strategy is incredibly crucial," Macher said.
Data quality is also critical, he continued.
Including a data scientist as part of the data team to ensure the quality of the data will enable organizations to avoid repeatedly attempting data science projects built on bad data. Data catalogs are a tool that can help ensure data quality, providing a centralized place where organizations can document their data and easily find the data they require for their different needs.
"Data science has to be part of an organization's data DNA," Macher said. "Rather than hiring it last, it should be part of the data governance process. Putting a data scientist as a key stakeholder is going to supercharge the ability to build data science applications based on good data. It's also going to give teams a culture of thinking about data strategy."
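As a rough illustration of the kind of data quality check a data scientist might ask for when data is first gathered, the sketch below counts missing values and duplicate rows in a small set of records and applies a pass/fail threshold. The names `QualityReport` and `check_quality`, the sample columns and the 5% threshold are assumptions made for this example; none of them comes from Macher's talk or from any particular data catalog product.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total_rows: int
    missing_by_column: dict
    duplicate_rows: int

    def passes(self, max_missing_ratio: float = 0.05) -> bool:
        """Pass only if there are rows, no duplicates, and every
        column stays at or under the missing-value threshold."""
        if self.total_rows == 0 or self.duplicate_rows > 0:
            return False
        return all(
            count / self.total_rows <= max_missing_ratio
            for count in self.missing_by_column.values()
        )


def check_quality(rows: list[dict], required_columns: list[str]) -> QualityReport:
    """Count missing values and duplicate rows.

    Assumes each row is a dict of hashable scalar values
    (hypothetical schema for illustration).
    """
    missing = {col: 0 for col in required_columns}
    seen = set()
    duplicates = 0
    for row in rows:
        for col in required_columns:
            # Treat absent keys, None and empty strings as missing.
            if row.get(col) in (None, ""):
                missing[col] += 1
        # Two rows are duplicates if they match on every required column.
        key = tuple(row.get(col) for col in required_columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return QualityReport(len(rows), missing, duplicates)


# Hypothetical sample records.
rows = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": ""},
    {"user_id": 1, "country": "US"},
]
report = check_quality(rows, ["user_id", "country"])
print(report.missing_by_column, report.duplicate_rows, report.passes())
```

Running a check like this before modeling begins is one way to catch bad data early, rather than discovering it after a project has already been built on top of it.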
Building a data science infrastructure
When building a data science infrastructure, it has to be part of the organization's entire digital ecosystem rather than localized, according to Macher.
A data science project can't start with models developed and trained on one person's laptop.
"Starting at that micro level almost always sets you up for failure," Macher said.
When a project is started by a person on their own laptop, they're thinking about a model on a computer rather than a model that's part of an entire data infrastructure, he explained.
Data science projects, therefore, should start on the same cloud that will power applications.
In addition, when beginning a data science project, organizations should think about how the resulting application will be deployed on both a micro level, with a single deployment point, and a macro level, with applications scaled across various departments.
"Building infrastructure for both easy, lightweight deployments and big application deployments puts deployment and productization at the core of every data science project," Macher said.
Finally, when building a data science infrastructure, ensuring a culture of experimentation is important, he said.
While it's important to reduce the failure rate of data science projects, it's still acceptable to have ideas that don't pan out. Devoting significant time to projects that are doomed to fail because of poor data quality is one thing. But projects that fail early because an idea wasn't quite right help build a culture of experimentation.
"That feeling of fail-fast is very crucial to success," Macher said. "Making sure your infrastructure supports the full breadth of possible failures, whether on the model side or deployment side, is very crucial to building a good data science culture that's focused on building infrastructure for applications."
Focusing on early growth
Growth and knowledge need to be prioritized, according to Macher. And that manifests in two ways.
First, the growth of the business needs to be a top priority for data science teams. Projects should be about reducing risk and optimizing opportunity. Second, the growth and knowledge of the data science teams themselves are critical.
When developing a data science strategy and undertaking data science projects for the first time, organizations should start small and then grow. The first projects shouldn't be massive undertakings with a broad scope. Instead, even if the eventual value gained from the first projects is minimal, small projects that succeed will eventually lead to bigger projects with the potential for bigger results.
"That's going to set your data science team's culture in the right direction," Macher said. "That first project you land is going to define the direction of your data science team. Making sure that there's an initial win focused on growth will not only set the team in the right direction but also jumpstart what we call the flywheel of machine learning."
Data scientists will build models that DevOps teams hand off to product managers. Those models are then used to fuel growth, which brings in more data for the data scientists, who can then build more and better models.
"And then that flywheel keeps spinning," Macher said.
Meanwhile, a mistake organizations often make happens at the hiring level. They hire someone with deep data science knowledge to be a leader and then fill in the positions below that top level.
Instead, they should hire people at what Macher calls the advanced middle, who have room to grow. Those people are senior enough to have seen projects done before and know what pieces of infrastructure they need, but not so senior that they are removed from being hands-on.
"Organizations can then promote, grow and scale from there," Macher said.
Hiring a data scientist
Finally, if organizations want to reap the benefits of data science projects, they need to hire data scientists to do the work, according to Macher.
Data engineers and data analysts are important to organizations, but they're not data scientists. They have different skills than data scientists and are trained to think in different ways.
Data engineers are often experts in machine learning, but they lack the advanced math and statistics skills needed to solve business problems, Macher said. Analysts, meanwhile, usually have the math and statistics skills, but not the technical skills.
Data scientists have both.
"We need to hire data scientists," Macher said. "They're not data engineers and they're not data analysts."
Beyond the different ways data scientists approach projects, data scientists are needed to mentor other data scientists and provide the expertise that results in action.
"Providing opportunity requires providing expertise," Macher said. "Even if we want to hire those promising engineers or those incredibly talented analysts, if we don't have the data scientist on the team with the data science experience that's required, then the engineers and analysts bring great engineering practices and analytics thought, but not a lot of data science solutions that gain traction."