Editor's note: This article was originally reported and published in April 2021. It was updated with new information in July 2022.
The data scientist is one of the most sought-after roles in corporate America today because organizations armed with the right talent can drive more value from their data.
However, data scientist roles are evolving as a matter of technological innovation and market maturity. In fact, the titles of statistician, actuary and quant, depending on the industry, preceded the title of data scientist.
There are some challenges in determining how the data scientist role is changing. For one, despite the high demand for data scientists, there aren't clear requirements for the job.
What is data science?
Data science, as defined by today's industry professionals, is the study and use of data to inform business decisions and create new customer-facing products. Data scientists are typically responsible for data analytics to find new insights. They often work with advanced machine learning models to predict future customer or market behavior based on past trends.
The ultimate goal of what businesses hope to get from data scientists isn't expected to change. But how data scientists accomplish those goals is likely to undergo substantial alterations in the years ahead.
Does data science have a future?
Experts have said that 80% or more of a data scientist's job is getting data ready for analysis. Now, technology providers sell platforms that automate tasks and abstract data into low-code or no-code environments, potentially eliminating much of the work currently done by data scientists.
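To make "getting data ready" concrete, here is a minimal Python sketch of the kind of cleanup that such platforms aim to automate. The sample records, the column names and the `prepare` function are all hypothetical and invented for illustration. The steps shown (normalizing labels, dropping duplicates, filling gaps with the median) are one common approach, not a description of any particular product.

```python
import csv
import io
from statistics import median

# Hypothetical raw export, invented for illustration: inconsistent region
# labels, a repeated record and two unusable spend values.
RAW = """customer_id,region,monthly_spend
101,East,49.90
102,west,
101, east ,49.90
103,West,120
104,EAST,n/a
"""


def prepare(raw_text):
    """Normalize labels, drop duplicate records and fill missing spend."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Normalize free-text labels so "East", " east " and "EAST" match.
        region = row["region"].strip().lower()
        # Treat blank or unparseable spend values as missing.
        try:
            spend = float(row["monthly_spend"])
        except (TypeError, ValueError):
            spend = None
        record = (row["customer_id"].strip(), region, spend)
        # Skip records that are identical after normalization.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(
            {"customer_id": record[0], "region": region, "monthly_spend": spend}
        )

    # Fill missing spend with the median of the known values
    # (an assumed policy; other imputation choices are equally plausible).
    known = [r["monthly_spend"] for r in cleaned if r["monthly_spend"] is not None]
    fill = median(known)
    for r in cleaned:
        if r["monthly_spend"] is None:
            r["monthly_spend"] = fill
    return cleaned


for row in prepare(RAW):
    print(row)
```

Even in this toy case, each step encodes a judgment call, such as whether the median is a sensible fill value, which is the part a tool cannot decide on its own.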
"[The data scientist title] will probably fade into the background because more tools are becoming prevalent," said Kathleen Featheringham, director of AI strategy and training at management and IT consulting firm Booz Allen Hamilton. "To me, it's like website design years ago when you had to have people who really like code, but now you can go online and use a tool that will build your website for you."
Will AI and automation replace data scientists?
Predicting the future of artificial intelligence requires understanding its past. The earliest realm of data science -- analytics or stochastics -- incorporated probability theory and analysis into programming. The R language emerged as an open source equivalent of SAS and SPSS, two ancient analytics packages that trace their lineage back to Fortran. Python's incorporation of similar packages made it the go-to language for combining the results of such data analysis with other components.
These gave way to visual pipeline tools such as Alteryx or Microsoft Power BI, which reduced the need for programming experience, yet required enough understanding of statistics to know what these packages were doing. It is unlikely that the need for competency in modeling such pipelines will ever fully go away, so while the notion of being a dedicated data scientist will fade, the need for subject-matter-expert analysts will continue.
On the other hand, the argument can be made that the field of machine learning engineering, which requires an understanding of higher-level mathematics, is already moving outside the realm of the data scientist. This falls into the realm of adaptive cognitive science, where programmatic neurons handle tasks such as speech generation, image recognition, contextual classification and similar areas.
Finally, graph cognition, which uses mathematical graphs to support inferential analysis, was outside the realm of "formal" data science for some time but is now being drawn back into the machine learning engineer role because pure machine learning solutions tend to be inadequate for building inferential systems. One area that is becoming especially intriguing today is neural networks as graphs, while the emergence of Bayesian and Markov blankets within graph systems offers an entirely novel way to manage predictive analytics.
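For readers unfamiliar with the term, a node's Markov blanket in a Bayesian network is its parents, its children and the other parents of those children. Given those variables, the node is conditionally independent of everything else in the graph. The Python sketch below computes a blanket from a list of directed edges. The `markov_blanket` function and the churn network are hypothetical, invented for illustration, and are not drawn from any specific graph platform.

```python
from collections import defaultdict


def markov_blanket(edges, node):
    """Return the Markov blanket of `node` in a directed acyclic graph.

    `edges` is an iterable of (parent, child) pairs. The blanket is the
    node's parents, its children and the other parents of its children.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)

    blanket = set(parents[node]) | set(children[node])
    for child in children[node]:
        blanket |= parents[child]  # co-parents of each child
    blanket.discard(node)          # the node is not part of its own blanket
    return blanket


# Hypothetical customer-churn network, invented for illustration.
EDGES = [
    ("price_increase", "support_tickets"),
    ("outage", "support_tickets"),
    ("support_tickets", "churn"),
    ("competitor_offer", "churn"),
    ("churn", "revenue_drop"),
    ("season", "revenue_drop"),
]

print(sorted(markov_blanket(EDGES, "churn")))
# ['competitor_offer', 'revenue_drop', 'season', 'support_tickets']
```

In this toy graph, predicting churn needs only the four variables in the blanket. Price increases and outages matter only through their effect on support tickets, which is the sort of pruning that makes blankets attractive for predictive work.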
As is typical of careers within the technology space, the data scientist as a distinct entity is fading away, but emerging careers demonstrating the advancement of programming into these areas are as important as ever.
How will quantum computing impact data science jobs?
Quantum computing and quantum information science are still in their infancy, but they represent a new market for data scientists.
"If you're doing a calculation on a classical computer and you have a bunch of initial inputs, you have to run them one at a time. On a quantum computer, you can run them through at the same time," said Patty Lee, chief scientist at Honeywell Quantum Solutions.
"You can't just take a classical computing algorithm and plug it into a quantum computer. You have to come up with new algorithms that take advantage of quantum mechanical properties and then you can extract the information out of your data that way," she said.
Quantum data scientists must understand quantum mechanics and how to use a quantum algorithm to solve a particular problem. However, Lee doesn't think they necessarily need an advanced degree in the subject.
"We need a lot of people to be in that space because there are people in the application side of businesses and quantum theorists who are well-versed in the quantum algorithms. We need someone in the middle to do the translation," Lee said.
Data scientist vs. data engineer jobs
In today's world, a company is better off having the right mix of data-driven skills as opposed to the right mix of titles.
Still, titles help individuals and others understand the scope of their responsibilities and their pay scale. Even people who have achieved the coveted data scientist title may grow into another role because it suits them better or their company needs something else.
While it's more likely that a data engineer might become a data scientist in the U.S., the opposite trend is happening in the U.K., according to Rob Weston, founder of Heimdal Satellite Technologies.
"There's an expectation that they're going to work only on machine learning, which is absolutely not the case. How do I get the data ready? How is the data going to be moved to the pipeline?" Weston said. "The challenge is the volume and diversity of data are changing and therefore the ability to handle and move data around, that's an engineering problem."
Many organizations think they need a data scientist, but that may not be the case. Staffing firm ManpowerGroup is aware of this phenomenon, so it first asks customers what business problem they're trying to solve.
"A lot of people hear buzzwords and they want those buzzwords, but it's not really what they need," said Chuck Kincaid, a principal data scientist and product architect at Experis Solutions, a subsidiary of ManpowerGroup.
Kincaid said one of his biggest concerns now is candidates who list software tools on their resume they don't know how to use properly. Similarly, he warns of candidates who attempt to take full credit for a group project.
Basic qualifications for data scientists
The Data Science Association, a nonprofit professional association of data scientists, wants to set standards for data science certifications and licensure. From a career standpoint, it would mean that data scientists would need to meet predefined criteria to apply for a license and anyone who is not a licensed professional could not use the title legally.
Weston makes a point of verifying a candidate's qualifications and is often disappointed. For example, if he gives a candidate a hypothetical scenario, "49 out of 50" candidates will say they've never worked in the industry in which the hypothetical scenario takes place, rather than demonstrate their problem-solving prowess and come up with an answer.
"I interviewed a guy recently who had a massive CV that said data science, big data and lots of roles in all the areas we are looking for. We need highly sophisticated analytics, as we're dealing with data in the petabyte range," Weston said. "I said, 'We are using Python for most of our code. How can we use Python within EMR Spark? What libraries could we use?' He could not answer the question and had never even heard of PySpark. It's a fair question since his CV stated three years of experience doing exactly this."
Data science degrees and certifications
Many top data scientists tend to have advanced degrees in math or statistics and are masters at problem-solving. Others have a background in computer science, astrophysics or other subjects.
"Do I believe that data scientists must have those specific degrees? No. Absolutely not," Featheringham said. "There have been a lot of definitions, but it's inherently somebody who's curious."
Like any other role, a data scientist may evolve into something else, and there are a few indicators that this will happen.
Ultimately, the role of the data scientist is changing, although exactly how it's changing is a matter of debate. Automated solutions are accelerating and simplifying some tasks, but they are not automating data scientists out of a job just yet. Meanwhile, other opportunities are emerging, such as quantum data science.
Will data scientist careers eventually disappear? Some think they will. However, in the meantime, there is plenty of opportunity for those who have mastered their craft.