Many companies are hiring data scientists in the information-driven economy, but the profile of these candidates is changing rapidly. It's not that there isn't any room for Ph.D. statisticians or mathematicians who are also R or Python coders; it's that there aren't enough of them to go around. Another issue is that not all of those brainiacs have great people skills or the ability to solve business problems.
Does that mean companies should just hire anybody who calls themselves a data scientist? Certainly not, but companies would be wise to think about their actual talent requirements, which may differ depending on whether a company is hiring its first data scientist or adding to an existing team.
Qualification requirements have changed
When Deloitte AI Institute executive director Beena Ammanath worked for E-Trade, data scientists (called analysts or statisticians at the time) did all the data predictions and simulations, alongside a BI team that produced historical reports and an ETL team. When Hadoop and machine learning started becoming popular, organizations looked for data scientists with doctoral degrees and tasked them with data collection, cleansing and preparation as well as setting up tables.
"They were horrible at it so you could never scale it beyond the lab environment," said Ammanath. "We still see companies having data scientists doing 12 to 15 different tasks, but the more mature companies are separating them out into data engineers, data visualization engineers, QA and testing, machine learning engineers [and] MLOps engineers."
That fragmentation of the data science role is actually a good thing, given the shortage of top-tier data scientists. Also, Ph.D. data scientists are expensive, so it's more economical to spread the work across a range of specialized roles.
Meanwhile, data science tools have become simpler to use, which enables more people to connect to data sources, prepare data, build models and analyze data.
"If you look at existing programs offering degrees in data science, they don't necessarily agree on what the qualifications are," said Alin Deutsch, chief scientist at graph database and graph analytics platform provider TigerGraph and director of the data science program at the University of California, San Diego (UCSD). "At [TigerGraph and UCSD] we want data scientists to have a foundation in several different areas which involve understanding the principles of these foundations."
Understanding data science principles is important because languages and tools are evolving so rapidly that whatever data scientists use today might not be what they use tomorrow. If a data scientist has a strong background in statistics, even an undergraduate degree might suffice.
However, organizations hiring a chief data scientist or a chief AI officer should consider how the two roles differ.
"A chief data scientist is somebody who has a Ph.D., who has deep knowledge of the technology, who does hands-on coding and has knowledge of the latest and greatest in terms of AI research," Ammanath said. "A chief AI officer is someone who understands computer science and has an MBA."
When hiring a data scientist, Deutsch looks for fluency in an object-oriented language, experience applying statistics and probability, and knowledge of some machine learning techniques and commonly used algorithms, such as those for clustering and linear regression. Ideally, candidates should also have experience visualizing large data sets and presenting them to non-expert audiences, he added.
"They should not learn those on the job," Deutsch said. "As for how to apply them, it's ideally in an end-to-end pipeline where they had to start with data that was not clean, clean it, combine it, run machine learning algorithms over it and draw conclusions based on their statistical knowledge."
Candidates earn bonus points for having exposure to relational and NoSQL databases. More specifically, look for candidates who have experience speeding up, or the ability to speed up, queries over large data sets by building indices and pre-materializing computation.
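As a rough illustration of those two techniques, the sketch below uses Python's built-in sqlite3 module. The events and user_totals tables and their columns are hypothetical. SQLite has no materialized views, so a summary table built with CREATE TABLE ... AS SELECT stands in for pre-materialized computation, and EXPLAIN QUERY PLAN shows whether a query scans the whole table or searches an index. The exact wording of the plan text varies between SQLite versions.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical fact table: one row per purchase event.
cur.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
rng = random.Random(0)  # fixed seed so the generated data is reproducible
cur.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(rng.randrange(1000), rng.random() * 100) for _ in range(50_000)],
)

query = "SELECT SUM(amount) FROM events WHERE user_id = ?"


def plan(sql, params=()):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = cur.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(query, (42,))  # no index yet: SQLite must scan every row

# 1) Index: lets SQLite jump straight to one user's rows.
cur.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(query, (42,))  # now a search using idx_events_user

# 2) Pre-materialization: compute every user's total once, up front,
#    so later lookups read a single precomputed row.
cur.execute(
    "CREATE TABLE user_totals AS "
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
)
cur.execute("CREATE INDEX idx_totals_user ON user_totals (user_id)")

direct = cur.execute(query, (42,)).fetchone()[0]
precomputed = cur.execute(
    "SELECT total FROM user_totals WHERE user_id = ?", (42,)
).fetchone()[0]

print("before index:", before)
print("after index: ", after)
print("same answer: ", abs(direct - precomputed) < 1e-6)
```

A strong candidate should also be able to articulate the trade-offs: an index adds storage and write cost, and a pre-materialized table goes stale as soon as new events arrive, so it has to be rebuilt or maintained incrementally.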
If you're hiring a data scientist who is a recent graduate, look for relevant school or extracurricular projects instead.
"If I had to give advice to somebody in school today, I would tell them, 'If you do a project at school and you want to be a data scientist, choose a project where data is the heart of the project, like building a recommendation system where you really have to deal with the complexity of data,'" said Ira Cohen, chief data scientist at Anodot, an AI platform provider that focuses on anomaly detection for business monitoring.
Verifying the fit
Organizations have different ways of testing data scientists to determine whether they're qualified for the position. For example, Cohen provides a candidate with a business problem, snippets of data and the desired outcome.
"I want to hear first I will do A, then I will do B," said Cohen. "Then I say, 'OK, you said A, describe it.' If you know what data science is and you know how to do it, this is natural."
Deutsch provides a problem to be solved asynchronously: The candidate takes the problem home and then returns with a solution.
"One could take a small project along these lines -- clean it up, learn something from it, give me a conclusion," Deutsch said.
Depending on their own stacks, some organizations test for knowledge of specific languages such as Python or R, or for experience with technologies such as Snowflake or Spark. However, it's important to remember that a tool-only test is probably shortsighted, given the rate at which languages, technologies and tools evolve.
Fundamentally, problem-solving capabilities are critical regardless of how much education and experience a candidate has, because even some of the most brilliant minds produce lab results that don't translate well to production. What businesses actually need are data scientists who can help them meet their business goals.