Jeff Herman is not a locomotives expert. His background is in math and engineering, followed by additional study in data science. But when he was a data scientist for a railroad company, his job was to look at data about locomotives in order to make predictions about them.
Since he wasn't a locomotives expert, he needed to find people at the company who were.
"Having good relationships with those people would make my job easier," he said. And relationships don't just happen, he said. "A big part of teamwork is providing value to the people on the other side."
For example, he noticed that some analysts were producing Excel reports that required obnoxious amounts of manual labor.
"I was able to automate some of those processes with an automated Python script, and that helped build relationships," he said. "Teamwork is a two-way street. You provide value, you get value."
Relationship building and teamwork are key soft skills for data scientist jobs. Other necessary skills, such as communication, ethics and an understanding of business value, aren't the ones people normally associate with data science.
One skill every data scientist should have is the ability to communicate with a nontechnical audience.
"That's the one skill I think is the most important," said Herman, who is now a data science instructor at New York City's Flatiron School. "The main thing is being able to meet the audience where they're at."
In his previous job at the railroad, Herman focused on results when explaining his work to the rest of the company, emphasizing the reasons the model was important.
"If you can't explain your model, your model isn't going to go into production," Herman said. "If I couldn't explain why this model was going to predict coal trains more accurately than how we used to predict coal trains, then it's just research I did for fun, and nothing important was going to come out of it."
But can people who aren't natural communicators develop this skill? Absolutely.
In addition to a technical write-up, Herman's students are required to make a slide deck for a nontechnical audience as part of their presentation.
"They'll present it to an instructor, and the instructor will pretend they're a nontechnical person and ask business-related questions," he said.
Outside the classroom, data scientists can practice explaining their projects to friends or family members, at Meetup groups or at Toastmasters events.
With the pandemic, a lot of Meetup events, technical conferences and other groups where a data scientist might go to make a presentation have gone online, but there are still opportunities for people to give presentations. And within companies, even if there is nobody in the office, there still will be presentations, Herman said.
"If I made a model, it's not going to go into production unless I present it, even if it's a Zoom type of meeting," he said.
Critical thinking skills are important in any profession, but in data science they are especially urgent when it comes to data sources.
"Spend a little extra time getting to know and understand your data set," Herman said. "Don't just accept it. If you're not sure about a particular column or value, do some research -- see if there's a reason for it."
Data scientists often practice their skills with personal projects and get their data sets from scraping websites or downloading files, he said.
Rather than jumping straight into the analysis, data scientists can treat these projects as an opportunity to examine the data set first. That habit can help make or break a project in an enterprise setting.
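The kind of first look at a data set described above can be sketched as a quick column profile. This is a minimal illustration, not Herman's method: the sample rows, the column names and the `profile_columns` helper are all hypothetical, and a real project would likely use a data analysis library instead.

```python
from collections import Counter

# Hypothetical rows, standing in for a scraped or downloaded file.
ROWS = [
    {"train_id": "A1", "cars": "110", "cargo": "coal"},
    {"train_id": "A2", "cars": "", "cargo": "coal"},
    {"train_id": "A3", "cars": "-1", "cargo": "Coal"},
    {"train_id": "A4", "cars": "95", "cargo": "grain"},
]


def profile_columns(rows):
    """Summarize each column: missing count, distinct count, most common values."""
    columns = {key for row in rows for key in row}
    profile = {}
    for name in sorted(columns):
        values = [row.get(name, "") for row in rows]
        # Treat empty strings and None as missing values.
        present = [v for v in values if v not in ("", None)]
        counts = Counter(present)
        profile[name] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(3),
        }
    return profile


if __name__ == "__main__":
    for name, stats in profile_columns(ROWS).items():
        print(name, stats)
```

In this made-up sample, the profile surfaces an empty `cars` value, a negative car count and a `coal`/`Coal` casing split, which are the sort of oddities worth researching before any modeling starts.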
Critical thinking isn't just about being critical of data sets. It's also about being critical of algorithms. And it starts with the most basic algorithm question of all: whether a problem requires machine learning or can be addressed with traditional statistics or some other approach.
"It's very easy to apply machine learning to everything, but in reality, only a few problems qualify for a machine learning solution," said Sachin Gupta, co-founder and CEO at HackerEarth, an online coding platform.
"One of the questions I ask often when interviewing candidates is to list the number of mobile applications they've used since they woke up in the morning," he said. "Then I ask their own opinion about which apps could benefit from machine learning to provide more value and why."
Data scientists often fall short when it comes to connecting theory and practice, said Bobby Rountree, data intelligence lead at Hitachi Vantara Federal, which provides technology services to federal agencies.
"The value is in helping customers have better ideas, make better investments and make better decisions," he said.
Rountree admits that when he first began his career, he didn't start out with a business mindset. Fortunately, this is a skill that can be learned.
"Being able to surround myself with people who thought about business first, that is how I was able to advance my career," he said.
Data scientists need to understand what makes sense in their particular industry, in their company and even in a specific department or job function. They also need to be able to ask the right questions to find out exactly what the customers or users want.
"Sometimes they have no idea of what they want, or they change their minds," he said. "You have to be able to adjust on the fly."
Data scientists may need to be investigators, said Kathleen Featheringham, director of AI strategy and training at Booz Allen Hamilton.
"The number one thing we get asked is, 'We want to do AI.' And we ask them, "Do what?" she said. "And just because you can, doesn't mean you should."
Once data scientists identify a business problem that needs solving, the other side of the business value question is whether the solution will be used. There might be cultural or management obstacles to adoption or other deployment issues that could kill a data science project even if everything else went right.
"People might rebel," Featheringham said. "They might think that robots are coming for their job. So you need to look at the psychological aspects and make sure you address both the technical and the human elements. The worst is creating something that nobody is going to use."
Another common problem is that when users or customers explain the scope of a problem, they leave out key parts of the workflow because they are so used to doing them that they no longer think about them.
Another often overlooked aspect of the business value of data science projects is in the management area. If a company sends people out to be trained in Python, but the performance assessments aren't changed and employees aren't rewarded for using their new skills, the training will have been wasted, Featheringham said.
Data scientists can start by trying to imagine themselves as one of their users, said Andrea Levy, data analytics lead at Alation, an enterprise data company.
Put on the figurative hat of someone else in the organization and ask yourself what they would care about, she suggested.
"Another way to better understand the big picture is by engaging with other teams in an informal setting," Levy said. Learn about what they do, ask them about the data they use and generate, she said.
Data scientists have a great deal of power when they're building predictive models. The choices they make in selecting data sets, in prioritizing some features over others, and in how they use data can affect the success of a project -- even the viability of a company.
Perpetuating or exacerbating existing biases or creating new ones is just one potential problem. Privacy violations could result in bad PR or compliance problems, or even doom a company just as it's getting off the ground.
Just because data can be collected doesn't mean it should be. Sometimes, the problems aren't immediately obvious.
"If you're working at a bank, you might not be comfortable predicting a loan based on gender," Herman said. "But a different feature might be highly correlated with gender."
Being able to explain your actions has a role to play here as well. It not only makes it easier to communicate the value of a model to business stakeholders, it can also help people determine whether decisions are being made within ethical guidelines.
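The proxy problem Herman describes, where a seemingly neutral feature is highly correlated with a protected attribute, can be screened for with a simple correlation check. The sketch below is one possible approach under stated assumptions, not a complete fairness audit: the feature names, the sample values, the `flag_proxies` helper and the 0.7 threshold are all hypothetical, and a linear correlation will miss nonlinear or combined proxies.

```python
import math

# Hypothetical data: protected attribute encoded as 0/1, plus candidate features.
PROTECTED = [0, 0, 0, 1, 1, 1]
FEATURES = {
    "income": [40, 55, 61, 42, 58, 60],
    "part_time_years": [0, 1, 0, 4, 5, 3],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def flag_proxies(features, protected, threshold=0.7):
    """Return features whose absolute correlation with the protected
    attribute meets the (assumed, arbitrary) threshold."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


if __name__ == "__main__":
    print(flag_proxies(FEATURES, PROTECTED))
```

With this made-up data, only `part_time_years` is flagged. A flagged feature is a prompt for human review, not proof that using it is unethical.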
No data scientist can know everything, and even if one did, the profession is evolving so quickly that such mastery would last only a moment.
"When you come across a new problem, you need to be comfortable doing new research and learning something new," Herman said. "That's what I call the lifelong learner or the hacker mindset."
Part of it is having a natural curiosity, said Rob Harper, chief science officer at Moogsoft, a San Francisco-based AIOps technology company.
"Sometimes data science can be more educated art than science," he said. "And, as with many fields, some breakthroughs happen by chance. Having perseverance and knowing when to keep looking -- in the right places -- is very important."
Data science soft skills make the difference
According to a survey released late last year by MIT and the Boston Consulting Group, 40% of organizations making significant investments in AI do not report business gains from AI.
The technology is there. What's missing are people who can identify areas where the technology can provide meaningful business value, in a way that works for all stakeholders. Because of this, the key lies in the soft skills for data scientist jobs, not just technical abilities.