Becoming a data scientist requires determination and hard work, but the rise of open online courses and open source coding languages has ensured that data science skills are more accessible than ever.
Universities and colleges offer online and in-person coursework covering the skills needed for every position in the field. Aspiring data scientists can take courses in neural networks and deep learning, the Hadoop platform and application framework, and R and Python programming.
Now there are also open online courses hosted by sites unaffiliated with universities. These courses use the same tools as traditional classes and promote self-education. Whether you are completely new to the field or looking to specialize, there are multiple paths to consider.
Levels of data science
Peter Krensky, senior research analyst for Gartner's Business Analytics and Data Science team, divides data science expertise into three levels.
The top level consists of true data scientists. These people are highly trained and can understand and engage with every part of the data science lifecycle.
"They can source, prepare and engineer their own data as well as build their own models and apply machine learning," Krensky said. "Then they can also deploy and manage those models and understand what we call machine learning operationalization and how it works."
This group makes up only a small portion of the data science workforce, but for good reason it receives a lot of attention. Data scientists are capable of handling everything. For time and resource management reasons, however, they typically perform only some of these duties. Most often, they are supported by the second of Krensky's levels: citizen data scientists.
"A citizen data scientist is anybody with a quantitative or technical background that was not primarily focused in machine learning, that is upskilling into machine learning," Krensky said.
This group is much larger than the top tier and includes anyone who can handle some, but not all, of the tasks of a true data scientist.
The third level, just below citizen data scientists, consists of people who have a general understanding of the technologies and can relay that information to consumers. They support those in the upper levels, but most of their insights are aimed at the consumer level.
Making data science more accessible
Entering and climbing this hierarchy is more feasible today because of the dramatic increase in open source algorithms and open online courses. Going to college and graduate school specifically to become a data scientist is no longer the only option, though it still dominates in the uppermost echelon of data science.
"The barriers to entry have really been disintegrated by the amount of free, high-quality education that's available. And the amount of free desktop tools that are the same tools experts use," Krensky said. "All that stuff is downloadable for free on the desktop."
In its State of Data Science and Machine Learning 2019 survey, Kaggle, an online data science community, found that over 70% of its data scientist respondents held a degree above a bachelor's.
Universities are increasingly offering night programs and courses on the essential skills required to join any level of the data science ladder. Whether someone is just finishing high school or is seeking more familiarity with other parts of their field, continued learning is just as important as prior education in data science.
John Sullivan, an HR thought leader and author, discussed the hiring process, emphasizing just how crucial it is to be adaptive.
"It's continuous learning," Sullivan said. "Because whatever you know today will be obsolete tomorrow."
From the moment you get your degree, your knowledge starts to go out of date, Sullivan said. He added that companies should look for people who have the ability to handle the job rather than just the degree. Being able to prove that you have these skills is a competitive advantage.
The availability of tools, the importance of certain coding languages and the demands on those in the field have all shifted in recent years. As a result, both citizen data scientists and traditional data scientists constantly need more education. Kaggle's survey makes this clear: over 95% of data scientist respondents said they use media sources to improve their skills, including blogs, Kaggle itself, journals and online course forums.
Employers should help employees already in the field get more education. When it comes to upskilling, the relationship between employer and employee can be mutually beneficial. By investing in home-grown talent, companies can avoid paying the typically high salary of a seasoned data scientist.
"Upskilling quantitative professionals you already have is a much more appealing option from that perspective," Krensky said. "People will pay for university fees, will pay if there's any fees on an online course or will do what I call time equity, where they devote a certain amount of the person's hours to their pursuit of an online degree in the field."
This in turn moves the employee further up the data science hierarchy and, therefore, increases their value.
Tools that can help you climb
According to Krensky, Python is the dominant language for data science programmers. When he joined Gartner in 2016, the company's survey found that 25% of corporate data science teams used the language. That number now sits at over 90%.
Python proficiency can be gained through university programs or through a combination of newer open programs and online courses on platforms such as Coursera, edX, Udemy and DataCamp.
"Through working with those and downloading the free tools and engaging in data science communities like Kaggle, they essentially get a free massive open online course education," Krensky said.
There may not yet be a way to fully replace formal education in this field, but these platforms and the range of options they offer give more people more opportunities.
Someone who takes one path will not necessarily reach the same skill level or productivity as someone who takes the other, but their tool kits will be nearly identical. The algorithms, techniques and machine learning frameworks are essentially the same, whether learned in a classroom or online.
"I like to lay out people's options," Krensky said. "There's not a best practice. It depends on the individual, depends what their goals are and what they want to accomplish."