What is a data scientist? What do they do?
A data scientist is an analytics professional who collects, analyzes and interprets data to transform it into actionable insights that can facilitate decision-making. The data scientist makes sense of large amounts of raw data to help answer questions and address business challenges.
The data scientist role merges several traditional and technical jobs, including mathematician, scientist, statistician and programmer. The work uses advanced analytics, such as machine learning and predictive modeling, and applies mathematical and scientific principles to collect, process, analyze and interpret data.
The data scientist job title emerged in 2008 on various social media sites; four years later, a Harvard Business Review article called it "the sexiest job of the 21st century."
Demand for data science skills has grown as companies seek useful information from massive volumes of big data, and hope to harness artificial intelligence (AI) and machine learning technologies in new types of analytics applications. As data volumes explode, many organizations now rely on data scientists to interpret data and present it in an actionable form.
What does a data scientist do?
In business settings, data scientists typically mine data for information used to predict customer behavior, identify revenue opportunities, detect fraudulent transactions and meet other business needs. They support organizations in a range of industries and sectors, including finance, healthcare, telecommunications, technology, media and entertainment, government, utilities and law enforcement.
Data scientists work on initiatives that use large amounts of data to develop and test hypotheses, make inferences and analyze areas such as markets, financial risks, cybersecurity threats, stock trades, equipment maintenance needs and medical conditions. They use mathematical and statistical techniques, computer science and machine learning capabilities, and their understanding of a business or industry to uncover useful and actionable insights, patterns and trends from the data.
Data scientists are commonly tasked with finding and interpreting information that enables better marketing campaigns, customer service, supply chain management, and business decisions and strategies. They do this by analyzing sets of quantitative and qualitative data, depending on the needs of specific applications and organizations.
Sometimes, data scientists explore data without being given a specific business problem to solve. In these scenarios, they must understand both the data and the business well enough to formulate questions, generate hypotheses, do the analysis and deliver insights to business executives on ways to improve business operations, products or services.
In many organizations, data scientists are responsible for helping to define and promote best practices for data collection, preparation and analysis. Some data scientists develop AI technologies for internal use or for customers. These can include conversational AI systems, AI-driven robots and other autonomous machines, including key components of self-driving cars.
In general, the basic responsibilities of a data scientist include the following:
- Gathering and preparing relevant data to use in analytics applications.
- Using various analytics tools to detect patterns, trends and relationships in data sets.
- Cleaning and integrating data, and then loading it into a data warehouse, data lake or other storage repository.
- Developing statistical and predictive models to run against the data sets.
- Creating data visualizations, dashboards and reports to communicate findings to decision-makers.
Data scientists typically present their insights to decision-makers and other stakeholders as reports, stories, illustrations and other visualizations that show what the data means in nontechnical terms. This helps end users understand the data and use it to inform strategic planning, problem-solving and decision-making.
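As a rough illustration of the prepare, load and report steps above, the following Python sketch cleans a handful of hypothetical transaction records, loads them into an in-memory SQLite table standing in for a data warehouse, and produces a plain-language summary. The record fields and the function names (`prepare`, `load`, `summarize`, `report`) are assumptions made for this example; a production pipeline would pull from real sources, write to a real repository and hand results to a visualization tool rather than printing text.

```python
import sqlite3

# Hypothetical raw transaction records; None marks a missing amount.
RAW = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": None},
    {"region": "east", "amount": 80.0},
    {"region": " West ", "amount": 200.0},
]


def prepare(rows):
    """Drop rows with a missing amount and normalize region labels."""
    return [
        (row["region"].strip().lower(), row["amount"])
        for row in rows
        if row["amount"] is not None
    ]


def load(rows):
    """Load cleaned rows into an in-memory SQLite table (a stand-in warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def summarize(conn):
    """Count transactions and total the amounts for each region."""
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()


def report(summary):
    """Turn the aggregates into plain-language lines for decision-makers."""
    return [
        f"{region}: {count} transaction(s), total ${total:,.2f}"
        for region, count, total in summary
    ]


if __name__ == "__main__":
    connection = load(prepare(RAW))
    for line in report(summarize(connection)):
        print(line)
```

Normalizing labels such as " West " before loading matters here: without it, the aggregate query would report "west" and " West " as separate regions.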
Besides analyzing data to glean insights, data scientists also handle the following tasks:
- Collecting and extracting structured and unstructured data from various sources using methods like scraping, manual data entry and real-time data streaming.
- Improving data quality by cleaning, deduplicating or combining it.
- Creating data models for predictive analytics and machine learning.
- Configuring and implementing data analysis tools.
- Writing algorithms to automate data processing.
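A minimal sketch of collecting, cleaning and combining data, assuming two hypothetical extracts of the same customer list, one delivered as JSON and one as CSV. It uses only Python's standard `json` and `csv` modules. The field names and helper functions are invented for illustration, and normalizing the email address before deduplicating is just one simple matching rule among many.

```python
import csv
import io
import json

# Hypothetical extracts of the same customer list from two sources.
JSON_SOURCE = (
    '[{"email": "Ana@Example.com ", "name": "Ana"},'
    ' {"email": "bo@example.com", "name": "Bo"}]'
)
CSV_SOURCE = "email,name\nana@example.com,Ana\ncy@example.com,Cy\n"


def extract_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def extract_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(record):
    """Trim whitespace and lowercase the email so duplicates compare equal."""
    return {
        "email": record["email"].strip().lower(),
        "name": record["name"].strip(),
    }


def combine(*sources):
    """Merge all sources, keeping the first cleaned record seen for each email."""
    merged = {}
    for source in sources:
        for record in source:
            row = clean(record)
            merged.setdefault(row["email"], row)
    return list(merged.values())


if __name__ == "__main__":
    customers = combine(extract_json(JSON_SOURCE), extract_csv(CSV_SOURCE))
    for customer in customers:
        print(customer)
```

The "Ana" record appears in both sources with different capitalization and trailing whitespace; after cleaning, it collapses to a single entry, leaving three unique customers.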
Characteristics of an effective data scientist
Data scientists should have certain traits and skills that include the following:
- Intellectual curiosity.
- Critical thinking.
- A healthy skepticism.
- Good intuition.
- Problem-solving abilities.
- Creativity.
Data scientists typically work on a data science team that includes data engineers and data analysts.
Since they frequently communicate with diverse audiences, interpersonal skills and the ability to work well with others are critical requirements. Data scientists should be strong communicators who can use data storytelling capabilities to present and explain data insights to executives, managers and other employees. They need leadership capabilities and business savvy to understand the organization and its data requirements, and to help steer a data-driven decision-making process.
Technical skills and qualifications
Data scientists process and analyze data, and then uncover actionable insights. They must be able to complete a range of complex planning, modeling and analytical tasks in a timely manner. These requirements dictate that data scientists have mastery of the following skills:
- Data science tools and libraries.
- Development and collaboration tools, like Jupyter Notebook and GitHub.
- Big data platforms, such as Hadoop, Hive, Kafka and Spark.
- Statistical analysis tools like SAS and IBM SPSS.
- Programming languages, such as Julia, Python, R, Scala and SQL.
Technical skills required include data mining, predictive modeling, machine learning, deep learning, upfront data processing and data preparation. The ability to work with structured, semistructured and unstructured data is required, especially in big data environments with different types of data. Data scientists usually have experience with statistical research and analytics techniques, such as classification, clustering, regression and segmentation. Expertise in natural language processing (NLP) is another prerequisite.
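Clustering is a common way to do the segmentation mentioned above. The sketch below is plain k-means (Lloyd's algorithm) in standard-library Python, shown only to make the idea concrete. The two features (annual spend in thousands of dollars and store visits per month) and the data points are hypothetical, and seeding the centroids with the first k points is a simplification; in practice, data scientists typically use a library implementation such as scikit-learn's `KMeans`.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means: alternate assignment and centroid-update steps.

    Centroids start at the first k points, a simple deterministic choice;
    library implementations use smarter seeding such as k-means++.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical customers: (annual spend in $1,000s, store visits per month).
CUSTOMERS = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 1.5), (9.0, 8.0)]

if __name__ == "__main__":
    labels, centers = kmeans(CUSTOMERS, k=2)
    print(labels)
    print(centers)
```

On this toy data, the algorithm separates low-spend, infrequent customers from high-spend, frequent ones, which a marketing team could treat as two segments.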
Data scientist job postings typically call for the following skills and experience:
- Expertise in all phases of data science, from initial data discovery through data cleansing and model selection, validation and deployment.
- Knowledge and understanding of common data warehouse and data lake structures.
- Experience with using statistical approaches to solve analytics problems.
- Proficiency in machine learning frameworks.
- Familiarity with common data science and machine learning techniques, such as decision trees, K-nearest neighbors, naive Bayes classifiers, random forests and support vector machines.
- Experience with techniques for qualitative and quantitative analysis.
- The ability to identify new opportunities to apply machine learning and data mining tools to business processes for optimal efficiency and effectiveness.
- Experience with public cloud platforms and services.
- Familiarity with data sources, including databases and big data platforms, as well as public and private APIs and standard data formats, like JSON, YAML and XML.
- The ability to aggregate data from disparate sources and prepare it for analysis.
- Experience with data visualization tools, such as Microsoft's Power BI and Salesforce's Tableau.
- The ability to design and implement reporting dashboards that can track key business metrics and provide actionable insights.
- The ability to do ad hoc analysis and present the results in a clear manner.
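To make one of the techniques listed above concrete, here is a hedged sketch of a k-nearest neighbors classifier applied to a hypothetical fraud-screening task. Each training example pairs two invented features (transaction amount in thousands of dollars and number of transactions in the past hour) with a label. The function name `knn_predict` and the data are assumptions for this example, not part of any particular library.

```python
import math
from collections import Counter


def knn_predict(training_data, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    nearest = sorted(training_data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled transactions:
# (amount in $1,000s, transactions in the past hour) -> label
TRAINING_DATA = [
    ((0.10, 1), "ok"),
    ((0.20, 2), "ok"),
    ((0.15, 1), "ok"),
    ((5.00, 9), "fraud"),
    ((4.50, 8), "fraud"),
    ((6.00, 10), "fraud"),
]

if __name__ == "__main__":
    print(knn_predict(TRAINING_DATA, (0.12, 1)))
    print(knn_predict(TRAINING_DATA, (5.50, 9)))
```

Because k-NN relies on distances, features on very different scales should normally be standardized first; the toy data here skips that step. With two classes and an odd k, the vote cannot tie.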
Education, training and certifications
Most data science jobs require a bachelor's degree in data science or a related technical field. Many data scientists have advanced degrees in statistics, data science, computer science or mathematics. Candidates must have a basic understanding of the key concepts from these fields. Knowledge of concepts from other fields, like machine learning, deep learning, NLP and analytics, is preferred.
Prospective and experienced data scientists looking to show readiness for a job in the field can take advantage of boot camps and online courses that educational platforms such as Codecademy, Coursera, Kaggle and Udemy offer. Many universities have data science courses. For example, UC Berkeley offers the following courses:
- Foundations of data science.
- Computational structures in data science.
- Principles and techniques of data science.
- Data engineering.
- Data mining and analytics.
- Probability for data science.
- Data and justice.
Various certification opportunities for beginners and experienced professionals are available through universities, technology vendors and industry groups. IBM offers a beginner-level Data Science Professional Certificate. Harvard University offers an introductory Professional Certificate in Data Science. The Data Science Council of America offers numerous globally recognized certifications for data scientists at all career levels.
Organizations can implement internal programs to train talented and interested professionals working in other positions or fields to become data scientists. Those employees might include database developers and software programmers, as well as traditional scientists and other experts in particular disciplines.
Data scientist salaries
Because the desired combination of analytics skills, personality traits and experience is still somewhat elusive, qualified data scientists can command high salaries. According to job posting site Indeed, the average base salary for data scientists in the U.S. in late 2025 was nearly $130,000, based on about 4,800 reported salaries. The high end of the range reported was nearly $208,000.
The U.S. Bureau of Labor Statistics (BLS) projects a positive outlook for data scientist jobs. There were about 245,900 such jobs in the U.S. in 2024, and the BLS projects employment in this category will grow 34% from 2024 to 2034, much faster than the average for all occupations.
Data scientist vs. data analyst
The role of data scientist is often confused with that of data analyst. While there is overlap in many of the job responsibilities and required skills, there are significant differences.
A data analyst's duties can vary. They collect, process and analyze data, and also create visualizations and dashboards to report findings. Some data analysts design and maintain the databases and other data stores used in analytics applications. Generally, they don't have all the technical skills required of data scientists. They can also lack the business acumen and industry-level understanding that data scientists need.
Data analysts often support the work of data scientists and report to a data scientist when working on analytics initiatives. Since data analysts have fewer responsibilities, they often earn less than data scientists – about $84,000 on average in the U.S. in 2025 vs. nearly $130,000 for a data scientist, according to Indeed.
Data scientist vs. citizen data scientist
In addition to skilled data scientists, many organizations rely on citizen data scientists to do some analytics work. They can include business intelligence (BI) professionals, business analysts, data-savvy business users and other workers who get involved in data science initiatives. The differences between the two groups include the following:
- Education. Data scientists typically have relevant degrees, while citizen data scientists have a variety of educational backgrounds and little or no formal training in data science. Usually, citizen data scientists have gained enough experience with analytics tools and systems to create models and do relatively complex analysis work.
- Coding. Citizen data scientists rely on software that includes prebuilt analytical modeling tools, drag-and-drop features and user-friendly algorithms to perform standard analyses. That doesn't prevent them from discovering patterns or data points, but professional data scientists can create complex custom algorithms and approach data analysis in more advanced ways.
- Salary. Citizen data scientists earn less than data scientists. Although some get additional compensation for data science work, most do not.
- Tools. The range and complexity of tools used for the roles differ. Data scientists use data science tools with built-in statistical modeling or machine learning capabilities, programming languages, data visualization tools, data processing platforms, databases and machine learning frameworks. BI professionals use BI tools to prepare, mine, manage and analyze data. These tools help them visualize data, identify actionable information and generate descriptive insights to support decision-making.
Major areas of data science work
The key aspects of a data scientist's job include the following disciplines:
- Data preparation. The first step in data science applications is to collect and prepare the data to be analyzed. This involves gathering, cleansing, organizing, transforming and validating data sets. Data scientists often work with data engineers in the data prep phase.
- Data analytics. The analytics work of data scientists is aimed at improving business performance and helping organizations gain a competitive advantage.
- Data mining. Advanced algorithms are applied to the data being analyzed. Data scientists use the results those algorithms generate to create analytical models and uncover patterns and relationships in large data sets.
- Machine learning. Increasingly, machine learning drives data mining and analytics. Algorithms are built to learn from data sets and then find the desired information in them. Data scientists are responsible for training and overseeing machine learning algorithms as needed. Deep learning is a more advanced form of machine learning that uses multilayered artificial neural networks.
- Predictive modeling. Data scientists must be able to create predictive models of different business scenarios to analyze potential outcomes. Models can be built to predict how different customers will respond to marketing offers or to assess possible indicators of diseases.
- Statistical analysis. Data science work also involves using statistical analysis techniques to analyze data sets. This is a way to explore data and find and interpret trends and patterns.
- Data visualization. Findings from data science applications are usually organized into charts or other visualizations so target audiences can easily understand them. Often, data scientists combine multiple visualizations into reports, interactive dashboards or detailed data stories.
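Predictive modeling and statistical analysis often start with something as simple as a least-squares line. The following sketch fits a straight line by ordinary least squares to hypothetical figures for monthly ad spend and sales, reports R-squared as a measure of fit and uses the line to predict sales at a new spend level. The data and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical monthly ad spend ($1,000s) and sales ($10,000s).
AD_SPEND = [1, 2, 3, 4, 5]
SALES = [3.1, 4.9, 7.2, 8.8, 11.0]

if __name__ == "__main__":
    slope, intercept = fit_line(AD_SPEND, SALES)
    print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}")
    print(f"R^2 = {r_squared(AD_SPEND, SALES, slope, intercept):.3f}")
    print(f"predicted sales at spend 6: {slope * 6 + intercept:.2f}")
```

A close fit on five invented points says nothing about causation; real predictive models are validated on data they were not fitted to before anyone acts on their predictions.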
Challenges that data scientists face
Despite having what's seen as one of the best IT jobs, data scientists face challenges. Their work is complex, and because they often aren't given specific questions to answer or research areas to focus on, data scientists can't always be sure that what they do will meet business needs.
Gathering relevant data for analytics applications can be difficult, especially in organizations with data silos. Incorrect or inconsistent data can skew results. To avoid that, rigorous data profiling and cleansing are required upfront to identify and fix data quality issues.
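Data profiling can start with simple per-column counts. This sketch, written against hypothetical customer rows, reports missing values, distinct values and the number of values that differ only in letter case or surrounding whitespace, a common sign of inconsistent data entry. The column names and the `profile` function are assumptions for illustration, not the behavior of any particular profiling tool.

```python
def profile(rows):
    """Summarize each column: missing values, distinct values, and
    values that differ only by letter case or surrounding whitespace."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        distinct = set(present)
        normalized = {str(v).strip().lower() for v in distinct}
        summary[column] = {
            "missing": len(values) - len(present),
            "distinct": len(distinct),
            "case_or_space_variants": len(distinct) - len(normalized),
        }
    return summary


# Hypothetical customer rows with typical quality problems.
ROWS = [
    {"state": "NY", "zip": "10001"},
    {"state": "ny ", "zip": ""},
    {"state": "CA", "zip": "94105"},
    {"state": None, "zip": "94105"},
]

if __name__ == "__main__":
    for column, stats in profile(ROWS).items():
        print(column, stats)
```

Here the profile flags one missing value in each column and shows that "NY" and "ny " are probably the same state entered two ways, exactly the kind of issue to fix before analysis.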
Data preparation is time-consuming: By a commonly cited estimate, data scientists spend about 80% of their time finding and preparing data and only 20% analyzing it. This makes inefficient use of data scientists' time and abilities, which can impede on-time project completion and other outcomes.
Identifying and addressing biases in data science applications is a challenge, both in the data being analyzed and in algorithms and analytical models. Maintaining models and ensuring that they're updated when data sets or business requirements change can be problematic. Finally, analytics workloads can be hard to handle if companies don't invest in a full data science team.