Data science -- the amalgam of mathematics, statistics, computer science disciplines, machine learning, cluster analysis, data mining and visualization -- is no longer just the realm of data scientists. Small wonder, then, that it has become a popular topic among business leaders, economists, anthropologists and others. That said, those well entrenched in the intricacies of data science can also find a flood of new titles on the market. Here is a small compendium of some of the best.
General data science books
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are
By Seth Stephens-Davidowitz
This book is targeted at those who seek greater self-awareness of how we express ourselves in our digital era. Its discoveries show how our digital actions -- how and where we search on the internet, for example -- belie our self-image. Every day, Stephens-Davidowitz notes, human beings searching the internet will amass 8 trillion gigabytes of data. This data reveals our fears, desires and behaviors as well as conscious and unconscious biases. For example, what percentage of white voters didn't vote for Barack Obama because he's Black? The data also offers insights into an array of subjects, from economics to sports to sex. For example, does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls?
Stephens-Davidowitz received a B.A. in philosophy from Stanford, where he graduated Phi Beta Kappa, and a Ph.D. in economics from Harvard. The book is available on Amazon, on the author's website, Barnes & Noble and elsewhere.
Naked Statistics: Stripping the Dread from the Data
By Charles Wheelan
The second of three books in Wheelan's Naked series, Naked Statistics brings the arcane and soul-sucking world of statistics to life through wry observations and unexpected real-world applications. He explores, for example, how Netflix recommends movies for viewing and why contestants on "Let's Make A Deal" make the choices they do. Along the way, Wheelan clarifies key concepts such as inference, correlation and regression analysis. And, perhaps most importantly in this day and age of disputed polling practices, Wheelan explains how bias or carelessness can manipulate or misrepresent data.
Wheelan is a professor at the Harris School of Public Policy at the University of Chicago and former correspondent for The Economist. Naked Statistics can be purchased on Amazon, Apple Books, Barnes & Noble and elsewhere.
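The "Let's Make A Deal" question mentioned above is usually framed as the Monty Hall problem: a contestant picks one of three doors, the host opens a different door that hides no prize, and the contestant may stay or switch. The short Python simulation below is a minimal sketch of that standard framing, not code from the book; the function names and the trial count are illustrative assumptions.

```python
import random


def play(switch, rng):
    """Play one round and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # door hiding the prize
    pick = rng.choice(doors)   # contestant's first choice
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single door that is still closed and unpicked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning under one strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

Under these rules the estimates should settle near one-third for staying and two-thirds for switching, which is the counterintuitive result that makes the game a popular teaching example.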
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
By Cathy O'Neil
O'Neil, a former Wall Street quant (an expert at analyzing and managing quantitative data), penned this exceptionally important book after working as a data scientist, where she built models predicting people's purchases and clicks. After her early and well-credentialed career -- she earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department and a professor at Barnard College, where she published several research papers in arithmetic algebraic geometry -- she wrote Doing Data Science in 2013. She also launched the Lede Program for Data Journalism at Columbia in 2014 and founded ORCAA, an algorithmic auditing company. One of ORCAA's principles -- that AI ethics cannot be automated -- explains the foundations she lays in Weapons of Math Destruction: “[T]here is no excuse for an algorithm to be racist, sexist, ageist, ableist, or otherwise discriminatory.”
Weapons of Math Destruction can be purchased on Amazon and Barnes & Noble. More information on the title can be found on the author's blog; more information on ORCAA can be found on the company's website.
Algorithms of Oppression: How Search Engines Reinforce Racism
By Dr. Safiya U. Noble
Dr. Safiya U. Noble, an associate professor at UCLA in the departments of Information Studies and African American Studies, argues that the combination of private interests in promoting certain sites, along with the monopoly status of a relatively small number of internet search engines, leads to a biased set of search algorithms that privilege whiteness and discriminate against people of color, specifically women of color. Noble reaches her conclusions -- that a culture of racism and sexism exists in the way discoverability is created online -- after analyzing textual and media searches and researching paid online advertising. Algorithms of Oppression was featured in the New York Public Library 2018 Best Books for Adults (non-fiction) and recognized by Bustle magazine as one of "10 Books About Race To Read Instead Of Asking A POC To Explain It To You."
Noble holds appointments in African American Studies and Gender Studies and is a research associate at the Oxford Internet Institute at the University of Oxford. Currently, she is the co-director of the UCLA Center for Critical Internet Inquiry. Algorithms of Oppression can be purchased on Amazon, Barnes & Noble, Kobo.com and elsewhere.
Beginner data science books
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python
By Peter Bruce, Andrew Bruce and Peter Gedeck
While we've listed this book under the beginner category, remember everything is relative. Unlike the titles listed in the general data science category, Practical Statistics for Data Scientists assumes some knowledge of the R programming language and some exposure to statistics. The authors look to find common ground between data scientists, many of whom they argue have never had formal statistics training, and statisticians, who often lack a data science perspective. Practical Statistics for Data Scientists explains how to apply various statistical methods to data science and how to avoid their misuse. This title is available for purchase at Amazon.
Data Science from Scratch: First Principles with Python, 2nd Edition
By Joel Grus
Joel Grus, principal engineer at Capital Group and former software engineer at the Allen Institute for AI and Google, avows that to really learn data science you must understand the principles that underlie it. His idea is to show you how data science libraries, frameworks, modules and toolkits work by implementing them from scratch. Grus promises that if you have an aptitude for mathematics and some programming skills, he can help get you comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist.
More information from and about the author can be found on his blog. Data Science from Scratch can be purchased at Amazon. Code and examples from the book (requiring at least Python 3.6) can be found on GitHub.
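To give a flavor of what "from scratch" means in practice, here is a minimal sketch that computes a correlation coefficient using only the Python standard library rather than a statistics package. It is written for this article and is not taken from the book's GitHub repository; the function names are illustrative assumptions.

```python
import math


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def de_mean(xs):
    """Shift the values so that their mean is zero."""
    m = mean(xs)
    return [x - m for x in xs]


def dot(xs, ys):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(xs, ys))


def variance(xs):
    """Sample variance (divides by n - 1); needs at least two values."""
    deviations = de_mean(xs)
    return dot(deviations, deviations) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    return dot(de_mean(xs), de_mean(ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either list has no variation."""
    sx, sy = math.sqrt(variance(xs)), math.sqrt(variance(ys))
    if sx == 0 or sy == 0:
        return 0.0
    return covariance(xs, ys) / (sx * sy)
```

Calling correlation([1, 2, 3, 4], [2, 4, 6, 8]) on perfectly linear data returns a value of 1, up to floating-point rounding. Building each piece by hand like this is slower than calling a library, but it makes every step of the calculation visible.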
Python for Data Science: The Ultimate Beginners' Guide to Learning Python Data Science Step by Step
By Ethan Williams
This book, one in a voluminous series by Ethan Williams, is for those absolute beginners who want to learn Python programming and its application for data science. A few Python libraries are introduced, including NumPy, Pandas, Matplotlib and Seaborn for data analysis and visualization. Practical examples and applications of each lesson are given, and the reader is equally encouraged to practice the techniques through exercises. In addition, references to relevant reading and practice materials are given.
An Introduction to Statistical Learning: With Applications in R
By Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Now in its seventeenth printing, An Introduction to Statistical Learning is a follow-on to The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), the best seller co-written by two of its authors. An Introduction to Statistical Learning, which is targeted at statisticians and non-statisticians alike, assumes only a previous course in linear regression and no knowledge of matrix algebra. This book provides an accessible overview of the field of statistical learning, used to sort out the vast data sets that have emerged in biology, finance, marketing, astrophysics and other fields in the past 20 years. Topics covered include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering and more. Color graphics and real-world examples are used to illustrate the methods presented. Each chapter contains a tutorial on implementing the analyses and methods presented in R.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
By Aurélien Géron
Aurélien Géron, an artificial intelligence engineer and former Google product manager, looks to assist programmers with little knowledge of machine learning by using simple, efficient tools to implement programs capable of learning from data. By using concrete examples, minimal theory and two production-ready Python frameworks, Scikit-learn and TensorFlow, Géron offers an intuitive understanding of the concepts and tools for building intelligent systems. Readers will start with simple linear regression and progress to deep neural networks. Exercises are offered in each chapter to help the reader apply what's been learned. Hands-On Machine Learning explores neural networks and several training models, including support vector machines, decision trees, random forests and ensemble methods.
Python Crash Course for Data Analysis: A Complete Beginner Guide for Python Coding, NumPy, Pandas and Data Visualization
By AI Publishing
The book is for those who are new to Python and data science. Its main focus is on hands-on learning. AI Publishing, which offers a large library of AI titles, suggests that readers can shorten the learning curve by using hands-on tools -- including example Python code, references and exercises -- available on the publisher's website at no extra cost. The topics covered include:
- Introduction to data analysis
- Python for data analysis -- basics and advanced
- IPython and Jupyter notebooks
- NumPy for numerical data processing
- Pandas for data manipulation
- Data visualization
Advanced data science books
Pattern Recognition and Machine Learning
By Christopher M. Bishop
Christopher Michael Bishop -- the laboratory director at Microsoft Research in Cambridge, professor of computer science at the University of Edinburgh and a fellow of Darwin College, Cambridge -- presents the first textbook on pattern recognition to take the Bayesian viewpoint. The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, at a time when no other books applied graphical models to machine learning. No previous knowledge of pattern recognition or machine learning concepts is assumed, though familiarity with multivariate calculus and basic linear algebra is required. For those without experience in the use of probabilities, the book includes a self-contained introduction to basic probability theory.
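For readers unsure what "the Bayesian viewpoint" looks like in practice, the sketch below shows one of the simplest textbook cases: updating a Beta prior over a coin's probability of heads after observing some flips. It is an illustrative example written for this article, not code from the book, and the prior and the flip data are made up.

```python
def update_beta(alpha, beta, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes.

    With a Beta(alpha, beta) prior on the probability of a 1, each observed 1
    adds one to alpha and each observed 0 adds one to beta.
    """
    ones = sum(observations)
    zeros = len(observations) - ones
    return alpha + ones, beta + zeros


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (1, 1)                    # uniform prior: no initial preference
flips = [1, 1, 0, 1, 0, 1, 1, 1]  # made-up data: six heads, two tails
posterior = update_beta(*prior, flips)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The point of the example is that the answer is a full distribution over the unknown probability rather than a single estimate, and each new observation simply shifts that distribution.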
Data Science with Python and Dask
By Jesse Daniel
Data Science with Python and Dask teaches you to build scalable projects that can handle massive data sets. Dask provides dynamic task scheduling and parallel collections that extend the functionality of NumPy, Pandas and Scikit-learn, enabling users to scale their code from a single laptop to a cluster of hundreds of machines with ease. The cool part about Data Science with Python and Dask is its hands-on example: analyzing the NYC Parking Ticket database. You simplify the process using DataFrames and build machine learning models using Dask-ML. Then, using AWS and Docker, you’ll create interactive visualizations and clusters.
The Hundred-Page Machine Learning Book
By Andriy Burkov
Available in 11 languages, The Hundred-Page Machine Learning Book is the latest book from Andriy Burkov, who has a Ph.D. in AI and is the leader of a machine learning team at Gartner. This AI book is filled with best practices and design patterns for building reliable machine learning solutions that scale. It's based on Burkov's 15 years of experience in solving problems with AI and on the published experience of industry leaders. In her foreword to the book, Cassie Kozyrkov, chief decision scientist at Google, describes The Hundred-Page Machine Learning Book as "one of the few true Applied Machine Learning books out there."