

A history and timeline of big data

Milestones that led to today's big data revolution -- from 17th-century statistical analysis to the first programmable computer in the 1940s to the internet, Hadoop, IoT, AI and more.

Big data has revolutionized the modern business environment in recent years. A mixture of structured, semistructured and unstructured data, big data is a collection of information that organizations can mine for business purposes through machine learning, predictive modeling, and other advanced data analytics applications.

At one time, big data may have seemed like a buzzword, but its impact on the world around us has been tremendous. As you will see from this timeline covering the history of big data, big data analytics builds on concepts that have been around for centuries.

The history of data analysis that led to today's advanced big data analytics starts way back in 17th-century London. Let's begin our journey.

The bedrock of big data

A foundational period where clever people started seeing the value of turning to statistics and analysis to make sense of the world around them.


John Graunt introduces statistical data analysis through his study of the bubonic plague. The London haberdasher published the first collection of public health records when he recorded death rates and their variations during the bubonic plague in England.


Richard Millar Devens coins the term "business intelligence." As we understand it today, business intelligence is the process of analyzing data, and then using it to deliver actionable information. In his "Cyclopædia of Commercial and Business Anecdotes," Devens described how a banker used information from his environment to turn a profit.


Herman Hollerith invents the punch card tabulating machine, marking the beginning of data processing. The tabulating device Hollerith developed was used to process data from the 1890 U.S. Census. His Tabulating Machine Company later merged into the Computing-Tabulating-Recording Company, formed in 1911, which would eventually become IBM.


Nikola Tesla predicts humans will one day have access to large swaths of data via an instrument that can be carried "in [one's] vest pocket." Tesla managed to predict our modern affinity for smartphones and other handheld devices based on his vision of how wireless technology would connect the world: "When wireless is perfectly applied, the whole earth will be converted into a huge brain, which in fact it is, all things being particles of a real and rhythmic whole. We shall be able to communicate with one another instantly, irrespective of distance."


Fritz Pfleumer invents a way to store information on tape. Pfleumer's process for coating strips of paper with magnetic iron oxide powder eventually led him to create magnetic tape, which formed the foundation for video cassettes, movie reels and more.


The U.K. builds one of the first data processing machines to decipher Nazi codes during WWII. The Colossus, as it was called, performed Boolean and counting operations to analyze large volumes of data.

Colossus: The first programmable electronic computer.


Arthur Samuel, a programmer at IBM and a pioneer of artificial intelligence, coins the term "machine learning" (ML).


The U.S. plans to build its first data center to store millions of tax returns and sets of fingerprints on magnetic tape.


Advanced Research Projects Agency Network (ARPANET), the first wide area network with distributed control, is created. Its later adoption of the TCP/IP protocols formed the foundation of today's internet.

The internet age: The dawn of big data

As computers start sharing information at exponentially greater rates due to the internet, the next stage in the history of big data takes shape.

1989 and 1990

Tim Berners-Lee and Robert Cailliau create the World Wide Web, and HTML, URLs and HTTP are developed while they work at CERN. The internet age, with widespread and easy access to data, begins.


Digital data storage becomes more cost-effective than storing information on paper for the first time in 1996, as reported by R.J.T. Morris and B.J. Truskowski in their 2003 IBM Systems Journal paper, "The Evolution of Storage Systems."


The Google.com domain is registered a year before the search engine launches, starting Google's climb to dominance and the development of numerous other technological innovations, including in the areas of machine learning, big data and analytics.


Carlo Strozzi develops NoSQL, an open source relational database that does not expose the standard SQL interface. The name was later adopted for a broader class of databases that store and retrieve data modeled differently from the traditional tabular methods found in relational databases.


Based on data from 1999, the first edition of the influential book, How Much Information, by Hal R. Varian and Peter Lyman (published in 2000), attempts to quantify the amount of digital information available in the world to date.

Big data in the 21st century

Big data as we know it finally arrives, and the explosion of ingenuity that it brings with it cannot be overestimated. Everyone, and everything, is impacted.


Doug Laney of analyst firm Gartner coins the 3Vs (volume, variety and velocity), defining the dimensions and properties of big data. The Vs encapsulate the true definition of big data and usher in a new period where big data can be viewed as a dominant feature of the 21st century. Additional Vs -- such as veracity, value and variability -- have since been added to the list.

The six Vs of big data
Big data draws on human and machine sources that range from well defined to loosely defined.


Computer scientists Doug Cutting and Mike Cafarella create Apache Hadoop, the open source framework used to store and process large data sets. Hadoop grew out of their work on the Nutch web crawler and was developed further by a team of engineers at Yahoo.


Amazon Web Services (AWS) starts offering web-based computing infrastructure services, now known as cloud computing. Currently, AWS dominates the cloud services industry with roughly one-third of the global market share.

Core components of Hadoop
Hadoop's ability to run on commodity servers, along with its scalability, flexibility and speed, makes it an important part of the big data ecosystem.


The world's servers process 9.57 zettabytes (or 9.57 trillion gigabytes) of data, about equal to 12 gigabytes per worker per day. Global production of new information hits an estimated 14.7 exabytes.


Gartner reports business intelligence as the top priority for CIOs. As companies face a period of economic volatility and uncertainty due to the Great Recession, squeezing value out of data becomes paramount.


McKinsey reports that by 2018 the U.S. will face a shortage of analytics talent, lacking between 140,000 and 190,000 people with deep analytical skills and a further 1.5 million analysts and managers with the ability to make accurate data-driven decisions.

Also, Facebook launches the Open Compute Project to share specifications for energy-efficient data centers. The initiative's goal is to deliver a 38% increase in energy efficiency at a 24% lower cost.


The Obama administration announces the Big Data Research and Development Initiative with a $200 million commitment, citing a need to improve the ability to extract valuable insights from data and accelerate the pace of STEM (science, technology, engineering, and mathematics) growth, enhance national security and transform learning. The acronym has since become STEAM, adding an A by incorporating the arts.

Harvard Business Review names data scientist the sexiest job of the 21st century. As more companies recognized the need to sort and gain insights from unstructured data, demand for data scientists soared.


The global market for big data reaches $10 billion.


For the first time, more mobile devices access the internet than desktop computers in the U.S. The rest of the world follows suit two years later, in 2016.


Ninety percent of the world's data has been created in the previous two years alone, and IBM reports that 2.5 quintillion bytes of data are created every day (a quintillion is a 1 followed by 18 zeroes).


IDC forecasts that the big data analytics market will reach $203 billion in 2020.


Allied Market Research reports the big data and business analytics market hit $193.14 billion in 2019, and estimates it will grow to $420.98 billion by 2027 at a compound annual growth rate of 10.9%.

Edge computing is set to change how data is managed and processed in critical sectors of the economy. Edge computing, which refers to computing done near the source of data collection rather than in the cloud or a centralized data center, represents the next frontier for big data.

How edge computing works
With edge computing, data is processed closer to its source.

The future of big data: Where does big data go from here?

Due to the explosion of connected devices, our increasing reliance on the cloud and the coming edge computing revolution, among other factors, big data has a lot of growing left to do.

Technologies such as machine learning, AI and IoT analytics, for example, continue to push the envelope by vastly improving our ability to process, analyze and act upon data.

Expect significant advancements in big data and analytics to happen at a faster clip. The next few years could very well make what we've seen over the last 20 years look like child's play.
