Definition

Google Bigtable

Robert Sheldon

By

Robert Sheldon

Published: Jan 25, 2024

What is Google Bigtable?

Google Bigtable is a distributed NoSQL database service created by Google to handle large amounts of structured, semistructured and unstructured data. Although Bigtable is available as a public subscription service, the platform also supports many of Google's own core services, including Google Search, Google Maps, Google Drive, Google Analytics and YouTube.

Bigtable was designed to support analytical and operational workloads requiring massive scalability. The platform can scale to petabytes of data, with workloads spread across thousands of commodity servers. The platform can also deliver single-digit millisecond latency and promises up to 99.999% high availability. Bigtable is based on a sparsely populated table design that can accommodate thousands of columns and billions of rows.

Bigtable is capable of processing more than 6 billion requests per second. Customers can seamlessly scale their operations from thousands to millions of read/write operations per second by adding or removing cluster nodes, and they can make these changes without incurring downtime. They can also automatically scale their clusters up or down to meet fluctuating workload demands. In addition, Bigtable supports automatic replication with eventual consistency.

According to Google, the Bigtable service can accommodate multiple types of data, making it possible to support a variety of workloads. This includes financial data, marketing data, graph data, time-series data and IoT data. The platform is well-suited to batch MapReduce operations, machine learning applications, and stream processing and analytics. The Bigtable service currently manages over 10 exabytes of data.

Graphic explaining the four primary types of NoSQL databases. — NoSQL databases like Google's Bigtable service are geared toward managing large sets of varied data and frequently updated data.

How is data stored in Bigtable?

Bigtable stores data in scalable tables made up of columns and rows. Each table provides a sorted key/value map, with each row indexed by a single row key. Related columns are often grouped into column families and a unique name is assigned to each column family. The tables are also sparse; if a column does not contain data for a particular row, the cell does not use any storage space.

Within these tables, each row/column intersection can contain one or more cells. In a traditional relationship database table, each row/column intersection can contain only one cell. Every cell in a Bigtable table contains a unique timestamped version of the data. This approach enables Bigtable to maintain a record of how data has changed over time.

The tables in a Bigtable database are sharded into blocks of contiguous rows that are referred to as tablets. Tablets are flexible structures that make it easier to balance workloads across server nodes. The tablets are stored in the SSTable format on Google's Colossus file system. An SSTable is a file that contains sorted key-value pairs. The file provides a key-to-value mapping that is ordered, persistent and immutable. Both the keys and values are arbitrary byte strings.

Chart showing key differences between structed and unstructured data. — A distributed NoSQL database service, Google Bigtable can handle large amounts of structured, semistructured and unstructured data.

The Bigtable service is organized into a hierarchy of components made up of instances, clusters and nodes. These components provide customers with an overall structure for working with their databases:

Instance. The instance sits at the top of the hierarchy. It offers a logical structure for deploying a Bigtable database, while providing a container for the customer's data. An instance requires more than one cluster to support replication. The storage type (SSD or HDD) is determined at the instance level.
Cluster. An instance contains one or more clusters. Each cluster is located in a specific zone. Google organizes its data services into regions, which contain individual zones. A single Bigtable instance can contain clusters in up to eight regions; however, each zone can include only one cluster.
Node. A cluster contains one or more nodes. The nodes provide the compute resources necessary to drive the Bigtable operations. A tablet is associated with a specific node at any given time. The node tracks the tablets that are assigned to that node. A node never stores the data, only points to it. In this way, Bigtable can quickly rebalance nodes by updating the pointers. It can also seamlessly recover from node failure without losing data. In addition, the nodes handle incoming read/write requests for the tablets and perform maintenance tasks on them, such as compacting the data.

All client requests to the Bigtable service go through a front-end server pool, which forwards the requests to the individual Bigtable nodes. The nodes then communicate with their respective tablets. By adding nodes to a cluster, customers can increase the number of simultaneous requests that their clusters can handle, while also increasing the cluster's maximum throughput.

Google has maintained Bigtable as a proprietary, in-house technology. Nevertheless, Bigtable has had a large impact on NoSQL database design. In 2006, Google software developers publicly disclosed Bigtable details in a technical paper presented at the USENIX Symposium on Operating Systems and Design Implementation.

The paper's thorough description of Bigtable's inner workings allowed other organizations and open source development teams to create database systems that are modeled after Bigtable. Those systems include Apache HBase database, which runs on top of the Hadoop Distributed File System (HDFS); Cassandra, which originated at Facebook; and Hypertable, an open source project that ended development in 2016.

Explore 7 Google Cloud database options to free up your IT team. See also: columnar database and check out 18 top big data tools and technologies to know about.

Continue Reading About Google Bigtable

A cloud services cheat sheet for AWS, Azure and Google Cloud

DBMS keys: Types of keys defined

Break down Google big data services

Hadoop vs. Spark: An in-depth big data framework comparison

Explore Hadoop distributions to manage big data

Dig Deeper on Database management

Search Business Analytics

What makes an effective data science team structure?
Data science team structures vary in strength, and their success depends on how roles and leadership align with business goals to...
Synthetic data vs. real data for predictive analytics
Synthetic data helps simulate rare events and meet privacy compliance, while real data preserves natural variability needed to ...
7 predictive analytics skills to improve simulation modeling
Predictive analytics skills such as statistical analysis, data preprocessing and model evaluation can help data professionals ...

Search AWS

Compare Datadog vs. New Relic for IT monitoring in 2024
Compare Datadog vs. New Relic capabilities including alerts, log management, incident management and more. Learn which tool is ...
AWS Control Tower aims to simplify multi-account management
Many organizations struggle to manage their vast collection of AWS accounts, but Control Tower can help. The service automates ...
Break down the Amazon EKS pricing model
There are several important variables within the Amazon EKS pricing model. Dig into the numbers to ensure you deploy the service ...

Search Content Management

The top 10 RFP response software
As B2B organizations grow, the RFP response process can become too time-consuming for manual workflows. Top tools, such as Loopio...
CRM vs. CMS: How they differ and how to integrate them
CMSes and CRM systems serve different purposes, but together, they can help organizations improve customer data management and ...
How to accomplish a SharePoint-Teams integration
Depending on the complexity of a business's SharePoint sites, a Teams integration can benefit organizations by being ...

Search Oracle

Oracle sets lofty national EHR goal with Cerner acquisition
With its Cerner acquisition, Oracle sets its sights on creating a national, anonymized patient database -- a road filled with ...
With Cerner, Oracle Cloud Infrastructure gets a boost
Oracle plans to acquire Cerner in a deal valued at about $30B. The second-largest EHR vendor in the U.S. could inject new life ...
Supreme Court sides with Google in Oracle API copyright suit
The Supreme Court ruled 6-2 that Java APIs used in Android phones are not subject to American copyright law, ending a ...

Search SAP

SAP agrees to allow Celonis data access until case resolved
SAP agrees to allow Celonis customers to access data from its systems as their legal battle continues, but customers will be best...
Grow with SAP fuels Phoenix Global's digital transition
Phoenix Global implemented S/4HANA Cloud via Grow with SAP to replace outdated systems, digitize manual processes and enable AI ...
SAP Sapphire 2025 news, trends and analysis
SAP showcased new business AI applications and continued to make the case for S/4HANA Cloud as the future of SaaS-based ERP ...

Close