A petabyte is a measure of memory or data storage capacity that is equal to 2 to the 50th power -- 1,125,899,906,842,624 -- bytes. There are 1,024 terabytes (TB) in a petabyte -- roughly 1 million gigabytes (GB) -- and 1,024 PB make up one exabyte.
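These binary (base-2) unit relationships can be verified with a few lines of arithmetic:

```python
# Byte-unit conversions using binary (base-2) prefixes, as in the definition above.
petabyte_bytes = 2 ** 50

terabytes_per_pb = petabyte_bytes // (2 ** 40)  # TB = 2^40 bytes
gigabytes_per_pb = petabyte_bytes // (2 ** 30)  # GB = 2^30 bytes
petabytes_per_eb = (2 ** 60) // petabyte_bytes  # EB = 2^60 bytes

print(petabyte_bytes)    # 1125899906842624
print(terabytes_per_pb)  # 1024
print(gigabytes_per_pb)  # 1048576 (roughly 1 million)
print(petabytes_per_eb)  # 1024
```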
Petabyte-scale data is poorly suited to traditional data backups, which must scan the entire system every time a backup or archiving job runs. Traditional network-attached storage (NAS) is scalable and capable of handling petabytes of data, but it can take too much time and use too many resources when going through the system's organized storage index.
Comparing memory and not storage, a typical laptop or desktop computer contains 16 GB of random access memory (RAM). A top-end server can contain as much as 6 TB of RAM. That means it would take roughly 171 top-end servers -- or 65,536 desktops -- to add up to a single petabyte of RAM.
For another example of how large a petabyte of storage is, a typical DVD holds 4.7 GB of data. That means a single terabyte of storage could hold about 218 DVD-quality movies, while a single petabyte of storage could hold roughly 223,101 DVD-quality movies.
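The RAM and DVD comparisons above come from straightforward division, using binary units throughout:

```python
# "How many fit in a petabyte?" arithmetic from the examples above.
PB_IN_GB = 2 ** 50 / 2 ** 30  # 1,048,576 GB per PB
TB_IN_GB = 2 ** 40 / 2 ** 30  # 1,024 GB per TB

servers_per_pb = PB_IN_GB / (6 * TB_IN_GB)  # top-end server: 6 TB of RAM
desktops_per_pb = PB_IN_GB / 16             # typical desktop: 16 GB of RAM
dvds_per_tb = TB_IN_GB / 4.7                # typical DVD: 4.7 GB
dvds_per_pb = PB_IN_GB / 4.7

print(round(servers_per_pb))   # 171
print(round(desktops_per_pb))  # 65536
print(round(dvds_per_tb))      # 218
print(round(dvds_per_pb))      # 223101
```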
Facebook's data warehouse stores approximately 10 billion user photos, which adds up to about 1.8 PB of storage space. In 2014, the company began planning to accommodate 300 PB of user data.
Petabyte storage vendors
Barely a decade ago, data storage vendors would boast of selling an aggregate of a petabyte or two in all of their storage systems sold. Due to the continued rapid increase in data storage capacity requirements, it's now common to see individual companies and even single storage systems with more than a petabyte of storage capacity.
In 2013, Fujitsu announced its Eternus DX block storage devices, which can scale from 4.6 PB to 13.8 PB of raw capacity. The HGST Active Archive System, released in 2015, scales to 4.7 PB of raw data. DataDirect Networks offers ExaScaler storage arrays with up to 14 PB of capacity across two racks. And the latest EMC Isilon NAS arrays can scale up to 50 PB.
In August 2017, Intel announced a new form factor for solid-state drives (SSDs) it calls the ruler. Roughly a foot long and shaped like a thick ruler, the new SSD form factor, Intel claims, will pack enough flash memory to fit a petabyte of storage into a single 1U server when it comes to market. By comparison, reaching a petabyte with typical 10 TB 3.5-inch hard disk drives (HDDs) would require a 100-bay 4U enclosure and significantly more power.
Petabyte backups and storage
While traditional backup methods struggle at this scale, several other data storage technologies can back up and archive petabytes of data:
- Snapshots and other disk-based backup technologies provide a local copy of the data, enabling a rapid restore.
- Tape and the cloud provide relatively low-cost backup options for petabytes of data, but they are more often used as off-site archival storage rather than primary storage.
- Solid-state storage can scan petabytes of data at a much higher speed without sacrificing data integrity.
- Object storage assigns each object a unique identifier, enabling the system to search large amounts of data in a flat space as opposed to examining a complete storage index to find a specific file.
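The flat-namespace idea behind object storage can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's API: each object receives a unique identifier at write time, so retrieval is a single key lookup rather than a walk through a hierarchical storage index.

```python
import uuid

class ObjectStore:
    """Toy flat-namespace object store (illustrative only)."""

    def __init__(self):
        # Flat namespace: unique ID -> (data, metadata). No directory tree.
        self._objects = {}

    def put(self, data, **metadata):
        obj_id = str(uuid.uuid4())  # assign a unique identifier
        self._objects[obj_id] = (data, metadata)
        return obj_id

    def get(self, obj_id):
        # One lookup by ID -- no index traversal, regardless of store size.
        return self._objects[obj_id]

store = ObjectStore()
oid = store.put(b"movie bytes", content_type="video/mp4")
data, meta = store.get(oid)
print(meta["content_type"])  # video/mp4
```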
Petabytes and big data
There is no specific quantity of data that qualifies as big data, but the term often refers to information in the petabyte, or even exabyte, range. Mining for information across petabytes of data is a time-consuming task. Organizations working with big data often use the Hadoop Distributed File System (HDFS) because it facilitates rapid data transfer and allows a system to operate uninterrupted while working with petabytes of data.
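One reason HDFS scales to petabytes is that it splits files into large fixed-size blocks and spreads them across many nodes. The following toy sketch illustrates that splitting-and-placement idea only; the block size and round-robin placement here are arbitrary simplifications, not HDFS's actual defaults or replica-placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a byte string into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_round_robin(blocks, num_nodes: int):
    """Toy placement: distribute blocks across nodes round-robin."""
    placement = {node: [] for node in range(num_nodes)}
    for i, block in enumerate(blocks):
        placement[i % num_nodes].append(block)
    return placement

blocks = split_into_blocks(b"x" * 1000, block_size=256)
print([len(b) for b in blocks])  # [256, 256, 256, 232] -- last block is partial
placement = assign_round_robin(blocks, num_nodes=3)
print({node: len(bs) for node, bs in placement.items()})  # {0: 2, 1: 1, 2: 1}
```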
To get a sense of how big some data warehouse stores have become, in July 2017, the European research center CERN announced that its data center has 200 petabytes archived in its tape library. In May 2018, Google announced it would be hosting Twitter's 300 PB Hadoop clusters in the Google Cloud Platform.
Combined, Facebook, Google and YouTube accounted for approximately 35,000 PB (35 exabytes) of data generated in 2017. With the increased use of 4K video and the advent of the internet of things (IoT), IDC predicts that global data produced every day will reach roughly 440,000 PB in 2025.
One typical example of the amount of IoT data being generated is the Geared Turbofan jet engine made by Pratt & Whitney, which contains 5,000 individual sensors and produces 10 GB of data every second. At that rate, a single jet engine generates a petabyte of data roughly every 28 operational hours.