
A primer on DNA data storage

DNA storage is an emerging technology that has the potential to impact data storage in the long term. Learn more about how it works and its recent advancements.

Traditional storage struggles to keep pace with the growth of digital information. Its limitations include physical footprint, cost and environmental impact. DNA-based data storage is an emerging technology that aims to address these issues by encoding digital information into synthetic DNA. Data is represented using the four nucleotides that form DNA, creating a biological long-term storage medium.

Interest in DNA storage has increased as organizations confront exponential data growth, rising storage costs and the need to preserve data for decades. Today, DNA storage has moved from theory to early practical exploration due to advances in research, declining costs and backing from major technology companies. It is a potential option for ultra-dense, durable and sustainable data storage.

Nucleotides are the basic building blocks of DNA, and they are the units used to encode information for DNA data storage. The four nucleotides are:

  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)

When arranged in sequence within a molecule, the four nucleotides act as an alphabet for encoding data.
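
To make the alphabet idea concrete, the following Python sketch maps each pair of bits to one nucleotide. The specific assignment (00 to A, 01 to C, 10 to G, 11 to T) and the function name bytes_to_dna are illustrative assumptions, not a standard. Practical encoding schemes typically add constraints, such as avoiding long runs of the same nucleotide.

```python
# Illustrative 2-bit mapping (an assumption for this sketch, not a standard).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bytes_to_dna(b"Hi"))  # CAGACGGC
```

With this mapping, every byte becomes exactly four nucleotides, so two bits per nucleotide is the upper bound for an unconstrained four-letter alphabet.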

How DNA data storage works

DNA storage works by translating data into a biological format that can be synthesized and stored. DNA-based storage does not rely on living organisms. Instead, it uses chemical synthesis to generate the sequences. The process is complex, and organizations should explore it as a long-term archival medium.

Storing data consists of the following steps:

1. Data encoding: Binary data is translated into sequences composed of the four DNA nucleotides. The sequences represent specific bit patterns.

2. Error correction and redundancy: The data is protected with standard error-correction and parity techniques so that errors introduced during writing or reading can be detected and repaired.

3. Writing sequences to synthetic DNA: Binary bits are mapped to the four nucleotides, resulting in a long string of DNA sequences representing the original information.

4. Creation of strands and synthesized molecules: Long DNA sequences are broken into segments containing payload, index and error-correction information. Segments are chemically synthesized into physical DNA molecules.

5. Data storage: Storage facilities store the DNA sequences or molecules in containers under stable conditions. The containers do not need power or active maintenance.
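
The segmentation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any vendor's format: the field widths (8 nucleotides of index, 4 of check value), the 8-byte payload size and the helper names are hypothetical, and a truncated CRC-32 stands in for real error-correcting codes because it can only detect damage, not repair it.

```python
import zlib

BASES = "ACGT"  # base-4 digits: A=0, C=1, G=2, T=3 (illustrative choice)


def to_bases(value: int, length: int) -> str:
    """Write a non-negative integer as a fixed-length string of nucleotides."""
    if value < 0 or value >= 4 ** length:
        raise ValueError("value does not fit in the requested length")
    digits = []
    for _ in range(length):
        value, rem = divmod(value, 4)
        digits.append(BASES[rem])
    return "".join(reversed(digits))


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four nucleotides."""
    return "".join(to_bases(byte, 4) for byte in data)


def make_strands(data: bytes, payload_bytes: int = 8) -> list[str]:
    """Split data into segments of index + payload + check value."""
    strands = []
    for index, start in enumerate(range(0, len(data), payload_bytes)):
        chunk = data[start:start + payload_bytes]
        check = zlib.crc32(chunk) & 0xFF  # one-byte check: detection only
        strands.append(to_bases(index, 8) + bytes_to_bases(chunk) + to_bases(check, 4))
    return strands


for strand in make_strands(b"DNA storage demo"):
    print(strand)
```

Each printed line is the text form of one segment that would be sent for synthesis. The index field is what later allows segments to be reordered, because molecules in a container have no inherent order.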

Retrieving data, unsurprisingly, reverses the encoding process. The high-level steps are:

1. Data retrieval: DNA is read using sequencing technologies that interpret the order of the nucleotides, resulting in a digital representation of the original data.

2. Data assembly and validation: The indexes are used to put the DNA segments back in order, and error-correction algorithms are applied to ensure integrity.

3. Data decoding: The DNA sequences are translated back to binary bits, and the original file structure is reconstructed.
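
The retrieval steps can be sketched in the same spirit. The Python example below assumes a hypothetical segment layout of 8 index nucleotides, a payload and 4 check nucleotides, with a truncated CRC-32 as a stand-in for real error correction. It discards damaged reads rather than repairing them, and the demo_strand helper exists only to build sample input.

```python
import zlib

BASES = "ACGT"  # base-4 digits: A=0, C=1, G=2, T=3 (illustrative choice)
INDEX_LEN, CHECK_LEN = 8, 4  # hypothetical field widths, in nucleotides


def from_bases(seq: str) -> int:
    """Read a string of nucleotides as a base-4 integer."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def to_bases(value: int, length: int) -> str:
    """Inverse of from_bases; used here only to build demo reads."""
    out = ""
    for _ in range(length):
        value, rem = divmod(value, 4)
        out = BASES[rem] + out
    return out


def bases_to_bytes(seq: str) -> bytes:
    """Decode every four nucleotides back into one byte."""
    return bytes(from_bases(seq[i:i + 4]) for i in range(0, len(seq), 4))


def decode_strands(reads: list[str]) -> bytes:
    """Validate each read, order segments by index and rebuild the data."""
    chunks = {}
    for read in reads:
        index = from_bases(read[:INDEX_LEN])
        payload = bases_to_bytes(read[INDEX_LEN:-CHECK_LEN])
        if zlib.crc32(payload) & 0xFF != from_bases(read[-CHECK_LEN:]):
            continue  # check value mismatch: discard the damaged read
        chunks[index] = payload
    if not chunks:
        raise ValueError("no valid reads")
    missing = [i for i in range(max(chunks) + 1) if i not in chunks]
    if missing:
        raise ValueError(f"missing segments: {missing}")
    return b"".join(chunks[i] for i in sorted(chunks))


def demo_strand(index: int, chunk: bytes) -> str:
    """Build one sample segment in the assumed layout."""
    return (to_bases(index, INDEX_LEN)
            + "".join(to_bases(byte, 4) for byte in chunk)
            + to_bases(zlib.crc32(chunk) & 0xFF, CHECK_LEN))


reads = [demo_strand(1, b"world"), demo_strand(0, b"hello ")]  # out of order
print(decode_strands(reads))  # b'hello world'
```

The reads are deliberately supplied out of order to show that reassembly depends on the index field, not on the order in which the sequencer returns the segments.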

The encoding and decoding processes are slower than standard read-write operations, so DNA data storage is better suited to long-term archiving scenarios than to dynamic storage requirements. View it as a complementary technology to traditional data storage.

Benefits of DNA storage

DNA-based storage offers multiple benefits for long-term archiving and cost-effective storage. These benefits are most appealing for cold storage, data archives and information preservation.

Key benefits include:

  • Extremely high storage density: DNA-based storage consumes a very small amount of physical space while handling vast amounts of data. This feature offers an advantage in addressing rising storage costs and scalability.
  • Longevity and durability: DNA storage theoretically holds data for thousands of years without degradation. It does not require periodic refreshes or migrations to new media, further reducing costs and effort.
  • Low energy requirements: DNA storage requires no power to maintain its state. This characteristic enables long-term storage while supporting sustainability efforts and reducing environmental impact.

More benefits might emerge as the technology continues to evolve, such as greater media stability than existing technologies. These potential benefits drive enterprises and governments to invest in additional research.

Challenges and limitations of DNA storage

Various challenges exist with DNA storage, although the technology continues to evolve to address these barriers.

  • Cost: Writing data into DNA requires chemical synthesis, and retrieving it relies on sequencing technologies -- both of which are expensive.
  • Accuracy/reliability: Technical hurdles remain around accuracy, because errors can occur when DNA is written and read. As a result, organizations require error correction and redundancy to ensure data integrity.
  • Performance: Writing and reading data are very slow, often requiring hours or days. This limitation keeps DNA storage in the realm of data archiving rather than daily operations.
  • Integration: Another challenge is integrating DNA storage into existing technologies and workflows, which rely on proven infrastructure and familiar practices. Since DNA storage requires synthesis and sequencing technologies, the approach requires a new technology layer.
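
One common way redundancy helps with accuracy is to read the same strand several times and take a per-position majority vote. The Python sketch below is a simplified illustration: it assumes the reads are already aligned and of equal length, so it handles substitution errors only. Real sequencing output can also contain insertions and deletions, which require alignment first. The function name consensus and the sample reads are hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote across equal-length reads of one strand."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


reads = [
    "ACGTACGT",
    "ACGAACGT",  # substitution error at position 3
    "ACGTACCT",  # substitution error at position 6
]
print(consensus(reads))  # ACGTACGT
```

Because each error appears in only one of the three reads, the vote recovers the original sequence. The trade-off is that every extra copy or read adds to the cost and time of synthesis and sequencing.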

However, as with all emerging technologies, innovators are working on these challenges to bring DNA storage within the reach of mainstream business.

Uses today and leading vendors

Currently, DNA storage is typically found in experimental and research environments, with infrequent early commercial use. It suits scientific research datasets, historical/cultural archives and government records that must be preserved for long periods. Expect its ongoing evolution to bring it into business scenarios over time.

Various technology companies fund research, build prototypes and publish reference architectures. These contributions demonstrate that DNA storage is a viable technology. The vendor landscape is still emerging.

Recent advancements in DNA storage research

Recent advancements have accelerated DNA storage technologies toward real-world use. As expected, these advancements address some of the existing challenges and limitations organizations must overcome before integrating DNA storage into regular workflows.

Advancements continue in the following areas:

  • Encoding efficiency: Algorithms continue to improve, enabling more digital information to be stored per nucleotide while reducing error rates. Not only do these advances improve data density, but they also make the storage technology more reliable.
  • Automation and throughput: New techniques for synthesis and sequencing reduce write and read times, and prices for these technologies continue to fall. These techniques also reduce manual labor.
  • Hybrid storage architectures: Data storage tools that enable DNA and traditional storage methods to coexist and work together improve efficiency and cost effectiveness.
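
Encoding efficiency can be expressed as the number of useful bits stored per synthesized nucleotide. The short Python calculation below shows how index fields, error-correction overhead and redundant copies reduce that figure. Every number in it (150-nucleotide strands, 20 index nucleotides, 30 error-correction nucleotides, 2 bits per nucleotide) is a hypothetical value chosen for illustration, not a measurement from any real system.

```python
def net_bits_per_nucleotide(
    strand_len: int,
    index_len: int,
    ecc_len: int,
    copies: int = 1,
    bits_per_base: float = 2.0,
) -> float:
    """Useful payload bits divided by total nucleotides synthesized."""
    payload_len = strand_len - index_len - ecc_len
    if payload_len <= 0 or copies < 1:
        raise ValueError("overhead leaves no room for payload")
    return bits_per_base * payload_len / (strand_len * copies)


# Hypothetical layout: 150-nucleotide strands, 20 for index, 30 for error correction.
print(f"{net_bits_per_nucleotide(150, 20, 30):.2f}")            # 1.33
print(f"{net_bits_per_nucleotide(150, 20, 30, copies=3):.2f}")  # 0.44
```

Under these assumptions, better codes raise the figure by shrinking the overhead fields or by needing fewer copies for the same reliability.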

Leaders in the field

The DNA Data Storage Alliance (DDSA) is an industry consortium focused on developing, standardizing and commercializing DNA-based data storage technologies. It plays a key role in current research and development in DNA storage.

Many companies are actively involved with DNA storage research through the DDSA. A few leaders include:

  • Microsoft Corporation's Microsoft Research DNA Storage division -- A founding member of the DDSA with a focus on encoding digital data into synthetic DNA.
  • Twist Bioscience/Atlas Data Storage -- A founding member of the DDSA, with a specialization in synthetic DNA production.
  • Illumina, Inc. -- A leading DNA sequencing and genome company participating in the DDSA and supplying DNA synthesis/sequencing technologies.
  • Western Digital Corporation -- A leading storage manufacturer participating in the DDSA, exploring the integration of DNA storage with traditional media.

What IT and business leaders should watch next

DNA data storage is best viewed as an evolving, long-term strategic capability. Industries using data archiving will want to watch for specific signals that the technology has become mainstream enough to warrant investigation or the establishment of pilot programs.

Likely commercial readiness signals include:

  • Declining costs for synthesis and sequencing.
  • Alignment with industry standards and regulations.
  • Vendor platform maturity and product offerings.
  • Pilot programs in parallel industries, government and hyperscale environments.

By monitoring advances in DNA storage research and product offerings, IT leaders can proactively assess when and where this technology could fit into the organization's long-term storage strategies. DNA data storage might still be emerging, but its potential impact on information storage and preservation makes it a topic every forward-looking technology leader should keep on their radar.

Damon Garn owns Cogspinner Coaction and provides freelance IT writing and editing services. He has written multiple CompTIA study guides, including the Linux+, Cloud Essentials+ and Server+ guides, and contributes extensively to Informa TechTarget, The New Stack and CompTIA Blogs.
