What is content-addressed storage (CAS)?
Content-addressed storage (CAS) -- also called content-addressable storage -- is a method for storing fixed content as objects and providing fast access to that content.
Fixed content refers to data that is not expected to be updated or deleted for a set period of time, such as legal documents, emails and their attachments, medical X-rays, log files, or data that must comply with government regulations. CAS prevents fixed data from being duplicated or modified once it has been stored, providing write once, read many (WORM) data access while preserving the data in its original form.
The term content-addressed storage was coined by EMC Corporation when it released its Centera storage product in 2002. Centera was a purpose-built archiving storage platform that led the way in CAS implementations. After Dell acquired EMC in 2016, Dell EMC continued to offer Centera until 2018. As early as 2016, however, the company was already steering customers toward its Elastic Cloud Storage (ECS) products, which included CAS support and continue to do so today.
How does content-addressed storage work?
When storing data, a CAS system assigns a content address to each object. The content address is an identifier calculated from the content itself, typically by running the data through a hash function. It acts as a digital fingerprint that can be used to verify the data's authenticity and uniqueness.
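As a rough illustration of how an address can be derived from content, the sketch below hashes an object's bytes and uses the digest as its address. The choice of SHA-256 and the function names content_address and verify are assumptions made for this example; the article does not say which algorithm any particular CAS product uses.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex digest that serves as the object's content address.

    SHA-256 is an assumption for this sketch, not a statement about
    any specific CAS product.
    """
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, address: str) -> bool:
    """Check that retrieved data still matches the address it was stored under."""
    return content_address(data) == address


record = b"example archive record"
address = content_address(record)

# The same bytes always produce the same address; altered bytes do not match.
unchanged_ok = verify(record, address)
tampered_ok = verify(record + b"!", address)
```

Recomputing the digest on retrieval is what lets the address double as an integrity check: if the stored bytes had changed, the recomputed value would no longer match.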
Applications that need to access data in a CAS system must use the content addresses to find and retrieve the desired objects. In CAS, data is stored on disk, rather than on tape, which streamlines the process of searching for archival data.
Because an object's address is based on the content, it can be used to ensure that each stored object is unique, thus avoiding data duplication. If an application attempts to insert duplicate data, the system creates a pointer to the original object, rather than creating a second, identical object with the same address. (Identical objects always receive the same address.)
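The deduplication behavior described above can be sketched with an in-memory dictionary keyed by content address. The class DedupStore, its method names and the use of a reference count to stand in for "pointers" are all hypothetical, chosen only to make the idea concrete.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct object."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}  # address -> stored bytes
        self._refs: dict[str, int] = {}       # address -> number of references

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumption for this sketch.
        address = hashlib.sha256(data).hexdigest()
        if address in self._objects:
            # Duplicate content: record another reference, store nothing new.
            self._refs[address] += 1
        else:
            self._objects[address] = data
            self._refs[address] = 1
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def object_count(self) -> int:
        return len(self._objects)

    def ref_count(self, address: str) -> int:
        return self._refs[address]


store = DedupStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")  # identical content, same address
```

Both calls return the same address, and the store still holds a single object with two references to it.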
Some CAS implementations do store a backup copy of each object to improve reliability and reduce the risk of catastrophic data loss, but they maintain that copy separately from the primary storage platform.
Content-based naming also ensures that data does not get changed. If an object is modified, it automatically receives a different content address, and the data is stored as a new object, with the original object left untouched. In addition, once an object has been stored, it cannot be deleted until the specified retention period has expired.
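A minimal sketch of these two properties follows: modified content lands at a new address, and deletion is refused until a retention period has passed. The WormStore class, the RetentionError exception, the retention_seconds parameter and the injectable clock are hypothetical names introduced for illustration, and SHA-256 is again an assumed hash.

```python
import hashlib
import time


class RetentionError(Exception):
    """Raised when a delete is attempted before retention has expired."""


class WormStore:
    """Toy write-once store with a per-object retention period."""

    def __init__(self, clock=time.time) -> None:
        self._clock = clock  # injectable so the example can control time
        self._objects: dict[str, tuple[bytes, float]] = {}  # address -> (data, expiry)

    def put(self, data: bytes, retention_seconds: float) -> str:
        address = hashlib.sha256(data).hexdigest()
        # An existing object is never overwritten; its retention stays as first set.
        if address not in self._objects:
            expires_at = self._clock() + retention_seconds
            self._objects[address] = (data, expires_at)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address][0]

    def delete(self, address: str) -> None:
        _, expires_at = self._objects[address]
        if self._clock() < expires_at:
            raise RetentionError("retention period has not expired")
        del self._objects[address]

    def contains(self, address: str) -> bool:
        return address in self._objects


now = [0.0]  # fake clock value, advanced manually below
store = WormStore(clock=lambda: now[0])
original = store.put(b"contract v1", retention_seconds=60)
revised = store.put(b"contract v2", retention_seconds=60)  # edited content, new address
```

With the fake clock at 0, calling store.delete(original) raises RetentionError. After setting now[0] to a value of 60 or more, the same call succeeds. Leaving an existing object's retention unchanged on a repeated put is a design choice of this sketch; real systems may handle that case differently.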
What are the benefits of content-addressed storage?
An important advantage of CAS is that it minimizes the storage space consumed by data backups and archives by assigning a retention period to each object and avoiding duplicate data. Other types of secondary or archival storage systems are not as efficient in this regard, with much of the stored data duplicated or obsolete.
Another advantage is authentication. Because there is only one copy of each object (backups notwithstanding), verifying its legitimacy is much simpler. Data retrieval is also faster than with other archiving media, such as tape or optical disc.
Despite these advantages, CAS-only storage systems are in decline. They are being replaced by more modern object storage products that offer greater flexibility, such as Dell EMC ECS, which supports data access through Amazon S3, Dell EMC Atmos and OpenStack Swift, along with CAS.