cloud object storage
Cloud object storage is a format for storing unstructured data in the cloud. Object storage is considered a good fit for the cloud because it is elastic and flexible, and because it scales more easily into multiple petabytes to support data growth. The architecture stores and manages data as objects. This contrasts with block storage, which handles data as blocks and logical volumes, and with file storage, which stores data as files in a hierarchy.
The object storage software design includes a globally unique identifier for each object along with rich, customizable metadata. The metadata is kept separate from the data itself, which enables other capabilities such as application- and user-specific data for indexing, interfaces that can be directly programmed by the application, a global namespace and more flexible data management policies.
An object identifier is an address tied to the object, which enables the object to be found over a distributed system. Objects may be spread across multiple data centers located in different parts of the world. Data held in object storage can be found without the user knowing its specific physical location.
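The idea of objects that carry a unique identifier plus customizable metadata can be illustrated with a small in-memory sketch. The class and method names here (FlatObjectStore, put, get, find) are hypothetical and do not correspond to any vendor's API; the point is only that objects live in one flat namespace and can be located by ID or by metadata rather than by path.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy store: one flat dictionary keyed by object ID, no directories."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        oid = str(uuid.uuid4())  # globally unique identifier for the object
        self._objects[oid] = StoredObject(data, dict(metadata))
        return oid

    def get(self, oid: str) -> StoredObject:
        return self._objects[oid]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
scan_id = store.put(b"...image bytes...", modality="MRI", patient="anon-17")
store.put(b"...other bytes...", modality="CT", patient="anon-17")
mri_ids = store.find(modality="MRI")  # lookup by metadata, not by location
```

The caller keeps only the returned ID. Where the bytes physically reside is hidden behind the store.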
Object storage, along with its metadata, can be accessed directly through application programming interfaces (APIs) over HTTP and HTTPS. That differs from block storage volumes, which can be accessed only when they are attached to an operating system.
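As a rough illustration of HTTP-based access, the sketch below builds (but deliberately does not send) a PUT and a GET request for a single object using Python's standard library. The endpoint, the object ID and the metadata header prefix are assumptions made for the example; real providers define their own URL layouts, metadata header names and authentication schemes, none of which are shown here.

```python
from urllib.request import Request

# Hypothetical endpoint and object ID; substitute a real provider's values.
ENDPOINT = "https://objects.example.com"
OBJECT_ID = "3f2b8c1e-scan-0001"


def build_put(data: bytes, metadata: dict) -> Request:
    """Build (but do not send) an HTTP PUT that uploads one object."""
    headers = {"Content-Type": "application/octet-stream"}
    # Illustrative convention: custom metadata travels as prefixed headers.
    for key, value in metadata.items():
        headers[f"X-Object-Meta-{key}"] = value
    return Request(f"{ENDPOINT}/{OBJECT_ID}", data=data,
                   headers=headers, method="PUT")


def build_get() -> Request:
    """Build (but do not send) an HTTP GET that fetches the same object."""
    return Request(f"{ENDPOINT}/{OBJECT_ID}", method="GET")


put_req = build_put(b"image bytes", {"modality": "MRI"})
get_req = build_get()
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

No volume has to be mounted and no file system is involved: the object and its metadata are addressed entirely by a URL and HTTP headers.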
Cloud object storage products and platforms include Amazon Simple Storage Service, Caringo Swarm, Cloudian HyperStore, Dell EMC Elastic Cloud Storage, Hewlett Packard Enterprise Scalable Object Storage based on the Scality Ring software-defined storage platform, Hitachi Vantara's Hitachi Content Platform, IBM Cloud Object Storage and the OpenStack Swift open source object storage system.
Object storage vs. file storage and block storage
Traditional block and file storage are not always the best options for storing large unstructured data sets for applications like medical imaging. Both block and file storage are difficult and costly to extend beyond the data center, and they hit a point of diminishing returns when scaling to massive amounts of data.
File storage, often deployed as network-attached storage (NAS) systems, uses a file system to place and share data. It is built for working across a local area network (LAN), and its performance suffers over a wide area network (WAN). Most file systems are not designed to handle billions of files. Block storage, usually deployed as a storage area network (SAN) system, has performance problems over long distances, making it a poor fit for the cloud. In block storage, each block of data has its own address, but there is no metadata to provide context for the block.
Cloud object storage pros and cons
Object storage's main advantage is that it makes data more resilient to disasters and hardware failures. Because it is highly distributed, data remains available even if several nodes fail. It is also considerably cheaper than traditional storage because it runs on commodity hardware or virtual machines (VMs) that scale out easily. Objects are stored in a flat address space, which avoids the complexity and scalability challenges of a hierarchical file system. Data protection is built into the architecture in the form of either replication or erasure coding.
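To make the difference between replication and erasure coding concrete, here is a deliberately simplified sketch that uses a single XOR parity fragment. This is an assumption chosen for brevity: it survives the loss of only one fragment, whereas production systems typically use Reed-Solomon-style codes that tolerate several simultaneous failures. The function names and the choice of four data fragments are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: list) -> list:
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing] = reduce(xor_bytes, survivors)
    return repaired


original = b"object storage protects data"
stored = encode(original, k=4)      # 4 data fragments + 1 parity fragment
damaged = list(stored)
damaged[2] = None                   # simulate the node holding fragment 2 failing
restored = recover(damaged)
rebuilt = b"".join(restored[:4])[:len(original)]

# Raw capacity consumed per byte of user data
replication_overhead = 3 / 1        # three full copies
erasure_overhead = (4 + 1) / 4      # 4 data fragments + 1 parity fragment
```

In this toy setup the erasure-coded layout uses 1.25 bytes of raw capacity per byte stored, compared with 3 bytes for triple replication, at the cost of extra computation when a fragment has to be rebuilt.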
Object storage is best suited for static data and cloud storage. Typical use cases for object storage are cloud backup and archiving because the technology works best with data that is read more often than it is written. Object storage has matured to the point that it scales to the exabyte level, representing trillions of objects. The use of commodity hardware or VMs means nodes can be added easily and disk space is used more efficiently.
Object storage systems, through the use of object IDs (OIDs) or identifiers, can access any piece of data without needing to know on which physical storage device, file system or directory it resides. This abstraction allows object storage devices to work with storage hardware configured in a distributed node architecture, so processing power can scale in conjunction with data storage capacity. I/O requests do not have to go through a central controller, enabling a true global storage system for large amounts of data managed by objects, physically stored anywhere, and accessed via a WAN or the internet.
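One way a system can locate data from the object ID alone, without a central controller, is hash-based placement. The article does not say which scheme any particular product uses; the sketch below uses rendezvous (highest-random-weight) hashing purely as an example, and the node names and the two-copy setting are made up.

```python
import hashlib


def score(node: str, oid: str) -> int:
    """Deterministic pseudo-random weight for a (node, object ID) pair."""
    digest = hashlib.sha256(f"{node}:{oid}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(oid: str, nodes: list, copies: int = 2) -> list:
    """Return the nodes that hold an object, computed from its ID alone."""
    ranked = sorted(nodes, key=lambda n: score(n, oid), reverse=True)
    return ranked[:copies]


nodes = ["node-eu-1", "node-us-1", "node-ap-1", "node-us-2"]
oid = "3f2b8c1e-scan-0001"
placement = locate(oid, nodes)

# Any client that knows the node list computes the same answer,
# so no request has to pass through a central lookup service.
same_answer = locate(oid, list(reversed(nodes)))
```

A useful property of this scheme is that adding or removing a node changes the placement only for objects that rank that node among their top choices, so most data stays where it is as the cluster grows.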
Object storage is least suited to applications and environments with high transaction rates. Many object storage systems are eventually consistent, meaning they do not guarantee that a read request will return the most recent version of the data. That makes them a poor match for real-time systems such as transactional databases. In addition, the technology is not always suited for applications with high performance demands.
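The stale-read behavior described above can be reproduced with a toy model in which a write lands on one replica and reaches the others only later. Everything here (the class names, three replicas, an explicit sync step standing in for background replication) is an assumption for illustration, not a description of how any specific product replicates data.

```python
class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on one replica; the others catch up only when sync() runs."""

    def __init__(self, replica_count: int = 3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # replication work not yet applied

    def put(self, oid: str, value: bytes) -> None:
        self.replicas[0].data[oid] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, oid, value))

    def get(self, oid: str, replica_index: int):
        return self.replicas[replica_index].data.get(oid)

    def sync(self) -> None:
        """Apply queued updates in order, as background replication would."""
        for replica, oid, value in self.pending:
            replica.data[oid] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("balance", b"100")
store.sync()
store.put("balance", b"50")          # new write, not yet replicated
stale = store.get("balance", 2)      # read served by a lagging replica
fresh = store.get("balance", 0)      # read served by the replica that took the write
store.sync()
converged = store.get("balance", 2)  # replicas agree once replication completes
```

Between the second write and the following sync, two readers can see different values for the same object, which is exactly what a transactional database cannot tolerate.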
Cloud object storage gateways
One of the earliest challenges to cloud object storage adoption was legacy applications written for older protocols. A cloud object storage gateway is built to provide basic protocol translation and more transparent communication. The gateway makes cloud storage appear to be a NAS filer, a block storage array, a backup target or an extension of the application. Most cloud providers rely on internet protocols such as RESTful APIs over HTTP instead of SAN or NAS protocols.
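The protocol translation a gateway performs can be sketched as a thin layer that accepts file-style paths and turns them into flat object keys. The classes below are hypothetical: ObjectBackend stands in for a remote store that would normally be reached over a REST API, and FileGateway shows only the mapping idea, not a real NAS protocol implementation.

```python
class ObjectBackend:
    """Stand-in for a remote object store reached over a REST API."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get_object(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self) -> list:
        return sorted(self._objects)


class FileGateway:
    """Presents file-style paths to applications, stores flat keys behind them."""

    def __init__(self, backend: ObjectBackend):
        self.backend = backend

    @staticmethod
    def _key(path: str) -> str:
        # Folders are only a naming convention: the whole path becomes one key.
        return path.strip("/")

    def write_file(self, path: str, data: bytes) -> None:
        self.backend.put_object(self._key(path), data)

    def read_file(self, path: str) -> bytes:
        return self.backend.get_object(self._key(path))

    def list_dir(self, path: str) -> list:
        """Emulate a directory listing by filtering keys on a shared prefix."""
        key = self._key(path)
        prefix = key + "/" if key else ""
        names = {k[len(prefix):].split("/")[0]
                 for k in self.backend.list_keys() if k.startswith(prefix)}
        return sorted(names)


gateway = FileGateway(ObjectBackend())
gateway.write_file("/scans/2024/mri-001.dcm", b"image bytes")
gateway.write_file("/scans/2024/mri-002.dcm", b"more bytes")
entries = gateway.list_dir("/scans")
files = gateway.list_dir("/scans/2024")
```

The legacy application keeps working with paths and directory listings, while the backend only ever sees put, get and list operations on keys.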
Many cloud object storage gateways provide data deduplication, compression, snapshot technology, automated tiered storage and encryption. Cloud object storage gateways are hardware- or software-based appliances that sit on the customer's premises and act as a translation bridge between local applications and remote cloud-based storage.
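Two of the gateway features listed above, deduplication and compression, can be shown together in a short sketch. It assumes fixed-size chunks identified by their SHA-256 digest and compressed with zlib; actual gateways may use variable-size chunking and other algorithms, and the DedupUploader class is invented for this example.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # fixed-size chunks; an assumption chosen for simplicity


class DedupUploader:
    """Compress chunks and upload each distinct chunk only once."""

    def __init__(self):
        self.cloud = {}          # stand-in for remote storage: digest -> compressed chunk
        self.bytes_uploaded = 0

    def upload(self, data: bytes) -> list:
        """Store data and return the ordered list of chunk digests (the recipe)."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.cloud:      # skip chunks that are already stored
                compressed = zlib.compress(chunk)
                self.cloud[digest] = compressed
                self.bytes_uploaded += len(compressed)
            recipe.append(digest)
        return recipe

    def download(self, recipe: list) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(zlib.decompress(self.cloud[d]) for d in recipe)


uploader = DedupUploader()
# Three identical chunks followed by one different chunk
backup = (b"A" * CHUNK_SIZE) * 3 + (b"B" * CHUNK_SIZE)
recipe = uploader.upload(backup)
restored = uploader.download(recipe)
```

Only two distinct chunks cross the WAN in this run even though the backup contains four, which is why gateways apply these techniques before data leaves the premises.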