Enterprises are increasingly shifting IT infrastructure from on-premises systems to cloud services. In doing so, they face a problem familiar to distributed systems designers: minimizing the performance degradation that comes when applications and users are separated from their data. Even organizations going all-in on cloud infrastructure and putting applications and storage in the same cloud region aren't spared this performance-sapping cloud latency, because employees still must traverse the internet before accessing file shares and databases.
The typical solution to data latency, whether on a processor chip or an enterprise IT environment, is caching and synchronizing. With these techniques, information that's frequently accessed is copied to local repositories with local changes automatically replicated to a central storage location.
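The cache-and-synchronize pattern can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than any vendor's implementation: `CentralStore`, `LocalCache` and the file name are hypothetical, and the central storage location is simulated with an in-memory dictionary.

```python
class CentralStore:
    """Stand-in for the authoritative central storage location (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self.reads = 0  # counts round trips to the central store

    def get(self, key):
        self.reads += 1
        return self.objects[key]

    def put(self, key, value):
        self.objects[key] = value


class LocalCache:
    """Local repository that serves reads and replicates local changes upstream."""

    def __init__(self, store):
        self.store = store
        self.data = {}
        self.dirty = set()  # keys changed locally but not yet replicated

    def read(self, key):
        # Only the first read of a key travels to the central store.
        if key not in self.data:
            self.data[key] = self.store.get(key)
        return self.data[key]

    def write(self, key, value):
        # Writes complete locally and are queued for replication.
        self.data[key] = value
        self.dirty.add(key)

    def sync(self):
        # Push every pending local change to the central store.
        for key in sorted(self.dirty):
            self.store.put(key, self.data[key])
        pushed = len(self.dirty)
        self.dirty.clear()
        return pushed


store = CentralStore()
store.put("report.docx", b"v1")

cache = LocalCache(store)
cache.read("report.docx")          # fetched from the central store
cache.read("report.docx")          # served locally
cache.write("report.docx", b"v2")  # local change
pushed = cache.sync()              # replicated to the central store
```

In this sketch the second read never leaves the local repository, and the central copy changes only when `sync()` runs. A real appliance would run that step automatically and would also have to handle conflicting changes from other sites, which the sketch leaves out.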
Storage caching takes many forms in the cloud. One popular approach is the cloud caching appliance, which resembles a small NAS array and regularly synchronizes with one or more cloud storage services that act as the authoritative data source.
How cloud caching technology works
There are several categories of cloud storage caching appliances available. The most popular implementation uses on-premises hardware or a virtual appliance that connects to at least one cloud-resident virtual gateway and provides the conduit to one or more cloud object, file or block storage services. These appliances can be used in parallel, within a single cluster and across multiple remote locations. They create a distributed storage system encompassing multiple enterprise data centers, edge locations and IaaS regions hosting storage services.
Cloud caching systems are sometimes called hybrid storage gateway appliances to emphasize their role as a bridge between on-premises and cloud infrastructure. As Gartner's definition points out, these appliances work like other in-line caches: They intercept file, block or object storage I/O, depending on the device and application. They use cache management algorithms -- typically, the least recently used or adaptive replacement cache algorithms -- to maximize the cache hit rate and minimize external data traffic. Appliances also use data reduction algorithms, such as compression and deduplication, along with WAN optimization software to improve performance and minimize latency when retrieving cache misses -- the data not in the cache -- from the cloud.
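As a rough illustration of least recently used (LRU) cache management, the Python sketch below evicts the entry that has gone longest without being accessed and tracks the hit rate. The class, the `fetch_from_cloud` function and the two-entry capacity are assumptions chosen for the example. Commercial appliances manage far larger caches and may use other policies, such as adaptive replacement.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # called on a miss to retrieve the data
        self.entries = OrderedDict()  # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.fetch(key)            # cache miss: go to the cloud
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fetch_from_cloud(key):
    # Hypothetical stand-in for a slow request to a cloud storage service.
    return f"contents of {key}"


cache = LRUCache(capacity=2, fetch=fetch_from_cloud)
for name in ["a", "b", "a", "c", "a"]:
    cache.get(name)
```

With this access pattern, "b" is evicted when "c" arrives because "a" was touched more recently. Two of the five requests are served locally, for a hit rate of 0.4.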
Cloud storage caching alternatives
There are two variants on cloud caching technology that don't act as pure local caches:
- A cloud storage gateway is like a caching appliance without the local storage. It's usually implemented as a virtual appliance, running as a VM and exposing NFS and SMB NAS file protocols or block interfaces, such as iSCSI, to on-premises users and writing data to cloud object storage services. Given the low cost of local storage on VM servers and the significant advantages of a local cache, there are few remaining products that act as pure gateways.
- A distributed file system provides enterprise implementations of cloud-like object storage that can span multiple clusters and locations. Some products can also be deployed on cloud infrastructure like AWS, enabling file systems to span private and public resources and to burst into the cloud for extra capacity when needed. Distributed file systems offer virtually unlimited scalability, nondisruptive capacity expansion, data encryption at rest and in transit, high availability with geographically distributed replication, and a single global namespace for file and object storage. Some products also include backup and archive modules to regularly duplicate data to a cloud repository and migrate unused or cold data to low-cost cloud storage services, such as Amazon Glacier.
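The cold-data migration mentioned in the last item can be pictured as an age-based policy. The Python sketch below is a simplified assumption of how such a rule might look. The 90-day idle threshold, the tier names and the file paths are invented for illustration and aren't taken from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime
    tier: str = "local"  # "local" or "archive" (hypothetical tier names)


def migrate_cold_files(files, now, max_idle=timedelta(days=90)):
    """Mark files idle for longer than max_idle as moved to the archive tier."""
    moved = []
    for record in files:
        if record.tier == "local" and now - record.last_access > max_idle:
            record.tier = "archive"  # a real system would copy the data first
            moved.append(record.path)
    return moved


now = datetime(2021, 1, 1)
files = [
    FileRecord("/projects/active.xlsx", now - timedelta(days=3)),
    FileRecord("/projects/2018-audit.pdf", now - timedelta(days=400)),
]
moved = migrate_cold_files(files, now)
```

Only the file untouched for 400 days is reassigned. The recently opened spreadsheet stays on local storage.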
Cloud caching appliance market and products
The market for cloud storage gateway and caching appliances is growing in parallel with the broader enterprise adoption of cloud services. Zion Research estimated the market to be about $3 billion in sales, growing at 30% annually to nearly $7.2 billion in 2023. The market is divided among the major cloud providers; large enterprise IT suppliers; and smaller firms specializing in hybrid cloud storage software, such as Ctera Networks and Panzura.
The following list isn't an exhaustive product guide. However, it includes popular cloud gateway caching products.
AWS Storage Gateway is available as a virtual or hardware appliance that connects on-premises systems with AWS storage resources. It supports file (SMB or NFS), volume (iSCSI) and tape (virtual tape library) access to S3 resources. The software appliance requires a VM with four vCPUs, 16 GB RAM and 80 GB disk space for image and system data. The hardware appliance is a dual-processor Dell EMC PowerEdge R640 server with 128 GB RAM and four 2 TB SSDs for caching.
Ctera Networks Ltd. has a suite of three software products that together make up a hybrid storage architecture. They include Ctera Portal for the core cloud services and system management, Ctera Edge Filer for on-premises servers and Ctera Drive for remote clients. Ctera software is available as a virtual appliance or bundled with one of five system configurations.
Dell EMC offers several hybrid cloud storage products in addition to the server used in the AWS Storage Gateway hardware appliance. These include:
- The PowerScale OneFS OS CloudPools feature, which extends data tiering policies to cloud services so infrequently accessed data is automatically migrated off local storage. CloudPools supports Alibaba Cloud, AWS, Google Cloud Platform (GCP), Microsoft Azure and Virtustream.
- The Unity XT All-Flash Unified Storage midrange array, paired with the Unity Cloud Edition virtual appliance, which can run on AWS to extend block, file and VMware Virtual Volumes storage to cloud infrastructure.
Microsoft has built a hybrid storage portfolio primarily through its acquisition of Avere in 2018 and StorSimple in 2012.
- Avere was a pioneer in cloud caching technology and continues to offer appliances designed for both high-performance computing (Azure vFXT) and enterprise (Azure FXT Edge Filer) workloads. The Edge Filer is available in two 1U models -- one with a 12.8 TB SSD cache, the other with double the cache -- that can scale out to 24 nodes per cluster and deliver millions of IOPS and hundreds of gigabytes per second throughput.
- StorSimple will be obsolete by 2022, replaced by Avere, Azure File Sync service or Azure Stack Edge. Azure Stack Edge is a 1U server with an embedded field-programmable gate array for AI workload acceleration. It acts as a caching cloud gateway managed via the Azure portal.
Nasuni is a distributed object file system that can expose NAS file protocols and is supported by AWS, Azure, GCP and IBM Cloud. It's a software product that's often locally deployed on hyper-converged infrastructure systems.
NetApp Global File Cache is based on Talon Fast storage software, which NetApp recently acquired. The product creates a globally distributed virtual file share with local data caches that are automatically refreshed based on usage patterns. The File Cache works with NetApp Cloud Volumes Ontap and Azure NetApp Files to provide low-latency access to cloud storage from internal servers and edge clients.
Panzura Freedom, a global, distributed object file system, has a unified namespace with local caching that automatically tracks hot and cold data blocks to improve cache performance and efficiency. Its SmartCache policies allow overriding automatic cache management by pinning data in the cache regardless of its access frequency. Freedom is available as a virtual software appliance that works on AWS, Azure, GCP, IBM Cloud and VMware or as one of three Freedom Filer hardware appliances that support up to a 28 TB cache and 5,000 users.
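The idea of pinning data in a cache regardless of access frequency can be sketched in a few lines of Python. This is a generic illustration, not Panzura's SmartCache implementation. The class, its method names and the file names are hypothetical, and the underlying eviction policy is assumed to be least recently used.

```python
from collections import OrderedDict


class PinnableCache:
    """LRU-style cache in which pinned keys are never chosen for eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered oldest to most recently used
        self.pinned = set()

    def pin(self, key):
        self.pinned.add(key)

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        return None

    def _evict(self):
        while len(self.entries) > self.capacity:
            # Pick the oldest entry that is not pinned.
            victim = next((k for k in self.entries if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; let the cache run over capacity
            del self.entries[victim]


cache = PinnableCache(capacity=2)
cache.put("payroll.db", "block-1")
cache.pin("payroll.db")
cache.put("video.mp4", "block-2")
cache.put("notes.txt", "block-3")  # forces one eviction
```

Without the pin, "payroll.db" would be evicted as the oldest entry. With it, "video.mp4" is dropped instead, even though it was accessed more recently.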
Pure Storage doesn't offer a cloud caching gateway. However, its cloud block and object storage software can span multiple locations and be extended to AWS and VMware Cloud infrastructure via a virtual appliance.
StoneFly Smart Cloud Storage Gateway is available as a virtual appliance or bundled with one of StoneFly's SAN or NAS appliances. It provides iSCSI SAN, S3-compatible object and NAS storage. The virtual appliance works on bare-metal servers and most hypervisors, and both the software and hardware appliances support AWS, Azure and StoneFly's private cloud.