Demystify the distributed file system and object storage market
IT teams in search of a distributed file system that supports object storage must carefully map out their requirements -- and remember that no two products are exactly alike.
Today's dynamic data storage market can make it a challenge to classify products in a meaningful way. Even when products are similar in nature, they often go by different labels and descriptors.
This is especially true when it comes to the distributed file systems and object storage market.
The only way for storage admins to make sense of this product category is to dig into the details of vendor offerings, determine exactly what each product provides and gauge whether it meets their data storage needs.
A broad and varied market
Analyst firm Gartner released a report on distributed file systems and object storage in October 2020. The report was part of Gartner's Magic Quadrant series of market research and included a number of storage products and vendors, including Dell Technologies and IBM.
The firm described distributed file systems and object storage products as software and hardware platforms based on a distributed architecture that use object and/or scale-out file technology to support the growth of unstructured data. Gartner further classified the products as being able to distribute, replicate or erasure-code data and metadata over a network, as well as across various nodes in a cluster.
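To make the difference between replication and erasure coding concrete, the following is a minimal Python sketch of the simplest possible erasure code: a single XOR parity fragment. It is an illustration only. The function names are hypothetical, and the report does not say which codes the listed products use; production systems generally rely on stronger schemes that survive more than one lost fragment.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


def decode(fragments: List[bytes], length: int) -> bytes:
    """Join the data fragments (all but the parity) and strip the padding."""
    return b"".join(fragments[:-1])[:length]


# Each fragment would live on a different node; here node 2 is lost.
original = b"unstructured data"
stored = encode(original, k=4)
stored[2] = None
restored = decode(recover(stored), len(original))
```

In this 4+1 layout, the cluster stores 25% more than the original data and survives the loss of any one fragment. Keeping three full replicas would store 200% more, which is why erasure coding is attractive at scale.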
Even with these qualifiers, the product category is broad and tough to define. For example, Dell Technologies describes its PowerScale and ECS offerings -- both of which are included in the Magic Quadrant for distributed file systems and object storage -- differently. The vendor describes PowerScale as a scale-out NAS platform, and ECS as a software-defined, cloud-scale object storage platform.
Product similarities and differences
The main commonality among the products in Gartner's report is that each one is built on a distributed computing architecture and incorporates or supports object storage. There are, however, significant differences.
Dell EMC PowerScale, for example, is a family of NAS products. The OneFS operating system runs on each node in a PowerScale cluster. The OS provides a software-defined architecture to store, manage, secure and analyze data at scale. OneFS orchestrates cluster components and provides a unified storage pool to consolidate data. The OS supports multiple industry-standard protocols for file-based access. It also supports the Amazon Simple Storage Service (S3) API -- the interface to the object storage service in the AWS cloud -- as a first-class protocol, which is implemented on top of the file service engine. Because of this design, both file-based and object-based applications can access the same file system on a single platform.
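One way to picture an object protocol layered on a file service is as a translation between object keys and file paths. The sketch below is purely illustrative and is not OneFS code: the bucket name, the directory and both function names are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical mapping of bucket names to directories in the shared file system.
BUCKET_ROOTS = {"projects": PurePosixPath("/mnt/cluster/projects")}


def object_key_to_path(bucket: str, key: str) -> PurePosixPath:
    """Translate an object key into the file path that holds the same data."""
    parts = [p for p in key.split("/") if p]
    if any(p in (".", "..") for p in parts):
        raise ValueError("key must not contain relative path segments")
    return BUCKET_ROOTS[bucket].joinpath(*parts)


def path_to_object_key(bucket: str, path: str) -> str:
    """Translate a file path under the bucket's directory back into an object key."""
    return PurePosixPath(path).relative_to(BUCKET_ROOTS[bucket]).as_posix()


file_view = object_key_to_path("projects", "2020/report.pdf")
object_view = path_to_object_key("projects", "/mnt/cluster/projects/2020/report.pdf")
```

Under this assumed mapping, a file written over a NAS protocol and an object read through the S3 API are two views of the same stored data, so no copy is needed between separate file and object silos.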
Unlike PowerScale, IBM Spectrum Scale is a software-only offering that comes with several data services. At its heart is a global parallel file system based on IBM's General Parallel File System. Spectrum Scale makes it possible to combine flash, hard-disk, cloud and tape storage into a unified environment that provides resiliency, scalability and control across platforms. According to IBM, the file system can handle tens of thousands of clients, billions of files and yottabytes of data, and includes interfaces for file, object and Hadoop Distributed File System access. For object access, Spectrum Scale supports OpenStack Swift and S3 APIs and can tier data to and from any Swift or S3 storage system.
NetApp takes yet another approach to distributed storage with StorageGrid, an object storage system that enables organizations to manage unstructured data across hybrid and multi-cloud environments. Admins can deploy StorageGrid as virtual or hardware appliances, or in Docker containers. They can run it on bare metal or in VMs. StorageGrid provides a single namespace that can span up to 16 global data centers, and it includes integrated lifecycle management policies to optimize where data resides. StorageGrid also supports S3 API features such as object versioning, multipart upload and object tags.
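For readers less familiar with the S3 features named above, the toy in-memory model below shows what object versioning and object tags mean in practice. It is a sketch and not the S3 API or StorageGrid code: the class and method names are hypothetical, and the sequential version IDs stand in for the opaque IDs a real object store would return.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectVersion:
    version_id: str
    body: bytes
    tags: Dict[str, str] = field(default_factory=dict)


class VersionedBucket:
    """Toy bucket that keeps every version written under a key."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ObjectVersion]] = {}  # oldest first
        self._counter = itertools.count(1)

    def put(self, key: str, body: bytes, tags: Optional[Dict[str, str]] = None) -> str:
        """Store a new version instead of overwriting; return its version ID."""
        version = ObjectVersion(f"v{next(self._counter)}", body, dict(tags or {}))
        self._versions.setdefault(key, []).append(version)
        return version.version_id

    def get(self, key: str, version_id: Optional[str] = None) -> ObjectVersion:
        """Return the latest version, or a specific one when an ID is given."""
        versions = self._versions[key]
        if version_id is None:
            return versions[-1]
        for version in versions:
            if version.version_id == version_id:
                return version
        raise KeyError(version_id)


bucket = VersionedBucket()
first = bucket.put("scan.tif", b"draft", tags={"project": "alpha"})
second = bucket.put("scan.tif", b"final")
```

After the second write, a plain read returns the final data, while the earlier version and its tags stay retrievable by ID. That behavior is what lets versioning protect against accidental overwrites.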
Other products that are not in the 2020 Magic Quadrant -- including those from Panzura and Nasuni -- also qualify as distributed file systems and object storage.
Panzura offers a high-performance storage product built on the CloudFS file system. The company describes CloudFS as a global cloud file system with a single namespace. CloudFS facilitates scalable multisite deployments, while providing data consistency across public and private clouds. CloudFS caches frequently used files at each location to optimize on-site performance, while providing a single, authoritative data source that operates at scale. Panzura's system behaves like an enterprise NAS with a standard Windows file share, but without NAS' geographic restrictions.
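The pattern of caching frequently used files on site while keeping one authoritative copy elsewhere can be sketched in a few lines of Python. The example is hedged throughout: the class name is hypothetical, a plain dictionary stands in for the cloud store, and least-recently-used eviction is an assumption, since the vendor description does not say how files are chosen for eviction.

```python
from collections import OrderedDict
from typing import Dict


class EdgeCache:
    """Per-site read cache in front of a single authoritative store."""

    def __init__(self, backing_store: Dict[str, bytes], capacity: int) -> None:
        self.backing_store = backing_store  # stand-in for cloud object storage
        self.capacity = capacity            # maximum number of files held locally
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> bytes:
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self._cache[name]
        self.misses += 1
        data = self.backing_store[name]    # fetch from the authoritative copy
        self._cache[name] = data
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used file
        return data

    def cached_names(self) -> list:
        """Names currently held locally, least recently used first."""
        return list(self._cache)


cloud = {"a.dwg": b"1", "b.dwg": b"2", "c.dwg": b"3"}
site = EdgeCache(cloud, capacity=2)
for name in ["a.dwg", "b.dwg", "a.dwg", "c.dwg"]:
    site.read(name)
```

Only the repeated read of a.dwg is served locally here. The other three reads go to the authoritative store, and b.dwg is evicted once the cache exceeds its two-file capacity. This sketch handles reads only; keeping writes consistent across sites is the harder problem and is not modeled.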
The Nasuni system, meanwhile, includes UniFS, a cloud-native global file system that stores all data in public or private cloud object storage. UniFS also caches active files locally, using specially designed appliances. According to the company, however, a global file system like UniFS is not the same as a distributed file system or a global namespace. Those models tie files to a specific piece of hardware, with the distributed file system or global namespace discovering and directing access to the file. With a global file system, the files are not tethered to hardware.
Quobyte and LucidLink Filespaces could also be considered distributed file systems that incorporate object storage. Quobyte refers to its product as a software storage system that can manage hundreds of petabytes of data, and LucidLink describes Filespaces as a distributed global file system for object storage.