Distributed file systems let users access file data that is spread across multiple storage servers. Key features -- including application and platform support and storage type -- differ from one product to another, so organizations should compare distributed file systems carefully.
Users must examine their requirements and the differences among distributed file systems, evaluate the products available on the market and then follow a systematic process for selection and implementation.
How distributed file systems differ
Most distributed file systems use object storage, though some also accommodate file storage. Some are hardware-based, some are strictly software and some are hybrids. For example, NAS uses both hardware and software to deploy storage in multiple locations, whether on-site or remote. Users do not need to know where the system stores their data. Offerings can also be cloud-based, including those built for multi-cloud and hybrid cloud environments.
A parallel file system is a variation of a distributed file system. Through striping, data sets are broken up into blocks that are sent to multiple storage devices. A global namespace presents the striped blocks as a single file tree, so the data can be located for future access. Metadata can also identify the data, its location and owner.
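As a rough illustration of striping, the following Python sketch splits a file into fixed-size blocks, places them round-robin on simulated storage devices and records the location and owner in metadata. The `StripedStore` class, its parameters and the round-robin placement are assumptions made for this example and do not reflect how any particular product works.

```python
class StripedStore:
    """Toy parallel file store (hypothetical): stripes files across devices."""

    def __init__(self, num_devices, block_size=4):
        self.num_devices = num_devices
        self.block_size = block_size
        # One dict per simulated device: (path, block_index) -> bytes
        self.devices = [{} for _ in range(num_devices)]
        # Metadata: path -> owner, size and the device holding each block
        self.metadata = {}

    def write(self, path, data, owner):
        placement = []
        for index, offset in enumerate(range(0, len(data), self.block_size)):
            device_id = index % self.num_devices  # round-robin placement
            block = data[offset:offset + self.block_size]
            self.devices[device_id][(path, index)] = block
            placement.append(device_id)
        self.metadata[path] = {
            "owner": owner,
            "size": len(data),
            "blocks": placement,
        }

    def read(self, path):
        # Use the metadata to fetch each block from its device, in order
        entry = self.metadata[path]
        return b"".join(
            self.devices[device_id][(path, index)]
            for index, device_id in enumerate(entry["blocks"])
        )


store = StripedStore(num_devices=3, block_size=4)
store.write("/data/report.txt", b"abcdefghijklmnop", owner="alice")
print(store.metadata["/data/report.txt"]["blocks"])  # [0, 1, 2, 0]
print(store.read("/data/report.txt"))                # b'abcdefghijklmnop'
```

A real parallel file system would also handle redundancy, concurrent access and device failure, all of which this sketch leaves out.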
Distributed file system product examples
A comparison of distributed file systems should involve many vendors. Here are five major options:
- Dell PowerScale. PowerScale is a scale-out NAS system that uses server technology and file and object storage to support a range of requirements. It uses the OneFS OS to create a software-based storage environment with support for industry file storage protocols. It also supports Amazon S3 for object storage in AWS.
- IBM Spectrum Scale. Spectrum Scale, since renamed IBM Storage Scale, is a software-only application that uses parallel file system technology and supports a range of storage media, including hard disks, flash drives, cloud storage and even tape. It supports OpenStack Swift, an open source object storage system designed to hold large amounts of data. Spectrum Scale also supports Amazon S3 and most cloud service offerings.
- Nasuni. A hybrid cloud-based object storage platform, the Nasuni system uses UniFS, its homegrown operating software. UniFS works with all major cloud providers and stores user data as a series of snapshots of every version of every file in storage.
- NetApp StorageGRID. StorageGRID is a software-defined object storage platform that supports a number of applications, especially the management of unstructured data. It can run on a server or in a virtual environment, and it handles multiple data centers. It also provides integrated lifecycle management and Amazon S3 features.
- Panzura CloudFS. This system provides a hybrid cloud-based object storage environment for unstructured data management. It uses global namespace technology to locate files in both on-site and cloud storage environments. The system replicates a NAS arrangement but without the issue of physical location.
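The global namespace idea that several of these products rely on can be sketched in a few lines of Python: one logical directory tree is mapped onto different back ends, and the caller never sees the physical location. The `GlobalNamespace` class, the mount prefixes and the back-end labels are hypothetical and serve only as an illustration, not as any vendor's implementation.

```python
class GlobalNamespace:
    """Toy global namespace (hypothetical): one logical tree, many back ends."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> back-end label

    def mount(self, prefix, backend):
        # Normalize trailing slashes; keep "/" for the root mount
        self._mounts[prefix.rstrip("/") or "/"] = backend

    def resolve(self, path):
        # The longest matching prefix decides which back end holds the file
        best = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(path)
        relative = path[len(best):].lstrip("/")
        return self._mounts[best], relative


ns = GlobalNamespace()
ns.mount("/projects", "onsite-nas")             # assumed on-site back end
ns.mount("/projects/archive", "cloud-bucket")   # assumed cloud back end

print(ns.resolve("/projects/design/b.txt"))        # ('onsite-nas', 'design/b.txt')
print(ns.resolve("/projects/archive/2020/a.txt"))  # ('cloud-bucket', '2020/a.txt')
```

Users work with paths under `/projects` in both cases; only the namespace knows that one file sits on-site and the other in the cloud.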
Address these questions when evaluating distributed file systems:
- Are current data storage technologies sufficient?
- What kinds of files are stored?
- Is file storage the primary storage approach?
- Where is data currently stored? If it is on-site, how much growth can the current technology accommodate?
- What plans are there for cloud-based storage? Is cloud storage already used?
- Would a hybrid arrangement make sense?
- What are current, medium-term and long-term data storage requirements?
- How would a change in the data storage platform affect company operations?
Finally, do the due diligence and compare the candidate distributed file systems against those answers. Then set up a project plan to implement the new or updated storage arrangement.