For the past 25 years, NAS file servers, or filers, have been the traditional way to store unstructured data -- data that isn't held in a classic database format. Calling it unstructured doesn't mean the data has no internal structure, only that the file is essentially a binary object. With that in mind, we are seeing the rise of object stores as an alternative to traditional file servers, with many vendors offering object-level storage and file-based interfaces to the same data.
In this article, we discuss the pros and cons of mixing object-level storage and file as a method of storing unstructured data and examine what options IT departments have when looking at products that do just that.
A quick NAS primer
NAS covers two technologies that came from different sides of the IT landscape. NFS, developed by Sun Microsystems, has become the standard protocol for accessing file content across the network for non-Windows systems. SMB, formerly known as CIFS, is the file protocol for Microsoft platforms. Both have evolved rapidly since their introduction, with performance and scalability enhancements that go well beyond simple file sharing.
Traditional NAS designs use RAID as a protection mechanism for recovery in the event of a hardware failure, and vendors base most NAS products on dual-controller architectures -- some with scale-out capabilities. File systems are built on top of the physical storage media, and these file systems are exposed to the network using either NFS or SMB.
NAS and object similarities
Both file-based NAS and object storage work on the same type of data -- unstructured files within or outside of a file hierarchy. Both provide scale-out architecture capabilities, letting them store millions, if not billions, of objects.
The use of a file system as the method of data storage presents a few usability issues:
- The scalability problem. On a single-node -- or failover dual-node -- NAS, the file system exists on one instance of the operating system. That makes it relatively easy to handle mutable operations such as file creation, locking and updates. Scaling the file system out across many nodes, however, is a real challenge and quickly becomes complex.
- Data integrity. File systems store data in structures that keep metadata and file content across a logical or physical disk volume. If power is lost to the file server, the system must perform a file system check, or FSCK, to validate the state of data at the time of power loss. This delay can be significant, depending on how the file system has been implemented; some systems, such as NetApp's Data ONTAP, use nonvolatile RAM to commit data en masse and reduce the FSCK burden.
- RAID protection. Since its inception in a 1987 paper by David Patterson, Garth Gibson and Randy Katz, RAID has become the go-to protection method for storage appliances. It has served us well, but has started to reach limits of scalability as hard drive capacities have grown beyond anything imaginable when the paper was written. Today, RAID suffers rebuild times that can run into days, and the problem will continue to worsen with the increasing use of 12 TB-plus capacity drives.
RAID has other limitations, because it's only practical for data stored in a single appliance. To protect data from more intrusive issues than device failures, it must be replicated, creating entire duplicate copies in geographically dispersed locations.
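To see why rebuild times balloon, a back-of-envelope calculation helps. The 200 MB/s sustained write rate below is an assumption for illustration; real rebuilds are far slower because the array keeps serving production I/O while it rebuilds:

```python
# Back-of-envelope: minimum time to rewrite a full replacement drive
# at a sustained sequential rate. Real rebuilds take much longer
# because the array continues to serve host I/O during the rebuild.
capacity_bytes = 12e12       # 12 TB drive
rate_bytes_per_s = 200e6     # ~200 MB/s sustained -- an assumed figure

rebuild_seconds = capacity_bytes / rate_bytes_per_s
rebuild_hours = rebuild_seconds / 3600
print(f"{rebuild_hours:.1f} hours")  # roughly 16.7 hours, best case
```

Under realistic contention, that best-case 17 hours easily stretches into days, which is the window during which a second drive failure can cause data loss.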
Understanding object storage
Object-based storage is a relatively new way of storing binary data or objects. The technology traces its roots to the mid-1990s and a company called FilePool, which introduced the idea of content-addressable storage. EMC (now Dell EMC) acquired FilePool, and it became the Centera product line. Since then, many vendors have come to market with the idea of offering the ability to store large quantities of unstructured content.
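The content-addressable idea FilePool pioneered can be sketched in a few lines: the object's address is derived from a hash of its content, so identical content always maps to the same address, which also gives deduplication for free. The class and method names here are illustrative, not FilePool's or Centera's actual API:

```python
import hashlib

class ContentAddressableStore:
    """Toy content-addressable store: the key IS a hash of the content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is computed from the content itself, so storing
        # identical content twice yields the same address (dedupe).
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]
```

A caller never chooses a name or path; it stores bytes and gets back an address that uniquely identifies that exact content.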
A cure for the common file system
As a physical storage architecture, object stores remove many of the performance and integrity headaches associated with storing unstructured data on the file systems NAS products use. That's because an object store doesn't use the file system concept, but instead stores data in a single, flat namespace rather than a hierarchy. Key characteristics of object stores include:
- Access through web-based protocols (HTTP or HTTPS), typically stateless. Every interaction with an object platform uses simple constructs such as create, update and delete.
- No file structure. Object-level storage offers "buckets" or logical storage containers that hold data in a flat, nonhierarchical way.
- No understanding of the format or structure of content. Data is stored with metadata that holds attributes describing the content. This can be system metadata -- e.g., date-time stored -- or user-defined metadata to provide some way for external applications to retrieve and search content.
- Typically immutable updates. Storage of new objects is an all-or-nothing process, with updates handled as a delete-and-create process rather than update in place.
- Designed with high scalability in mind. Many IT shops felt object storage wasn't justified unless they needed to store large volumes of binary data. That's changing.
- Data protection implemented by alternative techniques to RAID. This includes keeping multiple copies or replicas of an object or using erasure coding.
- Different approach to content locking. NAS maintains data integrity through content locking: individual files can be opened for exclusive or write access, ensuring data is written from only a single source at any one time. Object stores don't inherently offer locking; instead, they treat objects as immutable, replacing an object wholesale rather than modifying it in place in order to maintain consistency.
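A toy sketch ties several of these attributes together -- flat buckets with no hierarchy, system plus user-defined metadata, and all-or-nothing updates. The class is purely illustrative and not modeled on any vendor's API:

```python
import time

class ObjectStore:
    """Toy flat-namespace object store: buckets hold objects by key,
    each object carries metadata, and an update replaces the whole
    object rather than modifying stored bytes in place."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: (data, metadata)}

    def create_bucket(self, bucket: str):
        self._buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, data: bytes, user_meta=None):
        # System metadata (timestamp, size) plus optional user metadata.
        meta = {"stored": time.time(), "size": len(data)}
        meta.update(user_meta or {})
        # All-or-nothing: a "update" is effectively delete-and-create.
        self._buckets[bucket][key] = (data, meta)

    def get(self, bucket: str, key: str):
        return self._buckets[bucket][key]

    def delete(self, bucket: str, key: str):
        del self._buckets[bucket][key]
```

Note what's missing compared with a file system: no directories, no open file handles, no locks -- each put or get is a complete, self-contained operation.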
Looking at the attributes of both storage systems, we can see there are also many similarities between the two methods of storing data. Both work on unstructured data and use metadata to track information on specific objects being stored. It's not too difficult to see how you could adapt an object store to offer NAS protocols.
Why would we want to merge NAS and object? Aside from the obvious savings in physical storage you could achieve from running a single storage platform, there are other benefits:
- Object stores use techniques like erasure coding to spread data protection, and data access, over geographically dispersed locations. This means they don't require traditional replication techniques that keep entire copies of data. The saving in storage hardware is obvious, but there are other benefits, such as being able to efficiently extend data access to multiple locations rather than the point-to-point nature of most replication. A word of caution: Geo-distributed file locking -- a key factor in delivering efficient distributed file access on top of an object-based platform -- isn't a trivial exercise.
- Data can be accessed on many systems using multiple protocols at the same time. This provides the ability to ingest content from a traditional protocol like NFS or SMB while using more efficient object-based access to analyze that content for other purposes. The stateless nature of object protocols compared with NAS reduces the overhead of accessing content -- e.g., file locking or tracking, distributing locks and keeping track of open file handles (see "Reducing overhead").
- Object storage is highly scalable and cheap, making it ideal as an archive that still provides traditional file access. You can also move data off to cloud object-level storage, including for long-term cold retention, while retaining the ability to search content with suitable metadata. This approach makes it easy to build a hybrid platform that mixes physical and cloud-based resources.
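The erasure coding mentioned above can be illustrated with the simplest possible scheme: a single XOR parity shard, which lets any one lost data shard be rebuilt from the survivors. Production object stores use more powerful Reed-Solomon-style codes that tolerate multiple simultaneous failures, but the principle -- recover missing data from parity rather than from a full copy -- is the same:

```python
def xor_parity(shards):
    """Compute a parity shard as the byte-wise XOR of equal-length
    data shards (the simplest possible erasure code)."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_shards, parity):
    """Recover a single lost shard by XOR-ing the parity shard with
    every surviving data shard."""
    lost = bytearray(parity)
    for shard in surviving_shards:
        for i, b in enumerate(shard):
            lost[i] ^= b
    return bytes(lost)
```

With three data shards and one parity shard, this scheme stores 4/3 of the raw data yet survives any single shard loss -- versus 2x or 3x for full replicas, which is where the hardware savings come from.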
The multiprotocol nature of converged object-NAS systems means a user who only needs read-only access to data -- for analysis or analytics, say -- can skip the complications of a global file system that might otherwise slow down data access and new data writes. Developers can write through NAS protocols to get the locking and integrity benefits, then read the same data at another location through the object interface, all without affecting performance.
NAS on object: The vendor roundup
Who is offering NAS on object? We see two distinct types of product emerging: NAS on object to improve the NAS experience and not expose the object store, and vendors offering NAS-object hybrids, where data can be accessed through either protocol.
Examples of the NAS on object deployment model include Nasuni, a startup that offers a cloud-based global NAS product using Amazon Web Services Simple Storage Service for back-end storage. Another company with a similar approach is Exablox (now part of StorageCraft). Its scale-out OneBlox system uses a distributed object store ring to hold file content while providing some interesting features such as smart file versioning and snapshots. OneBlox breaks objects into chunks that enable deduplication, but aren't accessible by the user.
For commercial object store software, there are a range of proprietary vendor products available. Here are some examples:
- Scality's RING offers SMB 2.0 and NFSv3 support, including integration with Microsoft's Active Directory. Protocol support is implemented through "connectors" that are native services running on the RING platform.
- Caringo's Filefly uses file services to extend the company's Swarm object store to support NAS protocols. It also offers SwarmNFS, a lightweight interface that provides NFSv4 access to data stored in a Swarm object store.
- DataDirect Networks lets you use file content with its WOS object store via a feature called NoFS. The company claims 15% to 20% savings in storage space using NoFS compared with traditional file systems, along with a significant reduction in I/O traffic.
- Hitachi Data Systems provides NAS access to its Hitachi Content Platform object storage through HCP Anywhere. A custom HCP Anywhere application is available to access content from mobile devices.
- Cloudian delivers file access to its HyperStore object platform using HyperStore Connect for Files. The product offers stateless access points that provide standard NAS capabilities, including a global namespace and file locking.
In addition to object storage vendors that offer native NAS support, there are file gateways through which you can connect to object stores, such as Avere's FXT. These products don't provide access to data through both protocols, however, and may store data on the back-end object store in a proprietary format, making it impossible to access data at an object level.
Finally, we should mention open source options. Ceph uses object storage as the basis of a scale-out platform that supports object, file and block storage formats, although it doesn't (yet) directly expose the same data through multiple protocols. And there's OpenIO, which supports a range of storage protocols and can be deployed on commodity hardware. This includes ARM-based hardware, using what the company calls "nano-nodes" to turn individual hard drives into storage servers.
Friends in data analytics
Object and file are a great mix for analytics. Store data through traditional NAS protocols, then use object access over HTTP to run back-end analytics with minimal overhead and impact. Object protocols don't need file locking and other data integrity features as long as data is read and written immutably, which reduces the load on the file system and improves performance.
There's no doubt the line between object and file storage is blurring, and for many use cases, it makes sense to merge the two. Object-level storage provides a more practical storage method, with greater efficiency and geographic flexibility than traditional NAS storage. We can expect to see object and file access as standard native protocols on all unstructured storage appliances in the not too distant future.