OpenStack Swift object storage offers a potentially economical way for organizations to store large amounts of data on commodity hardware in public and private cloud installations. Users can get better results, however, by following a few recommendations on which data to store, how to lay out the cluster and where to trim hardware costs.
In this podcast interview with senior writer Carol Sliwa, Beth Cohen, a senior cloud architect at Boston-based Cloud Technology Partners Inc., supplies advice on the types and amounts of data for which OpenStack Swift is best suited, the tools that work with Swift, the recommended number of zones and data centers, as well as tips for saving money with Swift object storage.
For what types of data is OpenStack Swift suitable, and for what types of data is Swift not suitable?
Beth Cohen: I find that Swift is best suited for very, very large pools of relatively small data. There is a limit of 5 GB per object, so if you have, let's say, large video files, you'll need to break them up. Typically a large archive store or a pool of documents would be an appropriate use for Swift.
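To illustrate what breaking up a large file involves, here is a minimal Python sketch that plans the byte ranges for each piece. It is not Swift client code: the function names, the 1 GiB default segment size and the segment naming scheme are hypothetical, and the 5 GB limit is read here as 5 GiB, which is an assumption.

```python
GIB = 1024 ** 3
# Assumption: the 5 GB per-object limit is treated as 5 GiB for this sketch.
MAX_OBJECT_BYTES = 5 * GIB


def plan_segments(total_bytes, segment_bytes=1 * GIB):
    """Return (index, offset, length) tuples that together cover total_bytes."""
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if not 0 < segment_bytes <= MAX_OBJECT_BYTES:
        raise ValueError("segment_bytes must be positive and within the object limit")
    segments = []
    offset = 0
    index = 0
    while offset < total_bytes:
        # The last segment may be shorter than the others.
        length = min(segment_bytes, total_bytes - offset)
        segments.append((index, offset, length))
        offset += length
        index += 1
    return segments


def segment_name(base, index):
    """Hypothetical naming scheme: zero-padded so segment names sort in order."""
    return f"{base}/{index:08d}"
```

Each planned range would then be uploaded as its own object, with the pieces tied back together on the storage side. A 12 GiB video planned with 5 GiB segments, for example, yields three pieces, the last one 2 GiB long.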
What's the minimum amount of data for which somebody might want to consider using Swift?
Cohen: You can use Swift for pretty much any amount, but the reality is it's the amount of data that fills up more than one storage box. So, I would say about half a petabyte is kind of where you want to start. [For] anything less than that, it's just going to be in a single box or a couple boxes, so it doesn't really make any sense.
What are some of the tools that people can use to work with OpenStack Swift?
Cohen: OpenStack Swift is an open source project, which means that it tends to not have the polished tools and out-of-the-box functionality that you would expect with a commercial product. Now the good news is you can buy commercial versions of OpenStack Swift. But, if you're rolling your own, you're really going to be using the typical Linux tools. Euca2ools has some good monitoring and logging and auditing tools, and Nagios, of course, and Zenoss and other standard tools are available for you. OpenStack Swift also exposes an API, so there are a number of drivers that work with the OpenStack API.
You just mentioned Swift's API, and Swift also supports Amazon's Simple Storage Service, which is better known as S3. How well does Swift support the S3 API, and why is S3 API support important for end users?
Cohen: S3 is sort of a de facto standard at this point. OpenStack Swift does indeed support the S3 API, and it supports it reasonably well. So, it's certainly [got] the standard functionality. It's important because many users are moving into using hybrid clouds and multiple storage end points or landing points, and some of them may be in S3, and some of them may be in OpenStack, and some of them may be in more traditional storage locations. So, you really want to have options to be able to mix and match.
What's the best way to ensure data protection with Swift?
Cohen: Swift uses a concept called 'unique as possible' to deliver its data protection. It makes three copies of the data in a hierarchy of locations, and the top location is a region, which could be in different data centers around the world. Below that are zones, which could be unique racks within a data center, and then under that is a server, which is also sometimes referred to as a node. The final atomic element is a disk on the server. In a very small installation, you would have the object on three different disks in that server.
To ensure your data is protected as much as possible, you would use the combination of the region, zone, server and disk in a configuration that meets your requirements. So, in a very large implementation, you would have multiple regions around the world, which would give the highest level of data protection. If one data center went away, the other two would still be available. But in a very small installation, you might have it on a single box and that would have three copies distributed among the disks that are on that server.
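The placement idea Cohen describes can be sketched in a few lines of Python. This is a simplified greedy illustration and not Swift's actual placement code: the `Device` tuple and the `place_replicas` function are hypothetical names. For each copy, it picks the disk that shares the fewest regions, then zones, then servers, then disks with the copies already placed.

```python
from collections import namedtuple

# Hypothetical description of one disk and where it sits in the hierarchy.
Device = namedtuple("Device", "region zone server disk")


def place_replicas(devices, replicas=3):
    """Greedily choose devices that overlap as little as possible."""
    if len(devices) < replicas:
        raise ValueError("need at least one disk per copy")
    chosen = []
    for _ in range(replicas):
        def overlap(dev):
            # Compare the highest tier first: region, then zone within a
            # region, then server within a zone, then the disk itself.
            return (
                sum(c.region == dev.region for c in chosen),
                sum(c[:2] == dev[:2] for c in chosen),
                sum(c[:3] == dev[:3] for c in chosen),
                sum(c == dev for c in chosen),
            )
        chosen.append(min(devices, key=overlap))
    return chosen
```

With three regions available, the three copies land in three different regions. With only a single server, the same function falls back to three different disks on that box, which matches the small-installation case described above.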
Is there a requirement that you have a full hierarchy of regions, zones, servers and disks?
Cohen: No, that's the beauty of the system, which is that you can use them in combination, mix and match, based on your requirements. In a very large installation, you would most likely use all four of the concepts, while in a very small installation you might only use a server and disk. It gives you a lot more flexibility than the previous version, which only allowed zones.
Do you recommend the use of multiple data centers with Swift?
Cohen: You certainly can do that, and there are a number of companies that have done that, mostly for data replication purposes. But be aware that whenever you have multiple instances [and] you're using it for high availability, [the data] also is going over the Internet, which means an expensive amount of data is getting moved around. So, unless you have a very large installation or have access to relatively cheap bandwidth, I would not particularly recommend using that for a Swift store.
Organizations generally use commodity hardware with Swift. What are some other tips you suggest to save money with Swift?
Cohen: Swift really is pretty inexpensive to implement, but there are a number of places you can save money. Certainly commodity hardware is one. Another area [where] I have found that you can save money, particularly in a very, very large installation, is to cut back on the redundancy of the hardware. You can cut the number of top-of-rack switches. You don't need to have multiple ones. You can have just one. Another area to save money is you don't need to implement RAID on the nodes because it's built into the architecture of Swift itself.
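Because the redundancy comes from Swift's three copies and not from RAID, a rough hardware estimate is simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the 4 TB disk size in the usage note is an assumed figure, and the calculation ignores file system overhead and headroom for growth.

```python
import math


def raw_capacity_tb(usable_tb, replicas=3):
    """Raw disk space needed to hold usable_tb of data at the given copy count."""
    return usable_tb * replicas


def disks_needed(usable_tb, disk_tb, replicas=3):
    """Number of disks of size disk_tb required, rounded up."""
    return math.ceil(raw_capacity_tb(usable_tb, replicas) / disk_tb)
```

For the half-petabyte starting point mentioned earlier (500 TB), three copies work out to 1,500 TB of raw capacity, or 375 disks at an assumed 4 TB each.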