Q&A: SwiftStack object storage zones in on AI, ML, analytics
SwiftStack's founder and president Joe Arnold says disk-based object storage can deliver the concurrency and throughput that AI, machine learning and analytics workloads need.
SwiftStack founder Joe Arnold said the company's recent layoffs reflected a change in its sales focus but not in its core object storage technology.
San Francisco-based SwiftStack attributed the layoffs to a switch in use cases from classic backup and archiving to newer artificial intelligence, machine learning and analytics. Arnold said the staffing changes had no impact on the engineering and support team, and the core product will continue to focus on modern applications and complex workflows that need to store lots of data.
"I've always thought of object storage as a data as a service platform more than anything else," said Arnold, SwiftStack's original CEO and current president and chief product officer.
TechTarget caught up with Arnold to talk about customer trends and the ways SwiftStack is responding in an increasingly cloud-minded IT world. Arnold disclosed that SwiftStack is adding Microsoft Azure as a target for its 1space technology, which provides a single namespace across object storage locations for cloud platform compatibility. The company already supported Amazon S3 and Google.
SwiftStack's storage software, which is based on open source OpenStack Swift, runs on commodity hardware on premises, but the 1space technology can run in the public cloud to facilitate access to public and private cloud data. Nearly all of SwiftStack's estimated 125 customers have some public cloud footprint, according to Arnold.
Arnold also revealed a new distributed, multi-region erasure code option that can enable customers to reduce their storage footprint.
What caused SwiftStack to change its sales approach?
Joe Arnold: At SwiftStack, we've always been focused on applications that are in the data path and mission critical to our customers. Applications need to generate more value from the data. People are distributing data across multiple locations, between the public cloud and edge data locations. That's what we've been really good at. So, the change of focus with the go-to-market path has been to double down on those efforts rather than what we had been doing.
How would you compare your vision of object storage with what you see as the conventional view of object storage?
Arnold: The conventional view of object storage is that it's something to put in the corner. It's only for cold data that I'm not going to access. But, that's not the reality of how I was brought up through object storage. My first exposure to object storage was building platforms versus Amazon Web Services when they introduced S3. We immediately began using that as the place to store data for applications that were directly in the data path.
Didn't object storage tend to address backup and archive use cases because it wasn't fast enough for primary workloads?
Arnold: I wouldn't say that. Our customers are using their data for their applications. That's usually a large data set that can't be stored in traditional ways. Yes, we do have customers that use [SwiftStack] for purely cold archive and purely backup. In fact, we have features and capabilities to enhance some of the cold storage capabilities of the product. What we've changed is our go-to-market approach, not the core product.
So, for example, we're adding a distributed, multi-region erasure code storage policy that customers can use across three data centers for colder data. It allows entire segments of data -- data bits and parity bits -- to be distributed across multiple sites and, to retrieve data, only two of the data centers need to be online.
How does the new erasure code option differ from what you've offered in the past?
Arnold: Before, we offered the ability to use erasure code where each site could fully reconstruct the data. A data center could be offline, and you could still reconstruct fully. Now, with this new approach, you can store data more economically, but it requires two of three data centers to be online. It's just another level of efficiency in our storage tier. Customers can distribute data across more data centers without using as much raw storage footprint and still have high levels of durability and availability. Since we're building out storage workflows that tier up and down across different storage tiers, they can utilize this one for their most cold data storage policies.
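The tradeoff Arnold describes can be illustrated with a little arithmetic. The sketch below is not SwiftStack's implementation, and the interview does not give the actual erasure code parameters. It assumes a hypothetical scheme with 4 data fragments and 2 parity fragments spread evenly over three sites, and checks that any two sites still hold enough fragments to rebuild an object. The function names are invented for illustration.

```python
from fractions import Fraction


def fragments_per_site(data_frags, parity_frags, sites=3):
    """Spread all fragments evenly across sites (hypothetical layout)."""
    total = data_frags + parity_frags
    if total % sites:
        raise ValueError("fragments must divide evenly across sites")
    return total // sites


def survives_site_loss(data_frags, parity_frags, sites=3, lost=1):
    """True if the surviving sites still hold at least `data_frags` fragments,
    which is what a k-of-n erasure code needs to reconstruct an object."""
    per_site = fragments_per_site(data_frags, parity_frags, sites)
    remaining = per_site * (sites - lost)
    return remaining >= data_frags


def raw_overhead(data_frags, parity_frags):
    """Raw bytes stored per byte of user data."""
    return Fraction(data_frags + parity_frags, data_frags)


# Assumed 4+2 scheme, two fragments per site across three data centers.
print(survives_site_loss(4, 2, lost=1))  # two sites online
print(survives_site_loss(4, 2, lost=2))  # only one site online
print(raw_overhead(4, 2))                # raw-to-usable ratio
```

Under these assumed parameters, the data stays readable with one data center offline while consuming 1.5 times the usable capacity in raw storage. A layout in which each site alone can reconstruct the data would need at least one full set of data fragments per site, so its raw footprint across three sites would be higher.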
Does the new erasure coding target users who strictly do archiving, or will it also benefit those doing AI and analytics?
Arnold: They absolutely need it. Data goes back and forth between their core data center, the edge and the public cloud in workflows such as autonomous vehicles, personalized medicine, telco and connected city. People need to manage data between different tiers as they're evolving from more traditional-based applications into more modern, cloud-native type applications. And they need this ultra-cold tier.
How similar is this cold tier to Amazon Glacier?
Arnold: From a cost point of view, it will be similar. From a performance point of view, it's much better. From a data availability point of view, it's much better. It costs a lot of money to egress data out of something like AWS Glacier.
How important is flash technology in getting performance out of object storage?
Arnold: If the applications care about concurrency and throughput, particularly when it comes to a large data set, then a disk-based solution is going to satisfy their needs. Because the SwiftStack product's able to distribute requests across lots of disks at the same time, they're able to sustain the concurrency and throughput. Sure, they could go deploy a flash solution, but that's going to be extremely expensive to get the same amount of storage footprint. We're able to get single storage systems that can deliver a hundred gigabytes a second aggregate read-write throughput rates. That's nearly a terabit of throughput across the cluster. That's all with disk-based storage.
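The figures in Arnold's answer are easy to check. The sketch below converts 100 gigabytes per second into terabits per second using decimal units, then estimates how many disks would have to serve requests in parallel to sustain that rate. The per-disk figure of 150 MB/s is a hypothetical value chosen only for illustration; it does not come from the interview or from SwiftStack.

```python
import math


def gigabytes_to_terabits(gb_per_s):
    """Convert GB/s to Tb/s (decimal units: 8 bits per byte, 1,000 Gb per Tb)."""
    return gb_per_s * 8 / 1000


def disks_needed(target_gb_per_s, per_disk_mb_per_s):
    """Minimum number of disks streaming in parallel to reach the target rate."""
    target_mb_per_s = target_gb_per_s * 1000
    return math.ceil(target_mb_per_s / per_disk_mb_per_s)


aggregate_gb_per_s = 100          # figure quoted in the interview
assumed_disk_mb_per_s = 150       # hypothetical sustained rate per disk

print(gigabytes_to_terabits(aggregate_gb_per_s))              # 0.8 Tb/s
print(disks_needed(aggregate_gb_per_s, assumed_disk_mb_per_s))  # 667 disks
```

At 0.8 Tb/s, the quoted rate is indeed close to a terabit per second, and under the assumed per-disk rate it would take several hundred disks working concurrently, which is consistent with the point about distributing requests across lots of disks.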
What do you think of vendors such as Pure Storage offering flash-based options with cheaper quad-level cell (QLC) flash that compares more favorably price-wise to disk?
Arnold: QLC flash is great, too. We support that as well in our product. We're not dogmatic about using or not using flash. We're trying to solve large-footprint problems of our customers. We do have customers using flash with a SwiftStack environment today. But they're using it because they want reduced latencies across a smaller storage footprint.
How do you see demand for AWS, Microsoft and Google based on customer feedback?
Arnold: People want options and flexibility. I think that's the reason why Kubernetes has become popular, because that enables flexibility and choice between on premises and the public cloud, and then between public clouds. Our customers were asking for the same. We have a number of customers focused on Microsoft Azure for their public cloud usage. And they want to be able to manage SwiftStack data between their on-premises environments with SwiftStack and the public cloud. So, we added the 1space functionality to include Azure.
What tends to motivate your customers to use the public cloud?
Arnold: Some use it because they want to have disaster recovery ready to go up in the public cloud. We will mirror a set of data and use that as a second data center if they don't already have one. We have customers that collect data from partners or devices out in the field. The data lands in the public cloud, and they want to move it to their on-premises environment. The other example would be customers that want to use the public cloud for compute resources where they need access to their data, but they don't want to necessarily have long-term data storage in the public clouds. They want the flexibility of which public cloud they're going to use for their computation and application runtime, and we can provide them connections to the storage environment for those use cases.
Do you have customers who have second thoughts about their cloud decisions due to egress and other costs?
Arnold: Of course. That happens in all directions. Sometimes you're helping people move more stuff into the public cloud. In some situations, you're pulling down data, or maybe it's going in between clouds. They may have had a storage footprint in the public cloud that was feeding to some end users or some computation process. The egress charges were getting too high. The footprint was getting too high. And that costs them a tremendous amount month over month. That's where we have the conversation. But it still doesn't mean that they need to evacuate entirely from the public cloud. In fact, many customers will keep the storage on premises and use the public cloud for what it's good at -- more burstable computation points.
What's your take on public cloud providers coming out with various on-premises options, such as Amazon Outposts and Azure Stack?
Arnold: It's the trend of 'everything as a service.' I think what customers want is a managed experience. Operators who are able to manage these big environments are becoming harder and harder to come across. So, it's natural for those companies to offer a managed on-premises product. We feel the same way. We think that managing large sets of infrastructure needs to be highly automated, and we've built our product to make that as simple as possible. And we offer a product to do storage as a service on premises for customers who want us to do remote operations of their SwiftStack environments.
How has Kubernetes- and container-based development affected the way you design your product?
Arnold: Hugely. It impacts how applications are being developed. Kubernetes gives an organization the flexibility to deploy an application in different environments, whether that's core data centers, bursting out into the public cloud or crafting applications out to the edge. At SwiftStack, we need to make the data just as portable as the containerized application is. That's why we developed 1space. A huge number of our customers are using Kubernetes. That just naturally lends itself to the use of something like 1space to give them the portability they need for access to their data.
What gaps do you need to fill to more fully address what customers want to do?
Arnold: One is further fleshing out 'everything as a service.' We just launched a service around that. As more customers adopt that, we're going to have more work to do, as the deployments become more diverse across not just core data centers, but also edge data centers.
I see the convergence of file and object workflows and furthering 1space with our edge-to-core-to-cloud workflows. Particularly in the world of high-performance data analytics, we're seeing the need for object -- but it's a world that is dominated by file-based applications. Data gets pumped into the system by robots, and object storage is awesome for that because it's easy and you get lots of concurrency and lots of parallelism. However, you see humans building out algorithms and doing research and development work. They're using file systems to do much of their programming, particularly in this high-performance data analytics world. So, managing the convergence between file and object is an important thing to do to solve those use cases.