The smartest digital brains that fuel AI need fast and efficient storage, not massive and redundant repositories for customers to waste money on, according to Peak:AIO founder and CEO Mark Klarzynski.
The U.K. company sells software-defined storage specifically targeted at supporting GPU compute for AI while running on commodity hardware. The company sells to OEM partners, namely Dell Technologies and PNY Technologies, as well as through channel partners.
The company opened its doors in 2019, emerged from stealth a year later and remains private with no venture capital funding. U.K. education and healthcare companies such as the AI Centre for Value Based Health Care in London, the University of Liverpool and the Oxford Robotics Institute are among the company's 100 deployments.
In this Q&A, Klarzynski explains that customers experimenting with AI want storage that delivers high availability without sacrificing performance or paying for costly enterprise storage features, and he talks about why building with capacity storage in mind is the wrong choice for AI -- including for the new generative AI craze.
Editor's note: This Q&A has been edited for clarity and conciseness.
What was the genesis of Peak:AIO?
Mark Klarzynski: All the resellers were selling the Nvidia GPU servers [for AI], but were not selling storage. We'd all heard the cliche that GPUs really needed data. If you don't feed your GPU data, then you're wasting your money.
Data has always been the lifeblood of a company. Storage companies create loads of features that [they] can charge lots of money for because it has a lot of value -- [such as] snapshots, seven nines, all the rest of it.
What we realized with AI was the data wasn't an output anymore -- it's just the input. The output was just a model, an algorithm [or] a decision. The data sets were refined and put on this high-performance storage whose only value was to feed the GPUs.
Why is storage different for AI creation?
Klarzynski: Everyone driving AI was no longer an IT director. They were a [medical IT director] or an AI data scientist. They really didn't know storage and they didn't want to know storage. Storage was a necessary evil.
We were taking storage [arrays] that had that [needed AI] performance, and it was heavily discounted at $1.5 million. It just didn't work [financially] when the compute [components] were only half a million [dollars]. [This] balance was wrong, [and the customer] didn't want the storage in the first place.
This is one of those times when you've got to reset the [customer] value. They don't want snapshots. They don't want deduplication. They don't need seven nines. The data is safe somewhere else. They just need stable, high-performance, modern storage.
We all thought, and everybody will tell you, that [generative AI] churns massive amounts of storage, [but the] reality is it doesn't. I would honestly say that within 90% of the [AI] market that we see, we are probably less than 500 terabytes. If you think about ChatGPT, it's 700 terabytes. That's less than a 2U [server].
Because of the way we've developed storage, [enterprise] performance demanded hundreds of nodes -- not one or two or three [machines]. We've developed this powerful parallel file system to deal with hundreds of nodes.
[Generative AI] doesn't need it. [Customers] that need performance in that capacity can do it in one machine.
Why is building with capacity storage in mind the wrong choice for AI?
Klarzynski: AI isn't in the IT infrastructure, and the IT guys don't like it. They don't know it, they don't get it, they don't want it.
Even though [an enterprise customer] may have 20,000 machines in the IT department, the AI division will be a handful of data scientists that go out and start afresh. They start afresh because they're just not part of the IT team.
Our feeling, and our evidence seems to support this, is that what they build is a medium-sized developer model rather than a data lake.
How did you design the software around those customer concerns?
Klarzynski: We're building more features to say, 'This is your ultrafast, pure performance RDMA NFS to feed, but here's some slightly slower archive storage as well.' We're very much focused on the idea of a completely different, new file system. Forget the metadata -- we don't need it in this world.
It's time for a change for both the single-node performance, like we do, [and the] traditional controller-type environments. We're pretty much a piece of software you take off the shelf -- [a] server off the shelf or an NVMe off the shelf -- and we work [with that].
Linux mdadm is the cornerstone RAID [software] everybody knows. It's probably the most adopted. We had to parallelize that and remove some of the legacy elements that are not needed today.

[We removed] things like snapshots, thin provisioning and remote replication that are not needed for this new environment. We rewrote those [components] to remove the features and the latency that were needed by the IT environment, but not needed by the GPU [high-performance computing] environment.
What do you think is a major storage bottleneck for AI in the future?
Klarzynski: The biggest problem that we're seeing emerge for AI supercomputers is the power. A modern-day Nvidia supercomputer takes 12 kilowatts [of power]. You actually start finding that data centers just can't power them. Energy and cooling are a problem.
Tim McCarthy is a journalist from the Merrimack Valley of Massachusetts. He covers cloud and data storage news.