Generative AI's massive data needs will drive companies to choose hybrid storage that includes cloud and on-premises hardware in 2024.
Enterprise storage needs for data lake generation and machine learning (ML) drove purchases in the past several years, but the boom of GenAI interest and services has made storage a priority in the enterprise IT stack, said Dave Raffo, a senior analyst at Futurum Group.
"The idea for the storage vendors is you'll need more to hold all this data for AI," Raffo said.
Many enterprises and organizations can't afford to develop their own GenAI capabilities, so they turn to cloud GenAI services and cloud storage to hold the associated data repositories. Analysts expect most customers to settle on a combination of on-premises hardware and large amounts of cloud object storage to support this GenAI boom.
This hybrid model will likely continue as GenAI creation and adoption continue, said Ray Lucchesi, founder and president of Silverton Consulting.
"[GenAI] is becoming so pervasive, and the need for training data isn't going away," he said.
Hybrid cloud a priority
Buyer surveys conducted by TechTarget's Enterprise Strategy Group (ESG) indicate that investment in and movement to the cloud will continue. GenAI and ML services have a voracious appetite for data, especially if an organization develops its own capabilities without drawing on a public large language model (LLM).
Organizations might want to build their own LLMs on their own storage to avoid the copyright infringement or data privacy concerns looming over the GenAI market hype, or just to specialize for a given industry, storage analysts said. A hybrid cloud approach can enable that regulatory compliance while preserving access to the cloud services that enterprise customers are interested in using.
About 35% of responding IT professionals in an ESG infrastructure modernization survey last month said their organizations are using the cloud to consolidate data collected at the edge when moving from on-premises data centers. That need outpaced the desire to support large analytics or machine learning at 31%.
Most respondents to the survey said they are using two to four different cloud providers and moving data from on-premises to the cloud regularly.
In another ESG survey from September that focused on public cloud decisions, 47% of respondents said their organization has adopted a cloud-first policy, meaning they use public cloud services to deploy new applications unless there's a business reason to do otherwise. Of those cloud-first respondents, 38% said the reason was a reduced total cost of ownership.
Optimization required for hybrid cloud
The survey results align with Raffo's expectation that reducing cost will continue to be a focus for hybrid cloud operations as more enterprises embrace the model, likely made possible through a proliferation of vendor hybrid cloud data management tools such as NetApp's BlueXP.
"Everyone is talking about cost optimization," Raffo said. "Everybody was [previously] talking about [cloud] repatriation, but everyone is taking a look at where things are."
More cloud buyers seeking a hybrid environment will turn to vendor platforms such as Hewlett Packard Enterprise's GreenLake or Dell Technologies' Apex, said Mike Matchett, principal analyst at Small World Big Data. These platforms, which offer a variety of hardware and software through a SaaS payment model, could push cloud vendors to further lower storage prices to compete.
"The more hybridization that goes on under the cover, the cheaper cloud storage would need to be to remain attractive," Matchett said.
Not only will cloud storage sellers need to stay on their toes, but the hybridization of storage vendors that previously focused on selling on-premises hardware will continue, Lucchesi said.
NetApp is one storage vendor that has invested heavily in the hybrid model, making deals and investments with numerous cloud vendors such as AWS and Google Cloud, he said. Similarly, Pure Storage's pitch of all-flash hardware has been joined at the hip with a cloud software and service component.
"The [storage hardware] vendors have realized for a long time now that they have to make money in the cloud or [they'll] kill themselves off," Lucchesi said.
Data to remain in motion
As GenAI adoption spurs new purchases, IT buyers might adopt new models for thinking about how best to use available storage resources through virtualization, Matchett said.
Cloud vendors including AWS have released flash-based object storage for customers, recognizing how many are now integrating object storage into workloads, he said. The underlying storage architecture and configuration will matter less than how applications and associated APIs use the data and what it costs to house it, rendering hardware specifics moot.
"Even the storage administrators won't care [about the hardware] anymore," Matchett said. "How [data] is stored underneath, it could be any and all of those [storage types] pointed at one another."
Object storage does provide a cheap data repository and has increased its presence in on-premises deployments, Lucchesi said. But specific workloads will still have performance requirements that need manual configuration.
"[Object has] become the back end of choice because it's so flexible and can play almost any role," Lucchesi said. "Is it going to be a block storage solution at millisecond [response] level? Probably not."
Matchett believes archive data might also become less of a priority for storage administrators, as data at rest isn't being used to create or train new AI or ML applications. Customers that are purchasing storage might also splurge on additional services to keep that data in motion, instead of moving it to colder storage.
"Is there such a thing as an archive anymore?" Matchett said. "You will always need backups and secondary storage, but the idea is, if you're not making use of your data, you're spending too much to keep it."
Tim McCarthy is a journalist from the Merrimack Valley of Massachusetts. He covers cloud and data storage news.