How neocloud is influencing network connectivity
Demand for computational power optimized for AI is driving the rise of the neocloud, but can the transport infrastructure keep up with the pace of change?
With the rise of AI, the need for specialized low-latency processing to rapidly produce and render images, videos and 3D graphics is surging. GPUs, which can run thousands of mathematical operations in parallel, are foundational to the development and deployment of AI-driven applications. But global demand for GPU-centric computing has exceeded hyperscaler capacity. Specialized providers have emerged to deliver GPU-as-a-service platforms, known as neoclouds.
Neoclouds offer high-performance GPU compute power essential for AI and machine learning workloads. These specialized, purpose-built clouds provide dense GPU clusters and ultra-low-latency interconnects to support accelerated, power-efficient processing for AI and ML tasks. Neoclouds give organizations access to on-demand GPU clusters, typically bare metal, without requiring clients to sign up for the extensive set of adjacent services associated with hyperscaler engagements.
These GPU clusters can be located at the edge, closer to content creation and consumption points, to alleviate latency issues. Vendors see AI as a significant revenue opportunity and are projecting a growing pipeline for connectivity in this $2 billion sector.
Raising the design quotient
So, how does a neocloud interconnection architecture differ from traditional cloud infrastructure? The compute performance required to support AI workloads necessitates ultra-high-bandwidth, low-latency network connectivity.
To support GPU clusters, neocloud providers often use a dual-network approach with the following:
- A conventional front-end Ethernet network for user traffic.
- A second high-performance backend fabric network.
The second network is dedicated to GPU-to-GPU communication, enabling AI clusters to process high volumes of internal data flows. This setup ensures optimal performance without the bottlenecks that happen on a traditional cloud network.
Neoclouds often use network fabrics such as InfiniBand because the technology delivers greater bandwidth and lower latency than traditional data center Ethernet connectivity to support parallel GPU workload processing. This fabric removes networking overhead that can impede sizable AI training tasks.
Unlike conventional cloud networks built to move north-south client-server traffic, neocloud interconnections need to be optimized for east-west data transfer between servers. In a neocloud, the GPU network must move significant synchronization traffic for distributed AI training without packet loss. The neocloud network, in essence, is a high-performance computing cluster interconnect that supports potentially thousands of GPUs working simultaneously without throughput barriers.
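To get a rough sense of how much synchronization traffic this involves, the short Python sketch below estimates what each GPU sends during one gradient exchange. It assumes a ring all-reduce, a common collective in distributed training that the article does not name, in which each of N GPUs sends 2 x (N - 1) / N times the gradient payload. The model size, the two-byte precision, the 400 Gbps link speed and the function names are all hypothetical, chosen only for illustration.

```python
def ring_allreduce_bytes_per_gpu(payload_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends in one ring all-reduce of payload_bytes.

    Assumes the standard ring algorithm: a reduce-scatter phase followed by
    an all-gather phase, each moving (N - 1) / N of the payload per GPU.
    """
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to synchronize
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes


def transfer_seconds(bytes_sent: float, link_gbps: float) -> float:
    """Ideal time to push bytes_sent over a link, ignoring latency and overhead."""
    return bytes_sent * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    # Hypothetical workload: 7 billion parameters at 2 bytes per gradient.
    gradient_bytes = 7e9 * 2
    link_gbps = 400.0  # hypothetical per-GPU backend link speed

    for num_gpus in (8, 64, 1024):
        sent = ring_allreduce_bytes_per_gpu(gradient_bytes, num_gpus)
        seconds = transfer_seconds(sent, link_gbps)
        print(f"{num_gpus:>5} GPUs: {sent / 1e9:.1f} GB sent per GPU, "
              f"at least {seconds:.2f} s per exchange")
```

Under these assumptions, the per-GPU volume approaches twice the gradient size as the cluster grows, and the exchange repeats on every training step. That is why east-west bandwidth and lossless delivery matter more here than north-south capacity.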
Because these networks are dedicated to AI workloads, the isolation eliminates the contention and jitter typical of a multi-tenant network. This enables the consistent, predictable performance required for AI.
Organizations must ensure their networking procedures are aligned with neocloud architectures. Instead of handling only conventional branch-to-data center transfers, SASE networks must carry high-throughput AI model training and inference traffic. SASE and SD-WAN policies are necessary to ensure high bandwidth and minimal packet loss for data-intensive training sets.
Rising to the challenge
Neoclouds can provide the foundation organizations need to develop and deploy AI-driven applications. But integration and connectivity are inherently more complex than in conventional cloud environments. Unlike hyperscaler clouds, which typically offer easy entry points and peering for enterprise interconnections, neoclouds don't use public internet exchanges. The result can be erratic latency and throughput, which places the onus on network operations teams to build bespoke connections.
The intensive computing requirements in a neocloud environment mean AI performance can take a hit if the network isn't up to the task. In many cases, neocloud providers are still expanding capacity. This can hamper scalability and performance.
Because neoclouds are so nascent, they could lack certain security and observability features found in mature hyperscaler environments. The responsibility to build these capabilities could fall to enterprise security and operations teams. Neocloud providers also tend to offer fewer tools and have more limited partner ecosystems than their hyperscaler peers. This places an additional burden on customers, who likely lack the internal neocloud skill set.
For organizations that already navigate the challenge of managing a multi-cloud environment, neocloud adds another layer of complexity. From a networking perspective, this translates into more involved routing and policy management.
On the flip side
Neoclouds can provide a path forward for optimal multi-cloud computing by providing an environment dedicated to the specific requirements of AI applications while maintaining standard infrastructure for general workloads. In other words, neocloud is not a replacement for conventional cloud, but a parallel specialized environment.
From a cost perspective, neoclouds promise cost savings of 50% to 70% compared to running AI compute in conventional public clouds. Because neoclouds are designed specifically for AI workloads, they rely on infrastructure and capabilities such as advanced liquid cooling to deliver faster model training and more efficient GPU utilization.
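As a quick illustration of what that range means for a budget, the Python sketch below applies the 50% to 70% savings figure to a baseline spend. The baseline amount and the function name are hypothetical. Only the savings percentages come from the claim above, and actual pricing will vary by provider and contract.

```python
def neocloud_cost_range(baseline_cost: float,
                        low_saving: float = 0.50,
                        high_saving: float = 0.70) -> tuple[float, float]:
    """Return (best_case, worst_case) neocloud cost for a given baseline.

    baseline_cost is what the same AI compute would cost in a conventional
    public cloud; the savings fractions default to the 50%-70% range.
    """
    best_case = baseline_cost * (1 - high_saving)
    worst_case = baseline_cost * (1 - low_saving)
    return best_case, worst_case


if __name__ == "__main__":
    # Hypothetical monthly public cloud GPU bill, for illustration only.
    baseline = 100_000.0
    best, worst = neocloud_cost_range(baseline)
    print(f"Baseline: ${baseline:,.0f} per month")
    print(f"Projected neocloud cost: ${best:,.0f} to ${worst:,.0f} per month")
```

This estimate covers compute only. The connectivity, integration and tooling costs discussed elsewhere would need to be added before comparing totals.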
Best practices for neocloud success
Some organizations might feel rushed to develop AI-driven applications that produce desired business outcomes. For neocloud deployment, however, enterprises should consider many factors in advance. Two of the most important questions organizations should ask are the following:
- Does the architecture map to business objectives?
- Is the planned application aligned with an organization's mission?
AI mandates set across the enterprise can fuel that rushed mentality, which makes it all the more important to answer these questions before development begins.
Businesses need to produce a consistent network model across clouds while creating policy variations that meet workload requirements. Cost also needs to be part of the equation. Inefficient deployments have plagued multi-cloud implementations, driving up expenses and reducing returns. Enterprises need to approach the total costs of neocloud connectivity with this in mind.
Network standards and the ability to move workloads to secondary environments are key issues that need to be addressed. A neocloud environment where critical AI workloads run must be resilient from the beginning.
Amy Larsen DeCarlo has covered the IT industry for more than 30 years, as a journalist, editor and analyst. As a principal analyst at GlobalData, she covers managed security and cloud services.