
AWS CloudOps hones multi-cloud support for AI, resilience

Network, observability and Kubernetes management news at re:Invent aligned around themes of multi-cloud scale and resilience amid AI growth and cloud outage concerns.

AWS rolled out a series of updates to its cloud operations services this week, all tailored to making multi-cloud and multi-environment infrastructure easier to manage as enterprises prepare for AI.

New features unveiled at the annual AWS re:Invent 2025 conference included a multi-cloud interconnect service launched with Google, based on a new OpenAPI specification; a new set of highly abstracted capabilities for its Elastic Kubernetes Service (EKS); and simplified multi-account, multi-region observability data management, as well as new AIOps automation features.

The impetus for all these changes is the emergence of enterprise AI apps and agents, which correlate with multi-cloud deployments at high scale, and are accompanied by an explosion of operational data, according to a keynote presentation Monday by Nandini Ramani, vice president of cloud operations at AWS.

"While agents certainly help you with creating efficiency and making things a lot simpler, it has also created certain operational challenges," Ramani said. "Operational complexity is only increasing. You are now managing microservices, and in addition, now you have distributed agents, you have event-driven architectures, and on top of that, you are managing all of this across dozens, if not hundreds, of accounts and multiple regions."

Throughout presentations during re:Invent, AWS officials emphasized a desire to support customers wherever they run AI workloads, even if it's not the AWS cloud. This represents a break with tradition for AWS, which has previously focused on getting enterprise customers to go "all-in" on its cloud, according to Matt Flug, an analyst at IDC.

"It's relatively new, especially for AWS, but really, all three of the hyperscalers," Flug said.

AI apps are only part of the wider motivation for this change in strategy, Flug added. High-profile outages for Google, AWS and Azure this year, along with new EU regulations requiring digital resiliency, have also increased enterprise interest in multi-cloud infrastructure, he said.

"Obviously, Google and AWS had been working on [the interconnect API] since before the outages, but I do think that we're seeing an increase in companies using multi-cloud, and a lot more people planning to use multi-cloud strategies," he said. "I hear a lot about it from clients in terms of resiliency."

Nandini Ramani, vice president at AWS, delivers a cloud operations-focused keynote presentation at re:Invent 2025.

AWS Interconnect simplifies multi-cloud networking

Among the most striking departures from AWS's traditionally insular approach to cloud infrastructure, prompted by the new AI gold rush, was the rollout of AWS Interconnect – multi-cloud, now in preview. The service, designed in collaboration with Google using a new open source API specification, is meant to simplify the configuration of multi-cloud networks, according to a joint blog post.

"Previously, to connect cloud service providers, customers had to manually set up complex networking components including physical connections and equipment; this approach required lengthy lead times and coordinating with multiple internal and external teams," according to the post. "They can now provision dedicated bandwidth on demand and establish connectivity in minutes through their preferred cloud console or API."

One enterprise already deeply invested in multi-cloud infrastructure management for AI will welcome relief from manual network configuration, according to one of its IT leaders.

"When it comes to AI and providing generative AI models to power our AI services, we use the right model for the task. This can be challenging as no single provider has models that work for all tasks in the 60+ languages we support and in every geographic region we operate," wrote Ian Beaver, chief data scientist at Verint, a contact center-as-a-service provider in Melville, N.Y., in an email to Informa TechTarget.

"Therefore, we end up with multi-cloud deployments where some product services may run on one hyperscaler and the AI models used may be provided from another, such as [Amazon] Bedrock or Azure OpenAI," Beaver said. "This requires setting up secure cloud-to-cloud networking and it can be time consuming to deploy. Any automation created around the secure network setup across hyperscalers is welcome and will reduce both complexity and time to deployment when configuring new regions."

Another joint customer of both Google Cloud and AWS emphasized the resiliency benefits of the new interconnect service.

"I think the EU resiliency regulations explain it much better -- it's when you're integrating layers of application stacks across clouds that these network tunnel integrations shine," said David Strauss, chief architect and co-founder at WebOps company Pantheon. "We are about 95% GCP but have some resources on AWS. I welcome this initiative because I've managed routing and tunnels. It's a giant project to get secure, low-latency, reliable links between data centers. There is so little that's unique to each implementation, but it somehow never feels that way."


AWS will take the Interconnect spec a step further with another service, AWS Interconnect – last mile, which entered gated preview this week.

"Interconnect – last mile will provide similar accelerated deployment of AWS Direct Connect to customer premises locations such as private data centers and campuses -- essentially automated WAN," said Jim Frey, an analyst at Omdia, a division of Informa TechTarget. "Lumen is the first ISP they are working with, but it will require a lot of partnering with the ISP provider community."

Cost considerations will also come into play for Interconnect to be viable for mainstream enterprises, said Rob Strechay, an analyst at theCUBE Research.

"The proof will be in the pudding, which will take the shape of a bill each month," he said. "The layers of potential fees will be interesting to see, but could be nominal if your company is already doing this cross-cloud."

EKS Capabilities move management 'up the stack'

Three new options for EKS users made generally available this week also conform with the overall theme of simplifying cloud operations, "allowing you to focus on deploying applications rather than maintaining platform infrastructure," according to the AWS website.

EKS Capabilities include managed services for GitOps workflows with Argo CD; AWS Controllers for Kubernetes, which provide cloud infrastructure management within the EKS control plane; and the Kube Resource Orchestrator, which platform teams can use to create custom resource templates. All three are based on open source projects but are focused on AWS infrastructure, in contrast to multi-cloud open source equivalents such as Crossplane and KubeVela. Commercial products such as Red Hat OpenShift also support high-scale multi-cluster management in hybrid clouds.
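For readers unfamiliar with the GitOps pattern the Argo CD capability manages, a declarative Application manifest of the following shape is the unit of work; the repository URL, names and namespaces here are placeholders, not anything AWS publishes.

```yaml
# Illustrative Argo CD Application manifest -- the kind of GitOps resource
# the managed capability reconciles; repo URL, names and paths are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-service        # placeholder application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/deploy-configs.git  # placeholder repo
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc   # the EKS cluster Argo CD runs in
    namespace: payments
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert out-of-band changes to match Git
```

With the managed capability, the Argo CD control plane that reconciles such manifests is operated by AWS rather than by the platform team.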


For users already invested in EKS, the new capabilities are in keeping with broader trends in the Kubernetes ecosystem focused on easing multi-cluster management to support AI scale, said Torsten Volk, an analyst at Omdia.

"They all have the same ambition," Volk said. "It's basically a race to onboard as many AI workloads as possible, and to make sure to show to people that scale is not an issue, because we hear about the incredible resource requirements that all those workloads have and none of the three major cloud vendors wants to be under the suspicion that maybe they can't handle it."

As with Interconnect, pricing will be a factor in adoption for EKS Capabilities, Strechay said. For example, the Argo CD capability comes with a base charge of $0.02771 per hour, according to the EKS pricing website, plus a per-application charge of $0.00136 per hour. With 100 applications, Argo CD would cost roughly $120 per month. A Reddit thread this week drew comments objecting to this as a "heinous markup."
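The math behind that estimate is simple to sketch; assuming AWS's usual 730-hours-per-month convention, the quoted rates work out to roughly $120 for 100 applications, and the per-application term dominates quickly at scale.

```python
# Back-of-the-envelope cost for the managed Argo CD capability, using the
# hourly rates quoted from the EKS pricing page; 730 hours/month is an
# assumption, not an AWS-published figure.
BASE_RATE = 0.02771      # USD per hour, per-cluster base charge
PER_APP_RATE = 0.00136   # USD per hour, per managed application
HOURS_PER_MONTH = 730

def monthly_cost(num_apps: int) -> float:
    """Estimated monthly charge: (base + per-app * apps) * hours."""
    hourly = BASE_RATE + num_apps * PER_APP_RATE
    return hourly * HOURS_PER_MONTH

print(f"100 apps:  ${monthly_cost(100):.2f}/month")   # ~ $119.51
print(f"1000 apps: ${monthly_cost(1000):.2f}/month")  # ~ $1013.03
```

At 1,000 applications the estimate crosses $1,000 per month, which is the scale at which Strechay's "rolling your own" argument starts to bite.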

"If you use it for significantly more [applications] and in dev environments, the price could easily be tens of thousands of dollars a month," Strechay said. "At the low end of small and midsize businesses, you might be able to make a case for that. But at enterprise scale, rolling your own is way more cost-effective."

Longer term, Volk said it's clear that AWS aspires to make AI agents the main management layer for EKS, as evidenced by the launch of an EKS Model Context Protocol (MCP) server and DevOps frontier agent this week.

"They see those agents as a super orchestrator, where they want developers to manage EKS through an agent so that you don't have to go to CloudFormation or CloudWatch anymore," Volk said. "Instead, you get it at the agent level, one level of abstraction above, so you don't have to know exactly which metrics you need to look at, because it will tell you, based on your context, which Prometheus metrics you need to scrape, and things like that. That's what they're trying to do."

CloudWatch observability, AIOps updates

In the meantime, however, AWS customers are still going directly to cloud operations tools to manage AI applications, prompting a bevy of updates to Amazon CloudWatch and AgentCore Observability this week.

These updates included:

  • Generative AI observability support including latency, token usage and errors, compatible with agentic frameworks such as LangChain, LangGraph and CrewAI.
  • CloudWatch Application Signals, first launched in September, which creates an application topology map without requiring application instrumentation, now automatically groups resources by application.
  • CloudWatch investigations, released in June, now provide automatic incident report generation including the "Five Whys" method AWS uses for its internal incident postmortems.
  • MCP servers for CloudWatch and Application Signals.
  • General availability of a CloudWatch Application Signals GitHub Action, which automatically correlates CloudWatch telemetry with application source code.

Ramani emphasized multi-cloud support for AI application telemetry data as well during her presentation on these new features.

"This works no matter where you choose to host your agents," she said. "You get complete visibility out of the box into latency, token usage and performance across all of your AI workloads."

Scaling AI data and storage management

Agentic AI is also prompting calls for simpler storage and data management at scale among AWS customers, according to Ramani's presentation.

"Now your AI agents work 24/7, and they handle thousands of customer requests every hour, and each of these requests generates an immense amount of telemetry throughout the day," she said. "That's not just more data, it's exponentially more surface area that you now have to both monitor and secure."

In response, AWS issued multiple updates for observability and security data management, such as natural language processing support for data pipelines; cross-account and cross-region log data collection; support for CloudWatch Database Insights across RDS and Aurora databases, as well as multiple accounts and regions; and aggregated events for CloudTrail audit logs.

All of these incremental improvements add important underpinnings for multi-cloud AI management, according to IDC's Flug.

"With AI, everyone wants that singular source of truth, of data, [but] you need to make it easier for folks to do that – if they need to build physical connections themselves, and it takes months to do that, that's not really helping them get started with AI," he said.

Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
