Kubernetes multi-cluster users tap service mesh alternatives

Istio service mesh is back in the spotlight since joining the CNCF, but the foundation's existing projects are preferred by IT pros focused on multi-cluster Kubernetes resiliency.

Enterprise IT pros tasked with shoring up resiliency across Kubernetes multi-cluster and multi-cloud environments have favored open source service mesh projects Linkerd and Kuma over Istio.

Such distributed deployments have become more common in the last two years as Kubernetes, container management and microservices have gone mainstream. At the same time, a clear trend toward multi-cloud management has also emerged among enterprises: 89% of 339 enterprise IT pros surveyed by Enterprise Strategy Group in December 2021 said they use two or more public cloud service providers, while 42% use four or more.

Kubernetes can act as a common infrastructure automation layer between multiple private and public cloud environments, where managing multiple clusters for each deployment platform is more practical than federating a single cluster over large geographic distances and multiple cloud networks. Security issues with Kubernetes multi-tenancy also have steered enterprises toward multi-cluster scenarios, as has growth in edge computing. IDC predicted in January that worldwide spending on edge computing would grow 14.8% to $176 billion in 2022 and would sustain that growth rate to reach nearly $274 billion by 2025.

As clusters and clouds continue to multiply, service mesh has become an essential part of Kubernetes multi-cluster management for some enterprise IT teams.

"Multi-tenancy, and the ability to manage and orchestrate that, is going to be massively important going forward," said Mark Swarbrick, head of infrastructure at U.K. fintech company Bink. "I can see a lot of people moving to multi-tenancy, multi-region, multi-cloud, and having a stack which you can just deploy across anything ... a sort of agnostic network mesh."

Linkerd, Istio face off amid Kubernetes multi-cluster trend

The first adopters of service mesh were mainly attracted to the network architecture's ability to secure and monitor communications between individual containers within a Kubernetes cluster. In this realm, Istio captured most of the market's attention, which was renewed when Google donated the project to the Cloud Native Computing Foundation (CNCF) in April after holding out for two years.

But for some enterprises, service mesh projects already hosted by the CNCF were a better fit, as they emphasized Kubernetes multi-cluster deployment and didn't require an "all or nothing" approach to service mesh.

"We didn't really want to replicate the observability stack between all our different environments -- we wanted to create a central cluster and find a way to connect [log management between] clusters across AWS accounts," said Kasper Nissen, lead platform architect at Lunar, a digital financial services company based in Denmark, speaking in a user panel session during the recent KubeCon + CloudNativeCon EU event.

Linkerd added a multi-cluster extension in version 2.8, which was released in June 2020. Along with its relative ease of use, this led Lunar's IT team to deploy Linkerd that year instead of Istio, which the team had struggled to get up and running in tests.

"We weren't ready to take on the complexity that came with a service mesh" connecting services within clusters, Nissen said. "We started with connecting clusters and putting the services that were on the edge into [Linkerd], and not all the rest."

This contrasted with Istio's approach to service mesh deployment, and still does. As of IstioCon this year, Istio still monitors traffic between all resources in a cluster by default, regardless of whether they are part of the service mesh. The upstream project introduced a feature called Discovery Selectors in version 1.10 last year that limits which resources Istio addresses; the feature is still in development. IstioCon speakers also said multi-cluster and multi-cloud management remains a work in progress for the project, including support for a new alpha-stage Kubernetes gateway API that will simplify multi-cluster setup.

This isn't to say Linkerd is exempt from some of the management complexities inherent in service mesh. Speakers at the KubeCon panel said they may consider Linkerd commercial vendor Buoyant's fully managed Linkerd service, launched May 4. That service takes on common management headaches such as upgrades and TLS certificate rotation, which can require clusters to be rebuilt or restarted.

UK fintech supports Kubernetes multi-cluster with Linkerd

Linkerd became part of a cloud-native technology overhaul that began in 2019 for Bink, a customer loyalty rewards program service provider whose partners include Barclays, Visa, American Express and Mastercard. This infrastructure refresh began with a move to Kubernetes, but the company's IT teams quickly realized that financial regulatory compliance would require more reliability features than the container orchestration platform provided natively.

That's where Linkerd came in to manage traffic, failover and connection retry logic between Bink's two production Kubernetes clusters, and as a means of automatically injecting Prometheus and Fluentd observability tooling into the clusters.

"The next thing we did is implement leader election code in our microservices, so we can have two production clusters live simultaneously," Bink's Swarbrick said. "That means we can now take a complete cluster offline without any downtime."

Linkerd also eliminated the need to add connection retry logic -- which allows applications to gracefully handle transient network failures -- to application code, Swarbrick said. Kubernetes can handle some of this with its ability to automatically restart and reconnect failed pods within the cluster, but adding the service mesh measurably enhanced app reliability in Bink's tests.

"If we deliberately kill some of the Kubernetes pods and they restart as you'd expect, without Linkerd, there will be some requests from our applications that fail," Swarbrick said. "Whereas if Linkerd implements the retry logic, it gives us a window [of time to pause connections] until the pods come back up -- internally, we may be only seeing a 70% or 80% [connection] success rate, but externally [the app] will be seeing 100%."
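The effect Swarbrick describes can be illustrated with some back-of-the-envelope arithmetic. The Python sketch below is not Linkerd's implementation, and the names `call_with_retries` and `observed_success_rate` are hypothetical. It assumes each attempt succeeds independently with the same probability, which simplifies what actually happens while pods restart.

```python
def call_with_retries(send, max_attempts=5):
    """Call send() until it returns True or the attempts run out.

    send is a hypothetical zero-argument callable standing in for one
    network request. Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return True, attempt
    return False, max_attempts


def observed_success_rate(per_attempt_success, max_attempts):
    """Chance that at least one of max_attempts tries succeeds.

    Assumes attempts are independent with equal success probability,
    an assumption made for illustration only.
    """
    return 1 - (1 - per_attempt_success) ** max_attempts


if __name__ == "__main__":
    for attempts in (1, 2, 3, 5):
        rate = observed_success_rate(0.7, attempts)
        print(f"{attempts} attempt(s): {rate:.4f}")
```

Under those assumptions, a 70% per-attempt success rate with up to five attempts works out to roughly 99.8% from the caller's point of view. Reaching the 100% Swarbrick cites depends on the retry window lasting until the restarted pods are serving again.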

Kuma multi-mesh focus lands American Airlines as a user

Kuma, the CNCF-hosted open source service mesh created by network vendor Kong Inc., is among the newest service mesh projects, and it has focused on multi-mesh support to differentiate it from established competitors. Linkerd and Istio also offer such support, but Kuma's approach centralizes management of multiple meshes through a set of federated remote control planes, which can manage multiple sub-meshes from a central location without requiring the deployment of separate Kubernetes clusters or tying meshes to Kubernetes namespaces.

This global federation appealed to Jason Walker, director of technology at American Airlines, who chose Kuma as the standard service mesh for the airline's DevOps platform in late 2021.

"We wanted an open source offering that still had some sort of enterprise path if we ever wanted to tap into support; we wanted to make sure that we can provide active-active deployments across [cloud] regions and across clusters; and we might want to even have the service mesh go off of a Kubernetes cluster and start to talk to [legacy apps]," Walker said. "Kuma nailed all three."

Istio and Linkerd have also begun to add support for non-Kubernetes workloads, along with refinements to multi-cluster and multi-cloud failover features, but both areas were an early point of focus for Kuma.

"If there's an issue in one region, apps are able to fail over to another, and the service mesh helps notify our Kubernetes operators, which then update the rest of the ecosystem about how traffic should be shaped and routed, to automatically self-heal [the system]," Walker said. "It also helps us identify when there's a gap in how our failovers are currently set up, and through code, we're able to re-instrument and reconfigure our approach."
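The kind of traffic reshaping Walker describes can be pictured with a small Python sketch. It is not Kuma's API or American Airlines' code; the function `reroute`, the region names and the weights are all hypothetical. It assumes an active-active setup in which each region carries a share of traffic, and that an unhealthy region's share is redistributed proportionally among the regions still healthy.

```python
def reroute(weights, healthy):
    """Return new traffic weights after removing unhealthy regions.

    weights maps region name to its current share of traffic.
    healthy is the set of regions currently passing health checks.
    Both inputs are hypothetical, for illustration only.
    """
    live = {r: w for r, w in weights.items() if r in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy region available")
    # Unhealthy regions drop to zero; the rest are scaled up to sum to 1.
    return {r: (live[r] / total if r in live else 0.0) for r in weights}


if __name__ == "__main__":
    current = {"east": 0.5, "west": 0.5}
    print(reroute(current, healthy={"west"}))
```

In this sketch, losing one of two equally weighted regions sends all traffic to the survivor, and an error is raised if no region is healthy, which corresponds to the kind of failover gap Walker says the mesh helps surface.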

Walker already had experience with the Kong API gateway, which can act as a Kubernetes ingress controller under Kuma. The Kuma project also offered a middle ground between Istio's flexibility and Linkerd's manageability, in Walker's view.

"We can abstract the Kuma configuration from our developers so that they don't have to know [about it], but when our product teams do want to know, they can go into the repos, see our [configuration] code and get a better understanding for how this stuff works," Walker said.

While Kuma will be part of the "paved road" for American Airlines developers who want to use the company's DevOps platform, some teams choose to use Istio instead.

"Kuma is something we've said we're investing in around automation, monitoring, tuning and just making sure it's easy to use," Walker said. "But it's not a top-down mandate where we're forcing [developers] to make a decision or to change tools."

Enterprise Strategy Group is a division of TechTarget.

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached at [email protected] or on Twitter @PariseauTT.
