Modern Stack

Insight on building and supporting cloud apps

Service mesh architecture radicalizes container networking

Containerization is the IT industry's favorite superhero, so it's only fitting that containers have a powerful sidekick in service mesh. Together, they fight network management chaos.

Containers and microservices have given rise to a new network architecture paradigm called service mesh, but IT industry watchers disagree about whether it'll see widespread enterprise use.

A service mesh architecture uses a proxy called a sidecar container attached to every application container, VM or container orchestration pod, depending on the type of service mesh in use. This proxy can then attach to centralized control plane software, which gathers fine-grained network telemetry data, applies network management policies or proxy configuration changes, and establishes and enforces network security policies.
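The following is a toy, in-process Python model of the division of labor described above. It is a sketch under heavy assumptions, not how any real mesh is built: there is no actual networking, and the `ControlPlane` and `Sidecar` classes, the caller allow-list policy and the telemetry tuple format are all hypothetical. It shows only the core idea. Every call to the application passes through its sidecar, which enforces a centrally defined security policy and reports telemetry back to the control plane, while the application code itself stays unaware of the mesh.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ControlPlane:
    """Hypothetical central component: holds policy and collects telemetry."""
    # service name -> set of caller names allowed to reach it (illustrative policy)
    allowed_callers: dict[str, set[str]] = field(default_factory=dict)
    # one (caller, service, latency_seconds, succeeded) tuple per request
    telemetry: list[tuple[str, str, float, bool]] = field(default_factory=list)

    def is_allowed(self, caller: str, service: str) -> bool:
        return caller in self.allowed_callers.get(service, set())

    def report(self, caller: str, service: str, latency_s: float, ok: bool) -> None:
        self.telemetry.append((caller, service, latency_s, ok))


class Sidecar:
    """Hypothetical proxy next to one service; every inbound call goes through it."""

    def __init__(self, service: str, app: Callable[[str], str], cp: ControlPlane):
        self.service = service
        self.app = app  # the application itself knows nothing about the mesh
        self.cp = cp

    def handle(self, caller: str, request: str) -> str:
        # Enforce the security policy defined centrally in the control plane.
        if not self.cp.is_allowed(caller, self.service):
            self.cp.report(caller, self.service, 0.0, False)
            raise PermissionError(f"{caller} may not call {self.service}")
        start = time.perf_counter()
        ok = False
        try:
            response = self.app(request)
            ok = True
            return response
        finally:
            # Report latency and outcome whether or not the app raised.
            self.cp.report(caller, self.service, time.perf_counter() - start, ok)
```

For example, with `cp = ControlPlane(allowed_callers={"orders": {"frontend"}})`, a sidecar wrapping a hypothetical `orders` service passes calls from `frontend` through, rejects calls from any other caller, and records one telemetry entry per attempt either way. Changing who may call whom means updating the control plane, not redeploying the service.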

It's still early days for the service mesh architecture in IT systems, but, as with containers, its rise to prominence has been rapid. At the Cloud Native Computing Foundation's (CNCF's) KubeCon and CloudNativeCon in December 2017, service mesh had already overtaken containers as the hottest topic of conversation among cutting-edge DevOps shops.

"We often find ourselves wanting to build application software, but what we're actually doing is writing the same code over and over again to solve something that's actually a really hard computer science problem that should be factored into some kind of common interface," said Ben Sigelman, CEO of microservices monitoring startup LightStep, in a keynote talk about service mesh at KubeCon.

"Service mesh can help with discovery of services, interconnection of those services, circuit breaking, load balancing, … security and authentication," said Sigelman, a former Google engineer and co-creator of OpenTracing, an open source project that provides vendor-neutral APIs for distributed tracing.

A brief history of service mesh

The earliest versions of sidecar proxy technology began to emerge in early 2016 at web-scale shops such as Google and Twitter, where managing microservices demanded new thinking about networks. Unlike traditional monolithic applications, microservices rely on the external network to communicate and coordinate application functions. That traffic between services needs to be closely monitored and, at times, reconfigured en masse at large scale.

The earliest techniques for automating microservices network management relied on libraries, such as Netflix's Hystrix, that were deployed as part of the application code. As a result, developers had to take on network management themselves. These libraries also had to be reimplemented in every programming language used in a particular environment. This presented a conundrum, since a major tenet of the microservices ethos is independent service management by small teams free to work in any language.

In early 2016, engineers who had implemented the first microservices at Twitter founded Buoyant, a company that took the sidecar proxy approach as an alternative to application libraries. Buoyant coined the term service mesh in mid-2016. Its initial service mesh product, Linkerd, runs on the Java Virtual Machine (JVM) and acts as a sidecar, a design that shifts the network management burden away from app developers and supports centralized management of polyglot application networks. So far, Linkerd is the only service mesh in production in mainstream enterprise IT shops. Customer references include Salesforce, PayPal, Credit Karma, Expedia and AOL.

As Linkerd gained a foothold, however, Docker containers and Kubernetes orchestration sent Buoyant engineers back to the drawing board. In December 2017, the company released Conduit, a service mesh built on a lightweight container-based proxy rather than Linkerd's resource-heavy JVM. Conduit was written specifically for use with Kubernetes in a combination of the Go and Rust programming languages.

The Kubernetes community was writing lightweight services in Go that might take 20 MB or 50 MB of memory to run, so Linkerd's JVM, which could weigh in at 200 MB of memory consumption, was a point of friction for Kubernetes enthusiasts, said William Morgan, Buoyant's co-founder and CEO.

"It's not ideal for it to take that much memory, especially when the value proposition is that it's going to be part of the underlying infrastructure that the developer doesn't have to worry about," Morgan said.

But just as Buoyant engineers began to rethink their service mesh architecture in early 2017, Kubernetes' creator Google and fellow tech heavyweight IBM teamed up with the ride-hailing company Lyft to create Istio. This container-based service mesh garnered big industry buzz, given the fame of its backers and Google's internal experience managing container-based microservices at massive scale. Google contributed control plane software to Istio based on its internal Service Control tool, while IBM added the control-plane tool Amalgam8. Istio is based on Lyft's Envoy sidecar proxy, which the company built to take orders from an API-based control plane. Envoy can absorb configuration updates without requiring a restart.
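Envoy receives such updates through its xDS discovery APIs. The Python below is a hypothetical, much-simplified model of the general idea only, not of Envoy's protocol: a proxy that accepts versioned route tables pushed by a control plane and swaps them in place while it keeps serving. The `RouteConfig` and `Proxy` names, the integer version check and the longest-prefix routing rule are illustrative assumptions.

```python
import threading
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RouteConfig:
    """Hypothetical versioned route table pushed by a control plane."""
    version: int
    routes: Mapping[str, str]  # path prefix -> upstream cluster name


class Proxy:
    def __init__(self, initial: RouteConfig):
        self._config = initial
        self._lock = threading.Lock()  # serializes concurrent pushes

    def apply_update(self, new: RouteConfig) -> bool:
        """Swap in a pushed config while the proxy keeps running (no restart)."""
        with self._lock:
            if new.version <= self._config.version:
                return False  # ignore stale or duplicate pushes
            # Single reference assignment: a request sees the old or the new table, never a mix.
            self._config = new
            return True

    def route(self, path: str) -> str:
        config = self._config  # take one snapshot per request
        matches = [prefix for prefix in config.routes if path.startswith(prefix)]
        if not matches:
            raise LookupError(f"no route for {path}")
        # Longest matching prefix wins.
        return config.routes[max(matches, key=len)]
```

The design choice worth noting is that the proxy never edits its live table; it replaces the whole table at once, so in-flight requests finish against the snapshot they started with.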

Figure: A service mesh architecture uses sidecar containers to facilitate network traffic.

Istio's backers are in talks with the CNCF, home of Kubernetes, about long-term governance. They plan to issue a production-ready 1.0 version in the third quarter of 2018.

Linkerd and Istio have turned the most heads in this emerging market so far, but there are many service mesh architecture projects afoot, including both open source and proprietary options. Many of these projects are based on the Envoy sidecar, but not all. Nginx introduced its own centralized management control plane based on its Nginx Plus proxy. Other early service mesh hopefuls include Turbine Labs' Houston, Datawire's Ambassador, Heptio's Contour,'s Gloo and Tigera's CNX.

Who needs a service mesh?

It's too soon to tell just how widespread service mesh architecture adoption will be among mainstream enterprise IT shops that don't work at the scale of Twitter or Google.

For organizations working with containers in a limited way, the service discovery and network management features of existing API gateways and Kubernetes or PaaS software, such as Docker Enterprise Edition or Cloud Foundry, may provide adequate microservices support, Gartner analyst Anne Thomas said.

"Most organizations that think they're doing microservices are not really doing true microservices," Thomas said. "And I'm not convinced that true microservices are ever going to become mainstream within the traditional enterprise."

To Thomas, true microservices are as independent as possible. Each service handles one individual method or domain function; uses its own separate data store; relies on asynchronous event-based communication with other microservices; and lets developers design, develop, test, deploy and replace this individual function without having to redeploy any other part of the application.

"Plenty of mainstream companies are not necessarily willing to invest quite that much time and money into their application architecture," Thomas contended. "They're still doing things in a more coarse-grained manner, and they're not going to use a mesh, at least until the mesh becomes built into the platform as a service that they're using, or until we get brand-new development frameworks."

Some early adopters of the service mesh architecture don't believe a slew of microservices is necessary to benefit from the technology.

"It allows you to push traffic around in a centralized way that's consistent across many different environments and technologies, and I feel like that's useful at any scale," said Zack Angelo, director of platform engineering at BigCommerce, an e-commerce company based in Austin, Texas, that uses the Linkerd service mesh. "Even if you have 10 or 20 services, that's an immensely useful capability to have."

Traditional network management tools, such as load balancers, can't route tiny percentages of traffic to certain nodes for a canary or blue/green application rollout, Angelo said. Nor do traditional network monitoring tools offer the kind of granular telemetry data a service mesh provides, which lets Angelo track tiny outliers in the 99th percentile of application latencies, outliers whose importance is magnified in a microservices network.
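As an illustration of the kind of fine-grained split Angelo describes, the sketch below sends exactly 1% of requests to a canary using smooth weighted round-robin. This is a technique chosen for illustration, not a description of how Linkerd or any specific mesh implements canary routing, and the backend names are hypothetical. On each pick, every backend's running score grows by its weight, the highest score wins, and the winner's score drops by the total weight.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int
    current: int = 0  # running score used by smooth weighted round-robin


class WeightedRouter:
    """Deterministic weighted split, e.g. 99% stable / 1% canary (illustrative)."""

    def __init__(self, weights: dict[str, int]):
        self.backends = [Backend(name, w) for name, w in weights.items()]
        self.total = sum(b.weight for b in self.backends)

    def pick(self) -> str:
        best = None
        for b in self.backends:
            b.current += b.weight
            # Strict ">" means ties go to the backend listed first.
            if best is None or b.current > best.current:
                best = b
        best.current -= self.total
        return best.name
```

With weights of 99 and 1, every block of 100 consecutive picks contains exactly one canary request, placed mid-block rather than bunched at the start. A rollout could then proceed by pushing new weights (say 90/10, then 50/50) from a central control plane instead of reconfiguring each load balancer.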

Linkerd's load-balancing mode uses an exponentially weighted moving average (EWMA) of response latency: when the service mesh distributes network traffic across hosts, it considers how fast downstream services are responding and routes traffic to where services are performing best, as opposed to the traditional round-robin technique, which cycles through hosts in turn regardless of how they perform.
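A simplified sketch of latency-aware balancing with an EWMA follows. It is not Linkerd's implementation, which is more elaborate; the smoothing factor `alpha = 0.3`, the per-sample update and the rule of probing unmeasured hosts first are assumptions made to keep the example short, and the host names are hypothetical. Each observed latency updates a host's average as `ewma = alpha * sample + (1 - alpha) * ewma`, so recent slowdowns count more than old history, and the next request goes to the host with the lowest average.

```python
from typing import Optional


class EwmaBalancer:
    """Simplified latency-aware balancer (illustrative; not Linkerd's code)."""

    def __init__(self, hosts: list[str], alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        # None means no latency has been observed for that host yet.
        self.ewma: dict[str, Optional[float]] = {h: None for h in hosts}

    def observe(self, host: str, latency_ms: float) -> None:
        prev = self.ewma[host]
        if prev is None:
            self.ewma[host] = latency_ms  # seed the average with the first sample
        else:
            self.ewma[host] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self) -> str:
        # Unmeasured hosts sort first so they get probed; then lowest average latency.
        def score(host: str) -> tuple[int, float]:
            value = self.ewma[host]
            return (0, 0.0) if value is None else (1, value)

        return min(self.ewma, key=score)
```

If `dc-west` has been answering in 10 ms and `dc-east` in 100 ms, traffic goes to `dc-west`; a single 500 ms response raises `dc-west`'s average to 157 ms and the next request shifts to `dc-east`. A round-robin balancer would keep alternating between the two regardless. A production balancer would typically also account for outstanding requests and decay averages over time rather than per sample.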

"We're spread across several data centers, and it's nice to have technology built into our load balancer that will automatically know and pick the fastest network path," Angelo said. "That's really interesting to us from a failover perspective."

That's not to say service mesh is without tradeoffs, especially in management complexity for IT operations staff who are not familiar with advanced networking concepts. The centralized control plane can become a single point of failure if not managed correctly, Angelo said, though organizations can mitigate this risk by building resiliency into their service mesh design.

"If something's happening in service discovery, serving stale data or something to a Linkerd node, and there's a bad host in the load-balancing pool, the Linkerd failure algorithm will pull it out of the pool even though the service discovery information is incorrect, which is really nice," Angelo said.
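The behavior Angelo describes, pulling a misbehaving host out of the pool based on observed failures rather than on what service discovery reports, can be sketched as follows. Linkerd 1.x documents a related, configurable feature under the name failure accrual; the Python below is a hypothetical simplification, not Linkerd's code. It assumes a consecutive-failure threshold and a fixed ejection period, and takes an injectable clock so the timing can be tested.

```python
import time
from typing import Callable


class FailureAccrualPool:
    """Hypothetical sketch: eject hosts after repeated failures, whatever discovery says."""

    def __init__(self, hosts: list[str], max_consecutive_failures: int = 5,
                 eject_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.hosts = list(hosts)  # what service discovery currently reports
        self.max_failures = max_consecutive_failures
        self.eject_seconds = eject_seconds
        self.clock = clock
        self.failures: dict[str, int] = {}
        self.ejected_until: dict[str, float] = {}

    def record(self, host: str, success: bool) -> None:
        if success:
            self.failures[host] = 0  # any success resets the failure streak
            return
        self.failures[host] = self.failures.get(host, 0) + 1
        if self.failures[host] >= self.max_failures:
            # Take the host out of rotation for a while, then let it back in.
            self.ejected_until[host] = self.clock() + self.eject_seconds
            self.failures[host] = 0

    def available(self) -> list[str]:
        now = self.clock()
        return [h for h in self.hosts
                if h not in self.ejected_until or self.ejected_until[h] <= now]
```

Even if stale discovery data keeps listing a bad host, `available()` excludes it until its ejection period expires, which is the safety net Angelo values when the control plane or discovery layer misbehaves.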

Other companies plan to kick the tires on the Istio service mesh when it's generally available, in part for its centralized network monitoring features.

"We still have [application code] in PHP, Node and Go, and three different ways to collect logs and monitor services and uptime," said Harrison Harnisch, a Chicago-based staff engineer for Buffer, a social media management platform with a distributed workforce around the U.S. "But if we can get everything talking through a service mesh, we can just use the same patterns for logging, and build template dashboards to share across teams, which is difficult to do now."

Istio's creators mull service mesh outlook

Even in traditionally stodgy industries, such as banking, developers are creating complex consumer-facing apps that look more like the high-scale web apps championed by the likes of Google.

"It's important they have real-time data and that they personalize the experience for each user," said Jennifer Lin, a product management director for Istio at Google. "This requires a more granular … set of services that allow these innovative applications to do things at scale with very low latency in a secure way."

Granular traffic routing and security policies will also be a key component of the hybrid cloud concept for Istio marketed by IBM, and will be necessary to manage microservices across private and public clouds, IBM engineer Daniel Berg said.

"Customers are going to need a mesh to help organize and manage the complexity that comes with transitioning between traditional and cloud-native applications," Berg said. "If you start using any mesh as part of your application, if you then try to port that to another provider that isn't using it, while it might run, you're going to get very different behavior that might be unexpected and undesirable."

But Envoy's creator at Lyft, senior software engineer Matt Klein, said it's most likely that mainstream enterprises will wait until features of the service mesh architecture are part and parcel of public clouds' container as a service and PaaS offerings, echoing the prediction of Gartner's Thomas.

"The way that you can imagine it would work in something like AWS Fargate: they would automatically inject a proxy like Envoy next to each user function or container, ... and the user would just get the features without caring how they're actually implemented," Klein said. "They'd get service mesh features, but it doesn't really matter to them that it's service mesh."

It's also anyone's guess how long the transition to such services might take, Klein said.

"We're probably 10 to 20 years out from when the majority of things will run in some type of public cloud," Klein said. "Businesses like [Microsoft] Azure, [Google Cloud Platform] and Amazon are 100-year businesses, and we're at the very beginning of that phase."