
Linux kernel utility could solve Kubernetes networking woes

Linux kernel utility eBPF gets new life as a more effective means to scale Kubernetes networking than native Kubernetes tools, and in some cases, service mesh.

As production Kubernetes clusters grow, a standard Linux kernel utility that's been reinvented for the cloud era may offer a fix for container networking scalability challenges.

The utility, extended Berkeley Packet Filter (eBPF), traces its origins back to a paper published by computer scientists in 1992. It's a widely adopted tool that uses a mini-VM inside the Linux kernel to perform network routing functions. Over the last four years, as Kubernetes became popular, open source projects such as Cilium began to use eBPF data to route and filter Kubernetes network traffic without requiring Linux kernel changes. 

In the last two years, demand for such tools rose among enterprises as their Kubernetes production environments grew, and they encountered new kinds of thorny bottlenecks and difficult tradeoffs between complexity and efficiency.

IT monitoring vendor Datadog saw eBPF-based tooling as the answer to its Kubernetes scaling issues after a series of experiments with other approaches.

"Right now, there are a lot more people running Kubernetes at smaller scale," said Ara Pulido, a developer relations specialist at Datadog, in an online presentation last month. "When you start running Kubernetes at bigger scale, you run into issues that just a handful of people have found before, or maybe you are the first one."

As Datadog's environment expanded to dozens of Kubernetes clusters and hundreds of nodes, it quickly outgrew the default Kubernetes networking architecture, Pulido said.

Among the scalability issues Datadog encountered was the way kube-proxy, the native Kubernetes load balancer component, handles service networking data. In microservices environments, application services composed of Kubernetes Pods communicate through load balancers; by default, kube-proxy performs this role and is deployed to every Kubernetes cluster node. Kube-proxy monitors the Kubernetes API for changes and, by default, updates iptables whenever a change is made to keep track of service routing information.

"One of the issues is that with every change, you have to resync the whole table, and as you scale the number of pods and services, that's going to have a cost," Pulido added.

Since Kubernetes 1.11, kube-proxy can also use the Linux IP Virtual Server (IPVS) instead of iptables. Among other improvements, IPVS doesn't require a full resync when changes are made to the cluster. However, this required Datadog engineers to become upstream contributors to IPVS to ensure it worked well in their environment, Pulido said.
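
The difference between resyncing a whole table and updating it incrementally can be illustrated with a toy model. The sketch below is not how kube-proxy, iptables or IPVS are implemented; it assumes a hypothetical table with one rule per service endpoint, uses made-up service names and addresses, and simply counts how many rules each approach would have to write after a single change.

```python
"""Toy model: full-table resync versus incremental update.

Assumption: one rule per (service, endpoint) pair. Real kube-proxy rule
generation is more involved; only the scaling behaviour is illustrated.
"""


def build_rules(services):
    """Flatten {service: [endpoint, ...]} into a list of rules."""
    return [(svc, ep) for svc, eps in sorted(services.items()) for ep in eps]


def full_resync_writes(services):
    """A full resync rewrites every rule, whatever actually changed."""
    return len(build_rules(services))


def incremental_writes(old, new):
    """An incremental update touches only rules that were added or removed."""
    return len(set(build_rules(old)) ^ set(build_rules(new)))


def make_cluster(n_services, endpoints_per_service):
    """Hypothetical cluster with made-up service and endpoint names."""
    return {
        f"svc-{i}": [f"10.0.{i}.{j}" for j in range(endpoints_per_service)]
        for i in range(n_services)
    }


if __name__ == "__main__":
    before = make_cluster(200, 10)
    after = {svc: list(eps) for svc, eps in before.items()}
    after["svc-0"].append("10.0.0.250")  # one new pod behind one service

    print("full resync writes:", full_resync_writes(after))
    print("incremental writes:", incremental_writes(before, after))
```

With 200 services of 10 endpoints each, adding one endpoint means 2,001 rule writes under the full-resync model and one under the incremental model. The gap widens as the number of pods and services grows.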

Datadog then began to explore eBPF tools from Cilium for granular container security features and found they could serve as a wholesale replacement for kube-proxy.

Cilium provides identity-based connections via Kubernetes labels, rather than connections based on IP addresses, which may not be fine-grained enough to accommodate individual workload permissions in security-sensitive environments, Pulido said in an interview following her presentation. "As we moved to Cilium in our newer clusters, we realized we could also remove kube-proxy, as Cilium already implements a replacement."
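
The contrast between IP-based and label-based rules can be sketched in a few lines. This is a toy comparison only, not Cilium's policy engine or API; the pod names, labels, addresses and helper functions are all invented for illustration.

```python
"""Toy comparison of IP-based and label-based (identity-based) allow rules.

Assumption: all names, labels and addresses below are invented for
illustration; this is not Cilium's policy engine or API.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    ip: str
    labels: frozenset  # set of (key, value) pairs


def make_pod(name, ip, **labels):
    """Build a Pod from keyword labels, e.g. app="billing"."""
    return Pod(name, ip, frozenset(labels.items()))


def allowed_by_ip(source, allowed_ips):
    """IP-based rule: the decision depends on the address the pod has now."""
    return source.ip in allowed_ips


def allowed_by_labels(source, required_labels):
    """Label-based rule: the decision depends on what the workload is."""
    return frozenset(required_labels.items()) <= source.labels


if __name__ == "__main__":
    ip_rule = {"10.0.1.5"}
    label_rule = {"app": "billing", "env": "prod"}

    pod = make_pod("billing-1", "10.0.1.5", app="billing", env="prod")
    # The pod is rescheduled and comes back with a new address.
    rescheduled = make_pod("billing-1", "10.0.7.9", app="billing", env="prod")

    print(allowed_by_ip(pod, ip_rule), allowed_by_ip(rescheduled, ip_rule))
    print(allowed_by_labels(pod, label_rule),
          allowed_by_labels(rescheduled, label_rule))
```

In this sketch the IP rule stops matching once the pod is rescheduled with a new address (`True False`), while the label rule keeps matching (`True True`) because it follows the workload's labels rather than its current address.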

Cilium updates eBPF for Kubernetes networking

Cilium, launched four years ago, and its commercial backer, Isovalent, have developed Kubernetes networking and security tools based on eBPF, as have other vendors such as Weaveworks, whose Weave Scope network monitoring tool uses eBPF data to perform granular tracking of Kubernetes TCP connections. Another company, Kinvolk, created the cgnet open source utility to collect detailed pod and node statistics via eBPF and export them to Prometheus.

[Figure: Cilium Kubernetes networking architecture. Cilium eBPF-based tools replace native Kubernetes networking functions.]

Cilium's eBPF-based tools replace Kubernetes networking elements, including kube-proxy, to provide network and load balancing services and to secure the connections between them. Users say the Cilium tools perform better than kube-proxy, especially the iptables version, and offer a more straightforward approach to Kubernetes service network routing than overlay tools such as Flannel.

"The IPtables approach [with kube-proxy] was always kind of kludgy," said Dale Ragan, principal software design engineer at SAP's Concur Technologies Inc., an expense management SaaS provider based in Bellevue, Wash.

Ragan also encountered some known issues between Flannel and Kubernetes NodePort connections as of late 2018, which he found Cilium could potentially avoid. Concur has since swapped out its Flannel Container Network Interface (CNI) plugins for Cilium in its production clusters, and is also testing Isovalent's proprietary SecOps add-ons, such as intrusion detection and forensic incident investigation.

"The other [appeal of eBPF] was from a security perspective, that we could apply policies both cluster-wide and to individual services," Ragan said.

eBPF vs service mesh

Cilium contributors also contribute to Envoy, the sidecar proxy used with Istio and other service meshes, and eBPF isn't a complete replacement for service mesh features such as advanced layer 7 application routing. Cilium can be used with a service mesh to accelerate its performance, said Isovalent's CEO, Dan Wendlandt.

"CNIs are at a lower layer of Kubernetes networking -- service mesh still depends on that core networking and security layer within Kubernetes," Wendlandt said. "Cilium is a good networking foundation for service mesh that can get data in and out of any service mesh proxy efficiently."

However, at lower layers of the network stack, there's significant overlap between the two technologies, and Concur's engineers will consider whether eBPF might support multi-cluster connectivity and mutual TLS authentication more simply than a service mesh.

"We want to get the networking layer correct, and from there add service mesh," Ragan said. "From a TLS perspective, it could be very transparent for the user, where Cilium is inspecting traffic at the system level -- there are all kinds of opportunities around intrusion detection without a lot of overhead and work for [IT ops] teams to do to allow visibility for SecOps."

Still, Cilium and other eBPF-based tools represent just one approach that may gain traction as more users encounter problems with Kubernetes networking at scale. For example, some bleeding-edge Linux experts believe eBPF may be eclipsed in network performance enhancement by the io_uring subsystem, introduced in the Linux kernel a year ago.

"eBPF is going through a bit of a hype cycle right now," said John Mitchell, an independent digital transformation consultant in San Francisco. "From the VC perspective, it's a super-techy 'special sauce', and the eBPF ecosystem has gotten some good push from influential uber-geeks."

However, eBPF has real potential to add advanced Kubernetes network security features without requiring changes to application code, Mitchell said.
