
IT pros shore up Prometheus monitoring via third-party tools

Upstream Prometheus monitoring isn't ready for prime time in enterprise environments, but it gets by with a little help from its friends, such as Sysdig and Rancher Labs.

Prometheus monitoring offers crucial data in Kubernetes container orchestration environments, but many enterprises prop up its reliability and security with third-party tools.

Prometheus is an open source time-series monitoring tool governed by the Cloud Native Computing Foundation (CNCF) alongside Kubernetes. Both Prometheus and Kubernetes graduated to stable status within the CNCF in 2018. While some IT shops are comfortable with upstream Kubernetes deployments in production, Prometheus monitoring is a different story.

"Prometheus isn't particularly enterprise-friendly," said Jeremy Pullen, CEO and principal consultant at Polodis Inc., a DevSecOps and Lean management advisory firm in Tucker, Ga., that works with large enterprise clients. "Some people get scared by Prometheus being lossy -- it throws away data when it falls behind."

Prometheus monitoring in its upstream form lacks native server-side authentication and encryption of the data it collects, so users must deploy a reverse proxy server alongside Prometheus to enhance its security. Fairly simple reverse proxy tools such as Nginx and HAProxy are available in open source versions and are easily deployed, but enterprises find that Prometheus monitoring security in multi-tenant enterprise environments requires a proprietary tool.
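To make the reverse proxy idea concrete, here is a minimal sketch in Python's standard library of a proxy that checks HTTP Basic credentials before it forwards a request. It is an illustration under stated assumptions, not a production setup: the upstream address, the username and password, and the names `is_authorized`, `AuthProxy` and `serve` are all hypothetical, and TLS termination, which a real deployment would also want, is left out.

```python
import base64
import hmac
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:9090"  # assumed address of a local Prometheus server
USERNAME = "monitor"                # hypothetical credentials for illustration
PASSWORD = "change-me"


def is_authorized(header, username=USERNAME, password=PASSWORD):
    """Return True if an HTTP Basic Authorization header matches the credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    expected = f"{username}:{password}"
    # Constant-time comparison avoids leaking how much of the secret matched.
    return hmac.compare_digest(decoded.encode(), expected.encode())


class AuthProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="prometheus"')
            self.end_headers()
            return
        try:
            # Forward the request path unchanged to the upstream server.
            with urllib.request.urlopen(UPSTREAM + self.path, timeout=10) as resp:
                body = resp.read()
                content_type = resp.headers.get("Content-Type", "text/plain")
                self.send_response(resp.status)
                self.send_header("Content-Type", content_type)
                self.end_headers()
                self.wfile.write(body)
        except OSError:
            # Upstream unreachable or returned an error status.
            self.send_response(502)
            self.end_headers()


def serve(port=8080):
    """Run the proxy until interrupted."""
    HTTPServer(("0.0.0.0", port), AuthProxy).serve_forever()
```

Calling `serve()` would start the proxy on port 8080. In practice, most shops would reach for Nginx or HAProxy instead, as noted above; the sketch only shows what the proxy layer adds in front of an unauthenticated endpoint.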

The search for multi-tenant Prometheus

For an IT contractor that works with the U.K.'s central Home Office agency, a combination of Prometheus monitoring and Sysdig Monitor 3.0 met the agency's requirements for multi-tenant security support and data management.

"We started working with Sysdig two years ago, because its integration with the Kubernetes API gave it knowledge of the context of containers," said Jay Keshur, director of Appvia Ltd., the contracted digital delivery and cloud consultancy firm. "But we've been asking for Prometheus integration for about a year."

[Figure: Sysdig Prometheus multi-tenancy. Multi-tenant management for open source Prometheus compared to the Sysdig Teams feature.]

Prometheus uses labels to track instances in the Kubernetes infrastructure, and Prometheus labels align with resource labels in Kubernetes. Prometheus also pulls a detailed set of granular time-series data from Kubernetes nodes, which offers application developers more detailed data on application performance than traditional IT monitoring tools.

"Prometheus gets closer to the app," Keshur said. "Sysdig could already pull some of that data through StatsD, but Prometheus integration gives us the ability to organize metrics with consistent tagging groups." StatsD, originally created by Etsy Inc., is a way to unify application metrics collection.

These tagging groups are the basis for alerts and team-specific views of the Prometheus database. If an application developer wants to track the number of logins on an app and determine whether the logins came from an administrator or another type of user, Sysdig's Prometheus integration offers an easy way to break down that data without custom metrics work.
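The login example can be sketched with a toy labeled counter. This is plain Python that models the idea of one time series per label combination; it is not the Prometheus client library or Sysdig's integration, and the metric name `app_logins_total`, the label names and the `LoginCounter` class are assumptions made for illustration.

```python
from collections import Counter


class LoginCounter:
    """Toy stand-in for a labeled counter; not the Prometheus client library."""

    def __init__(self, name="app_logins_total"):  # hypothetical metric name
        self.name = name
        self.series = Counter()

    def inc(self, **labels):
        # A sorted tuple of (label, value) pairs identifies one time series.
        self.series[tuple(sorted(labels.items()))] += 1

    def total_by(self, label):
        """Sum the counter across all series, grouped by one label."""
        totals = Counter()
        for labels, count in self.series.items():
            totals[dict(labels).get(label, "")] += count
        return dict(totals)

    def expose(self):
        """Render the series in a Prometheus-style text layout."""
        lines = []
        for labels, count in sorted(self.series.items()):
            pairs = ",".join(f'{key}="{value}"' for key, value in labels)
            lines.append(f"{self.name}{{{pairs}}} {count}")
        return "\n".join(lines)
```

After two calls to `inc(app="portal", role="admin")` and one to `inc(app="portal", role="user")`, `total_by("role")` separates administrator logins from other users without a second metric, which is the kind of breakdown consistent labels make possible.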

However, while Prometheus monitoring is well-suited to Kubernetes pods and the containers within them, Sysdig has better integration with hosts in the Kubernetes cluster. This means developers don't have to write hooks for Nginx host monitoring, for example, Keshur said.

Sysdig also integrated Prometheus monitoring with its Sysdig Teams approach to multi-tenancy, which Keshur's team requires in a highly secure government environment. Sysdig Teams differs from other approaches to multi-tenant Prometheus monitoring, which deploy multiple instances of the Prometheus application.

"This gives us multi-tenancy that scopes what each team can see, but with a single central place to host Prometheus," Keshur said.
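The scoping Keshur describes can be pictured as a label filter applied to one shared store. The sketch below is a simplified model of that idea only; it does not reflect how Sysdig Teams is implemented, and the team names, the `namespace` label and the `visible_series` helper are hypothetical.

```python
# Hypothetical mapping of teams to the label values they are allowed to see.
TEAM_SCOPES = {
    "payments": {"namespace": "payments"},
    "identity": {"namespace": "identity"},
}


def visible_series(team, series):
    """Return only the series whose labels match every label in the team's scope."""
    scope = TEAM_SCOPES.get(team)
    if scope is None:
        # Unknown teams see nothing rather than everything.
        return []
    return [
        item
        for item in series
        if all(item["labels"].get(key) == value for key, value in scope.items())
    ]
```

Every team queries the same central data set, but each one gets back only the series that carry its own labels. The alternative is a separate Prometheus instance per tenant.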

Kubernetes orchestrators reinforce Prometheus

For some enterprises, Prometheus monitoring with the necessary security and reliability features comes baked into third-party Kubernetes distributions from vendors such as Rancher Labs Inc. and Red Hat.

"Rancher does the work to install Prometheus for us, and role-based access control is built into the tool," said Matthew Esser, product owner of container services and infrastructure at Viasat Inc., a satellite telecommunications company in Carlsbad, Calif. "It's something we can offload to Rancher, and it helps with performance and reliability."


In addition to multi-tenancy support through integration with Kubernetes role-based access control, Rancher Labs includes a reverse proxy for server-side authentication in Prometheus and can automate upgrades the same way it does for Kubernetes. Red Hat OpenShift Container Platform includes similar features, and version 3.11, released last week, includes a CoreOS Operator for Prometheus that automates storage provisioning, sets monitoring alert thresholds and integrates Prometheus data into OpenShift dashboards. These improvements in OpenShift also boost Prometheus performance, which addresses the risk of dropped data.

However, both Rancher Labs and Red Hat say there's room to improve scalability and security in the upstream versions of Prometheus.

"Prometheus is historically a single-tenant stack, and if you use it in a multi-tenant environment, you might lose some capabilities," said Sheng Liang, founder and CEO of Rancher Labs. Trying to cram too much data from a large environment into a single central Prometheus deployment, for example, risks data loss. "Deploying a new instance of Prometheus inside each project and a separate instance to monitor the health of the overall Kubernetes cluster is still the safer way to go," Liang said.

Red Hat participates in an open source project called Thanos, which supports a highly scalable centralized Prometheus architecture with long-term data storage features.

"In 2019, it will give customers the choice to deploy one big Prometheus instance for tenants and clusters, with multi-tenant cluster views and an improved user experience," said Mike Barrett, director of OpenShift product management at Red Hat. However, some users will still deploy instances of Prometheus for each project to get the most detailed data possible, Barrett predicted.

Prometheus roadmap includes security improvements

Multi-tenancy support for Prometheus monitoring will remain the domain of third parties, said Julius Volz, a former platform engineer at SoundCloud who co-created Prometheus and helps to maintain its open source code.

However, server-side authentication improvements for endpoints in Prometheus, its exporters and a related utility called Alertmanager are on the Prometheus roadmap, Volz said. Also, while a degree of data loss is considered acceptable under certain conditions in Prometheus environments, planned improvements will reduce data loss when Prometheus exports data to external databases.

"When Prometheus cannot send samples to the remote [endpoint], it will drop samples for the duration of the outage and then resume only with current data when things work again," Volz said. "There are some plans to improve that part by replaying samples from the local database and resend them to the remote end."
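The difference between the current behavior and the planned one can be shown with a small simulation. This is a simplified model of the drop-versus-replay behavior Volz describes, not Prometheus code; the `forward` function, its parameters and the use of integer timestamps are assumptions made for illustration.

```python
def forward(samples, outage, replay=False):
    """Simulate sending samples to a remote endpoint.

    samples: list of (timestamp, value) pairs in time order.
    outage:  set of timestamps at which the remote endpoint is unreachable.
    replay:  if True, samples missed during an outage are held back and
             resent once the endpoint is reachable again.
    """
    delivered = []
    backlog = []
    for timestamp, value in samples:
        if timestamp in outage:
            if replay:
                backlog.append((timestamp, value))
            # Without replay, the sample is never sent to the remote end.
            continue
        # Endpoint is reachable: flush anything held back, then send the current sample.
        delivered.extend(backlog)
        backlog.clear()
        delivered.append((timestamp, value))
    return delivered
```

With five samples and an outage that covers the second and third, `forward(samples, {2, 3})` delivers only the first, fourth and fifth, while `forward(samples, {2, 3}, replay=True)` delivers all five once the endpoint recovers.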
