Containers provide the tools to organize and build microservices. They include everything that a specific microservice needs to run, such as code, runtime, system tools, system libraries and settings. Using containers in the development process lets different teams work on separate microservices simultaneously. Kubernetes quickly became the de facto standard for container orchestration, and it enables developers to predictably deploy apps across any environment.
The Kubernetes platform provides a degree of no-hassle portability that helps alleviate scalability, reliability and performance issues. In general, containers limit the risk involved with bold, high-level deployments because they isolate the software in one place. Yet there are still a great many ways for container deployments to fail.
Selecting the right Docker and Kubernetes monitoring tools can help resolve those problems quickly and prevent unnecessary delays. Monitoring tools help surface problems you often won't see on a chart, such as source code referents, instrumentation verifications or SQL queries that cause transaction bottlenecks.
The benefits of monitoring are obvious: It lets developers see events at the host level, application level or anywhere else in the system. In some cases, certain monitoring tools will automatically repair failed deployments.
For development teams today, there is typically a range of things to track, such as Docker containers, the particular cloud platform, Kubernetes orchestration and individual services. Teams should explore and evaluate combinations of prepackaged Kubernetes monitoring tools and key Docker monitoring tools to find the best fit.
Monitoring Kubernetes and Docker with open source and commercial tools
Kubernetes is an attractive orchestration platform because it uses precise language. This provides consistency and helps to simplify communications among developers, engineers and operations. With Kubernetes, multiple services -- and even namespaces -- can be scattered across the same physical infrastructure.
Each of these services resides in Kubernetes pods, which, in turn, can contain numerous containers. Because of this, it becomes quite complex to identify exactly which containers support specific microservices.
Development teams need both an infrastructure-based view and a service-centric view. Kubernetes monitoring can provide key insights and a service-centric view to help identify the role of an individual container or a set of containers.
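One way to build that service-centric view is to match a service's label selector against pod labels, which is how Kubernetes associates the two. The sketch below assumes pod objects shaped like the items in `kubectl get pods -o json`; the function name `containers_for_service` and the sample data are hypothetical.

```python
def containers_for_service(selector, pods):
    """Return {pod name: [container names]} for pods matching a service selector."""
    matches = {}
    for pod in pods:
        labels = pod.get("metadata", {}).get("labels", {})
        # A pod backs the service only if every selector key/value pair is present.
        if selector and all(labels.get(k) == v for k, v in selector.items()):
            containers = pod.get("spec", {}).get("containers", [])
            matches[pod["metadata"]["name"]] = [c["name"] for c in containers]
    return matches


# Hypothetical sample data: two pods from different microservices.
sample_pods = [
    {
        "metadata": {"name": "checkout-7d9f", "labels": {"app": "checkout", "tier": "web"}},
        "spec": {"containers": [{"name": "api"}, {"name": "envoy"}]},
    },
    {
        "metadata": {"name": "search-5c2a", "labels": {"app": "search"}},
        "spec": {"containers": [{"name": "indexer"}]},
    },
]

print(containers_for_service({"app": "checkout"}, sample_pods))
```

Grouping containers under the service that selects them gives the service-centric view, while the same pod list sorted by node would give the infrastructure view.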
Development teams should also monitor Kubernetes to derive environmental information about their applications. This can include failed launches or pod restarts, which signal that certain elements have changed. Monitoring also provides information on service request latencies and per-container resource utilization.
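As an illustration of those signals, the following sketch scans a pod list for restarts and failed or pending launches. It assumes input shaped like the output of `kubectl get pods --all-namespaces -o json`; the function name `restart_report`, the threshold and the sample data are hypothetical, and treating every `Pending` pod as suspect is a simplification.

```python
def restart_report(pod_list, threshold=1):
    """Return (namespace, pod, phase, restarts) tuples for pods that need attention."""
    report = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        status = pod.get("status", {})
        # Sum restarts across every container in the pod.
        restarts = sum(
            cs.get("restartCount", 0) for cs in status.get("containerStatuses", [])
        )
        phase = status.get("phase", "Unknown")
        if restarts >= threshold or phase in ("Failed", "Pending"):
            report.append((meta.get("namespace", "default"), meta["name"], phase, restarts))
    # Most frequently restarted pods first.
    return sorted(report, key=lambda row: row[3], reverse=True)


# Hypothetical sample data.
sample = {
    "items": [
        {"metadata": {"namespace": "prod", "name": "web"},
         "status": {"phase": "Running", "containerStatuses": [{"restartCount": 0}]}},
        {"metadata": {"namespace": "prod", "name": "worker"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"restartCount": 3}, {"restartCount": 1}]}},
        {"metadata": {"namespace": "prod", "name": "job"},
         "status": {"phase": "Failed", "containerStatuses": [{"restartCount": 0}]}},
    ]
}

for row in restart_report(sample):
    print(row)
```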
These basic Kubernetes-centric monitoring tools include some that can be easily configured with a few lines of YAML and others that require more time and resources to configure.
Liveness and readiness probes
To verify if a container in a pod is healthy and prepared to serve traffic, Kubernetes provides a range of health checks. Liveness and readiness probes offer a unique type of automated self-healing. The purpose of liveness probes is to indicate that an application is up and running.
By contrast, readiness probes are designed to check if an application is prepared to serve traffic. The right combination of liveness and readiness probes lets developers achieve zero-downtime deploys, prevent the deployment of broken images and automatically restart failed containers.
Using these probes to monitor applications will ensure greater stability and provide self-healing if an app becomes unresponsive.
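A minimal sketch of both sides of a probe follows: the endpoints an application might expose and the matching probe settings for its container spec. The paths `/healthz` and `/readyz`, port 8080, the timing values and all Python names are assumptions chosen for illustration. The probe field names follow the Kubernetes container spec.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppState:
    # Set to True once startup work (cache warm-up, DB connection) has finished.
    ready = False


def probe_status(path, ready):
    """Map a probe request path to an HTTP status code."""
    if path == "/healthz":   # liveness: the process is up and able to answer
        return 200
    if path == "/readyz":    # readiness: safe to send traffic yet?
        return 200 if ready else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, AppState.ready))
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep frequent probe requests out of the application log


def make_server(port=8080):
    """Build (but do not start) the probe server; call serve_forever() to run it."""
    return HTTPServer(("", port), ProbeHandler)


def container_probes(port=8080):
    """Probe settings to merge into a container spec (serializable to JSON or YAML)."""
    return {
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},
            "initialDelaySeconds": 5,
            "periodSeconds": 10,
            "failureThreshold": 3,
        },
        "readinessProbe": {
            "httpGet": {"path": "/readyz", "port": port},
            "periodSeconds": 5,
            "failureThreshold": 1,
        },
    }


print(json.dumps(container_probes(), indent=2))
```

Keeping the two endpoints separate matters: a failing liveness probe causes Kubernetes to restart the container, while a failing readiness probe only removes the pod from service endpoints until it recovers.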
The open source cAdvisor works well with Docker and a host of other container runtimes. It auto-discovers all containers in a given node and collects basic resource information, such as CPU percentages, memory, file system and network use statistics. Though it gathers real-time data, it lacks the storage capability that enables long-term analysis. Still, many other metrics collection systems use cAdvisor as an underlying technology to gather metrics. It exposes a remote REST API endpoint for studying metrics and also offers a built-in web UI for data visualization.
A key component of all Kubernetes monitoring, Heapster aggregates tracking and event data across a cluster. While it requires an investment of time and resources to configure, Heapster offers the advantages of open source in terms of flexibility and durability. Packaged with Kubernetes, it offers detailed trending and analysis capabilities.
Heapster not only collects performance metrics about workloads, pods and containers, but also about events and other signals generated by that cluster. It groups and labels the information by pod, which makes it considerably easier than other tools to perform analyses and reviews. Heapster supports a number of different back ends, including InfluxDB, Elasticsearch and Graphite.
As a web UI for accessing cluster information, Kubernetes Dashboard displays all running workloads, enabling developers to actively manage and troubleshoot applications. It provides a complete view of all running applications. Developers can choose to modify individual resources, including DaemonSets, Deployments or Jobs.
When combined with Heapster, the dashboard can provide logs for individual pods, offering a consistent means for fast data visualization. Exposing the dashboard to the internet in early versions of Kubernetes resulted in a much-publicized Tesla security breach. The default configuration for installing the dashboard has been locked down in later versions.
Sysdig is a unified open source agent, available with commercial support, that provides a service-level view of metrics, command histories and policy violations. It is deployed through a Kubernetes DaemonSet, which ensures that an instance of a specific pod is running on all nodes in a cluster. Sysdig can see inside containers to provide a more code-centric view of an application. As soon as it's configured, the tool will monitor applications even while Kubernetes is in the process of scaling Docker containers. The tool functions as an effective unifier and provides both a high-level infrastructure view and a Kubernetes-centric view.
In 2017, Sysdig launched its application checks function, which lets Sysdig Cloud gather even more information from microservices infrastructures. The preinstalled plug-ins poll applications for custom metrics, which are then exported for review through the application's status or management pages.
Prometheus is an open source, multi-tiered monitoring system that can straddle both the Kubernetes system and Docker containers. The tool digs into time series events and provides the mechanisms to store those metrics for later review. Prometheus incorporates a range of features, including dashboard visualization -- i.e., console templates -- time series storage, data collection, alarms and event management, and extensibility via Prometheus data exporters.
The complexity of the console templates can be offset by first integrating with a simpler visualization back end, such as Grafana. Prometheus also acts as a centralized hub for collecting metrics data through its support for a range of third-party tools, including HAProxy, MySQL, Memcached, Redis and StatsD. It offers an easy-to-grasp query language and enables developers to use the same language for graphing and alerting.
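To give a sense of that query language, the sketch below sends one PromQL expression to the Prometheus HTTP API and reads back the result. The `/api/v1/query` endpoint and response layout follow the documented API. The server address, the function names, and the metric and label names in the example query are assumptions that depend on what a given cluster exports.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(base_url, promql):
    with urlopen(query_url(base_url, promql), timeout=5) as resp:
        return parse_vector(json.load(resp))


# Example expression: per-pod CPU usage rate over five minutes (names assumed).
EXAMPLE_QUERY = "sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)"

# Hypothetical response, shaped like the API's JSON output.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"pod": "web-1"}, "value": [1700000000.0, "0.25"]}],
    },
}
print(parse_vector(sample_response))
```

The same expression could be pasted into a Grafana panel or an alerting rule, which is the practical benefit of using one language for both.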
Pinpointing exactly when and where an application error has occurred or continues to occur remains a key challenge in microservices architectures. Geared specifically to developers, Retrace is a straightforward SaaS tool that enables users to identify which deployment in which microservice is causing errors. As an application performance management product for .NET and Java applications, it offers a range of features, such as logging method call stacks and database queries, tracking requests and monitoring errors and logs.
Once a tool like Retrace has helped developers identify the cause, rolling back or patching is fast and easy. The latest release supports the most commonly used application servers and frameworks in the Java ecosystem, including web services and data storage. With its built-in Java support, the Retrace console will automatically show Java frameworks by default if they're part of an application.