The idea behind observability around a technical platform is to understand the state of a complex environment through its outputs. The deeper the observability, the more quickly and effectively problems on that platform can be identified and fixed.
This might seem like restating and renaming what IT admins have been doing all along -- attempting to keep a grasp on what's going on through the use of systems management software and other tools.
However, observability isn't simply repurposing existing software to garner more money from customers. It's a real need driven by the changes most organizations are grappling with -- the move from a totally owned and controlled platform to a complex hybrid environment that mixes owned and shared, physical, virtualized and cloud-based platforms. Tools that cover such an environment are few and far between, but observability provides a starting point to deal with many of these areas.
Use case 1. Underlying observability architecture
Observability brings together data from a wide range of sources so it can be analyzed to identify where problems exist or might occur in the future. Strong monitoring capabilities are required, along with an understanding of the underlying metrics for each part of a system being monitored. Dependencies between different parts of the platform must be understood, and what's normal and abnormal must be defined. This can be done through a mix of out-of-the-box settings, user-defined settings and limits learned empirically as the observability system runs.
In addition, observability should be able to identify abnormal activity from zero-day threats or immediate issues caused by poor or wrong coding.
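As a rough illustration of how out-of-the-box, user-defined and learned limits might be combined, the following Python sketch picks the strictest limit available for a metric. The metric names, the default values, the mean-plus-three-standard-deviations rule and the "strictest wins" policy are all assumptions made for this example, not features of any particular product.

```python
from statistics import mean, stdev

# Hypothetical out-of-the-box limits (illustrative values only).
DEFAULT_LIMITS = {"cpu_percent": 90.0, "latency_ms": 500.0}


def learned_limit(history, k=3.0):
    """Empirical upper limit: mean plus k sample standard deviations."""
    if len(history) < 2:
        return None  # not enough data to learn a baseline yet
    return mean(history) + k * stdev(history)


def effective_limit(metric, user_limits, history):
    """Return the strictest of the user-defined, learned and default limits."""
    candidates = [
        user_limits.get(metric),
        learned_limit(history),
        DEFAULT_LIMITS.get(metric),
    ]
    candidates = [c for c in candidates if c is not None]
    return min(candidates) if candidates else None


def is_abnormal(metric, value, user_limits, history):
    """Flag a reading that exceeds the effective limit for its metric."""
    limit = effective_limit(metric, user_limits, history)
    return limit is not None and value > limit


if __name__ == "__main__":
    past_latency = [100, 110, 105, 95, 102]
    print(is_abnormal("latency_ms", 130, {}, past_latency))
    print(is_abnormal("latency_ms", 115, {"latency_ms": 110}, past_latency))
```

A real system would likely weigh these sources differently, for example by letting a user-defined limit override a learned one rather than taking the minimum.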
Use case 2. Data monitoring, aggregation and reporting
Public cloud owners might not allow organizations to run in-depth management software, but their platforms continually create data, such as telemetry and log data, particularly where it pertains to your own workloads. By aggregating this data with your data streams, analyzing it and gaining near-real-time reports, an organization can discover issues at an early stage. If the problem is on your platform, you can deal with it immediately; if it's indicated as being on a third party-owned part of the platform, you have an early heads-up about the issue and can share the data with the provider so it can fix things.
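The aggregation step described above can be sketched in Python as a merge of two time-ordered event streams, followed by a split of the serious events by owner. The `Event` fields, the "own" and "provider" labels and the severity levels are hypothetical; real provider telemetry would need parsing into some such common shape first.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some shared epoch
    source: str       # "own" or "provider" (labels assumed for this sketch)
    level: str        # for example "INFO" or "ERROR"
    message: str


def merge_streams(*streams):
    """Merge streams that are each already sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def early_warnings(events, levels=("ERROR", "CRITICAL")):
    """Group serious events by owner so each can be handled or forwarded."""
    report = {"own": [], "provider": []}
    for event in events:
        if event.level in levels:
            report.setdefault(event.source, []).append(event)
    return report


if __name__ == "__main__":
    own = [Event(1.0, "own", "INFO", "deploy ok"),
           Event(4.0, "own", "ERROR", "queue backlog")]
    provider = [Event(2.5, "provider", "ERROR", "storage latency high")]
    timeline = merge_streams(own, provider)
    report = early_warnings(timeline)
    print([e.message for e in timeline])
    print({owner: len(found) for owner, found in report.items()})
```

Events under "own" can be acted on directly, while those under "provider" form the evidence that can be shared with the third party.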
Event-based automation enables IT teams, particularly site reliability engineering teams, to trigger trouble tickets that can be routed to the right app, service or people. This frees employees to get on with the business of creating strategic IT value for the organization.
For example, employees can focus on areas such as digital transformation, with increased trust that automated remediation via in-depth observability will make complex workflows more likely to work and to be fixed more quickly if things do go wrong.
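A minimal sketch of event-based ticket routing might look like the following. The routing table, queue names and event fields are invented for illustration; an actual integration would call whatever API the help desk system in use provides.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical mapping from affected component to the owning queue.
ROUTES = {"database": "dba-team", "network": "netops", "security": "secops"}
DEFAULT_QUEUE = "sre-on-call"

_ticket_ids = count(1)  # simple in-memory ticket numbering for the sketch


@dataclass
class Ticket:
    id: int
    queue: str
    summary: str


def route_event(event):
    """Turn an observability event (a dict) into a ticket for the right queue."""
    queue = ROUTES.get(event.get("component"), DEFAULT_QUEUE)
    severity = event.get("severity", "unknown")
    summary = f"[{severity}] {event.get('message', '')}"
    return Ticket(id=next(_ticket_ids), queue=queue, summary=summary)


if __name__ == "__main__":
    ticket = route_event(
        {"component": "database", "severity": "high", "message": "replication lag"}
    )
    print(ticket.queue, ticket.summary)
```

Events from components with no explicit route fall back to a default queue, so nothing is silently dropped.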
Use case 3. Platform security and DevOps
Another area where observability comes in handy is in proactively managing security. Data outputs from across the platform can be monitored for abnormal activity, with events triggered to mitigate or block any effect from a security issue.
Similarly, a DevOps environment can monitor for abnormal activity and prevent workloads from being provisioned if that action would create problems on the working platform. Even if a workload is on the main platform and begins to misbehave, observability can be used to set off actions that throttle or bring the workload offline, replacing it with a known working version if necessary.
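The throttle-or-replace behavior described above could be expressed as a simple decision rule, sketched here in Python. The error-rate thresholds, the workload dictionary and the version strings are assumptions for the example; a real platform would apply the action through its own orchestration tooling.

```python
def choose_action(error_rate, throttle_at=0.05, rollback_at=0.20):
    """Map an observed error rate to a remediation action (thresholds assumed)."""
    if error_rate >= rollback_at:
        return "rollback"
    if error_rate >= throttle_at:
        return "throttle"
    return "none"


def remediate(workload, error_rate, known_good_version):
    """Return the workload's new desired state without mutating the input."""
    action = choose_action(error_rate)
    if action == "rollback":
        # Replace the misbehaving release with the known working version.
        return {**workload, "version": known_good_version, "throttled": False}
    if action == "throttle":
        return {**workload, "throttled": True}
    return dict(workload)


if __name__ == "__main__":
    current = {"name": "checkout", "version": "2.4.0", "throttled": False}
    print(remediate(current, 0.08, "2.3.1"))
    print(remediate(current, 0.35, "2.3.1"))
```

Keeping the decision separate from the side effects makes the rule easy to test before it is wired to live systems.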
Even on the upstream side of DevOps, developers will find observability useful: The capability to deal with outputs across different microservices and virtual containers helps ensure such environments are ready for production when pushed down the DevOps line.
Use case 4. Longer-term trending
One useful aspect of observability is tracking the performance of an app or platform over time. Changes can be picked up, and trends outside of the target can be identified, triggering remediation or requests for human intervention.
Apps or services that suffer from memory leaks, for example, can cause issues even if the leak is slow; tracking memory use over time exposes the trend before it becomes a failure. Likewise, apps that are being used by a growing number of people can be identified and resources adjusted to better meet their needs.
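To show how a slow memory leak might be picked up from trend data, the sketch below fits a least-squares line to (hour, memory in MB) samples and estimates how long remains before a limit is reached. The sample format, the units and the linear-growth assumption are all illustrative; real leaks are rarely this tidy.

```python
def slope(samples):
    """Least-squares slope of (time, value) pairs, in value units per time unit."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("samples must span more than one point in time")
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denom


def hours_until_limit(samples, limit_mb):
    """Estimate hours until memory reaches limit_mb; None if it is not growing."""
    rate = slope(samples)
    if rate <= 0:
        return None
    latest_mb = samples[-1][1]
    return max(0.0, (limit_mb - latest_mb) / rate)


if __name__ == "__main__":
    readings = [(0, 500), (1, 502), (2, 504), (3, 506)]  # (hour, MB)
    print(slope(readings))
    print(hours_until_limit(readings, 1006))
```

An estimate like this could feed an event trigger, raising a request for human intervention well before the limit is hit.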
For now, observability isn't available as a single off-the-shelf product. But ensuring that monitoring, data aggregation and analysis capabilities are in place to support an observability approach, and then integrating event triggers into help desk systems, automated systems management, and cloud-based resource management and workload provisioning engines, should provide organizations with much of what they require for the future.