Observability is the capability to deduce what's happening across an IT platform by monitoring and analyzing outputs from that platform. This is important for areas such as workload performance monitoring and platform security.
The use of observability means there's no need for a highly granular knowledge of the underlying physical platform, which is useful with today's hybrid private and public systems. But there are several areas that should be covered to ensure you can trust what the outputs tell you.
1. Know your platform.
This seems to go against the idea of observability not needing a granular knowledge of the physical platform, but without that knowledge, it's difficult to identify all possible sources for data feeds. As such, a discovery engine is required to carry out an audit of the platform. Many of these feeds will be related to virtual environments, so you shouldn't need to identify the specific physical hardware they're attached to. A good discovery engine will keep everything updated as resources are added to or removed from the platform.
2. Turn on data logging where it's not already enabled.
Use the Simple Network Management Protocol or other means of creating standardized data logging wherever possible. Where proprietary data formats are used, ensure they can be accessed. Use connectors that can translate the data into a standardized form; many of the data aggregation tools mentioned below will have this capability either out of the box or as add-ons.
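As a rough illustration of what such a connector does, the sketch below translates one line of a made-up proprietary format into a common record. The `epoch|LEVEL|message` layout, the field names and the `normalize_vendor_line` function are all assumptions for the example, not any real product's format or API.

```python
import json
from datetime import datetime, timezone


def normalize_vendor_line(line: str, source: str) -> dict:
    """Translate a hypothetical 'epoch|LEVEL|message' line into a common record."""
    # Split on the first two pipes only, so the message may itself contain '|'.
    epoch, level, message = line.strip().split("|", 2)
    return {
        "timestamp": datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat(),
        "source": source,
        "severity": level.lower(),
        "message": message,
    }


# Hypothetical device name and log line, for illustration only.
record = normalize_vendor_line("1700000000|WARN|fan speed low", "rack-7-psu")
print(json.dumps(record))
```

Once every feed is reduced to the same handful of fields, downstream filtering and aggregation no longer need to know which device or vendor produced the data.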
3. Filter data as close to the point of creation as possible.
Much of the data created by an IT platform won't be of any use -- it essentially says everything is all right. An observability system should be designed to filter data at multiple levels to ensure bandwidth isn't swamped by excessive chatter and data analysis can be carried out quickly and effectively in real time. But be careful: Data that seems unimportant to the operations team could prove very important when aggregated with data from other sources.
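One way to balance those two concerns is to forward only non-routine events while still sending a compact count of the routine ones. The following is a minimal sketch of that idea; the severity names, the event fields and the `filter_at_source` function are assumptions made for the example.

```python
from collections import Counter

# Assumption: these severities represent "everything is all right" chatter.
ROUTINE_SEVERITIES = {"debug", "info"}


def filter_at_source(events):
    """Forward non-routine events; reduce routine ones to a count per source."""
    forwarded = []
    routine_counts = Counter()
    for event in events:
        if event["severity"] in ROUTINE_SEVERITIES:
            routine_counts[event["source"]] += 1
        else:
            forwarded.append(event)
    # The summary keeps a trace of the routine traffic without shipping every line.
    summary = {"type": "routine_summary", "counts": dict(routine_counts)}
    return forwarded, summary


events = [
    {"source": "web-1", "severity": "info", "message": "health check ok"},
    {"source": "web-1", "severity": "info", "message": "health check ok"},
    {"source": "db-1", "severity": "error", "message": "replication lag"},
]
forwarded, summary = filter_at_source(events)
print(len(forwarded), summary["counts"])
```

Because the summary still travels upstream, a sudden drop or spike in routine messages from one source remains visible to the central analysis even though the individual lines were filtered out.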
4. Ensure data can be aggregated and centralized.
Observability requires a means of analyzing data to recognize patterns and abnormalities so the platform can report what it sees. Systems such as Splunk, Datadog and Mezmo (previously LogDNA) have shown how data can be centralized and used to provide observability insights.
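At its simplest, centralizing data means combining several feeds into one time-ordered stream that analysis can run against. The sketch below shows that step in isolation, assuming each feed is already sorted and uses a shared numeric `timestamp` field; it is not how any of the products named above work internally.

```python
import heapq


def merge_feeds(*feeds):
    """Merge already time-sorted feeds into one stream ordered by timestamp."""
    # heapq.merge walks the inputs lazily, so no feed is re-sorted in full.
    return list(heapq.merge(*feeds, key=lambda event: event["timestamp"]))


# Hypothetical feeds from two sources, each sorted by timestamp.
network_feed = [
    {"timestamp": 100, "source": "switch-3", "message": "link flap"},
    {"timestamp": 130, "source": "switch-3", "message": "link restored"},
]
app_feed = [
    {"timestamp": 105, "source": "orders-api", "message": "timeouts rising"},
]

timeline = merge_feeds(network_feed, app_feed)
print([event["source"] for event in timeline])
```

Seen side by side on one timeline, the application timeouts fall between the link flap and its recovery, a relationship that neither feed shows on its own.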
5. Data analysis tools should fit the purpose.
Analysis tools that don't pick up on key areas, such as early-stage problems or zero-day attacks on the platform, won't provide the peace of mind an effective observability system offers. Most observability approaches are coalescing around systems such as security information and event management products from the likes of LogRhythm, FireEye or Sumo Logic.
These products were built to meet organizations' need to secure their platforms against internal and external threats. Their vendors are rapidly recognizing that the same pattern recognition and advanced heuristics systems can serve as observability offerings, identifying other issues such as early-stage problems at a virtual or physical level across an IT platform.
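To give a sense of what spotting an early-stage problem can mean in practice, the sketch below flags readings that deviate sharply from a rolling baseline. It is a deliberately simple z-score check, far cruder than the heuristics in commercial products, and the window size, threshold and sample values are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value is far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # A perfectly flat baseline (sigma == 0) is skipped in this sketch.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latencies = [10, 11, 10, 12, 11, 10, 50, 11]
print(find_anomalies(latencies))
```

A real analysis tool would also need to account for seasonality, slow drift and correlated signals across sources, which is where a purpose-built product earns its place.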
6. Report in the right manner.
Observability shouldn't be seen as a tool only for sys admins or DevOps practitioners, but as a means of bridging the chasm between IT and the business by reporting what it sees and advising on what needs to be done. Reporting should inform IT professionals in real time as to what problems are present and provide trend analysis and business impact reporting that can be understood by line-of-business personnel.
7. Integrate with automated remediation systems wherever possible.
Many issues identified by an observability offering will be relatively low-level. Most sys admins will already have tooling in place to automatically fix issues such as systems requiring patching or updating, or where extra resources must be applied to a workload. By integrating an observability system into these tools, IT can more easily maintain an optimized environment. Automated remediation also acts as a filter: With low-level issues handled automatically, IT can focus on the more important problems that can't be automated and fix them more quickly.
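The integration can be pictured as a routing table: known issue types go to an automated fix, and everything else is escalated to a person. In the sketch below, the issue kinds, the handler functions and the `handle_issue` dispatcher are all hypothetical, and the handlers only return a description rather than calling real patching or scaling tools.

```python
def restart_service(issue):
    # Placeholder: a real handler would call the existing automation tooling.
    return f"restarted {issue['target']}"


def add_capacity(issue):
    # Placeholder: a real handler would request extra resources for the workload.
    return f"scaled up {issue['target']}"


# Assumption: the observability system tags each issue with a 'kind'.
REMEDIATIONS = {
    "service_down": restart_service,
    "resource_pressure": add_capacity,
}


def handle_issue(issue, escalations):
    """Run the automated fix if one exists; otherwise queue the issue for staff."""
    handler = REMEDIATIONS.get(issue["kind"])
    if handler is None:
        escalations.append(issue)
        return "escalated"
    return handler(issue)


escalations = []
print(handle_issue({"kind": "service_down", "target": "billing-api"}, escalations))
print(handle_issue({"kind": "data_corruption", "target": "orders-db"}, escalations))
```

Only the second issue ends up in `escalations`, which is the filtering effect described above: staff see the problems that automation cannot resolve.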
8. Feedback loops should be present and effective.
Repeated security issue identification or resource problems might be caused by coding issues or implementation that can't be fixed through automated means. Tying observability systems into help desk and trouble ticketing offerings ensures these issues are picked up and assigned to the right IT staff.
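A feedback loop of this kind might count how often the same issue recurs and raise a ticket once it passes a threshold. The following sketch keeps tickets in a plain list instead of calling a real help desk system; the `RecurrenceTracker` class, the threshold of three and the team name are assumptions for illustration.

```python
from collections import Counter


class RecurrenceTracker:
    """Open one ticket when the same issue has recurred 'threshold' times."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = Counter()
        self.tickets = []  # stand-in for a real ticketing integration

    def record(self, issue_key, owner):
        self.counts[issue_key] += 1
        # Using == rather than >= avoids opening a duplicate ticket on later repeats.
        if self.counts[issue_key] == self.threshold:
            self.tickets.append(
                {"issue": issue_key, "assignee": owner, "occurrences": self.threshold}
            )


tracker = RecurrenceTracker(threshold=3)
for _ in range(4):
    tracker.record("orders-api:memory-leak", "app-dev-team")
print(tracker.tickets)
```

Routing the ticket to the owning development team, rather than to operations, is what closes the loop when the root cause lies in the code.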
Observability is becoming a necessity as organizations move to a more decentralized IT platform. Without the capability to aggregate and analyze data coming from all areas of an IT platform, organizations open themselves up to problems ranging from inadequate application performance through a poor user experience to major security issues. In the long term, observability will differentiate how well organizations perform in a highly dynamic and complex world.