Conquer 8 cloud observability challenges to maximize ROI

Cloud administrators and operations teams face all types of observability challenges. With the right practices in place, you can reduce downtime and increase your ROI.

Damon Garn, Cogspinner Coaction

Published: 09 Oct 2025

Complex infrastructure and deployments -- such as hybrid and multi-cloud environments -- require thorough and thoughtful observability strategies. Without understanding the internal state and behavior of a system, companies can expect various issues, including performance and reliability problems.

According to New Relic's "2025 Observability Forecast Report," high-impact outages carry a median cost of $2 million per hour, or approximately $33,333 for every minute systems remain down. However, observability can help reduce the cost of outages -- survey respondents with full-stack observability reported an average cost of only $1 million per hour during high-impact outages.
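The per-minute figure follows directly from the hourly cost. Here is a minimal Python sketch of that arithmetic. The function name `outage_cost` and the 45-minute outage duration are illustrative assumptions, not figures from the report.

```python
def outage_cost(hourly_cost: float, minutes_down: float) -> float:
    """Estimate outage cost by prorating an hourly cost per minute."""
    return hourly_cost / 60 * minutes_down

# Hourly figures cited above: overall median and full-stack observability.
MEDIAN_HOURLY = 2_000_000
FULL_STACK_HOURLY = 1_000_000

per_minute = outage_cost(MEDIAN_HOURLY, 1)  # roughly $33,333

# Hypothetical 45-minute outage, for illustration only.
baseline = outage_cost(MEDIAN_HOURLY, 45)
with_observability = outage_cost(FULL_STACK_HOURLY, 45)

print(f"Per minute: ${per_minute:,.0f}")
print(f"45-minute outage: ${baseline:,.0f} vs. ${with_observability:,.0f}")
```

Prorating by the minute is a simplification. Real outage costs rarely scale linearly with duration, so treat the result as a rough planning estimate.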

While there are benefits to implementing observability, it must be done properly -- planning is essential. Some of the challenges an organization might face include the following:

Delays in detecting and addressing incidents.
Difficulty correlating incidents and performance issues across applications.
Increased costs from gathering and storing too much monitoring data.
Lack of data standardization, which creates miscommunication and incompatible data among teams.

Discover the common observability challenges that most cloud administrators face today. Then, review best practices for mitigating them to improve observability within your cloud deployment.

1. Metrics and data overload

One problem administrators typically face with logging and monitoring is the sheer volume of information. When configuring monitoring tools, the temptation is to gather as much information as possible. However, this often leads to metrics overload and an immense amount of data. Admins must sift through and store that data, which can lead to alert fatigue and make important information hard to find. The cost of gathering and storing large amounts of data can also become astronomical.

Managing the cost of monitoring and observability is essential, but it's often difficult to quantify. A well-designed and standardized platform helps administrators control these costs while gathering only the necessary data to maintain systems effectively.
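As a rough illustration of gathering only the necessary data, the following Python sketch drops metrics that are not on an agreed list and averages high-frequency samples into coarser points before storage. The metric names, the allowlist and the window size are hypothetical. Real monitoring platforms typically offer their own filtering and aggregation settings.

```python
from statistics import mean

# Hypothetical allowlist: only the metrics the team has decided it needs.
KEEP = {"cpu_percent", "error_rate", "p95_latency_ms"}

def filter_metrics(samples):
    """Drop samples whose metric name is not on the allowlist."""
    return [s for s in samples if s["name"] in KEEP]

def downsample(values, window):
    """Average each consecutive group of `window` values into one point."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]

samples = [
    {"name": "cpu_percent", "value": 41.0},
    {"name": "debug_heap_fragments", "value": 9.0},  # noisy, not on the allowlist
    {"name": "error_rate", "value": 0.02},
]

kept = filter_metrics(samples)
cpu_coarse = downsample([40, 42, 44, 70, 80, 90], window=3)
print(len(kept), cpu_coarse)
```

Filtering reduces what is collected, while downsampling reduces what is stored. Both trade detail for cost, so the allowlist and window deserve periodic review.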

2. Weak performance monitoring

A challenge related to metrics overload is the performance impact of monitoring itself. Monitoring is a critical function, so administrators must allocate a specific amount of compute and storage resources to it.

Always remember that gathering metrics can affect the host systems. Cloud administrators must ensure they collect the right information -- not too much and not too little. This helps optimize systems by providing only accurate, relevant data to streamline processes and by relieving systems and resources of hefty, unnecessary workloads.

3. Mismanaged observability tools

Departments within an organization might work with separate, siloed observability tools that output different data types or formats. Administrators in each department might also use their own names for products, data results and applications. This disjointed data is a nightmare for organizations.

Standardization is critical in larger hybrid and multi-cloud deployments. Ensure teams use the same tools and output data using the same formats for compatibility. Additionally, establish naming conventions that ensure clear and accurate communication when identifying systems.
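One way to picture a shared naming convention is a small normalization step that maps each team's field names onto a single schema. The sketch below is a Python illustration only. The field aliases and sample records are hypothetical, and an established standard, such as OpenTelemetry's semantic conventions, might be a better basis than a homegrown mapping.

```python
# Hypothetical per-team field names mapped onto one shared schema.
FIELD_ALIASES = {
    "host": "hostname", "server": "hostname", "hostname": "hostname",
    "svc": "service", "app": "service", "service": "service",
    "lvl": "severity", "level": "severity", "severity": "severity",
}

def normalize(record):
    """Rename known fields to the shared names and tidy string values."""
    out = {}
    for key, value in record.items():
        name = FIELD_ALIASES.get(key.lower(), key.lower())
        out[name] = value.strip().lower() if isinstance(value, str) else value
    return out

# Two teams describing the same event in different ways.
team_a = {"Host": "WEB-01 ", "svc": "Checkout", "lvl": "ERROR"}
team_b = {"server": "web-01", "app": "checkout", "level": "error"}

print(normalize(team_a))
print(normalize(team_a) == normalize(team_b))
```

Once both records normalize to the same shape, they can be correlated across teams without manual translation.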

4. Lack of skilled staff

Many organizations face challenges in finding and keeping skilled staff. Those with skills and experience in monitoring and analyzing cloud services are even rarer. Correlating monitoring data with incident alerts and troubleshooting root causes is a complex process that typically requires extensive knowledge. Once your organization finds knowledgeable administrators, retaining them is crucial to achieving your observability goals.

Plenty of training opportunities exist. Whether training takes the form of formal on-the-job instruction, technical certification opportunities or self-paced learning, there are options that benefit all types of learners.

5. Inadequate context for troubleshooting

Administrators rely on data, in addition to experience, for root cause analysis. Not all monitoring platforms are created equal, and many can't identify the root cause of an incident. Monitoring results can enable basic incident management, but it's up to administrators to uncover the fundamental problem.

There are tools and services, including AI, that can provide administrators with troubleshooting support. However, organizations must not treat monitoring and troubleshooting interchangeably. By monitoring resources, IT teams can discover changes or anomalies within their systems. Troubleshooting enables teams to learn what the change was, where it occurred and why it happened.
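The distinction can be seen in a simple anomaly check. The Python sketch below flags values that sit far from the mean, which is the monitoring half of the job. It says nothing about why the value changed, which remains a troubleshooting task. The z-score method, the threshold and the latency samples are illustrative assumptions, not a description of how any particular monitoring product works.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return indexes of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical response times in milliseconds, with one obvious spike.
latency_ms = [120, 118, 125, 122, 119, 121, 480, 123]
print(find_anomalies(latency_ms))
```

The check reports where the spike occurred. Finding the cause still requires correlating that timestamp with deployments, logs and traces.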

6. Reactive approaches to problem solving

Reactive measures prevent IT operations teams from getting ahead of problems. An effective observability and monitoring infrastructure enables proactive incident prevention rather than reactive firefighting.

AI-based predictive analytics enable cloud administrators to get ahead of potential outages. However, an implementation like this one requires thoughtful design and standardization across the hybrid or multi-cloud environment.
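To make the proactive idea concrete, the following Python sketch estimates when a resource will reach its limit by fitting a straight line to recent samples. This is a plain least-squares trend, not AI, and it stands in here only as the simplest possible form of prediction. The function name, the sample data and the hourly sampling interval are hypothetical.

```python
def hours_until_full(usage_percent, limit=100.0):
    """Fit a least-squares line to hourly samples and extrapolate to `limit`.

    Returns None when there are too few samples or usage is flat or falling.
    """
    n = len(usage_percent)
    if n < 2:
        return None
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(usage_percent) / n
    slope_num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, usage_percent))
    slope_den = sum((x - x_mean) ** 2 for x in xs)
    slope = slope_num / slope_den
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    # Hours from the most recent sample until the fitted line reaches the limit.
    return (limit - intercept) / slope - (n - 1)

disk_usage = [70.0, 72.0, 74.0, 76.0, 78.0]  # hypothetical hourly disk usage, percent
print(hours_until_full(disk_usage))
```

An alert based on such an estimate fires before the disk fills rather than after, which is the difference between proactive prevention and reactive firefighting.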

7. Managing compliance, security and privacy concerns

Data sovereignty and similar regulatory compliance issues continue to be at the forefront of cloud deployments. Observability strategies must encompass and satisfy these requirements to ensure transparency, privacy and data security.

Maintaining the security of observability data is a continuing challenge when dealing with distributed compute and storage resources. In addition, monitoring teams might be off-site or could even be third-party contractors, adding complexity to enforcing compliance. To create a thorough compliance strategy, ensure that cloud administrators and IT teams understand their responsibilities and their role in ensuring the security of their organization's sensitive data and systems.

8. Limited visibility and multi-cloud challenges

Multi-cloud deployments pose unique challenges. Competing cloud service providers often employ incompatible observability tools. These tools, provided by vendors like AWS, Microsoft and Google, deliberately focus on their own products.

Third-party tools, such as open source options, can help address these concerns. According to New Relic's "2024 Observability Forecast Report," 51% of survey respondents were using an open source offering for one or more observability capabilities. Of those users, 38% were using Grafana, 23% were using Prometheus and 19% were using OpenTelemetry.

However, keep in mind that third-party options may include additional costs and might not provide full coverage for specific use cases. The complexity level also increases significantly when dealing with on-premises deployments, particularly with legacy systems.

Some of the risks associated with increased complexity include the following:

Blind spots.
Delayed detection and responses.
Performance bottlenecks.
Incompatible data and results.
Training challenges with multiple tools.

However, good observability is as critical to multi-cloud environments as it is to single-vendor deployments. Multi-cloud solutions are already very complex, so gathering information on them is essential to ensuring their services function as intended.

Damon Garn owns Cogspinner Coaction and provides freelance IT writing and editing services. He has written multiple CompTIA study guides, including the Linux+, Cloud Essentials+ and Server+ guides, and contributes extensively to Informa TechTarget, The New Stack and CompTIA Blogs.
