Dynatrace bets on causal intelligence for AI observability
At Perform 2026, Dynatrace unveils an agentic operations system that anchors probabilistic AI to deterministic truth. Learn how causal intelligence gives agents a reality check.
At Perform 2026, Dynatrace laid out its vision for transforming observability and monitoring. It aims to move from reactive, dashboard-driven processes to a paradigm of autonomous operations, where systems proactively work to prevent or solve problems.
This shift is a response to the escalating operational complexity organizations face when adopting agentic AI and cloud-native application architectures in hybrid and multi-cloud settings. Reactively addressing those challenges is an uphill battle that drains IT ops resources and slows down the software development lifecycle.
"In today's world, where enterprises compete based on their ability to rapidly and continuously deliver new software capabilities to the market -- while making investors happy by showing healthy margins -- these enterprises need to shift from managing incidents to managing intelligence," says Steve Tack, CPO at Dynatrace.
The official session schedule of Dynatrace Perform 2026 emphasizes the company's focus on business observability. Business observability aims to unify customers' technical telemetry data with business KPIs to transform operational insights into boardroom-ready outcomes.
Dynatrace Intelligence: Business observability for autonomous operations
Dynatrace Intelligence -- based on Dynatrace's proprietary reasoning engine, Davis AI -- replaces the traditional dashboard-driven, break-fix approach to observability with autonomous operations. Dynatrace positions this as the first agentic OS. The platform combines the company's Causal AI feature with its Smartscape topology graph to interpret dependencies among all components in the stack, using agentic AI that can reason, decide and act within those deterministic boundaries. This fusion of deterministic and probabilistic principles enables AI agents to propose actions that are not simply based on statistical correlations -- essentially probabilistic guesses -- but constrained by what the platform definitively knows about upstream and downstream impact.
Consider an e-commerce platform running on Amazon Elastic Kubernetes Service (EKS), where a sudden spike in cart abandonment triggers an AI agent to investigate. Without causal topology, the agent might correlate the timing with a recent Kubernetes pod scaling event and recommend rolling back the deployment.
With causal topology, Smartscape might reveal that the actual culprit is Amazon RDS connection pool exhaustion caused by a downstream inventory service that the scaled pods now overwhelm with queries. Alternatively, Smartscape might reveal that the traffic spike originated from a marketing campaign that went live 30 minutes earlier, directing thousands of users to a landing page that triggers the Compare Products workflow. This is a perfectly legitimate business event that the AI agents misinterpreted as a threat because they lacked visibility into the business context.
This is where Dynatrace Intelligence Agents come in. An SRE agent, for example, would have immediately correlated the backend spike with the frontend click event through Smartscape's end-to-end trace. The SRE agent would recognize the Compare Products workflow as a legitimate user action triggered by a marketing campaign and would inform the security agent that the traffic pattern matches expected behavior for this feature. This would prevent the unnecessary cluster scaling and traffic throttling that would have degraded customer experience.
Dynatrace's causal AI (Davis) is a reasoning engine that traverses the topology map (Smartscape) to establish causality, ultimately finding the root cause based on interdependencies and event timelines in a deterministic manner. This determinism is key to providing humans with clear instructions and, ultimately, to applying automated remediation and resolution.
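The idea of walking a topology graph to a root cause can be illustrated with a minimal sketch. This is not Dynatrace's algorithm or API. The `Component` class, the `find_root_causes` function and the service names are all hypothetical, and the rule used here (follow dependency edges to unhealthy components whose anomaly began no later than the caller's) is a simplifying assumption chosen to show why the result is deterministic: the same topology and timeline always yield the same answer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """One node in a toy topology graph (hypothetical, for illustration only)."""
    name: str
    depends_on: list[str] = field(default_factory=list)
    anomaly_started: Optional[float] = None  # seconds on a shared clock; None = healthy


def find_root_causes(topology: dict[str, Component], symptom: str) -> list[str]:
    """Walk dependency edges from the symptomatic component and return the
    unhealthy components that have no unhealthy dependencies of their own."""
    if topology[symptom].anomaly_started is None:
        raise ValueError(f"{symptom} is healthy; nothing to explain")

    roots: list[str] = []
    seen: set[str] = set()
    stack = [symptom]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        node = topology[name]
        # Only follow dependencies that are unhealthy and whose anomaly
        # did not start after this node's anomaly (a cause precedes its effect).
        unhealthy_deps = [
            dep for dep in node.depends_on
            if topology[dep].anomaly_started is not None
            and topology[dep].anomaly_started <= node.anomaly_started
        ]
        if unhealthy_deps:
            stack.extend(unhealthy_deps)
        else:
            roots.append(name)
    return sorted(roots)


# Hypothetical topology: checkout calls inventory and payments; inventory uses a DB pool.
topology = {
    "checkout": Component("checkout", ["inventory", "payments"], anomaly_started=120.0),
    "inventory": Component("inventory", ["db-pool"], anomaly_started=100.0),
    "payments": Component("payments"),
    "db-pool": Component("db-pool", anomaly_started=90.0),
}

print(find_root_causes(topology, "checkout"))
```

In this toy example the walk ignores the healthy payments service and ends at the database pool, the only unhealthy component with nothing unhealthy beneath it. A production engine would need far richer signals than a single timestamp per component.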
Dynatrace believes its proprietary combination of causal AI and real-time topology graphs delivers the deterministic foundation required for automated action. The company's confidence in autonomous operations stems from the conviction that Smartscape's end-to-end mapping of the enterprise application and AI stack provides a deeper contextual understanding than competitors, such as Datadog, New Relic and Splunk. Dynatrace believes that Davis AI, built atop this topology, can reason within those boundaries rather than guess from correlations.
While this approach offers significant transformational potential, we first need to evaluate customer implementations to determine how close they can come to autonomous operations.
AI Observability: Grail and the MCP Server
Observability is critical for developers to understand the impact of new code on the overall application. Dynatrace MCP Server enables developers to check system health and performance through natural language queries in their IDE. Instead of context-switching to a dashboard, a developer can get an instant answer grounded in production data by asking, "What's the p99 latency for the checkout service since my last commit?"
Davis CoPilot translates these plain English questions into Dynatrace Query Language behind the scenes, aiming to relieve developers of having to learn yet another query syntax to get the insights they need. A developer debugging a slow API endpoint can instantly see the slow queries, the services that called them and the code commits that introduced the regression by saying, "Show me the database queries triggered by the checkout endpoint that exceeded 100ms this week."
The Grail data lakehouse provides the foundation, offering federated access across frontend, backend, AI telemetry, database, cloud and mobile, providing the decision context needed by autonomous agents. When a Dynatrace Intelligence Agent investigates a production incident, Grail offers complete context. It identifies when error rates spiked, correlates the spike with a specific deployment and notes who is affected, such as users on the mobile app. The investigation could trace the problem back to a third-party payment API that altered its response schema. This enables the agent to recommend a targeted fix rather than a blanket rollback.
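The three questions in that investigation (when did errors spike, which deployment preceded the spike, and who is affected) can be sketched over plain in-memory data. This does not use Grail or any Dynatrace API. The function names, the threshold and the sample records are hypothetical, and picking the most recent deployment before the spike yields only a candidate, since timing alone is a correlation rather than a proven cause.

```python
from __future__ import annotations

from collections import Counter


def first_spike(error_rates: list[tuple[int, float]], threshold: float) -> int | None:
    """Return the timestamp of the first sample whose error rate exceeds the threshold."""
    for ts, rate in error_rates:
        if rate > threshold:
            return ts
    return None


def latest_deployment_before(deployments: list[tuple[int, str]], ts: int) -> str | None:
    """Return the name of the most recent deployment at or before ts (a candidate only)."""
    earlier = [(t, name) for t, name in deployments if t <= ts]
    return max(earlier)[1] if earlier else None


def most_affected_segment(errors: list[tuple[int, str]], since: int) -> str | None:
    """Return the user segment with the most errors recorded at or after `since`."""
    counts = Counter(segment for t, segment in errors if t >= since)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical sample data: (timestamp in seconds, value).
error_rates = [(0, 0.010), (60, 0.012), (120, 0.090), (180, 0.110)]
deployments = [(30, "cart-v41"), (110, "checkout-v42"), (150, "search-v7")]
errors = [(50, "web"), (125, "mobile"), (130, "mobile"), (140, "web")]

spike = first_spike(error_rates, threshold=0.05)
print(spike, latest_deployment_before(deployments, spike), most_affected_segment(errors, spike))
```

The value of a unified store in this picture is that all three lookups run against data sharing one clock and one set of identifiers. When the same records live in separate tools, each step becomes a manual join.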
The MCP server race is heating up across the industry, with Dynatrace putting a clear strategic story behind MCP. Competitors, such as Grafana, New Relic and Datadog, have launched their own MCP integrations, enabling similar natural language queries and IDE workflows. Where Dynatrace aims to differentiate is in the depth of context that flows through the MCP connection. Rather than surfacing metrics and logs in isolation, Dynatrace feeds Smartscape's causal topology and Grail's unified data into every query response, giving AI assistants the dependency context needed to reason about impact rather than just retrieve data.
Digital experience monitoring: Next-generation RUM
The combination of Grail, Smartscape and Dynatrace Intelligence enables new Real User Monitoring (RUM) capabilities that unify frontend telemetry with backend context. When a user clicks a button in a single-page application, the click can trigger a cascade of AI model calls that are invisible to traditional tools, because the events those tools monitor -- URL changes and page reloads -- don't occur.
Smartscape, by contrast, traces the causal chain from that frontend interaction through every AI model, microservice and database query it interacts with. The new RUM includes purpose-built developer interfaces that prioritize grouped errors with end-to-end context, supporting advanced use cases such as analyzing single-page application rendering delays or AI-generated content performance.
Dynatrace connects LLM telemetry to the full causal topology so that AI agent behavior can be understood in the context of upstream user actions and downstream business impact.
Log management and application observability: Multi-cloud foundation
Dynatrace announced expanded cloud-native integrations across AWS, Azure and Google Cloud, positioning the platform as a control plane for AI in production. All telemetry flows into Grail, where Smartscape's automatic topology discovery maps dependencies across all three hyperscalers in real time. The unified Grail architecture is designed to solve the problems of fragmented data, disconnected telemetry and an incomplete understanding of how services relate. These problems prevent AI from making reliable recommendations or taking safe autonomous actions.
Datadog has long led in multi-cloud breadth with over 800 integrations, while Grafana Labs emphasizes OpenTelemetry and vendor neutrality as its multi-cloud strategy. Cisco is using its acquisition of Splunk to combine Splunk's log analytics with Cisco's network infrastructure visibility. Dynatrace's angle is that breadth of integrations matters less than depth of causal understanding. That argument resonates with enterprises struggling with tool sprawl, but it requires proof that Smartscape can map dependencies they cannot see elsewhere.
Imagine a healthcare company running its patient-facing chatbot on Azure OpenAI, its claims processing agents on Amazon Bedrock and its research summarization pipeline on Google Vertex AI. When a patient complains that the chatbot gave contradictory information about their coverage, the support team needs to trace that conversation across three clouds, multiple AI models and the backend systems that fed context to each model. In Dynatrace's telling, Smartscape's cross-cloud topology map turns an investigation that would take days into one that takes minutes, because it already knows which database fed which context to which model, and when that data was last updated.
The bottom line
Dynatrace is betting that the solution to agentic AI chaos lies in its organically grown architectural stack. Grail is the unified data lakehouse, Smartscape is the real-time causal topology engine and Dynatrace Intelligence is the agentic layer that fuses deterministic and probabilistic AI. The company positions this causal understanding as the moat that makes autonomous action safe. The MCP Server, Davis CoPilot, Dynatrace Assist and the domain-specific Intelligence Agents are all interfaces to this underlying architecture. Business observability leads because that is where the value is measured. AI observability follows because that is where the uncertainty lives, and the foundation of logs, applications and digital experience ties it all together through Grail and Smartscape.
Whether enterprises buy this vision will depend on how well Dynatrace can demonstrate that causal AI combined with agentic AI produces something greater than either alone. Still, the direction across the entire industry is clear: Observability is no longer about watching; it is about proactive control.
Torsten Volk is principal analyst at Omdia covering application modernization, cloud-native applications, DevOps, hybrid cloud and observability.
Omdia is a division of Informa TechTarget. Its analysts have business relationships with technology vendors.