Cisco and Splunk are teaching AI to anticipate system failures
Cisco and Splunk use machine data to train a new time-series foundation model that surfaces hidden issues and creates a durable competitive edge.
"Your data is your moat," says Jeetu Patel, President and Chief Product Officer at Cisco.
This idea anchors Splunk's core mission: making all data -- machine-generated, human-generated, transactional or security-centric -- available for secure, compliant and cost-effective use across an organization.
By doing this, organizations finally benefit from the vast amounts of data collected daily. Buried in this data is their unique institutional knowledge -- the operational context, cause-and-effect patterns and business insights that help them solve problems faster, make better decisions and build AI systems competitors can't replicate.
Cisco sees machine data as a major untapped opportunity. Most AI models are trained primarily on human data, and the supply of high-quality human data is leveling off. At the same time, organizations continue to generate massive streams of proprietary machine data that are largely unused but have the potential to give their AI a competitive advantage. This is why Splunk is focused on "teaching AI to speak machine data."
Unlike human-generated data, machine data is not widely available online. It is specific to each organization and largely untapped for AI training and insights. For example, combining historical ticketing data, engineering reports, technical documentation and operational telemetry can train AI agents that answer customer questions instantly and accurately. A competitor could replicate the training process but wouldn't have access to proprietary data. They could never reproduce the same depth of institutional knowledge, operational context and business history in their agents. All major Splunk .conf25 announcements focused on helping organizations access, process and act on this data.
Data-driven decision-making is hard

Even though data-driven decision-making has been a goal for years, most organizations still struggle to get there. Silos, inconsistent data quality and the high cost of processing and storage continue to keep institutional knowledge locked in fragmented data repositories.
Cisco Data Fabric: Federated data access across the enterprise
Cisco Data Fabric provides the foundational layer that makes the goal of data-driven decision-making possible. It unifies access to data across hybrid and multi-cloud environments and applies consistent governance, security and policy controls to both human- and machine-generated data.
It integrates with Cisco's networking and compute stack to ingest and correlate real-time infrastructure telemetry alongside traditional data. Its distributed "ludicrous scale" architecture supports petabyte-to-exabyte volumes without centralizing everything, while built-in policy enforcement and lineage tracking maintain compliance across domains.
Data Fabric also supports edge-based preprocessing to cut cost and latency and connects third-party and cloud-native sources through open APIs to avoid lock-in. To jumpstart adoption, Cisco is seeding the fabric with curated machine data, such as making firewall data available for free, to improve security analytics and speed up AI model training. It also ties tightly to Splunk Machine Data Lake (MDL), the Time Series Foundation Model (TSFM) and Cisco AI Canvas so AI models and observability workflows receive high-quality, contextualized data from a single, trusted source of truth.
Edge-based processing
To keep "ludicrous scale" affordable, Cisco moves parts of the data pipeline to the edge. Even though the Cisco Data Fabric is federated and doesn't require all data to be centralized, preprocessing telemetry data close to its source allows organizations to filter out noise, aggregate signals and compress payloads before they enter the fabric's data streams. This reduces cost and latency while improving the data quality that reaches downstream analytics and AI models.
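The three edge-side steps -- filter, aggregate, compress -- can be sketched in a few lines. This is an illustrative pipeline, not Cisco's implementation; the function name, noise floor and window size are assumptions for the example.

```python
import gzip
import json
import statistics

def preprocess_telemetry(samples, noise_floor=0.01, window=5):
    """Edge-side preprocessing sketch: filter noise, aggregate, compress.

    `samples` is a list of metric readings; the names and thresholds
    here are illustrative, not part of any Cisco API.
    """
    # 1. Filter: drop readings below a configurable noise floor.
    signal = [s for s in samples if abs(s) >= noise_floor]

    # 2. Aggregate: collapse each window of readings into mean/max summaries.
    aggregated = [
        {"mean": statistics.fmean(chunk), "max": max(chunk)}
        for chunk in (signal[i:i + window] for i in range(0, len(signal), window))
    ]

    # 3. Compress: gzip the JSON payload before it enters the fabric's streams.
    payload = json.dumps(aggregated).encode("utf-8")
    return gzip.compress(payload)
```

Even this toy version shows the economics: a window of five raw readings shrinks to two summary fields before compression, so far less data crosses the network into downstream analytics.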
Splunk Machine Data Lake

The Splunk Machine Data Lake (MDL) is the persistent data layer atop Cisco Data Fabric. While the fabric federates and governs access to data wherever it resides, MDL is where high-value machine data is stored, curated and optimized for analysis. It ingests and retains massive volumes of historical and real-time telemetry from Cisco network and security devices, cloud workloads and third-party sources.
Designed for high-throughput, low-latency access to time-series data, MDL powers model training and operational analytics at scale and underpins the upcoming TSFM.
Splunk AI Toolkit

Building on this foundation, the Splunk AI Toolkit lets teams apply generative AI (GenAI) directly to the data stored in the MDL so teams can build custom agents for incident summaries, log classification, anomaly triage and natural language queries. It provides a controlled environment for combining proprietary telemetry with LLMs, creating domain-specific copilots that can later run within Cisco AI Canvas.
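One of the listed use cases, log classification, can be illustrated with a deliberately simple stand-in. The toolkit would route lines through an LLM; this keyword-rule version is only a hypothetical sketch of the input/output shape such an agent works with.

```python
def classify_log_line(line):
    """Toy log classifier sketch. An actual Splunk AI Toolkit agent would
    call an LLM; these keyword rules only illustrate the task's shape."""
    lowered = line.lower()
    if "error" in lowered or "exception" in lowered:
        return "incident"
    if "denied" in lowered or "unauthorized" in lowered:
        return "security"
    return "routine"
```

The value of the real toolkit is that the classification logic is learned from the organization's own telemetry and tickets rather than hand-coded like this.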
ChatGPT for machine data

While the Splunk AI Toolkit supports custom copilots, Cisco's upcoming TSFM adds a pretrained engine purpose-built for large-scale machine data: a model designed to detect patterns, predict issues and generate insights from massive time-series streams. Cisco announced that TSFM will be released as open source on Hugging Face in November 2025 and aims to establish it as a de facto industry standard. Organizations can then fine-tune TSFM locally using their own data inside the MDL.
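The fine-tune-locally workflow follows the usual foundation-model shape: download pretrained weights, forecast zero-shot, then adapt on proprietary telemetry. Since the TSFM interface is not yet published, the stub below is purely illustrative -- a naive drift model standing in for real pretrained weights.

```python
class TimeSeriesModelStub:
    """Hypothetical stand-in for a pretrained time-series foundation model.

    The real TSFM interface is not yet published; this stub only shows the
    forecast-then-fine-tune workflow shape using a naive drift model.
    """

    def __init__(self, drift=0.0):
        self.drift = drift  # the "weights": one learned per-step increment

    def forecast(self, context, horizon):
        # Zero-shot: project the last observation forward using learned drift.
        last = context[-1]
        return [last + self.drift * (i + 1) for i in range(horizon)]

    def fine_tune(self, series):
        # Local fine-tuning sketch: fit the drift to the org's own telemetry,
        # which never has to leave the MDL.
        steps = [b - a for a, b in zip(series, series[1:])]
        self.drift = sum(steps) / len(steps)
```

The key property the sketch preserves is that adaptation happens on-premises: the proprietary series updates local parameters rather than being shipped to a vendor.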
The value of a TSFM

TSFM can correlate faint anomalies across layers to reveal root causes long before they become visible outages. For example, it might connect a slight temperature rise in a switch with small drops in fan speed and power stability, minor upticks in network retransmissions and database timeouts, and subtle slowdowns in application performance and third-party APIs. By linking these weak signals across infrastructure, network and application layers, TSFM surfaces hidden systemic failure patterns that no single dashboard would reveal on its own.
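The weak-signal idea can be made concrete with a small sketch: score each layer's latest reading against its own history, then combine the scores. Individually, every signal stays below its per-layer alert threshold; together they cross a systemic one. This is an illustrative z-score heuristic, not how TSFM actually works.

```python
import statistics

def combined_anomaly_score(layer_metrics):
    """Illustrative cross-layer weak-signal detector (not the TSFM itself).

    `layer_metrics` maps a layer name to its recent readings. Each layer's
    latest value is z-scored against its own history; signals too weak to
    alert on their own can still sum to a systemic anomaly score.
    """
    scores = {}
    for layer, readings in layer_metrics.items():
        history, latest = readings[:-1], readings[-1]
        mu = statistics.fmean(history)
        sigma = statistics.stdev(history) or 1.0  # guard flat histories
        scores[layer] = abs(latest - mu) / sigma
    return scores, sum(scores.values())
```

Three layers each drifting by two "units" would alert nowhere on a per-dashboard basis, yet produce a combined score of six -- the kind of systemic pattern the article describes.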
Unlike competitors such as Datadog and Dynatrace, Cisco can feed TSFM with deep hardware and network-layer sensor data from its own switches, routers and servers, giving it visibility into early failure signals that competing platforms don't have access to.
Cisco AI Canvas

AI Canvas is a collaborative layer atop MDL and the TSFM. Users can ask natural-language questions about telemetry, and the system uses AI agents to retrieve relevant data, run analyses, and auto-generate visual components -- charts, anomaly panels and service maps -- on the fly.
Dynamic widget generation
When a user or AI agent formulates a hypothesis, such as "Are we seeing memory pressure on East Coast nodes correlated with error spikes?", the Canvas will perform three steps:
- Query MDL for relevant telemetry such as metrics, logs, traces and events.
- Run inference using TSFM or other models to detect patterns.
- Render the results as an interactive widget -- time-series graphs, node maps or funnel charts -- in the shared canvas.
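The three steps above can be sketched as a single pipeline that ends in a declarative widget description for the canvas to render. The function names and widget schema are assumptions for illustration, not the Cisco AI Canvas API.

```python
def answer_hypothesis(question, query_fn, infer_fn):
    """Sketch of the three Canvas steps; names and the widget schema are
    illustrative, not the actual Cisco AI Canvas API."""
    telemetry = query_fn(question)   # 1. Query MDL for relevant telemetry
    findings = infer_fn(telemetry)   # 2. Run inference (TSFM or other models)
    return {                         # 3. Describe a renderable widget
        "type": "time_series",
        "title": question,
        "series": telemetry,
        "annotations": findings,
    }
```

Because the output is a data structure rather than a prebuilt dashboard, any hypothesis phrased as a question can yield a fresh, shareable widget on the canvas.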
Multi-user, multi-agent collaboration

Canvas supports real-time co-authoring by both humans and AI agents. One agent might summarize recent incidents while another generates a performance regression chart, and a human SRE can tie them together with annotations or automated remediation steps.
Instead of forcing teams to prebuild dashboards for every scenario, Canvas lets them generate new visualizations on demand as questions arise. This turns siloed data into a living investigation surface. The approach is especially powerful for complex hybrid-cloud environments, where static dashboards rarely keep up with change.
Conclusion
Focusing on data makes strategic sense for Splunk because complete, contextualized data is the foundation for identifying problems before they affect the organization. By introducing vast amounts of machine data from Cisco's network and security devices and using it to train a dedicated time-series foundation model, Splunk gives GenAI agents the ability to detect patterns directly from raw telemetry -- patterns traditional LLMs, trained only on human-generated data, would miss entirely. Conventional LLMs may see indirect descriptions of incidents, but a model trained on machine data can analyze system dynamics directly to uncover new failure modes. Allowing customers to use time-series data from Cisco's proprietary network equipment sets Splunk apart and capitalizes on Cisco's dominant market share in networking equipment.
Cisco also makes large-scale processing economically viable through edge processing on its networking hardware and pricing changes, such as free firewall log ingestion.
The emergence of an agent framework that dynamically generates user-centric dashboards marks the culmination of Cisco's data-centric strategy. It signals a new frontier in how observability platforms differentiate themselves: not just visualizing what is happening, but continuously discovering and explaining why.
Torsten Volk is principal analyst at Enterprise Strategy Group, now part of Omdia, covering application modernization, cloud-native applications, DevOps, hybrid cloud and observability.
Enterprise Strategy Group is part of Omdia. Its analysts have business relationships with technology providers.