Cisco and Splunk are teaching AI to anticipate system failures
Cisco and Splunk use machine data to train a new time-series foundation model that surfaces hidden issues and creates a durable competitive edge.
"Your data is your moat," says Jeetu Patel, President and Chief Product Officer at Cisco.
This idea anchors Splunk's core mission: making all data -- machine-generated, human-generated, transactional or security-centric -- available for secure, compliant and cost-effective use across an organization.
By doing this, organizations finally benefit from the vast amounts of data collected daily. Buried in this data is their unique institutional knowledge -- the operational context, cause-and-effect patterns and business insights that help them solve problems faster, make better decisions and build AI systems competitors can't replicate.
Cisco sees machine data as a major untapped opportunity. Most AI models are trained primarily on human data, and the supply of high-quality human data is leveling off. At the same time, organizations continue to generate massive streams of proprietary machine data that are largely unused but have the potential to give their AI a competitive advantage. This is why Splunk is focused on "teaching AI to speak machine data."
Unlike human-generated data, machine data is not widely available online. It is specific to each organization and largely untapped for AI training and insights. For example, combining historical ticketing data, engineering reports, technical documentation and operational telemetry can train AI agents that answer customer questions instantly and accurately. A competitor could replicate the training process but wouldn't have access to proprietary data. They could never reproduce the same depth of institutional knowledge, operational context and business history in their agents. All major Splunk .conf25 announcements focused on helping organizations access, process and act on this data.
Data-driven decision-making is hard

Even though data-driven decision-making has been a goal for years, most organizations still struggle to get there. Silos, inconsistent data quality and the high cost of processing and storage continue to keep institutional knowledge locked in fragmented data repositories.
Cisco Data Fabric: Federated data access across the enterprise
Cisco Data Fabric provides the foundational layer that makes the goal of data-driven decision-making possible. It unifies access to data across hybrid and multi-cloud environments and applies consistent governance, security and policy controls to both human- and machine-generated data.
It integrates with Cisco's networking and compute stack to ingest and correlate real-time infrastructure telemetry alongside traditional data. Its distributed "ludicrous scale" architecture supports petabyte-to-exabyte volumes without centralizing everything, while built-in policy enforcement and lineage tracking maintain compliance across domains.
Data Fabric also supports edge-based preprocessing to cut cost and latency and connects third-party and cloud-native sources through open APIs to avoid lock-in. To jumpstart adoption, Cisco is seeding the fabric with curated machine data, such as making firewall data available for free, to improve security analytics and speed up AI model training. It also ties tightly to Splunk Machine Data Lake (MDL), the Time Series Foundation Model (TSFM) and Cisco AI Canvas so AI models and observability workflows receive high-quality, contextualized data from a single, trusted source of truth.
Edge-based processing
To keep "ludicrous scale" affordable, Cisco moves parts of the data pipeline to the edge. Even though the Cisco Data Fabric is federated and doesn't require all data to be centralized, preprocessing telemetry data close to its source allows organizations to filter out noise, aggregate signals and compress payloads before they enter the fabric's data streams. This reduces cost and latency while improving the data quality that reaches downstream analytics and AI models.
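The three edge-side steps -- filter, aggregate, compress -- can be sketched in a few lines. This is an illustrative pipeline, not Cisco's implementation; the function name, noise floor and window size are assumptions for the example.

```python
import gzip
import json
import statistics

def preprocess_telemetry(samples, noise_floor=0.01, window=5):
    """Edge-side preprocessing sketch: filter noise, aggregate, compress.

    `samples` is a list of metric readings; the names and thresholds
    here are illustrative, not part of any Cisco API.
    """
    # 1. Filter: drop readings below a configurable noise floor.
    signal = [s for s in samples if abs(s) >= noise_floor]

    # 2. Aggregate: collapse each window of readings into mean/max summaries.
    aggregated = [
        {"mean": statistics.fmean(chunk), "max": max(chunk)}
        for chunk in (signal[i:i + window] for i in range(0, len(signal), window))
    ]

    # 3. Compress: gzip the JSON payload before it enters the fabric's streams.
    payload = json.dumps(aggregated).encode("utf-8")
    return gzip.compress(payload)
```

Even this toy version shows the economics: a window of five raw readings shrinks to two summary fields before compression, so far less data crosses the network into downstream analytics.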
Splunk Machine Data Lake

The Splunk Machine Data Lake (MDL) is the persistent data layer atop Cisco Data Fabric. While the fabric federates and governs access to data wherever it resides, MDL is where high-value machine data is stored, curated and optimized for analysis. It ingests and retains massive volumes of historical and real-time telemetry from Cisco network and security devices, cloud workloads and third-party sources.
Designed for high-throughput, low-latency access to time-series data, MDL powers model training and operational analytics at scale and underpins the upcoming TSFM.
Splunk AI Toolkit

Building on this foundation, the Splunk AI Toolkit lets teams apply generative AI (GenAI) directly to the data stored in the MDL so teams can build custom agents for incident summaries, log classification, anomaly triage and natural language queries. It provides a controlled environment for combining proprietary telemetry with LLMs, creating domain-specific copilots that can later run within Cisco AI Canvas.
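One of the listed use cases, log classification, can be illustrated with a deliberately simple stand-in. The toolkit would route lines through an LLM; this keyword-rule version is only a hypothetical sketch of the input/output shape such an agent works with.

```python
def classify_log_line(line):
    """Toy log classifier sketch. An actual Splunk AI Toolkit agent would
    call an LLM; these keyword rules only illustrate the task's shape."""
    lowered = line.lower()
    if "error" in lowered or "exception" in lowered:
        return "incident"
    if "denied" in lowered or "unauthorized" in lowered:
        return "security"
    return "routine"
```

The value of the real toolkit is that the classification logic is learned from the organization's own telemetry and tickets rather than hand-coded like this.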
ChatGPT for machine data

While the Splunk AI Toolkit supports custom copilots, Cisco's upcoming TSFM adds a pretrained engine purpose-built for large-scale machine data: a model designed to detect patterns, predict issues and generate insights from massive time-series streams. Cisco announced that TSFM will be released as open source on Hugging Face in November 2025 and aims to establish it as a de facto industry standard. Organizations can then fine-tune TSFM locally using their own data inside the MDL.
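The fine-tune-locally workflow follows the usual foundation-model shape: download pretrained weights, forecast zero-shot, then adapt on proprietary telemetry. Since the TSFM interface is not yet published, the stub below is purely illustrative -- a naive drift model standing in for real pretrained weights.

```python
class TimeSeriesModelStub:
    """Hypothetical stand-in for a pretrained time-series foundation model.

    The real TSFM interface is not yet published; this stub only shows the
    forecast-then-fine-tune workflow shape using a naive drift model.
    """

    def __init__(self, drift=0.0):
        self.drift = drift  # the "weights": one learned per-step increment

    def forecast(self, context, horizon):
        # Zero-shot: project the last observation forward using learned drift.
        last = context[-1]
        return [last + self.drift * (i + 1) for i in range(horizon)]

    def fine_tune(self, series):
        # Local fine-tuning sketch: fit the drift to the org's own telemetry,
        # which never has to leave the MDL.
        steps = [b - a for a, b in zip(series, series[1:])]
        self.drift = sum(steps) / len(steps)
```

The key property the sketch preserves is that adaptation happens on-premises: the proprietary series updates local parameters rather than being shipped to a vendor.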
The value of a TSFM

TSFM can correlate faint anomalies across layers to reveal root causes long before they become visible outages. For example, it might connect a slight temperature rise in a switch with small drops in fan speed and power stability, minor upticks in network retransmissions and database timeouts, and subtle slowdowns in application performance and third-party APIs. By linking these weak signals across infrastructure, network and application layers, TSFM surfaces hidden systemic failure patterns that no single dashboard would reveal on its own.
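The weak-signal idea can be made concrete with a small sketch: score each layer's latest reading against its own history, then combine the scores. Individually, every signal stays below its per-layer alert threshold; together they cross a systemic one. This is an illustrative z-score heuristic, not how TSFM actually works.

```python
import statistics

def combined_anomaly_score(layer_metrics):
    """Illustrative cross-layer weak-signal detector (not the TSFM itself).

    `layer_metrics` maps a layer name to its recent readings. Each layer's
    latest value is z-scored against its own history; signals too weak to
    alert on their own can still sum to a systemic anomaly score.
    """
    scores = {}
    for layer, readings in layer_metrics.items():
        history, latest = readings[:-1], readings[-1]
        mu = statistics.fmean(history)
        sigma = statistics.stdev(history) or 1.0  # guard flat histories
        scores[layer] = abs(latest - mu) / sigma
    return scores, sum(scores.values())
```

Three layers each drifting by two "units" would alert nowhere on a per-dashboard basis, yet produce a combined score of six -- the kind of systemic pattern the article describes.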
Unlike competitors such as Datadog and Dynatrace, Cisco can feed TSFM with deep hardware and network-layer sensor data from its own switches, routers and servers, giving it visibility into early failure signals that competing platforms don't have access to.
Cisco AI Canvas

AI Canvas is a collaborative layer atop MDL and the TSFM. Users can ask natural-language questions about telemetry, and the system uses AI agents to retrieve relevant data, run analyses, and auto-generate visual components -- charts, anomaly panels and service maps -- on the fly.
Dynamic widget generation
When a user or AI agent formulates a hypothesis, such as "Are we seeing memory pressure on East Coast nodes correlated with error spikes?", the Canvas will perform three steps:
- Query MDL for relevant telemetry such as metrics, logs, traces and events.
- Run inference using TSFM or other models to detect patterns.
- Render the results as an interactive widget -- time-series graphs, node maps or funnel charts -- in the shared canvas.
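The three steps above can be sketched as a single pipeline that ends in a declarative widget description for the canvas to render. The function names and widget schema are assumptions for illustration, not the Cisco AI Canvas API.

```python
def answer_hypothesis(question, query_fn, infer_fn):
    """Sketch of the three Canvas steps; names and the widget schema are
    illustrative, not the actual Cisco AI Canvas API."""
    telemetry = query_fn(question)   # 1. Query MDL for relevant telemetry
    findings = infer_fn(telemetry)   # 2. Run inference (TSFM or other models)
    return {                         # 3. Describe a renderable widget
        "type": "time_series",
        "title": question,
        "series": telemetry,
        "annotations": findings,
    }
```

Because the output is a data structure rather than a prebuilt dashboard, any hypothesis phrased as a question can yield a fresh, shareable widget on the canvas.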
Multi-user, multi-agent collaboration

Canvas supports real-time co-authoring by both humans and AI agents. One agent might summarize recent incidents while another generates a performance regression chart, and a human SRE can tie them together with annotations or automated remediation steps.
Instead of forcing teams to prebuild dashboards for every scenario, Canvas lets them generate new visualizations on demand as questions arise. This turns siloed data into a living investigation surface. The approach is especially powerful for complex hybrid-cloud environments, where static dashboards rarely keep up with change.
Conclusion
Focusing on data makes strategic sense for Splunk because complete, contextualized data is the foundation for identifying problems before they affect the organization. By introducing vast amounts of machine data from Cisco's network and security devices and using it to train a dedicated time-series foundation model, Splunk gives GenAI agents the ability to detect patterns directly from raw telemetry -- patterns traditional LLMs, trained only on human-generated data, would miss entirely. Conventional LLMs may see indirect descriptions of incidents, but a model trained on machine data can analyze system dynamics directly to uncover new failure modes. Allowing customers to use time-series data from Cisco's proprietary network equipment sets Splunk apart and capitalizes on Cisco's dominant market share in networking equipment.
Cisco also makes large-scale processing economically viable through edge processing on its networking hardware and pricing changes, such as free firewall log ingestion.
The emergence of an agent framework that dynamically generates user-centric dashboards marks the culmination of Cisco's data-centric strategy. It signals a new frontier in how observability platforms differentiate themselves: not just visualizing what is happening, but continuously discovering and explaining why.
Torsten Volk is principal analyst at Enterprise Strategy Group, now part of Omdia, covering application modernization, cloud-native applications, DevOps, hybrid cloud and observability.
Enterprise Strategy Group is part of Omdia. Its analysts have business relationships with technology providers.