Date: 2023-01-19

Growth in the size of corporate data sets and the increasing impact of digital apps on business bottom lines have IT teams borrowing from the AI world to keep pace with observability.

Data pipelines are a component of DataOps, an organizational approach to data management that arose from the need to optimize data sets for big data analytics and so increase their business value. DataOps applies many of the Agile and DevOps principles familiar to software developers, such as breaking down silos between data sets, encouraging self-service and collaboration, and embracing IT automation for repetitive tasks.

Data pipeline tools already well established in this field include the open source project Kafka and Amazon's Kinesis. These frameworks automate and standardize the process of gathering, transforming and migrating data from its source into repositories optimized for AI.
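As a rough illustration of the gather, transform and migrate stages described above, here is a minimal Python sketch. It does not use Kafka or Kinesis; the function names (`gather`, `transform`, `load`, `run_pipeline`), the JSON log format and the field names are all assumptions made for the example, and an in-memory list stands in for the destination repository.

```python
import json
from typing import Iterable, Iterator


def gather(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Parse raw JSON log lines, skipping any that are malformed."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop lines that are not valid JSON
        if isinstance(record, dict):
            yield record


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize each record into one standard shape (hypothetical fields)."""
    for r in records:
        yield {
            "service": str(r.get("service", "unknown")).lower(),
            "level": str(r.get("level", "info")).upper(),
            "message": r.get("msg", ""),
        }


def load(records: Iterable[dict], repository: list) -> int:
    """Append records to the destination and return how many were written."""
    count = 0
    for r in records:
        repository.append(r)
        count += 1
    return count


def run_pipeline(raw_lines: Iterable[str], repository: list) -> int:
    """Chain the three stages; records stream through one at a time."""
    return load(transform(gather(raw_lines)), repository)


if __name__ == "__main__":
    raw = [
        '{"service": "Checkout", "level": "error", "msg": "timeout"}',
        "not json",
        '{"service": "cart", "msg": "ok"}',
    ]
    repo: list = []
    written = run_pipeline(raw, repo)
    print(written, repo)
```

Because each stage is a generator, records flow through without the whole data set being held in memory, which is loosely the property that streaming frameworks provide at much larger scale.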

Some early adopter companies, such as Ticketmaster, have used Kafka to feed observability systems for several years. But the practice is now going mainstream as more enterprises create microservices applications and work with distributed cloud-native systems such as Kubernetes, according to IT experts.

"When you get into hybrid environments, multiple cloud regions and complex suites of applications, [it's important to] properly manage what is often very important business data now," said Gregg Siegfried, an analyst at Gartner. "It's not just, 'Is my stuff up or down?' but using telemetry to understand how well your business is performing."

Vendors go all-in with observability pipelines

Data pipelines offer a systematic approach to collecting data from multiple clouds, regions and sources about the entire IT infrastructure, including networking and storage as well as applications. Some data pipeline tools for observability also look to create cost savings on back-end data storage systems by removing unnecessary data before it's ingested.

This market segment, which Siegfried has dubbed telemetry pipelines, has grown especially rapidly over the last 18 months, he said. These newer vendors include Edge Delta, Calyptia and Mezmo. APM vendor Datadog also launched its own observability pipelines in June 2022. Cribl, founded in 2017, is considered the earliest mover in this field, Siegfried said.

AIOps is among the most significant areas of investment in data center modernization for enterprises in 2023. Mezmo is now staking its business, realigned from its origins as log analytics vendor LogDNA, on that trend. To differentiate from general-purpose data pipelines such as Kafka and Kinesis as well as telemetry pipeline commercial competitors, Mezmo is developing a set of tools specifically geared toward observability for cloud automation that are based on open source, according to Tucker Callaway, the company's CEO.

"We have an interesting opportunity to correlate data across streams while it's in motion to drive telemetry-related workflows," Callaway said. "We'll provide a set of correlation [features] out of the box for observability and security events. But store data in open formats so that customers can still own it."

Mezmo's new product, which is still a work in progress, is slated to offer Kubernetes log data pipelines this quarter and is expected to add support for metrics and trace data and non-Kubernetes environments in subsequent releases. Ultimately, Callaway envisions linking Mezmo's data pipeline with AI and machine learning frameworks, such as Apache Flink, for streaming data analysis.