Industry tackles observability's data management problems

Enterprises and the vendors they buy from, including a new startup with close ties to Cisco, have begun to fundamentally rethink data management for observability.

Long term, most organizations won't be able to keep pace with the growth of observability data volumes, whether in terms of data gathering and ingestion into analytics tools or the cost of retaining and accessing data to troubleshoot problems.

That was the message from multiple speakers at the recent Monitorama conference, including Joseph Ruscio, a general partner at Heavybit Industries, an early-stage investor in cloud infrastructure startups.

"Along with [cloud-native] complexity … we now have deep systems [with] large fan-outs which are very hard to reason about … our metrics systems are pre-materializing views for 'known unknowns,' which is great," Ruscio said during his presentation. "[But] it turns out, that's how you end up with a million-dollar Datadog bill."

Overwhelming data growth stands to generate untenable data storage and network transfer costs for many enterprises and make understanding complex systems harder and harder, Ruscio said. In response, the industry will have to undergo a shift in observability data management techniques starting with the architecture of tools.

Fresh tools refine observability data management at the source

This architectural rethink has already begun with changes to how observability tools gather and store data. Vendors such as Dynatrace, with a back-end overhaul last year named Grail, have pledged to curtail storage and processing costs and better organize data for analysis.

Another emerging technology, eBPF, facilitates high-performance, lower-cost data gathering in cloud-native environments. In April, one observability startup, Groundcover, released a redesigned monitoring agent named Flora for its eBPF-based tools. The company claims the agent creates a highly efficient data pipeline within the Linux kernel on monitored machines. The OpenTelemetry project has standardized data gathering processes for distributed tracing by various observability tools as well.

Observability vendors such as Cribl and Mezmo have also rolled out data pipelines that groom and reduce data before it's sent to back-end systems along with federated search that doesn't require data be moved to a centralized repository at all. Log analytics bellwether Splunk also expanded in the federated search realm this month with support for data stored in low-cost AWS S3 buckets. AIOps vendor BigPanda began to offer a data engineering service for its platform in February to help customers cleanse and normalize data before it's ingested into its systems for AI analysis.
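To make the idea of grooming and reducing data before it reaches a back end concrete, here is a minimal Python sketch. It does not represent any vendor's product or API. The event fields (`level`, `message`, `thread_id`, `hostname_fqdn`) and the function name `reduce_events` are hypothetical, chosen only for illustration. The sketch drops low-severity events, strips fields assumed to be unused downstream, and collapses consecutive duplicates into one event with a count.

```python
from typing import Iterable, Iterator

# Assumptions for illustration only: which severities and fields are low value.
DROP_LEVELS = {"DEBUG", "TRACE"}
DROP_FIELDS = {"thread_id", "hostname_fqdn"}


def reduce_events(events: Iterable[dict]) -> Iterator[dict]:
    """Filter, trim, and collapse consecutive duplicate log events."""
    pending = None      # last event seen, held back until we know it stopped repeating
    pending_key = None  # (level, message) pair identifying the pending event
    for event in events:
        # Step 1: drop events whose severity is assumed not worth storing.
        if event.get("level") in DROP_LEVELS:
            continue
        # Step 2: strip fields assumed never to be queried by the back end.
        trimmed = {k: v for k, v in event.items() if k not in DROP_FIELDS}
        key = (trimmed.get("level"), trimmed.get("message"))
        # Step 3: fold a consecutive repeat into the pending event's counter.
        if pending is not None and key == pending_key:
            pending["repeat_count"] += 1
            continue
        if pending is not None:
            yield pending
        trimmed["repeat_count"] = 1
        pending, pending_key = trimmed, key
    if pending is not None:
        yield pending


sample = [
    {"level": "DEBUG", "message": "cache probe"},
    {"level": "ERROR", "message": "db timeout", "thread_id": 1},
    {"level": "ERROR", "message": "db timeout", "thread_id": 2},
    {"level": "INFO", "message": "ok"},
]
reduced = list(reduce_events(sample))
```

In this example, four incoming events become two outgoing ones. Real pipelines typically apply far richer rules, but the cost argument is the same: whatever is dropped or merged at the source is never transferred, ingested or stored.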

These shifts toward more efficient data gathering as well as analyzing data in place will continue, according to Ruscio.

"[Observability] SaaS vendors are reselling cloud at 80% margins, and that's not a good place to be," he said during his Monitorama presentation. "We are definitely going to have to find ways, or new ways will emerge -- hybrid architectures where the data plane comes on-premises, to some extent, to address this."

[Figure: ESG observability data management survey responses. Enterprise ops and DevOps pros concerned about observability data growth have used multiple techniques to reduce data volumes, but also find themselves paying more for data storage.]

CloudFabrix cozies up to Cisco Full-Stack Observability

Another player in the market's observability architecture rethink that has emerged over the last year is CloudFabrix. It issued software updates beginning in mid-2022 that brought several trending architectural elements together, including data pipelines, data management hygiene and normalization techniques, along with commodity OpenTelemetry instrumentation. The CloudFabrix Robotic Data Automation Fabric (RDAF) launched a year ago. It uses low-code/no-code bots to gather and enrich data and then sends it to a pipeline that maps infrastructure and application dependencies before displaying the data to the end user.

Last month, CloudFabrix added an observability data modernization service that ties in with Cisco's Full-Stack Observability (FSO) product line. The service uses CloudFabrix bots and pipelines to convert data sources into OpenTelemetry-compatible form. A version of RDAF for edge computing that packs the company's data-gathering tech into a single virtual machine is also available first for Cisco FSO.

CloudFabrix plans to integrate with other observability vendors' products, but its initial focus on Cisco isn't necessarily a coincidence. The company's founders have sold all three of their previous startups to Cisco:

  • Jahi in 2004, for infrastructure and API management, which became part of the Cisco Enhanced Device Interface.
  • Pari Networks in 2011, for infrastructure compliance, security and lifecycle management, which became part of Cisco Smart Net Total Care and Network Configuration Change Management.
  • Cloupia in 2012, for cloud data center management and automation, which became Cisco UCS Director and Intersight, according to company briefing documents.

A CloudFabrix exec said there are no discussions about another spin-in on the table. But the fact that it was a launch partner for Cisco's FSO tools last month demonstrates that the company can fill gaps in the networking giant's portfolio, such as collecting legacy VMware data via OpenTelemetry, according to Gregg Siegfried, an analyst at Gartner.

Still, it's relatively early for the latest CloudFabrix features, he said.

"It's an intriguing company," Siegfried said. "But so far it's been hard for me to separate the stuff from the fluff in terms of what [the CloudFabrix product] can actually do today."

In the meantime, enterprises make do

While they await new observability data management tools, the average enterprise organization surveyed this year by TechTarget's Enterprise Strategy Group (ESG) in its Distributed Cloud Series uses two or three strategies to mitigate data growth. Respondents concerned about observability data growth in 2023 said they use methods including tools that optimize log volume or use sampling to reduce observability data, moving data to a lower-cost storage tier or platform, and shortening the retention period for logs. More than half of the respondents said they also increased storage spending.
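Of the strategies respondents listed, sampling is the easiest to show in a few lines. The Python sketch below is one common approach, deterministic hash-based sampling. It is not a description of any surveyed organization's setup or of a specific tool. The function name `keep_trace` and the trace ID format are hypothetical. Because the decision depends only on the trace ID, every service that sees the same trace makes the same keep-or-drop choice without coordination.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Return True if this trace falls inside the sampled fraction.

    sample_rate is the fraction of traces to keep, from 0.0 to 1.0.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # Hash the ID so the decision is stable across processes and restarts.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Usage: keep roughly 10% of 10,000 hypothetical trace IDs.
trace_ids = [f"trace-{n}" for n in range(10_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.10)]
```

The tradeoff is the one the article describes: a 10% sample cuts trace volume by roughly 90%, but a rare failure that happens to land in the dropped fraction is no longer available for troubleshooting.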

Though these measures are not a long-term solution to exponential observability data growth, they have bought some time. This year's edition of the ESG survey showed a slight lessening of enterprise anxiety around observability data growth since last year. The 2022 survey of 357 IT and DevOps professionals found that 71% of respondents agreed with the statement, "Observability data (metrics, logs, traces) is growing at a concerning rate." By comparison, 69% of 293 respondents to the 2023 edition of the survey agreed with that statement.

"While this number is still alarmingly high, the drop might indicate that some of the strategies being employed are actually working, or that organizations are just sucking it up and eliminating the problem by budgeting better," said Jonathan Brown, senior analyst at ESG, a division of TechTarget.

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.
