New Databricks tool targets streaming data cost, complexity
With Zerobus Ingest, the vendor is providing a service that differentiates it from competitors by reducing the tools required to ingest real-time data to its lakehouse platform.
Databricks on Monday launched Zerobus Ingest, a new feature in its Lakeflow Connect data ingestion service aimed at simplifying and dramatically reducing the cost of streaming data collection.
Streaming data feeds real-time intelligence, which is particularly valuable for AI applications such as agents, transforming them from passive assistants into context-aware applications capable of taking autonomous action based on the most relevant data possible. Agentic AI, meanwhile, is the dominant focus of many current enterprise development initiatives given its potential to make entire organizations better informed and more efficient.
Many streaming data pipelines, however, are complex to configure and costly to maintain, turning them into hindrances to development rather than enablers.
They can include Apache Kafka or a similar platform -- sometimes referred to as a bus -- to provide the streaming data architecture, schema registries to store and validate data structures, connector frameworks to link centralized environments to data sources, numerous storage systems and multiple layers for engineers to address and refine data quality. In addition, streaming data is often processed outside of an organization's normal data governance framework given that such frameworks are generally designed for static data.
The result is an intricate, costly system with governance shortfalls that create compliance risks and poor data lineage.
Tackling cost and complexity
Zerobus Ingest is a fully managed service designed for Databricks customers that want to ingest streaming data solely into their Databricks lakehouse; it is not for streaming systems that feed multiple endpoints. With that sole destination rather than multiple databases and other potential landing spots, it drastically simplifies streaming data pipelines and lowers the cost of ingesting real-time information.
Zerobus Ingest -- unlike Kafka or other streaming data frameworks designed as a universal hub for routing data to numerous endpoints -- streams data directly into governed Delta tables, where it can feed AI pipelines and inform other applications that drive business processes and decisions.
Given its potential to lower costs and simplify streaming data pipelines, the service is a valuable addition for Databricks users, according to William McKnight, president of McKnight Consulting.
"Zerobus Ingest is significant because it allows for massive, real-time streaming at scale into Databricks without the infrastructure overhead of complex messages buses, which will result in fewer operational delays and lower costs," he said. "With Zerobus Ingest, customers can potentially achieve exponentially faster insights at scale. This was a top item missing in Databricks."
However, Zerobus Ingest's scope is limited, McKnight continued, noting that it is solely designed to route streaming data to Databricks' lakehouse and not to various other systems an enterprise might deploy.
"If a user needs to route events to dozens of downstream systems simultaneously, Zerobus Ingest is not built for that," he said. "It's not going to be for everything in the enterprise. … Zerobus Ingest sacrifices multi-sink routing, event replay, and exactly-once delivery in exchange for architectural simplicity, making it a great tool for single-destination lakehouse ingestion rather than a universal message hub."
Stewart Bond, an analyst at IDC, likewise noted that Zerobus Ingest is a replacement for another streaming data system only if the purpose is to transport event data into a Databricks lakehouse.
However, he added that the service is nevertheless important for Databricks users connecting their lakehouse to streaming data sources. Used in conjunction with Lakeflow Connect for stream processing and analysis, Zerobus Ingest will help fuel real-time workflows.
"It will be a significant new capability for Databricks customers that may have been relying on event brokers for ingesting high volume/high frequency," Bond said. "It not only will simplify the architecture and operations, but it will also reduce latency allowing for nearer real-time analytics."
Setting up a governed streaming workflow using Zerobus Ingest is a two-step process. First, users create a table in their Unity Catalog -- Databricks' data catalog for centralizing governance -- and second, they write data to that table using a prebuilt application programming interface or software development kit provided by Databricks.
Zerobus Ingest's serverless architecture then takes over, scaling up or down as needed without any configuration changes.
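The two-step workflow described above can be sketched in Python. Note that Databricks has not published the exact Zerobus Ingest client interface in this article, so the table name, schema and the `to_zerobus_payload` helper below are hypothetical, illustrating only the shape of the flow: define a governed Unity Catalog table, then serialize events for the write API.

```python
import json

# Step 1 (sketch): define the governed Delta table in Unity Catalog.
# The catalog, schema and column names here are hypothetical; in practice
# this DDL would run in a Databricks SQL warehouse or notebook.
TABLE_DDL = """
CREATE TABLE IF NOT EXISTS main.telemetry.device_events (
  device_id STRING,
  event_time TIMESTAMP,
  temperature DOUBLE
)
"""

def to_zerobus_payload(record: dict) -> bytes:
    """Serialize one event as a compact JSON payload.

    Stand-in for what a Zerobus Ingest SDK client would transmit in
    step 2; the real SDK's method names and wire format are not
    detailed in the article, so this is only illustrative.
    """
    return json.dumps(record, separators=(",", ":")).encode("utf-8")

# Step 2 (sketch): an application serializes each event and hands it
# to the Databricks-provided API or SDK, which writes to the table.
event = {
    "device_id": "sensor-42",
    "event_time": "2025-01-01T00:00:00Z",
    "temperature": 21.5,
}
payload = to_zerobus_payload(event)
```

Because the service is serverless, the producing application never provisions brokers or partitions; it only needs the table identifier and credentials, and scaling is handled by Databricks.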
By reducing the number of tools required to set up a streaming data pipeline, and by cutting the compute and storage consumed by platforms such as Apache Kafka, Databricks aims not only to reduce complexity but also to help customers lower their cloud computing spending.
Similarly, AWS recently made cost control a focal point of the data management features it unveiled during its annual re:Invent user conference. Vendors such as Aerospike and ThoughtSpot have also made performance efficiency and cost control points of emphasis.
With Zerobus Ingest, Databricks is responding to customer feedback regarding the cost and complexity of streaming data pipelines, according to Elise Georis, senior staff product manager at Databricks.
"Zerobus Ingest was driven directly from our customers, especially those managing high-velocity [Internet of Things], clickstream and telemetry data," she said. "We saw a recurring pattern."
Specifically, Databricks users had to implement architectures that required data to be copied and transformed multiple times using streaming platforms for staging.
"This worked, but it also introduced operational complexity, data duplication and high costs, requiring specialized teams to spend their valuable time on all of this plumbing," Georis said. "To address this, we went back to first principles to rethink how data moves into the lakehouse. The result is Zerobus Ingest."
While Zerobus Ingest is beneficial for Databricks customers, potentially helping enterprises move more pilots into production by simplifying streaming data ingestion, it is not the only ingestion service that forgoes a central hub such as Kafka, according to McKnight. It is nevertheless important for Databricks and its users.
"Zerobus Ingest [collapses] the ingestion pipeline to escape pilot purgatory, reflecting a broader market shift toward bus-free services," McKnight said. "By removing intermediary brokers, these solutions prioritize lower total cost of ownership and streamlined architectural management. It is not a unique category. Other major vendors offer similar 'bus-free' ingestion services."
Looking ahead
Zerobus Ingest is the latest in a recent series of features that attempt to simplify aspects of Databricks' platform.
For example, Lakebase makes it easier to manage disparate data types to fuel AI development, while Agent Bricks provides a framework for developing agents. To better serve the needs of existing customers and perhaps even attract new ones, McKnight suggested that Databricks continue to make simplicity a focal point of its product development plans.
"To establish Databricks as the aspirational operating system for Agentic AI, they must prioritize serverless, scale-to-zero efficiency and Agent Bricks' prebuilt operational templates that deliver immediate business value over experimental complexity," he said.
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than three decades of experience. He covers analytics and data management.