Confluent platform update targets streaming data quality

The vendor unveiled a tool designed to address concerns about the quality of continuous data as well as a feature that enables secure continuous data sharing.

Confluent unveiled a host of new features for its cloud-based data streaming platform, highlighted by an expansion of its Stream Governance suite of tools and new sharing capabilities.

The vendor offers Confluent Cloud as a managed service for its cloud customers and Confluent Platform for on-premises users. Both are built on Apache Kafka, an open source technology for streaming data, and enable customers to capture and ingest data from disparate sources as events occur to fuel real-time decisions.

The new Confluent Cloud capabilities were introduced on May 16 during Kafka Summit London, a conference organized by Confluent for developers, data engineers and other users of Apache Kafka.

Before this most recent update, Confluent's platform improvements included added governance for streaming data pipelines in October 2022 and tools that make it easier to integrate Confluent with multi-cloud deployments in July 2022.

New capabilities

Confluent first launched Stream Governance in 2021, providing customers with a fully managed suite of governance capabilities for Apache Kafka and other streaming data tools.

In October 2022, the vendor added Stream Governance Advanced, which is designed to help organizations govern complex pipelines and manage how streaming data can be shared and used.

Now Confluent has added Data Quality Rules to the governance suite in a move aimed at ensuring the quality of data streams so they are ready for consumption and resilient to changes over time.

The tool automatically validates the values of individual fields within data streams to ensure the integrity of the data, enables data engineers to quickly resolve data quality problems with customizable actions, and uses migration rules to transform streaming messages from one data format to another so streams remain consistent even as new data is ingested.
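The three behaviors described above (field validation, a configurable action on failure, and a migration rule) can be illustrated with a minimal Python sketch. It is not Confluent's API: the names `Rule`, `migrate_v1_to_v2`, `process` and `RULES`, the `ssn`/`tax_id`/`amount` fields and the dead-letter list are all hypothetical, chosen only to show the general pattern.

```python
from dataclasses import dataclass
from typing import Callable

Message = dict  # hypothetical: one streaming event represented as a plain dict


@dataclass
class Rule:
    """Hypothetical field-level rule: a name, a field and a predicate."""
    name: str
    field: str
    check: Callable[[object], bool]


def migrate_v1_to_v2(msg: Message) -> Message:
    """Hypothetical migration rule: rename the old 'ssn' field to 'tax_id'."""
    out = dict(msg)
    if "ssn" in out:
        out["tax_id"] = out.pop("ssn")
    return out


# Assumed example rules; real rules would depend on the stream's schema.
RULES = [
    Rule("tax_id_format", "tax_id",
         lambda v: isinstance(v, str) and len(v) == 9 and v.isdigit()),
    Rule("amount_non_negative", "amount",
         lambda v: isinstance(v, (int, float)) and v >= 0),
]


def process(msg: Message, rules: list, valid: list, dead_letter: list) -> list:
    """Migrate a message, validate each field, then route it.

    Returns the names of the rules that failed (empty if the message passed).
    """
    msg = migrate_v1_to_v2(msg)
    failed = [r.name for r in rules if not r.check(msg.get(r.field))]
    if failed:
        # The "customizable action" here is simply diverting the bad message.
        dead_letter.append((msg, failed))
    else:
        valid.append(msg)
    return failed
```

In this sketch, a message that fails any rule is diverted to a separate list rather than dropped, so an engineer could inspect it later. Other actions, such as rejecting the write outright, would fit the same structure.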

Streaming data quality is an ongoing challenge for many organizations, according to Kevin Petrie, an analyst at Eckerson Group. Many organizations have started using third-party data observability tools to address data quality.

As a result, Data Quality Rules will likely be an attractive feature for users of Confluent's platform.

"Enterprises feel perpetual pain when it comes to data quality," Petrie said. "By enforcing rules for data validation, resolution and schema evolution, Confluent can reduce the risk of quality issues without implementing third-party tools. This reduces the effort of governance and provides data consumers with more valuable, trusted inputs."

Stewart Bond, an analyst at IDC, similarly noted the potential importance of Data Quality Rules. He cited a recent IDC study, which found that streaming data was one of the least trusted enterprise data sources.

"Confluent has been investing in stream governance capabilities for this very reason," Bond said. "Data quality capabilities in the stream are adding a form of data observability into data in motion use cases, providing opportunities to identify and correct data quality issues before downstream systems are affected."

In addition to Data Quality Rules, a tool called Stream Sharing has significant potential for Confluent users, Bond continued.

Stream Sharing is designed to let customers share streaming data both across their organization and with external Kafka users. It does so with built-in authenticated sharing, access management and other security and governance measures.

Bond noted that part of Kafka's intent is to enable sharing. But what stands out about Stream Sharing is that it can open up an organization's closed environment to other organizations.

"What is interesting about this announcement is adding the external element," he said. "Business-to-business data exchange is still very complicated, using point-to-point APIs, managed file transfers and electronic data exchange. Adding the ability to share data in near real-time … could cause a significant disruption in how data is exchanged between business partners in industry ecosystems."

Other new features

Beyond Data Quality Rules and Stream Sharing, Confluent's latest platform update includes three other new features:

  • Custom connectors. Prebuilt connectors that enable users to link any data system to their organization's Kafka Connect plugins without requiring them to change code, ensure the health of users' connectors with logs and metrics and eliminate the burden of constantly managing their connector infrastructure.
  • Kora. An Apache Kafka engine built for the cloud designed to enable Confluent Cloud users to scale considerably faster than before while also removing data storage limits and powering workloads with low latency.
  • Early access to Confluent's Apache Flink. A stream processing tool designed to handle large-scale data streams.

One of Confluent's main strengths is its close connection with the Kafka community, according to Bond. That close connection includes Jun Rao, co-founder of Confluent and Kafka's co-creator.

As a result, Confluent is frequently at the forefront of Kafka-related innovation, as evidenced by custom connectors and the development of Kora. Among the vendor's competitors are Cloudera and Tibco as well as tech giants AWS, Google and Microsoft, which all offer data streaming platforms.

"We typically see Confluent ahead of its competitors because of tight connections and influence in the Kafka community," Bond said. "Confluent tends to be at the forefront of bringing new innovations to Kafka -- in part because of its legacy and also because of the experience Confluent has gained in managing Kafka environments for customers."

While Confluent's relationship with Kafka may be one of its strengths, that focus on one tool may also be holding it back.

Kafka is not the only tool that can be used to move streaming data, Bond noted. Therefore, Confluent would be wise to expand its relationship with other event streaming platforms, such as Pulsar, a fast-growing competitor to Kafka, he said.

"Confluent wants to be the software vendor that is synonymous with 'data in motion,'" Bond said. "But Kafka is not the only technology that can be used to move data. While the market penetration is low, Confluent could look at support of alternative data movement technologies such as Pulsar."

Petrie, meanwhile, said that Confluent's added support for Apache Flink will benefit users once generally available and help keep the vendor competitive with its peers.

Most organizations use Kafka, he noted. Now, however, many are adding Flink.

"Flink helps with specialized stream processing, which has become increasingly important to assist real-time machine learning use cases," Petrie said. "So Confluent is correct to add Apache Flink capabilities as well."

Looking ahead

While Confluent's latest platform update includes features aimed at better streaming data governance and expanded streaming data sharing, it does not include one of the technologies many data management and analytics vendors are now attempting to incorporate into their platforms: generative AI.

In the six months since OpenAI introduced ChatGPT, Microsoft and Google have used the technology to improve their search engines, and data vendors -- despite concerns about its security and accuracy -- have begun developing capabilities to improve their querying and machine learning tools.

For example, Tableau and ThoughtSpot are among the analytics vendors integrating generative AI throughout their platforms, while Informatica is combining its existing AI engine with generative AI.

Generative AI, therefore, is something Confluent could potentially add throughout its platform, according to Bond.

"Clearly there is a lot of hype right now in regard to generative AI," he said. "It would be interesting to see how Confluent will be addressing it in the context of Kafka and Confluent Cloud."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
