Apache Kafka 3.1 opens up data streaming for analytics
The latest release of open source Kafka data streaming technology brings a series of new features that provide users with usability improvements for real-time data queries.
Apache Kafka is continuing to build out its event data streaming technology platform as the open source project moves forward.
Apache Kafka 3.1 became generally available on Jan. 24, providing users of the open source event streaming technology with a series of new features.
Organizations use Kafka to enable real-time data streams that can be used for operations, business intelligence and data analytics.
Kafka is developed by an open source community that includes Confluent, an event streaming vendor that provides a commercial platform for Kafka, as well as Red Hat, which has a managed Kafka service.
Gartner analyst Merv Adrian said he looks at Kafka as a data source that feeds a database.
"More uses and users are moving upstream to engage with data in motion, before it comes to rest, and Kafka and its adjacent technologies are moving to capture share of that business," Adrian said.
Apache Kafka 3.1 brings new event data streaming functionality
The Kafka 3.1 release brings several improvements, including OpenID Connect for authentication, noted Simon Woodman, engineering manager for Kafka at Red Hat.
"This increases the flexibility of using different authentication providers," Woodman said.
Danica Fine, senior developer advocate at Confluent, identified KIP-775 as the most powerful new update in the Kafka 3.1 release. KIP stands for Kafka Improvement Proposal.
Fine explained that users previously were limited in their ability to conduct foreign key joins in Kafka Streams. Before Kafka 3.1, both data tables had to be partitioned using the default Kafka partitioner, which wasn't easy for all applications.
"Having the ability to leverage custom partitioners in foreign key joins should alleviate headaches for quite a few users," she said.
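To see why the choice of partitioner matters for a foreign key join, consider a toy model in which a table's rows are spread across partitions and a lookup has to be routed to the partition that holds the matching row. The sketch below is plain Python, not the Kafka Streams API; the function names, the region-prefix rule and the byte-sum stand-in for a default hash are all assumptions made for illustration.

```python
NUM_PARTITIONS = 3

def default_partitioner(key: str) -> int:
    # Stand-in for a hash-based default; Kafka's real default uses a different hash.
    return sum(key.encode("utf-8")) % NUM_PARTITIONS

# Hypothetical custom rule: route by the region prefix before the dash.
REGION_PARTITIONS = {"eu": 1, "us": 2}

def region_partitioner(key: str) -> int:
    return REGION_PARTITIONS.get(key.split("-", 1)[0], 0)

def build_table(rows, partitioner):
    # Spread the rows across partitions using the given partitioner.
    partitions = [dict() for _ in range(NUM_PARTITIONS)]
    for key, value in rows.items():
        partitions[partitioner(key)][key] = value
    return partitions

def lookup(partitions, foreign_key, partitioner):
    # Route the lookup to a single partition, as a join on a partitioned table would.
    return partitions[partitioner(foreign_key)].get(foreign_key)

# The customers table was written with the custom partitioner.
customers = build_table({"eu-7": "Ada", "us-3": "Grace"}, region_partitioner)

found = lookup(customers, "eu-7", region_partitioner)    # same partitioner on both sides
missed = lookup(customers, "eu-7", default_partitioner)  # lookup routed by a different rule
```

In this toy setup the lookup only finds the row when it is routed with the same partitioner that placed the row, which is the kind of mismatch that made custom-partitioned tables awkward to join before the join could be told about the custom partitioner.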
Topic identifiers land in Apache Kafka 3.1
Another highlight of the Kafka 3.1 update is KIP-516, which brings a capability known as topic identifiers to Kafka. In Kafka, a topic is the primary way that data is organized, much as tables are the primary way data is organized in a traditional database.
"Topic IDs provide a safer way to fetch data from topics without any chance of incorrectly interacting with stale topics with the same name," the Kafka 3.1 release notes state.
Fine said KIP-516 is less about adding functionality than it is preventing annoying things from happening. She explained that the introduction of topic IDs through KIP-516 effectively ensures that stale data isn't a problem for users.
Fine explained that in the past, if a topic with a given name was deleted and later recreated with the same name, it was possible under certain circumstances for consumers to see stale data from the old version of the topic.
By assigning a universally unique identifier (UUID) to every topic and having Kafka refer to the topic by UUID rather than by its given name, the stale data problem is resolved, she said.
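The idea can be illustrated with a small in-memory model. This is a sketch of the concept only, not Kafka's broker code or client API; the `ToyBroker` class and its method names are hypothetical.

```python
import uuid

class ToyBroker:
    """Toy in-memory model of topics keyed by ID; not Kafka's real API."""

    def __init__(self):
        self._ids_by_name = {}   # topic name -> current topic ID
        self._logs_by_id = {}    # topic ID -> list of records

    def create_topic(self, name):
        # Every creation gets a fresh UUID, even if the name was used before.
        topic_id = uuid.uuid4()
        self._ids_by_name[name] = topic_id
        self._logs_by_id[topic_id] = []
        return topic_id

    def delete_topic(self, name):
        topic_id = self._ids_by_name.pop(name)
        del self._logs_by_id[topic_id]

    def append(self, name, record):
        self._logs_by_id[self._ids_by_name[name]].append(record)

    def fetch_by_id(self, topic_id):
        # A fetch that carries an ID from a deleted topic fails loudly.
        if topic_id not in self._logs_by_id:
            raise LookupError("unknown topic ID")
        return list(self._logs_by_id[topic_id])

broker = ToyBroker()
old_id = broker.create_topic("orders")
broker.append("orders", "old-record")
broker.delete_topic("orders")

new_id = broker.create_topic("orders")   # same name, different ID
broker.append("orders", "new-record")
```

Because the recreated topic has a different ID, a request that still carries the old ID cannot be silently matched to the new topic, and a request with the new ID never sees records from the old one.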
Range queries open up Kafka for analytics
Enterprises are increasingly using Kafka for analytics, an application that gains more support in the 3.1 update.
KIP-763 enables range queries -- common queries in which users query data within a certain set of boundaries or "ranges" -- with open endpoints for Kafka.
Before Kafka 3.1, there were ways to work around the lack of open-ended range queries, but users had to work out for themselves what the lowest and highest points in their data would be, Fine noted.
"I envision this [KIP-763] making analytics a lot more appealing to teams already using Kafka Streams, mostly because it makes accessing the data you need so much less intimidating," she said.
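A range query with open endpoints can be pictured with a small sorted key-value store. The sketch below is a plain Python model, not the Kafka Streams state store interface; `ToySortedStore` is a hypothetical name, and treating `None` as an open endpoint with inclusive bounds is an assumption of this sketch.

```python
import bisect

class ToySortedStore:
    """Toy sorted key-value store; not the Kafka Streams API."""

    def __init__(self, items):
        self._items = dict(items)
        self._keys = sorted(self._items)

    def range(self, lower=None, upper=None):
        # None means "no bound on this side"; given bounds are inclusive.
        lo = 0 if lower is None else bisect.bisect_left(self._keys, lower)
        hi = len(self._keys) if upper is None else bisect.bisect_right(self._keys, upper)
        return [(k, self._items[k]) for k in self._keys[lo:hi]]

store = ToySortedStore({"a": 1, "b": 2, "c": 3, "d": 4})

from_c = store.range(lower="c")             # open upper endpoint
up_to_b = store.range(upper="b")            # open lower endpoint
closed = store.range(lower="b", upper="c")  # both endpoints given
```

With an open endpoint, the caller no longer has to know the smallest or largest key in the store in order to ask for "everything from here on" or "everything up to here."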