DataStax extended its Astra DB cloud database with a new capability -- generally available Thursday -- that integrates real-time data streaming for change data capture.
In January 2021, DataStax acquired event streaming vendor Kesque, which builds a platform based on the Apache Pulsar technology. DataStax brought that Pulsar technology to Astra DB to provide real-time change data capture (CDC) capabilities that can support analytics, business intelligence and operational applications.
"Building a streaming change data capture solution linked to NoSQL Cassandra is a smart strategy for DataStax," said IDC analyst Amy Machado.
Why real-time data streaming matters for DataStax Astra DB
Machado noted that IDC survey data shows that the use of CDC is growing and more organizations plan to use it in the next 12 to 18 months to populate real-time data streams. DataStax is listening to what developers and IT managers want, she said.
"CDC allows for fast data movement by listening to the data, supporting real-time streaming analytics and machine learning, to determine what is happening now in the business environment," she said.
Real-time data streams are well suited for hybrid or cloud environments and can boost insights at scale across the organization by helping to unlock data from departmental silos for more strategic use across the business, Machado said.
"IDC understands that most streaming data use cases revolve around transactional or operational procedures, but the percentage of organizations using CDC to drive real-time analytics is growing with big potential," she said.
How DataStax brings real-time streaming to Astra DB
Chris Latimer, vice president of product management at DataStax, explained that users of the vendor's database wanted to be able to take data from Astra DB and easily use it in another system, such as the Snowflake cloud data platform.
"Up until now, that's been a challenge," he said.
With the new CDC capability, changes made to the database are captured in real time by the Astra Streaming service, which is built on Apache Pulsar. From there, users can integrate the changes into other databases, messaging platforms, data warehouses or data lakes.
"It's a great way for you to take the data that's running on Astra DB and connect it anywhere else that you need it to go," Latimer said.
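The flow Latimer describes, where each write is captured as an event and fanned out to downstream systems, can be illustrated with a minimal in-memory sketch. This is not the Astra DB, Astra Streaming or Pulsar API. Every class and function name below (`ChangeEvent`, `Topic`, `SourceTable`, `ReplicaSink`, `demo`) is hypothetical, and a real deployment would use a durable, networked topic rather than direct function calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured mutation (hypothetical shape, not the Astra DB format)."""
    op: str                  # "upsert" or "delete"
    key: str
    value: Optional[dict]    # None for deletes


class Topic:
    """Stand-in for a streaming topic: hands each event to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, handler: Callable[[ChangeEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: ChangeEvent) -> None:
        for handler in self._subscribers:
            handler(event)


class SourceTable:
    """Stand-in for a database table that publishes every write as it happens."""

    def __init__(self, topic: Topic) -> None:
        self.rows: Dict[str, dict] = {}
        self._topic = topic

    def upsert(self, key: str, value: dict) -> None:
        self.rows[key] = value
        self._topic.publish(ChangeEvent("upsert", key, value))

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)
        self._topic.publish(ChangeEvent("delete", key, None))


class ReplicaSink:
    """Stand-in for a downstream system such as a warehouse or data lake."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "upsert" and event.value is not None:
            self.rows[event.key] = event.value
        elif event.op == "delete":
            self.rows.pop(event.key, None)


def demo():
    """Write to the source table and let two subscribers follow along."""
    topic = Topic()
    warehouse = ReplicaSink()
    audit_log: List[ChangeEvent] = []
    topic.subscribe(warehouse.apply)    # sink 1: keeps a replica of the table
    topic.subscribe(audit_log.append)   # sink 2: keeps the raw change history

    orders = SourceTable(topic)
    orders.upsert("order-1", {"status": "new"})
    orders.upsert("order-1", {"status": "shipped"})
    orders.upsert("order-2", {"status": "new"})
    orders.delete("order-2")
    return orders, warehouse, audit_log
```

Calling `demo()` leaves the replica holding the same rows as the source table, while the audit list retains all four change events in the order they were made. The point of the design is that the source publishes to a topic and never addresses a sink directly, so new destinations can be added without touching the writer.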
CDC is not an entirely new capability for the Cassandra database that is the foundation of Astra DB.
Latimer said the existing CDC capability in Cassandra had a number of shortcomings, including that it was non-deterministic.
As a result, change updates were not always delivered at the moment the data changed in the database. Latimer said DataStax has improved the CDC mechanism so that a change is picked up in real time as soon as it is made.
With Astra DB and Astra Streaming, Latimer said, DataStax now provides real-time streaming CDC capability as a fully managed cloud service.
Looking forward, DataStax is building out capabilities to enable its Pulsar-based streaming capability to also work with other messaging technologies, including Apache Kafka. DataStax also has an open source project called Starlight for Kafka that helps connect Kafka users to Pulsar.
DataStax is developing Starlight connectors for the RabbitMQ and JMS (Java Message Service) messaging services as well.
"We're really building out a streaming platform that can run anywhere and has compatibility with your existing system," Latimer said. "We'll be able to take the existing apps that you have, give you a drop-in replacement, and really just modernize your entire data in motion strategy."