Database vendor Rockset is providing users with the ability to query real-time streaming data with SQL.
Rockset, based in San Mateo, Calif., revealed the beta availability of new features for its real-time indexing database on Aug. 25. The features will be generally available later this year.
The new features will enable users to directly query an event stream, such as Apache Kafka or Amazon Kinesis.
Going a step further, Rockset is adding the ability to continuously transform data coming from an event-streaming source into different structures or formats that users can configure. For example, a Rockset user could analyze an event stream in real time for potentially sensitive data and then transform that data to anonymize it with a cryptographic hash.
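To illustrate the general idea of hashing sensitive fields as events flow past, here is a minimal Python sketch. It is not Rockset's implementation or API. The field names in `SENSITIVE_FIELDS`, the placeholder `SECRET_KEY`, and the helper functions are all hypothetical. The sketch uses a keyed hash (HMAC-SHA-256) rather than a bare hash, because an unkeyed hash of short values such as email addresses can often be reversed by brute force.

```python
import hashlib
import hmac
from typing import Iterable, Iterator

# Hypothetical secret key; a real deployment would load this from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical names of fields treated as sensitive.
SENSITIVE_FIELDS = ("email", "phone")


def anonymize(record: dict) -> dict:
    """Return a copy of record with sensitive fields replaced by a keyed SHA-256 digest."""
    out = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in out:
            value = str(out[field]).encode("utf-8")
            # Same input and key always yield the same digest, so joins and counts still work.
            out[field] = hmac.new(SECRET_KEY, value, hashlib.sha256).hexdigest()
    return out


def transform_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Apply the anonymizing transformation to each event as it arrives."""
    for event in events:
        yield anonymize(event)


if __name__ == "__main__":
    sample = [{"user": "u1", "email": "a@example.com", "amount": 5}]
    for event in transform_stream(sample):
        print(event)
```

Because the digest is deterministic for a given key, analysts can still group or count by the anonymized field without ever seeing the raw value.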
Directly querying a real-time data stream for analytics or business intelligence is often costly, according to Torsten Volk, an analyst at Enterprise Management Associates.
Volk noted that the cost of storing massive amounts of data for normalization, aggregation and querying often dampens application developers' enthusiasm for adding real-time data capabilities to their code.
"That Rockset allows developers to use standard SQL to transform, aggregate and query multiple data streams, while eliminating the need to store massive amounts of data on high performance volumes, almost sounds too good to be true," Volk said. "This could make developers much more willing to experiment and unleash the value of currently untapped data streams."
Querying real-time data with Rockset
Rockset users could query event stream data before, but the process was more cumbersome, requiring more storage, compute and time, according to Venkat Venkataramani, CEO and co-founder of Rockset.
Venkataramani explained that without the new query feature, a user would have had to ingest all the data flowing from the event stream, with each message becoming a database record in Rockset.
Over time, that data would take up an increasing amount of storage space and require increasing compute power to process. With the update, Rockset can now query the data stream itself.
"Now there is a reduction in the storage required, because now you're not materializing and storing the raw data stream," Venkataramani said.
Querying multiple streams without duplicating data
Organizations often use multiple streams, a scenario Rockset said it is prepared to handle.
Venkataramani said that his development teams have tested the new functionality on more than 1,000 concurrent data streams.
He noted that a Rockset query can pull data from across all the streams without double counting or duplicating it.
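One common way to avoid double counting when the same event can show up in more than one stream is to key each event on a unique identifier and skip identifiers already seen. The Python sketch below shows that technique in its simplest form; it is not a description of how Rockset does it. The `id` and `value` fields are hypothetical, and the streams are read one after another for simplicity, whereas live streams would interleave and a production system would bound the `seen` set, for example to a time window.

```python
from typing import Iterable, Iterator


def merge_deduplicated(streams: Iterable[Iterable[dict]]) -> Iterator[dict]:
    """Yield each event once across all streams, keyed on a hypothetical unique 'id' field."""
    seen: set = set()
    for stream in streams:
        for event in stream:
            if event["id"] in seen:
                continue  # already counted from this or another stream
            seen.add(event["id"])
            yield event


if __name__ == "__main__":
    stream_a = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
    stream_b = [{"id": 2, "value": 20}, {"id": 3, "value": 30}]
    total = sum(e["value"] for e in merge_deduplicated([stream_a, stream_b]))
    print(total)  # event 2 is counted once
```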
Beyond duplication, another problem that can occur with multiple simultaneous data streams is that some data may arrive later than the rest.
For example, Venkataramani explained, a connected sensor device might briefly lose connectivity and then reconnect, sending some data out of sequence. Rockset's system can account for that and ensure that late data is queried in the correct time sequence.
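A widely used general technique for handling late events is to order them by when they happened (event time) rather than when they arrived, buffering them until a "watermark" says no earlier events are expected. The Python sketch below illustrates that idea; it is not Rockset's mechanism. The `ts` field and the `allowed_lateness` parameter are hypothetical. Events that arrive later than `allowed_lateness` are still emitted, but out of order, which is the usual trade-off between latency and completeness.

```python
import heapq
from typing import Iterable, Iterator


def reorder_by_event_time(events: Iterable[dict], allowed_lateness: float) -> Iterator[dict]:
    """Emit events in event-time order, tolerating arrivals up to allowed_lateness late.

    Events are buffered in a min-heap keyed on a hypothetical 'ts' timestamp field.
    An event is released once the watermark (largest timestamp seen so far minus
    allowed_lateness) has reached it.
    """
    heap: list = []
    max_ts = float("-inf")
    for seq, event in enumerate(events):
        # seq breaks timestamp ties so the dicts themselves are never compared
        heapq.heappush(heap, (event["ts"], seq, event))
        max_ts = max(max_ts, event["ts"])
        watermark = max_ts - allowed_lateness
        while heap and heap[0][0] <= watermark:
            yield heapq.heappop(heap)[2]
    # End of input: flush whatever is still buffered, in timestamp order
    while heap:
        yield heapq.heappop(heap)[2]


if __name__ == "__main__":
    # A sensor reconnects and delivers its reading for ts=3 after the ts=5 reading.
    arrivals = [{"ts": t} for t in [1, 2, 5, 3, 6, 7, 10]]
    print([e["ts"] for e in reorder_by_event_time(arrivals, allowed_lateness=3)])
```

With a lateness allowance of 3, the late reading for `ts=3` is slotted back into place; with an allowance of 0, it would be emitted after `ts=5`.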
The overall goal for Rockset, according to Venkataramani, is to make real-time data easier to use.
"There is still quite a bit of innovation to come as we're still pushing the envelope and we're on a kind of a mission to make that happen, so that real-time data should be the default in every enterprise," he said.