ChaosSearch brings SQL to cloud data lake platform
The cloud data lake engine provider expands beyond log search, with a multi-model API that now includes SQL query execution for analytics and BI.
ChaosSearch has updated its namesake data platform with new APIs that let users run SQL queries against cloud data lake storage. The update is available now.
The vendor, based in Boston, has built out its cloud data lake platform over the past year, launching its 2.0 platform with an ElasticSearch API in 2019.
ChaosSearch technology enables organizations to organize and query data stored in cloud object storage, such as Amazon S3. With the ElasticSearch API, ChaosSearch supported log data searches. The vendor is now adding a SQL API that extends the platform to support analytics and business intelligence technologies.
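To make the log search side concrete, here is a minimal sketch of the kind of request body an ElasticSearch-compatible search API accepts. It uses the standard ElasticSearch query DSL. The field names `message` and `@timestamp` and the helper `build_log_search` are illustrative assumptions, not details confirmed by ChaosSearch, and the article does not say how much of the DSL the platform supports.

```python
import json


def build_log_search(term, since="now-1h", size=20):
    """Build an ElasticSearch query DSL body for a simple log search.

    Assumption: logs carry a text field named "message" and a date field
    named "@timestamp". Both names are hypothetical.
    """
    return {
        "size": size,
        # Newest log lines first.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Full-text match on the log message.
                "must": [{"match": {"message": term}}],
                # Restrict to recent entries without affecting scoring.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


body = build_log_search("timeout")
payload = json.dumps(body)  # JSON text that would be sent to a _search endpoint
```

The sketch only builds the JSON payload. Sending it to an actual endpoint, and the endpoint's URL and authentication, are left out because the article does not describe them.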
Among the organizations that use the ChaosSearch data platform is educational technology vendor Blackboard, based in Reston, Va. Joel Snook, director of DevOps engineering at Blackboard, explained that Blackboard's SaaS offerings are deployed in multiple AWS regions across the globe, producing hundreds of terabytes of ingestible logs a month.
"Our initial driver for moving to ChaosSearch was to centralize into one solution across multiple product lines with a familiar look and feel to an ELK stack [ElasticSearch, Logstash, Kibana] which the team was most familiar with," Snook said.
Expanding ChaosSearch with SQL
Snook noted that Blackboard uses multiple business intelligence products in its environment, but that the BI tools don't overlap with the log dashboard capabilities from ChaosSearch.
With the new SQL capabilities in ChaosSearch, Blackboard will have an opportunity to consolidate processes and use ChaosSearch as a data engine for more than just log data.
ChaosSearch CEO Ed Walsh noted that data consumers generally want to use their own tools to analyze data, but still need access to the data.
With the increasing use of cloud data lakes, requiring users to copy and move data into a separate tool is not a scalable or efficient approach, Walsh said.
He explained that with ChaosSearch, data in a cloud data lake is not moved or transformed. Rather, ChaosSearch overlays the data with an index that helps identify data sets and an API layer that enables access.
Enabling ChaosSearch SQL with Presto
For the SQL queries, Walsh said ChaosSearch supports a Presto API that lets an organization's existing analytics and BI tools query data in a ChaosSearch-enabled cloud data lake.
Walsh noted that many popular BI tools, including Power BI, Looker and Tableau, have a Presto connector to support SQL queries.
Presto is an increasingly popular open source query engine technology originally developed at Facebook. There are currently two different versions of Presto: PrestoDB and Trino, which was formerly known as PrestoSQL. ChaosSearch supports both versions on its data platform.
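As a rough illustration of what a Presto connector does under the hood, the sketch below builds the first HTTP request of the Presto client protocol: a POST of the SQL text to `/v1/statement`, with `X-Presto-*` headers for PrestoDB or `X-Trino-*` headers for Trino. The host name, catalog, schema, table and the helper `build_presto_request` are hypothetical, and the article does not confirm that ChaosSearch exposes exactly this endpoint.

```python
import urllib.request


def build_presto_request(base_url, sql, user, catalog, schema, dialect="presto"):
    """Build, but do not send, the initial Presto client protocol request.

    PrestoDB clients use X-Presto-* headers; Trino clients use X-Trino-*.
    """
    prefix = "X-Trino" if dialect == "trino" else "X-Presto"
    headers = {
        f"{prefix}-User": user,
        f"{prefix}-Catalog": catalog,
        f"{prefix}-Schema": schema,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),  # the SQL statement is the raw request body
        headers=headers,
        method="POST",
    )


# Hypothetical endpoint, catalog, schema and table names.
sql = (
    "SELECT status, count(*) AS hits "
    "FROM app_logs GROUP BY status ORDER BY hits DESC LIMIT 10"
)
req = build_presto_request(
    "https://presto.example.invalid/", sql,
    user="analyst", catalog="datalake", schema="default",
)
```

In practice a BI tool or a client library handles this exchange, including polling the `nextUri` returned by the server until the result set is complete. The point of the sketch is only that the same SQL text can be sent by any tool that speaks the protocol.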
Walsh said adding SQL to the ChaosSearch data platform is part of a broader effort to enable what he referred to as a multi-model approach for analyzing cloud data lakes.
The first model is the ElasticSearch API and the new model is SQL. More query models are planned, including one for machine learning that is currently in development and expected to be available in 2022.
"What we're saying is multi-model is different APIs and they're all open," Walsh said.