StarRocks brings open source OLAP database to the cloud
The open source online analytical processing DBaaS startup takes a different approach to accelerating queries as it moves to the cloud.
Open source online analytical processing database vendor StarRocks on Thursday launched a beta version of its database-as-a-service cloud offering.
StarRocks got its start in 2020 as a fork of the open source Apache Doris database, which is optimized for OLAP workloads.
To date, StarRocks has been available as an open source technology that organizations can run and manage on their own. With the StarRocks Cloud service, the startup joins the move toward open source DBaaS platforms that has grown over the past two years.
The StarRocks database differs from other OLAP databases in that it doesn't require organizations to transform star schema tables into what are commonly referred to as denormalized tables in order to optimize analytics queries.
To meet performance requirements for OLAP, databases typically need to denormalize tables. The problem is that denormalization makes data bulkier, pipelines more complex, and tables slower to update, said Kevin Petrie, an analyst at Eckerson Group.
"Enterprises struggle to support real-time business intelligence use cases such as reporting and dashboards," Petrie said. "StarRocks aims to speed up queries without denormalizing tables. The hope and opportunity is to meet OLAP performance requirements in a simpler, cleaner way."
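The trade-off Petrie describes can be seen in a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in, not StarRocks itself, and the table and column names (fact_sales, dim_product, flat_sales) are hypothetical. Both layouts answer the same query, but changing one product's category touches a single row in the star schema and every matching row in the denormalized table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: a fact table plus a small dimension table (hypothetical names).
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
cur.execute("CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "games")])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 2, 40.0), (4, 1, 5.0)],
)

# Denormalized copy: the category is repeated on every sales row.
cur.execute(
    "CREATE TABLE flat_sales AS "
    "SELECT s.sale_id, s.product_id, p.category, s.amount "
    "FROM fact_sales s JOIN dim_product p ON s.product_id = p.product_id"
)

# The same report, written against each layout.
star_totals = cur.execute(
    "SELECT p.category, SUM(s.amount) FROM fact_sales s "
    "JOIN dim_product p ON s.product_id = p.product_id "
    "GROUP BY p.category ORDER BY p.category"
).fetchall()
flat_totals = cur.execute(
    "SELECT category, SUM(amount) FROM flat_sales "
    "GROUP BY category ORDER BY category"
).fetchall()

# Renaming a category: one dimension row versus every matching flat row.
star_rows_touched = cur.execute(
    "UPDATE dim_product SET category = 'print' WHERE product_id = 1"
).rowcount
flat_rows_touched = cur.execute(
    "UPDATE flat_sales SET category = 'print' WHERE product_id = 1"
).rowcount

print(star_totals == flat_totals)            # same answer from both layouts
print(star_rows_touched, flat_rows_touched)  # rows rewritten by the update
```

The flat table avoids the join at query time, which is why OLAP systems often favor it, but the update cost grows with the number of fact rows that carry the repeated value.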
StarRocks competes against a number of databases, including Apache Pinot, which has a cloud version from vendor StarTree, and the Rockset real-time indexing database. StarRocks still needs to get its cloud service live on multiple cloud providers. The initial target is AWS, with others to follow.
How StarRocks accelerates the OLAP database
Li Kang, vice president of strategy at StarRocks, said the vendor does not denormalize data and instead has developed a new query engine that accelerates analytics queries on star schema data.
A challenge with denormalization is that it makes updating and deleting data more complex. Kang said StarRocks' data query engine also addresses that problem with a real-time data access approach.
The underlying architecture of StarRocks is what is known as a massively parallel processing database, which runs multiple operations at the same time to enable scalability and high performance.
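The general idea behind massively parallel processing can be sketched as a scatter-gather aggregation: rows are split across workers, each worker aggregates its own partition, and a coordinator merges the partial results. The sketch below is an illustration of that pattern only, not StarRocks' implementation; the function names and the round-robin partitioning are assumptions, and Python threads stand in for what would be separate nodes in a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Each worker aggregates only its own slice of the rows."""
    totals = Counter()
    for category, amount in partition:
        totals[category] += amount
    return totals


def parallel_group_sum(rows, workers=3):
    # Scatter: split the rows into one partition per worker (round-robin here).
    partitions = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Gather: merge the partial aggregates into the final result.
    merged = Counter()
    for part in partials:
        merged.update(part)
    return dict(merged)


rows = [
    ("books", 10), ("games", 40), ("books", 15),
    ("toys", 7), ("books", 5), ("games", 3),
]
result = parallel_group_sum(rows)
print(result)
```

Because each partition is processed independently, adding workers lets the same query cover more data in roughly the same time, which is the scalability property the architecture is aiming for.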
To further accelerate queries, StarRocks has also built its own cost-based optimizer (CBO) for query execution. The CBO optimizes queries based on how the data is distributed and the database tables are structured.
When it becomes generally available, StarRocks Cloud will have two deployment options for users. One is a managed service, where users deploy StarRocks into their own virtual private cloud.
The other option is sometimes referred to as the serverless DBaaS approach. In this mode, StarRocks is a service that an organization subscribes to, with all the infrastructure resources managed by StarRocks in the cloud.