In a move aimed at keeping current with the latest data management technology, Oracle on Tuesday launched a beta release of its new MySQL HeatWave Lakehouse service.
MySQL HeatWave is a cloud service that Oracle first brought to market in December 2020, combining analytics and transactional database capabilities into a unified offering.
Over the last two years, Oracle has steadily iterated on MySQL HeatWave, adding autopilot and machine learning capabilities. Now Oracle is taking the service a step further with the new MySQL HeatWave Lakehouse beta, unveiled Tuesday at its CloudWorld user conference in Las Vegas, which enables users to access cloud object-based storage, including data lakes.
The basic idea behind a lakehouse is that it combines the capabilities of data warehouses and data lakes, enabling analytics queries on large volumes of data.
Oracle is following a larger trend of enabling data lakehouse capabilities that has emerged in the market in recent years. Databricks is the vendor that pioneered the concept and the term data lakehouse, while other vendors including Snowflake, Dremio, Google, Starburst and Cloudera now have services that support data lakehouses.
Instead of just being limited to data stored directly within MySQL, the lakehouse concept enables access to external data located in object storage, which provides more overall capacity.
It's an approach that Ron Westfall, analyst at Futurum Research, said he sees as helping Oracle to better compete in the market by providing a service that can query data from cloud object storage.
"The ability to provide analytics on hundreds of terabytes of data in object store delivers built-in capacity advantages into the foreseeable future, exceeding even the most demanding capacity environments today with abundant clearance," Westfall said.
The evolution of MySQL HeatWave to the data lakehouse
Oracle's move to a data lakehouse for MySQL HeatWave is a recognition that organizations often want to work with data outside of a database, in other repositories such as data lakes and cloud data warehouses.
"Although we do get a lot of customers migrating into MySQL HeatWave, we do recognize that there is a huge growth of data outside the databases in files," said Nipun Agarwal, senior vice president of research and advanced development at Oracle.
MySQL HeatWave Lakehouse supports multiple file types stored in cloud object storage systems, including the open source Apache Parquet format and comma-separated values (CSV) files, as well as backups from the Amazon Aurora and Amazon Redshift services.
MySQL HeatWave Lakehouse users will be able to run queries that cut across data in the database and combine it with data that's in the object store.
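The idea of a query spanning both sources can be illustrated with a simplified sketch. This is not HeatWave's actual mechanism; it uses sqlite3 as a stand-in for the database and a CSV string as a stand-in for an object-store file, with hypothetical table and column names, to show what "combining database data with object-store data" means in practice.

```python
import csv
import io
import sqlite3

# Stand-in for the database side: an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Stand-in for the object-store side: a CSV file of orders.
orders_csv = "customer_id,amount\n1,40\n2,60\n1,10\n"
orders = list(csv.DictReader(io.StringIO(orders_csv)))

# Join the database rows with the file rows on customer id,
# totaling order amounts per customer name.
names = {cid: name for cid, name in db.execute("SELECT id, name FROM customers")}
totals = {}
for order in orders:
    name = names[int(order["customer_id"])]
    totals[name] = totals.get(name, 0) + int(order["amount"])
print(totals)  # {'Ada': 50, 'Grace': 60}
```

In HeatWave itself this join happens inside the query engine; the sketch only shows the shape of the result a cross-source query produces.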
How the MySQL HeatWave data lakehouse works
The MySQL database has long had the InnoDB storage engine, which is optimized for performance, as a primary option for users to store and access data.
When data is read from a MySQL database table, the MySQL HeatWave service transforms the data into an in-memory representation that's optimized for scalability. The in-memory representation holds the same data as the database table, but it resides in system memory rather than needing to be loaded from storage first.
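The general idea of such an in-memory representation can be sketched as converting row-oriented table data into contiguous per-column arrays, which analytics engines commonly scan without touching columns a query doesn't need. This is an illustrative sketch of the concept, not Oracle's implementation:

```python
from array import array

# Row-oriented records, as a transactional table would store them.
rows = [(1, 19.99), (2, 5.00), (3, 12.50)]  # (order_id, amount)

# Column-oriented in-memory layout: one contiguous array per column.
order_ids = array("q", (r[0] for r in rows))  # 64-bit integers
amounts = array("d", (r[1] for r in rows))    # doubles

# An analytic scan such as SUM(amount) now reads one contiguous array,
# skipping the order_ids column entirely.
total = sum(amounts)
print(round(total, 2))  # 37.49
```

The same trade-off motivates keeping the representation in memory: a column scan over RAM avoids the per-query cost of reading row data from storage.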
With the new lakehouse capability, MySQL HeatWave is taking a similar approach for cloud object storage data.
Data from cloud object storage, such as a CSV file, is first read by the MySQL HeatWave Lakehouse service and then transformed into an in-memory representation. The transformation is enabled in part by an auto schema inference capability that automatically takes information from a cloud object storage file and converts it into a schema with the right column structure.
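What a schema inference step must do, conceptually, is sample a file's values and guess a column type for each column. The following is a minimal sketch of that idea, assuming a simple INT/DOUBLE/VARCHAR type ladder; it is not Oracle's algorithm:

```python
import csv
import io

def infer_schema(csv_text, sample_rows=100):
    """Guess a type for each CSV column by sampling rows,
    widening INT -> DOUBLE -> VARCHAR as values fail to parse."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    types = ["INT"] * len(header)  # start with the narrowest type

    def widen(current, value):
        if current == "VARCHAR":
            return current  # already the widest type
        try:
            int(value)
            return current  # still fits the current numeric type
        except ValueError:
            try:
                float(value)
                return "DOUBLE"
            except ValueError:
                return "VARCHAR"

    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, value in enumerate(row):
            types[col] = widen(types[col], value)
    return dict(zip(header, types))

sample = "id,price,city\n1,9.99,Austin\n2,12.50,Reno\n"
print(infer_schema(sample))  # {'id': 'INT', 'price': 'DOUBLE', 'city': 'VARCHAR'}
```

A production implementation would also infer string lengths, nullability and date formats, but the sampling-and-widening pattern is the core of the technique.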
Different cloud object storage systems and file formats can also potentially have different performance characteristics that can affect the loading of data into MySQL HeatWave for queries. To address this, Oracle is providing a feature it calls adaptive data flow that automatically adjusts to the performance of the underlying object storage system, wherever it is running.
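One common way to adapt to storage performance is a feedback loop that grows or shrinks the read batch size based on measured throughput. The sketch below is hypothetical and not Oracle's algorithm; the target throughput and size bounds are made-up parameters used only to illustrate the feedback idea:

```python
def adapt_batch_size(batch_size, observed_mb_s, target_mb_s=100,
                     min_size=1, max_size=1024):
    """Grow the batch when storage keeps up with the target
    throughput; shrink it when storage lags behind."""
    if observed_mb_s >= target_mb_s:
        return min(batch_size * 2, max_size)
    return max(batch_size // 2, min_size)

size = 64
for throughput in (120, 150, 80, 110):  # simulated per-batch readings
    size = adapt_batch_size(size, throughput)
print(size)  # 64 -> 128 -> 256 -> 128 -> 256
```

The loop converges on a batch size the underlying object store can sustain, which is the behavior the article attributes to adaptive data flow.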
The purpose of the optimizations, with auto inference and adaptive data flow, is to speed up object storage queries in the lakehouse.
"There is no performance hit regardless of whether the data is in InnoDB or an external object store," Agarwal said.