Oracle launched MySQL HeatWave Lakehouse, a new cloud-based service designed to make lakehouse queries as fast and easy as database queries.
Oracle first previewed MySQL HeatWave Lakehouse in October 2022, making it available in beta testing at the time. It is now generally available.
While a new addition to the MySQL HeatWave portfolio, the MySQL HeatWave Lakehouse is not Oracle's first data lakehouse. Oracle also offers lakehouse capabilities in its Autonomous Data Warehouse -- a fully managed version of Oracle Database -- which serves a different user base than the MySQL HeatWave suite.
"Oracle Database is the higher-end corporate offering, and MySQL is ... for a variety of lower-budget users who still demand good database management system support," said Carl Olofson, an analyst at IDC.
Data lakehouses, pioneered by Databricks but now available from a variety of vendors including Snowflake and Google, essentially combine the capabilities of data warehouses and data lakes.
Data warehouses excel at storing structured data such as financial and transaction records. Data lakes, meanwhile, use object storage and are designed to house unstructured data such as text and video files. But because each specializes in storing only some types of data, deploying both can leave data isolated, and combining it to get a more complete view of an operation takes considerable manual labor.
Data lakehouses, however, are able to house both structured and unstructured data -- as well as semistructured -- so users can more easily combine diverse data types and better understand what is happening within their organization.
Because of their flexibility, Matt Aslett, an analyst at Ventana Research, said he expects data lakehouses to become significantly more widely used over the next two years.
He noted that object storage has become an inexpensive -- and common -- way for organizations to store data. But without structure, the data stored in data lakes is difficult to use to inform decisions. Lakehouses are a way to address this problem, enabling unstructured data to be combined with structured data.
"We are seeing growing interest in the lakehouse approach, especially among organizations that have already invested in data lake environments," Aslett said. "I assert that by 2025, 8 in 10 current data lake adopters will be investing in data lakehouse architecture to improve the business value generated from their accumulated data."
Beyond enabling users to easily combine diverse data types, lakehouses also automate much of the work to do so, which is key, according to Holger Mueller, an analyst at Constellation Research.
"Lakehouses are the revival of big data -- which got a bad reputation -- and the industry is running with it," he said. "Most importantly, [with lakehouses, big data] is finally working in an easy and automated way. It worked before, but when the consultants left [organizations to themselves], the projects tended to stop."
Oracle first unveiled the MySQL HeatWave Database in 2020.
MySQL HeatWave is an in-memory managed database service using the open source MySQL platform as its foundation, with Oracle adding its own capabilities on top. And though Oracle runs its own cloud, the MySQL HeatWave service is available on both AWS and Microsoft Azure in addition to the Oracle Cloud Infrastructure.
Since launching it three years ago, Oracle has enhanced the MySQL HeatWave service with MySQL Autopilot, a machine learning-powered automation capability designed to learn from past queries to improve the execution of future queries.
On July 20, the tech giant extended its MySQL HeatWave service beyond just databases to include data lakehouses. The move was significant, Mueller said, because it both brings together structured and unstructured data and speeds querying with Autopilot.
"Bringing structured and unstructured [data] together is a key achievement and benefit from an insights perspective," he said. "And the setup with Autopilot makes it easy and fast."
Aslett, meanwhile, noted that there are two approaches to data lakehouses.
One approach injects the functionality of data warehouses into the data lake environment to combine the capabilities of the two.
The other keeps data warehouses and data lakes somewhat separate, using the data lake for low-cost storage and then applying predetermined schema -- effectively giving structure to the data -- from an associated data warehouse to the previously unstructured data.
According to Oracle, MySQL HeatWave Lakehouse enables users to query data in object storage, but does not create a single environment, essentially taking the second approach to the lakehouse architecture.
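The second approach, applying a predetermined schema to files that stay in low-cost storage, can be illustrated with a minimal Python sketch. It is a generic illustration of the idea, not Oracle's implementation: the file contents, the column names and the `read_with_schema` and `total_by_region` helpers are all hypothetical, and an in-memory string stands in for an object in storage.

```python
import csv
import io

# Hypothetical raw file as it might sit in object storage: plain text, no types.
RAW_OBJECT = "order_id,amount,region\n1,19.99,EU\n2,5.00,US\n3,42.50,EU\n"

# Hypothetical predetermined schema: column name -> type converter.
SCHEMA = {"order_id": int, "amount": float, "region": str}


def read_with_schema(raw_text, schema):
    """Parse raw CSV text and cast each column according to the schema."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in reader]


def total_by_region(rows):
    """Sum the typed 'amount' column per region, as a simple analytic query."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


# The raw text is never rewritten; structure is applied only when it is read.
rows = read_with_schema(RAW_OBJECT, SCHEMA)
totals = total_by_region(rows)
print(totals)
```

The point of the sketch is that the stored data is left in place and untyped, while the schema supplied at read time is what makes it usable for a query.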
One significant benefit of that approach is cost savings, since data does not have to be moved, according to Aslett.
"MySQL HeatWave Lakehouse enables users to query data in low-cost object storage from MySQL HeatWave without the cost and complexity of moving it to the database," he said. "The advantage of this approach is that it relatively inexpensively facilitates analytics on large volumes of data."
But there is a disadvantage, he continued.
Query speed can be slower when querying data in external object storage compared with querying data stored in-database. Oracle, however, asserts that it has eliminated that concern with its deployment of Autopilot.
"Oracle's claim that customers can query data in object storage as fast as querying data in the database is therefore significant," Aslett said.
Query speed is important because cloud platforms charge customers not only for the amount of compute power they consume, but also for the amount of time they spend using the service. Every second counts.
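The arithmetic behind that point can be sketched in a few lines of Python. The hourly rate and the runtimes below are made-up numbers chosen only to show the relationship between runtime and cost; they are not Oracle's or any other vendor's pricing, and `query_cost` is a hypothetical helper.

```python
def query_cost(hourly_rate, runtime_seconds):
    """Cost of a job billed by time: rate per hour times hours used."""
    return hourly_rate * runtime_seconds / 3600


# Hypothetical figures: the same cluster rate, two different runtimes.
HOURLY_RATE = 10.0      # assumed price per cluster-hour
SLOW_RUNTIME = 1800     # assumed 30-minute query, in seconds
FAST_RUNTIME = 360      # assumed 6-minute query, in seconds

slow_cost = query_cost(HOURLY_RATE, SLOW_RUNTIME)
fast_cost = query_cost(HOURLY_RATE, FAST_RUNTIME)
print(slow_cost, fast_cost)
```

Under a time-based pricing model like the one assumed here, cutting the runtime by a given factor cuts the cost of that query by the same factor.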
"The more time they spend in the cloud, the higher the bill," said Steve Zivanic, Oracle's vice president of database and autonomous services, product marketing. "Delivering at these accelerated speeds, [users] are going to get a lower bill. There's a purely economic rationale."
The impetus for developing MySQL HeatWave Lakehouse, meanwhile, came from customer requests, according to Nipun Agarwal, Oracle's senior vice president of MySQL Database and HeatWave.
He noted that when Oracle enabled users to bring analytic processing to MySQL, many had unstructured data in files that they were unable to use for analysis.
"It was a pain point, and we thought we could extend the capabilities of HeatWave to address it," Agarwal said. "We needed to combine object storage with MySQL data."
Just as development of MySQL HeatWave Lakehouse was driven by customer requests, the roadmap for the MySQL HeatWave suite will be based on feedback from customers, according to Agarwal.
That focus on customer requests is a good strategy, according to Olofson.
"Their best bet is to stay close to their users, hear what they say and watch how competitors may attempt to woo them away," he said.
In addition, Zivanic noted that Oracle has plans to make generative AI part of its entire data management and analytics portfolio in the coming months.
Mueller, meanwhile, said Oracle is among the more full-featured cloud database providers, and its capabilities frequently exceed those of its competition. As a result, it has no obvious weakness when compared with peers, and therefore no obvious functionality to target for improvement.
"They are the fastest-innovating cloud database, and there is very little left [to add]," he said. "If there were an Oscar for database innovation, Agarwal and team would have won it the last few years."
Where there might be room for growth, beyond the infusion of generative AI mentioned by Zivanic, is in moving beyond data storage, Mueller continued.
"They might pivot into ... more data operations and application development," he said. "There's nothing really left on the database side."
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.