HPE intros GreenLake data lakehouse fabric for analytics
HPE taps multiple open source technologies, including Kubernetes, Apache Spark and Delta Lake, for developing new data fabric services on its GreenLake cloud platform.
HPE said on Tuesday it is adding new unified data lakehouse capabilities to its GreenLake platform, with general availability of the new data services expected in early 2022.
HPE GreenLake is a hybrid cloud platform that enables users to run applications and services both on premises and in the cloud. Notable among the new services HPE unveiled is the Ezmeral Data Fabric Object Store, a Kubernetes-based storage technology that will run across hybrid environments.
HPE also introduced Ezmeral Unified Analytics, a cloud data lakehouse platform built with a group of open source technologies that provide a data fabric for users to run data analytics and business intelligence workloads.
Underlying the data fabric are several open source technologies, including the Apache Spark processing engine and Delta Lake, the open storage layer that enables the lakehouse architecture.
Delta Lake was originally created by Databricks and is now an open source project run by the Linux Foundation. With Ezmeral Unified Analytics, HPE is targeting the data lakehouse market now dominated by Databricks.
"Amidst the cloud hoopla, it's easy to forget that a lot of data will remain on premises for the foreseeable future, thanks to factors like data gravity and sovereignty requirements," said Kevin Petrie, an analyst at Eckerson Group. "The challenge therefore is not just figuring out how to optimize data workloads on the cloud, but also how to optimize them in hybrid environments that comprise edge, data center, cloud and multi-cloud infrastructure."
Petrie noted that HPE's new services aim to optimize both BI and data science workloads, with containerized application management and a data lakehouse that spans hybrid environments.
HPE looks to data lakehouse model with GreenLake
In the webcast event introducing the new GreenLake services, HPE CEO Antonio Neri emphasized the hybrid nature of the vendor's platform.
"We unify your data globally and make it available to all your analytics teams where the data is at the edge in an enterprise data warehouse, on premises, a cloud data lake, or on other cloud platforms such as Snowflake," Neri said.
Neri said that with the new Kubernetes-native object store, HPE is providing an Amazon S3-compatible API. He added that HPE's goal with the new GreenLake services is to enable users to combine data from files, objects, event streams and databases into the same data fabric.
The main benefit is that organizations will be able to manage disparate sources of data in a single platform that accelerates time to insights, according to Neri.
Open source foundation
In a media briefing, Matt Maccaux, global field CTO for HPE Ezmeral Software, said an S3-compatible data layer is an important option for organizations.
"If you think about the need to be able to spin up a compute job somewhere, you probably want to think about spinning up ephemeral storage as well," Maccaux said. "We know that these applications are oftentimes written for the S3 API, so we have developed an object store that is deployable by the same runtime as the compute services, and then it stretches back the connection to the overall fabric."
By using the open source Delta Lake lakehouse technology, Maccaux said HPE is trying to provide an approach that won't lock organizations into a single vendor.
"We don't think it makes sense to go from one legacy proprietary stack, just to get locked into another one in a public cloud," Maccaux said.