Cloud giant AWS on Tuesday launched a series of new data services aimed at easing the movement of data between AWS services.
During the opening keynote on the second day of the AWS re:Invent 2022 user conference, AWS CEO Adam Selipsky outlined a series of new services designed to help organizations better use data in the cloud.
Among the services is Amazon DataZone, which provides data management capabilities including data catalog and data governance.
Another new data management service is a serverless version of the Amazon OpenSearch service that is now available in preview.
Also in preview are a pair of services designed to enable better data integration without the need for extract, transform and load (ETL) operations across AWS cloud data services. The new capabilities include the Amazon Aurora zero-ETL integration with Amazon Redshift and Amazon Redshift integration for Apache Spark.
AWS competes against Microsoft Azure and Google Cloud Platform. A continuing challenge for AWS is the ability to support third-party data sources that reside outside its cloud platform.
AWS is trying to be all things to all people, said Hyoun Park, an analyst at Amalgam Insights.
The additions revealed at re:Invent 2022 all relate to Amazon needing to be more open, scale faster and provide more context to users, Park said. And the combination of serverless capabilities and increased data source support responds to the need to translate data into business insights.
"These announcements go hand in hand with Amazon's focus on supply chain, which requires the translation of massive amounts of data into practical analysis and recommendations and shows Amazon's continued role as a data and analytics provider working at massive scale," Park said.
AWS looks to ease cloud data integration
Selipsky said during his keynote that a goal for AWS is to build integrations between its services to make it easier to do analytics and machine learning without having to deal with ETL tasks.
"What if we could eliminate ETL entirely? That would be a world we would all love," Selipsky said. "This is our vision for what we're calling a zero ETL future."
He noted that one area where organizations spend time building and managing ETL pipelines is between transactional databases and data warehouses.
To reduce the need for ETL in that use case, the cloud giant built the new Amazon Aurora zero-ETL integration with Amazon Redshift. The goal of the integration is to let users of the Amazon Aurora relational database easily move data to the Amazon Redshift data warehouse.
"This integration brings together transactional data with analytics capabilities, eliminating all of the work of building and managing customer data pipelines between Aurora and Redshift," Selipsky said.
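For context, the kind of hand-built pipeline Selipsky describes can be sketched in a few lines. The example below is only an illustration, not AWS code: it uses Python's built-in sqlite3 module as a stand-in for both the transactional database and the data warehouse, and the table names (orders, daily_sales), the run_etl function and the daily-total aggregation are all hypothetical.

```python
import sqlite3


def run_etl(oltp, warehouse):
    """Copy per-day order totals from a transactional store into a warehouse table."""
    # Extract: read raw order rows from the transactional database.
    rows = oltp.execute("SELECT order_date, amount_cents FROM orders").fetchall()

    # Transform: aggregate individual orders into one total per day.
    totals = {}
    for order_date, amount_cents in rows:
        totals[order_date] = totals.get(order_date, 0) + amount_cents

    # Load: write the aggregated rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "order_date TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?)",
        sorted(totals.items()),
    )
    warehouse.commit()
    return len(totals)


# Stand-in transactional database with a few hypothetical orders.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_date TEXT, amount_cents INTEGER)"
)
oltp.executemany(
    "INSERT INTO orders (order_date, amount_cents) VALUES (?, ?)",
    [("2022-11-28", 1250), ("2022-11-28", 800), ("2022-11-29", 4999)],
)

# Stand-in data warehouse, kept as a separate database.
warehouse = sqlite3.connect(":memory:")
days_loaded = run_etl(oltp, warehouse)
result = warehouse.execute(
    "SELECT order_date, total_cents FROM daily_sales ORDER BY order_date"
).fetchall()
print(days_loaded, result)
```

Even in this toy form, the pipeline has to be scheduled, monitored and updated whenever the source schema changes, which is the ongoing work a zero-ETL integration is meant to remove.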
The Amazon Redshift integration for Apache Spark further reduces the need for ETL. Apache Spark is a widely used analytics query engine. Previously, Redshift users needed to move data to a different location, such as an Amazon S3 data lake, to run Spark queries. With the integration, Spark applications can work with data in Redshift directly.
With DataZone, AWS wants to govern cloud data
AWS is also taking aim at the data catalog and governance market with the preview of Amazon DataZone. DataZone lets users catalog, discover, share and govern data across an organization.
DataZone provides a data catalog accessible through a web portal where users within an organization can find data that can then be used for analytics, business intelligence and machine learning. All the data in DataZone is governed by access and use policies that the organization can define, and data lineage is also tracked.
"To unlock the full power and full value of data, we need to make it easy for the right people and applications to find, access and share the right data when they need it," Selipsky said.