Date: 2022-11-30

AWS continued growing its cloud data capabilities with a series of features to help enterprises scale their database services and ensure data quality.

The features the tech giant revealed on Wednesday follow a series of updates it introduced on Nov. 29 at its re:Invent 2022 conference, including the new DataZone data catalog and governance service.

Among the new services AWS rolled out on Wednesday is the Amazon DocumentDB Elastic Clusters service intended to help document database workloads more easily scale up and down based on traffic requirements.

The Amazon Redshift cloud data warehouse also got a new multi-zone, high-availability configuration. AWS additionally brought data quality capabilities to the AWS Glue metadata discovery service.

AWS competes mainly against Microsoft Azure and Google Cloud Platform. Helping users easily manage scalability of database services is a challenge all three major public cloud vendors are addressing.

AWS and cloud technology itself are maturing, so there are fewer new areas for cloud vendors to push into, said Doug Henschen, an analyst at Constellation Research. The tech giants' shift to polishing services and filling gaps in their existing portfolios of services is understandable.

"One of those gaps was data quality. So the Glue Data Quality was a welcome -- and one could say overdue -- announcement," Henschen said. "It provides automated ways to generate data quality rules."

Henschen noted that organizations that previously struggled with data quality were likely already turning to third-party partners for data quality services through the AWS Marketplace.

Improving data quality in the cloud

Organizations are now commonly using data lakes, often with Amazon S3 cloud object storage, as a foundational element of data analytics and business intelligence.

In a keynote at the conference, Swami Sivasubramanian, vice president of databases, analytics and machine learning at AWS, said a challenge with data lakes is that if organizations don't monitor the data quality, the lakes may become "data swamps."

"Customers told us building [those] data quality rules across data lakes and data pipelines is very, very time consuming and very error prone," he said.

The AWS Glue Data Quality service can generate automated data quality rules for data sets. The rules ensure the accuracy and freshness of data in a data lake or data pipeline, Sivasubramanian said.

"Rules can be applied to your data pipelines so poor-quality data does not even make it to your data lakes in the first place," he said.

The new service can run continuously; if data quality deteriorates for any reason, the organization is alerted.
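
To make the idea of rules gating a pipeline concrete, here is a minimal sketch in plain Python. It is not the AWS Glue Data Quality API or its rule language; the rule helpers, record fields (`order_id`, `updated_at`) and thresholds are all hypothetical, chosen only to illustrate checks for completeness and freshness of the kind described above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule builders. Each returns a function that takes a batch
# (a list of dict records) and returns True when the batch passes.

def completeness(column, min_ratio):
    """Pass when at least min_ratio of records have a non-null value in column."""
    def rule(records):
        if not records:
            return False
        filled = sum(1 for r in records if r.get(column) is not None)
        return filled / len(records) >= min_ratio
    return rule

def freshness(column, max_age, now):
    """Pass when the newest timestamp in column is no older than max_age."""
    def rule(records):
        stamps = [r[column] for r in records if r.get(column) is not None]
        return bool(stamps) and now - max(stamps) <= max_age
    return rule

def gate(records, rules):
    """Return the names of failed rules; an empty list means the batch may load."""
    return [name for name, rule in rules.items() if not rule(records)]

# Fixed reference time so the example is deterministic.
now = datetime(2022, 11, 30, tzinfo=timezone.utc)

rules = {
    "order_id_complete": completeness("order_id", 0.95),
    "updated_recently": freshness("updated_at", timedelta(hours=24), now),
}

good_batch = [
    {"order_id": 1, "updated_at": now - timedelta(hours=1)},
    {"order_id": 2, "updated_at": now - timedelta(hours=3)},
]
stale_batch = [
    {"order_id": None, "updated_at": now - timedelta(days=3)},
    {"order_id": 4, "updated_at": now - timedelta(days=2)},
]

good_failures = gate(good_batch, rules)    # no rule fails, so the batch may load
stale_failures = gate(stale_batch, rules)  # both rules fail, so the batch is held back
```

In a real pipeline, a non-empty failure list would block the load into the data lake and trigger an alert, which is the behavior Sivasubramanian described; how the managed service implements this is not covered here.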