AWS Lake Formation advances cloud data lake control

AWS added new features to its cloud data lake service, including Governed Tables to manage data consistency and row-level access control for security.

AWS on Tuesday revealed a series of updates to its AWS Lake Formation service that aim to enable organizations to better manage cloud data lakes.

AWS Lake Formation is a cloud data lake system that provides organizations with tools and capabilities to structure and manage data lake deployments using Amazon S3 object storage.

Lake Formation was first introduced in 2018, with general availability in 2019. The tech giant unveiled the new updates -- generally available now -- at its AWS re:Invent conference in Las Vegas, a hybrid event this year with live and streamed proceedings.

During the opening keynote, AWS CEO Adam Selipsky, who took over the vendor's top job in May, introduced the new AWS Lake Formation features, including row-level security, storage optimization and Governed Tables.

With the new features, AWS aims to make it easier for users to manage and secure data lake deployments on AWS.

The role of AWS Lake Formation and cloud data lakes

Constellation Research analyst Doug Henschen said he sees data lakes as serving an important role for organizations' data architectures.

"Data lakes help to eliminate data silos by providing a single source of truth that might be used by multiple systems," Henschen said.

AWS CEO Adam Selipsky revealed a series of new data lake management capabilities during his opening keynote at the AWS re:Invent 2021 conference.

Henschen noted that if isolated applications change the data, organizations need to support updates from the multiple systems that tap lakes as a "single source of truth" or reliable unified data record. That's where the new Governed Tables features have a role.

Henschen explained that transactions for Governed Tables help to automatically manage conflicts and errors, ensuring consistency when transactions from multiple systems update and query data lakes.

Why AWS is expanding AWS Lake Formation for cloud data lake management

During the keynote, Selipsky said AWS Lake Formation was created to make it easier for organizations to get a data lake up and running quickly.

"AWS Lake Formation helps you collect and catalog data from databases and object storage, move the data into your Amazon S3 data lake, clean and classify your data using machine learning algorithms, and then secure access to the sensitive data," Selipsky said.

While AWS Lake Formation has provided access control since its launch in 2019, Selipsky noted that AWS users have asked for a more targeted and direct way to govern access to data lakes. To that end, Selipsky unveiled new row- and cell-level security capabilities for the data lake system.

Selipsky explained that users can now enforce access controls for individual rows and cells, with AWS Lake Formation automatically filtering data and revealing only the data permitted by policy to authorized users.
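
The filtering behavior described above can be illustrated with a minimal sketch. It is not the Lake Formation API; the `DataFilter` class, the `apply_filter` function and the sample `orders` data are all hypothetical names chosen for illustration. The sketch assumes a policy is a row condition plus a list of permitted columns, and shows how a query result could be reduced to only what that policy allows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class DataFilter:
    """Hypothetical policy: which rows a user may see, and which columns."""
    row_predicate: Callable[[Row], bool]
    allowed_columns: FrozenSet[str]


def apply_filter(rows: List[Row], policy: DataFilter) -> List[Row]:
    """Return only the permitted rows, each reduced to the permitted columns."""
    return [
        {col: val for col, val in row.items() if col in policy.allowed_columns}
        for row in rows
        if policy.row_predicate(row)
    ]


# Illustrative data only; not taken from any real deployment.
orders = [
    {"order_id": 1, "region": "EU", "customer_email": "a@example.com", "total": 120},
    {"order_id": 2, "region": "US", "customer_email": "b@example.com", "total": 75},
    {"order_id": 3, "region": "EU", "customer_email": "c@example.com", "total": 40},
]

# Assumed policy: an EU analyst sees EU rows only, without the email column.
eu_analyst = DataFilter(
    row_predicate=lambda row: row["region"] == "EU",
    allowed_columns=frozenset({"order_id", "region", "total"}),
)

visible = apply_filter(orders, eu_analyst)
for row in visible:
    print(row)
```

In this sketch the user never receives the US order or any email address. The filtering happens before results are returned rather than relying on each application to hide the data.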

How Governed Tables improve AWS cloud data lakes

Governed Tables is a new type of table for data lakes, Selipsky said.

Governed Tables support ACID (atomicity, consistency, isolation and durability) transactions, which are designed to prevent data conflicts and ensure data consistency.
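
To make the idea of transactional conflict prevention concrete, here is a toy in-memory sketch. It does not reflect how AWS implements Governed Tables; `ToyGovernedTable`, `Transaction` and `ConflictError` are hypothetical names, and the coarse "first committer wins" rule is an assumption chosen for brevity. It shows two properties: a reader keeps a consistent snapshot while a writer commits, and a commit based on stale data is rejected rather than silently overwriting newer data.

```python
import threading


class ConflictError(Exception):
    """Raised when a transaction's base version is stale at commit time."""


class ToyGovernedTable:
    """In-memory stand-in for a transactional table (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._rows = {}  # committed state: key -> value

    def begin(self):
        # Each transaction remembers its starting version and a private snapshot.
        with self._lock:
            return Transaction(self, self._version, dict(self._rows))

    def _commit(self, txn):
        with self._lock:
            if txn.base_version != self._version:
                raise ConflictError("table changed since transaction began")
            # All staged writes become visible together (atomicity).
            self._rows = {**self._rows, **txn.staged}
            self._version += 1


class Transaction:
    def __init__(self, table, base_version, snapshot):
        self._table = table
        self.base_version = base_version
        self._snapshot = snapshot
        self.staged = {}

    def read(self, key):
        # Reads see the transaction's own writes, then its snapshot.
        return self.staged.get(key, self._snapshot.get(key))

    def write(self, key, value):
        self.staged[key] = value

    def commit(self):
        self._table._commit(self)


table = ToyGovernedTable()

loader = table.begin()
loader.write("sku-1", 10)
loader.write("sku-2", 5)

reader = table.begin()   # starts before the loader commits
loader.commit()

before = reader.read("sku-1")  # reader still sees its original snapshot

reader.write("sku-1", 99)      # write based on a stale view of the table
try:
    reader.commit()
    conflict_detected = False
except ConflictError:
    conflict_detected = True

after = table.begin().read("sku-1")
```

A production system would detect conflicts at a finer granularity than the whole table and would persist its state durably. The sketch only illustrates why concurrent pipelines can update data without readers seeing half-applied changes.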

"As data is added or changed in S3, AWS Lake Formation automatically prevents conflicts and errors to give all users a consistent view of the data," Selipsky said. "So now multiple sources and data pipelines can keep updating data in real time while users are querying data instead of having to wait for data to be updated in batches."
