AWS adds serverless capability to S3 object storage

Amazon introduced a feature to process data as it is being retrieved from S3. This allows use cases such as on-the-fly format conversion or PII redaction.

Johnny Yu

Published: 18 Mar 2021

AWS users can now provide multiple views of the same data set without generating another copy of the data.

Amazon launched S3 Object Lambda on Thursday, letting users run their own code on data as it is retrieved from S3 storage. The data is processed before it reaches the application that requested it, which enables use cases such as personally identifiable information (PII) masking and compressing or decompressing files as they are downloaded.

This feature is particularly helpful in situations where multiple applications need different views of the same data. Normally, each application would need its own customized version of the data, such as a "clean" version that's been scrubbed of PII or an "enhanced" version that has information from other services or databases.

Amazon S3 Object Lambda saves users from needing to generate these extra copies, which take up storage space. Additionally, the Lambda function runs as part of a standard S3 GET request, so there is no coding change necessary at the application level. Other ways Amazon S3 Object Lambda can modify data as it's being retrieved include resizing images, converting data formats and implementing custom access rules.
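As a rough illustration of the decompress-on-download case, here is a minimal Python sketch of what such a function might look like. The event fields (`getObjectContext`, `inputS3Url`, `outputRoute`, `outputToken`) and the `write_get_object_response` call follow AWS's published documentation for the feature rather than anything stated in this article. The `transform` and `_download` helpers and the optional `s3_client` and `fetch` parameters are hypothetical names added only so the sketch can be exercised without an AWS account.

```python
import gzip
import urllib.request


def transform(body: bytes) -> bytes:
    """Decompress gzip data; pass anything else through unchanged."""
    if body[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(body)
    return body


def _download(url: str) -> bytes:
    """Fetch the original object through the presigned URL S3 supplies."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def handler(event, context, s3_client=None, fetch=None):
    # s3_client and fetch are hypothetical test hooks, not part of the
    # Lambda interface; Lambda itself only passes event and context.
    ctx = event["getObjectContext"]
    if fetch is None:
        fetch = _download
    if s3_client is None:
        import boto3  # imported lazily so transform() works without boto3
        s3_client = boto3.client("s3")

    original = fetch(ctx["inputS3Url"])

    # Hand the transformed bytes back to S3, which returns them to the caller
    # of the original GET request.
    s3_client.write_get_object_response(
        Body=transform(original),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"status_code": 200}
```

The sketch reads the whole object into memory, which is only reasonable for small objects; a production function would more likely stream the data.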

Customers wanted a redaction function built into S3, but it wasn't so simple, as each customer had slightly different definitions of redaction, said Kevin Miller, general manager of Amazon S3 at AWS. Some wanted to remove a column, others wanted to exclude a whole line if certain text matched, and others wanted to change the data, but not remove it.
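The three flavors of redaction Miller describes can be sketched as three small transformations. This is an illustrative sketch only, assuming well-formed CSV input with a header row; the function names (`drop_column`, `drop_matching_rows`, `mask_column`) and the sample data are hypothetical and are not part of any AWS API.

```python
import csv
import io
import re


def _read(text):
    """Parse CSV text into (field names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    return reader.fieldnames or [], list(reader)


def _write(fields, rows):
    """Serialize row dicts back to CSV text with the given column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def drop_column(text, column):
    """Remove one column entirely."""
    fields, rows = _read(text)
    kept = [f for f in fields if f != column]
    return _write(kept, [{f: r[f] for f in kept} for r in rows])


def drop_matching_rows(text, pattern):
    """Exclude every row in which any field matches the pattern."""
    fields, rows = _read(text)
    rx = re.compile(pattern)
    keep = [
        r for r in rows
        if not any(isinstance(v, str) and rx.search(v) for v in r.values())
    ]
    return _write(fields, keep)


def mask_column(text, column, mask="***"):
    """Keep the column but replace its values with a fixed mask."""
    fields, rows = _read(text)
    return _write(fields, [{**r, column: mask} for r in rows])


SAMPLE = "name,email,city\nAna,ana@example.com,Lima\nBo,bo@example.com,Oslo\n"
print(drop_column(SAMPLE, "email"))
print(drop_matching_rows(SAMPLE, r"Oslo"))
print(mask_column(SAMPLE, "email"))
```

Each variant is a few lines, but they produce different output from the same object, which is the situation a single built-in redaction feature would have struggled to cover.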

"We took a step back and said that there was probably something more generic we could do, rather than building one-off functionality for each use case. That led to the idea of S3 Object Lambda," Miller said.

Amazon S3 Object Lambda is available now in all AWS regions except for Asia Pacific (Osaka), AWS GovCloud (US-East), AWS GovCloud (US-West), China (Beijing) and China (Ningxia). Customers are charged for the compute required to run the Lambda functions and for the data returned to applications.

The new AWS service helps developers around specific tasks, said Ryan Marsh, CEO of TheStack.io, a consulting firm in Houston that specializes in digital transformation and serverless. For example, adding unique watermarks to files per user to check if content is shared outside the platform and redacting specific information from data could be done without S3 Object Lambda, but there are a lot of extra hoops to jump through and gotchas to avoid, Marsh said.

"Writing proxy Lambdas for object requests is a common bit of toil for serverless developers. This is one of those features you only get by listening closely to customers," Marsh said.

The release of S3 Object Lambda represents an industry trend toward moving compute closer to APIs and to the data, Marsh added. He has observed that companies are already offering serverless functions triggered by actions in their products.

"Distributed, bite-sized, event-driven compute is the future of software development in the cloud," Marsh said.
