Work Lambda into Amazon SQS queues to prevent backlogs

Amazon SQS enables users to process and track tasks in a queue. Combine it with Lambda and CloudWatch to add extra message processing functionality.

Amazon SQS is a managed service that sends messages between applications. Instead of applications having to invoke other applications directly, the service enables them to submit a message to a queue, which another application can then pick up at a later time.

With microservices on the rise, Amazon Simple Queue Service -- which is designed for tasks that process asynchronously -- has garnered a big developer following. Amazon SQS queues process events and ensure traceability among requests.

The role of a message queue

As an example, if you operate a restaurant, you might use a queue to place and manage customer orders. The kitchen staff would read the queue, and any number of cooks could pick up the next order, work on the order and then remove that order from the queue once it's complete. To alert the wait staff that the order is ready, the cook could place a message in another queue for waiter pickup.

[Figure: An example of a queue, which also applies to applications. A first-in, first-out queue can help with some workloads.]

You can apply this same type of pattern to any application and scale it to any level. At any time, you could add more waiters or cooks, depending on which queue is backing up. Even better, with SQS, you can determine how many pending messages exist in any given queue. So, if you have 100 pending items in your kitchen queue, you could easily add a few more cooks, without having to scale up your entire infrastructure.
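As a rough sketch of that idea, the snippet below reads a queue's approximate backlog and turns it into a worker count. It assumes `sqs` is a boto3 SQS client (for example, `boto3.client("sqs")`). The `cooks_needed` rule and its default numbers are hypothetical and only illustrate the kitchen example.

```python
import math


def pending_messages(sqs, queue_url):
    """Return the approximate number of visible messages in the queue."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def cooks_needed(pending, orders_per_cook=25, max_cooks=10):
    """Hypothetical rule: one worker per `orders_per_cook` pending items, capped."""
    if pending <= 0:
        return 0
    return min(max_cooks, math.ceil(pending / orders_per_cook))
```

With the defaults above, a backlog of 100 orders maps to four cooks. The count SQS reports is approximate, so a rule like this is better suited to coarse scaling decisions than to exact accounting.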

SQS can also provide first-in, first-out (FIFO) queues, which allow developers to guarantee that messages process in the exact order they are sent -- and process exactly one time. This is incredibly important for tasks that need to happen in a specific order or things that need to happen exactly once, such as financial transactions or order placement. For instance, you wouldn't want two cooks to pick up the same order and make double the amount of food.

Work past lacking Lambda support

AWS Lambda has also become popular with developers. It frees development teams from server management, including server software such as Apache, Internet Information Services or Nginx. Code doesn't need to run unless it's time to process a request, and you are charged only for the time it runs. Additionally, Amazon manages scaling for you, so if 100 orders suddenly come in, you can have 100 cooks at that exact moment, without having to plan ahead for the sudden burst in orders.

Both Amazon SQS and Lambda can help manage scale, so it would make sense for Lambda to listen for SQS events. Unfortunately, that's still not possible, but Amazon has said it's working on it. For now, developers have to rely on alternative methods to use SQS with Lambda.

One common workaround is to have an Amazon CloudWatch cron-style rule that triggers a Lambda function every few minutes to read from an SQS queue. However, while this approach does process Amazon SQS messages, it also triggers the Lambda function when there are no messages to read.

A better approach is to set CloudWatch alarms that fire whenever there are messages in the queue. CloudWatch alarms can trigger actions when they enter an alarm state, so you can set an alarm to go off when a particular Amazon SQS queue contains more than zero messages. This triggers a Lambda function to run, which processes messages from that queue.

Developers should set several alarms in case there are messages that continue to flood in or if a large number of messages require multiple Lambda functions. For example, a developer might set the following alarms to trigger a Lambda function:

  • ≥ one message, for five minutes, trigger Lambda;
  • ≥ one message, for 15 minutes, trigger Lambda;
  • ≥ one message, for one hour, trigger Lambda and send an alert to DevOps; and
  • ≥ 30 messages, for 15 minutes, trigger Lambda and send an alert to DevOps.
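The four alarms above could be described in code along these lines. This is a sketch rather than a definitive setup: it builds keyword arguments for the CloudWatch `put_metric_alarm` call, and it assumes five-minute periods, the `Maximum` statistic and two placeholder SNS topic ARNs (`TRIGGER` for a topic that starts the Lambda function, `ALERT` for one that notifies DevOps). None of those choices come from the setup described here.

```python
# Hypothetical SNS topics: one that starts the worker, one that pages DevOps.
TRIGGER = "arn:aws:sns:us-east-1:123456789012:run-queue-worker"
ALERT = "arn:aws:sns:us-east-1:123456789012:alert-devops"


def queue_depth_alarm(name, queue_name, threshold, minutes, action_arns):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",            # assumption: any backlog counts
        "Period": 300,                     # five-minute evaluation periods
        "EvaluationPeriods": minutes // 5,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": action_arns,
    }


def build_alarms(queue_name):
    """Return the four alarm definitions from the list above."""
    return [
        queue_depth_alarm(f"{queue_name}-1msg-5min", queue_name, 1, 5, [TRIGGER]),
        queue_depth_alarm(f"{queue_name}-1msg-15min", queue_name, 1, 15, [TRIGGER]),
        queue_depth_alarm(f"{queue_name}-1msg-60min", queue_name, 1, 60, [TRIGGER, ALERT]),
        queue_depth_alarm(f"{queue_name}-30msg-15min", queue_name, 30, 15, [TRIGGER, ALERT]),
    ]
```

Given a boto3 CloudWatch client, each definition could then be created with `cloudwatch.put_metric_alarm(**alarm)`.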

To better control Amazon SQS queues, DevOps teams can also receive alerts to trigger additional Lambda functions manually.

When a Lambda function ends with messages still left to process, it should trigger another instance of itself to ensure all messages in the queue are read. CloudWatch only fires an alarm's actions again after the alarm has returned to OK status, so it's important that the queues eventually reach zero for the alarm to trigger Lambda once more.
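One way to sketch that self-triggering behavior is shown below. The function names, the `queue_url` event field and the 30-second safety margin are all hypothetical. `sqs` and `lambda_client` are assumed to be boto3 clients, and `context` is the standard Lambda context object. A deployed handler would create the clients and call `run_once` from its own `handler(event, context)`.

```python
import json


def drain_queue(sqs, queue_url, process, time_left_ms, min_time_ms=30_000):
    """Process messages until the queue looks empty or time is nearly up.

    Returns True if a receive came back empty, False if work may remain.
    """
    while time_left_ms() > min_time_ms:
        response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            return True
        for message in messages:
            process(message["Body"])
            # Delete only after processing succeeds, so failures are retried.
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
    return False


def run_once(event, context, sqs, lambda_client, process):
    """Drain the queue; if time ran out first, start another instance."""
    drained = drain_queue(
        sqs, event["queue_url"], process, context.get_remaining_time_in_millis
    )
    if not drained:
        lambda_client.invoke(
            FunctionName=context.function_name,
            InvocationType="Event",  # asynchronous: do not wait for the result
            Payload=json.dumps(event).encode("utf-8"),
        )
    return drained
```

Stopping with a time margin, rather than waiting for the timeout, gives the function room to hand off cleanly before Lambda terminates it.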

Unfortunately, any time an application uses CloudWatch to trigger Lambda functions, there is a delay of about five minutes between when the SQS message is written and when it is read. This has led some teams to also invoke the Lambda function that reads the queue directly from the code that writes the message. While the original application could simply send the event straight to Lambda, writing it to SQS first provides more traceability and retry logic.
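A minimal sketch of that write-then-invoke pattern follows, assuming `sqs` and `lambda_client` are boto3 clients. The `submit_order` name and the payload shape are hypothetical.

```python
import json


def submit_order(sqs, lambda_client, queue_url, function_name, order):
    """Queue the order, then nudge the reader function without waiting for an alarm."""
    sent = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(order))
    # Asynchronous invoke: the function reads from the queue itself, so the
    # payload only tells it which queue to check.
    lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps({"queue_url": queue_url}).encode("utf-8"),
    )
    return sent["MessageId"]
```

Because the message is written to the queue first, the order remains in SQS for the alarm-driven path to pick up even if the direct invocation fails.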

At-least-once processing with FIFO queues

Amazon SQS guarantees at-least-once delivery for standard queues, which means that Lambda functions that read regular SQS messages must be able to handle the same message more than once.
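One common way to cope with repeat deliveries is to make the handler idempotent. The sketch below skips any message whose `MessageId` it has already handled. The in-memory `seen_ids` set is only a stand-in: separate Lambda invocations don't share memory, so a real function would need a durable store for this, which is an assumption beyond what is described here.

```python
def process_once(message, seen_ids, apply):
    """Apply a message's effect only if its MessageId has not been seen before."""
    message_id = message["MessageId"]
    if message_id in seen_ids:
        # Duplicate delivery: safe to delete without reprocessing.
        return False
    apply(message["Body"])
    # Record the ID only after the effect succeeds, so failures are retried.
    seen_ids.add(message_id)
    return True
```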

FIFO queues guarantee order, and they also guarantee exactly-once processing. This sacrifices some performance to ensure multiple processes don't read your message at the same time. FIFO queues also require developers to provide a message group ID with each message. Messages that share a group ID process strictly in order, so developers need to spread messages across several group IDs if they want more than one message to process at a time. When combining FIFO queues with Lambda processing, it's important to set up a dead-letter queue (DLQ) that allows only a few attempts before a message transfers to the secondary queue.
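To illustrate group IDs, the sketch below sends an order to a FIFO queue with one message group per table, so orders for the same table stay in sequence while different tables can process in parallel. It assumes `sqs` is a boto3 SQS client and that the queue is a FIFO queue. The `table` and `order_id` fields are hypothetical.

```python
import json


def send_fifo_order(sqs, queue_url, order):
    """Send an order to a FIFO queue, grouped by table."""
    return sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(order),
        # Messages in the same group are delivered strictly in order.
        MessageGroupId=f"table-{order['table']}",
        # Needed unless the queue has content-based deduplication enabled.
        MessageDeduplicationId=str(order["order_id"]),
    )
```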

Additionally, Lambda functions time out after a maximum of five minutes, so set the queue to hide each received message (its visibility timeout) for the same amount of time that the Lambda function can run. For example, if you hide a message for 10 minutes and it gets stuck at the front of the queue, the next Lambda function that runs won't be able to read anything until that timeout expires. Messages only move to a DLQ after they time out the maximum number of times, so a stuck message at the front of the queue can cause a critical backlog.
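Both settings are queue attributes. As a sketch, the helper below builds an attribute map that matches the visibility timeout to the function's timeout and attaches a DLQ after a few receives. The function name, the default of three receives and the example ARN are assumptions for illustration.

```python
import json


def queue_attributes(lambda_timeout_seconds, dlq_arn, max_receives=3):
    """Build SQS queue attributes; SQS expects every value as a string."""
    return {
        # Hide a received message for as long as the function may run.
        "VisibilityTimeout": str(lambda_timeout_seconds),
        # After `max_receives` failed attempts, move the message to the DLQ.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receives)}
        ),
    }
```

With a boto3 SQS client, this could be applied with `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=queue_attributes(300, dlq_arn))`. For a FIFO queue, the DLQ must also be a FIFO queue.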

Lambda is also a queue

One of the most overlooked things about Lambda is that it essentially acts as an Amazon SQS queue -- but not a FIFO queue, so you can't guarantee order. Asynchronous Lambda requests automatically retry, and you can set up a DLQ to handle retrying requests later or to alert developers to problem messages. However, that DLQ can't trigger a Lambda function directly, so developers need one of the methods above to automatically pipe failed Lambda requests into another Lambda function.

There are many ways to use Lambda, and not every request needs to go through an SQS queue. AWS Step Functions is also an easier option for developers whose process happens in multiple stages, but developers who want more control can get it with the approach described above.
