Design patterns that govern cloud-based applications aren't always talked about until companies reach a certain scale. While there are countless design patterns to choose from, one of the biggest challenges in choosing among them is dealing with scale once it becomes necessary.
Rapid growth is a blessing and a curse for any application, bringing both increased revenue and increased technical challenges. To help applications scale, a number of design patterns can make any cloud-based application more fault-tolerant and more resistant to the problems that often come with increased traffic.
The following five cloud design patterns help developers better handle unexpected increases in throughput.
Bulkhead
Named after the divided partitions of a ship that help isolate flooding, the bulkhead pattern prevents a single failure within an application from cascading into a total failure. While the implementation of this pattern in the wild isn't always obvious, it is typically found in applications that can operate under some sort of degraded performance.
An application that implements the bulkhead pattern is built with resiliency in mind. While not all operations are possible when email or caching layers go down, with enough foresight and communication to the end user, the application can still be semi-functional.
With isolated application sections that can operate independently of one another, subsystem failures can safely reduce the application's overall functionality without shutting everything down. A good example of the bulkhead pattern in action is any application that can operate in "offline mode." While most cloud-based applications require an external API to reach their full potential, fault-tolerant clients can operate without the cloud by relying on cached resources and other workarounds to ensure the client is marginally usable.
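As a rough illustration of the idea, the following Python sketch gives a subsystem its own compartment with a fixed number of concurrent slots and a degraded fallback. The `Bulkhead` class, its parameters, and the cache and database functions are hypothetical names invented for this example, not part of any library, and a production version would likely need timeouts and monitoring as well.

```python
import threading


class Bulkhead:
    """Caps concurrent calls into one subsystem and falls back when it is full or failing."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation, fallback):
        # Refuse immediately rather than queueing behind a struggling subsystem.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return operation()
        except Exception:
            # The failure stays inside this compartment; the caller gets a degraded result.
            return fallback()
        finally:
            self._slots.release()


# Hypothetical subsystem calls, used only to demonstrate the behavior.
def read_from_cache():
    raise ConnectionError("cache layer is down")


def read_from_database():
    return "profile loaded from database (slower)"


cache_bulkhead = Bulkhead("cache", max_concurrent=2)
result = cache_bulkhead.call(read_from_cache, read_from_database)
print(result)
```

Here the cache outage costs the user some speed but not the page itself, which is the degraded-but-working behavior the pattern aims for.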
Retry
In many applications, failure is a final state. However, in more resilient services, a failed request can potentially be re-sent.
The retry pattern, a common cloud design pattern when dealing with third-party interactions, encourages applications to expect failures. Processes that implement the retry pattern create fault-tolerant systems that require minimal long-term maintenance. These processes are implemented with the ability to safely retry failed operations.
The retry pattern is often seen in webhook implementations. When one service tries to send a webhook to another service, that request can do one of two things:
- Succeed. If it succeeds, then the operation is completed.
- Fail. If it fails, the sending service can resend the webhook a limited number of times until the request is successful. To avoid overloading the target system, many webhook implementations will use an incremental backoff, gradually adding time delays between each request to give a faulty destination time to recover before giving up.
The retry pattern only works when both the sender and receiver know that failed requests can be re-sent. In the webhook example, a unique identifier for each webhook is often provided, allowing the receiver to ensure that a request is never processed more than once. This avoids duplicates and also means that errors on the sender's side, which could cause redundant data to be re-sent, do no harm.
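The webhook flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `TransientError`, `send_with_retry`, and `WebhookReceiver` are hypothetical names, the `"id"` field stands in for the unique webhook identifier, and doubling the delay after each failure is just one common way to space out retries.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def send_with_retry(send, payload, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(payload), retrying a limited number of times with growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Wait longer after each failure: 0.5s, 1s, 2s, ... with the defaults.
            sleep(base_delay * 2 ** (attempt - 1))


class WebhookReceiver:
    """Receiver that uses the webhook's unique ID to ignore repeated deliveries."""

    def __init__(self):
        self._seen_ids = set()
        self.processed = []

    def handle(self, payload):
        event_id = payload["id"]  # assumed name for the unique identifier
        if event_id in self._seen_ids:
            return "duplicate ignored"
        self._seen_ids.add(event_id)
        self.processed.append(payload)
        return "processed"
```

Passing `sleep` as a parameter is a small design choice that lets the backoff schedule be checked without actually waiting.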
Circuit breaker
Dealing with scale can be an incredibly nuanced problem in cloud-based applications, especially for processes with unpredictable performance. The circuit breaker pattern prevents processes from "running away" by cutting them short before they consume more resources than necessary.
To illustrate how this cloud design pattern works, imagine you have a web page that generates a report from several different data sources. In a typical scenario, this operation may take only a few seconds. However, in rare circumstances, querying the back end might take much longer, which ties up valuable resources. A properly implemented circuit breaker could halt the execution of any report that takes more than 10 seconds to generate, which prevents long-running queries from monopolizing application resources.
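One way to sketch the report example in Python is shown below. It combines the time limit described above with the failure counting that circuit breaker implementations commonly add: after several consecutive failures the breaker "opens" and rejects calls outright for a cool-down period. The `CircuitBreaker` class, `CircuitOpenError`, `generate_report`, and the default thresholds are all assumptions made for illustration.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped entirely."""


class CircuitBreaker:
    def __init__(self, time_limit, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.time_limit = time_limit              # seconds allowed per call
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after            # cool-down before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    async def call(self, make_coroutine):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; skipping call")
            # Cool-down has elapsed: allow one trial call, and reopen
            # immediately if that trial fails.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            # wait_for cancels the coroutine once the time limit is exceeded.
            result = await asyncio.wait_for(make_coroutine(), timeout=self.time_limit)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


async def generate_report():
    await asyncio.sleep(0.01)  # stand-in for querying several data sources
    return "report ready"


breaker = CircuitBreaker(time_limit=10.0)  # the 10-second limit from the example
print(asyncio.run(breaker.call(generate_report)))
```

The sketch uses `asyncio` because a coroutine can be cancelled cleanly when the limit is hit; blocking code running in a thread cannot be stopped this way and would need a different approach.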
Queue-based load leveling
Queue-based load leveling (QBLL) is a common cloud design pattern that helps with scale problems as an application grows. Rather than performing complex operations at request time -- which adds latency to user-exposed functionality -- these operations are instead added to a queue that is tuned to execute a more manageable number of requests within a given time period. This design pattern is most valuable in systems where there are many operations that do not need to show immediate results, such as sending emails or calculating aggregate values.
For example, take an API endpoint that must make retroactive changes to a large dataset whenever it is executed. While this endpoint was built with a certain threshold of traffic in mind, a large burst in requests or a rapid growth in user adoption could negatively affect the latency of the application. By offloading this functionality to a queue-based load leveling system, the application infrastructure can more easily withstand the increased throughput by processing a fixed number of operations at a time.
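A minimal Python sketch of this arrangement follows, using only the standard library. The names `enqueue_update` and `start_workers`, the queue size, and the worker count are assumptions for illustration; a production system would more likely use a durable message queue than an in-process one.

```python
import queue
import threading

# Bounded so that a burst of requests cannot exhaust memory.
jobs = queue.Queue(maxsize=1000)


def enqueue_update(record_id):
    """Hypothetical endpoint handler: enqueue the heavy work and return right away."""
    try:
        jobs.put_nowait(record_id)
    except queue.Full:
        return {"status": "busy"}  # the caller can try again later
    return {"status": "accepted"}


def start_workers(handle, worker_count=2):
    """Start a fixed number of workers, so at most worker_count jobs run at once."""

    def work():
        while True:
            record_id = jobs.get()
            try:
                handle(record_id)
            except Exception as exc:
                print(f"update {record_id} failed: {exc}")
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=work, daemon=True) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    return threads
```

The endpoint's latency now depends only on how quickly a job can be enqueued, while the worker count sets how much heavy work runs at any one time.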
Throttling
An alternative design pattern to QBLL is the throttling pattern, which centers on the concept of the "noisy neighbor" problem. While the QBLL pattern offloads excess workloads to a queue for more manageable processing, the throttling pattern sets and enforces limits on how frequently a single client can use a service or endpoint to keep one "noisy neighbor" from negatively impacting the system for everyone. The throttling pattern can also supplement the QBLL pattern, which allows for the managed processing of excess workloads, by ensuring the queue doesn't grow too deep.
Looking back at the QBLL example, let's say that the API endpoint could originally handle about 100 requests per minute. Once the heavy work is offloaded to a queue, the API itself can support a maximum throughput of about 10,000 requests per minute. Ten thousand is a huge jump from 100, but the queue can still only process about 100 requests per minute without any noticeable impact on the end user. This means that 1,000 API requests would take about 10 minutes to fully process, and 10,000 API requests would take about 100 minutes, or almost two hours.
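The arithmetic behind those figures is simply backlog divided by processing rate. The helper below is a hypothetical function written only to make the calculation explicit, with the 100-requests-per-minute rate taken from the example.

```python
def minutes_to_drain(queued_requests, processed_per_minute=100):
    """Minutes needed to work through a backlog at a fixed processing rate."""
    return queued_requests / processed_per_minute


for backlog in (1_000, 10_000):
    minutes = minutes_to_drain(backlog)
    print(f"{backlog} requests: {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
```

A backlog of 10,000 requests works out to 100 minutes, or roughly one hour and 40 minutes.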
In a system with evenly distributed requests, every user would experience slower processing equally, but if a single user sends all 10,000 requests, then all other users will experience a delay of nearly two hours before their workloads even get started. A throttling scheme that limits all users to 1,000 requests per minute would ensure that no single user could monopolize application resources at the expense of any other user.
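A per-client limit of this kind can be sketched in Python with a fixed-window counter, which is one simple approach among several (token buckets and sliding windows are common alternatives). The `FixedWindowThrottle` class and its parameters are hypothetical, and the numbers in the usage lines are illustrative only.

```python
import time


class FixedWindowThrottle:
    """Allows each client at most `limit` requests per window of `window_seconds`."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._windows = {}  # client_id -> (window_start, request_count)

    def allow(self, client_id):
        now = self._clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return False  # over the limit: reject, e.g. with HTTP 429
        self._windows[client_id] = (start, count + 1)
        return True


throttle = FixedWindowThrottle(limit=3, window_seconds=60.0)
print([throttle.allow("noisy-client") for _ in range(4)])
print(throttle.allow("quiet-client"))
```

Because counts are tracked per client, the noisy client's fourth request is rejected while the quiet client's first request is still accepted.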
The 6-month rule
It can be incredibly difficult to scale a cloud-based application. Often, IT teams must choose between implementing a design pattern that can support application growth for another six months, or a design pattern that can support application growth for another six years.
In my experience, options that fall under the six-month timeline are the most cost-effective. Spend a few weeks to buy yourself six months that will support the needs of the business and users. It's more effective than spending a year building a more robust system that is much harder to change.
A midterm focus is not the same thing as shortsighted hacks and Band-Aids. The careful implementation of common design patterns can support the long-term maintenance of an application while also being flexible enough to adapt as circumstances change.