An application's resiliency is its ability to recover from failures. When building microservices, it's critical to consider how resilient those numerous distributed services are.
Microservices-based applications often have several dependencies -- including databases, back-end components and APIs -- that can potentially cause service call failures, which can be categorized broadly into the following:
- Transient faults: These are usually intermittent and might bring the application down only briefly -- often just a few seconds. Examples include temporary network outages and missed requests.
- Permanent faults: These can bring down the application for long, critical periods. They usually stem from severely disrupted services and permanent outages.
This tip examines three proven microservices resiliency patterns that boost fault-tolerance and enable applications to handle failures gracefully.
Retry pattern
Microservices often have many dependencies -- databases, components, back-end services and APIs -- and any of them can fail intermittently, creating numerous service call failures. The retry pattern provides a solution to these transient errors.
During intermittent, short-lived failures, the retry pattern reruns a failed operation a specified number of times. IT admins configure the number of retries and the time intervals between them. This gives the calling service a chance to invoke the failed service one or more times until it receives the expected response, rather than simply shutting down upon the initial failure.
Remember to avoid chaining retry attempts, and only use this pattern for transient failures. Maintain diligent logs to later determine the root cause of these failures. Finally, space out retry attempts to give the failed service time to recover; this prevents cascading failures and preserves network resources.
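The mechanics above can be sketched in a few lines. This is a minimal illustration, not a production library: `TransientError` stands in for whatever transient failure your client raises, and the exponential backoff between attempts is one common choice for spacing out retries.

```python
import time


class TransientError(Exception):
    """Stand-in for whatever transient fault your service client raises."""


def retry(operation, max_attempts=3, delay_seconds=1.0, backoff=2.0):
    """Rerun a failed operation a fixed number of times, with growing delays.

    Only TransientError is retried; permanent faults propagate immediately.
    """
    attempt = 1
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_attempts:
                raise  # give up; let the caller handle the failure
            # Wait before retrying, backing off to give the service
            # time to recover and to preserve network resources.
            time.sleep(delay_seconds)
            delay_seconds *= backoff
            attempt += 1
```

In real code, libraries such as Tenacity (Python) or Polly (.NET) implement this pattern with jitter and richer policies, so hand-rolling it is rarely necessary.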
Circuit breaker pattern
While the retry pattern works for transient failures, teams still need a reliable microservices resiliency pattern that handles larger, long-term, permanent faults. If a retry mechanism repeatedly invokes a severely damaged service, it can cause cascading service failures that become increasingly difficult to identify and fix.
The circuit breaker pattern creates a component that resembles a traditional electric circuit breaker. This component sits between requesting services and the services' endpoints. As long as these services communicate normally, the circuit breaker delegates messages between them in a closed state.
When a service request traveling through the closed circuit fails a predetermined number of times, the breaker opens the message circuit to halt service execution. During this open state, the breaker stops service execution and returns an error message to the requesting service for each failed transaction.
After a certain interval of time (known as the circuit reset timeout), the breaker moves to a half-open state. In this state, the breaker closes the loop for a trial request to check whether connectivity between the two services has been restored. If the breaker detects another error, it trips back to the open state. Once requests succeed again, it recloses the loop and resumes normal operation.
Design the circuit breaker so that it can examine service failures and adjust its call strategy accordingly. The circuit breaker must also be thread-safe and asynchronous.
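The closed, open and half-open states described above can be sketched as a small class. This is a single-threaded illustration only; a real breaker would add the locking and asynchrony noted above, and the threshold and timeout values here are arbitrary examples.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a request is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker sketch (not thread-safe).

    Closed: requests pass through normally.
    Open: after `failure_threshold` consecutive failures, requests
    fail fast until `reset_timeout` seconds elapse.
    Half-open: one trial request is let through; success recloses
    the circuit, failure re-trips it.
    """

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Open state: fail fast without touching the damaged service.
                raise CircuitOpenError("circuit is open; request rejected")
            # Reset timeout elapsed: half-open, allow one trial call through.
        try:
            result = operation()
        except Exception:
            self.failure_count += 1
            if self.opened_at is not None or self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip (or re-trip) the breaker
            raise
        # Success: reclose the circuit and reset the failure count.
        self.failure_count = 0
        self.opened_at = None
        return result
```

Production implementations such as resilience4j or Polly add sliding failure windows, metrics and thread safety on top of this basic state machine.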
Avoid distributed transactions
In microservices-based applications, distributed transactions span several services. Patterns such as two-phase commit and sagas can handle these transactions, but it is good practice to avoid distributed transactions wherever possible, as their inherent complexity creates problems of its own.
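When a distributed transaction cannot be avoided, the saga approach mentioned above replaces a single atomic commit with a sequence of local steps, each paired with a compensating action that undoes it. A minimal sketch, with hypothetical step names, might look like this:

```python
def run_saga(steps):
    """Execute a list of (action, compensation) pairs in order.

    If any action fails, run the compensations for all completed
    steps in reverse order, then re-raise the original error.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for undo in reversed(completed):
                undo()  # roll back already-completed local transactions
            raise
```

Each step commits locally, so there is no global lock; the trade-off is that other services may briefly observe intermediate state before a compensation runs.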
Correlation ID pattern
In a typical microservices-based application, several services span different systems, possibly separated by large geographical distances. Each service must therefore log useful, meaningful data that specifies what it has been doing and details any failures. This calls for a third microservices resiliency pattern, geared toward service tracking.
The correlation ID pattern creates an identifier for each individual request, which lets you track the complete flow of an HTTP request through all communication channels. Set the correlation ID as part of the HTTP request header and include it in every log message; this helps you quickly find the errors, warnings and debug messages relevant to a given request.
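A minimal sketch of this pattern: reuse the caller's correlation ID if one arrived on the request, otherwise mint a new one, stamp it on every log record, and pass it along to downstream calls. The `X-Correlation-ID` header name is a common convention assumed here, not an official standard.

```python
import logging
import uuid

# Include the correlation ID in every log line's format.
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s [%(correlation_id)s] %(message)s",
)
logger = logging.getLogger("orders")


def handle_request(headers):
    """Handle one inbound request, tagging all logs with its correlation ID.

    Returns the headers to send on downstream service calls so the same
    ID follows the request across service boundaries.
    """
    correlation_id = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    # LoggerAdapter attaches the ID to every record this handler emits.
    log = logging.LoggerAdapter(logger, {"correlation_id": correlation_id})
    log.info("processing request")
    # Propagate the same ID to downstream services.
    return {"X-Correlation-ID": correlation_id}
```

In practice a web framework middleware would do this once per request instead of each handler repeating it.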
While a correlation ID illustrates the flow of a request from source to destination, a log aggregator pulls together the logs from all your microservices for easier search and analysis. Popular log management tools for this include SolarWinds Security Event Manager, Loggly, Splunk and Logstash.