4 critical API caching practices all developers should know
API caching can improve an application's performance and response time, but only if it's done right. Learn about some of the caching tools and techniques worth implementing.
When implemented correctly, API caching can reduce an application's load and increase responsiveness. But without proper implementation and testing, caching problems can lead to unmanageable loads, cascading failures and ultimately the breakdown of an application.
Many management tools -- including open source tools -- can easily integrate with an application to perform API caching processes. With the right combination of tools and techniques, development and testing teams can ensure caching works properly and doesn't unnecessarily drain application performance.
What is API caching?
API caching is a process that places commonly requested objects in a secondary data store to avoid repeated calls to a primary database or any other type of data store. A cache's primary advantage is processing speed, as it allows an application to fetch commonly requested objects from sources that it can access quickly and easily.
Choosing between a primary data store and a cache comes down to speed vs. size. Data in a primary database might have more structure and searchability but still can be harder to access than a dedicated cache.
Determine baseline performance
When integrating API caching into an application, developers should treat testing as a forefront concern from the beginning. For one, it's important to establish performance benchmarks upfront, specifically to compare the application's performance with caching enabled against its performance without it.
To begin, developers can create load tests targeted at API requests using tools such as Apache JMeter or Locust. These two open source tools let developers scale the number of API requests to simulate various request loads from different types of users. The results of these early load tests can provide an upfront benchmark of the application's performance.
A computer's network bandwidth, latency and processing power can have a significant impact on the amount of request load generated. Developers must keep this in mind when comparing load test results, as the results from one run to another might not prove a valid comparison. To avoid these sorts of discrepancies, consider adding a cloud-based load testing tool that uses stable, isolated servers with consistent network bandwidth and latency. BlazeMeter and CloudTest are two examples of tools that can do this.
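Before reaching for a full tool such as JMeter or Locust, the idea behind a baseline load test can be sketched in a few lines. The following is an illustrative load generator, not any particular tool's API: it fires concurrent requests at a caller-supplied function and reports average and 95th-percentile latency, which can be recorded once without caching and again with caching enabled.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(send_request, num_requests, concurrency):
    """Fire num_requests calls at send_request and collect per-request latency."""
    def timed_call(_):
        start = time.perf_counter()
        send_request()  # in practice, an HTTP call to the API under test
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(num_requests)))

    return {
        "requests": num_requests,
        "avg_ms": 1000 * sum(latencies) / len(latencies),
        "p95_ms": 1000 * latencies[int(0.95 * (len(latencies) - 1))],
    }

# Example: benchmark a stand-in for an API call (here, a 1 ms sleep).
stats = run_load_test(lambda: time.sleep(0.001), num_requests=50, concurrency=10)
print(stats["requests"], stats["avg_ms"], stats["p95_ms"])
```

Comparing the same report before and after caching is enabled gives the baseline the article describes; the stand-in lambda would be replaced with a real request against the API.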
Run test scenarios for requests
After getting a baseline benchmark, developers can implement caching and reassess the application's performance. Ideally, the application's ability to handle load under stress should improve -- and, hopefully, its overall performance will too. Regardless of performance, however, teams should also validate the responses that requests return to ensure the cache is behaving properly.
One way to confirm this is to create test scenarios that check for updated values, which developers can run in just a few steps. Here's an example:
- Configure a group of requests to use the application's cache exclusively.
- Update a value in the application's primary database.
- Send a request after the cache's expected expiration time and validate that the updated value is returned.
Depending on cache implementation, developers can also run test scenarios to validate certain features.
Use key-value stores
Many open source caching tools -- Memcached being one example -- use a key-value approach to fill the cache in memory as requests come through. On each request, the application checks the cache for the specified key, which identifies the object to return as part of the response.

If the key is not present in the cache, the application queries the database and stores the response in the cache under the expected key. Subsequent requests for that same key won't require a database query, as the value is now stored in the cache.
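This read path is often called the cache-aside pattern, and it can be sketched with an ordinary dictionary standing in for a key-value store such as Memcached (the class and names here are hypothetical):

```python
class CacheAsideClient:
    """Sketch of the cache-aside read path: check cache, fall back to the database."""
    def __init__(self, query_db):
        self.cache = {}          # stands in for a key-value store like Memcached
        self.query_db = query_db
        self.db_queries = 0      # counts how often the database is actually hit

    def get(self, key):
        if key in self.cache:                 # cache hit: no database work
            return self.cache[key]
        self.db_queries += 1
        value = self.query_db(key)            # cache miss: query the database
        self.cache[key] = value               # store under the expected key
        return value

# Toy database table of products.
products = {"sku-42": {"name": "widget", "price": 9.99}}
client = CacheAsideClient(products.get)

first = client.get("sku-42")    # miss: queries the database
second = client.get("sku-42")   # hit: served from the cache
print(client.db_queries)        # -> 1
```

The counter makes the benefit concrete: two requests for the same key produce only one database query.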
Avoid the thundering herd problem
Imagine there are 10 servers each serving the same webpage application. The webpage is stored in a cache, with the cache set to expire every five minutes to ensure users consistently see the most recent version of the page. The cache could expire while those 10 servers are still under heavy load, leading each server to simultaneously query the cache, find no webpage and attempt to directly access the primary database.
Caching under a heavy load like this -- particularly in a distributed system -- can lead to the so-called thundering herd problem. Allowing 10 servers to query the database at once creates a heavy load, and a computationally intense query could easily cause a cascading number of requests to time out as the database continues to struggle. Furthermore, when those failed requests retry, they'll continue to put even more load on the database and potentially render the application useless.
Fortunately, there are a few ways to avoid a thundering herd scenario. For one, lock the cache to ensure only one process can update the cache at a time. With the lock in place, applications trying to update the cache can use previously stored values until the update is complete. Developers can also use an external process to update the cache, rather than relying on the application itself.
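The locking approach can be sketched with a non-blocking lock: one caller wins the right to refresh an expired entry, while the rest fall through and serve the previously stored value instead of stampeding the database. The class below is a simplified, single-process illustration (in a distributed system the lock would live in a shared store):

```python
import threading

class LockedCache:
    """One thread refreshes an expired entry; the others serve the stale value."""
    def __init__(self, recompute):
        self.recompute = recompute
        self.value = "stale-page"     # previously stored value
        self.expired = True
        self.lock = threading.Lock()
        self.refreshes = 0

    def get(self):
        if self.expired:
            # Non-blocking acquire: only one caller wins the right to refresh.
            if self.lock.acquire(blocking=False):
                try:
                    if self.expired:  # re-check now that we hold the lock
                        self.value = self.recompute()
                        self.refreshes += 1
                        self.expired = False
                finally:
                    self.lock.release()
            # Callers that fail to acquire fall through to the stale value.
        return self.value

cache = LockedCache(recompute=lambda: "fresh-page")

threads = [threading.Thread(target=cache.get) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.refreshes)  # exactly one refresh despite 10 concurrent readers
```

The double-check inside the lock is what guarantees a single recompute: a thread that loses the race re-reads the expired flag after the winner has cleared it.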
Another useful way to avoid a thundering herd is probabilistic early expiration: as a cached value's expiration time nears, each process that relies on the cache randomly decides to recompute the value slightly ahead of schedule. Because every client calculates its own early refresh point, recomputes are staggered across clients and the cached entries don't all expire at once.
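One common formulation of this idea triggers a refresh when the current time, pushed forward by a random head start proportional to the cost of recomputing the value, crosses the expiration time. The function below is an illustrative sketch of that rule, with the random source injectable so the decision is testable; the parameter names are this example's own:

```python
import math
import random

def should_refresh_early(now, expires_at, recompute_cost, beta=1.0, rng=random.random):
    """Decide whether to recompute a cached value ahead of its expiration.

    recompute_cost is roughly how long a recompute takes; beta > 1 refreshes
    more eagerly. rng should return values in (0, 1]: log of such a value is
    <= 0, so head_start is a negative offset that pushes 'now' forward.
    """
    head_start = beta * recompute_cost * math.log(rng())
    return now - head_start >= expires_at

# Long before expiry with a middling random draw: keep serving the cached value.
early = should_refresh_early(now=100.0, expires_at=200.0,
                             recompute_cost=5.0, rng=lambda: 0.5)
print(early)  # -> False

# Past expiry, a refresh triggers regardless of the draw.
late = should_refresh_early(now=201.0, expires_at=200.0,
                            recompute_cost=5.0, rng=lambda: 0.5)
print(late)   # -> True
```

The closer a draw's result pushes the effective time past the expiration, the likelier a refresh, so different clients recompute at slightly different moments instead of all at once.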