An architecture that fails to align with performance expectations and business needs can suffer significant operating costs, lose the ability to scale for predicted workload volumes, build up significant technical debt and cause pervasive application performance issues. To that end, it's important to regularly verify that your architecture's expected performance and projected growth adequately align with business needs and contextual requirements.
Napkin math can be a useful and relatively quick means of identifying potential architectural bottlenecks and establishing a rough estimate of a system's anticipated performance. In this article, we'll explore how architects can use napkin math effectively, including some use case examples, and review some of the important do's and don'ts to keep in mind.
Identifying business expectations
Before delving into specific napkin math calculations, it is important to understand what the business fundamentally needs to achieve. To determine this, architects should be able to answer high-level questions like:
- Does the business need to prove a product's ability to compete within a market?
- What are the expected traffic volumes and data transaction amounts?
- How much data will the applications within a system typically generate?
- How much data will applications and services consume during operation?
- Are there any related SLAs, such as minimum uptime requirements or expected response times?
It is especially important to identify any predictable spikes in traffic or transaction rates -- for instance, a web service for filing taxes will undoubtedly see a massive surge of traffic during tax season. Also, try to estimate how the service's or application's scale or resource requirements will increase year over year; this will help you anticipate the need for any major redesigns later. Outlining these aspects will help clarify some of the nonfunctional requirements that will guide design choices.
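The growth and spike estimates described above can be sketched as a quick calculation. The following is a minimal illustration only: the baseline traffic, spike multiplier and growth rate are hypothetical placeholders, not figures from the article, and should be replaced with your own business numbers.

```python
# Napkin-math sketch: rough peak-capacity estimate for a seasonal service.
# All inputs below (baseline traffic, spike multiplier, growth rate) are
# hypothetical placeholders -- substitute the figures your business provides.

def peak_requests_per_second(avg_rps: float, spike_multiplier: float,
                             annual_growth: float, years_ahead: int) -> float:
    """Project peak load: baseline * seasonal spike * compound yearly growth."""
    return avg_rps * spike_multiplier * (1 + annual_growth) ** years_ahead

# Example: a tax-filing service averaging 200 requests/second, with a 10x
# tax-season surge, growing 25% per year, planned three years out.
peak = peak_requests_per_second(avg_rps=200, spike_multiplier=10,
                                annual_growth=0.25, years_ahead=3)
print(round(peak))  # the figure to size infrastructure against
```

A single line of arithmetic like this is often enough to reveal whether the current design has the headroom the business expects, before any detailed capacity planning begins.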
Design validation with napkin math
Napkin math is not a purely software-specific technique, but expert systems engineers like Simon Eskildsen have endorsed it as a way to help navigate the process of system design using calculations architects can perform with just a pencil and paper. These calculations serve as a guiding light: they help narrow down choices within the boundaries of an architecture's unique constraints and make it easier to experiment with alternative design approaches.
Napkin math shouldn't be used to predict the performance of an architectural design authoritatively, as the process doesn't involve actual measurements or account for specific real-world implementation factors. Rather, it is a way to experiment with certain design choices on paper using tangible data points. However, it can provide an effective mental starting point for large-scale architecture builds or redesigns.
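In practice, napkin math combines a handful of memorized base rates with simple arithmetic. The sketch below illustrates the idea; the base rates are rough orders of magnitude commonly quoted for modern hardware, not measurements of any specific system, and the step names are invented for this example.

```python
# A minimal napkin-math helper built on rough, commonly quoted base rates.
# These are order-of-magnitude figures for modern hardware, not measurements
# of any particular system -- treat them as assumptions to be revisited.

BASE_RATES_NS = {
    "memory_read_1mb": 100_000,            # ~0.1 ms: read 1 MB sequentially from RAM
    "ssd_read_1mb": 1_000_000,             # ~1 ms: read 1 MB from a fast SSD
    "network_same_dc_roundtrip": 500_000,  # ~0.5 ms: round trip within a datacenter
}

def estimate_ms(*steps: str) -> float:
    """Sum the rough cost of the named steps, in milliseconds."""
    return sum(BASE_RATES_NS[step] for step in steps) / 1_000_000

# Example: a request that makes one same-datacenter hop and reads 1 MB from SSD.
print(estimate_ms("network_same_dc_roundtrip", "ssd_read_1mb"))
```

The value of this exercise is less the exact number than the comparison it enables: if one design costs ten times more base-rate units than another, that gap usually survives contact with reality even when the individual rates are off.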
Sample problem: Client-side or server-side?
Imagine you want to implement a new internal search system for a website that hosts blogs. However, you're unsure whether to pursue a client-side or server-side implementation. A client-side implementation could save time by freeing the server from pre-search data processing responsibilities, but only if the client is capable of handling that data. Alternatively, a server-side implementation eliminates the need for end clients to process the data but could add unnecessary time to the process if the clients can easily handle the data themselves.
To measure the amount of data involved if a user wants to search through their own content, we'll estimate that there's an average of 100 posts per user. A typical post is 1,000 words long, and each word contains an average of five characters. If each character represents one byte of data, this works out to 0.5 MB of data per user (100 posts/user x 1,000 words/post x 5 bytes/word = 500,000 bytes). Assuming we can compress the average data payload to half the size, we can calculate that each search involves about 0.25 MB of data.
It takes 1 millisecond (ms) to compress the required data, 2.5 ms to transfer the data over the network and 2 ms for the client to decompress the data. So, we can gauge that this process will take roughly 5.5 ms from start to finish, which is typically a reasonable turnaround time for this kind of search process. Of course, it's worth noting that this calculation admittedly overlooks other factors like read/write times. However, even against the conservative estimates used in this example, it seems sensible to implement a client-side search and still expect a fast response time.
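The blog-search estimate above reduces to a few lines of arithmetic. This sketch simply restates the article's stated assumptions (100 posts per user, 1,000 five-character words per post, 2:1 compression, and the given stage timings) as code:

```python
# Per-user blog-search estimate, using the assumptions stated in the article.

posts_per_user = 100
words_per_post = 1_000
bytes_per_word = 5          # five characters per word, one byte per character
compression_ratio = 0.5     # assume the payload compresses to half its size

raw_bytes = posts_per_user * words_per_post * bytes_per_word
payload_mb = raw_bytes * compression_ratio / 1_000_000
print(payload_mb, "MB per search")       # 0.25 MB

# Stage timings from the example, in milliseconds.
compress_ms, transfer_ms, decompress_ms = 1.0, 2.5, 2.0
total_ms = compress_ms + transfer_ms + decompress_ms
print(total_ms, "ms end to end")         # 5.5 ms
```

Writing the estimate out this way also makes it trivial to re-run with different assumptions, such as heavier posts or a weaker compression ratio.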
Let's contrast this same situation against a 10-year-old news site that has published an average of 30 articles per day since it began. Like the blog posts from earlier, each article is about 1,000 words in length at five characters per word. This amounts to 109,500 articles (30 articles/day x 365 days/year x 10 years) and 5,000 characters per article (1,000 words x 5 characters/word). If each character represents one byte of data, it means that a search of all the site's content requires the system to index 547.5 MB of data (109,500 articles x 5,000 bytes/article = 547.5 million bytes).
We can compress the data payload down to roughly 274 MB, which will take an average of 2 seconds to compress, 4 seconds to decompress and 10 seconds to transfer over the network. This means the average turnaround time for transferring the indexes to the client for processing is about 16 seconds, which is often beyond the limits of acceptable response times. In this instance, it's likely better to implement a server-side search system that indexes the data on its own before responding to client requests. Otherwise, the time it takes to transfer indexing information to the client could result in troublesome timeouts and bottlenecks.
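The news-site estimate follows the same pattern, just with bigger inputs. Again, this sketch only restates the article's assumptions (30 articles/day for 10 years, 5,000 bytes per article, 2:1 compression, and the given stage timings):

```python
# Whole-site news-archive estimate, using the article's assumptions.

articles = 30 * 365 * 10            # 109,500 articles over ten years
bytes_per_article = 1_000 * 5       # 1,000 words at five one-byte characters each

index_mb = articles * bytes_per_article / 1_000_000
print(index_mb, "MB uncompressed")  # 547.5 MB

compressed_mb = index_mb / 2        # assume 2:1 compression, ~273.75 MB
total_s = 2 + 10 + 4                # compress + transfer + decompress, in seconds
print(total_s, "s turnaround")      # 16 s -- usually past acceptable response limits
```

Comparing the two sketches side by side is the real payoff: the same formula that justified client-side search at 0.25 MB rules it out at several hundred megabytes.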
Techniques to improve napkin math calculations
Napkin math isn't a perfect science, and it works best when assessing systems that are relatively low in complexity and segmented by clear contextual boundaries. There are a few important things to keep in mind when performing napkin math assessments:
- If possible, apply calculations at the component level, with due consideration of each component's relationship to other components (especially within large software systems).
- Devote time to learning the measurable nuances of back-end components, including their individual procedures, dependencies, overhead costs and potential impacts on performance.
- Perform benchmarking tests on various hardware configurations to gain a more realistic picture of actual performance.
- Test to see if the result of your performance calculations will remain within bounds when faced with 10 times the workload, or if you might need a major redesign down the road.
- Compare the theoretical results from your napkin math against actual performance numbers, and investigate any areas where they differ significantly.
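The "10 times the workload" check in the list above can itself be expressed as napkin math. The helper and figures below are illustrative assumptions, not part of the article's examples:

```python
# A hedged sketch of the 10x test: scale the napkin estimate and see whether
# it still fits a response-time budget. All numbers here are illustrative.

def fits_budget(per_unit_ms: float, units: float, budget_ms: float,
                scale: float = 10.0) -> bool:
    """True if the estimated time still fits the budget at `scale`x the load."""
    return per_unit_ms * units * scale <= budget_ms

# Hypothetical example: 0.02 ms of server work per item across 50 concurrent
# items, checked against two different latency budgets.
print(fits_budget(per_unit_ms=0.02, units=50, budget_ms=100))  # comfortable at 10x
print(fits_budget(per_unit_ms=0.02, units=50, budget_ms=5))    # redesign risk at 10x
```

If the scaled estimate blows the budget, that's the signal to plan for a redesign now rather than discover the ceiling in production.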