Application performance metrics and tools fit for modern architectures

Distributed applications and cloud hosting have led to new application interactions and KPI categories -- and different performance concerns need different metrics.

With cloud scaling and web-native architectures, classic application performance metrics are evolving or getting rapidly eclipsed. IT teams should reconsider which metrics to apply so that they know if the application meets user expectations.

Average request response time, resource utilization rates and traditional service-level agreements (SLAs) for application availability no longer present the most instructive, accurate picture of user experience (UX). Instead, IT organizations must see what the user sees, via experience-focused metrics and performance monitoring technologies, to adjust the application and infrastructure accordingly.

Not-so-average response time

Average response time is no longer a reliable application performance metric, said James Marcus, vice president of technical operations at Magnetic, an advertising technology (adtech) company that uses AI to make advertising more effective.

"In the last five years, online traffic has surged, and with that comes new challenges for adtech companies or any company operating a business online," Marcus said. When an application sees hundreds of thousands of requests each second, average response time is no longer a good indicator of performance, he said. Instead, companies like Magnetic look at the response time for the 99.99th percentile of traffic.

"This metric provides insight into the 0.01% slowest requests. These are the requests we need to understand to tune so that our infrastructure is capable of 10 times scaling on a moment's notice," he explained. Customers rely on the speed of the company's technology to load ads with online content seamlessly. When Magnetic tunes performance on the 0.01% slowest requests, it zeroes in on the ads that could cause user disruption with slow load times.
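To see why an average hides the slowest 0.01% of requests, consider the minimal sketch below. It is not Magnetic's tooling: the workload is synthetic, the latency ranges are invented for illustration, and the `percentile` helper is a hypothetical function that uses the simple nearest-rank method.

```python
import math
import random


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


random.seed(42)  # fixed seed so the synthetic data is repeatable

# Assumed workload: 100,000 requests, of which 20 (0.02%) are very slow.
latencies_ms = [random.uniform(5, 15) for _ in range(99_980)]
latencies_ms += [random.uniform(800, 1200) for _ in range(20)]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p9999_ms = percentile(latencies_ms, 99.99)

print(f"mean:    {mean_ms:.1f} ms")
print(f"p99.99:  {p9999_ms:.1f} ms")
```

With this made-up data, the mean stays at about 10 ms and looks healthy, while the 99.99th percentile lands at 800 ms or more, which is the tail that the article says teams need to tune.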

Another approach to gauge UX is the Application Performance Index (Apdex) score, said Kiran Chitturi, CTO architect at SunGard Availability Services. Apdex scoring helps define a satisfactory threshold for request response time.

"The ratio of satisfactory response times to unsatisfactory response times can be used to measure the user experience," providing a holistic view of interaction with the company's application, he said.

Rather than gather supporting application performance metrics, such as network latency or query response time, Apdex targets user satisfaction directly.
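For readers who have not computed an Apdex score, here is a minimal sketch based on the published Apdex formula, which the article does not spell out: responses at or under a threshold T count as satisfied, those between T and 4T count as tolerating at half weight, and anything slower counts as frustrated. The 0.5-second threshold and the sample timings are assumptions chosen for illustration.

```python
def apdex(response_times_s, threshold_s):
    """Return the Apdex score (0.0 to 1.0) for a list of response times.

    satisfied:  t <= T
    tolerating: T < t <= 4T  (counted at half weight)
    frustrated: t > 4T       (counted as zero)
    """
    if not response_times_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


# Assumed sample: six requests measured in seconds, with T = 0.5 s.
samples = [0.2, 0.3, 0.4, 0.9, 1.5, 2.5]
score = apdex(samples, threshold_s=0.5)
print(f"Apdex: {score:.2f}")
```

In this sample, three requests are satisfied, two are tolerating and one is frustrated, so the score is (3 + 2/2) / 6, or about 0.67.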

Infinite resource utilization

As application architectures evolve, IT teams need better autoscaling strategies on cloud resources that fit fluctuating workloads rather than linear ones.

"If your application workload changes track against a utilization metric in a linear fashion to the number of instances, a simple scaling policy may not be the most efficient way to manage your infrastructure," Chitturi said. Target tracking scaling policies, such as those for AWS' EC2 Auto Scaling capability, accommodate a fluctuating load pattern. To best match resource availability with the load, monitoring must change as well.
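The idea behind target tracking can be sketched in a few lines. The example below is a simplified proportional rule, not AWS' actual algorithm, which the article does not describe. The `desired_capacity` function, its parameters and the size limits are all hypothetical names chosen for illustration.

```python
import math


def desired_capacity(current_instances, metric_value, target_value,
                     min_size=1, max_size=20):
    """Estimate how many instances keep a per-instance utilization
    metric near its target, assuming the metric moves in proportion
    to load per instance (an assumption of this sketch)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_instances * metric_value / target_value)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, raw))


# Four instances averaging 80% CPU against a 50% target: scale out.
print(desired_capacity(4, 80.0, 50.0))
# Four instances averaging 25% CPU against a 50% target: scale in.
print(desired_capacity(4, 25.0, 50.0))
```

The point of the sketch is the contrast with a simple policy that adds a fixed number of instances per alarm: a target-based rule sizes the adjustment to how far the metric has drifted from its target.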

A cloud provider's standard monitoring options might not be sufficient to inform these refined application performance metrics, Chitturi said. Examples include when an organization wants to track memory consumption or metrics broken down by individual processes on servers.

Slay the SLA

Traditional SLA tracking predominantly consists of availability metrics with some attention to latency and consistency. A more holistic approach should include business key performance indicators (KPIs), such as the number of orders or payments processed, alongside end-user experience measures. Those measures include Apdex scores on satisfaction and errors, as well as time to interactive, which is the time it takes for a page to load sufficiently for a user to start interacting with it.

"This helps track the metrics that matter directly to the business, the end users and the bottom line," Chitturi said.

Contrasting newer approaches, such as Apdex scores, with older standards, like CPU, memory and storage usage, Edwin Yuen, senior analyst at Enterprise Strategy Group, said the focus is no longer on simple standards of general health, analogous to a doctor's temperature and blood pressure checks. Instead, IT teams are looking for symptoms that could indicate a problem.

Similarly, with SLA tracking, "it used to be that, if the number looked good, you shouldn't complain," Yuen said. "Now, it doesn't matter what the number says -- it is what the end-user experience says."

If performance issues make an application difficult to use, Yuen said, there's no other measure of success to fall back on. "The main problem is the iPhone generation: What people see on the screen is all they see." So, IT teams must start at the end users and trace back from there.

Advances in monitoring technology now make it possible to measure what people actually experience when they use a given service or application.

APM tools adapt to users' reign

The change in application performance metrics spawned a shift in application performance monitoring (APM). APM vendors fall into two general categories: emerging companies natively focused on end-user experience monitoring and traditional monitoring providers that must realign from a network and infrastructure focus to a user focus.

Emerging and cloud-native vendors, such as New Relic, start with the application experience and then move down into the code, Yuen said, but can have trouble with traditional IT monitoring. Established vendors, such as BMC, have pivoted toward the newcomers' style of monitoring. Network monitoring companies, for example, can largely see what the end user is experiencing through the lens of the network.

End-user experience is top of mind for Mark Kaplan, director of IT at The Barbri Group, which provides a bar review course. Barbri handles massive increases in traffic as exam dates approach. The company experienced monitoring complexity with its two on-premises, failover-enabled data centers and lacked monitoring dashboards to quickly assess performance. Barbri migrated to a container-based, cloud-hosted setup and discovered monitoring had to change to keep pace. "Everything we had built for monitoring was oriented toward on premises and not the cloud," Kaplan said.

Barbri picked Dynatrace to monitor traffic, customer flows and customer experience in the cloud and on premises. The monitoring platform helped the IT team understand which resources to ramp up to meet peak demand, Kaplan said. As the company has expanded its cloud operations, it has eliminated many older monitoring tools.

Since adopting Dynatrace, Kaplan reviews Barbri's Apdex score daily. "I had never looked at it before," he added. Similarly, the company replaced SLA uptime monitoring with customer experience monitoring.

Despite the shift in application monitoring metrics and the tools to track them, the ultimate goal of monitoring remains the same: to determine the health of business KPIs, which tend to be difficult to measure directly. For web applications, the old metrics have become less effective at indicating or predicting the KPIs that matter, said Troy Presley, product manager at Apica Systems, a performance monitoring platform.

For example, web applications can postpone, or otherwise alter, the traditional resource loading on a page, and users can perform more actions on a single page without triggering a new page load. In this scenario, the common standard measurements don't correlate well with what users actually experience. Fresh metrics, such as those produced by the Google Lighthouse web app quality auditor tool, "have good potential to regain the connection with the desired KPIs," he said. Additionally, real-time monitoring for both performance and security should be on IT teams' agendas.
