Finding the right application performance monitoring vendor can be difficult. Taking into account the significant list of vendor features to compare and contrast can make this task feel all-consuming. You can more easily compare APM software by examining three primary features: monitoring, reporting and analysis.
Monitoring is the core feature of every APM platform, and it's available in many forms. However, the type of monitoring supported can vary greatly from platform to platform, depending on each one's target market.
Basic server monitoring and metrics
Servers are one of the few universal requirements in application development, which makes server monitoring the cornerstone capability of every APM provider. Because of this, you'd be hard-pressed to find APM software that doesn't offer basic server monitoring and metrics. For this particular feature, it is important to take extra note of how a vendor implements server monitoring -- taking into account things like design and usability -- and determine how that might impact your organization.
Platforms like Datadog provide a highly refined UI that is perfect for teams of all sizes but can come at a high price as infrastructure scales. On the flip side, open source platforms, like Nagios, put a heavier emphasis on configurability and are much more manageable financially thanks to their self-hosted nature. While the differences between each platform's basic server monitoring offering can seem nuanced, the reality is that these details will directly impact how your team will interact with a particular platform.
With extensive research into APM software, TechTarget editors have focused this series of articles on vendors that offer APM capabilities as a separate platform rather than as part of a larger system. Our research included Gartner, Forrester and TechTarget surveys.
Real user monitoring
While every application requires a server to run on, not every application requires users. Real user monitoring (RUM) is a form of passive monitoring that tracks and analyzes every action a user takes on an application using both client-side and server-side metrics. This means that, in the case of a web application, RUM tracks and aggregates every page a user visits and every button they click, allowing you to identify the impact that every action has across the entire stack. It can track everything from the speed of the network request to the efficiency of underlying database queries, depending on the abilities of the underlying APM software.
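To make the aggregation step concrete, here is a minimal sketch of how RUM data might be rolled up per page. The event format (`page`, `load_ms`, `db_ms`) and the function name are assumptions made for illustration; they do not reflect any particular vendor's schema.

```python
from collections import defaultdict
from statistics import median


def summarize_rum_events(events):
    """Group hypothetical RUM events by page and summarize their timings."""
    by_page = defaultdict(list)
    for event in events:
        by_page[event["page"]].append(event)

    summary = {}
    for page, page_events in by_page.items():
        summary[page] = {
            "views": len(page_events),
            # Client-side metric: how long the page took to load for the user.
            "median_load_ms": median(e["load_ms"] for e in page_events),
            # Server-side metric: time spent in database queries for that view.
            "median_db_ms": median(e["db_ms"] for e in page_events),
        }
    return summary


# Made-up sample events; a real agent would collect these from browsers and servers.
events = [
    {"page": "/checkout", "load_ms": 1200, "db_ms": 300},
    {"page": "/checkout", "load_ms": 1800, "db_ms": 900},
    {"page": "/home", "load_ms": 400, "db_ms": 50},
]
rum_summary = summarize_rum_events(events)
```

Pairing a client-side number with a server-side number for the same page is what lets you see whether a slow page is a front-end problem or a database problem.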
By nature, RUM is often available from APM providers that focus specifically on user-centric application monitoring -- typically in the form of mobile and web applications. Stackify, for example, focuses heavily on web application RUM, while New Relic's software offers both web and mobile user monitoring. Other vendors, such as Sensu, target server-side monitoring and do not provide any client-side RUM functionality.
Web performance monitoring
While RUM and web performance monitoring track actual application usage, synthetic monitoring tracks common application usage. This means that, rather than tracking the behavior of an active user, it uses preprogrammed behavior to identify changes in monitoring data. This can be useful for identifying potential issues before real users encounter them.
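A synthetic check can be as simple as a scripted request with a pass/fail rule. The sketch below assumes a caller-supplied `fetch` function that returns an HTTP status code; the function names, the latency budget and the example URLs are all hypothetical, and a real check would issue an actual network request on a schedule.

```python
import time


def run_synthetic_check(fetch, url, max_latency_s=1.0):
    """Run one preprogrammed request and report whether it met expectations."""
    start = time.monotonic()
    try:
        status = fetch(url)
    except Exception as exc:  # any failure to complete the request is a failed check
        return {"url": url, "ok": False, "reason": f"request failed: {exc}"}
    elapsed = time.monotonic() - start

    if status != 200:
        return {"url": url, "ok": False, "reason": f"unexpected status {status}"}
    if elapsed > max_latency_s:
        return {"url": url, "ok": False, "reason": f"slow response: {elapsed:.2f}s"}
    return {"url": url, "ok": True, "reason": "ok"}


def fake_fetch(url):
    """Stand-in for a real HTTP client so the sketch runs without a network."""
    return 200 if url.endswith("/health") else 500


healthy = run_synthetic_check(fake_fetch, "https://example.test/health")
broken = run_synthetic_check(fake_fetch, "https://example.test/cart")
```

Because the check runs the same scripted behavior every time, a change in its result points to a change in the application rather than a change in user behavior.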
A great example of targeted synthetic monitoring is SolarWinds' Pingdom, an APM tool that continually monitors website performance and availability from outside the application's network. While Pingdom suits web applications of any size, it is especially attractive for smaller applications due to its reasonable price point and client-side targeted feature set. On the flip side, more robust APM platforms, like LogicMonitor, offer synthetic monitoring as an additional feature on top of more traditional server-side monitoring and analysis.
While monitoring is the cornerstone of any APM software, the sheer amount of data it can generate is rendered useless without the ability to report on anomalies. As with monitoring, many APM tools offer at least one kind of reporting.
Application error reporting
In software development, errors are inevitable, but identifying and repairing them before they have a significant impact on the end user can be difficult. At a high level, application error reporting involves simple log aggregation and fatal error analysis, but logs aren't the only place errors happen. Many APM tools integrate directly with popular frameworks and programming languages to catch application-level errors in addition to server-level errors. For APM software that supports web performance monitoring, this level of error reporting often extends into browser-based errors as well.
The true value of error reporting often lies in the ability of an APM provider to not only aggregate and list errors, but also identify patterns in those errors. Dynatrace, for example, is a scalable APM tool that offers built-in aggregation of application errors and identifies the rate at which users are encountering errors, while competitors like Splunk and Grafana offer extensive plug-in architectures for building more custom error reporting integrations.
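The difference between listing errors and finding patterns in them can be sketched in a few lines. The record fields (`type`, `endpoint`) and the function name below are assumptions for illustration only, not the output format of any tool named above.

```python
from collections import Counter


def error_patterns(error_records, total_requests):
    """Count errors by (type, endpoint) and compute the overall error rate."""
    counts = Counter((r["type"], r["endpoint"]) for r in error_records)
    error_rate = len(error_records) / total_requests if total_requests else 0.0
    # most_common() puts the most frequent error signature first.
    return counts.most_common(), error_rate


# Made-up records standing in for aggregated application logs.
records = [
    {"type": "TimeoutError", "endpoint": "/checkout"},
    {"type": "TimeoutError", "endpoint": "/checkout"},
    {"type": "KeyError", "endpoint": "/profile"},
]
patterns, rate = error_patterns(records, total_requests=100)
```

Here the grouped view shows that two of the three errors share one signature, which is a more useful starting point than three separate log lines.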
Load testing and alerts
Not every application needs to be scalable. Internal tools and niche products can often forgo load testing. On the other hand, large-scale consumer applications require a high level of availability, which means that load testing and resource utilization reporting are a high priority. While load testing is an important factor in these types of applications, its true value comes from the data that the APM platform aggregates.
Identifying patterns across application infrastructure during different load levels can go a long way toward scalability and sustainability planning. To better facilitate this, APM vendors like AppDynamics and New Relic partner with third-party load testing platforms to provide an integrated, end-to-end load testing environment for an application. You can also utilize open source platforms, like Nagios, to manually identify patterns that indicate an unusual spike in application load and to set alerts accordingly.
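One simple way to flag an unusual spike is to compare the latest load reading against a baseline built from recent history. The sketch below uses a standard-deviation threshold; the function name, the threshold of 3.0 and the sample numbers are assumptions, and this is one possible rule rather than how any specific platform decides to alert.

```python
from statistics import mean, stdev


def is_load_spike(history, current, threshold=3.0):
    """Return True if `current` sits far above the recent baseline."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase stands out.
        return current > baseline
    return (current - baseline) / spread > threshold


# Made-up requests-per-second readings from recent intervals.
recent_load = [100, 110, 90, 105, 95]
spike = is_load_spike(recent_load, 400)
normal = is_load_spike(recent_load, 108)
```

A rule like this adapts to each application's normal load, whereas a fixed threshold has to be retuned as traffic grows.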
Many APM tools offer analysis capabilities, which can be extremely helpful in understanding all of the generated data. While identifying application error rates and high server loads is incredibly valuable, more complex applications can benefit greatly from tools that automate the analysis of these issues.
Root cause analysis
When it comes to software defects, it can often be hard to identify the underlying cause of an application error. A database misconfiguration can cause a browser-based error page, or server overutilization can aggravate a database latency issue. Root cause analysis is a feature that helps shortcut the diagnosis part of fixing defects and better identify what was happening across the full application stack when an error occurred.
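At its simplest, looking across the stack means lining up events from different layers on one timeline. The sketch below collects everything recorded shortly before an error; the event fields (`time`, `layer`, `message`), the 60-second window and the function name are hypothetical, and commercial tools do considerably more than this.

```python
def events_before_error(events, error_time, window_s=60):
    """Return events from any layer in the window leading up to the error, oldest first."""
    nearby = [
        e for e in events
        if error_time - window_s <= e["time"] <= error_time
    ]
    return sorted(nearby, key=lambda e: e["time"])


# Made-up events; times are seconds since an arbitrary start.
stack_events = [
    {"time": 950, "layer": "server", "message": "CPU at 97%"},
    {"time": 100, "layer": "server", "message": "deploy finished"},
    {"time": 980, "layer": "database", "message": "query latency 4s"},
    {"time": 995, "layer": "browser", "message": "error page shown"},
]
timeline = events_before_error(stack_events, error_time=1000)
```

Read in order, the timeline suggests the chain described above: server overutilization, then database latency, then a browser-facing error.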
Because of the amount of data that companies need to retain, APM tools that offer root cause analysis often work best for larger organizations. New Relic and CA Technologies offer detailed overviews of the real-time and historical state of an application. These platforms make it possible to identify how users are utilizing applications and to drill down for more visibility while troubleshooting, using data like throughput metrics, error rates and end-user response times, in addition to service-specific data.
The idea behind integrating AI into a standard APM platform is that the machine can actually learn and automate the manual diagnosis steps for root cause analysis and performance alerting, ultimately allowing for more time to optimize and less time for debugging.
While AI is still a burgeoning feature, Dynatrace is one provider that has some very impressive functionality for understanding causation and detecting causal relationships between issues. Because this technology is still developing, AI may not be a good fit for organizations that have established processes in place for dealing with errors and performance issues, but for smaller organizations with fewer resources, it can be the difference between surviving and thriving.
Thanks to the cloud, admins now rarely perform application development in a vacuum. Integrating with third-party services is more rule than exception, so many APM platforms offer the ability to integrate with multiple third-party cloud platforms. The advantage to integration is that it provides a way to add to the data set without having to make drastic infrastructure changes to accommodate an APM tool.
While most APM software offers some sort of plug-in architecture or integration capabilities, the two platforms with the most variety in their offerings are Datadog and Grafana. Between them, they cover the spectrum from commercial to open source, and both offer an extensive library of third-party integrations, allowing you to aggregate and analyze data from just about every major service provider.
Thanks to the ubiquity of software applications in everyday life, speed and stability are more important than ever before. APM platforms are an excellent way to stay ahead of the curve on these issues, but picking the right vendor for your organization is essential for reaping the full benefits of this software.