This tip is part one of two, originally published as part of "Tuning Performance and Capacity Management," chapter two of the Choosing Performance Monitoring Tools e-book.
To ensure that investments in hardware, a virtualization strategy or a private cloud pay off, it's important to track systems' performance over time -- but with today's mix of physical and virtual environments and the added complications posed by multiple hypervisors, picking the right performance monitoring software can be complex.
IT professionals need ways to record, analyze and improve data center performance -- and application performance. Performance monitoring software is available from many sources:
- System vendor tools. Many of the large server vendors offer monitoring tools that support their products, as well as third-party products.
- Third-party tools. Independent software vendors also provide performance monitoring tools, in part because vendor-specific tools don't always support competitive products well.
- Cloud-based tools. Performance monitoring is now offered as Software as a Service (SaaS), for companies that want to avoid installing and managing an on-premises product.
- Open source tools. The systems administrator community has created many free performance monitoring tools.
Performance monitoring software products should all include certain core features, and many offer extended features that aid in troubleshooting and administration. Tools with analytical features will allow users to apply reported data to improve server performance and capacity planning. When evaluating potential purchases, start with the core set of features, then figure out what additional functionality your environment may need.
Core features of performance monitoring tools
All performance monitoring tools should reduce application downtime by shrinking mean time to recovery: how long it takes to return an application to normal performance once a problem appears.
All performance monitoring tools should boost application performance by identifying and resolving performance bottlenecks.
By increasing application uptime and reducing the amount of time that IT employees spend troubleshooting problems, the performance monitoring tool will help save money.
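To make the mean time to recovery figure concrete, here is a minimal sketch of how it could be computed from an incident log. The incident timestamps and the function name are hypothetical, invented for illustration; a real tool would pull these values from its own event history.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average the time between problem detection and recovery."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = (recovered - detected for detected, recovered in incidents)
    return sum(durations, timedelta()) / len(incidents)


# Hypothetical incident log: (problem detected, normal performance restored)
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 45)),      # 45 minutes
    (datetime(2024, 2, 11, 14, 30), datetime(2024, 2, 11, 15, 45)),  # 75 minutes
]

mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```

Tracking this number before and after a tool is deployed is one way to check whether it is actually shortening recovery.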
Rule out any performance monitoring tool that cannot monitor the core performance metrics of server, network and storage hardware.
Server monitoring should cover at least four core areas:
- CPU utilization. The percentage of processor capability being used.
- Local disk I/O. How heavily the onboard storage is being read from and written to, and what throughput the disk is achieving.
- Local disk space. The percentage of total onboard storage that is being used.
- Memory. The percentages of total system memory that are in use and free.
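As a rough illustration of the four server metrics above, the sketch below turns raw readings from one sampling interval into the percentages and throughput figure a tool would report. The `ServerSample` fields and the sample values are assumptions made for this example, and CPU utilization is simplified to a single core. Only the last lines read a real value, using Python's standard `shutil.disk_usage`.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class ServerSample:
    """Raw readings for one sampling interval (hypothetical field names)."""
    interval_seconds: float
    cpu_busy_seconds: float        # single-core simplification
    disk_bytes_transferred: int    # reads plus writes during the interval
    disk_used_bytes: int
    disk_total_bytes: int
    mem_used_bytes: int
    mem_total_bytes: int


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def summarize(sample):
    """Derive the four core server metrics from raw readings."""
    mem_free = sample.mem_total_bytes - sample.mem_used_bytes
    return {
        "cpu_used_pct": pct(sample.cpu_busy_seconds, sample.interval_seconds),
        "disk_io_mb_per_sec": round(
            sample.disk_bytes_transferred / sample.interval_seconds / 1e6, 2
        ),
        "disk_used_pct": pct(sample.disk_used_bytes, sample.disk_total_bytes),
        "mem_used_pct": pct(sample.mem_used_bytes, sample.mem_total_bytes),
        "mem_free_pct": pct(mem_free, sample.mem_total_bytes),
    }


# Made-up readings for a 60-second interval
summary = summarize(ServerSample(
    interval_seconds=60,
    cpu_busy_seconds=45,
    disk_bytes_transferred=120_000_000,
    disk_used_bytes=400_000_000_000,
    disk_total_bytes=500_000_000_000,
    mem_used_bytes=12 * 2**30,
    mem_total_bytes=16 * 2**30,
))
print(summary)

# A real reading of local disk space for the root of the current drive
usage = shutil.disk_usage(os.path.abspath(os.sep))
print("local disk used %:", pct(usage.used, usage.total))
```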
Network monitoring should address these three key network performance elements:
- The total bandwidth being consumed. This measure is for both inbound and outbound bandwidth, to and from the server.
- The number of packets being transmitted. This provides a sanity check on total bandwidth consumed; if the bandwidth being consumed is high but the number of packets being transmitted is low, this may indicate a problem that needs to be investigated.
- Incidence of packet errors. It's important to know how many packet errors are occurring, because a high number points to a problem somewhere on the network.
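The three network measures are usually derived from interface counters that only ever increase, so a tool compares two samples taken some seconds apart. The following is a minimal sketch under that assumption; the `InterfaceCounters` type and the counter values are hypothetical, not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class InterfaceCounters:
    """Cumulative counters read from a network interface (hypothetical)."""
    bytes_in: int
    bytes_out: int
    packets: int
    errors: int


def network_rates(prev, curr, interval_seconds):
    """Turn two counter samples into bandwidth, packet and error figures."""
    d_in = curr.bytes_in - prev.bytes_in
    d_out = curr.bytes_out - prev.bytes_out
    d_packets = curr.packets - prev.packets
    d_errors = curr.errors - prev.errors
    return {
        "inbound_bps": 8 * d_in / interval_seconds,
        "outbound_bps": 8 * d_out / interval_seconds,
        "packets_per_sec": d_packets / interval_seconds,
        # Average packet size ties bandwidth and packet count together,
        # which is what makes the packet count useful as a sanity check.
        "avg_packet_bytes": (d_in + d_out) / d_packets if d_packets else 0.0,
        "error_pct": 100 * d_errors / d_packets if d_packets else 0.0,
    }


# Made-up samples taken 60 seconds apart
before = InterfaceCounters(bytes_in=0, bytes_out=0, packets=0, errors=0)
after = InterfaceCounters(
    bytes_in=75_000_000, bytes_out=15_000_000, packets=100_000, errors=50
)
rates = network_rates(before, after, interval_seconds=60)
print(rates)
```

An average packet size far outside what the link can carry is the kind of mismatch between bandwidth and packet count that deserves a closer look.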
Storage monitoring needs to monitor and report on the following aspects of shared storage devices:
- Array availability and performance. Since storage devices are shared devices that can affect many applications at once, tracking their status is critical.
- Data volumes and their status. Arrays support many volumes, each of which is tied to an application, so tracking individual volumes is an important part of application monitoring.
- Array capacity used and free. Running out of storage space is a common application availability issue, so tracking used and free space is a fundamental requirement for performance monitoring tools.
While these areas all relate to hardware monitoring, software monitoring is equally important. Historically, simpler application deployment topologies made monitoring applications and application components less critical. Today, however, software monitoring is a necessary companion to hardware monitoring.
Core performance monitoring requirements for software include:
- Monitoring for system software components that make up a company's core infrastructure. For example, with virtualization, hypervisor monitoring is crucial.
- Preconfigured monitoring capability for common commercial applications such as Microsoft Exchange, open source applications such as RabbitMQ, and middleware such as Oracle databases.
- A software development kit to enable monitoring for custom applications and additional monitoring metrics, e.g., number of calls per second to specific application functions and average function response time.
- The ability to monitor external services, to measure their availability and performance and assess their effect on application performance.
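The custom-application metrics mentioned above, calls per second and average function response time, can be sketched with a small decorator. This is an illustration of the idea only, not the API of any vendor's software development kit; `FunctionMetrics`, `track` and `lookup_order` are hypothetical names.

```python
import time
from collections import defaultdict
from functools import wraps


class FunctionMetrics:
    """Counts calls and accumulates response time per function."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)
        self.started = time.monotonic()

    def track(self, func):
        """Decorator that records each call to func, even if it raises."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.calls[func.__name__] += 1
                self.total_seconds[func.__name__] += time.perf_counter() - start
        return wrapper

    def report(self, name):
        """Return calls per second and average response time for name."""
        elapsed = time.monotonic() - self.started
        calls = self.calls.get(name, 0)
        total = self.total_seconds.get(name, 0.0)
        return {
            "calls": calls,
            "calls_per_second": calls / elapsed if elapsed > 0 else 0.0,
            "avg_response_seconds": total / calls if calls else 0.0,
        }


metrics = FunctionMetrics()


@metrics.track
def lookup_order(order_id):
    """Stand-in for a real application function."""
    time.sleep(0.01)
    return {"order_id": order_id}


for order_id in range(3):
    lookup_order(order_id)

print(metrics.report("lookup_order"))
```

A real kit would also ship these figures to the monitoring server; here they are simply kept in memory and printed.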
Extended performance monitoring features
For less-complex environments and simpler applications with few components that run on in-house hardware, these core performance monitoring features suffice. But more complex application topologies often require additional functionality, especially when operating in a cloud computing environment.
Performance monitoring tools with extended functionality commonly include:
- Aggregated/segregated performance display. An application tier may include 10 or more individual virtual machines (VMs), all performing the same function. While it's useful to view the VMs as a collection, you'll need to drill down to individual VM performance information if problems occur.
- Aggregated/consolidated logging. Applications can contain dozens or hundreds of software components. Tracking down performance issues is challenging; a consolidated collection of log entries from all the different components enables users to track incidents to get to the root cause.
- Alerting. While alerts are typically included in basic performance monitoring features, tools with extended functionality let users define thresholds and alert routing logic. For example, one kind of problem will prompt alerts to the network group; another will alert the server group.
- Configurable dashboard. A graphical display of infrastructure and application performance is extremely useful for taking in information at a glance. A configurable dashboard makes it possible to create individual displays for IT staff with different responsibilities.
- Application programming interface (API). The DevOps movement has prompted an explosion of new automation options for applications and infrastructure. Many options use performance monitoring data to trigger events and actions. An API is crucial to support the automation of IT tasks in the data center.
- In-memory metric storage. With vast amounts of data from Web-scale applications and the need to respond immediately to performance issues, data retrieval from disk can take an unacceptably long time. For faster response times, leading-edge tools offer in-memory metric storage along with slice-and-dice analytics to support quick problem resolution.
- Time series analytics. Comparing performance and metrics over time often highlights events and conditions that trigger problems. Storing and displaying time-based analytics is a common extended feature for performance monitoring software.
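Of the extended features above, threshold-based alert routing is the easiest to show in a few lines. The sketch below assumes a simple rule table; the metric names, threshold values and team names are all hypothetical, and real products express these rules in their own configuration formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Route an alert to a team when a metric exceeds its threshold."""
    metric: str
    threshold: float
    team: str


# Hypothetical routing table: each kind of problem goes to a different group
RULES = [
    Rule("packet_error_pct", 1.0, "network-team"),
    Rule("cpu_used_pct", 90.0, "server-team"),
    Rule("array_used_pct", 85.0, "storage-team"),
]


def route_alerts(readings, rules=RULES):
    """Return (team, message) pairs for every rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        value = readings.get(rule.metric)
        if value is not None and value > rule.threshold:
            message = f"{rule.metric}={value} exceeds threshold {rule.threshold}"
            alerts.append((rule.team, message))
    return alerts


# Made-up readings: network and server thresholds are exceeded, storage is not
readings = {"packet_error_pct": 2.5, "cpu_used_pct": 95.0, "array_used_pct": 60.0}
alerts = route_alerts(readings)
for team, message in alerts:
    print(f"[{team}] {message}")
```

Keeping the rules as data rather than code is what lets a tool expose them through a dashboard or an API.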