vRealize Operations Manager is a diagnostics tool that provides comprehensive system health monitoring. Administrators need this information to find and prioritize issues while maintaining the health of the overall system. You can use vRealize Operations in vSphere and -- thanks to VMware's partnership with Amazon -- Amazon Web Services environments. What sets vRealize Operations Manager apart from other diagnostic tools is the depth and breadth of information it provides about the environment. Essentially, the longer vRealize Operations Manager runs, the more detailed the information you receive. vRealize Operations Manager then presents this information in a clear format that's geared toward troubleshooting.
In this article, we'll look at how to use vROps Manager to troubleshoot environments.
Get an overview of your system's health
Start by logging into vROps. This initial system health monitoring menu might look a little complex, but it gives you an immediate overview of the condition of your environment. Depending on which version of vROps you use, you might have a slightly different view. In this example, I'm using the latest version of vROps, so it's a heat map circle, like the one in Figure A, rather than the old-fashioned square.
At this point, you should immediately investigate and remediate any issues colored red. To do this, double-click the red portion of the circle. This allows the administrator to examine issues level by level and prioritize by urgency. The other portions of the circle -- yellow and green -- indicate, respectively, warnings and normal system functions.
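The red-first triage order described above can be sketched in a few lines of Python. This is only an illustration of the prioritization logic, not something vROps exposes: the Alert class, the object names and the sample messages are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ranking that mirrors the heat map colors: red first, then yellow.
SEVERITY_ORDER = {"red": 0, "yellow": 1, "green": 2}


@dataclass
class Alert:
    obj: str      # name of the affected object (hypothetical)
    color: str    # "red", "yellow" or "green"
    message: str


def triage(alerts):
    """Return the alerts that need attention, most urgent first.

    Green entries indicate normal operation, so they are dropped.
    """
    actionable = [a for a in alerts if a.color != "green"]
    return sorted(actionable, key=lambda a: SEVERITY_ORDER[a.color])


alerts = [
    Alert("vm-web-01", "yellow", "memory workload rising"),
    Alert("ds-prod-02", "red", "data store latency high"),
    Alert("vm-db-01", "green", "normal"),
]

for alert in triage(alerts):
    print(alert.color, alert.obj, alert.message)
```

The point of the sketch is simply that red items are handled before yellow ones, and green items don't need any action.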
Measure the performance of specific VMs
To troubleshoot a specific VM and investigate possible causes, an administrator can use the quick search function at the top-right corner of the screen.
Typing in the VM's name will take you directly to the VM in question; at this point, vROps Manager will display several metrics about performance. You can also input other items -- essentially anything within the environment -- such as data stores and networks. Within this view, navigate to the top ribbon bar, and double-click the arrows to the right of the All Metrics tab.
This shows the Troubleshooting tab, which gives an instant view of the VM's current health status. As you can see in Figure B, it has some issues that we must address.
Notice the three buttons near the top of the screen: Symptoms, Timeline and Events. As its name suggests, the Timeline view shows how and when the problems occurred.
On the topmost menu, there is also an All Metrics tab. This allows the administrator to interrogate almost every possible metric for the item in question. The CPU workload metric measures how much pressure the VM places on the underlying hardware. The CPU Ready metric indicates how much time the VM spends waiting to be scheduled on a physical CPU. There will always be a certain amount of CPU Ready time, but anything consistently higher than a few percent is bad. Typically, the way to resolve this issue is to add hardware or to reduce the number of virtual CPUs competing for the host's cores.
The mem workload metric measures the amount of pressure placed on the available physical memory, and the mem contention metric shows the amount of time VMs spend competing with each other for physical RAM. Finally, the virtual disk (aggregate) total latency metric measures how long the VM's disk requests take to complete. High latency can be a significant drag on performance; consistent latency above 15 milliseconds is considered bad.
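To make the rules of thumb above concrete, here is a minimal Python sketch that checks a set of sampled readings against them. The 15-millisecond latency limit comes from the text; the 5% CPU Ready limit is my assumed stand-in for "a few percent," and the metric key names and sample values are hypothetical rather than taken from vROps.

```python
from statistics import mean

CPU_READY_MAX_PCT = 5.0      # assumption: stand-in for "a few percent"
DISK_LATENCY_MAX_MS = 15.0   # figure quoted in the text


def flag_metrics(samples):
    """Return the names of the rules of thumb that a VM's samples violate.

    samples maps a (hypothetical) metric name to a list of readings.
    """
    warnings = []

    # CPU Ready: compare the average over the sampling window to the limit.
    ready = samples.get("cpu_ready_pct", [])
    if ready and mean(ready) > CPU_READY_MAX_PCT:
        warnings.append("cpu_ready")

    # Disk latency: "consistent" is read here as every sample over the limit,
    # so a single spike does not trigger a warning.
    latency = samples.get("disk_total_latency_ms", [])
    if latency and all(v > DISK_LATENCY_MAX_MS for v in latency):
        warnings.append("disk_latency")

    return warnings


busy_vm = {
    "cpu_ready_pct": [2.0, 7.5, 9.0],
    "disk_total_latency_ms": [18.0, 22.0, 16.0],
}
quiet_vm = {
    "cpu_ready_pct": [1.0, 2.0, 1.5],
    "disk_total_latency_ms": [4.0, 30.0, 5.0],
}

print(flag_metrics(busy_vm))
print(flag_metrics(quiet_vm))
```

Treating "consistent" as "every sample in the window" is a deliberately strict reading; an average or a percentile over the window would be an equally reasonable choice.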
Analyze performance trends
As its name implies, the Analysis tab analyzes trends.
For those new to vROps, the entire system is organized around badges and their values. Badges have different shapes and meanings and are divided into major and minor levels. The major badges concern items such as health, risk and anomalies. An amber badge means there are issues to resolve, and a red badge means there are major issues. This makes it easier to troubleshoot because you can examine the badge to find out what's causing a less-than-perfect score.
All of these items and values derive from algorithms that VMware builds into the product.
Anomalies are essentially metrics outside of the normal range. For example, if a server normally transfers 10 KB of data per second and then spikes well above that for several hours, this is a workload anomaly. vROps Manager also takes into account the fact that VMs might have periodic spikes, such as during month-end tasks.
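The idea of a metric falling outside its normal range can be sketched in Python as follows. This is not how vROps computes its ranges; it's a simplified stand-in that derives a band from the mean and standard deviation of past readings, and it doesn't account for periodic spikes such as month-end tasks. The function names, the band width k and the sample data are all assumptions.

```python
from statistics import mean, pstdev


def normal_range(history, k=3.0):
    """Derive a simple 'normal' band from past readings: mean +/- k * stddev."""
    center = mean(history)
    spread = pstdev(history)
    return center - k * spread, center + k * spread


def find_anomalies(history, new_readings, k=3.0):
    """Return (index, value) pairs for new readings outside the normal band."""
    low, high = normal_range(history, k)
    return [(i, v) for i, v in enumerate(new_readings) if v < low or v > high]


# Hypothetical throughput in KB per second: steady around 10, then a spike.
history = [10, 11, 9, 10, 12, 10, 9, 11]
new_readings = [10, 250, 240, 11]

anomalies = find_anomalies(history, new_readings)
print(anomalies)
```

A real implementation would need a longer history and some notion of seasonality before a recurring month-end spike could be treated as normal.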
These badges are useful for understanding the current status of numerous items. To directly troubleshoot a VM, enter the machine's name in the search box in the top right-hand corner, as mentioned before. This will present another heat circle and display related information.
The Related Objects tab is useful because it shows the administrator which objects are directly connected to the affected VM. For example, an administrator who suspects that I/O performance is subpar can use the data store item to examine storage. When you click on a related object, the main screen changes to reflect the new view.
The best way to approach using the vROps Manager for system health monitoring and troubleshooting is to look at the high-level view and then drill down through the stack. The main heat map shows major infrastructure issues, but to examine specific parts of the infrastructure, use the search function.