Fixing VMware Horizon disk latency and other resource issues
When users in a VMware Horizon environment run into performance issues, IT should look into resource consumption, including disk and CPU, to find the root cause and address it.
VDI resource usage is very different from other infrastructure-related workloads such as email servers or database servers, so VDI administrators must take extra care when managing resources such as RAM and storage on virtual disks.
The mix of user types and applications that workers access within a VDI environment, such as VMware Horizon, makes resource usage less predictable. In addition, the IOPS mix for VDI differs from other workloads -- the percentage of writes from Windows virtual desktops is much higher than for most server workloads, with read-write ratios ranging from 20/80 to 30/70.
Therefore, IT must size the storage platform for those types of workloads. Proper planning can help provide for a good user experience. When VMware Horizon admins plan for storage, there are two areas they need to cover: capacity planning and performance planning.
Capacity planning for virtual storage in VMware Horizon
Capacity planning is usually a relatively easy task for IT admins to handle. They should multiply the number of desktops by the number of gigabytes needed per desktop -- this will provide a reasonable estimate of the total capacity required.
With this formula, Horizon admins should also include the swap space per machine, operational overhead factoring in snapshots and provisioning, and some room for future growth and scale. It also depends on the type of desktops: full clones, which are equal to the size of the original virtual machine, or linked clones and instant clones, which only store the delta per virtual machine. Whatever the VMware clone type, IT admins can follow the relatively simple formula with a predictable total capacity. Additionally, if a VMware Horizon environment is short on disk space, increasing the total capacity is usually simple.
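The capacity formula above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function name, the parameters and every number in the usage example are hypothetical, and the overhead and growth percentages are assumptions that each organization has to set for itself. For linked clones or instant clones, the per-desktop figure would be the expected delta size rather than the full size of the original virtual machine.

```python
def estimate_capacity_gb(desktops, gb_per_desktop, swap_gb_per_desktop=0.0,
                         overhead_fraction=0.0, growth_fraction=0.0):
    """Estimate the total storage capacity in GB for a desktop pool.

    desktops            -- number of virtual desktops
    gb_per_desktop      -- disk space per desktop (full size or clone delta)
    swap_gb_per_desktop -- swap space per machine
    overhead_fraction   -- operational overhead (snapshots, provisioning), e.g. 0.15
    growth_fraction     -- room for future growth and scale, e.g. 0.20
    """
    if desktops < 0 or gb_per_desktop < 0 or swap_gb_per_desktop < 0:
        raise ValueError("counts and sizes must not be negative")
    # Base formula from the text: desktops multiplied by gigabytes per desktop,
    # with the swap space per machine added in.
    base = desktops * (gb_per_desktop + swap_gb_per_desktop)
    # Add operational overhead first, then headroom for growth on top.
    with_overhead = base * (1 + overhead_fraction)
    return with_overhead * (1 + growth_fraction)


# Hypothetical example: 500 desktops, 40 GB each, 4 GB swap,
# 15% operational overhead and 20% growth headroom.
total = estimate_capacity_gb(500, 40, 4, overhead_fraction=0.15, growth_fraction=0.20)
print(f"Estimated capacity: {total:.0f} GB")
```

Applying growth on top of overhead is a design choice in this sketch; adding the two percentages together instead would give a slightly lower estimate.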
However, it isn't as easy to change a storage platform based on its fit with an organization's performance profile. Therefore, performance planning requires a long-term approach that factors in both present and future business goals. This process starts with understanding all the relevant workloads. IT cannot simply look up the number of IOPS needed per desktop in a table and then multiply it by the number of desktops. The performance needs also depend on how homogeneous the user base is.
A typical task worker using an email application and a browser for web-based applications might require only 10-30 IOPS, but someone using the full office suite of applications, graphical design and other business applications may need up to 100 IOPS. The only way to find out is to measure the performance of desktops when actual users are performing their normal day-to-day tasks.
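Once measurements from real user sessions exist, they can be rolled up per user group. The sketch below assumes hypothetical group sizes and per-user IOPS values that merely fall inside the ranges mentioned above; it is not a substitute for measuring. The write fraction of 0.8 reflects the 20/80 read-write ratio cited earlier and can be set to 0.7 for a 30/70 mix.

```python
def estimate_iops(user_groups, write_fraction=0.8):
    """Aggregate measured IOPS per user group and split into reads and writes.

    user_groups    -- list of (user_count, measured_iops_per_user) tuples
    write_fraction -- share of I/O that is writes (0.7 to 0.8 per the text)
    Returns a (total, read, write) tuple.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    total = sum(count * iops for count, iops in user_groups)
    write = total * write_fraction
    read = total - write
    return total, read, write


# Hypothetical user base: 300 task workers measured at 20 IOPS each
# and 50 heavier users measured at 100 IOPS each.
groups = [(300, 20), (50, 100)]
total, read, write = estimate_iops(groups)
print(f"Total: {total:.0f} IOPS (read {read:.0f}, write {write:.0f})")
```

In this made-up example, the 50 heavier users account for almost as much I/O as the 300 task workers, which is why a less homogeneous user base makes a single per-desktop figure unreliable.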
Total capacity and the number of IOPS are two important factors for a storage platform, but the technology that IT chooses should also fit any budgetary restrictions and match the IT administrator's skill set. VMware vSAN is an interesting product because admins can manage it inside of vSphere: the platform that VMware administrators already use. It's also good to know that customers with an Advanced or Enterprise subscription license of Horizon automatically have a vSAN Advanced license. This may make the cost-benefit analysis a bit more attractive for the Advanced license. Keep in mind that, because of the distributed nature of vSAN, each host contributes to the storage capacity and total IOPS performance of the cluster.
Storage optimization for VMware Horizon
One feature that IT administrators shouldn't forget is the Horizon Storage Accelerator. With previous versions of Horizon, the maximum cache size per host was only 2 GB. With Horizon 8, VMware increased this to 32 GB. This means each host can reserve that amount of RAM to cache Horizon disk files. The hosts can then service read requests from memory instead of reading the disks from storage. IT can configure this setting at the vCenter level for all servers or on a per-server basis (Figure 1).
If all hosts in the vCenter environment run virtual desktops, IT can centrally configure the amount of RAM. Otherwise, IT admins shouldn't adjust the cache settings on hosts that do not run virtual desktops.
When the feature is enabled for a vCenter Server, VMware admins can enable or disable it at the desktop-pool level. VMware's advice is to keep the feature enabled for instant-clone desktop pools. That way, Horizon can cache the replica and parent disks, which are the disks where most reads come from.
Analyzing Horizon VDI performance
From a more technical perspective, a few good tools and counters can help IT investigate why a virtual desktop isn't providing the performance it should. If IT installs the Horizon Agent on the VM, the Performance Tracker offers insight into some performance statistics of the machine. However, it does not show all resources: disk performance is not part of the output (Figure 2).
Another reason that the Performance Tracker is not always the best tool for the job is that IT administrators have to run it from the actual virtual desktop. IT would have to perform this action within the user's session. It's much easier to collect the data outside of the user's session.
The Helpdesk Tool allows Horizon administrators and help desk personnel to collect performance statistics from the Horizon Admin Console without interacting with the user. Consider this example of the output of performance statistics for all important virtual resources (Figure 3).
The only disadvantage is that the Helpdesk Tool is available only with the Enterprise license of Horizon, so some customers won't be able to access it.
However, VMware vSphere performance statistics are always available to Horizon admins, and they can access those statistics through the vSphere client. The general overview page allows admins to look at the most important resources in a single view. This way, IT admins can discover the most likely issue by viewing the Performance Overview page of a virtual machine (Figure 4).
Once IT finds the resource that is most likely causing performance problems, it can look to the Advanced Performance graphs for more detail. Consider an example where the disk was the issue. IT could look deeper into disk counters for read/write latency and transport rate. With storage, disk latency is the best indicator of issues with that resource. In this example, the maximum read and write latencies are 48 ms and 50 ms, respectively, which is rather high for most systems (Figure 5).
When investigating performance problems, it's important to keep the normal and acceptable values in mind for comparison. If latency is normally around 10 ms to 20 ms, then the 50 ms in the example is a strong indicator that the disk I/O channel is the reason users are experiencing poor performance. However, if the measured latency were only 20 ms to 25 ms, that would be close enough to normal that the disk may not be the culprit.
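The comparison against a known baseline can be expressed as a small check. This is a sketch under stated assumptions: the function name, the three labels and the tolerance factor of 1.5 are hypothetical choices for illustration, and the sample values are invented to mirror the figures in the text. The latency samples would come from whatever export of the vSphere performance counters is available.

```python
def latency_verdict(samples_ms, baseline_high_ms, tolerance=1.5):
    """Classify peak disk latency against the upper end of the normal range.

    samples_ms       -- latency samples in milliseconds (read or write)
    baseline_high_ms -- upper bound of what is normal for this environment
    tolerance        -- assumed multiplier up to which latency still counts
                        as close to normal (hypothetical default of 1.5)
    """
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    peak = max(samples_ms)
    if peak <= baseline_high_ms:
        return "normal"
    if peak <= baseline_high_ms * tolerance:
        return "borderline"  # above normal, but may not be the culprit
    return "suspect"         # strong indicator of a disk I/O problem


# Baseline of 10 ms to 20 ms, as in the example above.
print(latency_verdict([12, 30, 48, 50], baseline_high_ms=20))  # peaks at 50 ms
print(latency_verdict([15, 22, 25], baseline_high_ms=20))      # peaks at 25 ms
```

Using the peak value matches the maximum latency shown in the example; an average or a percentile over the sampling window would be a reasonable alternative when short spikes are expected.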
Solving disk performance problems
When the disk I/O channel is the problem, IT can take a few different approaches. The simplest would be to optimize the virtual desktops to save on resource usage. Look at which processes the performance issue affects the most -- the way Horizon loads and accesses users' profiles can also cause excessive storage access. Improving the efficiency of that process may resolve the issue without any further action.
When Horizon admins can't fix the root cause with optimization, they should consider improving storage hardware components. This can lead to improvements in caching, storage network bandwidth, storage adapter queue enhancements and more.
If none of this helps, Horizon admins may conclude that their storage platform doesn't match the needs of their users' workload profile. As a last resort, organizations can always change platforms and purchase new hardware, but IT should avoid this if possible.