Some IT professionals are under the impression that monitoring the performance of virtualized or automated networks is no different from monitoring traditional hardware-based appliances. That's simply not the case, however.
Both virtualization and automation can mask issues that degrade network performance, and those issues can go completely unnoticed when admins rely on traditional network performance monitoring (NPM) tools. If your enterprise values the insight gained from NPM analytics, here's what you need to know about monitoring virtualized or heavily automated network infrastructures.
Performance monitoring strategy for virtual networks
While virtualization can offer tremendous efficiencies, scalability and cost savings, it has some drawbacks. One of them is added complexity. Troubleshooting legacy networks that run on bare-metal hardware appliances is comparatively simple because each hop from one network device to the next is a physical connection.
Now that virtual servers are the norm, dozens or hundreds of virtual machines (VMs) can reside on a single hypervisor. Each VM can have one or more virtual network interface cards that connect to a virtual switch (vSwitch). In some cases, NPM software has little to no visibility into the vSwitch. When one VM talks to another VM on the same hypervisor, the traffic may never leave the host, and NPM tools that can't penetrate the virtualized layer never see it. If performance issues were to occur on that vSwitch, network admins would have a hard time determining why.
The good news is that more recent hypervisor iterations have improved visibility into virtual network appliances. Even so, you should verify that your NPM software and the vSwitch together provide the level of visibility you need.
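To make the vSwitch blind spot concrete, here is a minimal sketch that flags VM-to-VM flows whose endpoints sit on the same hypervisor host. These are the conversations that may stay on the vSwitch and never cross a physical link. The `Flow` type, the `placement` mapping and the VM and host names are hypothetical; in practice, that data would have to come from your own inventory and flow records.

```python
from typing import Dict, List, NamedTuple


class Flow(NamedTuple):
    """One observed or expected conversation between two VMs (hypothetical type)."""
    src_vm: str
    dst_vm: str


def intra_host_flows(flows: List[Flow], placement: Dict[str, str]) -> List[Flow]:
    """Return flows whose source and destination VMs share a hypervisor host."""
    hidden = []
    for flow in flows:
        src_host = placement.get(flow.src_vm)
        dst_host = placement.get(flow.dst_vm)
        # A VM with unknown placement is skipped rather than guessed at.
        if src_host is not None and src_host == dst_host:
            hidden.append(flow)
    return hidden


# Hypothetical VM-to-host placement and flow list for illustration.
placement = {"web-1": "host-a", "app-1": "host-a", "db-1": "host-b"}
flows = [Flow("web-1", "app-1"), Flow("app-1", "db-1")]

print(intra_host_flows(flows, placement))
# [Flow(src_vm='web-1', dst_vm='app-1')]
```

Any flow in that list is one to confirm against your NPM tool: if the tool reports no data for it, the vSwitch is likely outside its field of view.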
Performance problems can also manifest in the hypervisor platform or in the physical hardware on which the hypervisor runs. That's why, in virtualized environments, it's not enough to simply monitor inside the VM -- you must also have visibility outside the VM. This includes performance monitoring for the hypervisor, as well as all the compute, memory and storage resources.
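As a rough illustration of looking both inside and outside the VM, the sketch below compares utilization samples from each layer against thresholds and reports which layer and resource is saturated. The layer names, metric names and threshold values are all assumptions chosen for the example, not values from any particular hypervisor or NPM product.

```python
from typing import Dict, List

# Hypothetical utilization thresholds, in percent. Tune these for your environment.
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "storage": 80.0}


def saturated_layers(samples: Dict[str, Dict[str, float]]) -> List[str]:
    """Return 'layer:resource' entries whose utilization exceeds its threshold."""
    findings = []
    for layer, metrics in samples.items():
        for resource, limit in THRESHOLDS.items():
            value = metrics.get(resource)
            # Missing metrics are ignored; only measured values are compared.
            if value is not None and value > limit:
                findings.append(f"{layer}:{resource}")
    return findings


# Hypothetical samples: the guest looks healthy, but the hypervisor's CPU is saturated.
samples = {
    "guest": {"cpu": 35.0, "memory": 48.0, "storage": 20.0},
    "hypervisor": {"cpu": 96.0, "memory": 71.0, "storage": 40.0},
}

print(saturated_layers(samples))
# ['hypervisor:cpu']
```

In this example, monitoring only the guest would show nothing wrong, even though the VM's workload is competing for CPU on an overloaded host.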
To further complicate matters, a hypervisor can be installed in a distributed computing environment. This means that CPU, memory and storage are networked together to offer improved efficiency and scalability.
Yet, with these benefits comes a network-within-a-network scenario. The IP network that runs inside your VMs may exhibit unexpected slowdowns due to issues in the underlying distributed computing network. Performance monitoring for virtualized network components therefore means monitoring the performance of the hypervisor and the distributed computing environment, as well.
Performance monitoring strategy for automated networks
When it works, network automation can be a tremendous time-saver. But when automation goes wrong, it can go wrong in a hurry.
A good example is the effect automation can have on how quickly NPM helps you identify and resolve a performance-related issue. Whether you set up simple scripts or build out a far more intelligent automation system using machine learning, automation can inadvertently create network performance problems. That's an issue on its own, but it is compounded when the automation processes themselves aren't properly monitored, because tracing the problem back to its cause becomes far more complex and time-consuming.
When a human manually performs network configuration changes, it's easy to identify when a change was made, who made it and why. So, when a change creates a network performance problem, the administrator can simply roll back that change. When those changes become automated, it's not always as easy to identify when or why a change was made. Thus, you and your users may experience a decline in network performance without knowing what caused it.
Performance monitoring strategies within an automated network require administrators to monitor the automation itself. That's why it's so important to include thorough monitoring and logging of any automated changes. That way, you can more easily correlate a drop in network performance with the change that caused it. This may seem like an obvious automation requirement, but you'd be surprised how many administrators don't connect this type of logging with their need to monitor network performance.
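One way to picture that correlation is the minimal sketch below: given a log of automated changes and the time a performance drop began, it lists the changes made shortly beforehand, most recent first. The `ChangeEvent` fields, the 15-minute lookback window and the sample log entries are assumptions made for illustration; a real change log would come from whatever your automation tooling records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One logged automated change (hypothetical record layout)."""
    timestamp: datetime
    source: str   # which script or automation job made the change
    device: str   # which device was changed
    summary: str  # what was changed


def changes_before(
    events: List[ChangeEvent],
    degradation_start: datetime,
    window: timedelta = timedelta(minutes=15),  # assumed lookback window
    device: Optional[str] = None,
) -> List[ChangeEvent]:
    """Return changes logged within `window` before the drop, newest first."""
    earliest = degradation_start - window
    suspects = [
        e for e in events
        if earliest <= e.timestamp <= degradation_start
        and (device is None or e.device == device)
    ]
    return sorted(suspects, key=lambda e: e.timestamp, reverse=True)


# Hypothetical change log entries.
log = [
    ChangeEvent(datetime(2024, 1, 10, 9, 0), "nightly-backup", "core-sw1", "config snapshot"),
    ChangeEvent(datetime(2024, 1, 10, 13, 52), "qos-tuner", "edge-rtr2", "updated QoS policy"),
    ChangeEvent(datetime(2024, 1, 10, 13, 58), "acl-sync", "edge-rtr2", "pushed new ACL"),
]

drop_started = datetime(2024, 1, 10, 14, 0)
for event in changes_before(log, drop_started, device="edge-rtr2"):
    print(event.timestamp.isoformat(), event.source, event.summary)
# 2024-01-10T13:58:00 acl-sync pushed new ACL
# 2024-01-10T13:52:00 qos-tuner updated QoS policy
```

A time-window match like this only produces candidates to investigate and roll back; it doesn't prove a change caused the slowdown. It also works only if every automated change is logged with a timestamp, the job that made it and the device it touched.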