Enterprise-grade network testing is one of the most frequently overlooked aspects of network engineering.
Network engineers often spend so much time designing and architecting the network that we seldom think about continuous testing. The time we have left after architecture and design is typically spent in break/fix mode dealing with the inevitable problems that crop up. Everything is an emergency; everything must be installed, configured or fixed right now; and ongoing testing seems to be a luxury we either can't afford or don't have time to think about. (Read The Phoenix Project by Gene Kim, Kevin Behr and George Spafford for a great story illustrating this common cycle.)
The reality is network engineers need continuous testing if they are going to keep their networks stable, head off problems before they occur and still have time for ongoing improvements, patching, upgrades and new deployments of technologies that support business-driven initiatives. Networks are complicated beasts and to leave them undomesticated in the wild is to abdicate responsibility to the business for which the network exists.
But how do network engineers go about testing their networks, and what kind of testing should they be performing? These are complicated questions with highly variable answers -- but, fortunately, network engineers can take a few common approaches to help them conduct regular network health checks.
Network testing approaches
Run traffic analysis
One of the more important tests network engineers should run is traffic analysis. Does traffic traverse the network as expected? Does it get from its origin to its destination, across the portion of the path that network engineers control, as efficiently as the network design will allow?
These tests can be as basic as running ICMP echo requests between two points on the network and measuring the latency on the path. In fact, many modern network paradigms rely on echo, or ping, testing as part of their intelligent traffic steering algorithms. The challenge with relying only on this approach is that it misses many details when something does go wrong: an echo test can show that a path is slow or unreachable, but not why.
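As a rough illustration of the echo-based approach, the sketch below summarizes a batch of round-trip times into loss, latency and jitter figures. It assumes the samples have already been collected by whatever ping tool is in use; the function name `summarize_rtts`, the use of `None` for a lost probe and the simple jitter definition are all choices made for this example, not a standard.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_rtts(samples_ms: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize echo round-trip times in milliseconds.

    A value of None marks a probe that received no reply (assumed convention).
    """
    sent = len(samples_ms)
    received = [s for s in samples_ms if s is not None]
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    if not received:
        # Nothing came back, so there is no latency to report.
        return {"sent": sent, "loss_pct": loss_pct, "min_ms": None,
                "avg_ms": None, "max_ms": None, "jitter_ms": None}

    # Jitter here is the mean absolute difference between consecutive replies.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "min_ms": min(received),
        "avg_ms": statistics.fmean(received),
        "max_ms": max(received),
        "jitter_ms": statistics.fmean(diffs) if diffs else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical samples: four probes, the third one lost.
    print(summarize_rtts([10.0, 12.0, None, 11.0]))
```

Tracking these numbers over time gives a baseline, which makes a sudden change in latency or loss easier to spot even though the numbers alone do not explain the cause.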
Create synthetic network traffic
Another approach in network traffic testing is to use dedicated and purpose-built traffic generation hardware from vendors, such as Gigamon or Ixia, to create network traffic while simultaneously monitoring the network at various points. Engineers can monitor the network using software such as Kentik, PRTG, Splunk or Wireshark, to name a few, and can gather traffic in a variety of ways, including port mirroring or dedicated hardware taps.
The challenge with synthetic traffic is that, if done incorrectly, it's easy to overwhelm the network with artificial traffic and crowd out or seriously degrade legitimate business traffic. The key is to perform testing during dedicated outage windows or, less obtrusively, to add only enough synthetic traffic to stay within the reasonable headroom of the network's capacity. Engineers can, of course, monitor traffic regardless of whether they use synthetic traffic generation.
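The headroom idea comes down to simple arithmetic, sketched below. The 80% utilization ceiling is an assumed example value, not a recommendation from this article, and `synthetic_budget_mbps` is a hypothetical helper name.

```python
def synthetic_budget_mbps(capacity_mbps: float,
                          observed_mbps: float,
                          ceiling: float = 0.8) -> float:
    """Return how much synthetic traffic fits under a utilization ceiling.

    capacity_mbps: link capacity
    observed_mbps: legitimate traffic currently on the link
    ceiling:       fraction of capacity not to exceed (assumed default: 0.8)
    """
    if capacity_mbps <= 0:
        raise ValueError("capacity_mbps must be positive")
    if not 0 < ceiling <= 1:
        raise ValueError("ceiling must be in the range (0, 1]")
    # Never return a negative budget: if the link is already above the
    # ceiling, no synthetic traffic should be added.
    return max(0.0, capacity_mbps * ceiling - observed_mbps)


if __name__ == "__main__":
    # A 1 Gbps link carrying 350 Mbps of business traffic.
    print(synthetic_budget_mbps(1000, 350))
```

In practice the observed figure should come from peak utilization rather than an average, so that bursts of legitimate traffic still fit under the ceiling.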
Test wireless traffic
Much of the access layer these days is wireless in one form or another. Testing wireless traffic requires an approach similar to the one above, but with different tools. One key measurement of wireless accessibility is whether users have adequate coverage -- i.e., they can get on the network -- wherever they are trying to work from.
Ubiquitous coverage within a designated office space should be the minimum benchmark. But, all too often, teams perform site surveys upon installation of the wireless infrastructure and then leave it alone through changing times, technologies and paradigms. People and furniture are moved, interfering equipment such as microwave ovens is introduced and, over time, the experience is noticeably degraded. It's not necessary to run continuous surveys of the wireless environment, but surveys should be part of a regular maintenance rotation. Don't just set it and forget it.
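One simple way to turn a repeat survey into an actionable result is to compare each measured location against a minimum signal level. The sketch below assumes readings are recorded as RSSI in dBm per location; the -67 dBm threshold is an illustrative default only, and the location names are made up.

```python
from typing import Dict, List


def weak_spots(readings_dbm: Dict[str, float],
               min_rssi_dbm: float = -67.0) -> List[str]:
    """Return the locations whose signal is weaker than the threshold.

    RSSI values are negative; a more negative number means a weaker signal.
    """
    return sorted(loc for loc, rssi in readings_dbm.items()
                  if rssi < min_rssi_dbm)


if __name__ == "__main__":
    # Hypothetical survey readings.
    survey = {"lobby": -55, "conf-room-2": -72, "kitchen": -70, "desk-14": -67}
    print(weak_spots(survey))
```

Comparing the output of successive surveys shows where coverage has drifted since the original installation.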
Check backup power systems
One key aspect of network testing that is routinely overlooked is testing the backup power system. Batteries in uninterruptible power supply units should be tested and replaced regularly. A and B power banks should be switched off independently and then simultaneously. Failover between the two banks, as well as the generator transfer switch, should be tested once a year, at minimum, preferably before storm season.
Monitor protocol redundancies and recovery strategies
There are far too many other examples of network testing to list here, but engineers should regularly monitor and test any and all network protocol redundancies. Routing protocol failover, failover from an intelligent fabric to a fall-through Spanning Tree Protocol configuration, and A/B port channel configurations to servers should all be tested on a recurring cycle.
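A failover test is more useful when it produces a number. One possible way to get one, sketched below, is to run probes at a fixed interval across the path while the primary link or device is taken down, then measure the longest run of lost probes. The function name and the fixed-interval probing model are assumptions for this example, and the result is only as precise as the probe interval.

```python
from typing import Sequence


def longest_outage_s(results: Sequence[bool], interval_s: float) -> float:
    """Estimate the worst outage seen during a failover test.

    results:    one entry per probe, True if the probe succeeded
    interval_s: seconds between probes
    """
    longest = 0
    current = 0
    for ok in results:
        # Reset the run on success, extend it on a lost probe.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval_s


if __name__ == "__main__":
    # Hypothetical probe log captured at 0.5-second intervals.
    log = [True, True, False, False, False, True, False, True]
    print(longest_outage_s(log, 0.5))
```

Recording this figure each cycle makes it possible to see whether convergence time is getting worse as the network changes.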
It should also be self-evident that testing backup and recovery strategies is key. Server and network equipment backups are fine, but it's more than inconvenient when teams can't recover from the media on which they store the backups.
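A minimal recovery check, assuming a backup can be restored to a scratch location, is to compare a cryptographic hash of the restored file with the original. The sketch below uses SHA-256 from Python's standard library; the function names and file paths are hypothetical, and a matching hash proves only that the file came back intact, not that a device will boot from it.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large backup files do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path: str, restored_path: str) -> bool:
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

A call such as `restore_matches("router1.cfg", "/tmp/restore/router1.cfg")` (both paths are placeholders) can then be run as part of a scheduled restore drill.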
The list of items we need to think about when it comes to network health checks is staggeringly long, and it's easy to overlook parts of it. Teams overlook items at their own risk, however, as the components they haven't tested are the ones that most often fail -- the things that should never fail, the ones we assume always work, are often the first to go. Network engineers can protect themselves and their businesses by coming up with a comprehensive test plan now. Write it down, execute it regularly and mitigate as much risk as possible.
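A written test plan can be as simple as a list of checks, each with an interval and the date it was last run. The sketch below shows one possible shape for such a plan; the `PlanItem` structure, the dates and most of the intervals are illustrative assumptions, with only the yearly power test taken from the guidance above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class PlanItem:
    name: str
    interval_days: int
    last_run: Optional[date] = None  # None means the check has never been run


def overdue(plan: List[PlanItem], today: date) -> List[str]:
    """Return the names of checks that are past their interval or never run."""
    due = []
    for item in plan:
        if item.last_run is None:
            due.append(item.name)
        elif today - item.last_run > timedelta(days=item.interval_days):
            due.append(item.name)
    return due


if __name__ == "__main__":
    # Hypothetical plan; only the 365-day power interval comes from the text.
    plan = [
        PlanItem("UPS battery and A/B power failover", 365, date(2023, 3, 1)),
        PlanItem("Wireless site survey", 180, date(2024, 1, 15)),
        PlanItem("Backup restore drill", 90, None),
    ]
    print(overdue(plan, date(2024, 6, 1)))
```

Keeping the plan in a form like this makes it easy to review what has slipped, which is where untested components tend to hide.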