5 principles of change management in networking
Network change management includes five principles, including risk analysis and peer review. These best practices can help network teams reduce failed network changes and outages.
Network change management is a process that aims to reduce the risk of a failed change. This process entails several steps that ensure successful changes.
Aircraft pilots use well-defined processes to ensure safe flying. Similarly, network teams can use defined processes to reduce the risk of failed network changes that create unplanned outages. Still, organizations sometimes find that changes don't go as planned, leading to network outages. Some disruptions are due to a process failure, while others result from unintended consequences of complex configurations.
This article discusses the basic operating principles of network change management, such as the following:
- Scope determination and risk analysis.
- Peer review.
- Pre-deployment testing and validation.
- Implementation and testing.
- Documentation updates.
Before entering the change management process, network teams must establish the change details, such as new configurations, device connection information and documentation.
1. Scope and risk analysis
The first step in the network change management process is to evaluate the scope of a proposed change. Determine which services might be affected and the stakeholders who use those services. Consider the blast radius of a change for its potential scope and effect, including any possible negative outcomes.
This article is part of
What is network management?
Teams should measure the scope in terms of the following two factors:
- The number of endpoints affected by a change.
- The importance of the services a change might affect.
Once teams identify the scope, they should perform a risk assessment of the change. Is it something that has been done many times before and is well understood? Is it fully automated, or is it possible that human error will alter the change in an unexpected way? Is the technology involved well understood, or is there a chance of something unexpected happening?
The scope of a change figures into the risk. A change to the infrastructure on which key business processes run poses a greater risk to the business than a change to a small branch site.
Network teams can assess risk by assigning values to key parameters. By averaging the values from the following example parameters, they can determine the overall risk level.
The greater the risk, the more careful teams need to be during the remainder of the change management process. Teams should have clear change control documentation in place, detailing the rationale for any changes, rollback procedures and scope.
2. Peer review
The next step is to conduct a peer review. Teams can perform this step before the risk analysis, but it's better to use the risk level to drive the thoroughness of a peer review. While all peer reviews should be comparably thorough, it's likely that teams conduct cursory reviews for routine changes, such as access control list changes or virtual LAN modifications. Automated testing and deployment of routine changes can help mitigate the risk of cursory peer reviews.
Typically, internal staff who are familiar with the network conduct the peer reviews. If a change is out of the ordinary, however, it makes sense to have an expert from the equipment vendor perform the review. Findings from the reviews should feed back to the risk analysis phase and update the technical risk measurements.
Peer reviewers should examine several factors during a review, including the following:
- Configuration scripts.
- Hardware and software compatibility.
- Rollback procedures.
- Change rationale.
- Business needs.
- Network security and compliance.
- Templates and documentation.
3. Pre-deployment testing and validation
Ideally, all changes go through a pre-deployment testing and validation phase. Consider automating low-risk, repetitive tasks and changes to remove the temptation to skip testing for changes that teams perceive as low risk. The greater the scope and risk, the more important it is to properly test and validate the proposed change.
The prevalence of virtual router and switch OS instances makes it easier to automate the creation of test network topologies without expensive hardware investments. Use network labs and sandboxes to build automation workflows within a virtual network topology that teams can tear down after the tests are successfully completed.
Pre-deployment testing includes several steps teams should follow to evaluate a proposed change:
- Verify that the test network works as intended prior to the change.
- Implement the change in a test infrastructure to confirm that the change results in the desired final state. Teams should use automated processes to avoid human error and reduce the time to validate the change. If validation in the test environment fails, determine the reason. Did it fail because the change was incorrect, or was it because the test network didn't accurately represent the real network?
- Test the backout change process so it's easy to revert to the previous state if something goes wrong. The rollback should return the network to the starting state, which teams can validate by repeating Step 1.
4. Implementation and testing
Deployment and post-deployment testing and validation should follow Steps 1 and 2 of pre-deployment testing. If teams have done a good job of pre-deployment testing and validation, nothing unexpected should happen. If post-change testing detects an unexpected problem, teams should back out the change and verify service restoration.
Some network protocols take longer to converge after changes to large networks. As such, post-change verification should incorporate delays or convergence tests, which pre-deployment testing in a small test environment doesn't need.
Many organizations automate network configuration changes to migrate to a DevOps culture based on infrastructure as code. The objective is to adopt a continuous integration/continuous delivery process for low-risk changes.
5. Documentation and network management updates
Ideally, teams create and update documents during the change creation process, enabling them to review the documentation and network management changes along with details of the change. Once teams have implemented and verified the change, they can incorporate the documentation changes into a network documentation system.
Teams should also update the network management system as needed. Most network management systems have APIs that enable automated processes to make the changes.
If the change validation step is automated, it can be incorporated into periodic network validation checks. These periodic checks can detect failures in highly redundant and resilient networks. Over time, teams build a library of network validation checks that cover many parts of the network.
The principles of good network change management provide guidance to reduce unplanned network outages due to failed changes. Teams should create a process that works for their organization and work toward making that process highly efficient.
Editor's note: This article was originally written by Terry Slattery and updated by Informa TechTarget editors to improve the reader experience.
Terry Slattery is an independent consultant who specializes in network management and network automation. He founded Netcordia and invented NetMRI, a network analysis appliance that provides visibility into the issues and complexity of modern router- and switch-based IP networks.