NetOps is more than just applying automation to network operations. The transition from manual command-line interface processes to template-driven NetOps is as much a cultural change as it is a network automation initiative. As such, it takes a while for teams to fully adopt NetOps because it requires changes to how they think about automation.
Planning the transition to NetOps
The NetOps model requires engineers to think about how to create repeatable automation tasks using templates, rather than looking for the shortest set of steps to implement a network change. A task may take longer to complete using a NetOps approach, and that is acceptable as long as teams are building a repeatable process that incorporates unit testing.
The adoption of NetOps, like the migration to basic automation, should start with simple tasks that are easy to implement. This is the same approach I recommended for implementing network automation. The difference is that a NetOps approach should include testing to verify that changes result in the intended network state. Ideally, teams use a test environment in which unit tests can validate the changes before they are applied to the production network.
Troubleshooting and network state validation are ideal low-risk tasks that won't jeopardize a network's operation.
Troubleshooting. Ping and traceroute tests are a great place to start. Enable the operations staff to select the source and destination for the test, or create a version that tests from multiple sources.
Network state validation. Network state validation verifies that the network is functioning as desired. Commercial configuration management products are incorporating validation functions that enable teams to verify pre-change and post-change operation.
Some questions to answer include the following:
- Do routing tables contain the desired routes, and is the next hop correct?
- Is your public IP address range visible from Border Gateway Protocol Looking Glass sites?
- Are the correct virtual LANs (VLANs) configured on trunking links?
- Do the interface parameters match on point-to-point links?
- When upgrading a switch, does the set of peer and endpoint devices match the pre-upgrade set?
- Are redundancy functions operational, like redundant links, EtherChannels and router redundancy peers?
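Several of these checks reduce to comparing collected state against an expected value. The following is a minimal Python sketch, assuming the routing table and neighbor lists have already been collected from the devices and parsed into plain dictionaries and sets. The function names, data shapes and sample values are illustrative and do not come from any particular product.

```python
def check_routes(routing_table, desired_routes):
    """Compare parsed routes (prefix -> next hop) against the desired routes.

    Returns a list of problem descriptions; an empty list means the check passed.
    """
    problems = []
    for prefix, expected_hop in desired_routes.items():
        actual_hop = routing_table.get(prefix)
        if actual_hop is None:
            problems.append(f"{prefix}: route missing")
        elif actual_hop != expected_hop:
            problems.append(f"{prefix}: next hop {actual_hop}, expected {expected_hop}")
    return problems


def compare_neighbors(pre_change, post_change):
    """Report peers and endpoints that disappeared or appeared across a change."""
    return {
        "lost": sorted(set(pre_change) - set(post_change)),
        "new": sorted(set(post_change) - set(pre_change)),
    }


# Hypothetical data, as it might look after parsing device output.
table = {"10.1.0.0/16": "192.0.2.1", "10.2.0.0/16": "192.0.2.9"}
desired = {
    "10.1.0.0/16": "192.0.2.1",
    "10.2.0.0/16": "192.0.2.2",
    "10.3.0.0/16": "192.0.2.3",
}
for problem in check_routes(table, desired):
    print(problem)
print(compare_neighbors({"sw1", "sw2", "printer7"}, {"sw1", "sw2"}))
```

Returning a list of problems rather than a single pass/fail flag lets the same function serve both as a pre-change and post-change gate and as a troubleshooting report.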
Simple read/write automation
Maintain network device configuration parameters. The simplest automation is maintaining the global parameters in network device configurations, like time protocols, network management, logging and authentication.
The validation of these configurations is just as easy as the initial configuration. Identify the necessary show commands, and build some automation to verify that the desired state was achieved. Network Time Protocol, for example, should have active server or peer sessions open with the specified clock sources. This is also a good place for troubleshooting automation that can identify the problem if a clock source is not reachable.
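As a minimal sketch of this configure-then-verify pattern, the Python below renders NTP configuration lines from a template and then checks that every desired clock source appears among the active sessions. The `ntp server` line format follows common Cisco IOS-style syntax and is an assumption, as is the premise that the list of active sources has already been parsed out of the relevant show command.

```python
from string import Template

# Assumed IOS-style line; adjust the template for the platform in use.
NTP_LINE = Template("ntp server $address")


def render_ntp_config(servers):
    """Render one configuration line per desired clock source."""
    return [NTP_LINE.substitute(address=server) for server in servers]


def missing_clock_sources(desired, active):
    """Return desired clock sources that have no active session."""
    return sorted(set(desired) - set(active))


desired_servers = ["192.0.2.10", "192.0.2.11"]
print("\n".join(render_ntp_config(desired_servers)))

# Hypothetical result of parsing the device's NTP status output.
active_sessions = ["192.0.2.10"]
for source in missing_clock_sources(desired_servers, active_sessions):
    print(f"No active NTP session with {source}; check reachability")
```

Driving both the rendering and the verification from the same list of desired servers keeps the check from drifting away from what was actually configured.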
Interface configuration. Next on the complexity scale is interface configuration. The NetOps system may need to integrate with an IP address management system to get an assigned IP address for Layer 3 interfaces. Layer 2 trunking links would need VLAN lists.
Either custom scripts or commercial automation tools can be used for this class of automation. A key characteristic of simple automation is that new values replace old values.
Intermediately complex automation
Automation of intermediate complexity challenges commercial tools. These tasks interface with other systems in ways that commercial products don't natively support. APIs between tools can help with the integration, but they don't eliminate the need for some custom scripting.
Device upgrades. One innovative engineer created a custom script to aid in the upgrade of switches at remote sites. The old and new switches had different port counts and names. The engineer could have used a manual process to assign the right VLAN to each endpoint switch port, but that would have taken a lot of effort.
Instead, he created a script to save the endpoint's media access control (MAC) address, switch port name and VLAN assignment before the upgrade. After the hardware was replaced, his script identified each endpoint's MAC address and the new switch port. The script then configured the new switch port with the original VLAN assignment. All endpoint devices were automatically assigned to their proper VLAN, saving the upgrade crew many hours of effort.
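The engineer's actual script is not shown here, but the matching step it describes can be sketched in a few lines of Python. The sketch assumes the MAC address tables have already been collected and parsed before and after the hardware swap; the function name, port names and addresses are hypothetical.

```python
def plan_vlan_assignments(before, after):
    """Map each new switch port to the VLAN its endpoint used before the upgrade.

    before: MAC address -> (old port name, VLAN ID), saved prior to the swap
    after:  MAC address -> new port name, learned once the new switch is up
    Returns (assignments, unmatched): assignments maps new port -> VLAN ID, and
    unmatched lists MAC addresses seen after the upgrade with no saved record.
    """
    assignments = {}
    unmatched = []
    for mac, new_port in after.items():
        record = before.get(mac)
        if record is None:
            unmatched.append(mac)
            continue
        _old_port, vlan = record
        assignments[new_port] = vlan
    return assignments, sorted(unmatched)


# Hypothetical snapshots using documentation-range MAC addresses.
before = {
    "00:00:5e:00:53:01": ("Fa0/3", 110),
    "00:00:5e:00:53:02": ("Fa0/7", 120),
}
after = {
    "00:00:5e:00:53:01": "Gi1/0/14",
    "00:00:5e:00:53:02": "Gi1/0/2",
    "00:00:5e:00:53:03": "Gi1/0/9",
}
assignments, unmatched = plan_vlan_assignments(before, after)
print(assignments)
print(unmatched)
```

The sketch only produces a plan; pushing the VLAN to each port is left to whatever configuration method the team already uses. It also assumes one endpoint per port, so a port with several MAC addresses behind it would need extra handling.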
Complex NetOps automation
More complex automation tasks require the system to track the desired state and remove configuration commands that are no longer desired. Some commercial products can handle the complexity, making them the ideal tool.
Access control list (ACL) upkeep and quality of service (QoS) configuration. Tasks in this category include ACL maintenance and QoS configuration, the latter of which combines ACLs with policy maps.
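The difference from simple automation is the removal step: the system has to compare what is configured with what is desired and negate the leftovers. A minimal Python sketch of that comparison follows. It assumes an IOS-style `no <command>` form for removal and treats the configuration as an unordered set of lines, so it suits global parameters better than ACL entries, where sequence matters. The function name and sample lines are illustrative.

```python
def plan_changes(running, desired):
    """Return the commands that move the running lines to the desired state.

    Lines that are configured but no longer desired are negated first,
    then lines that are desired but missing are added.
    """
    desired_set = set(desired)
    running_set = set(running)
    removals = [f"no {line}" for line in running if line not in desired_set]
    additions = [line for line in desired if line not in running_set]
    return removals + additions


# Hypothetical logging configuration before and after a policy change.
running = ["logging host 192.0.2.50", "logging host 192.0.2.51"]
desired = ["logging host 192.0.2.51", "logging host 192.0.2.60"]
for command in plan_changes(running, desired):
    print(command)
```

When the running and desired states already match, the plan is empty, which makes the operation safe to repeat.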
Firewall migration. There's another class of complex automation that isn't as obvious. In one case, a customer needed to migrate a set of firewall rules from one vendor's platform to another, which required changing syntax. An additional requirement was to verify that the firewall rules applied to existing endpoints. If an endpoint had been decommissioned, then the rule could be deleted. Often, an entire subnet had been changed.
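The endpoint check described above can be sketched with Python's standard `ipaddress` module. This is a simplified illustration, not the customer's tooling: it assumes each rule has been reduced to a name and a destination network, and that a list of in-service endpoint addresses is available from some inventory source. All names and addresses are hypothetical.

```python
import ipaddress


def split_rules(rules, live_endpoints):
    """Separate rules that still match a live endpoint from those that do not.

    rules: list of (rule name, destination network in CIDR form)
    live_endpoints: iterable of IP address strings known to be in service
    Returns (keep, stale) as two lists of rule names.
    """
    live = [ipaddress.ip_address(ip) for ip in live_endpoints]
    keep, stale = [], []
    for name, destination in rules:
        network = ipaddress.ip_network(destination)
        if any(ip in network for ip in live):
            keep.append(name)
        else:
            stale.append(name)
    return keep, stale


rules = [
    ("allow-web", "192.0.2.0/28"),
    ("allow-old-db", "198.51.100.7/32"),
]
keep, stale = split_rules(rules, ["192.0.2.5", "203.0.113.20"])
print("migrate:", keep)
print("review for deletion:", stale)
```

Stale rules are reported rather than deleted outright, which leaves room for a human review before anything is dropped from the migrated rule set.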
Move load-balancer virtual IPs (VIPs). In another case, load-balancer VIPs needed to be moved to a new platform, also requiring a syntax change. Of 3,600 VIPs, the automation identified 1,100 that had no activity and could be omitted. The result was a significant reduction in licensing cost.
Custom NetOps UI
One of the problems with custom scripts is creating a useful UI for the network operations team. One innovative approach is ChatOps, a term derived from using chatbots to perform operational tasks. In Slack, for example, a chat message that begins with a slash (/) is treated as a command to be executed. This leads to NetOps interactions like the following:
/find ip 192.0.2.127
This command runs a find ip automation script that returns the following:
Device located: IP 192.0.2.127
Similar interactive interfaces exist for most chat-style collaboration tools.
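Behind the chat front end, a ChatOps bot is essentially a command dispatcher. The Python sketch below shows that part only, under stated assumptions: the chat platform integration is omitted, the inventory is a hard-coded dictionary standing in for a real lookup, and the handler names and reply wording are illustrative.

```python
import ipaddress

# Hypothetical inventory; a real script would query the network or an IPAM system.
INVENTORY = {"192.0.2.127": {"switch": "access-sw-3", "port": "Gi1/0/12"}}


def find_ip(args):
    """Handle `/find ip <address>` by looking the address up in the inventory."""
    if len(args) != 2 or args[0] != "ip":
        return "Usage: /find ip <address>"
    try:
        address = str(ipaddress.ip_address(args[1]))
    except ValueError:
        return f"Not a valid IP address: {args[1]}"
    entry = INVENTORY.get(address)
    if entry is None:
        return f"No device found for IP {address}"
    return f"Device located: IP {address} on {entry['switch']} port {entry['port']}"


# Map each slash command to the function that handles it.
HANDLERS = {"/find": find_ip}


def dispatch(message):
    """Route a chat message that starts with a known slash command to its handler."""
    parts = message.split()
    if not parts or parts[0] not in HANDLERS:
        return "Unknown command"
    return HANDLERS[parts[0]](parts[1:])


print(dispatch("/find ip 192.0.2.127"))
```

Validating the argument before acting on it matters more in a chat interface than in a script run by its author, because anyone in the channel can type a command.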
Continuous integration/continuous deployment
The ultimate NetOps environment includes a test environment and automation that validate network configuration changes before implementing those changes in production. The test environment needs to replicate enough of the production network that the tests are valid. It helps if the test network includes traffic generators that can be programmatically controlled. These generators enable NetOps engineers to build unit tests along with the proposed changes, a process known as test-driven development.
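To illustrate the test-driven habit on a small scale, the sketch below pairs a change function with the unit tests that define its expected behavior, using Python's standard `unittest` module. It tests only the logic that computes a trunk's allowed VLAN list, not a live network or a traffic generator, and the function name is hypothetical.

```python
import unittest


def add_vlan_to_trunk(allowed_vlans, vlan):
    """Return the trunk's allowed VLAN list with `vlan` added, sorted and de-duplicated."""
    if not 1 <= vlan <= 4094:
        raise ValueError(f"VLAN {vlan} is outside the valid range 1-4094")
    return sorted(set(allowed_vlans) | {vlan})


class AddVlanToTrunkTest(unittest.TestCase):
    def test_adds_new_vlan(self):
        self.assertEqual(add_vlan_to_trunk([10, 20], 30), [10, 20, 30])

    def test_existing_vlan_is_not_duplicated(self):
        self.assertEqual(add_vlan_to_trunk([10, 20], 20), [10, 20])

    def test_rejects_out_of_range_vlan(self):
        with self.assertRaises(ValueError):
            add_vlan_to_trunk([10], 5000)
```

Saved as a module, the tests can be run with `python -m unittest`. In a continuous integration pipeline, a failing test would stop the change before it reaches the test network, let alone production.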
Building a continuous integration and continuous deployment process for NetOps is quite challenging. The benefits, however, are substantial, as described in the companion article, "The benefits of network infrastructure as code."
Teams should anticipate using a mix of custom scripts and commercial products in their NetOps initiatives. As described above, custom scripts can incorporate additional business process actions that a commercial product doesn't support. This isn't to say that commercial products aren't valuable. Several types of changes exist, and as usual, the answer to which tool is best is: It depends.