When I set about determining the five most common routing errors, I didn't realize how hard it would be to narrow them down. I considered writing about the five most annoying routing errors, but quickly realized all five would revolve around ISDN dial-up, and no one would want to read that. So I eventually set aside host-based routing problems and focused on errors that involve routing protocols: not necessarily the MOST common, but fairly common and easily avoided or resolved.
As you probably know, one of the joys of consulting, or of doing project work as a network reseller, is discovering these little routing errors in customer networks during an install or change window. It's fairly common to find that a previous administrator committed one of these errors, couldn't resolve it, and "fixed" it by adding static routes or changing a protocol's administrative distance.
This turns the network into a minefield in which routes don't propagate as expected when you make your changes, so you have to find and remove the "fix," then find and resolve the actual problem. Be extremely careful here, though, because your own fix can send a flood of new routes into the network, changing traffic patterns in ways that may not be transparent to users.
Another gotcha to keep in mind when troubleshooting routing errors is how the work changes your customer's expectations of the project's scope. If you have to fix a lot of errors, it could take far more time than anyone expected. You also need to make sure you can actually change lots of routers. For instance, if you're only supposed to be working on one router, and the problem needs to be corrected on another, do you have access? Are you allowed to change it? If not, do you have contact information for someone who can? How will you document the change? And let's not forget... are you being paid to change it?
With that in mind, let's move on to the errors...
Error #1: Filtering redistribution
When redistributing routes, you need to filter them properly to avoid routing loops and route feedback. Not applying filters at all is usually a significant problem, but managing the filters in a complex, poorly summarized network is such an administrative burden that missing a route here or there is extremely common. The best way to avoid this, of course, is not to redistribute at all. In fact, redistribution is almost always a bad decision. Friends don't let friends do mutual redistribution.
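One common way to keep mutual redistribution from feeding routes back where they came from is to tag routes at each redistribution point and refuse routes carrying the other point's tag. The Python below is a toy model of that idea, not router configuration: the Route type, the redistribute() helper, and the tag values 100 and 200 are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str  # protocol the route currently lives in
    tag: int = 0   # route tag carried with the route (0 = untagged)

def redistribute(routes, into, set_tag, deny_tag: Optional[int]):
    """Hypothetical helper: copy routes into another protocol, tagging
    each one, and skip any route that already carries deny_tag."""
    out = []
    for r in routes:
        if deny_tag is not None and r.tag == deny_tag:
            continue  # route originally came from 'into': feedback
        out.append(replace(r, protocol=into, tag=set_tag))
    return out

eigrp_table = [Route("10.1.0.0/16", "eigrp")]
ospf_native = [Route("172.16.0.0/16", "ospf")]

# Boundary router A: EIGRP -> OSPF, tag 100, refuse anything tagged 200.
into_ospf = redistribute(eigrp_table, "ospf", set_tag=100, deny_tag=200)
ospf_table = ospf_native + into_ospf

# Boundary router B: OSPF -> EIGRP, tag 200, refuse anything tagged 100.
into_eigrp = redistribute(ospf_table, "eigrp", set_tag=200, deny_tag=100)
print([r.prefix for r in into_eigrp])  # ['172.16.0.0/16']
```

Pass deny_tag=None at router B and 10.1.0.0/16 is redistributed straight back into EIGRP, which is exactly the feedback the filters are there to prevent. Every new boundary router means another pair of tags to keep consistent, which is where the administrative burden comes from.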
Error #2: Mismatched neighbor parameters in OSPF
To form an adjacency, OSPF routers need to have quite a few parameters in common. These include authentication, area ID, network mask, hello interval, router dead interval, etc. Quite often, fat fingers, non-standard configurations or invalid passwords leave deviant parameters that prevent adjacencies from forming. While it's hard to avoid typos, it's fairly easy to use the OSPF adjacency debug command (debug ip ospf adj on Cisco IOS), which will quickly tell you whether mismatched parameters are the problem. Once you know that, it's trivial to correct the configuration.
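To make that checklist concrete, here is a small Python model that compares the values two neighbors would present and lists the ones that differ. It is a simplified sketch, not a packet parser: the OspfHelloParams type and the mismatches() helper are hypothetical and cover only the parameters named above.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class OspfHelloParams:
    area_id: str
    network_mask: str    # RFC 2328 skips this check on point-to-point links
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    auth_type: str
    auth_key: str

def mismatches(local: OspfHelloParams, remote: OspfHelloParams) -> list:
    """Return the names of the parameters that differ between neighbors."""
    return [f.name for f in fields(OspfHelloParams)
            if getattr(local, f.name) != getattr(remote, f.name)]

r1 = OspfHelloParams("0.0.0.0", "255.255.255.0", 10, 40, "md5", "s3cret")
# e.g. one side left at non-broadcast (NBMA) timers
r2 = replace(r1, hello_interval=30, dead_interval=120)
print(mismatches(r1, r2))  # ['hello_interval', 'dead_interval']
```

An empty list means these particular parameters agree; it does not prove the adjacency will form, since the real protocol checks more than this sketch models.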
Error #3: 'subnets'
When redistributing routes into OSPF, it's fairly common to find several of them missing. Most often, the culprit is that someone forgot to tack the 'subnets' keyword onto the end of the redistribute command. Cisco says, "The subnets keyword tells OSPF to redistribute all subnet routes. Without the subnets keyword, only networks that are not subnetted will be redistributed by OSPF." They should know.
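The effect of that quote can be sketched in Python: without 'subnets', any route whose prefix is longer than its classful mask is dropped. This is a toy model built only from the Cisco statement above; the function names are hypothetical, and it deliberately ignores supernets and anything outside classes A, B, and C.

```python
import ipaddress

def classful_prefixlen(net: ipaddress.IPv4Network) -> int:
    """Prefix length implied by the address class (A, B, or C)."""
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        return 8   # class A
    if first_octet < 192:
        return 16  # class B
    if first_octet < 224:
        return 24  # class C
    raise ValueError(f"{net} is not a class A, B, or C network")

def redistribute_into_ospf(prefixes, subnets=False):
    """Toy model of which routes the redistribute command would pass."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    if subnets:
        return [str(n) for n in nets]
    # Without 'subnets', drop any route longer than its classful mask.
    # (Supernets are not modeled; this sketch simply lets them through.)
    return [str(n) for n in nets if n.prefixlen <= classful_prefixlen(n)]

routes = ["10.0.0.0/8", "10.1.1.0/24", "172.16.0.0/16",
          "172.16.5.0/24", "192.168.1.0/24"]
print(redistribute_into_ospf(routes))
# ['10.0.0.0/8', '172.16.0.0/16', '192.168.1.0/24']
print(redistribute_into_ospf(routes, subnets=True))  # all five routes
```

In a network built from /24s carved out of 10.0.0.0/8, nearly everything falls into the dropped group, which is why the symptom is usually "several routes missing" rather than one.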
Error #4: Metrics
If you're redistributing routes into EIGRP and find they're all missing, the problem is almost always that someone forgot to set the metrics. Oddly enough, Cisco declined the opportunity to set a default metric for routes redistributed into EIGRP. Instead, they leave that up to the administrator. (Never mind the fact that it's not really a 'default' if you have to set it.) Thus, if you don't set it, routes will not be redistributed. I suspect this is penance for deciding to use EIGRP and do route redistribution at the same time.
To solve this problem, either set a default metric with the deceptively named 'default-metric bandwidth delay reliability loading mtu' command (and yes, you need to specify ALL of those), or set the same parameters with the 'metric' keyword on the redistribute command itself.
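The rule described above can be modeled in a few lines of Python. This is a hypothetical sketch, not IOS behavior in every detail: SeedMetric and redistribute_into_eigrp() are made-up names, the units follow the default-metric command (bandwidth in kbit/s, delay in tens of microseconds, reliability and load on a 255 scale, MTU in bytes), and it assumes a metric on the redistribute command takes precedence over default-metric.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class SeedMetric:
    bandwidth_kbps: int   # kbit/s
    delay_tens_usec: int  # tens of microseconds
    reliability: int      # 0-255, where 255 means 100% reliable
    load: int             # 1-255, where 255 means fully loaded
    mtu: int              # bytes

def redistribute_into_eigrp(prefixes: List[str],
                            metric: Optional[SeedMetric] = None,
                            default_metric: Optional[SeedMetric] = None
                            ) -> Dict[str, SeedMetric]:
    """Hypothetical model: a metric on the redistribute command wins,
    then default-metric; with neither, nothing is redistributed."""
    seed = metric if metric is not None else default_metric
    if seed is None:
        return {}
    return {prefix: seed for prefix in prefixes}

ospf_routes = ["10.1.0.0/16", "10.2.0.0/16"]
print(redistribute_into_eigrp(ospf_routes))  # {}

t1_seed = SeedMetric(1544, 2000, 255, 1, 1500)
print(len(redistribute_into_eigrp(ospf_routes, default_metric=t1_seed)))  # 2
```

Because SeedMetric declares no field defaults, building one with fewer than five values raises a TypeError, a loose stand-in for the requirement to supply all five parameters.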
Error #5: Tweaking EIGRP metrics
Speaking of EIGRP metrics, administrators often find it hard to resist tweaking them to make traffic prefer one circuit over another. In my experience, this is almost always an attempt to send traffic over an Internet VPN instead of a low-bandwidth frame-relay circuit. The bandwidth and delay parameters seem so simple to apply, but the problems arrive over time: the tweaks add up to a little nightmare as administrators try to find every metric they've set and plug them into the not-so-simple formula for the composite metric, just to get traffic flowing over the right circuit again.
My advice for avoiding unpredictable traffic flows is simple: if you're thinking about tweaking the EIGRP metrics, have a friend page you at 3 a.m. and give you the parameters for three paths through your network. Calculate the correct cost of each path and then predict which will be preferred. If you get it right, and you enjoyed the exercise, then go ahead and make your changes.
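For checking your 3 a.m. arithmetic: with the default K values (K1 = K3 = 1, K2 = K4 = K5 = 0), the classic EIGRP composite metric reduces to 256 × (10^7 / lowest bandwidth in kbit/s + total delay in tens of microseconds), with the bandwidth term truncated to an integer. The Python sketch below applies that to three paths whose interface values are made up for illustration; it ignores load, reliability, and the newer "wide metrics."

```python
def eigrp_metric(hops):
    """Classic composite metric with default K values.

    hops: list of (bandwidth_kbps, delay_tens_usec) for each outgoing
    interface along the path toward the destination.
    """
    min_bw = min(bw for bw, _ in hops)           # slowest link counts
    total_delay = sum(delay for _, delay in hops)  # delays accumulate
    return 256 * (10**7 // min_bw + total_delay)

# Hypothetical paths: (bandwidth in kbit/s, delay in tens of microseconds)
paths = {
    "A": [(100000, 10), (256, 2000)],                  # 256k frame relay
    "B": [(100000, 10), (1544, 2000), (1544, 2000)],   # two T1 hops
    "C": [(100000, 10), (10000, 100), (1544, 2000)],   # Ethernet, then T1
}
for name, hops in paths.items():
    print(name, eigrp_metric(hops))
# A 10514432
# B 2684416
# C 2198016

best = min(paths, key=lambda n: eigrp_metric(paths[n]))
print("preferred:", best)  # preferred: C
```

Every bandwidth or delay tweak changes one of these tuples, and you have to know all of them, on every router along the path, before you can predict the winner.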
About the author
Tom Lancaster, CCIE# 8829 CNX# 1105, is a consultant with 15 years of experience in the networking industry. He is co-author of several books on networking, most recently, CCSP: Secure PIX and Secure VPN Study Guide, published by Sybex.