For a lot of network folks, Layer 2 protocols such as Spanning Tree Protocol (STP) can be a source of confusion and large-scale problems. All too often I see fundamental Layer 2 design flaws that leave an engineer scratching their head while troubleshooting an issue. While not a complex protocol, STP – if combined with another technology such as Asynchronous Transfer Mode (ATM), Transparent LAN Services (TLS) or inadvertent "back doors" – can cause severe issues over the network's lifetime. This week's tip focuses on the level of understanding needed to prevent these issues and the steps you can take to troubleshoot a potential STP problem.
It starts with the fundamentals
There have been many improvements to the Spanning Tree Protocol (STP) since it was first developed to support bridging throughout various Local Area Networks (LANs). The Spanning Tree Algorithm (STA), which is the underlying mechanism, is what actually receives all of the modifications and newer standards. The STA is responsible for preventing Layer 2 loops within a network topology. Without going into the specifics of Ethernet operation, keep in mind that it's a bad thing when LAN devices receive the same information twice, which is most certainly what happens in a loop. I am going to take you through some factors that contribute to data flow issues and STP operation.
The first step in learning the LAN's topology, and in determining where best to block traffic that might cause a loop, is the election of the root bridge. The key elements I want to focus on are the factors affecting that election. Bridge priority is the determining factor when electing a root bridge, and the lowest priority wins. This is important because each vendor ships its LAN switches with a default priority – for example Cisco sets its default STP priority to 32768. If all priorities are equal, the lowest bridge MAC address wins. Don't let it get that far – determine your LAN's root bridge and L2 traffic flow for yourself.
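To make the tie-break concrete, here is a minimal Python sketch of the comparison described above. The switch names, MAC addresses and the `Bridge` and `elect_root` helpers are all hypothetical, made up for illustration; real switches do this by exchanging BPDUs, not by calling a function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bridge:
    name: str       # hypothetical switch name
    priority: int   # lower is preferred
    mac: str        # hypothetical MAC, e.g. "00:1a:2b:3c:4d:5e"

    def bridge_id(self):
        # Priority is compared first; the MAC address only breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))

def elect_root(bridges):
    # The bridge with the numerically lowest bridge ID wins the election.
    return min(bridges, key=lambda b: b.bridge_id())

# Every switch left at the default priority: the lowest MAC wins,
# which here happens to be an access switch.
switches = [
    Bridge("core-1", 32768, "00:1a:2b:3c:4d:5e"),
    Bridge("access-7", 32768, "00:0c:11:22:33:44"),
    Bridge("core-2", 32768, "00:1a:2b:3c:4d:5f"),
]
default_root = elect_root(switches)

# Lowering the priority on core-1 makes the outcome deliberate.
tuned = [Bridge("core-1", 4096, "00:1a:2b:3c:4d:5e")] + switches[1:]
chosen_root = elect_root(tuned)
```

With everything at the default, the election in this made-up example lands on `access-7` simply because of its MAC address. Lowering one priority is all it takes to put the root where you want it.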
Root Port Cost
Once an election has taken place, each bridge on a LAN (or VLAN) will calculate the path of least resistance to reach the root bridge. The bandwidth of the link (or type of interface) determines its STP cost – for example, under the original IEEE 802.1D defaults a Gigabit Ethernet link has a cost of 4, whereas a 100 Mb link has a cost of 19. These costs are standardized but can be modified to affect traffic flow.
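As a rough illustration of how those per-link costs add up, the sketch below totals the cost from each bridge back to the root and keeps the cheapest path. The cost table assumes the IEEE 802.1D-1998 short path-cost values, so verify it against your own platform. The topology, the switch names and the `root_path_costs` helper are hypothetical.

```python
import heapq

# Assumed IEEE 802.1D-1998 short path costs, keyed by link speed in Mb/s.
STP_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}

def root_path_costs(links, root):
    """Return the lowest cumulative STP cost from every bridge to the root."""
    graph = {}
    for a, b, speed in links:
        cost = STP_COST[speed]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale entry, a cheaper path was already found
        for neighbor, link_cost in graph[node]:
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best

# Hypothetical topology: the access switch has one Gigabit uplink
# and one 100 Mb uplink toward the root.
links = [
    ("root", "dist-1", 1000),
    ("root", "dist-2", 1000),
    ("dist-1", "access", 1000),
    ("dist-2", "access", 100),
]
costs = root_path_costs(links, "root")
```

In this example the access switch reaches the root through `dist-1` at a total cost of 8 (4 + 4) rather than through `dist-2` at 23 (4 + 19), so its uplink to `dist-1` would become its root port.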
Place your demarcation points wisely
The day of large L2 domains is coming to an end. High-speed ASIC-based routers make the case for combining your L2 and L3 devices and limiting the span and size of your broadcast domains. There are of course factors and requirements within a changing network which might lead you to extend a LAN to multiple points around your network. Be aware that STP knows no real bounds (with the exception of the recommended limit of 7 consecutive bridges) and VLAN-IDs can be carried across multiple vendors. An inadvertent loop can lead to disaster. You can use your L3 demarcation points to minimize risk when adding an L2 extension. Keep in mind that there are also technologies, such as Q-in-Q and VPLS, which can be implemented to scale these types of solutions.
A word about 'load balancing' and cost manipulation
I will mention one thing about the recent "fad" of using STP to load balance L2 traffic. This is accomplished by alternating root bridges across multiple VLANs, so that traffic for odd-numbered VLANs travels left and traffic for even-numbered VLANs travels right. In theory this makes sense and could potentially reduce the saturation on heavily traveled links. The reality, however, is that this adds a level of complexity not needed in most operational networks. During an outage, the last thing the engineer is thinking about is whether the VLAN is odd or even. Pick your root and choose a side – this will reduce the time it takes to troubleshoot an issue.
Manipulating port cost is another way to affect data flow and is used often in "triangle" configurations (a configuration where an L2 access switch uplinks into redundant distribution switches that have a link in between them). During your design, determine the exact sequence of events that will occur during a failure before doing this. It's essential that you know how a configuration such as this will affect all VLANs during a failure. The recommendation here is to make that redundant link (between the distribution devices) L3.
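One way to walk through a failure on paper before touching the network is a small model like the one below. It is a heavily simplified sketch of port-role selection for a connected topology: it ignores port IDs, timers and BPDU exchange. The bridge IDs, the port costs and the `stp_port_roles` helper are all hypothetical.

```python
def stp_port_roles(bridge_ids, links, port_cost):
    """Simplified STP roles, keyed by (bridge, neighbor the port faces)."""
    root = min(bridge_ids, key=bridge_ids.get)
    # Root path cost per bridge, relaxed until stable.
    rpc = {b: float("inf") for b in bridge_ids}
    rpc[root] = 0
    for _ in range(len(bridge_ids)):
        for a, b in links:
            for here, there in ((a, b), (b, a)):
                rpc[here] = min(rpc[here], rpc[there] + port_cost[(here, there)])
    roles = {}
    for bridge in bridge_ids:
        neighbors = [b if a == bridge else a for a, b in links if bridge in (a, b)]
        if bridge == root or not neighbors:
            continue
        # Root port: lowest total cost, ties broken by upstream bridge ID.
        upstream = min(
            neighbors,
            key=lambda n: (rpc[n] + port_cost[(bridge, n)], bridge_ids[n]),
        )
        roles[(bridge, upstream)] = "root"
    for a, b in links:
        # The end with the better (root path cost, bridge ID) is designated.
        designated, other = sorted((a, b), key=lambda x: (rpc[x], bridge_ids[x]))
        roles[(designated, other)] = "designated"
        roles.setdefault((other, designated), "blocking")
    return roles

# Hypothetical triangle: (priority, MAC as integer) per bridge, all Gigabit.
bridge_ids = {"dist-1": (4096, 1), "dist-2": (8192, 2), "access": (32768, 3)}
links = [("dist-1", "dist-2"), ("dist-1", "access"), ("dist-2", "access")]
port_cost = {(a, b): 4 for a in bridge_ids for b in bridge_ids if a != b}

normal = stp_port_roles(bridge_ids, links, port_cost)

# Simulate losing the access switch's primary uplink.
surviving = [link for link in links if link != ("dist-1", "access")]
failed = stp_port_roles(bridge_ids, surviving, port_cost)
```

In the normal case the access switch's port toward `dist-2` is the one that blocks. When the uplink to `dist-1` fails, that same port becomes the root port and all access traffic crosses the link between the distribution switches. Changing the values in `port_cost` and re-running both cases is a cheap way to see what a cost change would do before you commit to it.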
About the author:
Doug Downer (CCIE #9848) is a Sr. Consultant with Callisma, INC, a wholly owned subsidiary of SBC Communications. Doug has over 7 years in the industry and currently provides high level business and technology consulting for various federal clients in the Washington D.C. area. He can be reached at [email protected].