No More VLANs Jumping on the Trunk

ThinkGeek Cable MonkeyNo more VLANs jumping on the trunk. That’s what I wanted. Instead, I got all the VLANs and a nice supervisor debilitating loop in the process… From my after action report: “the issue was a configuration error that caused the CPU to peg at 99%, which caused a variety of issues.”

Due to a legacy design that we have been working to retire, we have a layer 2 Ethernet WAN that can’t run STP. Yep, it’s a loop waiting to happen.

We were migrating sites to a new WAN. The circuits come in over a trunk with each site getting a VLAN tag. I was moving the last site from an interface. The procedure I wrote had the VLAN being removed from the interface with “switchport trunk allowed vlan remove ###”. Worked great until we hit the last VLAN on the trunk. When the last VLAN is removed from an interface, you might expect no VLANs to be allowed. If that’s what you expected, you are wrong. When you remove the last VLAN with “switchport trunk allowed vlan”, the port reverts to allowing ALL VLANs.

Unfortunately, since all VLANs were suddenly available on that port, it allowed a bridging loop to be created through the sites that were still connected to both the old and new WANs. This caused a variety of different bits of havoc that resulted in connectivity issues. Yeah, something of an understatement.

So, here’s some lessons I took from this:

  • Dual connected sites should have their legacy WAN ports shut down once they are stable. (This shouldn’t be a problem in a proper L3 design, but is important with our broken L2 design.)
  • Designs that prevent you from using STP to prevent loops need to be fixed sooner rather than later.
  • Entering explicit commands is better than implicit. In this case, “switchport trunk allowed vlan none” would have been better than “switchport trunk allowed vlan remove <last vlan>” and assuming the result will be what you expect.
  • Don’t make assumptions on how a device will behave. Lab it and know.
  • Always verify that changes had the desired effect. Had I verified the interface was configured as expected, the problem would have been discovered immediately and had little or no impact.

FIN