Tuesday, August 25, 2015

Linux bridging of VLAN interfaces and bridge IDs

Or, there can be only one

Linux has been offering a decent set of network bridging features for quite some time now. It may not always boast the absolutely latest and greatest of network protocols, but it gets the job done, and does so with commendable stability. I have had literally hundreds (possibly thousands) of Linux software bridges running in rather complex meshed network topologies for extended periods of time with no issues directly related to the implementation.
Basic bridge configuration under Linux is a reasonably simple affair, provided you have some basic understanding of network protocols. Tutorials are available for a number of popular distributions. Delving deeper and handling specific, more complex situations is, however, a bit more demanding. With the advent of virtualization technologies, Linux bridging became increasingly common, as most host (and guest) operating systems had to implement it in some form in order to give all involved parties relatively unimpeded access to the outside network. This resulted in a slew of bug reports (usually against libvirt) and different solutions for some commonly encountered issues, one of which stems from the way the Bridge ID is (re)calculated when individual ports are dynamically added to or removed from the bridge. See, for example, this report.
The bridge ID is an 8-byte value used to uniquely identify a bridge within a network running Spanning Tree Protocol (STP). STP itself is not mandatory, unless you have a meshed topology (or, actually, any topology that can include closed loops). In that case, STP becomes a must if you want to prevent the dreaded switching loops and the accompanying broadcast storms and network outages. The interpretation of the Bridge ID has evolved from 2 bytes of priority followed by 6 bytes holding the lowest MAC address among the bridge's ports into a more complex form, a process nicely illustrated here. Note that the "extension", or, more accurately, reinterpretation of the first 2 bytes of the Bridge ID to include the Extended system ID was made to cater for different VLANs, so that the same physical bridge/port combination would be mapped to a separate bridge ID, depending on the VLAN it is a member of. The Linux bridge control command line utility, brctl, doesn't distinguish between the old and new interpretations at all. Instead, it allows one to explicitly control all 2 bytes (16 bits) of the priority/extended ID field, using the brctl setbridgeprio syntax. The important thing to note is that, for STP to function correctly (also known as: to keep all of the ugliest demons of hell from coming forth, spawning offspring with the nastiest of the gremlins and then letting them loose on your network), bridge IDs have to be unique! This is why the MAC address is used as a part of the Bridge ID (MACs are guaranteed, or at least assumed, to be globally unique), and VLAN IDs were added later to cover the cases where the same port MACs are used to carry the traffic for otherwise distinct VLANs.
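To make the two interpretations concrete, here is a minimal user-space sketch of the 8-byte layout described above. The struct mirrors the 2+6 byte split; the helper names make_bridge_id, extended_prio and bridge_id_cmp are my own inventions for illustration and are not part of brctl or the kernel. The 4-bit priority / 12-bit VLAN split in extended_prio is the usual Extended system ID convention, assumed here rather than taken from any Linux tool.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* 8-byte bridge ID: 2 bytes of priority followed by a 6-byte MAC. */
struct bridge_id {
    uint8_t prio[2];
    uint8_t addr[ETH_ALEN];
};

/* Old interpretation: all 16 bits are a plain priority (hypothetical helper). */
struct bridge_id make_bridge_id(uint16_t prio, const uint8_t mac[ETH_ALEN])
{
    struct bridge_id id;

    id.prio[0] = (uint8_t)(prio >> 8);   /* most significant byte first */
    id.prio[1] = (uint8_t)(prio & 0xff);
    memcpy(id.addr, mac, ETH_ALEN);
    return id;
}

/* New interpretation: top 4 bits are the priority, the low 12 bits carry
 * the Extended system ID, assumed here to be the VLAN ID. */
uint16_t extended_prio(uint16_t prio, uint16_t vlan_id)
{
    assert(vlan_id < 4096);              /* must fit in 12 bits */
    return (uint16_t)((prio & 0xf000) | vlan_id);
}

/* Bridge IDs are ordered as 8-byte big-endian numbers; the lower one wins. */
int bridge_id_cmp(const struct bridge_id *a, const struct bridge_id *b)
{
    return memcmp(a, b, sizeof(*a));
}
```

With the same MAC, make_bridge_id(extended_prio(0x8000, 2), mac) and make_bridge_id(extended_prio(0x8000, 3), mac) differ only in the second priority byte, which is exactly what keeps per-VLAN bridge IDs apart.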
The issue with the Linux bridging setup (or, at least, its CentOS 6.x implementation) arises when you want to bridge traffic from different VLANs after the VLANs have been terminated and the tags stripped off. A common use case would be an L2 separation of different traffic groups within one physical Ethernet trunk. The way this is commonly set up under Linux is (shown symmetrically for clarity):

A simple VLAN trunking
The pair of eth0s forms a trunk that carries three distinct VLANs (1, 2 and 3, using default naming), but the eth0.x endpoints give us untagged frames belonging only to the appropriate VLANs. This ensures traffic isolation between the VLANs, and enables us to do whatever we want with each of the eth0.x endpoints -- give them arbitrary IP addresses, bridge them further, etc. For more examples and a more in-depth explanation of VLAN bridging on Linux, see this article. So far, so good. One thing to note here is that only two distinct, real MACs are in play: those belonging to the eth0 interfaces on both systems. All of the eth0.x (virtual) interfaces share the MAC of the physical trunk interface they are derived from. Now, what we might want to do once we have the VLAN trunking configured as above is to dynamically connect some other actual (physical) interfaces through our VLAN pipes:

Passing external traffic through our VLANs
We want to pipe network traffic between ethA and ethX, and between ethB and ethY, through Systems 1 and 2. One way to do it is by bridging ethA and ethX to the input/output points for VLAN 2, eth0.2, and doing the same with ethB and ethY via VLAN 3 (eth0.3). While this configuration might be somewhat unusual, the ability of Linux bridges to add and remove individual ports on the fly might be worth the effort if the setup needs to be dynamic in nature. Let's assume that we have to run STP on all bridges (both br2s and br3s), because a loop might be present between the ethA/ethX and ethB/ethY pairs in the topology outside of Systems 1 and 2. Then, if the numerical value of eth0's MAC on either side is less than the corresponding values for ethA and ethB (ethX and ethY for System 2), both br2 and br3 will end up with the same Bridge ID! This happens because the algorithm for bridge ID calculation goes through all the ports that belong to a bridge and picks the MAC with the smallest numerical value as the MAC address portion of the bridge ID. But, in our case, both the eth0.2 and eth0.3 ports have the same MAC, copied from the actual eth0 MAC. If it is smaller than the other bridge port's MAC, it will be used, leading to multiple bridges with the same Bridge ID. As far as I know, at least on CentOS 6.x, no automagic effort will be made by the command line tools or system configuration scripts to take the VLAN ID of the bridged ports and configure the Extended system ID of the bridges they are a part of. This can be overcome by issuing appropriate brctl setbridgeprio commands to split the bridge IDs, of course. CentOS 6 network configuration scripts don't allow for setting the priority of bridges, but that ability can be added easily enough by editing the /etc/sysconfig/network-scripts/ifup-eth file to include something along the lines of:

[ -n "${PRIO}" ] && /usr/sbin/brctl setbridgeprio ${DEVICE} ${PRIO}

in the bridge configuration section. After that change, any PRIO=... settings in persistent network configuration files will be applied whenever the device in question is a bridge. Other distributions may already support setting the bridge priority at boot time (well, actually, at network bringup time), or may require a somewhat different solution to add the feature.
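The collision described earlier, and the way distinct priorities resolve it, can be sketched in a few lines of user-space C. This is only an illustration of the "pick the smallest port MAC" rule as explained above, not the kernel's actual code; lowest_mac and derive_bridge_id are hypothetical names, and 0x8000 is assumed as the default priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* 8-byte bridge ID: 2 bytes of priority followed by a 6-byte MAC. */
struct bridge_id {
    uint8_t prio[2];
    uint8_t addr[ETH_ALEN];
};

/* Return the numerically smallest MAC among the bridge's ports.
 * A byte-wise memcmp is equivalent to a numeric comparison here,
 * because MACs are compared most significant byte first. */
const uint8_t *lowest_mac(const uint8_t macs[][ETH_ALEN], size_t n)
{
    const uint8_t *min;
    size_t i;

    assert(n > 0);                       /* a bridge with no ports has no MAC to pick */
    min = macs[0];
    for (i = 1; i < n; i++)
        if (memcmp(macs[i], min, ETH_ALEN) < 0)
            min = macs[i];
    return min;
}

/* Combine a 16-bit priority with the lowest port MAC (hypothetical helper). */
struct bridge_id derive_bridge_id(uint16_t prio,
                                  const uint8_t macs[][ETH_ALEN], size_t n)
{
    struct bridge_id id;

    id.prio[0] = (uint8_t)(prio >> 8);
    id.prio[1] = (uint8_t)(prio & 0xff);
    memcpy(id.addr, lowest_mac(macs, n), ETH_ALEN);
    return id;
}
```

If br2 holds {eth0.2, ethA} and br3 holds {eth0.3, ethB}, and the shared eth0 MAC is the smallest in both sets, derive_bridge_id(0x8000, ...) returns byte-for-byte identical IDs for the two bridges. Giving br3 a different priority, which is what a PRIO= setting would do through brctl setbridgeprio, makes the IDs differ again.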
The other issue related to Bridge IDs and the extended ID field is somewhat more obscure. The Kernel version powering the CentOS 6 series (2.6.32) and, quite possibly, the latest 4.x Kernel versions too, have a quirk when it comes to updating STP settings in reaction to a change in one of the bridge's ports. The change in question might be adding or removing the entire port from the bridge, or a change of the port's MAC address while it is a member of a bridge. As already explained, this should trigger a set of recalculations, because the smallest numerical MAC value of all the bridge's ports might have changed. The Kernel includes a function called br_stp_change_bridge_id(). It is provided with the new, smallest, numerical MAC value to apply to the bridge, and then it does two things: first it updates the bridge's own MAC address (correctly), and then it goes through all of the bridge's ports, checking to see if the old Bridge ID (prior to the MAC change) was the root and/or designated bridge for any of the ports. If so, it updates those values as well. The piece of code that does the second task is:

list_for_each_entry(p, &br->port_list, list) {
    if (ether_addr_equal(p->designated_bridge.addr, oldaddr))
        memcpy(p->designated_bridge.addr, addr, ETH_ALEN);

    if (ether_addr_equal(p->designated_root.addr, oldaddr))
        memcpy(p->designated_root.addr, addr, ETH_ALEN);
}

Here, oldaddr is the original MAC address and addr is the new one. The ether_addr_equal() function compares two MAC addresses and returns true if they are equal. Older Kernel versions use !compare_ether_addr() instead, but the effect is the same. Can you spot the problem?
Having concluded that several bridges on the same system can, in fact, share the MAC portion of their Bridge IDs, we can see that the condition for updating the root and/or designated bridge is too broad: comparing the MAC portions alone might lead to false positives. For example, if two bridges with IDs 8000:00:11:22:33:44:55 and 8001:00:11:22:33:44:55 existed on the same system, updating the MAC of the second bridge from 00:11:22:33:44:55 to 00:66:77:88:99:AA would cause any of its ports that had 8000:00:11:22:33:44:55 as the root and/or designated bridge to change those values to 8000:00:66:77:88:99:AA - a nonexistent bridge. To avoid this, we would have to compare the full Bridge IDs, like so:

bridge_id old_bridge_id;
(...)
memcpy(&old_bridge_id, &br->bridge_id, sizeof(old_bridge_id));
(...)
list_for_each_entry(p, &br->port_list, list) {
    if (!memcmp(&p->designated_bridge, &old_bridge_id, sizeof(old_bridge_id)))
        memcpy(p->designated_bridge.addr, addr, ETH_ALEN);

    if (!memcmp(&p->designated_root, &old_bridge_id, sizeof(old_bridge_id)))
        memcpy(p->designated_root.addr, addr, ETH_ALEN);
}
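The difference between the two conditions can be reproduced outside the Kernel with a small, self-contained sketch. The struct port below is a stripped-down stand-in for the real bridge port structure, holding only the two fields of interest, and the function names update_ports_mac_only and update_ports_full_id are made up for this illustration; a plain array replaces the Kernel's linked list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* 8-byte bridge ID: 2 bytes of priority followed by a 6-byte MAC. */
struct bridge_id {
    uint8_t prio[2];
    uint8_t addr[ETH_ALEN];
};

/* Hypothetical, minimal stand-in for a bridge port. */
struct port {
    struct bridge_id designated_root;
    struct bridge_id designated_bridge;
};

/* MAC-only match, mirroring the condition in the quoted loop. */
void update_ports_mac_only(struct port *ports, size_t n,
                           const uint8_t *oldaddr, const uint8_t *addr)
{
    size_t i;

    assert(ports && oldaddr && addr);
    for (i = 0; i < n; i++) {
        if (!memcmp(ports[i].designated_bridge.addr, oldaddr, ETH_ALEN))
            memcpy(ports[i].designated_bridge.addr, addr, ETH_ALEN);
        if (!memcmp(ports[i].designated_root.addr, oldaddr, ETH_ALEN))
            memcpy(ports[i].designated_root.addr, addr, ETH_ALEN);
    }
}

/* Full 8-byte match, mirroring the proposed condition. */
void update_ports_full_id(struct port *ports, size_t n,
                          const struct bridge_id *old_id, const uint8_t *addr)
{
    size_t i;

    assert(ports && old_id && addr);
    for (i = 0; i < n; i++) {
        if (!memcmp(&ports[i].designated_bridge, old_id, sizeof(*old_id)))
            memcpy(ports[i].designated_bridge.addr, addr, ETH_ALEN);
        if (!memcmp(&ports[i].designated_root, old_id, sizeof(*old_id)))
            memcpy(ports[i].designated_root.addr, addr, ETH_ALEN);
    }
}
```

Take a port whose designated bridge is 8001:00:11:22:33:44:55 (the bridge being changed) and whose designated root is 8000:00:11:22:33:44:55 (the other bridge). The MAC-only variant rewrites both entries, leaving the root pointing at a bridge that does not exist, while the full-ID variant touches only the designated bridge.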

Whether this is something to be concerned about, I am not sure. Given the scope of Linux network stack deployment in the real world, I guess that even this (very) particular scenario would show up sooner or later, prompting a fix. On the other hand, if you're comfortable with rolling your own bridging Kernel module... it never hurts to be (a bit more) sure.

