and we are *done*

work just did a major migration from #mstp spanning tree, to a #EVPN #backbone spine/leaf configuration, where the leafs are connected with multi-chassis #lacp.

~12s outage per switch as we converted from old-world to new-world, including updating the mac-tables for all switches in our network.

6 months of preparation. 1 month of *serious* prep. 1 day to move a few non-critical systems and let them percolate. Then 7 days of swing shift to move everything else.

#networking

spanning tree is no longer a relevant feature of our network.

most importantly, we got rid of it, *without* requiring changes to any end-user system.

@phessler So is there something else preventing network loops? Just the threat of slow, painful death?

@kurtm rstp is still running on the leafs, and that will catch loops there.

spines are primarily L3 routing, or use lacp to uplink anything with more than one port.

@phessler \o/   

Congrats! I know that was such a big project for so long. Well done.

@phessler Just out of curiosity - how many people were working on that? We've been discussing the effort of such a move (including getting everyone on board with the required knowledge), and then settled for an L2 MLAG spine/leaf (with the option of going for EVPN later on)...

@galaxis I did almost all of the configuration work. We trained our coworkers in the new system. Coworkers spent probably a week during the first 6 months discussing things informally, then had a solid week of documentation and learning the new system.

Actual move was: one or two people in the DC, and two people remote. Remote people handled monitoring, clicking DownTimes, and configuration changes. People in the DC did the cabling and (when necessary) moving of hardware.

@galaxis probably 70% of my effort in the first 6 months was reverse engineering how the new vendor did stuff, so we could automate it.

For a vendor that properly supports automation, it can probably be done a lot faster.

@phessler @galaxis As opposed to a vendor that merely claims to support automation?
@kurtm @galaxis Cumulus, so yes.
@phessler I'm just sitting here thinking "Peter wouldn't choose a vendor if it didn't automate. Ah. Vendors lie."
@kurtm I should have asked to see an example. I didn't, and am now an example to you all.
@phessler It happens to so many of us. I think it is because so many vendors can't bullshit competently. When one finally does, we assume that they are competent.
@kurtm I made the mistake of trusting someone I knew.