And if I limit myself to parts supported by free Vivado, that leaves five options: XCAU25P, XCKU025, XCKU035, XCKU3P, XCKU5P.
The AU25P is by far the least expensive (XCAU25P-1FFVB676E is $427 at Digikey) and I have two in inventory already. It's got 40% more LUT capacity than the 7k160t, but slightly *less* block RAM, and a lot less IO: 208 HP and 96 HD. I'd need 196 HP for the RAM, leaving 12: enough for clock and Vref and that's about it.
Which leaves me HD pins for interfacing with the MCU, maybe driving some indicator LEDs, and boot flash. But for a 24+2 port design I only need 6 GTs for QSGMII and 2 for 10G, so I'd have four extras.
Which is good because RGMII would really be pushing the limits of HD I/O, and free GTs would let me use an SGMII PHY instead.
So as long as I can get by with 300 BRAMs, I think I've got a good shot. (I'm using 157 in LATENTPINK, including the management engine and MAC table, which don't scale with interface count, so it should be doable?)
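
For anyone who wants to check my math, here's a quick sketch of the budget in Python. It only uses the numbers quoted above. The total GT count of 12 is an assumption inferred from "6 + 2 used, four extras", and the names are just made up for illustration. It also doesn't try to guess how BRAM usage scales with port count, it just reports the headroom over the LATENTPINK figure.

```python
# Back-of-the-envelope resource budget for the XCAU25P.
# All figures come from the post above unless marked as an assumption.

HP_PINS = 208            # HP I/O available
RAM_HP_PINS = 196        # HP I/O needed for the RAM interface

PORTS_1G = 24            # the "24" in the 24+2 port design
PORTS_10G = 2            # the "+2", one GT each
PORTS_PER_QSGMII = 4     # QSGMII multiplexes four 1G ports onto one lane
GT_TOTAL = 12            # ASSUMPTION: inferred from 6 + 2 used plus 4 spare

BRAM_AVAILABLE = 300     # BRAM count I'd have to live within
BRAM_LATENTPINK = 157    # current usage in LATENTPINK


def budget():
    """Return spare HP pins, GT usage, and BRAM headroom as a dict."""
    # Ceiling division: number of QSGMII lanes needed for the 1G ports.
    qsgmii_gts = -(-PORTS_1G // PORTS_PER_QSGMII)
    gts_used = qsgmii_gts + PORTS_10G
    return {
        "hp_spare": HP_PINS - RAM_HP_PINS,
        "gts_used": gts_used,
        "gts_spare": GT_TOTAL - gts_used,
        "bram_headroom": BRAM_AVAILABLE - BRAM_LATENTPINK,
    }


if __name__ == "__main__":
    for name, value in budget().items():
        print(f"{name}: {value}")
```

The BRAM headroom number is the one to watch: it's how much the per-port buffering can grow before the design stops fitting.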