Does this make sense? Like, could an IPv6 Router Advertisement essentially carry the other internal gateways on the layer 3 network?

I'm thinking that not all clients should have to run BGP and peer just to get the gateways to the other subnets.

Looks like there's no way to wire up FRR to the RA options in OPNsense, and I'm not sure RAs support this (though it seems logical).

#Networking #HomeLab #IPv6 #OPNsense #iBGP #BGP

@arichtman I don't really see a way in which that would be possible in RAs. Ref:
https://datatracker.ietf.org/doc/html/rfc4861#section-4.2

You can advertise additional prefixes in the RA, but those are generally for additional on-link prefixes, i.e. additional prefixes that are also present on the same L2 domain. You can also control the L (on-link) flag for prefixes to indicate whether or not they are on-link (https://blog.ipspace.net/2012/11/ipv6-router-advertisements-deep-dive/), but clearing it is used to create a Private VLAN type setup, or in other words to force all traffic for that prefix through the router sending that RA.
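For reference, here's a rough sketch of what that prefix option looks like on the wire, following the Prefix Information option layout in RFC 4861 section 4.6.2. The function name `build_prefix_option`, the example prefix, and the lifetime defaults are my own assumptions for illustration; this only builds the option bytes and doesn't send anything.

```python
import ipaddress
import struct

ND_OPT_PREFIX_INFORMATION = 3  # option type, RFC 4861 section 4.6.2
FLAG_ON_LINK = 0x80            # L flag: prefix can be used for on-link determination
FLAG_AUTONOMOUS = 0x40         # A flag: prefix can be used for SLAAC


def build_prefix_option(prefix, on_link, autonomous,
                        valid_lifetime=86400, preferred_lifetime=14400):
    """Pack one Prefix Information option (32 bytes). Lifetimes are assumed values."""
    net = ipaddress.IPv6Network(prefix)
    flags = 0
    if on_link:
        flags |= FLAG_ON_LINK
    if autonomous:
        flags |= FLAG_AUTONOMOUS
    return struct.pack(
        "!BBBBIII16s",
        ND_OPT_PREFIX_INFORMATION,
        4,                    # option length in units of 8 octets
        net.prefixlen,
        flags,                # L and A bits, remaining 6 bits reserved
        valid_lifetime,
        preferred_lifetime,
        0,                    # Reserved2
        net.network_address.packed,
    )


# Hypothetical documentation prefix, advertised with the L bit cleared.
option = build_prefix_option("2001:db8:1::/64", on_link=False, autonomous=True)
```

Note there's no field in there for "which router": the next hop is implied by whoever sent the RA.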

In order to do what you describe, the router would basically need to have a set of prefix:router tuples, indicating "for this prefix, use this router". But the RA packet format doesn't allow that: there is the router, and then the set of prefixes; it's a one-to-many for router-to-prefix(es), not many-to-many for router(s)-to-prefix(es). I suppose maybe you could have a given router, like, craft a number of different RAs, manipulating the Source link-layer address in those RAs and then only including in a given RA the prefixes relevant for that router. That starts to feel really dodgy, though, and I'm not even sure exactly how clients would respond to that.
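To make the one-to-many point concrete, here's a minimal data-model sketch. `RouterAdvertisement` and `split_into_ras` are hypothetical names, the addresses are documentation examples, and this only models the grouping; it doesn't build or send real packets.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RouterAdvertisement:
    """Simplified model: one sending router, many prefixes."""
    router: str                              # source address of the RA
    prefixes: list = field(default_factory=list)


def split_into_ras(wanted):
    """Turn a desired {prefix: router} mapping into one RA per router."""
    by_router = defaultdict(list)
    for prefix, router in wanted.items():
        by_router[router].append(prefix)
    return [RouterAdvertisement(router, sorted(prefixes))
            for router, prefixes in sorted(by_router.items())]


# Hypothetical "for this prefix, use this router" table.
wanted = {
    "2001:db8:a::/64": "fe80::1",
    "2001:db8:b::/64": "fe80::2",
    "2001:db8:c::/64": "fe80::1",
}
ras = split_into_ras(wanted)
for ra in ras:
    print(ra.router, ra.prefixes)
```

The tuples can only be expressed as a separate RA per router, which is exactly the "craft a number of different RAs" approach that feels dodgy when a single box is sending all of them.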

The best bet for the constraints (no L3 switches involved in the routing protocol), imho, is for the router that's peered to Cilium to drop the additional prefixes into its RAs, probably with the L bit cleared in order to force the traffic up through it, and then possibly with ICMP redirects to shortcut the traffic over. ICMP redirects always feel brittle, though.
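Roughly, the host-side decision this relies on looks like the sketch below. It's a simplification of the conceptual sending algorithm in RFC 4861, and `next_hop`, its parameters, and the addresses are all assumptions made up for illustration.

```python
import ipaddress


def next_hop(dst, on_link_prefixes, default_router, redirects=None):
    """Pick a next hop for dst, in simplified host order.

    1. A redirect already learned for this destination wins.
    2. A destination inside an on-link prefix (L bit set) is reached directly.
    3. Everything else goes to the default router.
    """
    addr = ipaddress.IPv6Address(dst)
    learned = {ipaddress.IPv6Address(k): v for k, v in (redirects or {}).items()}
    if addr in learned:
        return learned[addr]
    for prefix in on_link_prefixes:
        if addr in ipaddress.IPv6Network(prefix):
            return str(addr)
    return default_router


# L bit cleared: the pod prefix is not in the host's on-link list,
# so traffic goes up through the router that sent the RA.
via_router = next_hop("2001:db8:a::10", ["2001:db8:1::/64"], "fe80::1")

# After a (hypothetical) ICMP redirect pointing at a worker node.
via_redirect = next_hop("2001:db8:a::10", ["2001:db8:1::/64"], "fe80::1",
                        redirects={"2001:db8:a::10": "fe80::2"})
```

Redirects are learned per destination, which is part of why leaning on them feels brittle.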

Actually, looking at this, the crux of the matter seems to be that you have non-BGP-speaking nodes that are co-resident in the same L3 network as your k8s nodes, and you want them to go directly to the workers rather than up through a central router, right? Honestly, for that I would say the ideal layout would be to drop those non-BGP-speaking hosts in a separate/discrete L3 subnet (VLAN etc. if needed), and then just pull the traffic through L3 ToRs that also speak BGP to the k8s workers.

I gather the issue is that we have dumb switches here, though, and L3 outside of k8s itself is through OPNsense or similar as a central router, so this would be pulling that traffic through your central router layer. I guess the question is how much traffic we're talking about here, and whether it's Good Enough to transit it through your central router. Otherwise yeah, you'd be looking at a simple router VM or something if you want to avoid the cost of a stateful transit layer through OPNsense, or just a cheap L3 switch of some description.

@hugo thanks for the detailed info! Tbqh I think north-south traffic won't be heavy enough to matter. I can't see much more than GbE stuff for home anyway, and I'm not ready to go to SMB-priced gear. East-west traffic is likely to be much heavier, as I'm planning on distributed storage (though I have ideas about optimizing data locality).
@hugo re multiple RAs I don't know if client behavior is well-defined. There's a priority component but that's mostly for fail-over I think. Rogue RAs are a known security issue in v6 and it's a core trust/knowledge problem, so I am not anticipating a nice solution here 😁