ESI-LAG: Initial draft implementation #2425
For the data model, I would consider the following: I'm not sure if this is best implemented as a separate module, or as part of the evpn module (since esi is closely linked with evpn, e.g. on SR Linux it would require evpn). The |
But this is explicitly addressing ESI-LAG, where the esi attribute must stay under the lag config (at least on Junos, FRR, and EOS). And IIRC, FRR supports ESI multihoming attributes only under bond interfaces (while other vendors also support ESI outside the lag). However, the big question is: how can we separate them but at the same time keep them tied in some cases? That is what I was not able to easily achieve with a plugin (especially the "validation" of "mlag", i.e., not requiring a peerlink & co.).
Taking an example from your FRR config - this is evpn configuration, not lag. If FRR has the limitation that ESI can only be applied to lags, a device quirk could enforce that (or a feature flag inside the In my example, if My main concern, is that the data model imho shouldn't be based on the (limited) way in which some particular platform supports the feature. In my mind the feature is "ethernet segments", with implementations under EVPN and (here) LAG |
Totally agree on this. But how can we change the LAG validation? I.e., if I apply an ethernet segment to an (m)lag interface, then in some way we need to disable the mlag peer link validation (or device not supporting "strict" mlag), so the evpn and lag contexts will be mixed here.
I can see how it's hard to get a "clean" separation / modularity here. I would say it'd be ok to modify the logic in the lag module to check for an 'esi' interface attribute (and perhaps a
@jbemmel I really need your help on this topic: I moved the implementation to an external plugin, and I was able to add additional verification to the 'lag' module to avoid errors for not supporting mclag. However, after passing through the lag plugin, the vlan module starts complaining: Topology snippet: Error: I guess this is related to how the lag module copies attributes to additional links? Any hints? How did you keep the vlan module from complaining about the 'lag' attributes? Thanks :-)
Never mind, I think I got it. It's in the vlan module itself.
- lag:
    members:
    - s1:
        lag.lacp_system_id: 1
As commented above, I think it would be great if the plugin could implement logic to auto-generate the lacp_system_id:
When a lag interface references an ES with 'auto: true' and there is no manual override, generate a lacp_system_id based on the minimum node ID of all the nodes attached to that segment (could be more than two).
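The auto-generation rule suggested above could be sketched roughly as follows. This is only an illustration of the idea, not the plugin's code: the function name `auto_lacp_system_id`, its parameters, and the shape of the input data are all hypothetical.

```python
from typing import Dict, List, Optional


def auto_lacp_system_id(
    segment_nodes: List[str],
    node_ids: Dict[str, int],
    manual_id: Optional[int] = None,
) -> int:
    """Pick a LACP system ID for an ethernet segment (hypothetical helper).

    A manual override always wins; otherwise use the smallest node ID
    among all nodes attached to the segment (there may be more than two).
    """
    if manual_id is not None:
        return manual_id
    if not segment_nodes:
        raise ValueError("ethernet segment has no attached nodes")
    return min(node_ids[name] for name in segment_nodes)


# Made-up sample data: three switches sharing one segment
node_ids = {"s1": 1, "s2": 2, "s3": 3}
print(auto_lacp_system_id(["s2", "s3"], node_ids))
print(auto_lacp_system_id(["s1", "s2", "s3"], node_ids, manual_id=7))
```

Because every node attached to the segment would derive the same minimum, all of them should end up with the same value without any extra coordination.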
tests/topology/expected/lag-l2.yml
name: r1 -> r2
neighbors:
- ifname: port-channel1
  lag:
It is odd that this changes; it could point to a bug somewhere (not necessarily in your additions).
Agree -- this should be investigated. Also, this is definitely triggered by the additions (not saying the bug is not somewhere else), or we would have already seen it in dev branch.
It's triggered by the addition of the lacp_system_id interface attribute
Add
intf_to_neighbor: False # By default, do not include lag attributes in neighbors
to fix it
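To illustrate why such a flag would stop the expected neighbor data from changing, here is a minimal sketch of the general idea. It does not reproduce netlab's actual code: the `MODULE_SETTINGS` table, the `build_neighbor` function, and the `example_module` name are all hypothetical, and only the `intf_to_neighbor` setting name comes from the comment above.

```python
from typing import Any, Dict, List

# Hypothetical per-module settings: should a module's interface attributes
# be copied into the neighbor entries that other nodes see?
MODULE_SETTINGS: Dict[str, Dict[str, bool]] = {
    "lag": {"intf_to_neighbor": False},
    "example_module": {"intf_to_neighbor": True},  # made-up module name
}


def build_neighbor(ifdata: Dict[str, Any], modules: List[str]) -> Dict[str, Any]:
    """Build a neighbor entry from a remote interface (illustrative only)."""
    neighbor: Dict[str, Any] = {"ifname": ifdata["ifname"], "node": ifdata["node"]}
    for module in modules:
        copy_attrs = MODULE_SETTINGS.get(module, {}).get("intf_to_neighbor", False)
        if copy_attrs and module in ifdata:
            neighbor[module] = ifdata[module]
    return neighbor


remote_intf = {
    "ifname": "port-channel1",
    "node": "r2",
    "lag": {"lacp_system_id": 1},
    "example_module": {"cost": 10},
}
print(build_neighbor(remote_intf, ["lag", "example_module"]))
```

With the flag set to false for 'lag', a new interface attribute such as lacp_system_id would stay on the interface and never reach the neighbors list.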
Side note: FRR seems to have some problems, which are under investigation. The control plane looks good and resembles the official example found here: https://github.com/FRRouting/frr/tree/master/tests/topotests/bgp_evpn_mh However, the test is not passing. The included test passes for vjunos-switch and eos.
@ipspace I'd also love to hear your feedback on this plugin, but no rush.
jbemmel left a comment
- (IMHO) ethernet segments should be similar to vlans, in that they always need to be declared - even if all IDs are auto-generated. It helps users avoid mistakes by enforcing consistency / referential integrity
- should add 'intf_to_neighbor: False' to lag module to avoid changes to neighbors
Note that Cumulus NVUE fails to pass the test; after disabling the link, the ping from h2 to h1 fails.
description: End-to-end connectivity after a LAG member failure
nodes: [ h2, h3 ]
wait_msg: Waiting for ESI-LAG convergence
level: warning
Should this be a mere warning? Supporting failover scenarios is the whole point of doing ES Multihoming. Cumulus NVUE currently fails this test, so it basically doesn't support MH properly.
I copied (and adapted) your M-LAG tests. If this should not be a warning here, it should not be a warning there either:
netlab/tests/integration/lag/10-mlag.yml
Line 60 in 110bf8a
Thank you!
Based on how Nvidia rolls, NVUE is dead to me and I plan to declare it obsolete (together with CL 4.x) in the next release notes (did I mention https://www.youtube.com/shorts/tQIdxbWhHSM already?).
I lost track 🤦‍♂️ Is this ready for another review or a merge?
Rebased, ready for review and merge if you think it's ok. I won't investigate the Cumulus convergence on link failure any further.
* Use 'data.get_empty_box' to create a box
* There's no need to create dicts inside a box (they are created on first reference)
* Dict can be used instead of a set to detect duplicates
* data.append_to_list implements the 'append to list, create if missing' functionality
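The last two points of the commit message can be illustrated with plain Python. This is a generic sketch rather than the streamlined code itself: `find_duplicates` and its input format are hypothetical, and `dict.setdefault` stands in for the 'append to list, create if missing' pattern without calling the netlab helper.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def find_duplicates(items: Iterable[Tuple[str, Hashable]]) -> Dict[Hashable, List[str]]:
    """Map each value used more than once to the owners that use it.

    A dict (value -> first owner) replaces a set of seen values, so a
    duplicate report can also name the first owner of the value.
    """
    first_owner: Dict[Hashable, str] = {}
    duplicates: Dict[Hashable, List[str]] = {}
    for owner, value in items:
        if value in first_owner:
            # 'append to list, create if missing' in plain Python
            duplicates.setdefault(value, [first_owner[value]]).append(owner)
        else:
            first_owner[value] = owner
    return duplicates


# Made-up sample data: (owner, value) pairs
print(find_duplicates([("a", 1), ("b", 2), ("c", 1)]))
```

A set would only say that a value had been seen before; the dict also remembers where, which is one plausible reason to prefer it.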
ipspace left a comment
Looks good to me (and EOS implementation works ;). I streamlined the _esi_stats code a bit; if you're OK with those changes, let's merge this one.
All good, you can proceed, thank you!!!
Initial draft implementation of ESI-LAG in LAG module.
I tried to implement this as a plugin first, but it was really a mess to have to interact with the lag module (especially for mlag checks), so I decided to "embed" the code directly in the module itself.
JunOS implementation works, it should be straightforward to add support for FRR and EOS - and I can do it.
But first I'd like your opinion on this, @ipspace @jbemmel; then I can update the documentation as well.
With the current LAG module "syntax", ESI-LAG must be specified in this way: