-
Notifications
You must be signed in to change notification settings - Fork 1.4k
bgpd: fix bugs with suppress-duplicates #20325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
18f22cb to
224ff0b
Compare
bgpd/bgp_updgrp_adv.c
Outdated
| if (likely(CHECK_FLAG(bgp->flags, BGP_FLAG_SUPPRESS_DUPLICATES))) | ||
| attr_hash = attrhash_key_make(attr); | ||
| attr_new = bgp_attr_intern(attr); | ||
| attr_prev = adj->adv && adj->adv->baa ? adj->adv->baa->attr : adj->attr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new duplicate detection logic at line 592 uses adj->attr as a fallback when adj->adv is NULL, but adj->attr is never set in bgp_adj_out_set_subgroup. It's only populated by bgp_build_update_subgroup_packet_of_subgroup after the advertisement is sent. After the advertisement is cleaned up, adj->adv becomes NULL and the fallback to the unset adj->attr field is used, breaking duplicate detection if the attribute comparison fails. Unlike the old code which stored attr_hash in this function, the new code removed this field without ensuring adj->attr is properly initialized and maintained here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me clarify. How about the following:
+ /*
+ * If there is an update queued, the update would be the very last
+ * one that we should compare with. Otherwise compare with what was
+ * sent out previously.
+ */
+ attr_prev = adj->adv ? (adj->adv->baa ? adj->adv->baa->attr : NULL) : adj->attr;
The adj->attr starts as NULL by bgp_adj_out_alloc(). As you correctly pointed out, the adj->attr is set after the advertisement is sent. In this patch, it's also set to NULL after a withdraw is queued. Do you see any gap in how the adj->attr is initialized and maintained?
So the comparison here is either against the attribute of the update that has been queued, or of the update that has been sent out. The attr_prev could be NULL, and that will work too.
224ff0b to
4da9e2d
Compare
4da9e2d to
bcd8759
Compare
riw777
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
|
There is side effect for clearing the adj->attr when a route withdraw is queued. It can result in duplicate updates in case of rapid add/delete/add. It's addressed by the revised patch. |
bcd8759 to
dff004d
Compare
dff004d to
e3bc60a
Compare
|
ci:rerun |
riw777
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good
|
Thanks @riw777. Let me update the patch with some debug logs. |
e3bc60a to
adf61b3
Compare
When determining whether the attribute of a new update is the same as that of the preivous update, currently a hash value is calculated from the attribute and is then compared with the hash value stored for the previous update. Two issues are identified and fixed in this patch: 1) The comparison of the hash value alone is insufficient. That's not correct when two different attributes end up with an identical hash value. It would result in missing updates. The fix is to compare the attributes fully by intern'ing the new attribute first. 2) The queued withdraw is not accounted for when suppressing duplicate updates. This can result in missing updates with rapid route changes, as reported in the following commit that implements a workaround: commit 1eed792 Author: Donatas Abraitis <[email protected]> Date: Thu Jun 5 11:21:48 2025 +0300 bgpd: Force adj-rib-out updates if MRAI is kicked in The issue is fixed in this patch by removing the queued withdraw when supressing duplicate updates. Signed-off-by: Enke Chen <[email protected]>
Currently the SUBGRP_STATUS_FORCE_UPDATES flag is cleared in bgp_generate_updgrp_packets(). It turns out that's not the right place for doing it. Several topotests such as bgp_as_allow_in and bgp_set_aspath_replace are failing due to premature clearing of the flag. In this patch the logic is moved to subgroup_announce_route(), and the flag is cleared only after the whole table has been announced. Signed-off-by: Enke Chen <[email protected]>
This reverts commit 1eed792. The workaround is no longer needed. Signed-off-by: Enke Chen <[email protected]>
adf61b3 to
5c29f55
Compare
|
Somehow the "rebase" label remains after I updated the branch. |
|
ci:rerun |
1 similar comment
|
ci:rerun |
When determining whether the attribute of a new update is the same as
that of the previous update, currently a hash value is calculated from the
attribute and is then compared with the hash value stored for the previous
update.
Two issues are identified and fixed in this patch:
The comparison of the hash value alone is insufficient.
That's not correct when two different attributes end up with an
identical hash value. This would result in missing updates.
The fix is to compare the attributes fully by intern'ing the new
attribute first.
The queued withdraw is not accounted for when suppressing duplicate
updates. This can result in missing updates with rapid route changes,
as reported in the following commit that implements a workaround:
commit 1eed792
Author: Donatas Abraitis [email protected]
Date: Thu Jun 5 11:21:48 2025 +0300
bgpd: Force adj-rib-out updates if MRAI is kicked in
The issue is fixed in this patch by removing the queued withdraw
when suppressing duplicate updates.