@enkechen-panw (Contributor) commented Dec 23, 2025

When determining whether the attribute of a new update is the same as
that of the previous update, currently a hash value is calculated from the
attribute and is then compared with the hash value stored for the previous
update.

Two issues are identified and fixed in this patch:

  1. Comparing the hash values alone is insufficient.

    Two different attributes can end up with an identical hash value,
    in which case the new update is wrongly treated as a duplicate and
    suppressed. This would result in missing updates.

    The fix is to compare the attributes fully by intern'ing the new
    attribute first.

  2. The queued withdraw is not accounted for when suppressing duplicate
    updates. This can result in missing updates with rapid route changes,
    as reported in the following commit that implements a workaround:

    commit 1eed792
    Author: Donatas Abraitis [email protected]
    Date: Thu Jun 5 11:21:48 2025 +0300
    bgpd: Force adj-rib-out updates if MRAI is kicked in

    The issue is fixed in this patch by removing the queued withdraw
    when suppressing duplicate updates.
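
To illustrate the first issue, here is a minimal, self-contained sketch (not FRR code) of why equal hash values do not imply equal attributes, while comparing interned pointers does. `struct toy_attr`, the deliberately weak `toy_attr_hash()`, and the fixed-size `toy_attr_intern()` pool are all hypothetical stand-ins for bgpd's attribute hash and `bgp_attr_intern()`; the real implementations differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a BGP attribute; NOT the real struct attr. */
struct toy_attr {
	uint32_t med;
	uint32_t local_pref;
};

/* Deliberately weak hash so that distinct attributes can collide. */
static uint32_t toy_attr_hash(const struct toy_attr *a)
{
	return a->med + a->local_pref;
}

/* Hypothetical intern pool: one canonical copy per distinct attribute. */
#define TOY_POOL_MAX 16
static struct toy_attr toy_pool[TOY_POOL_MAX];
static size_t toy_pool_len;

/*
 * Return the canonical stored copy of an attribute, adding it if it is
 * not in the pool yet. Equal contents always yield the same pointer.
 */
static const struct toy_attr *toy_attr_intern(const struct toy_attr *a)
{
	for (size_t i = 0; i < toy_pool_len; i++) {
		if (toy_pool[i].med == a->med &&
		    toy_pool[i].local_pref == a->local_pref)
			return &toy_pool[i];
	}
	assert(toy_pool_len < TOY_POOL_MAX);
	toy_pool[toy_pool_len] = *a;
	return &toy_pool[toy_pool_len++];
}

/* Hash-only check: reports a false duplicate when hashes collide. */
static int toy_same_by_hash(const struct toy_attr *a,
			    const struct toy_attr *b)
{
	return toy_attr_hash(a) == toy_attr_hash(b);
}

/* Full check: interned pointers are equal only for equal contents. */
static int toy_same_by_intern(const struct toy_attr *a,
			      const struct toy_attr *b)
{
	return toy_attr_intern(a) == toy_attr_intern(b);
}
```

With `{ .med = 10, .local_pref = 100 }` and `{ .med = 100, .local_pref = 10 }`, `toy_same_by_hash()` reports a match, so the second update would be suppressed, while `toy_same_by_intern()` correctly reports a difference.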

if (likely(CHECK_FLAG(bgp->flags, BGP_FLAG_SUPPRESS_DUPLICATES)))
attr_hash = attrhash_key_make(attr);
attr_new = bgp_attr_intern(attr);
attr_prev = adj->adv && adj->adv->baa ? adj->adv->baa->attr : adj->attr;
Member

The new duplicate detection logic at line 592 uses adj->attr as a fallback when adj->adv is NULL, but adj->attr is never set in bgp_adj_out_set_subgroup. It is only populated by bgp_build_update_subgroup_packet_of_subgroup after the advertisement is sent. After the advertisement is cleaned up, adj->adv becomes NULL and the fallback to the unset adj->attr field is used, breaking duplicate detection if the attribute comparison fails. Unlike the old code, which stored attr_hash in this function, the new code removed this field without ensuring that adj->attr is properly initialized and maintained here.

@enkechen-panw (Contributor Author) commented Dec 28, 2025

Let me clarify. How about the following:

+       /*
+        * If there is an update queued, the update would be the very last
+        * one that we should compare with. Otherwise compare with what was
+        * sent out previously.
+        */
+       attr_prev = adj->adv ? (adj->adv->baa ? adj->adv->baa->attr : NULL) : adj->attr;

adj->attr is initialized to NULL by bgp_adj_out_alloc(). As you correctly pointed out, adj->attr is set after the advertisement is sent. In this patch, it is also set to NULL after a withdraw is queued. Do you see any gap in how adj->attr is initialized and maintained?

So the comparison here is against the attribute of either the update that has been queued or the update that has been sent out. attr_prev could be NULL, and that works too.
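
The selection rule can be sketched in isolation as follows. This is a simplified model, not the bgpd code: the `toy_*` structs are hypothetical and keep only the fields named in the snippet above, and it assumes that a queued advertisement with a NULL `baa` represents a queued withdraw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the bgpd structures. */
struct toy_attr {
	int id;
};

struct toy_baa {
	const struct toy_attr *attr;
};

struct toy_adv {
	/* Assumption: NULL baa means the queued entry is a withdraw. */
	const struct toy_baa *baa;
};

struct toy_adj_out {
	const struct toy_adv *adv;   /* queued, not yet sent (or NULL) */
	const struct toy_attr *attr; /* attribute last sent (or NULL) */
};

/*
 * Pick the attribute that a new update should be compared against:
 * the queued update if there is one, NULL if a withdraw is queued,
 * otherwise whatever was sent out previously.
 */
static const struct toy_attr *toy_attr_prev(const struct toy_adj_out *adj)
{
	assert(adj != NULL);
	if (adj->adv)
		return adj->adv->baa ? adj->adv->baa->attr : NULL;
	return adj->attr;
}
```

A NULL result never matches an interned new attribute, so in this model a route that has a queued withdraw, or that was never advertised, is not suppressed as a duplicate.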

@github-actions github-actions bot added size/L and removed size/M labels Dec 28, 2025
@enkechen-panw enkechen-panw marked this pull request as ready for review December 28, 2025 06:26
@github-actions github-actions bot added size/M and removed size/L labels Dec 29, 2025
@riw777 (Member) left a comment

looks good

@enkechen-panw (Contributor Author) commented Jan 4, 2026

There is a side effect of clearing adj->attr when a route withdraw is queued: it can result in duplicate updates in the case of a rapid add/delete/add sequence. This is addressed by the revised patch.

@enkechen-panw enkechen-panw marked this pull request as draft January 4, 2026 22:21
@enkechen-panw enkechen-panw marked this pull request as ready for review January 5, 2026 05:26
@enkechen-panw (Contributor Author)

ci:rerun

@riw777 (Member) left a comment

changes look good

@enkechen-panw (Contributor Author) commented Jan 6, 2026

Thanks @riw777. Let me update the patch with some debug logs.

@github-actions github-actions bot added size/L rebase PR needs rebase and removed size/M labels Jan 7, 2026
When determining whether the attribute of a new update is the same as
that of the previous update, currently a hash value is calculated from
the attribute and is then compared with the hash value stored for the
previous update.

Two issues are identified and fixed in this patch:

1) Comparing the hash values alone is insufficient.

   Two different attributes can end up with an identical hash value,
   in which case the new update is wrongly treated as a duplicate and
   suppressed. This would result in missing updates.

   The fix is to compare the attributes fully by intern'ing the new
   attribute first.

2) The queued withdraw is not accounted for when suppressing duplicate
   updates. This can result in missing updates with rapid route changes,
   as reported in the following commit that implements a workaround:

   commit 1eed792
   Author: Donatas Abraitis <[email protected]>
   Date:   Thu Jun 5 11:21:48 2025 +0300
   bgpd: Force adj-rib-out updates if MRAI is kicked in

   The issue is fixed in this patch by removing the queued withdraw
   when suppressing duplicate updates.

Signed-off-by: Enke Chen <[email protected]>

Currently the SUBGRP_STATUS_FORCE_UPDATES flag is cleared in
bgp_generate_updgrp_packets(). It turns out that's not the right
place for doing it. Several topotests such as bgp_as_allow_in and
bgp_set_aspath_replace are failing due to premature clearing of
the flag.

In this patch the logic is moved to subgroup_announce_route(), and
the flag is cleared only after the whole table has been announced.

Signed-off-by: Enke Chen <[email protected]>

This reverts commit 1eed792.

The workaround is no longer needed.

Signed-off-by: Enke Chen <[email protected]>
@enkechen-panw (Contributor Author)

Somehow the "rebase" label remains after I updated the branch.

@ton31337 ton31337 removed the rebase PR needs rebase label Jan 7, 2026
@enkechen-panw (Contributor Author)

ci:rerun
