Skip to content

upstream: cache metadata hash for O(1) comparison in EDS updates#43351

Open
wdauchy wants to merge 4 commits intoenvoyproxy:mainfrom
wdauchy:eds_hash
Open

upstream: cache metadata hash for O(1) comparison in EDS updates#43351
wdauchy wants to merge 4 commits intoenvoyproxy:mainfrom
wdauchy:eds_hash

Conversation

@wdauchy
Copy link
Contributor

@wdauchy wdauchy commented Feb 6, 2026

Commit Message:
Cache metadata hash on HostDescription to avoid expensive per-host comparisons during EDS updates. The hash is computed once when metadata is set (at host creation or update) and stored alongside the metadata. The comparison in updateDynamicHostList becomes a simple integer comparison instead of MessageDifferencer::Equivalent.

Additional Description:

For large clusters (~5-7k endpoints), updateDynamicHostList is a hot path where metadata comparison runs per host on each EDS update. Caching the hash eliminates the repeated serialization/comparison cost.

Benchmark results (5000 hosts, aarch64, -c opt):

Benchmark                                   Time             CPU   Iterations
bmUpdateHostListEquivalent/5000/5        4.39 ms         4.39 ms          159
bmUpdateHostListEquivalent/5000/20       14.1 ms         14.1 ms           50
bmUpdateHostListHash/5000/5              4.07 ms         4.07 ms          139
bmUpdateHostListHash/5000/20             12.6 ms         12.6 ms           49
bmUpdateHostListCachedHash/5000/5       0.001 ms        0.001 ms       553800
bmUpdateHostListCachedHash/5000/20      0.001 ms        0.001 ms       556835
  • Equivalent: baseline (reflection-based proto comparison)
  • Hash: recomputing hash each time (~8% faster)
  • CachedHash: pre-computed hash comparison (~4000x faster)

Risk Level: Low
Testing: Benchmark added (test/common/upstream/metadata_comparison_benchmark.cc)
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #43351 was opened by wdauchy.

see: more, trace.

@wdauchy wdauchy force-pushed the eds_hash branch 4 times, most recently from 3b607e2 to ab97358 Compare February 6, 2026 15:49
@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 6, 2026

/retest transients

@wdauchy wdauchy marked this pull request as ready for review February 6, 2026 17:15
@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 6, 2026

/retest

Copy link
Member

@botengyao botengyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

/wait

if (host->metadata() && existing_host->second->metadata()) {
metadata_changed = !Protobuf::util::MessageDifferencer::Equivalent(
*host->metadata(), *existing_host->second->metadata());
metadata_changed = MessageUtil::hash(*host->metadata()) !=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have a slight behavior change for Equivalent vs hash, if we are going to change this, we could need a benchmark to compare before we decide if we can move forward.

cc @adisuissa @ravenblackx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the behavior change: hash is stricter than Equivalent, the only impact is potentially treating an unchanged message as changed (false positive), never the reverse. In practice EDS metadata comes from the control plane with consistent serialization, so this shouldn't cause spurious updates?

On benchmarking: our setup with >7k endpoints, updateDynamicHostList is a hot path. Do you think a test on our side would be sufficient or do you mean to add a micro benchmark in the tests? (not super familiar with it so far)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, a mirco benchmark is fine, we can follow the similar pattern here, and here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the mico-benchmark! Do you mind sharing the results?

Copy link
Contributor Author

@wdauchy wdauchy Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked the approach based on your suggestion to add a benchmark.

Instead of recomputing the hash on every comparison (which only gave ~8% improvement), I'm now caching the metadata hash on the host object. The hash is computed once when metadata is set (at construction or update) via a new metadataHash() method on HostDescription. The comparison in updateDynamicHostList becomes a simple integer != check.

Benchmark results (aarch64, -c opt):

Benchmark (5000 hosts) 5 fields 20 fields
Equivalent (before) 4.39 ms 14.1 ms
Hash (recompute each time) 4.07 ms 12.6 ms
CachedHash (new approach) 0.001 ms 0.001 ms

The cached hash is ~4000x faster than Equivalent for the common case (metadata unchanged across EDS updates).

Note: recomputing the hash on every comparison (without caching) is actually slower than Equivalent when metadata differs, because Equivalent can short-circuit on the first mismatch while hash must process the full message. The cached approach avoids this tradeoff entirely, the comparison is always a simple integer check regardless of whether metadata changed or not. The hash cost is paid once in the setter, only when metadata actually changes.

On the behavioral difference: since both hosts compute the hash the same way, the only edge case vs Equivalent is if two semantically equivalent messages have different serializations, which would cause a false "changed" detection. In practice, EDS metadata from the control plane has consistent serialization so this shouldn't happen.

@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 12, 2026

/retest transients

…y comparison

Replace Protobuf::util::MessageDifferencer with MessageUtil::hash for
equality checks in hot paths to reduce CPU during large EDS updates:

- updateDynamicHostList: metadata comparison (upstream_impl.cc)
- LocalityEndpointEqualTo: locality/endpoint comparison (locality_endpoint.h)

Hash comparison is typically faster than reflection-based MessageDifferencer
and matches Envoy's MessageLiteDifferencer approach when full protos are
disabled. Helps with clusters of ~5k endpoints where these comparisons run
per host on each EDS update.

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
@wdauchy wdauchy temporarily deployed to external-contributors February 12, 2026 14:11 — with GitHub Actions Inactive
@wdauchy wdauchy changed the title upstream: use hash instead of MessageDifferencer for metadata/locality comparison upstream: cache metadata hash for O(1) comparison in EDS updates Feb 12, 2026
Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 12, 2026

/retest transients

@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 13, 2026

/retest

@wdauchy wdauchy requested a review from botengyao February 13, 2026 12:17
@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 14, 2026

/retest transients

@wdauchy
Copy link
Contributor Author

wdauchy commented Feb 14, 2026

/retest transients

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants