Skip to content

Comments

feat(edge_cov): collision-free dense edge IDs#404

Draft
gakonst wants to merge 4 commits intomainfrom
alpharush/collision-free-edge-cov
Draft

feat(edge_cov): collision-free dense edge IDs#404
gakonst wants to merge 4 commits intomainfrom
alpharush/collision-free-edge-cov

Conversation

@gakonst
Copy link
Member

@gakonst gakonst commented Feb 12, 2026

Summary

Replace the hash-modulo scheme (hash(addr,pc,dest) % 65536) with a HashMap-based dense ID assignment that eliminates edge collisions in coverage-guided fuzzing.

Motivation

The current EdgeCovInspector hashes each (address, pc, jump_dest) tuple and truncates to a 65536-entry buffer. With large contracts or instrumented native code (e.g. sancov-instrumented precompiles), distinct edges frequently alias to the same hitcount slot, corrupting the coverage signal. The fuzzer can't distinguish "new edge A" from "more hits on existing edge B", degrading guidance quality.

Changes

  • EdgeCovInspector now holds a HashMap<(Address, usize, U256), usize> mapping each unique edge to a dense monotonic ID
  • New edges are assigned IDs on the cold path (first encounter); known edges hit the Occupied fast path — effectively the same cost as the previous hash, since both require hashing the same key
  • Buffer pre-allocated to 65536 entries (configurable via with_capacity()); edges beyond capacity are silently dropped
  • get_hitcount() returns only the used portion [0..edge_count()] instead of the full buffer
  • reset() clears counters but preserves ID assignments for reuse across iterations
  • Hitcount uses saturating_add instead of checked_add().unwrap_or()

New public API (backward-compatible additions):

  • with_capacity(n) — size the buffer for workloads with more edges
  • edge_count() — number of unique edges discovered
  • into_hitcount_with_size() — returns (buffer, used_size) so consumers only process meaningful entries
  • hitcount_mut() — mutable access for external coverage writers (e.g. sancov)
  • into_hitcount() — preserved for backward compatibility

Perf

  • Hot path (known edges): HashMap::get is ~same cost as SipHash + modulo since both hash the same 3 fields
  • Cold path (new edges): HashMap::insert — ~50ns, happens once per unique edge, amortized over millions of iterations
  • Merge step (downstream): iterating [0..used] instead of full 65536 is a 2-10x speedup for typical edge counts
  • Memory: HashMap overhead is ~500KB-2MB for 10-50K edges — negligible for fuzzing workloads

Testing

  • 8 new unit tests covering: collision-free IDs, same-edge increment, saturation at 255, capacity exhaustion, reset preserving IDs, into_hitcount_with_size, cross-address edge distinction, debug format
  • Existing integration test (test_edge_coverage) passes unchanged

Replace the hash-modulo scheme (hash(addr,pc,dest) % 65536) with a
HashMap that assigns each unique (address, pc, jump_dest) edge a
monotonically-increasing dense ID into a pre-allocated hitcount buffer.

This eliminates coverage map collisions where two unrelated edges share
the same counter, corrupting the feedback signal for coverage-guided
fuzzers.

Key changes:
- EdgeCovInspector now holds a HashMap<(Address, usize, U256), usize>
  for edge-to-ID mapping and a next_id counter
- New edges get a dense ID on first encounter (cold path); known edges
  hit the HashMap Occupied path (hot, O(1) amortized)
- Buffer pre-allocated to 65536 (configurable via with_capacity());
  edges beyond capacity are silently dropped
- get_hitcount() returns only the used portion [0..edge_count()]
- New API: with_capacity(), edge_count(), into_hitcount_with_size(),
  hitcount_mut() for downstream integration
- into_hitcount() preserved for backward compatibility
- reset() clears counters but preserves ID assignments across iterations
- Hitcount uses saturating_add (no overflow past 255)

Amp-Thread-ID: https://ampcode.com/threads/T-019c528b-16c7-7700-8700-02529160df29
Co-authored-by: Amp <amp@ampcode.com>
@gakonst
Copy link
Member Author

gakonst commented Feb 12, 2026

cc @grandizzy for review

src/edge_cov.rs Outdated
///
/// The hitcount buffer is fixed at construction time; if more unique edges
/// are discovered than the buffer can hold the extras are silently
/// dropped. Use [`EdgeCovInspector::with_capacity`] to size the buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of dropping, the map should grow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls see 7b35651

src/edge_cov.rs Outdated
}
};
if let Some(slot) = self.hitcount.get_mut(id) {
*slot = slot.saturating_add(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was purposeful self.hitcount[edge_id].checked_add(1).unwrap_or(1);

From https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.llvm.md#8-neverzero-counters

NeverZero prevents this behavior. If a counter wraps, it jumps over the value 0 directly to a 1. This improves path discovery (by a very small amount) at a very low cost (one instruction per edge).
(The alternative of saturated counters has been tested also and proved to be inferior in terms of path discovery.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls see 7b35651

// so it must be modulo the maximum edge count.
let edge_id = (hasher.finish() % MAX_EDGE_COUNT as u64) as usize;
self.hitcount[edge_id] = self.hitcount[edge_id].checked_add(1).unwrap_or(1);
let id = match self.edge_ids.entry((address, pc, jump_dest)) {
Copy link
Contributor

@0xalpharush 0xalpharush Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated but while we are here, it may be nice to incorporate the call depth (xref crytic/echidna#624). This will distinguish a top-level call from a nested call.

Ofc sometimes more precision in distinguishing executions can sometimes blow up the corpus. Given Echidna does it, it's probably worth doing (I wasn't aware of this when I implemented it fwiw).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if OK would follow up this with a different PR as I'd like to go more through foundry integration and implications / how to efficiently apply the depth

- Hitcount buffer now doubles when capacity is exceeded instead of
  silently dropping new edges.
- Restore AFL++ NeverZero semantics: wrapping_add(1).max(1) so a
  counter that wraps past 255 lands on 1, not 0. This preserves the
  'edge was hit' signal. Saturated counters were shown to be inferior
  for path discovery (see AFL++ docs).

Amp-Thread-ID: https://ampcode.com/threads/T-019c528b-16c7-7700-8700-02529160df29
Co-authored-by: Amp <amp@ampcode.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants