Conversation
Replace the hash-modulo scheme (hash(addr,pc,dest) % 65536) with a HashMap that assigns each unique (address, pc, jump_dest) edge a monotonically increasing dense ID into a pre-allocated hitcount buffer. This eliminates coverage map collisions where two unrelated edges share the same counter, corrupting the feedback signal for coverage-guided fuzzers.

Key changes:
- EdgeCovInspector now holds a HashMap<(Address, usize, U256), usize> for edge-to-ID mapping and a next_id counter
- New edges get a dense ID on first encounter (cold path); known edges hit the HashMap Occupied path (hot, O(1) amortized)
- Buffer pre-allocated to 65536 (configurable via with_capacity()); edges beyond capacity are silently dropped
- get_hitcount() returns only the used portion [0..edge_count()]
- New API: with_capacity(), edge_count(), into_hitcount_with_size(), hitcount_mut() for downstream integration
- into_hitcount() preserved for backward compatibility
- reset() clears counters but preserves ID assignments across iterations
- Hitcount uses saturating_add (no overflow past 255)

Amp-Thread-ID: https://ampcode.com/threads/T-019c528b-16c7-7700-8700-02529160df29
Co-authored-by: Amp <amp@ampcode.com>
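The dense-ID scheme described in the commit message can be sketched as follows. This is a simplified illustration, not the actual foundry-evm code: plain integers stand in for Address and U256, and the names EdgeCov and record are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified sketch of the dense-ID edge coverage scheme.
struct EdgeCov {
    edge_ids: HashMap<(u64, usize, u64), usize>, // (address, pc, jump_dest) -> dense ID
    hitcount: Vec<u8>,                           // pre-allocated counters
    next_id: usize,                              // next unused dense ID
}

impl EdgeCov {
    fn with_capacity(cap: usize) -> Self {
        Self { edge_ids: HashMap::new(), hitcount: vec![0; cap], next_id: 0 }
    }

    fn record(&mut self, address: u64, pc: usize, jump_dest: u64) {
        use std::collections::hash_map::Entry;
        let id = match self.edge_ids.entry((address, pc, jump_dest)) {
            // Hot path: this edge already has a dense ID.
            Entry::Occupied(e) => *e.get(),
            // Cold path: first encounter, assign the next dense ID.
            Entry::Vacant(v) => {
                let id = self.next_id;
                self.next_id += 1;
                *v.insert(id)
            }
        };
        // Edges beyond capacity are silently dropped (pre-growth behavior).
        if let Some(slot) = self.hitcount.get_mut(id) {
            *slot = slot.saturating_add(1);
        }
    }

    fn edge_count(&self) -> usize {
        self.next_id.min(self.hitcount.len())
    }

    fn get_hitcount(&self) -> &[u8] {
        &self.hitcount[..self.edge_count()]
    }
}
```

Because each distinct edge key owns its own counter slot, two unrelated edges can never alias to the same counter, which is exactly the collision the hash-modulo scheme suffered from.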
cc @grandizzy for review
src/edge_cov.rs (Outdated)
///
/// The hitcount buffer is fixed at construction time; if more unique edges
/// are discovered than the buffer can hold the extras are silently
/// dropped. Use [`EdgeCovInspector::with_capacity`] to size the buffer
Instead of dropping, the map should grow
src/edge_cov.rs (Outdated)
    }
};
if let Some(slot) = self.hitcount.get_mut(id) {
    *slot = slot.saturating_add(1);
This was purposeful: self.hitcount[edge_id].checked_add(1).unwrap_or(1);
NeverZero prevents this behavior. If a counter wraps, it jumps over the value 0 directly to a 1. This improves path discovery (by a very small amount) at a very low cost (one instruction per edge).
(The alternative of saturated counters has been tested also and proved to be inferior in terms of path discovery.)
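The difference between the two counter semantics discussed above can be shown in two one-line helpers. This is an illustrative sketch of the NeverZero idea from AFL++, not the exact foundry code; the function names are hypothetical.

```rust
/// AFL++-style NeverZero bump: a wrap past 255 lands on 1, never 0,
/// so the "this edge was hit" signal is never erased by overflow.
fn bump_neverzero(counter: u8) -> u8 {
    counter.checked_add(1).unwrap_or(1)
}

/// Saturating alternative: the counter pins at 255. AFL++ testing
/// found this slightly worse for path discovery.
fn bump_saturating(counter: u8) -> u8 {
    counter.saturating_add(1)
}
```

The NeverZero variant costs one extra branch per edge but keeps a hit counter distinguishable from a never-hit slot even after 256 hits.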
// so it must be modulo the maximum edge count.
let edge_id = (hasher.finish() % MAX_EDGE_COUNT as u64) as usize;
self.hitcount[edge_id] = self.hitcount[edge_id].checked_add(1).unwrap_or(1);
let id = match self.edge_ids.entry((address, pc, jump_dest)) {
Unrelated but while we are here, it may be nice to incorporate the call depth (xref crytic/echidna#624). This will distinguish a top-level call from a nested call.
Ofc more precision in distinguishing executions can sometimes blow up the corpus. Given Echidna does it, it's probably worth doing (I wasn't aware of this when I implemented it fwiw).
If OK, I would follow this up with a different PR, as I'd like to go more through the foundry integration and its implications / how to efficiently apply the depth.
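The reviewer's call-depth suggestion would amount to widening the edge key. The sketch below is hypothetical (it is explicitly deferred to a follow-up PR, and the key type and helper name are invented for illustration): the same (address, pc, jump_dest) triple at different call depths would map to distinct dense IDs.

```rust
use std::collections::HashMap;

// Hypothetical edge key extended with call depth:
// (address, pc, jump_dest, depth). Plain integers stand in for
// the real Address / U256 types.
type EdgeKey = (u64, usize, u64, usize);

/// Look up or assign a dense ID for a depth-aware edge key.
fn edge_id(
    edge_ids: &mut HashMap<EdgeKey, usize>,
    next_id: &mut usize,
    key: EdgeKey,
) -> usize {
    *edge_ids.entry(key).or_insert_with(|| {
        let id = *next_id;
        *next_id += 1;
        id
    })
}
```

As the reviewer notes, this distinguishes a top-level call from a nested one at the cost of a potentially larger corpus, since more executions count as novel.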
- Hitcount buffer now doubles when capacity is exceeded instead of silently dropping new edges.
- Restore AFL++ NeverZero semantics: wrapping_add(1).max(1) so a counter that wraps past 255 lands on 1, not 0. This preserves the 'edge was hit' signal. Saturated counters were shown to be inferior for path discovery (see AFL++ docs).

Amp-Thread-ID: https://ampcode.com/threads/T-019c528b-16c7-7700-8700-02529160df29
Co-authored-by: Amp <amp@ampcode.com>
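The growth behavior from this follow-up commit can be sketched as a small helper. `ensure_capacity` is a hypothetical name, and doubling-until-it-fits is an assumption based on the commit message's "doubles when capacity is exceeded"; the real code may grow differently.

```rust
/// Grow the hitcount buffer by doubling until the given dense ID fits,
/// instead of silently dropping the edge.
fn ensure_capacity(hitcount: &mut Vec<u8>, id: usize) {
    if id >= hitcount.len() {
        let mut new_len = hitcount.len().max(1);
        while new_len <= id {
            new_len *= 2;
        }
        // New slots start at zero hits.
        hitcount.resize(new_len, 0);
    }
}
```

Doubling keeps the amortized cost of growth O(1) per edge, so the hot recording path stays cheap even for contracts with far more than 65536 edges.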
Summary
Replace the hash-modulo scheme (hash(addr,pc,dest) % 65536) with a HashMap-based dense ID assignment that eliminates edge collisions in coverage-guided fuzzing.

Motivation
The current EdgeCovInspector hashes each (address, pc, jump_dest) tuple and truncates to a 65536-entry buffer. With large contracts or instrumented native code (e.g. sancov-instrumented precompiles), distinct edges frequently alias to the same hitcount slot, corrupting the coverage signal. The fuzzer can't distinguish "new edge A" from "more hits on existing edge B", degrading guidance quality.

Changes
- EdgeCovInspector now holds a HashMap<(Address, usize, U256), usize> mapping each unique edge to a dense monotonic ID
- Known edges take the Occupied fast path — effectively the same cost as the previous hash, since both require hashing the same key
- Buffer is pre-allocated to 65536 (configurable via with_capacity()); edges beyond capacity are silently dropped
- get_hitcount() returns only the used portion [0..edge_count()] instead of the full buffer
- reset() clears counters but preserves ID assignments for reuse across iterations
- Hitcount uses saturating_add instead of checked_add().unwrap_or()

New public API (backward-compatible additions):
- with_capacity(n) — size the buffer for workloads with more edges
- edge_count() — number of unique edges discovered
- into_hitcount_with_size() — returns (buffer, used_size) so consumers only process meaningful entries
- hitcount_mut() — mutable access for external coverage writers (e.g. sancov)
- into_hitcount() — preserved for backward compatibility

Perf
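To make the into_hitcount_with_size() contract concrete, here is a hypothetical mirror of just that method; Inspector is a stand-in type, not the real EdgeCovInspector, and only the (buffer, used_size) shape is taken from the description above.

```rust
/// Stand-in for the inspector: a fixed buffer plus a count of
/// how many leading slots hold real edge counters.
struct Inspector {
    hitcount: Vec<u8>,
    edge_count: usize,
}

impl Inspector {
    /// Consume the inspector, returning the full buffer together with
    /// the number of meaningful entries at its front.
    fn into_hitcount_with_size(self) -> (Vec<u8>, usize) {
        let used = self.edge_count;
        (self.hitcount, used)
    }
}
```

A consumer would then iterate only buffer[..used] rather than scanning the entire pre-allocated capacity.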
- HashMap::get is ~same cost as SipHash + modulo since both hash the same 3 fields
- HashMap::insert — ~50ns, happens once per unique edge, amortized over millions of iterations
- Processing [0..used] instead of the full 65536 is a 2-10x speedup for typical edge counts

Testing
- New unit tests cover into_hitcount_with_size, cross-address edge distinction, and debug format
- Existing test (test_edge_coverage) passes unchanged