
Document BFS dominator tree approximation bug and add regression test#2800

Open
pyricau wants to merge 15 commits into main from worktree-treemap-heapdump

Conversation

pyricau (Member) commented Feb 26, 2026

Summary

Fixes #2715 — retained sizes reported by LeakCanary could be significantly inflated by being attributed to the wrong ancestor, due to a known approximation bug in the incremental BFS+LCA dominator tree algorithm.

Root cause

DominatorTree builds dominators incrementally during BFS using a Lowest Common Ancestor (LCA) approach. When a cross-edge (an edge to an already-visited node) is processed, the parent's dominator may still be stale: it may later be raised by another cross-edge at the same BFS level. By the time the parent's dominator is corrected, the child's dominator has already been computed and is never revisited, leaving it too specific (too deep in the tree). The consequence is that the child's retained size is attributed to an ancestor that doesn't actually exclusively dominate it.
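The failing scenario can be sketched with a toy model (illustrative, self-contained code — the node names, the `lca` helper, and `staleDominators` are hypothetical, not shark's API). The graph is root→x, root→y, x→p, x→c, y→r, plus cross-edges p→c and r→p. Processing p→c while dom(p) is still x leaves dom(c) = x, even though the path root→y→r→p→c bypasses x, so the correct immediate dominator of c is the root:

```kotlin
// Toy model of the BFS+LCA approximation (illustrative, not shark's DominatorTree).
// Node 0 is the virtual root; dom[n] holds the current immediate-dominator guess.
const val ROOT = 0

// Walks b's dominator chain until it meets an ancestor of a (the root always matches).
fun lca(dom: IntArray, a: Int, b: Int): Int {
  val ancestorsOfA = generateSequence(a) { if (it == ROOT) null else dom[it] }.toSet()
  var node = b
  while (node !in ancestorsOfA) node = dom[node]
  return node
}

// Replays the BFS edge order on the minimal failing graph and returns dom[].
fun staleDominators(): IntArray {
  val x = 1; val y = 2; val p = 3; val c = 4; val r = 5
  val dom = IntArray(6)
  dom[x] = ROOT; dom[y] = ROOT        // level 1 tree edges
  dom[p] = x; dom[c] = x; dom[r] = y  // level 2 tree edges
  dom[c] = lca(dom, dom[c], p)        // cross-edge p→c: dom[p] is still x, so dom[c] stays x
  dom[p] = lca(dom, dom[p], r)        // cross-edge r→p raises dom[p] to ROOT...
  return dom                          // ...but dom[c] is never revisited: stays x, too specific
}
```

Recomputing `lca(dom, dom[c], p)` after dom(p) has been raised yields ROOT, the correct immediate dominator for c.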

Fix

  • Adds a convergence loop to DominatorTree: stores cross-edges during BFS, then re-processes them with updated dominator values after the BFS completes, iterating until the tree stabilizes (typically 2–3 passes)
  • Cross-edge storage is gated behind collectCrossEdges = true (opt-in, no overhead for callers that don't call runConvergenceLoop)
  • Filters at insertion time skip cross-edges where either endpoint is already attributed to the virtual root (NULL_REFERENCE), since re-processing them in the convergence loop would always be a no-op
  • Settled edges are pruned at the start of each convergence pass so later passes iterate fewer entries
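The convergence idea can be sketched as follows (standalone illustrative code with assumed names; this is a simplified model, not shark's exact `runConvergenceLoop` signature). Recorded (node, parent) cross-edges are replayed with current dominator values until a full pass changes nothing:

```kotlin
// Sketch of the convergence loop (illustrative, standalone; not shark's exact API).
const val NULL_REF = 0  // virtual-root sentinel, like shark's NULL_REFERENCE

fun lcaOf(dom: IntArray, a: Int, b: Int): Int {
  val ancestors = generateSequence(a) { if (it == NULL_REF) null else dom[it] }.toSet()
  var n = b
  while (n !in ancestors) n = dom[n]
  return n
}

// Replays recorded (node, parent) cross-edges until a pass changes nothing.
// Returns the number of passes run.
fun runConvergenceLoop(
  dom: IntArray,
  crossEdges: List<Pair<Int, Int>>,
  maxIterations: Int
): Int {
  repeat(maxIterations) { pass ->
    var changed = false
    for ((node, parent) in crossEdges) {
      if (dom[node] == NULL_REF) continue  // settled: attributed to the virtual root
      val updated = lcaOf(dom, dom[node], parent)
      if (updated != dom[node]) {
        dom[node] = updated
        changed = true
      }
    }
    if (!changed) return pass + 1  // stabilized
  }
  return maxIterations
}
```

With a single stale entry the loop settles in two passes: one to apply the correction, one to confirm nothing else moves.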

Data structure improvement

Cross-edges were previously stored as MutableList<LongArray> (~40 bytes/entry with object header overhead and indirection). Replaced with LongPairList: a flat LongArray where each pair occupies two consecutive slots (~16 bytes/entry, ~2.5× memory reduction, better cache locality). Pruned entries are marked with a 0L sentinel rather than shrinking the array.
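A minimal sketch of such a flat pair list (simplified; shark's actual LongPairList may differ in details):

```kotlin
// Flat pair list: entry i occupies slots [i*2, i*2+1] of one LongArray.
// Cleared entries are marked with 0L in the first slot; the array never shrinks.
class LongPairList(initialCapacity: Int = 4) {
  private var data = LongArray(initialCapacity * 2)
  var size = 0
    private set

  fun add(first: Long, second: Long) {
    val base = size * 2
    if (base == data.size) data = data.copyOf(data.size * 2)  // double when full
    data[base] = first
    data[base + 1] = second
    size++
  }

  fun firstAt(index: Int) = data[index * 2]
  fun secondAt(index: Int) = data[index * 2 + 1]

  // Marks entry [index] as cleared; callers must never store 0L as a valid first value.
  fun clearAt(index: Int) {
    data[index * 2] = 0L
  }

  // Visits live (non-cleared) entries; inline, so the caller's lambda can
  // mutate captured local vars without boxing.
  inline fun forEachIndexed(block: (index: Int, first: Long, second: Long) -> Unit) {
    for (i in 0 until size) {
      val first = firstAt(i)
      if (first != 0L) block(i, first, secondAt(i))
    }
  }
}
```

Hoisting `size * 2` into the local `base` avoids recomputing it, matching the review feedback on this PR.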

Production wiring

Enabled the convergence loop in all three production callers:

  • PrioritizingShortestPathFinder, which covers two of them: LeakCanary leak tracing and ObjectDominators
  • ObjectGrowthDetector

Tests

  • known bug - BFS ordering leaves child dominator too specific: asserts the current (incorrect) behavior before the fix, documenting the approximation
  • convergence loop fixes stale dominator attribution: asserts the same graph produces the correct result after runConvergenceLoop()
  • convergence loop stops at maxIterations: verifies the loop respects the iteration cap

Test plan

  • ./gradlew :shark:shark:test passes
  • ./gradlew :shark:shark:apiCheck passes

🤖 Generated with Claude Code

pyricau and others added 13 commits February 26, 2026 12:30
The incremental LCA-during-BFS approach produces incorrect immediate
dominators when same-level cross-edges cause a node's dominator to be
raised after its children have already been discovered. Those children
keep stale dominator pointers that are too specific (too deep in the
tree), so retained sizes get attributed to the wrong ancestor.

Add class- and method-level KDoc naming the limitation concretely, with
an ASCII graph of the minimal failing case. Add a passing test that
asserts the current (wrong) behavior so the bug is visible and any
future fix is automatically validated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The BFS+LCA incremental algorithm can leave a node's immediate dominator
too specific (too far from the virtual root) when a cross-edge P→C is
processed while dom(P) is still stale, and dom(P) is later raised by
another same-level cross-edge after C's LCA was already settled.

This introduces:
- `collectCrossEdges = true` constructor flag on DominatorTree, which
  records each cross-edge (already-visited target) during BFS
- `runConvergenceLoop(maxIterations)` that re-runs LCA on stored
  cross-edges until dominated[] stabilizes (typically 2–3 passes)

Documentation and tests are also updated:
- Fixes the existing test whose graph analysis was incorrect (the old
  example had dom(C)=root as the *correct* answer, not a bug); replaces
  it with a graph where the bug genuinely manifests (dom left too specific)
- Adds a test asserting that runConvergenceLoop fixes the attribution
- Adds a test verifying maxIterations=0 leaves the stale result intact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename N to ObjectId for clarity
- Replace asRoot() with forestRoot > node, unifying all edges
  under the same > operator (forestRoot wraps NULL_REFERENCE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three optimizations that eliminate redundant cross-edges:

1. Don't record a cross-edge if dom(objectId) is already NULL_REFERENCE at
   insertion time — the convergence loop would always skip it.

2. Don't record a cross-edge if dom(parentObjectId) is NULL_REFERENCE at
   insertion time — the LCA walk terminates in one step and produces the same
   result already written by the updateDominated call, so re-processing in the
   loop is a no-op.

3. After each convergence pass that produced changes, prune edges whose
   object has reached NULL_REFERENCE. This shrinks the list for subsequent
   passes, which typically converge in 2–3 iterations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
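The two insertion-time filters above can be sketched as a single predicate (hypothetical helper with assumed names; dominator storage is simplified to a Map here, whereas shark uses primitive long-keyed storage):

```kotlin
// Illustrative insertion-time filter for cross-edges (not shark's internals).
const val NULL_REFERENCE = 0L

fun shouldRecordCrossEdge(
  dominated: Map<Long, Long>,
  objectId: Long,
  parentObjectId: Long
): Boolean {
  // (1) Already attributed to the virtual root: the convergence loop always skips it.
  if (dominated[objectId] == NULL_REFERENCE) return false
  // (2) Parent already at the virtual root: the LCA walk terminates in one step and
  //     reproduces the value the BFS update just wrote, so replaying it is a no-op.
  if (dominated[parentObjectId] == NULL_REFERENCE) return false
  return true
}
```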
Cross-edges are recorded before the LCA result is stored in updateDominated,
so the LCA can set dom(objectId)=NULL_REFERENCE after the edge is already in
the list. These edges are inert (the convergence loop would always skip them)
but weren't cleaned up until a later pass with changed=true.

Add an initial pruneSettled() call at the top of runConvergenceLoop to handle
this, and factor out the shared removeAll predicate to avoid duplication.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The existing NULL_REFERENCE guard in the loop already skips settled edges in
O(1), so eagerly reclaiming that memory adds complexity with no practical
benefit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pruning before each pass rather than after means:
- Edges already settled after the BFS traversal (when updateDominated's own
  LCA set dom(objectId)=NULL_REFERENCE after the edge was recorded) are
  cleaned up before the very first pass.
- Edges settled during pass N are removed before pass N+1, reducing the
  iteration count for all subsequent passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MutableList<LongArray> stores each edge as a separate heap object (~32 bytes
header + data + pointer), with poor cache locality. Replace it with a single
flat LongArray where consecutive pairs (objectId, parentObjectId) occupy
indices [i*2, i*2+1].

CrossEdgeBuffer:
- add(): appends a pair; doubles the array when full
- prune(): marks settled entries in-place using NULL_REFERENCE (0L) as a
  sentinel — safe since heap object IDs are always > 0; array never shrinks
- forEach(): inline higher-order function that skips marked entries, allowing
  the convergence loop body to modify captured vars (e.g. `changed`) without
  boxing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move the flat long-pair buffer to its own file as LongPairList, with no
  knowledge of dominators or NULL_REFERENCE semantics
- Replace prune(dominated) with clearAt(index) + forEachIndexed, keeping
  the data structure generic (0L as cleared-entry sentinel is documented
  as a caller constraint)
- Move pruning logic into DominatorTree.pruneSettledCrossEdges(), which owns
  the dominator-specific slot checks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #2715

The convergence loop added to DominatorTree was not wired up anywhere,
so retained sizes computed by LeakCanary, ObjectDominators, and
ObjectGrowthDetector were still subject to the stale-dominator bug where
cross-edges processed with an out-of-date dom(parent) leave child
dominators too specific, causing over-attribution of retained sizes.

Enable collectCrossEdges = true in PrioritizingShortestPathFinder and
ObjectGrowthDetector, and call runConvergenceLoop() before
computeRetainedSizes() / buildFullDominatorTree() in all three callers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pyricau (Member, Author) commented on this excerpt from LongPairList.add():

```kotlin
data = data.copyOf(data.size * 2)
}
data[size * 2] = first
data[size * 2 + 1] = second
```

Move size * 2 to a local var to avoid doing it 3 times

pyricau and others added 2 commits February 26, 2026 21:52
Benchmarks on gcroot_unknown_object.hprof (25 MB) revealed that the
convergence loop is not viable for production use:
  - 107,692 cross-edges stored after BFS
  - 781 iterations to converge (vs the assumed "2-3")
  - 62 s loop time on top of a 1.5 s analysis (~40x overhead)

The O(cross-edges × depth × iterations) complexity means correction
chains propagate one hop per iteration, so a heap with deep object
graphs requires as many iterations as the longest stale-dominator chain.

Update class-level and @param KDoc in DominatorTree to document the
performance warning. Add ConvergenceLoopBenchmark to measure the
overhead on a real heap dump (without loop vs with loop, 4 timed runs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
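The one-hop-per-iteration behavior can be demonstrated on the earlier toy graph (illustrative standalone code, not shark's implementation). When the edge that depends on dom(p) is replayed before the edge that raises dom(p), the correction needs an extra pass to reach c; chain k such links and convergence needs on the order of k passes, consistent with the hundreds of iterations observed on a real heap:

```kotlin
// Demonstrates one-hop-per-pass propagation (illustrative; node 0 is the root).
fun chainLca(dom: IntArray, a: Int, b: Int): Int {
  val ancestors = generateSequence(a) { if (it == 0) null else dom[it] }.toSet()
  var n = b
  while (n !in ancestors) n = dom[n]
  return n
}

// Runs convergence passes over crossEdges in the given order; returns passes until stable.
fun passesToConverge(crossEdges: List<Pair<Int, Int>>): Int {
  // Stale state after BFS on: root→x, root→y, x→p, x→c, y→r (x=1, y=2, p=3, c=4, r=5).
  val dom = intArrayOf(0, 0, 0, 1, 1, 2)
  var count = 0
  while (true) {
    count++
    var changed = false
    for ((node, parent) in crossEdges) {
      val updated = chainLca(dom, dom[node], parent)
      if (updated != dom[node]) {
        dom[node] = updated
        changed = true
      }
    }
    if (!changed) return count
  }
}
```

Replaying (c←p) before (p←r) takes 3 passes; the reverse order takes 2, because p's correction reaches c within the same pass.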
Successfully merging this pull request may close these issues.

Retained size computation is incorrect
