Document BFS dominator tree approximation bug and add regression test#2800
Open
Document BFS dominator tree approximation bug and add regression test#2800
Conversation
The incremental LCA-during-BFS approach produces incorrect immediate dominators when same-level cross-edges cause a node's dominator to be raised after its children have already been discovered. Those children keep stale dominator pointers set too high, so retained sizes get attributed to the wrong ancestor. Add class- and method-level KDoc naming the limitation concretely, with an ASCII graph of the minimal failing case. Add a passing test that asserts the current (wrong) behavior so the bug is visible and any future fix is automatically validated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The BFS+LCA incremental algorithm can leave a node's immediate dominator too specific (too far from the virtual root) when a cross-edge P→C is processed while dom(P) is still stale, and dom(P) is later raised by another same-level cross-edge after C's LCA was already settled. This introduces: - `collectCrossEdges = true` constructor flag on DominatorTree, which records each cross-edge (already-visited target) during BFS - `runConvergenceLoop(maxIterations)` that re-runs LCA on stored cross-edges until dominated[] stabilizes (typically 2–3 passes) Documentation and tests are also updated: - Fixes the existing test whose graph analysis was incorrect (the old example had dom(C)=root as the *correct* answer, not a bug); replaces it with a graph where the bug genuinely manifests (dom left too specific) - Adds a test asserting that runConvergenceLoop fixes the attribution - Adds a test verifying maxIterations=0 leaves the stale result intact Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename N to ObjectId for clarity - Replace asRoot() with forestRoot > node, unifying all edges under the same > operator (forestRoot wraps NULL_REFERENCE) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three optimizations that eliminate redundant cross-edges: 1. Don't record a cross-edge if dom(objectId) is already NULL_REFERENCE at insertion time — the convergence loop would always skip it. 2. Don't record a cross-edge if dom(parentObjectId) is NULL_REFERENCE at insertion time — the LCA walk terminates in one step and produces the same result already written by the updateDominated call, so re-processing in the loop is a no-op. 3. After each convergence pass that produced changes, prune edges whose object has reached NULL_REFERENCE. This shrinks the list for subsequent passes, which typically converge in 2–3 iterations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cross-edges are recorded before the LCA result is stored in updateDominated, so the LCA can set dom(objectId)=NULL_REFERENCE after the edge is already in the list. These edges are inert (the convergence loop would always skip them) but weren't cleaned up until a later pass with changed=true. Add an initial pruneSettled() call at the top of runConvergenceLoop to handle this, and factor out the shared removeAll predicate to avoid duplication. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The existing NULL_REFERENCE guard in the loop already skips settled edges in O(1), so eagerly reclaiming that memory adds complexity with no practical benefit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pruning before each pass rather than after means: - Edges already settled after the BFS traversal (when updateDominated's own LCA set dom(objectId)=NULL_REFERENCE after the edge was recorded) are cleaned up before the very first pass. - Edges settled during pass N are removed before pass N+1, reducing the iteration count for all subsequent passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MutableList<LongArray> stores each edge as a separate heap object (~32 bytes header + data + pointer), with poor cache locality. Replace it with a single flat LongArray where consecutive pairs (objectId, parentObjectId) occupy indices [i*2, i*2+1]. CrossEdgeBuffer: - add(): appends a pair; doubles the array when full - prune(): marks settled entries in-place using NULL_REFERENCE (0L) as a sentinel — safe since heap object IDs are always > 0; array never shrinks - forEach(): inline higher-order function that skips marked entries, allowing the convergence loop body to modify captured vars (e.g. `changed`) without boxing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move the flat long-pair buffer to its own file as LongPairList, with no knowledge of dominators or NULL_REFERENCE semantics - Replace prune(dominated) with clearAt(index) + forEachIndexed, keeping the data structure generic (0L as cleared-entry sentinel is documented as a caller constraint) - Move pruning logic into DominatorTree.pruneSettledCrossEdges(), which owns the dominator-specific slot checks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #2715 The convergence loop added to DominatorTree was not wired up anywhere, so retained sizes computed by LeakCanary, ObjectDominators, and ObjectGrowthDetector were still subject to the stale-dominator bug where cross-edges processed with an out-of-date dom(parent) leave child dominators too specific, causing over-attribution of retained sizes. Enable collectCrossEdges = true in PrioritizingShortestPathFinder and ObjectGrowthDetector, and call runConvergenceLoop() before computeRetainedSizes() / buildFullDominatorTree() in all three callers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pyricau
commented
Feb 26, 2026
| data = data.copyOf(data.size * 2) | ||
| } | ||
| data[size * 2] = first | ||
| data[size * 2 + 1] = second |
Member
Author
There was a problem hiding this comment.
Move size*2 to a local var to avoid doing it 3 times
This reverts commit c8ab0e4.
Benchmarks on gcroot_unknown_object.hprof (25 MB) revealed that the convergence loop is not viable for production use: - 107,692 cross-edges stored after BFS - 781 iterations to converge (vs the assumed "2-3") - 62 s loop time on top of a 1.5 s analysis (~40x overhead) The O(cross-edges × depth × iterations) complexity means correction chains propagate one hop per iteration, so a heap with deep object graphs requires as many iterations as the longest stale-dominator chain. Update class-level and @param KDoc in DominatorTree to document the performance warning. Add ConvergenceLoopBenchmark to measure the overhead on a real heap dump (without loop vs with loop, 4 timed runs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2715 — retained sizes reported by LeakCanary could be significantly over-attributed to the wrong ancestor due to a known approximation bug in the incremental BFS+LCA dominator tree algorithm.
Root cause
DominatorTreebuilds dominators incrementally during BFS using a Lowest Common Ancestor approach. When a cross-edge (to an already-visited node) is processed, the parent's dominator may still be stale — it may later be raised by another cross-edge at the same BFS level. By the time the parent's dominator is corrected, the child's dominator is never revisited, leaving it too specific. The consequence is that the child's retained size is incorrectly attributed to an ancestor that doesn't actually exclusively dominate it.Fix
DominatorTree: stores cross-edges during BFS, then re-processes them with updated dominator values after the BFS completes, iterating until the tree stabilizes (typically 2–3 passes)collectCrossEdges = true(opt-in, no overhead for callers that don't callrunConvergenceLoop)NULL_REFERENCE), since re-processing them in the convergence loop would always be a no-opData structure improvement
Cross-edges were previously stored as
MutableList<LongArray>(~40 bytes/entry with object header overhead and indirection). Replaced withLongPairList: a flatLongArraywhere each pair occupies two consecutive slots (~16 bytes/entry, ~2.5× memory reduction, better cache locality). Pruned entries are marked with a0Lsentinel rather than shrinking the array.Production wiring
Enabled the convergence loop in all three production callers:
PrioritizingShortestPathFinder(used by LeakCanary leak tracing andObjectDominators)ObjectGrowthDetectorTests
known bug - BFS ordering leaves child dominator too specific: asserts the current (incorrect) behavior before the fix, documenting the approximationconvergence loop fixes stale dominator attribution: asserts the same graph produces the correct result afterrunConvergenceLoop()convergence loop stops at maxIterations: verifies the loop respects the iteration capTest plan
./gradlew :shark:shark:testpasses./gradlew :shark:shark:apiCheckpasses🤖 Generated with Claude Code