Dag processor: reduce file-queue dedup from O(N²) to O(N) with OrderedDict #67750
Open
shahar1 wants to merge 2 commits into
…dDict

Replace collections.deque with OrderedDict[DagFileInfo, None] for _file_queue. Membership testing and remove operations are O(1) instead of O(N), eliminating the quadratic cost in the frontprio and re-add paths. Verified behavior-identical over 300 random ops against the old deque semantics. All 116 manager tests pass.

Benchmark results (best-of-N, ms):

  files   before   after   speedup
  4000    2320.7   3.82    ~610× (frontprio re-add)
  4000    2299.7   2.94    ~780× (front re-add)
kaxil reviewed on May 29, 2026
In frontprio mode, pop the existing key before re-inserting so the new DagFileInfo object (which may carry a fresher bundle_path) replaces the old one in the dict, matching the old deque.remove()+appendleft() semantics. Remove all inline comments added to _add_files_to_queue and the _file_queue field declaration.
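The reason for popping before re-inserting follows from how Python dicts treat equal keys: assigning to a key that compares equal to an existing one updates the value but keeps the original key object, and `move_to_end` only reorders. The sketch below assumes a hypothetical `FileInfo` stand-in whose equality and hash ignore `bundle_path`; the real `DagFileInfo` definition is not shown here, so treat this as an illustration of the dict behavior rather than of Airflow's class.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


# Hypothetical stand-in for DagFileInfo: equality and hashing use rel_path only,
# bundle_path is excluded (an assumption made for illustration).
@dataclass(frozen=True)
class FileInfo:
    rel_path: str
    bundle_path: str = field(compare=False)


def push_front_reorder_only(queue: OrderedDict, f: FileInfo) -> None:
    # Assigning to an equal key keeps the ORIGINAL key object in the dict.
    queue[f] = None
    queue.move_to_end(f, last=False)


def push_front_replace(queue: OrderedDict, f: FileInfo) -> None:
    # Popping first removes the old key object, so the new one is stored.
    queue.pop(f, None)
    queue[f] = None
    queue.move_to_end(f, last=False)


old = FileInfo("dags/a.py", "/bundles/v1")
new = FileInfo("dags/a.py", "/bundles/v2")

q1 = OrderedDict.fromkeys([old])
push_front_reorder_only(q1, new)
print(next(iter(q1)).bundle_path)  # /bundles/v1 (stale object kept)

q2 = OrderedDict.fromkeys([old])
push_front_replace(q2, new)
print(next(iter(q2)).bundle_path)  # /bundles/v2
```

Without the `pop`, the entry moves to the front but still carries the stale `bundle_path`.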
Replace `collections.deque` with `OrderedDict[DagFileInfo, None]` for `DagFileProcessorManager._file_queue`.

Problem
`x in deque` and `deque.remove(x)` are O(queue size). The `frontprio` priority path (`_queue_requested_files_for_parsing`), per-loop callback adds (`_add_callback_to_queue`), and any re-add against a populated queue are therefore O(N × Q), which is quadratic in the steady-state file count because Q grows with N.
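To make the O(N × Q) cost concrete, here is a minimal sketch (not the actual Airflow code) of a frontprio-style re-add against a `deque`. It uses a hypothetical `CountingKey` type that counts `__eq__` calls; each re-added entry is equal to, but not the same object as, the queued one, so Python's identity shortcut in comparisons does not apply.

```python
from collections import deque


class CountingKey:
    """Hypothetical queue entry that counts how often __eq__ is called."""

    comparisons = 0

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


def frontprio_readd_deque(n: int) -> int:
    """Re-add n files to a deque holding n equal entries; return __eq__ calls."""
    queue = deque(CountingKey(f"dag_{i}.py") for i in range(n))
    CountingKey.comparisons = 0
    for i in range(n):
        f = CountingKey(f"dag_{i}.py")  # equal to, but not identical with, the queued entry
        if f in queue:       # linear scan from the left: O(Q)
            queue.remove(f)  # second linear scan: O(Q)
        queue.appendleft(f)
    return CountingKey.comparisons


for n in (100, 200, 400):
    print(n, frontprio_readd_deque(n))
# 100 10100
# 200 40200
# 400 160400
```

Re-adding file k scans k + 1 entries in the membership test and again in `remove`, so the total is n(n + 1) comparisons: doubling n roughly quadruples the work.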
The three affected paths:

- `mode="frontprio"`: `deque.remove(file)` per file, O(Q) each
- `mode="front"`: `f not in self._file_queue` per file, O(Q) each

Fix
Use `OrderedDict` as an ordered set (`{file: None}`). Membership is O(1), push-front is an insert followed by `move_to_end(file, last=False)`, and pop-front is `popitem(last=False)`. All three modes collapse to O(1) per file.

Behavior is verified identical over 300 random operations against the old deque semantics.
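A minimal sketch of the ordered-set idea and of a randomized differential check in the spirit of the one described, assuming single-file versions of the `frontprio` and `front` pushes. The helper names are hypothetical and do not mirror the actual `DagFileProcessorManager` methods. The check replays the same random operations against an `OrderedDict` and a `deque` reference and asserts that both hold the same order after every step.

```python
import random
from collections import OrderedDict, deque

# Hypothetical helpers: single-file versions of the "frontprio" and "front"
# pushes. They are NOT the actual DagFileProcessorManager methods.


def od_push_front_prio(q: OrderedDict, f: str) -> None:
    q.pop(f, None)                # O(1): drop any existing entry
    q[f] = None                   # O(1): insert at the back...
    q.move_to_end(f, last=False)  # O(1): ...then move it to the front


def od_push_front_if_absent(q: OrderedDict, f: str) -> None:
    if f not in q:                # O(1) membership test
        q[f] = None
        q.move_to_end(f, last=False)


def od_pop_front(q: OrderedDict) -> str:
    return q.popitem(last=False)[0]  # O(1)


# Reference semantics with a deque (O(Q) membership test and remove).
def dq_push_front_prio(q: deque, f: str) -> None:
    if f in q:
        q.remove(f)
    q.appendleft(f)


def dq_push_front_if_absent(q: deque, f: str) -> None:
    if f not in q:
        q.appendleft(f)


def differential_check(n_ops: int = 300, seed: int = 0) -> None:
    """Replay the same random ops on both structures; compare after each."""
    rng = random.Random(seed)
    od: OrderedDict = OrderedDict()
    dq: deque = deque()
    for _ in range(n_ops):
        op = rng.choice(("prio", "front", "pop"))
        f = f"dag_{rng.randrange(20)}.py"
        if op == "prio":
            od_push_front_prio(od, f)
            dq_push_front_prio(dq, f)
        elif op == "front":
            od_push_front_if_absent(od, f)
            dq_push_front_if_absent(dq, f)
        elif dq:  # "pop", only when the queue is non-empty
            assert od_pop_front(od) == dq.popleft()
        assert list(od) == list(dq), (op, f)


differential_check()
```

Drawing file names from a small pool (20 here) keeps collisions frequent, so the re-add and dedup branches are exercised rather than just plain inserts.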
Benchmark results
Measured on WSL2 / Python 3.12 with synthetic `DagFileInfo` objects (no DB, no subprocess). Full benchmark scripts:
gist: dag-processing benchmarks
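The actual scripts are in the gist. As a rough, hypothetical illustration of how an ms/N² column can be produced, the sketch below times a frontprio-style `OrderedDict` re-add at several sizes (best-of-N with `time.perf_counter`) and divides by N². A flat ratio indicates quadratic growth; a ratio that roughly halves each time N doubles indicates linear growth. The workload and helper names are assumptions, the timings include queue setup, and the ratio is unscaled, so magnitudes are not comparable to the PR's tables.

```python
import time
from collections import OrderedDict
from typing import Callable


def frontprio_readd_od(n: int) -> None:
    """Hypothetical workload: re-add n files frontprio-style to a queue already holding them."""
    files = [f"dag_{i}.py" for i in range(n)]
    q = OrderedDict.fromkeys(files)  # setup is included in the timing
    for f in files:
        q.pop(f, None)
        q[f] = None
        q.move_to_end(f, last=False)


def ms_per_n_squared(
    fn: Callable[[int], None], sizes: list[int], repeats: int = 5
) -> list[tuple[int, float, float]]:
    """Return (n, best-of-`repeats` ms, ms / n**2) for each size."""
    rows = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(n)
            best = min(best, time.perf_counter() - start)
        ms = best * 1000
        rows.append((n, ms, ms / n**2))
    return rows


for n, ms, ratio in ms_per_n_squared(frontprio_readd_od, [1000, 2000, 4000]):
    print(f"{n:>6} {ms:10.3f} ms  {ratio:.3e} ms/N^2")
```

Passing a deque-based version of the same workload to `ms_per_n_squared` gives the corresponding "before" column under the same assumptions.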
File-queue ops (best-of-5, ms):
Before:
After:
The `ms/N²` column flips from ~142 (flat across N, i.e. quadratic) to ~0.1 (declining as N grows, i.e. linear), confirming the change in complexity class.

Tests
All 116 `test_manager.py` tests pass. 19 tests required migrating hardcoded `deque(...)` in setup/assertions to `OrderedDict.fromkeys(...)`.

Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (claude-sonnet-4-6) following the guidelines