⚡️ Speed up function find_last_node by 25,446%
#194
Closed
📄 25,446% (254.46x) speedup for `find_last_node` in `src/algorithms/graph.py`
⏱️ Runtime: 92.3 milliseconds → 361 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 254x speedup by replacing a nested scan over the edges with a set lookup. Here's why:
The Core Problem:
The original implementation uses a nested comprehension that checks `all(e["source"] != n["id"] for e in edges)` for each node. In the worst case this performs O(N × M) comparisons, where N is the number of nodes and M is the number of edges: for every candidate node, the code scans the edge list again.
The Optimization:
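The optimized version pre-computes a set of source IDs: `source_ids = {e["source"] for e in edges}`. This transforms the problem into one pass over the edges to build the set, followed by one pass over the nodes with a single set-membership check per node.
This reduces the overall complexity from O(N × M) to O(N + M).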
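As a minimal sketch of the two approaches side by side: only the `all(...)` check and the `source_ids` set come from this PR. The function names, the surrounding `next(..., None)` wrapper, and the sample graph are assumptions made for illustration, not the actual contents of `src/algorithms/graph.py`.

```python
def find_last_node_original(nodes, edges):
    # Hypothetical reconstruction: for each node, rescan the edge list
    # until an edge with this node as its source is found.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    # Build the set of source IDs once: O(M).
    source_ids = {e["source"] for e in edges}
    # One average O(1) membership check per node: O(N).
    return next((n for n in nodes if n["id"] not in source_ids), None)


# Illustrative graph: 1 -> 2 -> 3, so node 3 has no outgoing edge.
nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]

print(find_last_node_original(nodes, edges))
print(find_last_node_optimized(nodes, edges))
```

Both versions return the first node, in list order, that never appears as an edge source, or `None` if every node has an outgoing edge. The set-based version does require the `source` values to be hashable.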
Why This Matters:
Set membership (`in`) vs repeated iteration: Python sets use hash tables, so each lookup takes O(1) time on average instead of a scan through the edge list for each check.
Performance Impact by Test Case:
`test_large_linear_chain` drops from 18.6ms to 57.2μs because it eliminates ~1 million comparisons.
Special Cases:
The optimization helps any graph with more than a handful of nodes and edges, and matters most for production code that processes moderate to large graphs.
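To see where the comparison count for a linear chain comes from, the sketch below counts the `source`/`id` comparisons the nested approach performs. The helper names and the chain size of 1,000 are assumptions for illustration. The PR does not state the graph size used by `test_large_linear_chain`, so the figure here is not meant to reproduce that test.

```python
def make_linear_chain(size):
    # Hypothetical helper: nodes 0..size-1 with edges i -> i+1.
    nodes = [{"id": i} for i in range(size)]
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    return nodes, edges


def count_nested_comparisons(nodes, edges):
    # Mirrors the nested all(...) check, including its short-circuit,
    # while counting every source/id comparison made.
    comparisons = 0
    for n in nodes:
        has_outgoing_edge = False
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                has_outgoing_edge = True
                break
        if not has_outgoing_edge:
            return n, comparisons
    return None, comparisons


nodes, edges = make_linear_chain(1000)
last, comparisons = count_nested_comparisons(nodes, edges)
print(last, comparisons)
```

Because the last node sits at the end of the list, node `i` needs `i + 1` comparisons before its outgoing edge is found, so the total grows quadratically with the chain length. The set-based version instead visits each edge once and performs one lookup per node.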
✅ Correctness verification report:
To edit these changes, run `git checkout codeflash/optimize-find_last_node-mjhql0kq` and push.