⚡️ Speed up function _get_all_json_refs by 40%
#210
Closed
📄 40% (0.40x) speedup for `_get_all_json_refs` in `src/algorithms/search.py`
⏱️ Runtime: 848 microseconds → 606 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 39% speedup by replacing recursive calls with an iterative approach using an explicit stack. Here's why this matters:
Key Optimization: Recursion → Iteration
What changed:
- The original calls `_get_all_json_refs()` recursively and uses `refs.update()` to merge results from child nodes
- The optimized version uses a `while` loop with an explicit stack to traverse the JSON structure iteratively

Why it's faster:
- Eliminated Function Call Overhead: The original code made 2,420 recursive calls (visible in line profiler as 2,480 total hits with recursive update operations). Each function call in Python carries its own overhead, and each recursive call also built its own temporary `refs` set.
- Avoided Repeated Set Operations: The original used `refs.update()` to merge child results back into parent sets. The line profiler shows `refs.update()` calls consuming ~3.6ms (30% of total time). The optimized version adds directly to a single `refs` set.
- Better Memory Locality: The iterative approach maintains one `refs` set and one `stack` list, improving cache efficiency compared to multiple temporary sets across recursive calls.

Performance by Test Case Type
- `test_large_nested_structure`: 101μs → 28.7μs (255% faster). The recursive version suffers worst with deep nesting (100 levels deep).
- `test_large_flat_dict_of_refs`: 36.2μs → 30.8μs (17% faster)
- `test_large_mixed_structure`: 38.8μs → 27.5μs (41% faster)

Impact Analysis
Based on the function name `_get_all_json_refs` (JSON schema reference extraction), this is likely used in tools that process JSON schemas. Such tools often handle complex, deeply nested schemas where this optimization would have significant cumulative impact. At the measured ratio (848 μs → 606 μs), a workload that spent 1 second in this function would now spend roughly 715 ms, which compounds across large codebases or high-throughput validation scenarios.
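The recursion-to-iteration change described above can be sketched as follows. The PR's actual diff is not reproduced in this description, so both functions are illustrative assumptions: the names `get_refs_recursive` and `get_refs_iterative`, the `"$ref"` key convention, and the exact traversal rules are hypothetical stand-ins for `_get_all_json_refs`, not the code in `src/algorithms/search.py`.

```python
from typing import Any, List, Set


def get_refs_recursive(node: Any) -> Set[str]:
    """Hypothetical 'before' shape: every call builds a set and merges it upward."""
    refs: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.add(value)
            else:
                # One extra function call and one temporary set per child node.
                refs.update(get_refs_recursive(value))
    elif isinstance(node, list):
        for item in node:
            refs.update(get_refs_recursive(item))
    return refs


def get_refs_iterative(node: Any) -> Set[str]:
    """Hypothetical 'after' shape: one shared set, one explicit stack, no recursion."""
    refs: Set[str] = set()
    stack: List[Any] = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            for key, value in current.items():
                if key == "$ref" and isinstance(value, str):
                    refs.add(value)  # written straight into the single result set
                else:
                    stack.append(value)  # defer the child instead of recursing
        elif isinstance(current, list):
            stack.extend(current)
    return refs


# Small made-up schema to show that both versions collect the same references.
schema = {
    "properties": {
        "a": {"$ref": "#/definitions/A"},
        "b": {"items": [{"$ref": "#/definitions/B"}]},
    },
    "definitions": {"A": {"type": "string"}, "B": {"$ref": "#/definitions/A"}},
}
assert get_refs_iterative(schema) == get_refs_recursive(schema)
```

Because the result is a set, the different visiting order of the stack-based version does not change the output. A side effect of this shape is that nesting depth is bounded by available memory rather than by Python's recursion limit.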
✅ Correctness verification report:
🌀 Generated Regression Tests
🔎 Concolic Coverage Tests
- `codeflash_concolic_y60jt975/tmp0f2e05ub/test_concolic_coverage.py::test__get_all_json_refs`
- `codeflash_concolic_y60jt975/tmp0f2e05ub/test_concolic_coverage.py::test__get_all_json_refs_2`
- `codeflash_concolic_y60jt975/tmp0f2e05ub/test_concolic_coverage.py::test__get_all_json_refs_3`
- `codeflash_concolic_y60jt975/tmp0f2e05ub/test_concolic_coverage.py::test__get_all_json_refs_4`

To edit these changes, run `git checkout codeflash/optimize-_get_all_json_refs-mji1cxhy` and push.