⚡️ Speed up method AsyncCallInstrumenter._call_in_positions by 69% in PR #739 (get-throughput-from-output)
#745
⚡️ This pull request contains optimizations for PR #739

If you approve this dependent PR, these changes will be merged into the original PR branch `get-throughput-from-output`.

📄 69% (0.69x) speedup for `AsyncCallInstrumenter._call_in_positions` in `codeflash/code_utils/instrument_existing_tests.py`

⏱️ Runtime: `313 microseconds` → `185 microseconds` (best of `137` runs)

📝 Explanation and details
The optimization caches frequently accessed object attributes outside the inner loop to reduce redundant attribute lookups. In the `node_in_call_position` function, the original code repeatedly accessed `node.lineno`, `node.end_lineno`, `node.col_offset`, `node.end_col_offset`, and `pos.line_no` on every iteration of the `call_positions` loop.

The optimized version hoists these attribute lookups outside the loop:

- `node_lineno = node.lineno`
- `node_end_lineno = node.end_lineno`
- `node_col_offset = node.col_offset`
- `node_end_col_offset = node.end_col_offset`
- `pos_line_no = pos.line_no` (inside the loop but outside the nested conditions)

This change is particularly effective when there are many call positions to check, as evidenced by the large-scale test cases showing a 57-148% speedup. The profiler data confirms the optimization reduces time spent on attribute access: the original spent 25.4% of its time on the `for pos in call_positions:` line accessing attributes, while the optimized version distributes execution time more evenly.

Python attribute access involves dictionary lookups under the hood, so caching these values in local variables (which are stored in a fast, array-based structure) provides significant performance gains when the same attributes are accessed repeatedly in tight loops.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-pr739-2025-09-22T20.20.14` and push.