⚡️ Speed up method AsyncCallInstrumenter._process_test_function by 13% in PR #678 (standalone-fto-async)
#768
Closed · codeflash-ai wants to merge 100 commits into standalone-fto-async from codeflash/optimize-pr678-2025-09-26T20.13.52
Conversation
[LSP] Ensure optimizer cleanup on server shutdown or when the client suddenly disconnects
…licate-global-assignments-when-reverting-helpers
…/duplicate-global-assignments-when-reverting-helpers`) The optimized code achieves a **17% speedup** by eliminating redundant CST parsing operations, which are the most expensive parts of the function according to the line profiler.

**Key optimizations:**

1. **Eliminate duplicate parsing**: The original code parsed `src_module_code` and `dst_module_code` multiple times. The optimized version introduces `_extract_global_statements_once()` that parses each module only once and reuses the parsed CST objects throughout the function.
2. **Reuse parsed modules**: Instead of re-parsing `dst_module_code` after modifications, the optimized version conditionally reuses the already-parsed `dst_module` when no global statements need insertion, avoiding unnecessary `cst.parse_module()` calls.
3. **Early termination**: Added an early return when `new_collector.assignments` is empty, avoiding the expensive `GlobalAssignmentTransformer` creation and visitation when there's nothing to transform.
4. **Minor optimization in uniqueness check**: Added a fast-path identity check (`stmt is existing_stmt`) before the expensive `deep_equals()` comparison, though this has minimal impact.

**Performance impact by test case type:**

- **Empty/minimal cases**: Show the highest gains (59-88% faster) due to early termination optimizations
- **Standard cases**: Achieve consistent 20-30% improvements from reduced parsing
- **Large-scale tests**: Benefit significantly (18-23% faster) as parsing overhead scales with code size

The optimization is most effective for workloads with moderate to large code files where CST parsing dominates the runtime, as evidenced by the original profiler showing 70%+ of time spent in `cst.parse_module()` and `module.visit()` operations. A minimal sketch of the parse-once pattern follows below.
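A hedged illustration of the parse-once pattern described above, not the actual codeflash implementation: `_extract_global_statements_once` is the helper name from the description, but the statement filtering and merge logic below are simplified assumptions.

```python
import libcst as cst


def _extract_global_statements_once(src_code: str, dst_code: str):
    """Parse each module exactly once; callers reuse the returned CST objects."""
    src_module = cst.parse_module(src_code)
    dst_module = cst.parse_module(dst_code)
    src_globals = [s for s in src_module.body if isinstance(s, cst.SimpleStatementLine)]
    dst_globals = [s for s in dst_module.body if isinstance(s, cst.SimpleStatementLine)]
    return src_module, dst_module, src_globals, dst_globals


def add_global_assignments(src_code: str, dst_code: str) -> str:
    _, dst_module, src_globals, dst_globals = _extract_global_statements_once(src_code, dst_code)

    # Fast-path identity check before the more expensive deep_equals() comparison.
    missing = [
        s for s in src_globals
        if not any(s is existing or s.deep_equals(existing) for existing in dst_globals)
    ]

    # Early return: nothing to insert, so reuse the already-parsed module's code
    # instead of round-tripping through cst.parse_module() again.
    if not missing:
        return dst_module.code

    return dst_module.with_changes(body=[*missing, *dst_module.body]).code
```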
Signed-off-by: Saurabh Misra <[email protected]>
…25-08-25T18.50.33 ⚡️ Speed up function `add_global_assignments` by 18% in PR #683 (`fix/duplicate-global-assignments-when-reverting-helpers`)
…cs-in-diff [Lsp] return diff functions grouped by file
* lsp: get new/modified functions inside a git commit
* better name
* refactor
* revert
* save optimization patches metadata
* typo
* lsp: get previous optimizations
* fix patch name in non-lsp mode
* ⚡️ Speed up function `get_patches_metadata` by 45% in PR #690 (`worktree/persist-optimization-patches`)

  The optimized code achieves a **44% speedup** through two key optimizations:

  **1. Added `@lru_cache(maxsize=1)` to `get_patches_dir_for_project()`**
  - This caches the Path object construction, avoiding repeated calls to `get_git_project_id()` and `Path()` creation
  - The line profiler shows this function's total time dropped from 5.32ms to being completely eliminated from the hot path in `get_patches_metadata()`
  - Since `get_git_project_id()` was already cached but still being called repeatedly, this second-level caching eliminates that redundancy

  **2. Replaced `read_text()` + `json.loads()` with `open()` + `json.load()`**
  - Using `json.load()` with a file handle is more efficient than reading the entire file into memory first with `read_text()` and then parsing it
  - This avoids the intermediate string creation and is particularly beneficial for larger JSON files
  - Added explicit UTF-8 encoding for consistency

  **Performance Impact by Test Type:**
  - **Basic cases** (small/missing files): 45-65% faster - benefits primarily from the caching optimization
  - **Edge cases** (malformed JSON): 38-47% faster - still benefits from both optimizations
  - **Large scale cases** (1000+ patches, large files): 39-52% faster - the file I/O optimization becomes more significant with larger JSON files

  The caching optimization provides the most consistent gains across all scenarios since it eliminates repeated expensive operations, while the file I/O optimization scales with file size. A minimal sketch follows after this commit entry.
* fix: patch path
* codeflash suggestions
* split the worktree utils into a separate file
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
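A minimal sketch of the two changes described above; `get_patches_dir_for_project` and `get_patches_metadata` are the names from the commit message, while the directory layout and the `get_git_project_id` stub are assumptions for illustration.

```python
import json
from functools import lru_cache
from pathlib import Path


def get_git_project_id() -> str:
    # Placeholder: the real helper derives a stable id from the git repo.
    return "example-project-id"


@lru_cache(maxsize=1)
def get_patches_dir_for_project() -> Path:
    # Cached so the Path construction and the get_git_project_id() call
    # happen once instead of on every metadata lookup.
    return Path.home() / ".codeflash" / "patches" / get_git_project_id()


def get_patches_metadata() -> dict:
    metadata_file = get_patches_dir_for_project() / "metadata.json"
    if not metadata_file.exists():
        return {}
    # json.load on an open file handle avoids materializing the whole file
    # as an intermediate string, unlike read_text() + json.loads().
    with metadata_file.open(encoding="utf-8") as f:
        return json.load(f)
```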
Deque Comparator
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* line profiling loading msg
---------
Co-authored-by: saga4 <[email protected]>
Co-authored-by: ali <[email protected]>
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* fix inline condition
---------
Co-authored-by: saga4 <[email protected]>
import variable correctly
Signed-off-by: Saurabh Misra <[email protected]>
support attrs comparison
apscheduler tries to schedule jobs when the interpreter is shutting down, which can cause it to crash and leave us in a bad state
patch apscheduler when tracing
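A hedged sketch of what such a patch could look like; wrapping `BaseScheduler._process_jobs` and gating on `sys.is_finalizing()` are assumptions about the approach, not the actual codeflash patch.

```python
import sys

from apscheduler.schedulers.base import BaseScheduler

_original_process_jobs = BaseScheduler._process_jobs


def _safe_process_jobs(self):
    # Skip job processing entirely once the interpreter is shutting down,
    # so apscheduler never tries to schedule work against a dying runtime.
    if sys.is_finalizing():
        return None
    return _original_process_jobs(self)


def patch_apscheduler_for_tracing() -> None:
    """Monkey-patch apscheduler before tracing starts (assumed entry point)."""
    BaseScheduler._process_jobs = _safe_process_jobs
```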
The optimized version eliminates recursive function calls by replacing the recursive `_find` helper with an iterative approach. This provides significant performance benefits:

**Key Optimizations:**

1. **Removed Recursion Overhead**: The original code used a recursive helper function `_find` that created new stack frames for each parent traversal. The optimized version uses a simple iterative loop that traverses parents sequentially without function call overhead.
2. **Eliminated Function Creation**: The original code defined the `_find` function on every call to `find_target_node`. The optimized version removes this repeated function definition entirely.
3. **Early Exit with for-else**: The optimized code uses Python's `for-else` construct to immediately return `None` when a parent class isn't found, avoiding unnecessary continued searching.
4. **Reduced Attribute Access**: By caching `function_to_optimize.function_name` in a local variable `target_name` and reusing `body` variables, the code reduces repeated attribute lookups.

**Performance Impact by Test Case:**

- **Simple cases** (top-level functions, basic class methods): 23-62% faster due to eliminated recursion overhead
- **Nested class scenarios**: 45-84% faster, with deeper nesting showing greater improvements as recursion elimination has more impact
- **Large-scale tests**: 12-22% faster, showing consistent benefits even with many nodes to traverse
- **Edge cases** (empty modules, non-existent classes): 52-76% faster due to more efficient early termination

The optimization is particularly effective for deeply nested class hierarchies where the original recursive approach created multiple stack frames, while the iterative version maintains constant memory usage regardless of nesting depth. A sketch of the iterative pattern follows below.
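A sketch of the iterative pattern under the assumptions above; the `parent_classes` parameter stands in for the real `function_to_optimize.parents` structure, which may differ.

```python
import ast


def find_target_node(tree: ast.Module, target_name: str, parent_classes: list[str]):
    """Locate a function node, descending through nested parent classes iteratively."""
    body = tree.body
    # Walk down the parent-class chain with a plain loop instead of recursion.
    for class_name in parent_classes:
        for node in body:
            if isinstance(node, ast.ClassDef) and node.name == class_name:
                body = node.body
                break
        else:
            # for-else: the parent class was not found, so stop immediately.
            return None
    for node in body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == target_name:
            return node
    return None
```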
…25-09-25T14.28.58 ⚡️ Speed up function `find_target_node` by 18% in PR #763 (`fix/correctly-find-funtion-node-when-reverting-helpers`)
…node-when-reverting-helpers [FIX] Respect parent classes in revert helpers
Granular async instrumentation
…d move other merged test below; finish resolving aiservice/config/explanation/function_optimizer; regenerate uv.lock
The optimization achieves a **12% speedup** through several targeted improvements in the `_process_test_function` and `_instrument_statement` methods:

**Key Optimizations:**

1. **Variable hoisting and local references**: The optimized code extracts frequently accessed instance variables (`self.async_call_counter`, `node.name`) into local variables at the beginning of `_process_test_function`. It also creates local references to methods (`self._instrument_statement`, `new_body.append`) to avoid repeated attribute lookups during the main loop.
2. **Improved timeout decorator check**: Instead of using `any()` with a generator expression, the optimization uses an explicit loop with early termination when a timeout decorator is found. This avoids creating unnecessary generator objects and allows for faster short-circuiting.
3. **Optimized AST traversal**: The most significant improvement is replacing `ast.walk()` with a manual stack-based traversal using `ast.iter_child_nodes()` in `_instrument_statement`. This eliminates the overhead of `ast.walk()`'s recursive generator and provides better control over the traversal process.
4. **Simplified counter management**: The optimization tracks the call index locally during processing and only updates the instance variable once at the end, reducing dictionary access overhead.

**Performance Impact by Test Case:**

- **Small functions**: 61-130% faster for basic test cases with minimal statements
- **Empty/simple functions**: 71-119% faster due to reduced overhead in the main processing loop
- **Large-scale functions**: 11.5% faster for functions with 500+ await statements, where the AST traversal optimization becomes most beneficial

The optimizations are particularly effective for functions with many statements where the improved AST traversal and reduced attribute lookups compound to significant savings. A sketch of the stack-based traversal follows below.
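A hedged sketch of the stack-based traversal and local counter bookkeeping; the await-call predicate is a simplified stand-in for the real instrumentation logic.

```python
import ast


def count_target_awaits(stmt: ast.stmt, target_name: str) -> int:
    """Count `await target_name(...)` calls under one statement, without ast.walk()."""
    call_index = 0  # tracked locally; shared state is updated once by the caller
    stack = [stmt]
    while stack:
        node = stack.pop()
        if (
            isinstance(node, ast.Await)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == target_name
        ):
            call_index += 1
        # Manual child expansion covers the same nodes as ast.walk(), minus
        # the recursive-generator overhead.
        stack.extend(ast.iter_child_nodes(node))
    return call_index
```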
40c4108 to 7bbb1e7
⚡️ This pull request contains optimizations for PR #678
If you approve this dependent PR, these changes will be merged into the original PR branch standalone-fto-async.

📄 13% (0.13x) speedup for `AsyncCallInstrumenter._process_test_function` in `codeflash/code_utils/instrument_existing_tests.py`

⏱️ Runtime: 2.35 milliseconds → 2.09 milliseconds (best of 15 runs)

📝 Explanation and details
The optimization achieves a 12% speedup through several targeted improvements in the `_process_test_function` and `_instrument_statement` methods:

**Key Optimizations:**

1. **Variable hoisting and local references**: The optimized code extracts frequently accessed instance variables (`self.async_call_counter`, `node.name`) into local variables at the beginning of `_process_test_function`. It also creates local references to methods (`self._instrument_statement`, `new_body.append`) to avoid repeated attribute lookups during the main loop.
2. **Improved timeout decorator check**: Instead of using `any()` with a generator expression, the optimization uses an explicit loop with early termination when a timeout decorator is found. This avoids creating unnecessary generator objects and allows for faster short-circuiting.
3. **Optimized AST traversal**: The most significant improvement is replacing `ast.walk()` with a manual stack-based traversal using `ast.iter_child_nodes()` in `_instrument_statement`. This eliminates the overhead of `ast.walk()`'s recursive generator and provides better control over the traversal process.
4. **Simplified counter management**: The optimization tracks the call index locally during processing and only updates the instance variable once at the end, reducing dictionary access overhead.

**Performance Impact by Test Case:**

- **Small functions**: 61-130% faster for basic test cases with minimal statements
- **Empty/simple functions**: 71-119% faster due to reduced overhead in the main processing loop
- **Large-scale functions**: 11.5% faster for functions with 500+ await statements, where the AST traversal optimization becomes most beneficial

The optimizations are particularly effective for functions with many statements where the improved AST traversal and reduced attribute lookups compound to significant savings. A sketch of the early-exit decorator check follows below.
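A hedged sketch of the early-exit decorator scan; the decorator names matched here (`timeout` / `pytest.mark.timeout`) are assumptions for illustration.

```python
import ast


def has_timeout_decorator(node: ast.AsyncFunctionDef) -> bool:
    """Explicit loop with early exit, replacing any(<generator expression>)."""
    for decorator in node.decorator_list:
        # Unwrap calls such as @timeout(5) or @pytest.mark.timeout(5).
        target = decorator.func if isinstance(decorator, ast.Call) else decorator
        if isinstance(target, ast.Name) and target.id == "timeout":
            return True  # early exit: no generator object is ever created
        if isinstance(target, ast.Attribute) and target.attr == "timeout":
            return True
    return False
```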
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr678-2025-09-26T20.13.52` and push.