⚡️ Speed up method CommentMapper.visit_FunctionDef by 11% in PR #678 (standalone-fto-async)
#766
Closed · codeflash-ai wants to merge 100 commits into standalone-fto-async from codeflash/optimize-pr678-2025-09-26T19.48.07
Conversation
[LSP] Ensure optimizer cleanup on server shutdown or when the client suddenly disconnects
…licate-global-assignments-when-reverting-helpers
…/duplicate-global-assignments-when-reverting-helpers`)

The optimized code achieves a **17% speedup** by eliminating redundant CST parsing operations, which are the most expensive parts of the function according to the line profiler.

**Key optimizations:**

1. **Eliminate duplicate parsing**: The original code parsed `src_module_code` and `dst_module_code` multiple times. The optimized version introduces `_extract_global_statements_once()`, which parses each module only once and reuses the parsed CST objects throughout the function.
2. **Reuse parsed modules**: Instead of re-parsing `dst_module_code` after modifications, the optimized version conditionally reuses the already-parsed `dst_module` when no global statements need insertion, avoiding unnecessary `cst.parse_module()` calls.
3. **Early termination**: Added an early return when `new_collector.assignments` is empty, avoiding the expensive `GlobalAssignmentTransformer` creation and visitation when there's nothing to transform.
4. **Minor optimization in uniqueness check**: Added a fast-path identity check (`stmt is existing_stmt`) before the expensive `deep_equals()` comparison, though this has minimal impact.

**Performance impact by test case type:**

- **Empty/minimal cases**: Show the highest gains (59-88% faster) due to early termination optimizations
- **Standard cases**: Achieve consistent 20-30% improvements from reduced parsing
- **Large-scale tests**: Benefit significantly (18-23% faster) as parsing overhead scales with code size

The optimization is most effective for workloads with moderate to large code files where CST parsing dominates the runtime, as evidenced by the original profiler showing 70%+ of time spent in `cst.parse_module()` and `module.visit()` operations.
Signed-off-by: Saurabh Misra <[email protected]>
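As a rough illustration of the parse-once pattern described in the commit above, the sketch below parses each module a single time with libcst and reuses the resulting CST objects. It is a minimal sketch, not the real codeflash implementation: the function bodies, signatures, and the suffixed name `add_global_assignments_sketch` are simplified stand-ins, though `_extract_global_statements_once`, `deep_equals()`, and `cst.parse_module()` are taken from the explanation.

```python
import libcst as cst


def _extract_global_statements_once(src_module_code: str, dst_module_code: str):
    """Parse each module exactly once and reuse the CST objects afterwards."""
    src_module = cst.parse_module(src_module_code)
    dst_module = cst.parse_module(dst_module_code)
    src_globals = [s for s in src_module.body if isinstance(s, cst.SimpleStatementLine)]
    dst_globals = [s for s in dst_module.body if isinstance(s, cst.SimpleStatementLine)]
    return src_module, dst_module, src_globals, dst_globals


def add_global_assignments_sketch(src_module_code: str, dst_module_code: str) -> str:
    _, dst_module, src_globals, dst_globals = _extract_global_statements_once(
        src_module_code, dst_module_code
    )

    # Fast-path identity check before the more expensive structural comparison.
    missing = [
        stmt
        for stmt in src_globals
        if not any(stmt is existing or stmt.deep_equals(existing) for existing in dst_globals)
    ]

    if not missing:
        # Early termination: nothing to insert, so reuse the already-parsed module
        # instead of calling cst.parse_module() on the destination code again.
        return dst_module.code

    updated = dst_module.with_changes(body=[*missing, *dst_module.body])
    return updated.code
```

Because `cst.parse_module()` dominates the runtime per the profiler, the win comes mainly from never parsing the same source text twice and from returning the cached `dst_module.code` when there is nothing to insert.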
…25-08-25T18.50.33 ⚡️ Speed up function `add_global_assignments` by 18% in PR #683 (`fix/duplicate-global-assignments-when-reverting-helpers`)
…cs-in-diff [Lsp] return diff functions grouped by file
* lsp: get new/modified functions inside a git commit
* better name
* refactor
* revert
* save optimization patches metadata
* typo
* lsp: get previous optimizations
* fix patch name in non-lsp mode
* ⚡️ Speed up function `get_patches_metadata` by 45% in PR #690 (`worktree/persist-optimization-patches`)

  The optimized code achieves a **44% speedup** through two key optimizations:

  **1. Added `@lru_cache(maxsize=1)` to `get_patches_dir_for_project()`**
  - This caches the Path object construction, avoiding repeated calls to `get_git_project_id()` and `Path()` creation
  - The line profiler shows this function's total time dropped from 5.32ms to being completely eliminated from the hot path in `get_patches_metadata()`
  - Since `get_git_project_id()` was already cached but still being called repeatedly, this second-level caching eliminates that redundancy

  **2. Replaced `read_text()` + `json.loads()` with `open()` + `json.load()`**
  - Using `json.load()` with a file handle is more efficient than reading the entire file into memory first with `read_text()` and then parsing it
  - This avoids the intermediate string creation and is particularly beneficial for larger JSON files
  - Added explicit UTF-8 encoding for consistency

  **Performance Impact by Test Type:**
  - **Basic cases** (small/missing files): 45-65% faster - benefits primarily from the caching optimization
  - **Edge cases** (malformed JSON): 38-47% faster - still benefits from both optimizations
  - **Large scale cases** (1000+ patches, large files): 39-52% faster - the file I/O optimization becomes more significant with larger JSON files

  The caching optimization provides the most consistent gains across all scenarios since it eliminates repeated expensive operations, while the file I/O optimization scales with file size.

* fix: patch path
* codeflash suggestions
* split the worktree utils in a separate file
--------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
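A minimal sketch of the two `get_patches_metadata` changes described in the commit above. `get_git_project_id()` is mentioned in the commit; its body here, the `~/.codeflash/patches` directory, and the `metadata.json` filename are illustrative assumptions rather than the project's actual layout.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def get_git_project_id() -> str:
    # Stand-in for codeflash's real helper, which derives a stable id from the git repo.
    return "example-project-id"


@lru_cache(maxsize=1)
def get_patches_dir_for_project() -> Path:
    # Cached so the project-id lookup and Path construction happen once per process
    # rather than on every call to get_patches_metadata().
    return Path.home() / ".codeflash" / "patches" / get_git_project_id()  # assumed location


def get_patches_metadata() -> dict:
    metadata_file = get_patches_dir_for_project() / "metadata.json"  # assumed filename
    if not metadata_file.exists():
        return {}
    # json.load on an open handle skips the intermediate string that
    # read_text() + json.loads() would build, and the encoding is explicit.
    with metadata_file.open("r", encoding="utf-8") as f:
        return json.load(f)
```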
Deque Comparator
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* line profiling loading msg
--------
Co-authored-by: saga4 <[email protected]>
Co-authored-by: ali <[email protected]>
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* fix inline condition
--------
Co-authored-by: saga4 <[email protected]>
import variable correctly
Signed-off-by: Saurabh Misra <[email protected]>
support attrs comparison
apscheduler tries to schedule jobs when the interpreter is shutting down which can cause it to crash and leave us in a bad state
patch apscheduler when tracing
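One way such a guard could look is sketched below: refuse new job submissions once interpreter finalization has begun. This is purely illustrative; the actual patch codeflash applies while tracing may target different APScheduler internals.

```python
import sys
from apscheduler.schedulers.background import BackgroundScheduler

_original_add_job = BackgroundScheduler.add_job


def _safe_add_job(self, *args, **kwargs):
    # Once the interpreter is finalizing, silently drop new jobs instead of letting
    # the scheduler thread crash mid-shutdown and leave the tracer in a bad state.
    if sys.is_finalizing():
        return None
    return _original_add_job(self, *args, **kwargs)


def patch_apscheduler_for_tracing() -> None:
    # Monkeypatch intended to be applied only while the tracer is active.
    BackgroundScheduler.add_job = _safe_add_job
```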
The optimized version eliminates recursive function calls by replacing the recursive `_find` helper with an iterative approach. This provides significant performance benefits:

**Key Optimizations:**

1. **Removed Recursion Overhead**: The original code used a recursive helper function `_find` that created new stack frames for each parent traversal. The optimized version uses a simple iterative loop that traverses parents sequentially without function call overhead.
2. **Eliminated Function Creation**: The original code defined the `_find` function on every call to `find_target_node`. The optimized version removes this repeated function definition entirely.
3. **Early Exit with for-else**: The optimized code uses Python's `for-else` construct to immediately return `None` when a parent class isn't found, avoiding unnecessary continued searching.
4. **Reduced Attribute Access**: By caching `function_to_optimize.function_name` in a local variable `target_name` and reusing `body` variables, the code reduces repeated attribute lookups.

**Performance Impact by Test Case:**

- **Simple cases** (top-level functions, basic class methods): 23-62% faster due to eliminated recursion overhead
- **Nested class scenarios**: 45-84% faster, with deeper nesting showing greater improvements as recursion elimination has more impact
- **Large-scale tests**: 12-22% faster, showing consistent benefits even with many nodes to traverse
- **Edge cases** (empty modules, non-existent classes): 52-76% faster due to more efficient early termination

The optimization is particularly effective for deeply nested class hierarchies where the original recursive approach created multiple stack frames, while the iterative version maintains constant memory usage regardless of nesting depth.
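A hedged sketch of that iterative shape using plain `ast` nodes. The `function_to_optimize` object, its `parents` attribute, and the `.name` field on each parent are assumptions about codeflash's data model; the real `find_target_node` may differ in detail.

```python
import ast


def find_target_node_sketch(function_to_optimize, module_node: ast.Module):
    target_name = function_to_optimize.function_name  # cache the attribute lookup once
    body = module_node.body

    # Walk the enclosing classes iteratively instead of recursing with a nested _find().
    for parent in function_to_optimize.parents:
        for node in body:
            if isinstance(node, ast.ClassDef) and node.name == parent.name:
                body = node.body  # descend into the matching class body
                break
        else:
            # for-else: no matching parent class at this level, so bail out immediately.
            return None

    for node in body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == target_name:
            return node
    return None
```

The for-else keeps the early exit explicit, and memory use stays constant regardless of how deeply the classes are nested.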
…25-09-25T14.28.58 ⚡️ Speed up function `find_target_node` by 18% in PR #763 (`fix/correctly-find-funtion-node-when-reverting-helpers`)
…node-when-reverting-helpers [FIX] Respect parent classes in revert helpers
Granular async instrumentation
…d move other merged test below; finish resolving aiservice/config/explanation/function_optimizer; regenerate uv.lock
The optimized code achieves an 11% speedup through several targeted micro-optimizations that reduce attribute lookups and function call overhead in the hot path:

**Key Optimizations:**

1. **Local Variable Caching**: Frequently accessed attributes (`self.context_stack`, `self.original_runtimes`, etc.) are stored in local variables at function start, eliminating repeated `self.` lookups in tight loops.
2. **Method Reference Caching**: `self.get_comment` is cached as `get_comment`, and `context_stack.append`/`pop` are stored locally, reducing method lookup overhead in the main processing loop.
3. **Type Tuple Pre-computation**: The commonly used type tuples for `isinstance` checks are stored in local variables (`_stmt_types`, `_node_stmt_assign`), avoiding tuple creation on every iteration.
4. **Optimized Node Collection**: The inefficient pattern of creating a list then extending it (`nodes_to_check = [...]; nodes_to_check.extend(...)`) is replaced with conditional unpacking (`[compound_line_node, *body_attr]` or `[compound_line_node]`), reducing list operations.
5. **f-string Usage**: String concatenations for `inv_id` and `match_key` are converted to f-strings, which are faster than concatenation operations.

**Performance Characteristics:**

- Best gains on **large-scale test cases** (24.9-32.5% faster) with many nested blocks or statements, where the micro-optimizations compound
- Minimal overhead on **simple cases** (0.6-7.9% variance), showing the optimizations don't hurt baseline performance
- Most effective when processing complex ASTs with deep nesting, as seen in the `test_large_many_nested_blocks` (24.9% faster) and `test_large_sparse_runtime_keys` (32.5% faster) cases

The optimizations target the innermost loops where attribute lookups and object creation happen most frequently, making them particularly effective for batch AST processing workflows.
⚡️ This pull request contains optimizations for PR #678
If you approve this dependent PR, these changes will be merged into the original PR branch standalone-fto-async.

📄 11% (0.11x) speedup for CommentMapper.visit_FunctionDef in codeflash/code_utils/edit_generated_tests.py

⏱️ Runtime: 2.62 milliseconds → 2.36 milliseconds (best of 295 runs)

📝 Explanation and details
The optimized code achieves an 11% speedup through several targeted micro-optimizations that reduce attribute lookups and function call overhead in the hot path:

Key Optimizations:

1. Local Variable Caching: Frequently accessed attributes (`self.context_stack`, `self.original_runtimes`, etc.) are stored in local variables at function start, eliminating repeated `self.` lookups in tight loops.
2. Method Reference Caching: `self.get_comment` is cached as `get_comment`, and `context_stack.append`/`pop` are stored locally, reducing method lookup overhead in the main processing loop.
3. Type Tuple Pre-computation: The commonly used type tuples for `isinstance` checks are stored in local variables (`_stmt_types`, `_node_stmt_assign`), avoiding tuple creation on every iteration.
4. Optimized Node Collection: The inefficient pattern of creating a list then extending it (`nodes_to_check = [...]; nodes_to_check.extend(...)`) is replaced with conditional unpacking (`[compound_line_node, *body_attr]` or `[compound_line_node]`), reducing list operations.
5. f-string Usage: String concatenations for `inv_id` and `match_key` are converted to f-strings, which are faster than concatenation operations.

Performance Characteristics:

- Best gains on large-scale test cases (24.9-32.5% faster) with many nested blocks or statements, where the micro-optimizations compound
- Minimal overhead on simple cases (0.6-7.9% variance), showing the optimizations don't hurt baseline performance
- Most effective when processing complex ASTs with deep nesting, as seen in the `test_large_many_nested_blocks` (24.9% faster) and `test_large_sparse_runtime_keys` (32.5% faster) cases

The optimizations target the innermost loops where attribute lookups and object creation happen most frequently, making them particularly effective for batch AST processing workflows.
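The patterns above are generic CPython micro-optimizations; a toy visitor in the same spirit might look like the sketch below. It is not the actual `CommentMapper`: the constructor arguments, the statement types in the tuple, and the `qual_prefix` key scheme are illustrative assumptions.

```python
import ast


class CommentMapperSketch(ast.NodeVisitor):
    """Toy stand-in illustrating local caching, a pre-built isinstance tuple, and f-strings."""

    def __init__(self, original_runtimes, optimized_runtimes):
        self.original_runtimes = original_runtimes
        self.optimized_runtimes = optimized_runtimes
        self.context_stack = []
        self.results = {}

    def get_comment(self, key):
        return f"was {self.original_runtimes.get(key)}, now {self.optimized_runtimes.get(key)}"

    def visit_FunctionDef(self, node):
        # Cache attribute and method lookups in locals once, outside the hot loop.
        context_stack = self.context_stack
        push, pop = context_stack.append, context_stack.pop
        get_comment = self.get_comment
        results = self.results
        _stmt_types = (ast.Assign, ast.AugAssign, ast.Expr)  # pre-built isinstance tuple

        push(node.name)
        qual_prefix = ".".join(context_stack)
        for i, stmt in enumerate(node.body):
            if isinstance(stmt, _stmt_types):
                match_key = f"{qual_prefix}:{i}"  # f-string instead of repeated concatenation
                results[match_key] = get_comment(match_key)
        self.generic_visit(node)  # visit nested defs while the name is still on the stack
        pop()
```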
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr678-2025-09-26T19.48.07` and push.