-
Notifications
You must be signed in to change notification settings - Fork 21
Granular async instrumentation #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: standalone-fto-async
Are you sure you want to change the base?
Conversation
PR Code Suggestions ✨Latest suggestions up to 91b8902 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit c9aaaad
Suggestions up to commit 14af1a8
|
52dbe88
to
b153989
Compare
ef07e94
to
0a57afa
Compare
add an e2e test for this |
Add tests in the style of codeflash/tests/test_instrument_tests.py Line 289 in 674e69e
|
import_transformer = AsyncDecoratorImportAdder(mode) | ||
module = module.visit(import_transformer) | ||
|
||
return isort.code(module.code, float_to_top=True), decorator_transformer.added_decorator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are isorting user's code?
apply testgen-async fix bug when iterating over star imports fix cst * import errors
…687 (`granular-async-instrumentation`) The optimization replaces expensive `Path` object creation and method calls with direct string manipulation operations, delivering a **491% speedup**. **Key optimizations:** 1. **Eliminated Path object overhead**: Replaced `Path(filename).stem.startswith("test_")` with `filename.rpartition('/')[-1].rpartition('\\')[-1].rpartition('.')[0].startswith("test_")` - avoiding Path instantiation entirely. 2. **Optimized path parts extraction**: Replaced `Path(filename).parts` with `filename.replace('\\', '/').split('/')` - using simple string operations instead of Path parsing. **Performance impact analysis:** - Original profiler shows lines 25-26 (Path operations) consumed **86.3%** of total runtime (44.7% + 41.6%) - Optimized version reduces these same operations to just **25.4%** of runtime (15% + 10.4%) - The string manipulation operations are ~6x faster per call than Path object creation **Test case benefits:** - **Large-scale tests** see the biggest gains (516% faster for 900-frame stack, 505% faster for 950-frame chain) because the Path overhead multiplies with stack depth - **Edge cases** with complex paths benefit significantly (182-206% faster for subdirectory and pytest frame tests) - **Basic tests** show minimal overhead since Path operations weren't the bottleneck in shallow stacks The optimization maintains identical behavior while eliminating the most expensive operations identified in the profiling data - Path object instantiation and method calls that occurred once per stack frame.
…(`granular-async-instrumentation`) The optimization replaces `ast.walk(node)` with direct iteration over `node.body` in the `visit_ClassDef` method. This is a significant algorithmic improvement because: **What was changed:** - Changed `for inner_node in ast.walk(node):` to `for inner_node in node.body:` **Why this leads to a speedup:** - `ast.walk(node)` recursively traverses ALL descendant nodes in the AST subtree (classes, functions, statements, expressions, etc.), which creates unnecessary overhead - `node.body` directly accesses only the immediate children of the class definition - The line profiler shows the iteration went from 10,032 hits to just 409 hits - a 96% reduction in loop iterations - The time spent on the iteration line dropped from 67.8% to 0.6% of total execution time **Performance characteristics:** - The optimization is most effective for classes with complex nested structures, as shown by the 196% speedup - Large-scale test cases with 100+ methods and nested compound statements benefit significantly - Basic test cases with simple class structures also see improvements due to reduced AST traversal overhead - The optimization preserves exact functionality since we only need immediate class body elements (methods) anyway This is a classic case of using the right data structure access pattern - direct indexing instead of tree traversal when you only need immediate children.
⚡️ Codeflash found optimizations for this PR📄 197% (1.97x) speedup for
|
…25-09-03T05.07.18 ⚡️ Speed up method `CommentMapper.visit_ClassDef` by 197% in PR #687 (`granular-async-instrumentation`)
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from: |
…(`granular-async-instrumentation`) The optimization replaces the expensive `ast.walk()` call with a targeted node traversal that only checks the immediate statement and its direct body children. **Key change:** Instead of `ast.walk(compound_line_node)` which recursively traverses the entire AST subtree, the optimized code creates a focused list: ```python nodes_to_check = [compound_line_node] nodes_to_check.extend(getattr(compound_line_node, 'body', [])) ``` This dramatically reduces the number of nodes processed in the inner loop. The line profiler shows `ast.walk()` was the major bottleneck (46.2% of total time, 8.23ms), while the optimized version's equivalent loop takes only 1.9% of total time (180μs). **Why this works:** The code only needs to check statements at the current level and one level deep (direct children in compound statement bodies like `for`, `if`, `while`, `with`). The original `ast.walk()` was doing unnecessary deep traversal of nested structures. **Performance impact:** The optimization is most effective for test cases with compound statements (for/while/if/with blocks) containing multiple nested nodes, showing 73-156% speedups in those scenarios. Simple statement functions see smaller but consistent 1-3% improvements due to reduced overhead.
⚡️ Codeflash found optimizations for this PR📄 84% (0.84x) speedup for
|
…25-09-03T05.27.10 ⚡️ Speed up method `CommentMapper.visit_FunctionDef` by 84% in PR #687 (`granular-async-instrumentation`)
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from: |
…#687 (`granular-async-instrumentation`) The optimized code achieves an 11% speedup through several key micro-optimizations that reduce Python's runtime overhead: **1. Cached Attribute/Dictionary Lookups** The most impactful change is caching frequently accessed attributes and dictionaries as local variables: - `context_stack = self.context_stack` - `results = self.results` - `original_runtimes = self.original_runtimes` - `optimized_runtimes = self.optimized_runtimes` - `get_comment = self.get_comment` This eliminates repeated `self.` attribute lookups in the tight loops, which the profiler shows are called thousands of times (2,825+ iterations). **2. Pre-cached Loop Bodies** Caching `node_body = node.body` and `ln_body = line_node.body` before loops reduces attribute access overhead. The profiler shows these are accessed in nested loops with high hit counts. **3. Optimized String Operations** Using f-strings (`f"{test_qualified_name}#{self.abs_path}"`, `f"{i}_{j}"`) instead of string concatenation with `+` operators reduces temporary object creation and string manipulation overhead. **4. Refined getattr Usage** Changed from `getattr(compound_line_node, "body", [])` to `getattr(compound_line_node, 'body', None)` with a conditional check, avoiding allocation of empty lists when no body exists. **Performance Impact by Test Type:** - **Large-scale tests** show the biggest gains (14-117% faster) due to the cumulative effect of micro-optimizations in loops - **Compound statement tests** benefit significantly (16-45% faster) from reduced attribute lookups in nested processing - **Simple cases** show modest improvements (1-6% faster) as overhead reduction is less pronounced - **Edge cases** with no matching runtimes benefit from faster loop traversal (3-12% faster) The optimizations are most effective for functions with many statements or nested compound structures, where the tight loops amplify the benefit of reduced Python interpreter overhead.
⚡️ Codeflash found optimizations for this PR📄 11% (0.11x) speedup for
|
User description
dependent on #678
PR Type
Enhancement, Tests
Description
Detect and flag async functions in discovery
Instrument async functions with decorators
Add async behavior & performance wrappers
Add comprehensive async instrumentation tests
Diagram Walkthrough
File Walkthrough
6 files
Pass `is_async` flag in testgen payload
Resolve and skip star imports in extraction
Add async behavior and performance wrappers
Add async decorators to existing source
Update qualified_name for nested functions
Integrate async instrumentation into optimizer
2 files
Add async wrapper SQLite validation tests
Add async instrumentation tests for source
1 files
Update debug args for async example
1 files
Add `pytest-asyncio` dependency