
Conversation

@KRRT7 (Contributor) commented Aug 25, 2025

User description

dependent on #678


PR Type

Enhancement, Tests


Description

  • Detect and flag async functions in discovery (see the sketch after this list)

  • Instrument async functions with decorators

  • Add async behavior & performance wrappers

  • Add comprehensive async instrumentation tests
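
As a rough illustration of the discovery step above (a minimal sketch, not the actual code_extractor implementation), async functions can be flagged by looking for `ast.AsyncFunctionDef` nodes while parsing the source:

```python
import ast

source = """
async def fetch(url):
    return url

def plain():
    return None
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    # `async def` functions parse to ast.AsyncFunctionDef; plain ones to ast.FunctionDef
    if isinstance(node, ast.AsyncFunctionDef):
        print(f"async function found: {node.name}")  # -> fetch
```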


Diagram Walkthrough

flowchart LR
  A["AST parsing\n(code_extractor)"] -- "detect AsyncFunctionDef" --> B["Flag is_async\n(FunctionToOptimize)"]
  B -- "pass flag" --> C["Generate tests\n(aiservice)"]
  B -- "instrument code" --> D["Add async decorators\n(instrument_existing_tests)"]
  D -- "capture runtime" --> E["Async wrappers\n(codeflash_wrap_decorator)"]
  E -- "optimize pipeline" --> F["function_optimizer"]

File Walkthrough

Relevant files

Enhancement (6 files)

  • aiservice.py: Pass `is_async` flag in testgen payload (+2/-0)
  • code_extractor.py: Resolve and skip star imports in extraction (+86/-3)
  • codeflash_wrap_decorator.py: Add async behavior and performance wrappers (+253/-0)
  • instrument_existing_tests.py: Add async decorators to existing source (+197/-2)
  • functions_to_optimize.py: Update qualified_name for nested functions (+5/-1)
  • function_optimizer.py: Integrate async instrumentation into optimizer (+93/-19)

Tests (2 files)

  • test_async_wrapper_sqlite_validation.py: Add async wrapper SQLite validation tests (+286/-0)
  • test_instrument_async_tests.py: Add async instrumentation tests for source (+537/-0)

Configuration changes (1 file)

  • codeflash.code-workspace: Update debug args for async example (+1/-1)

Dependencies (1 file)

  • pyproject.toml: Add `pytest-asyncio` dependency (+1/-0)

@KRRT7 changed the base branch from main to standalone-fto-async on August 25, 2025 22:55

github-actions bot commented Aug 25, 2025

PR Code Suggestions ✨

Latest suggestions up to 91b8902

Explore these optional code suggestions:

Category: Possible issue
Correct star import detection

The child.names attribute is a sequence, not a single node, so
isinstance(child.names, cst.ImportStar) will never be true. Instead, check each
alias in the list and skip the entire import-from if any alias is a star import.

codeflash/code_utils/code_extractor.py [275-276]

-if isinstance(child.names, cst.ImportStar):
+if any(isinstance(alias, cst.ImportStar) for alias in child.names):
     continue
Suggestion importance[1-10]: 8


Why: The isinstance(child.names, cst.ImportStar) check always fails because child.names is a list, so using any(isinstance(alias, cst.ImportStar) ...) correctly detects star imports and skips them.

Impact: Medium

Previous suggestions

Suggestions up to commit c9aaaad
Category: Possible issue
Include full async execution time

Remove the second timestamp reset so that the measured duration includes both
synchronous and asynchronous execution time. Always start the timer once before
calling the function. This ensures the reported duration accurately reflects total
runtime.

codeflash/code_utils/codeflash_wrap_decorator.py [62-71]

 counter = time.perf_counter_ns()
 ret = func(*args, **kwargs)
 
 if inspect.isawaitable(ret):
-    counter = time.perf_counter_ns()
     return_value = await ret
 else:
     return_value = ret
 
 codeflash_duration = time.perf_counter_ns() - counter
Suggestion importance[1-10]: 8


Why: The extra counter = time.perf_counter_ns() inside the await branch discards the time spent before awaiting, so removing it yields an accurate total runtime measurement.

Impact: Medium
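
For context, here is a self-contained sketch of the timing pattern this suggestion describes: start the timer once before the call and do not reset it before awaiting. It is illustrative only, not the real codeflash_wrap_decorator, which also captures behavior and performance data for the optimizer.

```python
import inspect
import time
from functools import wraps


def timed(func):
    """Illustrative timing wrapper; not the actual codeflash decorator."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        counter = time.perf_counter_ns()   # start once, before any synchronous setup
        ret = func(*args, **kwargs)
        if inspect.isawaitable(ret):
            return_value = await ret       # no timer reset here, per the suggestion
        else:
            return_value = ret
        duration = time.perf_counter_ns() - counter
        print(f"{func.__name__} took {duration} ns")
        return return_value
    return wrapper
```

Note that this sketch always returns a coroutine, even when wrapping a synchronous callable, so it is only meant for async targets.
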
Suggestions up to commit 14af1a8
Category: Possible issue
Support dict parents in qualified_name

Update the property to handle both FunctionParent objects and dicts in self.parents,
avoiding attribute errors when parents are passed as mappings. Build the list of names
with a conditional that checks for a name attribute first, then falls back to the dict key.

codeflash/discovery/functions_to_optimize.py [162-166]

 @property
 def qualified_name(self) -> str:
     if not self.parents:
         return self.function_name
-    # Join all parent names with dots to handle nested classes properly
-    parent_path = ".".join(parent.name for parent in self.parents)
+    parent_names = [
+        p.name if hasattr(p, "name") else p["name"]
+        for p in self.parents
+    ]
+    parent_path = ".".join(parent_names)
     return f"{parent_path}.{self.function_name}"
Suggestion importance[1-10]: 9


Why: This prevents attribute errors when self.parents contains dicts as in tests, ensuring qualified_name works in all cases.

Impact: High
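
A toy version of the behavior the suggestion asks for, showing how parent names are joined whether each parent is an object with a name attribute or a plain dict (a hypothetical standalone function, not the FunctionToOptimize class itself):

```python
def qualified_name(function_name: str, parents: list) -> str:
    # Accept either objects exposing .name or mappings with a "name" key
    if not parents:
        return function_name
    parent_names = [p.name if hasattr(p, "name") else p["name"] for p in parents]
    return ".".join([*parent_names, function_name])


print(qualified_name("method", [{"name": "Outer"}, {"name": "Inner"}]))  # Outer.Inner.method
```
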
Category: General
Measure entire async call duration

Remove the second reset of counter before await so that codeflash_duration covers
the full execution time of the async function, including any synchronous setup.

codeflash/code_utils/codeflash_wrap_decorator.py [63-72]

 counter = time.perf_counter_ns()
 ret = func(*args, **kwargs)
 
 if inspect.isawaitable(ret):
-    counter = time.perf_counter_ns()
     return_value = await ret
 else:
     return_value = ret
 
 codeflash_duration = time.perf_counter_ns() - counter
Suggestion importance[1-10]: 6


Why: Removing the reset of counter ensures the duration includes both synchronous setup and asynchronous execution, improving timing accuracy.

Impact: Low
Log instrumentation errors

Log exceptions in the except block so failures to instrument async decorators are
surfaced during debugging instead of silently ignored.

codeflash/code_utils/instrument_existing_tests.py [325-345]

 def instrument_source_module_with_async_decorators(
     ...
 ) -> tuple[bool, str | None]:
     ...
     try:
         ...
     except Exception as e:
+        logger.exception(
+            f"Failed to instrument async decorator for "
+            f"{function_to_optimize.qualified_name} in {source_path}: {e}"
+        )
         return False, None
Suggestion importance[1-10]: 5


Why: Adding logger.exception surfaces internal failures during async decorator instrumentation, aiding debugging without altering control flow.

Impact: Low

@KRRT7 force-pushed the granular-async-instrumentation branch from 52dbe88 to b153989 on August 26, 2025 10:14
@KRRT7 force-pushed the granular-async-instrumentation branch from ef07e94 to 0a57afa on August 26, 2025 10:24
@misrasaurabh1 (Contributor) commented:

add an e2e test for this

@misrasaurabh1 (Contributor) commented:

Add tests in the style of

def test_perfinjector_bubble_sort_results() -> None:
This gives us confidence that the tests run and produce the correct return values.

import_transformer = AsyncDecoratorImportAdder(mode)
module = module.visit(import_transformer)

return isort.code(module.code, float_to_top=True), decorator_transformer.added_decorator
Review comment from a contributor:

why are we isorting the user's code?

codeflash-ai bot added a commit that referenced this pull request Aug 29, 2025
…687 (`granular-async-instrumentation`)

The optimization replaces expensive `Path` object creation and method calls with direct string manipulation operations, delivering a **491% speedup**.

**Key optimizations:**

1. **Eliminated Path object overhead**: Replaced `Path(filename).stem.startswith("test_")` with `filename.rpartition('/')[-1].rpartition('\\')[-1].rpartition('.')[0].startswith("test_")` - avoiding Path instantiation entirely.

2. **Optimized path parts extraction**: Replaced `Path(filename).parts` with `filename.replace('\\', '/').split('/')` - using simple string operations instead of Path parsing.

**Performance impact analysis:**
- Original profiler shows lines 25-26 (Path operations) consumed **86.3%** of total runtime (44.7% + 41.6%)
- Optimized version reduces these same operations to just **25.4%** of runtime (15% + 10.4%)
- The string manipulation operations are ~6x faster per call than Path object creation

**Test case benefits:**
- **Large-scale tests** see the biggest gains (516% faster for 900-frame stack, 505% faster for 950-frame chain) because the Path overhead multiplies with stack depth
- **Edge cases** with complex paths benefit significantly (182-206% faster for subdirectory and pytest frame tests)
- **Basic tests** show minimal overhead since Path operations weren't the bottleneck in shallow stacks

The optimization maintains identical behavior while eliminating the most expensive operations identified in the profiling data - Path object instantiation and method calls that occurred once per stack frame.
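
A minimal before/after sketch of the string-based check described above (illustrative only; the exact change lives in the referenced commit):

```python
from pathlib import Path

filename = "tests/unit/test_example.py"

# Original: builds a Path object on every call
is_test_slow = Path(filename).stem.startswith("test_")

# Optimized: plain string operations, handling both '/' and '\\' separators
is_test_fast = (
    filename.rpartition('/')[-1].rpartition('\\')[-1].rpartition('.')[0].startswith("test_")
)

# Path parts via string splitting (a list rather than Path's tuple)
parts_fast = filename.replace('\\', '/').split('/')

assert is_test_slow == is_test_fast
assert list(Path(filename).parts) == parts_fast
```
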
KRRT7 and others added 3 commits September 2, 2025 23:55
…(`granular-async-instrumentation`)

The optimization replaces `ast.walk(node)` with direct iteration over `node.body` in the `visit_ClassDef` method. This is a significant algorithmic improvement because:

**What was changed:**
- Changed `for inner_node in ast.walk(node):` to `for inner_node in node.body:`

**Why this leads to a speedup:**
- `ast.walk(node)` recursively traverses ALL descendant nodes in the AST subtree (classes, functions, statements, expressions, etc.), which creates unnecessary overhead
- `node.body` directly accesses only the immediate children of the class definition
- The line profiler shows the iteration went from 10,032 hits to just 409 hits - a 96% reduction in loop iterations
- The time spent on the iteration line dropped from 67.8% to 0.6% of total execution time

**Performance characteristics:**
- The optimization is most effective for classes with complex nested structures, as shown by the 196% speedup
- Large-scale test cases with 100+ methods and nested compound statements benefit significantly
- Basic test cases with simple class structures also see improvements due to reduced AST traversal overhead
- The optimization preserves exact functionality since we only need immediate class body elements (methods) anyway

This is a classic case of using the right data structure access pattern - direct indexing instead of tree traversal when you only need immediate children.
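
A toy illustration of the difference (not the CommentMapper code itself): ast.walk descends into every nested node, while node.body yields only the class's immediate children, i.e. the method definitions.

```python
import ast

source = """
class Greeter:
    def hello(self):
        for ch in "hi":
            print(ch)

    async def hello_async(self):
        return "hi"
"""

class_def = ast.parse(source).body[0]

# Full recursive traversal: every nested statement, expression, constant, ...
print(sum(1 for _ in ast.walk(class_def)))

# Immediate children only: just the two method definitions
print([type(n).__name__ for n in class_def.body])  # ['FunctionDef', 'AsyncFunctionDef']
```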

codeflash-ai bot commented Sep 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 197% (1.97x) speedup for CommentMapper.visit_ClassDef in codeflash/code_utils/edit_generated_tests.py

⏱️ Runtime: 11.6 milliseconds → 3.90 milliseconds (best of 253 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch granular-async-instrumentation).

KRRT7 and others added 3 commits September 3, 2025 00:14
…25-09-03T05.07.18

⚡️ Speed up method `CommentMapper.visit_ClassDef` by 197% in PR #687 (`granular-async-instrumentation`)

codeflash-ai bot commented Sep 3, 2025

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

…(`granular-async-instrumentation`)

The optimization replaces the expensive `ast.walk()` call with a targeted node traversal that only checks the immediate statement and its direct body children. 

**Key change:** Instead of `ast.walk(compound_line_node)` which recursively traverses the entire AST subtree, the optimized code creates a focused list:
```python
nodes_to_check = [compound_line_node]
nodes_to_check.extend(getattr(compound_line_node, 'body', []))
```

This dramatically reduces the number of nodes processed in the inner loop. The line profiler shows `ast.walk()` was the major bottleneck (46.2% of total time, 8.23ms), while the optimized version's equivalent loop takes only 1.9% of total time (180μs).

**Why this works:** The code only needs to check statements at the current level and one level deep (direct children in compound statement bodies like `for`, `if`, `while`, `with`). The original `ast.walk()` was doing unnecessary deep traversal of nested structures.

**Performance impact:** The optimization is most effective for test cases with compound statements (for/while/if/with blocks) containing multiple nested nodes, showing 73-156% speedups in those scenarios. Simple statement functions see smaller but consistent 1-3% improvements due to reduced overhead.
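
A small sketch of the shallow-traversal pattern described above, applied to a compound statement (illustrative; the real code operates on test AST nodes inside CommentMapper):

```python
import ast

stmt = ast.parse("""
for i in range(3):
    total = i * 2
    if total:
        print(total)
""").body[0]

# Shallow check: the compound statement plus its direct body children only
nodes_to_check = [stmt]
nodes_to_check.extend(getattr(stmt, "body", []))
print([type(n).__name__ for n in nodes_to_check])  # ['For', 'Assign', 'If']

# ast.walk, by contrast, also visits the nested print(...) call and everything below it
print(sum(1 for _ in ast.walk(stmt)))
```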

codeflash-ai bot commented Sep 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 84% (0.84x) speedup for CommentMapper.visit_FunctionDef in codeflash/code_utils/edit_generated_tests.py

⏱️ Runtime: 3.03 milliseconds → 1.65 milliseconds (best of 148 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch granular-async-instrumentation).

KRRT7 and others added 2 commits September 3, 2025 00:30
…25-09-03T05.27.10

⚡️ Speed up method `CommentMapper.visit_FunctionDef` by 84% in PR #687 (`granular-async-instrumentation`)
codeflash-ai bot added a commit that referenced this pull request Sep 3, 2025
…#687 (`granular-async-instrumentation`)

The optimized code achieves an 11% speedup through several key micro-optimizations that reduce Python's runtime overhead:

**1. Cached Attribute/Dictionary Lookups**
The most impactful change is caching frequently accessed attributes and dictionaries as local variables:
- `context_stack = self.context_stack`
- `results = self.results` 
- `original_runtimes = self.original_runtimes`
- `optimized_runtimes = self.optimized_runtimes`
- `get_comment = self.get_comment`

This eliminates repeated `self.` attribute lookups in the tight loops, which the profiler shows are called thousands of times (2,825+ iterations).

**2. Pre-cached Loop Bodies**
Caching `node_body = node.body` and `ln_body = line_node.body` before loops reduces attribute access overhead. The profiler shows these are accessed in nested loops with high hit counts.

**3. Optimized String Operations**
Using f-strings (`f"{test_qualified_name}#{self.abs_path}"`, `f"{i}_{j}"`) instead of string concatenation with `+` operators reduces temporary object creation and string manipulation overhead.

**4. Refined getattr Usage**
Changed from `getattr(compound_line_node, "body", [])` to `getattr(compound_line_node, 'body', None)` with a conditional check, avoiding allocation of empty lists when no body exists.

**Performance Impact by Test Type:**
- **Large-scale tests** show the biggest gains (14-117% faster) due to the cumulative effect of micro-optimizations in loops
- **Compound statement tests** benefit significantly (16-45% faster) from reduced attribute lookups in nested processing  
- **Simple cases** show modest improvements (1-6% faster) as overhead reduction is less pronounced
- **Edge cases** with no matching runtimes benefit from faster loop traversal (3-12% faster)

The optimizations are most effective for functions with many statements or nested compound structures, where the tight loops amplify the benefit of reduced Python interpreter overhead.
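
As a generic illustration of the attribute-caching pattern (a toy class, not CommentMapper), hoisting hot `self.` lookups into locals before a tight loop:

```python
class Mapper:
    def __init__(self, runtimes: dict[str, list[int]]):
        self.original_runtimes = runtimes
        self.results: dict[str, int] = {}

    def process_slow(self, keys: list[str]) -> None:
        for key in keys:
            if key in self.original_runtimes:          # repeated self. lookups every iteration
                self.results[key] = len(self.original_runtimes[key])

    def process_fast(self, keys: list[str]) -> None:
        # Cache hot attributes as locals once, outside the loop
        original_runtimes = self.original_runtimes
        results = self.results
        for key in keys:
            if key in original_runtimes:
                results[key] = len(original_runtimes[key])
```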

codeflash-ai bot commented Sep 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 11% (0.11x) speedup for CommentMapper.visit_AsyncFunctionDef in codeflash/code_utils/edit_generated_tests.py

⏱️ Runtime: 3.58 milliseconds → 3.22 milliseconds (best of 291 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch granular-async-instrumentation).

github-actions bot added the workflow-modified label ("This PR modifies GitHub Actions workflows") Sep 3, 2025