⚡️ Speed up method AsyncCallInstrumenter._process_test_function by 13% in PR #678 (standalone-fto-async)
#768
Closed · codeflash-ai wants to merge 100 commits into standalone-fto-async from codeflash/optimize-pr678-2025-09-26T20.13.52
Conversation
[LSP] Ensure optimizer cleanup on server shutdown or when the client suddenly disconnects
…licate-global-assignments-when-reverting-helpers
…/duplicate-global-assignments-when-reverting-helpers`) The optimized code achieves a **17% speedup** by eliminating redundant CST parsing operations, which are the most expensive parts of the function according to the line profiler.

**Key optimizations:**

1. **Eliminate duplicate parsing**: The original code parsed `src_module_code` and `dst_module_code` multiple times. The optimized version introduces `_extract_global_statements_once()` that parses each module only once and reuses the parsed CST objects throughout the function.
2. **Reuse parsed modules**: Instead of re-parsing `dst_module_code` after modifications, the optimized version conditionally reuses the already-parsed `dst_module` when no global statements need insertion, avoiding unnecessary `cst.parse_module()` calls.
3. **Early termination**: Added an early return when `new_collector.assignments` is empty, avoiding the expensive `GlobalAssignmentTransformer` creation and visitation when there's nothing to transform.
4. **Minor optimization in uniqueness check**: Added a fast-path identity check (`stmt is existing_stmt`) before the expensive `deep_equals()` comparison, though this has minimal impact.

**Performance impact by test case type:**

- **Empty/minimal cases**: Show the highest gains (59-88% faster) due to early termination optimizations
- **Standard cases**: Achieve consistent 20-30% improvements from reduced parsing
- **Large-scale tests**: Benefit significantly (18-23% faster) as parsing overhead scales with code size

The optimization is most effective for workloads with moderate to large code files where CST parsing dominates the runtime, as evidenced by the original profiler showing 70%+ of time spent in `cst.parse_module()` and `module.visit()` operations. A minimal sketch of the parse-once pattern follows below.
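A hedged illustration of the parse-once pattern described above, not the actual codeflash implementation: `_extract_global_statements_once` is the helper name from the description, but the statement filtering and merge logic below are simplified assumptions.

```python
import libcst as cst


def _extract_global_statements_once(src_code: str, dst_code: str):
    """Parse each module exactly once; callers reuse the returned CST objects."""
    src_module = cst.parse_module(src_code)
    dst_module = cst.parse_module(dst_code)
    src_globals = [s for s in src_module.body if isinstance(s, cst.SimpleStatementLine)]
    dst_globals = [s for s in dst_module.body if isinstance(s, cst.SimpleStatementLine)]
    return src_module, dst_module, src_globals, dst_globals


def add_global_assignments(src_code: str, dst_code: str) -> str:
    _, dst_module, src_globals, dst_globals = _extract_global_statements_once(src_code, dst_code)

    # Fast-path identity check before the more expensive deep_equals() comparison.
    missing = [
        s for s in src_globals
        if not any(s is existing or s.deep_equals(existing) for existing in dst_globals)
    ]

    # Early return: nothing to insert, so reuse the already-parsed module's code
    # instead of round-tripping through cst.parse_module() again.
    if not missing:
        return dst_module.code

    return dst_module.with_changes(body=[*missing, *dst_module.body]).code
```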
Signed-off-by: Saurabh Misra <[email protected]>
…25-08-25T18.50.33 ⚡️ Speed up function `add_global_assignments` by 18% in PR #683 (`fix/duplicate-global-assignments-when-reverting-helpers`)
…cs-in-diff [Lsp] return diff functions grouped by file
* lsp: get new/modified functions inside a git commit
* better name
* refactor
* revert
* save optimization patches metadata
* typo
* lsp: get previous optimizations
* fix patch name in non-lsp mode
* ⚡️ Speed up function `get_patches_metadata` by 45% in PR #690 (`worktree/persist-optimization-patches`)

  The optimized code achieves a **44% speedup** through two key optimizations:

  **1. Added `@lru_cache(maxsize=1)` to `get_patches_dir_for_project()`**
  - This caches the Path object construction, avoiding repeated calls to `get_git_project_id()` and `Path()` creation
  - The line profiler shows this function's total time dropped from 5.32ms to being completely eliminated from the hot path in `get_patches_metadata()`
  - Since `get_git_project_id()` was already cached but still being called repeatedly, this second-level caching eliminates that redundancy

  **2. Replaced `read_text()` + `json.loads()` with `open()` + `json.load()`**
  - Using `json.load()` with a file handle is more efficient than reading the entire file into memory first with `read_text()` and then parsing it
  - This avoids the intermediate string creation and is particularly beneficial for larger JSON files
  - Added explicit UTF-8 encoding for consistency

  **Performance Impact by Test Type:**
  - **Basic cases** (small/missing files): 45-65% faster - benefits primarily from the caching optimization
  - **Edge cases** (malformed JSON): 38-47% faster - still benefits from both optimizations
  - **Large scale cases** (1000+ patches, large files): 39-52% faster - the file I/O optimization becomes more significant with larger JSON files

  The caching optimization provides the most consistent gains across all scenarios since it eliminates repeated expensive operations, while the file I/O optimization scales with file size. A minimal sketch follows after this commit entry.
* fix: patch path
* codeflash suggestions
* split the worktree utils into a separate file
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
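A minimal sketch of the two changes described above; `get_patches_dir_for_project` and `get_patches_metadata` are the names from the commit message, while the directory layout and the `get_git_project_id` stub are assumptions for illustration.

```python
import json
from functools import lru_cache
from pathlib import Path


def get_git_project_id() -> str:
    # Placeholder: the real helper derives a stable id from the git repo.
    return "example-project-id"


@lru_cache(maxsize=1)
def get_patches_dir_for_project() -> Path:
    # Cached so the Path construction and the get_git_project_id() call
    # happen once instead of on every metadata lookup.
    return Path.home() / ".codeflash" / "patches" / get_git_project_id()


def get_patches_metadata() -> dict:
    metadata_file = get_patches_dir_for_project() / "metadata.json"
    if not metadata_file.exists():
        return {}
    # json.load on an open file handle avoids materializing the whole file
    # as an intermediate string, unlike read_text() + json.loads().
    with metadata_file.open(encoding="utf-8") as f:
        return json.load(f)
```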
Deque Comparator
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* line profiling loading msg
---------
Co-authored-by: saga4 <[email protected]>
Co-authored-by: ali <[email protected]>
* LSP reduce no of candidates
* config revert
* pass reference values to aiservices
* fix inline condition
---------
Co-authored-by: saga4 <[email protected]>
import variable correctly
Signed-off-by: Saurabh Misra <[email protected]>
support attrs comparison
apscheduler tries to schedule jobs when the interpreter is shutting down, which can cause it to crash and leave us in a bad state
patch apscheduler when tracing
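A hedged sketch of what such a patch could look like; wrapping `BaseScheduler._process_jobs` and gating on `sys.is_finalizing()` are assumptions about the approach, not the actual codeflash patch.

```python
import sys

from apscheduler.schedulers.base import BaseScheduler

_original_process_jobs = BaseScheduler._process_jobs


def _safe_process_jobs(self):
    # Skip job processing entirely once the interpreter is shutting down,
    # so apscheduler never tries to schedule work against a dying runtime.
    if sys.is_finalizing():
        return None
    return _original_process_jobs(self)


def patch_apscheduler_for_tracing() -> None:
    """Monkey-patch apscheduler before tracing starts (assumed entry point)."""
    BaseScheduler._process_jobs = _safe_process_jobs
```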
The optimized version eliminates recursive function calls by replacing the recursive `_find` helper with an iterative approach. This provides significant performance benefits:

**Key Optimizations:**

1. **Removed Recursion Overhead**: The original code used a recursive helper function `_find` that created new stack frames for each parent traversal. The optimized version uses a simple iterative loop that traverses parents sequentially without function call overhead.
2. **Eliminated Function Creation**: The original code defined the `_find` function on every call to `find_target_node`. The optimized version removes this repeated function definition entirely.
3. **Early Exit with for-else**: The optimized code uses Python's `for-else` construct to immediately return `None` when a parent class isn't found, avoiding unnecessary continued searching.
4. **Reduced Attribute Access**: By caching `function_to_optimize.function_name` in a local variable `target_name` and reusing `body` variables, the code reduces repeated attribute lookups.

**Performance Impact by Test Case:**

- **Simple cases** (top-level functions, basic class methods): 23-62% faster due to eliminated recursion overhead
- **Nested class scenarios**: 45-84% faster, with deeper nesting showing greater improvements as recursion elimination has more impact
- **Large-scale tests**: 12-22% faster, showing consistent benefits even with many nodes to traverse
- **Edge cases** (empty modules, non-existent classes): 52-76% faster due to more efficient early termination

The optimization is particularly effective for deeply nested class hierarchies where the original recursive approach created multiple stack frames, while the iterative version maintains constant memory usage regardless of nesting depth. A sketch of the iterative pattern follows below.
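A sketch of the iterative pattern under the assumptions above; the `parent_classes` parameter stands in for the real `function_to_optimize.parents` structure, which may differ.

```python
import ast


def find_target_node(tree: ast.Module, target_name: str, parent_classes: list[str]):
    """Locate a function node, descending through nested parent classes iteratively."""
    body = tree.body
    # Walk down the parent-class chain with a plain loop instead of recursion.
    for class_name in parent_classes:
        for node in body:
            if isinstance(node, ast.ClassDef) and node.name == class_name:
                body = node.body
                break
        else:
            # for-else: the parent class was not found, so stop immediately.
            return None
    for node in body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == target_name:
            return node
    return None
```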
…25-09-25T14.28.58 ⚡️ Speed up function `find_target_node` by 18% in PR #763 (`fix/correctly-find-funtion-node-when-reverting-helpers`)
…node-when-reverting-helpers [FIX] Respect parent classes in revert helpers
Granular async instrumentation
…d move other merged test below; finish resolving aiservice/config/explanation/function_optimizer; regenerate uv.lock
The optimization achieves a **12% speedup** through several targeted improvements in the `_process_test_function` and `_instrument_statement` methods:

**Key Optimizations:**

1. **Variable hoisting and local references**: The optimized code extracts frequently accessed instance variables (`self.async_call_counter`, `node.name`) into local variables at the beginning of `_process_test_function`. It also creates local references to methods (`self._instrument_statement`, `new_body.append`) to avoid repeated attribute lookups during the main loop.
2. **Improved timeout decorator check**: Instead of using `any()` with a generator expression, the optimization uses an explicit loop with early termination when a timeout decorator is found. This avoids creating unnecessary generator objects and allows for faster short-circuiting.
3. **Optimized AST traversal**: The most significant improvement is replacing `ast.walk()` with a manual stack-based traversal using `ast.iter_child_nodes()` in `_instrument_statement`. This eliminates the overhead of `ast.walk()`'s recursive generator and provides better control over the traversal process.
4. **Simplified counter management**: The optimization tracks the call index locally during processing and only updates the instance variable once at the end, reducing dictionary access overhead.

**Performance Impact by Test Case:**

- **Small functions**: 61-130% faster for basic test cases with minimal statements
- **Empty/simple functions**: 71-119% faster due to reduced overhead in the main processing loop
- **Large-scale functions**: 11.5% faster for functions with 500+ await statements, where the AST traversal optimization becomes most beneficial

The optimizations are particularly effective for functions with many statements where the improved AST traversal and reduced attribute lookups compound to significant savings. A sketch of the stack-based traversal follows below.
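A hedged sketch of the stack-based traversal and local counter bookkeeping; the await-call predicate is a simplified stand-in for the real instrumentation logic.

```python
import ast


def count_target_awaits(stmt: ast.stmt, target_name: str) -> int:
    """Count `await target_name(...)` calls under one statement, without ast.walk()."""
    call_index = 0  # tracked locally; shared state is updated once by the caller
    stack = [stmt]
    while stack:
        node = stack.pop()
        if (
            isinstance(node, ast.Await)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == target_name
        ):
            call_index += 1
        # Manual child expansion covers the same nodes as ast.walk(), minus
        # the recursive-generator overhead.
        stack.extend(ast.iter_child_nodes(node))
    return call_index
```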
40c4108 to 7bbb1e7
⚡️ This pull request contains optimizations for PR #678
If you approve this dependent PR, these changes will be merged into the original PR branch standalone-fto-async.

📄 13% (0.13x) speedup for `AsyncCallInstrumenter._process_test_function` in `codeflash/code_utils/instrument_existing_tests.py`

⏱️ Runtime: 2.35 milliseconds → 2.09 milliseconds (best of 15 runs)

📝 Explanation and details
The optimization achieves a 12% speedup through several targeted improvements in the `_process_test_function` and `_instrument_statement` methods:

**Key Optimizations:**

1. **Variable hoisting and local references**: The optimized code extracts frequently accessed instance variables (`self.async_call_counter`, `node.name`) into local variables at the beginning of `_process_test_function`. It also creates local references to methods (`self._instrument_statement`, `new_body.append`) to avoid repeated attribute lookups during the main loop.
2. **Improved timeout decorator check**: Instead of using `any()` with a generator expression, the optimization uses an explicit loop with early termination when a timeout decorator is found. This avoids creating unnecessary generator objects and allows for faster short-circuiting.
3. **Optimized AST traversal**: The most significant improvement is replacing `ast.walk()` with a manual stack-based traversal using `ast.iter_child_nodes()` in `_instrument_statement`. This eliminates the overhead of `ast.walk()`'s recursive generator and provides better control over the traversal process.
4. **Simplified counter management**: The optimization tracks the call index locally during processing and only updates the instance variable once at the end, reducing dictionary access overhead.

**Performance Impact by Test Case:**

- **Small functions**: 61-130% faster for basic test cases with minimal statements
- **Empty/simple functions**: 71-119% faster due to reduced overhead in the main processing loop
- **Large-scale functions**: 11.5% faster for functions with 500+ await statements, where the AST traversal optimization becomes most beneficial

The optimizations are particularly effective for functions with many statements where the improved AST traversal and reduced attribute lookups compound to significant savings. A sketch of the early-exit decorator check follows below.
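A hedged sketch of the early-exit decorator scan; the decorator names matched here (`timeout` / `pytest.mark.timeout`) are assumptions for illustration.

```python
import ast


def has_timeout_decorator(node: ast.AsyncFunctionDef) -> bool:
    """Explicit loop with early exit, replacing any(<generator expression>)."""
    for decorator in node.decorator_list:
        # Unwrap calls such as @timeout(5) or @pytest.mark.timeout(5).
        target = decorator.func if isinstance(decorator, ast.Call) else decorator
        if isinstance(target, ast.Name) and target.id == "timeout":
            return True  # early exit: no generator object is ever created
        if isinstance(target, ast.Attribute) and target.attr == "timeout":
            return True
    return False
```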
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr678-2025-09-26T20.13.52` and push.