
Conversation


codeflash-ai bot commented Sep 26, 2025

⚡️ This pull request contains optimizations for PR #678

If you approve this dependent PR, these changes will be merged into the original PR branch standalone-fto-async.

This PR will be automatically closed if the original PR is merged.


📄 13% (0.13x) speedup for AsyncCallInstrumenter._process_test_function in codeflash/code_utils/instrument_existing_tests.py

⏱️ Runtime : 2.35 milliseconds → 2.09 milliseconds (best of 15 runs)

📝 Explanation and details

The optimization achieves a 12% speedup through several targeted improvements in the _process_test_function and _instrument_statement methods:

Key Optimizations:

  1. Variable hoisting and local references: The optimized code extracts frequently accessed instance variables (self.async_call_counter, node.name) into local variables at the beginning of _process_test_function. It also creates local references to methods (self._instrument_statement, new_body.append) to avoid repeated attribute lookups during the main loop.

  2. Improved timeout decorator check: Instead of using any() with a generator expression, the optimization uses an explicit loop with early termination when a timeout decorator is found. This avoids creating unnecessary generator objects and allows for faster short-circuiting.

  3. Optimized AST traversal: The most significant improvement is replacing ast.walk() with a manual stack-based traversal using ast.iter_child_nodes() in _instrument_statement. This eliminates the overhead of ast.walk()'s recursive generator and provides better control over the traversal process.

  4. Simplified counter management: The optimization tracks the call index locally during processing and only updates the instance variable once at the end, reducing dictionary access overhead. (A consolidated sketch of points 1, 2, and 4 follows this list; point 3 is sketched after the performance summary.)
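
To make points 1, 2, and 4 concrete, here is a minimal sketch of the shape these changes take. The method name matches the one above, but the body and the _instrument_statement call signature are assumptions, not the actual codeflash code:

import ast

def _process_test_function(self, node):
    # Point 1: hoist instance attributes and bound methods into locals so the
    # hot loop below avoids repeated attribute lookups.
    counter = self.async_call_counter  # assumed to be a dict
    test_name = node.name
    instrument = self._instrument_statement
    new_body = []
    append = new_body.append

    # Point 2: explicit loop with early exit instead of any(<generator>).
    has_timeout = False
    for decorator in node.decorator_list:
        if (isinstance(decorator, ast.Call)
                and isinstance(decorator.func, ast.Attribute)
                and decorator.func.attr == "timeout"):
            has_timeout = True
            break
    if self.test_framework == "unittest" and not has_timeout:
        pass  # append a timeout_decorator.timeout(...) decorator here (elided)

    # Point 4: track the call index in a local and write it back once.
    call_index = counter.get(test_name, 0)
    for stmt in node.body:
        # Assumed signature: appends stmt (plus any env assignment) via
        # `append` and returns the updated call index.
        call_index = instrument(stmt, call_index, append)
    counter[test_name] = call_index
    node.body = new_body
    return node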

Performance Impact by Test Case:

  • Small functions: 61-130% faster for basic test cases with minimal statements
  • Empty/simple functions: 71-119% faster due to reduced overhead in the main processing loop
  • Large-scale functions: 11.5% faster for functions with 500+ await statements, where the AST traversal optimization becomes most beneficial

The optimizations are particularly effective for functions with many statements where the improved AST traversal and reduced attribute lookups compound to significant savings.
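
Point 3 in isolation, as a runnable sketch against the standard library: ast.walk() visits the same node set breadth-first via an internal deque, while the manual stack below yields nodes depth-first, which is equivalent whenever visit order does not matter.

import ast

def iter_subnodes(root):
    # Stack-based replacement for ast.walk(): same node set, depth-first order.
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(ast.iter_child_nodes(node))

# Usage: collect every Await node under a statement.
stmt = ast.parse("async def f():\n    await g()").body[0]
awaits = [n for n in iter_subnodes(stmt) if isinstance(n, ast.Await)]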

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests         🔘 None Found
🌀 Generated Regression Tests  54 Passed
⏪ Replay Tests                🔘 None Found
🔎 Concolic Coverage Tests     🔘 None Found
📊 Tests Coverage              70.6%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import ast
import types

# imports
import pytest
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


# Dummy implementations for dependencies (since we cannot import actual codeflash modules)
class FunctionToOptimize:
    def __init__(self, function_name, parents=None, top_level_parent_name=None):
        self.function_name = function_name
        self.parents = parents or []
        self.top_level_parent_name = top_level_parent_name

class CodePosition:
    def __init__(self, lineno, col_offset):
        self.lineno = lineno
        self.col_offset = col_offset

# Helper function to parse code and get function node
def get_function_node(code: str, function_name: str):
    tree = ast.parse(code)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == function_name:
            return node
    raise ValueError(f"Function {function_name} not found")

# ========== UNIT TESTS ==========

# --- 1. Basic Test Cases ---

def test_adds_timeout_decorator_if_unittest_and_no_decorator():
    """Should add timeout_decorator.timeout if test_framework is 'unittest' and not present."""
    code = """
async def test_func():
    pass
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="unittest",
        call_positions=[],
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output # 11.2μs -> 6.94μs (61.2% faster)

def test_does_not_duplicate_timeout_decorator():
    """Should not add timeout_decorator.timeout if already present."""
    code = """
@timeout_decorator.timeout(15)
async def test_func():
    pass
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="unittest",
        call_positions=[],
    )
    orig_len = len(node.decorator_list)
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output # 10.4μs -> 5.53μs (87.9% faster)

def test_no_timeout_decorator_if_not_unittest():
    """Should not add timeout_decorator.timeout if test_framework is not 'unittest'."""
    code = """
async def test_func():
    pass
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=[],
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output # 6.24μs -> 2.71μs (130% faster)

def test_env_assignment_added_on_await_target_call():
    """Should add env assignment before await of target function at matching position."""
    code = """
async def test_func():
    await my_async_func()
    await other_func()
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    # Only first await should be instrumented
    call_positions = [CodePosition(lineno=3, col_offset=4)]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output
    # Check that the assignment is to os.environ["CODEFLASH_CURRENT_LINE_ID"]
    target = result.body[0].targets[0]

def test_env_assignment_not_added_if_no_target_call():
    """Should not add env assignment if no await of target function at matching position."""
    code = """
async def test_func():
    await other_func()
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    call_positions = [CodePosition(lineno=3, col_offset=4)]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output # 12.4μs -> 11.7μs (6.06% faster)

# --- 2. Edge Test Cases ---

def test_handles_empty_function_body():
    """Should handle functions with empty body gracefully."""
    code = """
async def test_func():
    pass
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=[CodePosition(lineno=3, col_offset=4)],
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output # 6.61μs -> 3.02μs (119% faster)

def test_handles_multiple_target_calls():
    """Should instrument multiple awaits of target function at different positions."""
    code = """
async def test_func():
    await my_async_func()
    await my_async_func()
    await other_func()
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    call_positions = [
        CodePosition(lineno=3, col_offset=4),
        CodePosition(lineno=4, col_offset=4),
    ]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output

def test_handles_attribute_calls():
    """Should instrument await of target function called as attribute (e.g., obj.my_async_func())."""
    code = """
async def test_func():
    await obj.my_async_func()
"""
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    call_positions = [CodePosition(lineno=3, col_offset=4)]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output
    await_expr = result.body[1].value

def test_handles_no_lineno_or_col_offset():
    """Should not instrument if call node lacks lineno/col_offset."""
    # Build AST manually with missing lineno/col_offset
    call = ast.Call(func=ast.Name(id="my_async_func", ctx=ast.Load()), args=[], keywords=[])
    await_expr = ast.Await(value=call)
    expr = ast.Expr(value=await_expr)
    node = ast.AsyncFunctionDef(
        name="test_func",
        args=ast.arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]),
        body=[expr],
        decorator_list=[],
    )
    func = FunctionToOptimize("my_async_func")
    call_positions = [CodePosition(lineno=3, col_offset=4)]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output # 12.6μs -> 11.4μs (10.9% faster)

def test_handles_functiondef_and_asyncfunctiondef():
    """Should work for both FunctionDef and AsyncFunctionDef."""
    code = """
def test_func():
    pass

async def test_func_async():
    pass
"""
    func = FunctionToOptimize("my_async_func")
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="unittest",
        call_positions=[],
    )
    node1 = get_function_node(code, "test_func")
    node2 = get_function_node(code, "test_func_async")
    codeflash_output = instrumenter._process_test_function(node1); result1 = codeflash_output # 11.3μs -> 6.43μs (76.2% faster)
    codeflash_output = instrumenter._process_test_function(node2); result2 = codeflash_output # 5.15μs -> 2.88μs (79.1% faster)

# --- 3. Large Scale Test Cases ---

def test_large_number_of_statements():
    """Should handle functions with many statements and instrument only matching ones."""
    # Build code with 100 target awaits, 100 non-target awaits
    code_lines = ["async def test_func():"]
    for i in range(1, 101):
        code_lines.append(f"    await my_async_func()  # target {i}")
        code_lines.append(f"    await other_func()  # non-target {i}")
    code = "\n".join(code_lines)
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    # Instrument all target awaits
    call_positions = [CodePosition(lineno=2 + 2 * i, col_offset=4) for i in range(100)]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output
    # Every third statement (0, 3, 6, ...) should be env assignment
    for i in range(0, 300, 3):
        await_expr = result.body[i + 1].value
        await_expr2 = result.body[i + 2].value

def test_large_number_of_functions():
    """Should handle instrumenting many functions independently."""
    # Build code with 10 functions, each with 10 awaits of target func
    code_lines = []
    for f in range(10):
        code_lines.append(f"async def test_func_{f}():")
        for i in range(10):
            code_lines.append(f"    await my_async_func()")
    code = "\n".join(code_lines)
    func = FunctionToOptimize("my_async_func")
    call_positions = []
    for f in range(10):
        for i in range(10):
            call_positions.append(CodePosition(lineno=2 + f * 11 + i, col_offset=4))
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    # Instrument each function node
    for f in range(10):
        node = get_function_node(code, f"test_func_{f}")
        codeflash_output = instrumenter._process_test_function(node); result = codeflash_output
        for i in range(0, 20, 2):
            await_expr = result.body[i + 1].value

def test_scalability_with_mixed_calls():
    """Should scale with mixed calls and only instrument the correct ones."""
    code_lines = ["async def test_func():"]
    # 500 target, 500 non-target
    for i in range(500):
        code_lines.append(f"    await my_async_func()")
        code_lines.append(f"    await other_func()")
    code = "\n".join(code_lines)
    node = get_function_node(code, "test_func")
    func = FunctionToOptimize("my_async_func")
    call_positions = [CodePosition(lineno=2 + 2 * i, col_offset=4) for i in range(500)]
    instrumenter = AsyncCallInstrumenter(
        function=func,
        module_path="some_module.py",
        test_framework="pytest",
        call_positions=call_positions,
    )
    codeflash_output = instrumenter._process_test_function(node); result = codeflash_output
    # Every third statement is env assignment
    for i in range(0, 1500, 3):
        await_expr = result.body[i + 1].value
        await_expr2 = result.body[i + 2].value
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import ast
import sys
import types

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


# Dummy dependencies for the test
class DummyParent:
    def __init__(self, type_, name):
        self.type = type_
        self.name = name

class DummyFunctionToOptimize:
    def __init__(self, function_name, parents=None, top_level_parent_name=None):
        self.function_name = function_name
        self.parents = parents or []
        self.top_level_parent_name = top_level_parent_name

class DummyCodePosition:
    def __init__(self, lineno, col_offset):
        self.lineno = lineno
        self.col_offset = col_offset

# Helper function to parse code and get function node
def get_function_node(code: str, is_async=False):
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if is_async and isinstance(node, ast.AsyncFunctionDef):
            return node
        if not is_async and isinstance(node, ast.FunctionDef):
            return node
    return None

# ----------------------------------------
# Basic Test Cases
# ----------------------------------------

def test_adds_timeout_decorator_for_unittest():
    # Ensure timeout_decorator is added if not present and test_framework is 'unittest'
    code = "def test_func():\n    pass"
    node = get_function_node(code)
    func = DummyFunctionToOptimize("foo")
    instr = AsyncCallInstrumenter(func, "mod.py", "unittest", [])
    codeflash_output = instr._process_test_function(node); result = codeflash_output # 10.7μs -> 7.55μs (41.2% faster)

def test_does_not_add_timeout_if_already_present():
    # Should not add a second timeout_decorator if already present
    code = (
        "@timeout_decorator.timeout(15)\n"
        "def test_func():\n    pass"
    )
    node = get_function_node(code)
    func = DummyFunctionToOptimize("foo")
    instr = AsyncCallInstrumenter(func, "mod.py", "unittest", [])
    orig_decorator_count = len(node.decorator_list)
    codeflash_output = instr._process_test_function(node); result = codeflash_output # 8.80μs -> 5.75μs (52.9% faster)

def test_no_timeout_decorator_for_pytest():
    # Should not add timeout_decorator for pytest framework
    code = "def test_func():\n    pass"
    node = get_function_node(code)
    func = DummyFunctionToOptimize("foo")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [])
    codeflash_output = instr._process_test_function(node); result = codeflash_output # 4.56μs -> 2.65μs (71.7% faster)

def test_instrument_await_target_call_in_position():
    # Should instrument an await of target function at a matching position
    code = (
        "async def test_func():\n"
        "    await foo()\n"
        "    await bar()\n"
    )
    node = get_function_node(code, is_async=True)
    # foo is the target, and its position is line 2, col 10
    func = DummyFunctionToOptimize("foo")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])
    codeflash_output = instr._process_test_function(node); result = codeflash_output
    # Confirm CODEFLASH_CURRENT_LINE_ID is the env key
    target = result.body[0].targets[0]

def test_no_instrument_for_non_target_call():
    # Should not instrument if await is not for target function
    code = (
        "async def test_func():\n"
        "    await bar()\n"
    )
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])
    codeflash_output = instr._process_test_function(node); result = codeflash_output # 9.55μs -> 10.5μs (8.81% slower)

# ----------------------------------------
# Edge Test Cases
# ----------------------------------------

def test_instrument_multiple_awaits_same_func():
    # Should instrument multiple awaits of the target function at different positions
    code = (
        "async def test_func():\n"
        "    await foo()\n"
        "    await foo()\n"
        "    await bar()\n"
    )
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    pos1 = DummyCodePosition(2, 10)
    pos2 = DummyCodePosition(3, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos1, pos2])
    codeflash_output = instr._process_test_function(node); result = codeflash_output

def test_instrument_attribute_call():
    # Should instrument await obj.foo() if foo is target function
    code = (
        "async def test_func():\n"
        "    await obj.foo()\n"
    )
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])
    # Patch ast so that obj.foo() call has correct lineno/col_offset
    await_stmt = node.body[0]
    call_node = await_stmt.value
    call_node.lineno = 2
    call_node.col_offset = 10
    codeflash_output = instr._process_test_function(node); result = codeflash_output

def test_no_instrument_if_no_lineno_coloffset():
    # Should not instrument if call node lacks lineno/col_offset
    code = (
        "async def test_func():\n"
        "    await foo()\n"
    )
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])
    # Remove lineno/col_offset from call node
    await_stmt = node.body[0]
    call_node = await_stmt.value
    if hasattr(call_node, "lineno"):
        delattr(call_node, "lineno")
    if hasattr(call_node, "col_offset"):
        delattr(call_node, "col_offset")
    codeflash_output = instr._process_test_function(node); result = codeflash_output

def test_instrument_with_class_parent():
    # Should set class_name if parent type is ClassDef
    code = (
        "async def test_func():\n"
        "    await foo()\n"
    )
    node = get_function_node(code, is_async=True)
    parent = DummyParent("ClassDef", "TestClass")
    func = DummyFunctionToOptimize("foo", parents=[parent], top_level_parent_name="TestClass")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])

def test_env_assignment_lineno():
    # Should set env assignment lineno to statement's lineno
    code = (
        "async def test_func():\n"
        "    await foo()\n"
    )
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])
    codeflash_output = instr._process_test_function(node); result = codeflash_output

def test_env_assignment_default_lineno():
    # If statement has no lineno, env assignment should default to 1
    code = (
        "async def test_func():\n"
        "    await foo()\n"
    )
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    pos = DummyCodePosition(2, 10)
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [pos])
    # Remove lineno from await statement
    await_stmt = node.body[0]
    if hasattr(await_stmt, "lineno"):
        delattr(await_stmt, "lineno")
    codeflash_output = instr._process_test_function(node); result = codeflash_output

# ----------------------------------------
# Large Scale Test Cases
# ----------------------------------------

def test_instrument_many_awaits():
    # Should instrument up to 1000 awaits efficiently
    code_lines = ["async def test_func():\n"]
    for i in range(1, 1001):
        code_lines.append(f"    await foo()\n")
    code = "".join(code_lines)
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    positions = [DummyCodePosition(i+1, 10) for i in range(1000)]
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", positions)
    # Patch await statements with correct lineno/col_offset
    for idx, await_stmt in enumerate(node.body):
        call_node = await_stmt.value
        call_node.lineno = idx+2
        call_node.col_offset = 10
    codeflash_output = instr._process_test_function(node); result = codeflash_output
    # Check every even index is Assign, odd is Await
    for i in range(0, 2000, 2):
        pass

def test_instrument_scalability():
    # Should not crash or hang with 500 awaits and 500 positions
    code_lines = ["async def test_func():\n"]
    for i in range(1, 501):
        code_lines.append(f"    await foo()\n")
    code = "".join(code_lines)
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    positions = [DummyCodePosition(i+1, 10) for i in range(500)]
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", positions)
    # Patch await statements with correct lineno/col_offset
    for idx, await_stmt in enumerate(node.body):
        call_node = await_stmt.value
        call_node.lineno = idx+2
        call_node.col_offset = 10
    codeflash_output = instr._process_test_function(node); result = codeflash_output
    for i in range(0, 1000, 2):
        pass

def test_no_instrument_with_empty_positions_large():
    # Should not instrument any awaits if positions list is empty, even for large N
    code_lines = ["async def test_func():\n"]
    for i in range(1, 501):
        code_lines.append(f"    await foo()\n")
    code = "".join(code_lines)
    node = get_function_node(code, is_async=True)
    func = DummyFunctionToOptimize("foo")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [])
    codeflash_output = instr._process_test_function(node); result = codeflash_output # 2.24ms -> 2.01ms (11.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-pr678-2025-09-26T20.13.52 and push.

Codeflash

mohammedahmed18 and others added 30 commits August 22, 2025 05:58
[LSP] Ensure optimizer cleanup on server shutdown or when the client suddenly disconnects
…licate-global-assignments-when-reverting-helpers
…/duplicate-global-assignments-when-reverting-helpers`)

The optimized code achieves a **17% speedup** by eliminating redundant CST parsing operations, which are the most expensive parts of the function according to the line profiler.

**Key optimizations:**

1. **Eliminate duplicate parsing**: The original code parsed `src_module_code` and `dst_module_code` multiple times. The optimized version introduces `_extract_global_statements_once()` that parses each module only once and reuses the parsed CST objects throughout the function.

2. **Reuse parsed modules**: Instead of re-parsing `dst_module_code` after modifications, the optimized version conditionally reuses the already-parsed `dst_module` when no global statements need insertion, avoiding unnecessary `cst.parse_module()` calls.

3. **Early termination**: Added an early return when `new_collector.assignments` is empty, avoiding the expensive `GlobalAssignmentTransformer` creation and visitation when there's nothing to transform.

4. **Minor optimization in uniqueness check**: Added a fast-path identity check (`stmt is existing_stmt`) before the expensive `deep_equals()` comparison, though this has minimal impact.

**Performance impact by test case type:**
- **Empty/minimal cases**: Show the highest gains (59-88% faster) due to early termination optimizations
- **Standard cases**: Achieve consistent 20-30% improvements from reduced parsing
- **Large-scale tests**: Benefit significantly (18-23% faster) as parsing overhead scales with code size

The optimization is most effective for workloads with moderate to large code files where CST parsing dominates the runtime, as evidenced by the original profiler showing 70%+ of time spent in `cst.parse_module()` and `module.visit()` operations. (A sketch of the parse-once helper and the identity fast-path follows below.)
Signed-off-by: Saurabh Misra <[email protected]>
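
A minimal libcst sketch of the parse-once helper and the identity fast-path described in the commit message above; the helper name comes from the message, while the bodies and signatures are assumptions:

import libcst as cst

def _extract_global_statements_once(src_module_code: str, dst_module_code: str):
    # Parse each module exactly once and return the parsed CSTs so later
    # steps reuse them instead of calling cst.parse_module() again.
    src_module = cst.parse_module(src_module_code)
    dst_module = cst.parse_module(dst_module_code)
    return src_module, dst_module

def _is_new_assignment(stmt: cst.BaseStatement, existing: list) -> bool:
    # Hypothetical uniqueness check mirroring point 4 above.
    for existing_stmt in existing:
        if stmt is existing_stmt:  # cheap identity fast-path
            return False
        if stmt.deep_equals(existing_stmt):  # expensive structural comparison
            return False
    return True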
…25-08-25T18.50.33

⚡️ Speed up function `add_global_assignments` by 18% in PR #683 (`fix/duplicate-global-assignments-when-reverting-helpers`)
…cs-in-diff

[Lsp] return diff functions grouped by file
* lsp: get new/modified functions inside a git commit

* better name

* refactor

* revert
* save optimization patches metadata

* typo

* lsp: get previous optimizations

* fix patch name in non-lsp mode

* ⚡️ Speed up function `get_patches_metadata` by 45% in PR #690 (`worktree/persist-optimization-patches`)

The optimized code achieves a **44% speedup** through two key optimizations:

**1. Added `@lru_cache(maxsize=1)` to `get_patches_dir_for_project()`**
- This caches the Path object construction, avoiding repeated calls to `get_git_project_id()` and `Path()` creation
- The line profiler shows this function's total time dropped from 5.32ms to being completely eliminated from the hot path in `get_patches_metadata()`
- Since `get_git_project_id()` was already cached but still being called repeatedly, this second-level caching eliminates that redundancy

**2. Replaced `read_text()` + `json.loads()` with `open()` + `json.load()`**
- Using `json.load()` with a file handle is more efficient than reading the entire file into memory first with `read_text()` then parsing it
- This avoids the intermediate string creation and is particularly beneficial for larger JSON files
- Added explicit UTF-8 encoding for consistency

**Performance Impact by Test Type:**
- **Basic cases** (small/missing files): 45-65% faster - benefits primarily from the caching optimization
- **Edge cases** (malformed JSON): 38-47% faster - still benefits from both optimizations  
- **Large scale cases** (1000+ patches, large files): 39-52% faster - the file I/O optimization becomes more significant with larger JSON files

The caching optimization provides the most consistent gains across all scenarios since it eliminates repeated expensive operations, while the file I/O optimization scales with file size. (A sketch of both patterns follows this commit message.)

* fix: patch path

* codeflash suggestions

* split the worktree utils in a separate file

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
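
A sketch of the two patterns under stated assumptions; the directory layout and the get_git_project_id stub are placeholders, not codeflash's actual paths:

import json
from functools import lru_cache
from pathlib import Path

def get_git_project_id() -> str:
    # Placeholder for the (already cached) project-id lookup described above.
    return "example-project-id"

@lru_cache(maxsize=1)
def get_patches_dir_for_project() -> Path:
    # Caching the Path object means get_git_project_id() and Path()
    # construction run at most once per process.
    return Path.home() / ".codeflash" / "patches" / get_git_project_id()

def load_patches_metadata(metadata_file: Path) -> dict:
    # json.load() on an open handle skips the intermediate string that
    # read_text() + json.loads() would allocate.
    with metadata_file.open(encoding="utf-8") as f:
        return json.load(f)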
Saga4 and others added 24 commits September 22, 2025 15:55
* LSP reduce no of candidates

* config revert

* pass reference values to aiservices

* line profiling loading msg

---------

Co-authored-by: saga4 <[email protected]>
Co-authored-by: ali <[email protected]>
* LSP reduce no of candidates

* config revert

* pass reference values to aiservices

* fix inline condition

---------

Co-authored-by: saga4 <[email protected]>
Signed-off-by: Saurabh Misra <[email protected]>
apscheduler tries to schedule jobs when the interpreter is shutting down, which can cause it to crash and leave us in a bad state
The optimized version eliminates recursive function calls by replacing the recursive `_find` helper with an iterative approach. This provides significant performance benefits:

**Key Optimizations:**

1. **Removed Recursion Overhead**: The original code used a recursive helper function `_find` that created new stack frames for each parent traversal. The optimized version uses a simple iterative loop that traverses parents sequentially without function call overhead.

2. **Eliminated Function Creation**: The original code defined the `_find` function on every call to `find_target_node`. The optimized version removes this repeated function definition entirely.

3. **Early Exit with for-else**: The optimized code uses Python's `for-else` construct to immediately return `None` when a parent class isn't found, avoiding unnecessary continued searching.

4. **Reduced Attribute Access**: By caching `function_to_optimize.function_name` in a local variable `target_name` and reusing `body` variables, the code reduces repeated attribute lookups.

**Performance Impact by Test Case:**
- **Simple cases** (top-level functions, basic class methods): 23-62% faster due to eliminated recursion overhead
- **Nested class scenarios**: 45-84% faster, with deeper nesting showing greater improvements as recursion elimination has more impact
- **Large-scale tests**: 12-22% faster, showing consistent benefits even with many nodes to traverse
- **Edge cases** (empty modules, non-existent classes): 52-76% faster due to more efficient early termination

The optimization is particularly effective for deeply nested class hierarchies where the original recursive approach created multiple stack frames, while the iterative version maintains constant memory usage regardless of nesting depth.
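
An illustrative reconstruction of the iterative approach with the for-else early exit; the function name comes from the commit message, while the signature and body are assumptions:

import ast

def find_target_node(tree, target_name, parent_class_names):
    # Walk down the chain of parent classes iteratively instead of recursing;
    # no new stack frames regardless of nesting depth.
    body = tree.body
    for class_name in parent_class_names:
        for node in body:
            if isinstance(node, ast.ClassDef) and node.name == class_name:
                body = node.body
                break
        else:
            # for-else: this parent class was not found, so stop searching.
            return None
    for node in body:
        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                and node.name == target_name):
            return node
    return None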
…25-09-25T14.28.58

⚡️ Speed up function `find_target_node` by 18% in PR #763 (`fix/correctly-find-funtion-node-when-reverting-helpers`)
…node-when-reverting-helpers

[FIX] Respect parent classes in revert helpers
…d move other merged test below; finish resolving aiservice/config/explanation/function_optimizer; regenerate uv.lock
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Sep 26, 2025
KRRT7 force-pushed the standalone-fto-async branch from 40c4108 to 7bbb1e7 on September 26, 2025 20:26
codeflash-ai bot closed this Sep 27, 2025

codeflash-ai bot commented Sep 27, 2025

This PR has been automatically closed because the original PR #678 by KRRT7 was closed.

codeflash-ai bot deleted the codeflash/optimize-pr678-2025-09-26T20.13.52 branch September 27, 2025 00:16