⚡️ Speed up method `AsyncCallInstrumenter._call_in_positions` by 69% in PR #739 (`get-throughput-from-output`) #745

codeflash-ai · 2025-09-22T20:20:20Z

⚡️ This pull request contains optimizations for PR #739

If you approve this dependent PR, these changes will be merged into the original PR branch get-throughput-from-output.

This PR will be automatically closed if the original PR is merged.

📄 69% (0.69x) speedup for `AsyncCallInstrumenter._call_in_positions` in `codeflash/code_utils/instrument_existing_tests.py`

⏱️ Runtime : 313 microseconds → 185 microseconds (best of 137 runs)

📝 Explanation and details

The optimization caches frequently accessed object attributes outside the inner loop to reduce redundant attribute lookups. In the node_in_call_position function, the original code repeatedly accessed node.lineno, node.end_lineno, node.col_offset, node.end_col_offset, and pos.line_no on every iteration of the call_positions loop.

The optimized version hoists these attribute lookups outside the loop:

node_lineno = node.lineno
node_end_lineno = node.end_lineno
node_col_offset = node.col_offset
node_end_col_offset = node.end_col_offset
pos_line_no = pos.line_no (inside the loop but outside the nested conditions)

This change is particularly effective for scenarios with many call positions to check, as evidenced by the large-scale test cases showing 57-148% speedup. The profiler data confirms the optimization reduces time spent on attribute access - the original spent 25.4% of time just on the for pos in call_positions: line accessing attributes, while the optimized version shows improved distribution of execution time.

Python attribute access involves dictionary lookups under the hood, so caching these values in local variables (which are stored in a fast array-based structure) provides significant performance gains when the same attributes are accessed repeatedly in tight loops.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 97 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests and Runtime

import ast

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


# Dummy classes for testing, mimicking the required interface
class CodePosition:
    def __init__(self, line_no, col_no, end_col_offset=None):
        self.line_no = line_no
        self.col_no = col_no
        self.end_col_offset = end_col_offset

class FunctionToOptimize:
    def __init__(self, function_name, parents=None, top_level_parent_name=None):
        self.function_name = function_name
        self.parents = parents or []
        self.top_level_parent_name = top_level_parent_name

class TestingMode:
    BEHAVIOR = "behavior"
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


# Helper function to create ast.Call nodes with position attributes
def make_call_node(lineno, col_offset, end_lineno, end_col_offset):
    node = ast.Call()
    node.lineno = lineno
    node.col_offset = col_offset
    node.end_lineno = end_lineno
    node.end_col_offset = end_col_offset
    return node

# ----------- UNIT TESTS ------------

# ---- BASIC TEST CASES ----

def test_basic_exact_position_match():
    # Test when call position exactly matches the call node's start
    call_node = make_call_node(10, 5, 10, 15)
    call_positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.85μs -> 1.94μs (4.63% slower)

def test_basic_within_call_range():
    # Test when call position is within the call node's line range
    call_node = make_call_node(5, 2, 7, 20)
    call_positions = [CodePosition(6, 10)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.78μs -> 1.67μs (6.58% faster)

def test_basic_end_line_and_col_match():
    # Test when call position matches the call node's end line and col
    call_node = make_call_node(3, 1, 5, 10)
    call_positions = [CodePosition(5, 10, end_col_offset=10)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.73μs -> 1.61μs (7.44% faster)

def test_basic_outside_call_range():
    # Test when call position is outside the call node's line range
    call_node = make_call_node(10, 0, 12, 20)
    call_positions = [CodePosition(13, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.41μs -> 1.51μs (6.61% slower)

def test_basic_col_offset_too_small():
    # Test when call position line matches but col is before call node's col_offset
    call_node = make_call_node(8, 10, 8, 20)
    call_positions = [CodePosition(8, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.89μs -> 1.73μs (9.23% faster)

# ---- EDGE TEST CASES ----

def test_edge_no_lineno_or_col_offset():
    # Test when call_node lacks lineno/col_offset attributes
    call_node = ast.Call()  # No lineno/col_offset
    call_positions = [CodePosition(1, 1)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 481ns -> 491ns (2.04% slower)

def test_edge_end_col_offset_none():
    # Test when call position has end_col_offset None
    call_node = make_call_node(2, 2, 4, 10)
    call_positions = [CodePosition(4, 5, end_col_offset=None)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    # Should not match because end_col_offset is None
    codeflash_output = inst._call_in_positions(call_node) # 1.74μs -> 1.65μs (5.38% faster)

def test_edge_call_node_end_lineno_none():
    # Test when call_node.end_lineno is None
    call_node = make_call_node(3, 3, None, 9)
    call_positions = [CodePosition(3, 4)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    # Should not match because end_lineno is None
    codeflash_output = inst._call_in_positions(call_node) # 1.24μs -> 1.43μs (13.3% slower)

def test_edge_position_line_none():
    # Test when call position line_no is None
    call_node = make_call_node(1, 1, 2, 5)
    call_positions = [CodePosition(None, 3)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.11μs -> 1.43μs (22.3% slower)

def test_edge_multiple_positions_one_matches():
    # Test with multiple call_positions, one matches
    call_node = make_call_node(4, 4, 6, 12)
    call_positions = [
        CodePosition(1, 1),
        CodePosition(5, 6),
        CodePosition(10, 10)
    ]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.98μs -> 1.86μs (6.49% faster)

def test_edge_multiple_positions_none_match():
    # Test with multiple call_positions, none match
    call_node = make_call_node(7, 3, 9, 8)
    call_positions = [
        CodePosition(2, 1),
        CodePosition(10, 10),
        CodePosition(6, 5)
    ]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.71μs -> 1.81μs (5.52% slower)

def test_edge_call_node_is_not_call():
    # Test with a node that is not an ast.Call
    node = ast.Name()
    node.lineno = 5
    node.col_offset = 2
    node.end_lineno = 5
    node.end_col_offset = 4
    call_positions = [CodePosition(5, 2)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(node) # 922ns -> 942ns (2.12% slower)

def test_edge_call_position_on_start_and_end():
    # Test where call_positions match both start and end of call_node
    call_node = make_call_node(20, 2, 22, 8)
    call_positions = [
        CodePosition(20, 2),  # start
        CodePosition(22, 8, end_col_offset=8)  # end
    ]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.63μs -> 1.66μs (1.80% slower)

def test_edge_call_position_between_lines():
    # Test when call_position is strictly between start and end lines
    call_node = make_call_node(30, 0, 35, 15)
    call_positions = [CodePosition(32, 7)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.69μs -> 1.61μs (4.89% faster)

# ---- LARGE SCALE TEST CASES ----

def test_large_scale_many_positions_one_match():
    # Test with 1000 call_positions, only one matches
    call_node = make_call_node(50, 10, 60, 20)
    call_positions = [CodePosition(i, 5) for i in range(1, 1000)]
    call_positions[123] = CodePosition(55, 15)  # This should match
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 7.34μs -> 4.66μs (57.7% faster)

def test_large_scale_many_positions_none_match():
    # Test with 1000 call_positions, none match
    call_node = make_call_node(100, 10, 110, 20)
    call_positions = [CodePosition(i, 5) for i in range(1, 1000) if i < 100 or i > 110]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 103μs -> 41.9μs (148% faster)

def test_large_scale_call_node_spans_many_lines():
    # Test with a call_node spanning 900 lines, position matches in the middle
    call_node = make_call_node(1, 0, 900, 10)
    call_positions = [CodePosition(450, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.80μs -> 1.56μs (15.4% faster)

def test_large_scale_call_node_spans_many_lines_none_match():
    # Test with a call_node spanning 900 lines, position outside range
    call_node = make_call_node(1, 0, 900, 10)
    call_positions = [CodePosition(950, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.30μs -> 1.41μs (7.86% slower)

def test_large_scale_all_positions_match():
    # Test where every position matches (should return True for the first match)
    call_node = make_call_node(200, 0, 300, 50)
    call_positions = [CodePosition(i, 1) for i in range(200, 301)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.59μs -> 1.53μs (3.91% faster)

def test_large_scale_positions_with_varied_cols():
    # Test with positions varying columns, only some match
    call_node = make_call_node(400, 10, 400, 50)
    # Only columns >= 10 should match on line 400
    call_positions = [CodePosition(400, i) for i in range(0, 60)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    # At least one position (col >= 10) matches
    codeflash_output = inst._call_in_positions(call_node) # 1.83μs -> 1.54μs (18.9% faster)

# ---- DETERMINISM TEST CASES ----

def test_determinism_multiple_runs():
    # Test that repeated calls with same input give same result
    call_node = make_call_node(10, 2, 15, 20)
    call_positions = [CodePosition(12, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node); result1 = codeflash_output # 1.66μs -> 1.50μs (10.6% faster)
    codeflash_output = inst._call_in_positions(call_node); result2 = codeflash_output # 902ns -> 822ns (9.73% faster)

# ---- CLEAN CODE/READABILITY TEST CASES ----

def test_readability_simple_true():
    # Simple readable test: match at start
    call_node = make_call_node(5, 0, 5, 10)
    call_positions = [CodePosition(5, 0)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.55μs -> 1.45μs (6.81% faster)

def test_readability_simple_false():
    # Simple readable test: position outside
    call_node = make_call_node(1, 1, 1, 2)
    call_positions = [CodePosition(2, 1)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("foo"), "", "", call_positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.32μs -> 1.41μs (6.37% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import ast
from typing import List, Optional

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


class CodePosition:
    def __init__(
        self,
        line_no: Optional[int],
        col_no: Optional[int],
        end_col_offset: Optional[int] = None,
    ):
        self.line_no = line_no
        self.col_no = col_no
        self.end_col_offset = end_col_offset

class FunctionParent:
    def __init__(self, type_: str):
        self.type = type_

class FunctionToOptimize:
    def __init__(self, function_name: str, parents: List[FunctionParent], top_level_parent_name: Optional[str] = None):
        self.function_name = function_name
        self.parents = parents
        self.top_level_parent_name = top_level_parent_name

class TestingMode:
    BEHAVIOR = "BEHAVIOR"
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter

# --- Unit tests ---

# Helper to create a dummy ast.Call node with required attributes
def make_call_node(lineno, col_offset, end_lineno, end_col_offset):
    node = ast.Call()
    node.lineno = lineno
    node.col_offset = col_offset
    node.end_lineno = end_lineno
    node.end_col_offset = end_col_offset
    return node

# Basic Test Cases

def test_basic_exact_match_start():
    """Call node matches position exactly at start line and col_offset."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.56μs -> 1.52μs (2.69% faster)

def test_basic_exact_match_end():
    """Call node matches position exactly at end line and end_col_offset."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(12, 20, end_col_offset=20)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.82μs -> 1.58μs (15.2% faster)

def test_basic_inside_range():
    """Call node matches position inside range (not at start/end)."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(11, 10)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.64μs -> 1.56μs (5.12% faster)

def test_basic_no_match_before():
    """Position before call node range should not match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(9, 0)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.30μs -> 1.42μs (8.44% slower)

def test_basic_no_match_after():
    """Position after call node range should not match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(13, 0)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.34μs -> 1.43μs (6.28% slower)

def test_basic_no_match_wrong_col():
    """Position on correct line but before col_offset should not match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(10, 4)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.67μs -> 1.58μs (5.69% faster)

def test_basic_multiple_positions_one_matches():
    """Multiple positions, one matches."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [
        CodePosition(9, 0),
        CodePosition(10, 5),
        CodePosition(13, 0)
    ]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.74μs -> 1.62μs (7.39% faster)

def test_basic_multiple_positions_none_match():
    """Multiple positions, none match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [
        CodePosition(9, 0),
        CodePosition(13, 0)
    ]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.54μs -> 1.57μs (1.97% slower)

def test_basic_empty_positions():
    """Empty positions list should never match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = []
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.01μs -> 1.23μs (17.9% slower)

# Edge Test Cases

def test_edge_missing_lineno():
    """Call node missing lineno attribute should return False."""
    call_node = ast.Call()
    call_node.col_offset = 5
    call_node.end_lineno = 10
    call_node.end_col_offset = 20
    positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 461ns -> 491ns (6.11% slower)

def test_edge_missing_col_offset():
    """Call node missing col_offset attribute should return False."""
    call_node = ast.Call()
    call_node.lineno = 10
    call_node.end_lineno = 12
    call_node.end_col_offset = 20
    positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 591ns -> 561ns (5.35% faster)

def test_edge_missing_end_lineno():
    """Call node missing end_lineno attribute should not match any position."""
    call_node = ast.Call()
    call_node.lineno = 10
    call_node.col_offset = 5
    call_node.end_col_offset = 20
    positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    # node_in_call_position checks for end_lineno existence
    codeflash_output = inst._call_in_positions(call_node) # 1.23μs -> 1.42μs (13.4% slower)

def test_edge_missing_end_col_offset():
    """Call node missing end_col_offset attribute, but position does not require it."""
    call_node = ast.Call()
    call_node.lineno = 10
    call_node.col_offset = 5
    call_node.end_lineno = 12
    # no end_col_offset
    positions = [CodePosition(11, 10)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    # Should match because end_col_offset is not needed for inside-range match
    codeflash_output = inst._call_in_positions(call_node) # 1.71μs -> 1.62μs (5.55% faster)

def test_edge_position_line_none():
    """Position with line_no=None should never match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(None, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.10μs -> 1.37μs (19.6% slower)

def test_edge_position_col_none():
    """Position with col_no=None should not match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(10, None)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    # col_no is required for all match cases
    codeflash_output = inst._call_in_positions(call_node)

def test_edge_position_end_col_offset_none():
    """Position with end_col_offset=None should match if not required."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(11, 10, end_col_offset=None)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    # Should match for inside-range case
    codeflash_output = inst._call_in_positions(call_node) # 2.30μs -> 2.13μs (7.97% faster)

def test_edge_call_node_not_call():
    """Node that is not ast.Call should never match."""
    node = ast.Name()
    node.lineno = 10
    node.col_offset = 5
    node.end_lineno = 12
    node.end_col_offset = 20
    positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    # Should always return False, as node is not ast.Call
    codeflash_output = inst._call_in_positions(node) # 982ns -> 981ns (0.102% faster)

def test_edge_call_node_extra_attributes():
    """Call node with extra unrelated attributes should not affect result."""
    call_node = make_call_node(10, 5, 12, 20)
    call_node.foo = "bar"
    positions = [CodePosition(10, 5)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.71μs -> 1.66μs (3.01% faster)

# Large Scale Test Cases

def test_large_many_positions_one_matches():
    """1000 positions, only one matches."""
    call_node = make_call_node(100, 10, 105, 20)
    positions = [CodePosition(i, 0) for i in range(1, 1000)]
    positions[99] = CodePosition(100, 10)  # 0-based index
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 11.5μs -> 6.93μs (65.5% faster)

def test_large_many_positions_none_match():
    """1000 positions, none match."""
    call_node = make_call_node(200, 10, 205, 20)
    positions = [CodePosition(i, 0) for i in range(1, 1000) if i < 200 or i > 205]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 100μs -> 42.5μs (136% faster)

def test_large_call_node_spans_many_lines():
    """Call node spanning many lines, position matches inside range."""
    call_node = make_call_node(1, 0, 999, 999)
    positions = [CodePosition(500, 100)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.67μs -> 1.60μs (4.37% faster)

def test_large_call_node_spans_many_lines_none_match():
    """Call node spanning many lines, position outside range."""
    call_node = make_call_node(1, 0, 999, 999)
    positions = [CodePosition(1000, 100)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 1.32μs -> 1.42μs (7.03% slower)

def test_large_positions_with_varied_cols():
    """Positions with varied col_no, only correct col matches."""
    call_node = make_call_node(50, 30, 60, 40)
    positions = [CodePosition(50, i) for i in range(0, 50)]
    positions.append(CodePosition(50, 35))  # Should match
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 9.52μs -> 4.66μs (104% faster)

def test_large_positions_with_varied_lines():
    """Positions with varied line_no, only correct line matches."""
    call_node = make_call_node(20, 10, 30, 20)
    positions = [CodePosition(i, 15) for i in range(10, 40)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    # Should match for lines 20-30
    found = any(inst._call_in_positions(make_call_node(20, 10, 30, 20)) for i in range(20, 31)) # 2.35μs -> 2.07μs (13.1% faster)

def test_large_positions_all_none():
    """All positions have None for line_no or col_no, should never match."""
    call_node = make_call_node(10, 5, 12, 20)
    positions = [CodePosition(None, None) for _ in range(1000)]
    inst = AsyncCallInstrumenter(FunctionToOptimize("f", [FunctionParent("FunctionDef")]), "", "", positions)
    codeflash_output = inst._call_in_positions(call_node) # 18.4μs -> 22.7μs (19.0% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr739-2025-09-22T20.20.14 and push.

The optimization caches frequently accessed object attributes outside the inner loop to reduce redundant attribute lookups. In the `node_in_call_position` function, the original code repeatedly accessed `node.lineno`, `node.end_lineno`, `node.col_offset`, `node.end_col_offset`, and `pos.line_no` on every iteration of the `call_positions` loop. The optimized version hoists these attribute lookups outside the loop: - `node_lineno = node.lineno` - `node_end_lineno = node.end_lineno` - `node_col_offset = node.col_offset` - `node_end_col_offset = node.end_col_offset` - `pos_line_no = pos.line_no` (inside the loop but outside the nested conditions) This change is particularly effective for scenarios with many call positions to check, as evidenced by the large-scale test cases showing 57-148% speedup. The profiler data confirms the optimization reduces time spent on attribute access - the original spent 25.4% of time just on the `for pos in call_positions:` line accessing attributes, while the optimized version shows improved distribution of execution time. Python attribute access involves dictionary lookups under the hood, so caching these values in local variables (which are stored in a fast array-based structure) provides significant performance gains when the same attributes are accessed repeatedly in tight loops.

KRRT7 · 2025-09-22T20:59:14Z

I don't like much these local assignment micro-optimizations, additionally, many tests report the optimization being slower for test cases

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Sep 22, 2025

codeflash-ai bot mentioned this pull request Sep 22, 2025

Get throughput from output for async functions #739

Merged

KRRT7 closed this Sep 22, 2025

codeflash-ai bot deleted the codeflash/optimize-pr739-2025-09-22T20.20.14 branch September 22, 2025 20:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up method `AsyncCallInstrumenter._call_in_positions` by 69% in PR #739 (`get-throughput-from-output`) #745

⚡️ Speed up method `AsyncCallInstrumenter._call_in_positions` by 69% in PR #739 (`get-throughput-from-output`) #745

Uh oh!

codeflash-ai bot commented Sep 22, 2025

Uh oh!

KRRT7 commented Sep 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

⚡️ Speed up method AsyncCallInstrumenter._call_in_positions by 69% in PR #739 (get-throughput-from-output) #745

⚡️ Speed up method AsyncCallInstrumenter._call_in_positions by 69% in PR #739 (get-throughput-from-output) #745

Uh oh!

Conversation

codeflash-ai bot commented Sep 22, 2025

⚡️ This pull request contains optimizations for PR #739

📄 69% (0.69x) speedup for AsyncCallInstrumenter._call_in_positions in codeflash/code_utils/instrument_existing_tests.py

📝 Explanation and details

Uh oh!

KRRT7 commented Sep 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

⚡️ Speed up method `AsyncCallInstrumenter._call_in_positions` by 69% in PR #739 (`get-throughput-from-output`) #745

⚡️ Speed up method `AsyncCallInstrumenter._call_in_positions` by 69% in PR #739 (`get-throughput-from-output`) #745

📄 69% (0.69x) speedup for `AsyncCallInstrumenter._call_in_positions` in `codeflash/code_utils/instrument_existing_tests.py`