@codeflash-ai codeflash-ai bot commented Oct 17, 2025

⚡️ This pull request contains optimizations for PR #824

If you approve this dependent PR, these changes will be merged into the original PR branch opt-impact-aseem-improvement.

This PR will be automatically closed if the original PR is merged.


📄 51% (0.51x) speedup for FunctionCallFinder._extract_source_code in codeflash/code_utils/code_extractor.py

⏱️ Runtime : 1.12 milliseconds → 742 microseconds (best of 59 runs)
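
As a quick sanity check, the headline percentage can be reproduced from the two runtimes. This sketch assumes speedup is defined as original time divided by optimized time, minus one; the variable names are illustrative and not taken from Codeflash.

```python
# Illustrative check of the headline figure; names are hypothetical.
original_us = 1120.0   # 1.12 milliseconds, expressed in microseconds
optimized_us = 742.0   # 742 microseconds

# Assumed definition: how much more work per unit time the optimized code does.
speedup = original_us / optimized_us - 1.0
print(f"{speedup:.0%}")  # prints "51%"
```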

📝 Explanation and details

The optimized version achieves a 51% speedup (1.12 ms down to 742 µs) through several micro-optimizations targeting the most expensive operations:

Key Optimizations:

  1. Reduced line.strip() calls: The original code called line.strip() twice per line (once for empty line check, once for dedenting logic). The optimization caches stripped = line.strip() in the min_indent loop, eliminating redundant string operations.

  2. Replaced min() with conditional check: Changed min_indent = min(min_indent, indent) to if indent < min_indent: min_indent = indent, avoiding function call overhead in the tight loop.

  3. Eliminated branch logic in dedenting: The original code had separate loops for class methods vs top-level functions. The optimization unifies this by precomputing dedent_amount once, then using a single conditional to choose between dedenting or direct copying.

  4. Method reference caching: append = result_lines.append avoids repeated attribute lookups in the result building loop.

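  5. Optimized slicing logic: Removed the len(line) > dedent_amount check. Every non-blank line is longer than dedent_amount, because dedent_amount never exceeds that line's own indentation, so the check always passed. Even in general, slicing past the end of a Python string returns an empty string rather than raising.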
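
The five points above can be sketched as a standalone function. This is a reconstruction from the description only, not the actual diff in this PR: the name `dedent_function_lines` and the `in_class` flag (standing in for `self.current_class_stack`) are hypothetical, and using a single `lstrip()` per line is one assumed way to avoid the repeated strip calls.

```python
from __future__ import annotations


def dedent_function_lines(func_lines: list[str], in_class: bool = False) -> str:
    """Hypothetical sketch of the optimized dedenting step (not the PR's code)."""
    # Pass 1: smallest indentation among non-blank lines.
    min_indent = float("inf")
    for line in func_lines:
        stripped = line.lstrip()  # one strip call per line; "" for blank lines
        if stripped:
            indent = len(line) - len(stripped)
            if indent < min_indent:  # plain comparison instead of min()
                min_indent = indent
    if min_indent == float("inf"):
        min_indent = 0  # assumption: no non-blank lines means nothing to dedent

    # Compute the dedent amount once; methods keep 4 spaces of indentation.
    dedent_amount = max(0, min_indent - 4) if in_class else min_indent
    if dedent_amount == 0:
        return "".join(func_lines).rstrip()  # direct copy, no per-line work

    # Pass 2: single loop shared by methods and top-level functions.
    result_lines: list[str] = []
    append = result_lines.append  # cache the bound method
    for line in func_lines:
        # Non-blank lines are always longer than dedent_amount, so no length
        # guard is needed; blank lines are copied unchanged.
        append(line[dedent_amount:] if line.strip() else line)
    return "".join(result_lines).rstrip()
```

For example, `dedent_function_lines(["        def m(self):\n", "            return 1\n"], in_class=True)` returns `"    def m(self):\n        return 1"`, matching the expected value in the extra-indentation method tests below.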

Performance Impact by Test Case:

  • Large functions benefit most: 500-line function shows 130% speedup, 999-line function shows 133% speedup
  • Basic functions: 19-38% speedup on typical small functions
  • Class methods: 19-25% speedup with preserved indentation logic
  • Edge cases: Maintain correctness while still gaining 20-32% performance

The optimizations primarily target the string processing bottlenecks (line stripping, indentation calculation, result building) which dominate runtime for functions with many lines, making this especially effective for large code extraction scenarios.
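
To see how the indentation scan behaves on a large input, a small `timeit` harness along these lines could be used. It is a sketch under stated assumptions: the two helper functions are hypothetical stand-ins for the original and optimized loop styles (the optimized one assumes a single `lstrip()` per line), the input mirrors the 500-line generated test, and absolute timings will vary by machine and Python version.

```python
import timeit

# Same shape as the 500-line function used in the generated tests.
func_lines = ["def bigfunc():\n"] + [f"    x{i} = {i}\n" for i in range(500)]


def min_indent_builtin(lines):
    """Original style: strip() check plus min() on every non-blank line."""
    min_indent = float("inf")
    for line in lines:
        if line.strip():
            indent = len(line) - len(line.lstrip())
            min_indent = min(min_indent, indent)
    return min_indent


def min_indent_compare(lines):
    """Optimized style (assumed): one lstrip() per line and a plain comparison."""
    min_indent = float("inf")
    for line in lines:
        stripped = line.lstrip()
        if stripped:
            indent = len(line) - len(stripped)
            if indent < min_indent:
                min_indent = indent
    return min_indent


for fn in (min_indent_builtin, min_indent_compare):
    # Best of 5 repeats of 200 calls each, reported in microseconds per call.
    best = min(timeit.repeat(lambda: fn(func_lines), number=200, repeat=5)) / 200
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```

Both helpers return the same minimum indentation, so any timing difference comes from the loop body alone.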

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 259 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 96.2% |
🌀 Generated Regression Tests and Runtime
import ast
import textwrap

# imports
import pytest
from codeflash.code_utils.code_extractor import FunctionCallFinder

# function to test
# (a copy of the _extract_source_code implementation is embedded in DummyFinder below)

class DummyFinder:
    """Minimal stub to mimic FunctionCallFinder context for testing _extract_source_code."""
    def __init__(self, source_lines, current_class_stack=None):
        self.source_lines = source_lines
        self.current_class_stack = current_class_stack if current_class_stack is not None else []

    def _extract_source_code(self, node: ast.FunctionDef) -> str:
        # Copy of the _extract_source_code method under test
        if not self.source_lines or not hasattr(node, "lineno"):
            # Fallback to ast.unparse if available (Python 3.9+)
            try:
                return ast.unparse(node)
            except AttributeError:
                return f"# Source code extraction not available for {node.name}"

        # Get the lines for this function
        start_line = node.lineno - 1  # Convert to 0-based index
        end_line = node.end_lineno if hasattr(node, "end_lineno") else len(self.source_lines)

        # Extract the function lines
        func_lines = self.source_lines[start_line:end_line]

        # Find the minimum indentation (excluding empty lines)
        min_indent = float("inf")
        for line in func_lines:
            if line.strip():  # Skip empty lines
                indent = len(line) - len(line.lstrip())
                min_indent = min(min_indent, indent)

        # If this is a method (inside a class), preserve one level of indentation
        if self.current_class_stack:
            # Keep 4 spaces of indentation for methods
            dedent_amount = max(0, min_indent - 4)
            result_lines = []
            for line in func_lines:
                if line.strip():  # Only dedent non-empty lines
                    result_lines.append(line[dedent_amount:] if len(line) > dedent_amount else line)
                else:
                    result_lines.append(line)
        else:
            # For top-level functions, remove all leading indentation
            result_lines = []
            for line in func_lines:
                if line.strip():  # Only dedent non-empty lines
                    result_lines.append(line[min_indent:] if len(line) > min_indent else line)
                else:
                    result_lines.append(line)

        return "".join(result_lines).rstrip()

# Helper to get ast.FunctionDef node from source
def get_funcdef_node(source, func_name):
    """Parse source and return the ast.FunctionDef node for func_name."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return node
    raise ValueError(f"Function {func_name} not found in AST.")

def get_method_node(source, class_name, method_name):
    """Parse source and return the ast.FunctionDef node for method_name in class_name."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for child in node.body:
                if isinstance(child, ast.FunctionDef) and child.name == method_name:
                    return child
    raise ValueError(f"Method {method_name} not found in class {class_name}.")

# --------------------------
# Basic Test Cases
# --------------------------

def test_simple_top_level_function():
    # Basic: Extract a simple top-level function with no indentation issues
    src = textwrap.dedent("""
    def foo():
        x = 1
        y = 2
        return x + y
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "foo")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.92μs -> 6.54μs (9.48% slower)

def test_function_with_blank_lines():
    # Basic: Function with blank lines inside
    src = textwrap.dedent("""
    def bar():
        x = 1

        y = 2

        return x + y
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "bar")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.81μs -> 6.19μs (6.14% slower)

def test_function_with_comments():
    # Basic: Function with comments
    src = textwrap.dedent("""
    def baz():
        # This is a comment
        x = 42  # Inline comment
        return x
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "baz")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.38μs -> 5.46μs (1.47% slower)


def test_function_with_decorator():
    # Basic: Function with a decorator
    src = textwrap.dedent("""
    @mydecorator
    def decorated():
        return 123
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "decorated")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.23μs -> 5.44μs (3.86% slower)

# --------------------------
# Edge Test Cases
# --------------------------

def test_method_in_class_preserves_indentation():
    # Edge: Method inside a class should keep 4 spaces indentation
    src = textwrap.dedent("""
    class MyClass:
        def method(self):
            x = 5
            return x
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_method_node(src, "MyClass", "method")
    finder = DummyFinder(lines, current_class_stack=["MyClass"])
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.66μs -> 5.79μs (2.26% slower)
    expected = "    def method(self):\n        x = 5\n        return x"

def test_nested_function():
    # Edge: Nested function should extract only the outer function's lines
    src = textwrap.dedent("""
    def outer():
        x = 1
        def inner():
            return x
        return inner()
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "outer")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 6.05μs -> 6.07μs (0.329% slower)

def test_function_with_no_body():
    # Edge: Function with only 'pass'
    src = textwrap.dedent("""
    def empty():
        pass
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "empty")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 4.51μs -> 4.65μs (3.03% slower)

def test_function_at_end_of_file_with_no_newline():
    # Edge: Function at end of file, no trailing newline
    src = "def endfunc():\n    return 99"
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "endfunc")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 4.38μs -> 4.34μs (0.922% faster)

def test_function_with_multiline_signature():
    # Edge: Function with a multi-line signature
    src = textwrap.dedent("""
    def long_signature(
        a, b,
        c, d
    ):
        return a + b + c + d
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "long_signature")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.95μs -> 6.19μs (3.88% slower)

def test_method_with_extra_indentation():
    # Edge: Method with extra indentation (should keep 4 spaces for method)
    src = textwrap.dedent("""
    class IndentClass:
            def method(self):
                x = 1
                return x
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_method_node(src, "IndentClass", "method")
    finder = DummyFinder(lines, current_class_stack=["IndentClass"])
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.66μs -> 5.85μs (3.26% slower)
    expected = "    def method(self):\n        x = 1\n        return x"

def test_function_with_only_docstring():
    # Edge: Function with only a docstring
    src = textwrap.dedent("""
    def docfunc():
        \"\"\"This is a docstring.\"\"\"
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "docfunc")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 4.50μs -> 4.51μs (0.222% slower)

def test_function_with_unicode_and_non_ascii():
    # Edge: Function with unicode/non-ASCII characters
    src = textwrap.dedent("""
    def unicode_func():
        s = "προγραμμα"
        return s
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "unicode_func")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.80μs -> 5.94μs (2.34% slower)

def test_function_with_leading_and_trailing_blank_lines():
    # Edge: Function with blank lines before/after body
    src = textwrap.dedent("""
    def spaced():
        
        x = 1
        
        return x
        
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "spaced")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.62μs -> 5.71μs (1.58% slower)
    expected = "def spaced():\n    \n    x = 1\n    \n    return x"

def test_function_with_missing_end_lineno(monkeypatch):
    # Edge: Simulate node with no end_lineno (older Python)
    src = textwrap.dedent("""
    def no_end():
        x = 1
        y = 2
        return x + y
    """).lstrip("\n")
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "no_end")
    # Remove end_lineno attribute
    if hasattr(node, "end_lineno"):
        delattr(node, "end_lineno")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.56μs -> 5.49μs (1.29% faster)
    # Should extract from start line to end of file
    expected = src.rstrip()

def test_function_with_no_source_lines(monkeypatch):
    # Edge: No source_lines, should fallback to ast.unparse or error message
    src = textwrap.dedent("""
    def fallback():
        return 1
    """).lstrip("\n")
    node = get_funcdef_node(src, "fallback")
    finder = DummyFinder(None)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 27.1μs -> 27.9μs (3.05% slower)
    # Accept either ast.unparse output or fallback string
    try:
        expected = ast.unparse(node)
    except AttributeError:
        expected = "# Source code extraction not available for fallback"

def test_function_with_no_lineno(monkeypatch):
    # Edge: Node without lineno attribute
    src = textwrap.dedent("""
    def noline():
        return 1
    """).lstrip("\n")
    node = get_funcdef_node(src, "noline")
    if hasattr(node, "lineno"):
        delattr(node, "lineno")
    finder = DummyFinder(src.splitlines(keepends=True))
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 17.0μs -> 17.5μs (2.86% slower)
    try:
        expected = ast.unparse(node)
    except AttributeError:
        expected = "# Source code extraction not available for noline"

# --------------------------
# Large Scale Test Cases
# --------------------------


def test_many_functions_in_file():
    # Large: File with 500 functions, extract one from the middle
    funcs = []
    for i in range(500):
        funcs.append(f"def func_{i}():\n    return {i}\n")
    src = "".join(funcs)
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "func_250")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.95μs -> 5.97μs (0.335% slower)
    expected = f"def func_250():\n    return 250"

def test_large_class_with_many_methods():
    # Large: Class with 100 methods, extract one
    methods = []
    for i in range(100):
        methods.append(f"    def method_{i}(self):\n        return {i}\n")
    src = "class BigClass:\n" + "".join(methods)
    lines = src.splitlines(keepends=True)
    node = get_method_node(src, "BigClass", "method_42")
    finder = DummyFinder(lines, current_class_stack=["BigClass"])
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.78μs -> 5.92μs (2.36% slower)
    expected = "    def method_42(self):\n        return 42"

def test_large_function_with_blank_lines_and_comments():
    # Large: Function with 100 lines, blank lines and comments
    body = ""
    for i in range(50):
        body += f"    # comment {i}\n    x{i} = {i}\n\n"
    src = f"def big_comments():\n{body}    return 0\n"
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "big_comments")
    finder = DummyFinder(lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 42.8μs -> 42.4μs (0.921% faster)

def test_large_file_with_mixed_functions_and_classes():
    # Large: File with many functions and classes, extract from both
    funcs = []
    for i in range(200):
        funcs.append(f"def func_{i}():\n    return {i}\n")
    methods = []
    for i in range(50):
        methods.append(f"    def method_{i}(self):\n        return {i}\n")
    class_src = "class MixedClass:\n" + "".join(methods)
    src = "".join(funcs) + class_src
    lines = src.splitlines(keepends=True)
    # Extract a function
    node_func = get_funcdef_node(src, "func_100")
    finder_func = DummyFinder(lines)
    codeflash_output = finder_func._extract_source_code(node_func); result_func = codeflash_output # 5.49μs -> 5.74μs (4.35% slower)
    expected_func = "def func_100():\n    return 100"
    # Extract a method
    node_method = get_method_node(src, "MixedClass", "method_25")
    finder_method = DummyFinder(lines, current_class_stack=["MixedClass"])
    codeflash_output = finder_method._extract_source_code(node_method); result_method = codeflash_output # 4.90μs -> 4.90μs (0.000% faster)
    expected_method = "    def method_25(self):\n        return 25"
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import ast
import textwrap

# imports
import pytest
from codeflash.code_utils.code_extractor import FunctionCallFinder


# Helper to create AST nodes with lineno and end_lineno
def get_funcdef_node(source: str, func_name: str):
    """
    Parse the source and return the ast.FunctionDef node for the given function name.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return node
    raise ValueError(f"Function {func_name} not found in AST.")

# -------------------- UNIT TESTS --------------------

# 1. BASIC TEST CASES

def test_extract_simple_top_level_function():
    # Simple function, no indentation issues
    src = textwrap.dedent("""
    def foo():
        x = 1
        y = 2
        return x + y
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "foo")
    finder = FunctionCallFinder("foo", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 6.03μs -> 4.82μs (25.1% faster)
    expected = "def foo():\n    x = 1\n    y = 2\n    return x + y"

def test_extract_function_with_blank_lines():
    # Function with blank lines inside and at the end
    src = textwrap.dedent("""
    def bar():
        a = 1

        b = 2

        return a + b

    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "bar")
    finder = FunctionCallFinder("bar", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 6.15μs -> 4.70μs (30.9% faster)
    expected = "def bar():\n    a = 1\n\n    b = 2\n\n    return a + b"

def test_extract_function_with_docstring():
    # Function with a docstring
    src = textwrap.dedent('''
    def baz():
        """This is a docstring."""
        return 42
    ''').lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "baz")
    finder = FunctionCallFinder("baz", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.06μs -> 4.05μs (25.0% faster)
    expected = 'def baz():\n    """This is a docstring."""\n    return 42'

def test_extract_function_with_multiline_statement():
    # Function with a multiline statement
    src = textwrap.dedent('''
    def qux():
        total = (
            1 +
            2 +
            3
        )
        return total
    ''').lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "qux")
    finder = FunctionCallFinder("qux", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 6.79μs -> 4.91μs (38.4% faster)
    expected = 'def qux():\n    total = (\n        1 +\n        2 +\n        3\n    )\n    return total'

def test_extract_function_with_decorator():
    # Function with a decorator
    src = textwrap.dedent('''
    @mydecorator
    def dec():
        pass
    ''').lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "dec")
    finder = FunctionCallFinder("dec", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 4.44μs -> 3.72μs (19.4% faster)
    expected = '@mydecorator\ndef dec():\n    pass'

# 2. EDGE TEST CASES

def test_extract_function_with_tabs_and_spaces():
    # Function indented with tabs and spaces
    src = "def mix():\n\tval = 1\n\treturn val\n"
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "mix")
    finder = FunctionCallFinder("mix", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 4.82μs -> 4.01μs (20.3% faster)
    # Tabs are preserved as per original source
    expected = "def mix():\n\tval = 1\n\treturn val"

def test_extract_function_with_no_body():
    # Function with only a 'pass' statement
    src = textwrap.dedent("""
    def empty():
        pass
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "empty")
    finder = FunctionCallFinder("empty", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 4.39μs -> 3.62μs (21.3% faster)
    expected = "def empty():\n    pass"

def test_extract_function_with_inner_function():
    # Function contains a nested function
    src = textwrap.dedent("""
    def outer():
        def inner():
            return 1
        return inner()
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "outer")
    finder = FunctionCallFinder("outer", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.32μs -> 4.17μs (27.6% faster)
    expected = "def outer():\n    def inner():\n        return 1\n    return inner()"

def test_extract_method_in_class():
    # Method inside a class, should preserve 4 spaces of indentation
    src = textwrap.dedent("""
    class MyClass:
        def method(self):
            x = 5
            return x
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    # Find the method node
    tree = ast.parse(src)
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef) and n.name == "MyClass")
    method_node = next(n for n in class_node.body if isinstance(n, ast.FunctionDef) and n.name == "method")
    finder = FunctionCallFinder("method", "dummy.py", lines)
    finder.current_class_stack.append("MyClass")  # Simulate being inside a class
    codeflash_output = finder._extract_source_code(method_node); result = codeflash_output # 5.36μs -> 4.49μs (19.4% faster)
    expected = "    def method(self):\n        x = 5\n        return x"

def test_extract_method_with_extra_indentation():
    # Method inside a class, but with extra indentation (e.g., 8 spaces)
    src = textwrap.dedent("""
    class MyClass:
            def weird_indent(self):
                return 123
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    tree = ast.parse(src)
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef) and n.name == "MyClass")
    method_node = next(n for n in class_node.body if isinstance(n, ast.FunctionDef) and n.name == "weird_indent")
    finder = FunctionCallFinder("weird_indent", "dummy.py", lines)
    finder.current_class_stack.append("MyClass")
    codeflash_output = finder._extract_source_code(method_node); result = codeflash_output # 4.86μs -> 4.76μs (2.10% faster)
    # Should still preserve 4 spaces for class method
    expected = "    def weird_indent(self):\n        return 123"

def test_extract_function_with_comment_lines():
    # Function with comments and blank lines
    src = textwrap.dedent("""
    def commented():
        # This is a comment
        x = 1  # inline comment

        # Another comment
        return x
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "commented")
    finder = FunctionCallFinder("commented", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 6.15μs -> 4.46μs (38.0% faster)
    expected = "def commented():\n    # This is a comment\n    x = 1  # inline comment\n\n    # Another comment\n    return x"

def test_extract_function_with_leading_trailing_blank_lines():
    # Function with extra blank lines before and after
    src = "\n\n" + textwrap.dedent("""
    def spaced():
        x = 1

        return x
    """) + "\n\n"
    src = src.lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "spaced")
    finder = FunctionCallFinder("spaced", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.23μs -> 3.97μs (31.8% faster)
    expected = "def spaced():\n    x = 1\n\n    return x"

def test_extract_function_with_non_ascii_characters():
    # Function with Unicode characters
    src = textwrap.dedent("""
    def greet():
        name = "世界"
        return f"Hello, {name}"
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "greet")
    finder = FunctionCallFinder("greet", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.81μs -> 4.69μs (23.9% faster)
    expected = 'def greet():\n    name = "世界"\n    return f"Hello, {name}"'

def test_extract_function_with_missing_end_lineno(monkeypatch):
    # Simulate a node without end_lineno attribute (old Python or synthetic node)
    src = textwrap.dedent("""
    def foo():
        x = 1
        y = 2
        return x + y
    """).lstrip('\n')
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "foo")
    # Remove end_lineno attribute if present
    if hasattr(node, "end_lineno"):
        delattr(node, "end_lineno")
    finder = FunctionCallFinder("foo", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 5.67μs -> 4.28μs (32.5% faster)
    # Should extract from start_line to end of file
    expected = "def foo():\n    x = 1\n    y = 2\n    return x + y"

def test_extract_source_code_no_source_lines(monkeypatch):
    # Simulate missing source_lines: should fallback to ast.unparse or error message
    src = textwrap.dedent("""
    def fallback():
        x = 1
        return x
    """).lstrip('\n')
    node = get_funcdef_node(src, "fallback")
    finder = FunctionCallFinder("fallback", "dummy.py", None)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 29.4μs -> 29.2μs (0.757% faster)
    # If ast.unparse is available, should match; otherwise, fallback string
    try:
        expected = ast.unparse(node)
    except AttributeError:
        expected = "# Source code extraction not available for fallback"

def test_extract_source_code_no_lineno(monkeypatch):
    # Simulate node with no lineno attribute
    src = textwrap.dedent("""
    def fallback():
        x = 1
        return x
    """).lstrip('\n')
    node = get_funcdef_node(src, "fallback")
    if hasattr(node, "lineno"):
        delattr(node, "lineno")
    finder = FunctionCallFinder("fallback", "dummy.py", ["dummy"])
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 16.6μs -> 16.8μs (1.13% slower)
    try:
        expected = ast.unparse(node)
    except AttributeError:
        expected = "# Source code extraction not available for fallback"

# 3. LARGE SCALE TEST CASES

def test_extract_large_function():
    # Function with 500 lines
    body = "\n".join([f"    x{i} = {i}" for i in range(500)])
    src = f"def bigfunc():\n{body}\n    return x499\n"
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "bigfunc")
    finder = FunctionCallFinder("bigfunc", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 156μs -> 67.9μs (130% faster)
    # Check that the first and last lines are correct, and the total number of lines
    result_lines = result.splitlines()

def test_extract_many_functions():
    # 100 functions in one file
    src = ""
    for i in range(100):
        src += f"def func{i}():\n    return {i}\n\n"
    lines = src.splitlines(keepends=True)
    for i in range(100):
        node = get_funcdef_node(src, f"func{i}")
        finder = FunctionCallFinder(f"func{i}", "dummy.py", lines)
        codeflash_output = finder._extract_source_code(node); result = codeflash_output # 253μs -> 202μs (25.0% faster)
        expected = f"def func{i}():\n    return {i}"

def test_extract_large_method_in_class():
    # Method with 300 lines in a class
    body = "\n".join([f"        z{i} = {i}" for i in range(300)])
    src = f"class Big:\n    def m(self):\n{body}\n        return z299\n"
    lines = src.splitlines(keepends=True)
    tree = ast.parse(src)
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef) and n.name == "Big")
    method_node = next(n for n in class_node.body if isinstance(n, ast.FunctionDef) and n.name == "m")
    finder = FunctionCallFinder("m", "dummy.py", lines)
    finder.current_class_stack.append("Big")
    codeflash_output = finder._extract_source_code(method_node); result = codeflash_output # 98.5μs -> 45.0μs (119% faster)
    result_lines = result.splitlines()

def test_extract_source_code_performance():
    # Stress test: 999-line function, check that it works and is reasonably fast
    body = "\n".join([f"    a{i} = {i}" for i in range(999)])
    src = f"def huge():\n{body}\n    return a998\n"
    lines = src.splitlines(keepends=True)
    node = get_funcdef_node(src, "huge")
    finder = FunctionCallFinder("huge", "dummy.py", lines)
    codeflash_output = finder._extract_source_code(node); result = codeflash_output # 304μs -> 130μs (133% faster)
    result_lines = result.splitlines()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-pr824-2025-10-17T00.51.58 and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 17, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr824-2025-10-17T00.51.58 branch October 17, 2025 00:58