@codeflash-ai codeflash-ai bot commented Oct 20, 2025

📄 295% (2.95x) speedup for `FunctionCallFinder._extract_source_code` in `codeflash/code_utils/code_extractor.py`

⏱️ Runtime : 652 microseconds → 165 microseconds (best of 45 runs)

📝 Explanation and details

The optimized code achieves a **295% speedup** through several key optimizations in the `_extract_source_code` method:

**1. Early Exit Optimization in Min Indent Finding**
The most significant improvement is adding an early exit when `min_indent` reaches 0:

```python
if min_indent == 0:
    break  # Early exit if we hit zero (can't get lower)
```

This eliminates unnecessary iterations when processing functions with no indentation, which is common for top-level functions.
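For context, here is a minimal, self-contained sketch of such a min-indent loop (the helper name is hypothetical; the real method's surrounding logic is not shown in this comment):

```python
def min_indent_of(lines):
    """Smallest leading-whitespace width across non-blank lines (sketch)."""
    min_indent = float("inf")
    for line in lines:
        stripped = line.strip()  # cache the strip result
        if stripped:
            indent = len(line) - len(line.lstrip())
            if indent < min_indent:
                min_indent = indent
                if min_indent == 0:
                    break  # early exit: indentation can't go below zero
    return 0 if min_indent == float("inf") else min_indent
```

For a top-level function, the very first non-blank line already has indent 0, so the loop exits after a single iteration regardless of how long the function is.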

**2. Caching Strip Operation**
Instead of calling `line.strip()` multiple times, the optimized version caches it:

```python
stripped = line.strip()
if stripped:
```

This reduces redundant string operations during the min indent calculation loop.

**3. Conditional List Comprehension with Fallbacks**
The optimized code replaces nested loops with list comprehensions, but only when beneficial:

```python
if dedent_amount > 0:
    result_lines = [line[dedent_amount:] if line.strip() and len(line) > dedent_amount else line
                    for line in func_lines]
else:
    result_lines = list(func_lines)
```

When no processing is needed, it uses the faster `list(func_lines)` instead of applying transformations.
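A self-contained sketch of that dedent step (the helper name is assumed for illustration):

```python
def dedent_lines(func_lines, dedent_amount):
    """Remove `dedent_amount` leading characters from each non-blank,
    sufficiently long line; keep other lines unchanged (sketch)."""
    if dedent_amount > 0:
        return [
            line[dedent_amount:] if line.strip() and len(line) > dedent_amount else line
            for line in func_lines
        ]
    # Nothing to strip: a shallow copy avoids per-line work entirely.
    return list(func_lines)
```

For example, `dedent_lines(["    def f():\n", "        return 1\n"], 4)` yields `["def f():\n", "    return 1\n"]`, while blank lines pass through untouched.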

**Performance Impact by Test Case:**

- **Large functions** see the biggest gains (958% faster for a 1000-line function, 870% for functions with many blank lines)
- **Small functions** show moderate improvements (16-56% faster)
- **Unicode content** benefits significantly (405% faster) due to reduced string operations
- **Class methods** see smaller gains since they have more complex indentation logic

The optimizations are particularly effective for large codebases and functions with minimal indentation (top-level functions), which are common in real-world code analysis scenarios.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 32 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 96.2% |
🌀 Generated Regression Tests and Runtime
import ast

# imports
import pytest
from codeflash.code_utils.code_extractor import FunctionCallFinder

# unit tests

# Helper to get ast.FunctionDef node from source code
def get_function_node(source: str, func_name: str):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return node
    raise ValueError(f"Function '{func_name}' not found in AST.")

# Basic Test Cases

def test_basic_top_level_function():
    # Simple top-level function
    source = [
        "def foo(x, y):\n",
        "    return x + y\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.67μs -> 3.62μs (28.7% faster)

def test_basic_method_in_class():
    # Method inside a class, indented
    source = [
        "class Bar:\n",
        "    def baz(self, a):\n",
        "        return a * 2\n"
    ]
    node = get_function_node("".join(source), "baz")
    finder = FunctionCallFinder("baz", "dummy.py", source)
    finder.current_class_stack.append("Bar")  # Simulate being inside a class
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.79μs -> 4.12μs (16.2% faster)

def test_basic_function_with_docstring():
    source = [
        "def foo():\n",
        "    \"\"\"This is a docstring.\"\"\"\n",
        "    return 42\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.67μs -> 3.12μs (49.3% faster)

def test_basic_function_with_blank_lines():
    source = [
        "def foo():\n",
        "\n",
        "    x = 1\n",
        "\n",
        "    return x\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.88μs -> 3.12μs (56.0% faster)

# Edge Test Cases


def test_function_with_mixed_indentation():
    # Function with mixed indentation (tabs and spaces)
    source = [
        "def foo():\n",
        "\treturn 1\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.92μs -> 5.42μs (9.23% faster)

def test_function_with_empty_body():
    source = [
        "def foo():\n",
        "    pass\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.21μs -> 3.17μs (32.9% faster)

def test_function_with_comments_and_blank_lines():
    source = [
        "def foo():\n",
        "    # Comment line\n",
        "    x = 1\n",
        "    # Another comment\n",
        "    return x\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.79μs -> 3.17μs (82.9% faster)

def test_function_with_no_source_lines():
    # Should fallback to ast.unparse if source_lines is empty
    source = [
        "def foo(a):\n",
        "    return a + 1\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", [])
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 46.5μs -> 46.2μs (0.631% faster)

def test_function_without_lineno():
    # Remove lineno attribute to simulate missing info
    source = [
        "def foo():\n",
        "    return 1\n"
    ]
    node = get_function_node("".join(source), "foo")
    del node.lineno  # Remove lineno
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 15.5μs -> 15.7μs (1.07% slower)

def test_method_in_class_with_extra_indentation():
    # Method with extra indentation inside class
    source = [
        "class Foo:\n",
        "        def bar(self):\n",
        "            return 123\n"
    ]
    node = get_function_node("".join(source), "bar")
    finder = FunctionCallFinder("bar", "dummy.py", source)
    finder.current_class_stack.append("Foo")
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.50μs -> 5.67μs (2.95% slower)


def test_function_with_decorator():
    source = [
        "@staticmethod\n",
        "def bar():\n",
        "    return 42\n"
    ]
    node = get_function_node("".join(source), "bar")
    finder = FunctionCallFinder("bar", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 6.00μs -> 4.92μs (22.1% faster)

def test_function_with_unicode_and_non_ascii():
    source = [
        "def foo():\n",
        "    s = '你好, мир!'\n",
        "    return s\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.75μs -> 3.71μs (55.0% faster)

# Large Scale Test Cases

def test_large_function_extraction():
    # Function with 1000 lines
    source = ["def big():\n"]
    for i in range(1, 1001):
        source.append(f"    x{i} = {i}\n")
    source.append("    return x1000\n")
    node = get_function_node("".join(source), "big")
    finder = FunctionCallFinder("big", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 297μs -> 28.2μs (958% faster)

def test_large_class_with_many_methods():
    # Class with 500 methods, extract one
    source = ["class BigClass:\n"]
    for i in range(1, 501):
        source.append(f"    def method_{i}(self):\n")
        source.append(f"        return {i}\n")
    node = get_function_node("".join(source), "method_250")
    finder = FunctionCallFinder("method_250", "dummy.py", source)
    finder.current_class_stack.append("BigClass")
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.96μs -> 6.42μs (7.14% slower)

def test_large_function_with_blank_lines_and_comments():
    # Function with many blank lines and comments
    source = ["def foo():\n"]
    for i in range(500):
        source.append("    # comment\n")
        source.append("\n")
    source.append("    return 1\n")
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 187μs -> 19.3μs (870% faster)


def test_large_function_with_unicode_lines():
    # Function with many lines containing unicode
    source = ["def foo():\n"]
    for i in range(100):
        source.append(f"    s{i} = '你好{i}'\n")
    source.append("    return s99\n")
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 46.7μs -> 9.25μs (405% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-FunctionCallFinder._extract_source_code-mgzs0zww` and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 October 20, 2025 23:38
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 20, 2025
@KRRT7 KRRT7 closed this Oct 20, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-FunctionCallFinder._extract_source_code-mgzs0zww branch October 20, 2025 23:42
