@codeflash-ai codeflash-ai bot commented Oct 20, 2025

📄 295% (2.95x) speedup for `FunctionCallFinder._extract_source_code` in `codeflash/code_utils/code_extractor.py`

⏱️ Runtime : 652 microseconds → 165 microseconds (best of 45 runs)

📝 Explanation and details

The optimized code achieves a **295% speedup** through several key optimizations in the `_extract_source_code` method:

**1. Early Exit Optimization in Min Indent Finding**
The most significant improvement is adding an early exit when `min_indent` reaches 0:

```python
if min_indent == 0:
    break  # Early exit if we hit zero (can't get lower)
```

This eliminates unnecessary iterations when processing functions with no indentation, which is common for top-level functions.
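For context, here is a minimal, self-contained sketch of such a min-indent loop (the helper name is hypothetical; the real method's surrounding logic is not shown in this comment):

```python
def min_indent_of(lines):
    """Smallest leading-whitespace width across non-blank lines (sketch)."""
    min_indent = float("inf")
    for line in lines:
        stripped = line.strip()  # cache the strip result
        if stripped:
            indent = len(line) - len(line.lstrip())
            if indent < min_indent:
                min_indent = indent
                if min_indent == 0:
                    break  # early exit: indentation can't go below zero
    return 0 if min_indent == float("inf") else min_indent
```

For a top-level function, the very first non-blank line already has indent 0, so the loop exits after a single iteration regardless of how long the function is.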

**2. Caching Strip Operation**
Instead of calling `line.strip()` multiple times, the optimized version caches it:

```python
stripped = line.strip()
if stripped:
```

This reduces redundant string operations during the min indent calculation loop.

**3. Conditional List Comprehension with Fallbacks**
The optimized code replaces nested loops with list comprehensions, but only when beneficial:

```python
if dedent_amount > 0:
    result_lines = [line[dedent_amount:] if line.strip() and len(line) > dedent_amount else line
                    for line in func_lines]
else:
    result_lines = list(func_lines)
```

When no processing is needed, it uses the faster `list(func_lines)` instead of applying transformations.
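A self-contained sketch of that dedent step (the helper name is assumed for illustration):

```python
def dedent_lines(func_lines, dedent_amount):
    """Remove `dedent_amount` leading characters from each non-blank,
    sufficiently long line; keep other lines unchanged (sketch)."""
    if dedent_amount > 0:
        return [
            line[dedent_amount:] if line.strip() and len(line) > dedent_amount else line
            for line in func_lines
        ]
    # Nothing to strip: a shallow copy avoids per-line work entirely.
    return list(func_lines)
```

For example, `dedent_lines(["    def f():\n", "        return 1\n"], 4)` yields `["def f():\n", "    return 1\n"]`, while blank lines pass through untouched.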

**Performance Impact by Test Case:**

- **Large functions** see the biggest gains (958% faster for a 1000-line function, 870% for functions with many blank lines)
- **Small functions** show moderate improvements (16-56% faster)
- **Unicode content** benefits significantly (405% faster) due to reduced string operations
- **Class methods** see smaller gains since they have more complex indentation logic

The optimizations are particularly effective for large codebases and functions with minimal indentation (top-level functions), which are common in real-world code analysis scenarios.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 32 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 96.2% |
🌀 Generated Regression Tests and Runtime
import ast

# imports
import pytest
from codeflash.code_utils.code_extractor import FunctionCallFinder

# unit tests

# Helper to get ast.FunctionDef node from source code
def get_function_node(source: str, func_name: str):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return node
    raise ValueError(f"Function '{func_name}' not found in AST.")

# Basic Test Cases

def test_basic_top_level_function():
    # Simple top-level function
    source = [
        "def foo(x, y):\n",
        "    return x + y\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.67μs -> 3.62μs (28.7% faster)

def test_basic_method_in_class():
    # Method inside a class, indented
    source = [
        "class Bar:\n",
        "    def baz(self, a):\n",
        "        return a * 2\n"
    ]
    node = get_function_node("".join(source), "baz")
    finder = FunctionCallFinder("baz", "dummy.py", source)
    finder.current_class_stack.append("Bar")  # Simulate being inside a class
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.79μs -> 4.12μs (16.2% faster)

def test_basic_function_with_docstring():
    source = [
        "def foo():\n",
        "    \"\"\"This is a docstring.\"\"\"\n",
        "    return 42\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.67μs -> 3.12μs (49.3% faster)

def test_basic_function_with_blank_lines():
    source = [
        "def foo():\n",
        "\n",
        "    x = 1\n",
        "\n",
        "    return x\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.88μs -> 3.12μs (56.0% faster)

# Edge Test Cases


def test_function_with_mixed_indentation():
    # Function with mixed indentation (tabs and spaces)
    source = [
        "def foo():\n",
        "\treturn 1\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.92μs -> 5.42μs (9.23% faster)

def test_function_with_empty_body():
    source = [
        "def foo():\n",
        "    pass\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 4.21μs -> 3.17μs (32.9% faster)

def test_function_with_comments_and_blank_lines():
    source = [
        "def foo():\n",
        "    # Comment line\n",
        "    x = 1\n",
        "    # Another comment\n",
        "    return x\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.79μs -> 3.17μs (82.9% faster)

def test_function_with_no_source_lines():
    # Should fallback to ast.unparse if source_lines is empty
    source = [
        "def foo(a):\n",
        "    return a + 1\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", [])
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 46.5μs -> 46.2μs (0.631% faster)

def test_function_without_lineno():
    # Remove lineno attribute to simulate missing info
    source = [
        "def foo():\n",
        "    return 1\n"
    ]
    node = get_function_node("".join(source), "foo")
    del node.lineno  # Remove lineno
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 15.5μs -> 15.7μs (1.07% slower)

def test_method_in_class_with_extra_indentation():
    # Method with extra indentation inside class
    source = [
        "class Foo:\n",
        "        def bar(self):\n",
        "            return 123\n"
    ]
    node = get_function_node("".join(source), "bar")
    finder = FunctionCallFinder("bar", "dummy.py", source)
    finder.current_class_stack.append("Foo")
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.50μs -> 5.67μs (2.95% slower)


def test_function_with_decorator():
    source = [
        "@staticmethod\n",
        "def bar():\n",
        "    return 42\n"
    ]
    node = get_function_node("".join(source), "bar")
    finder = FunctionCallFinder("bar", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 6.00μs -> 4.92μs (22.1% faster)

def test_function_with_unicode_and_non_ascii():
    source = [
        "def foo():\n",
        "    s = '你好, мир!'\n",
        "    return s\n"
    ]
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.75μs -> 3.71μs (55.0% faster)

# Large Scale Test Cases

def test_large_function_extraction():
    # Function with 1000 lines
    source = ["def big():\n"]
    for i in range(1, 1001):
        source.append(f"    x{i} = {i}\n")
    source.append("    return x1000\n")
    node = get_function_node("".join(source), "big")
    finder = FunctionCallFinder("big", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 297μs -> 28.2μs (958% faster)

def test_large_class_with_many_methods():
    # Class with 500 methods, extract one
    source = ["class BigClass:\n"]
    for i in range(1, 501):
        source.append(f"    def method_{i}(self):\n")
        source.append(f"        return {i}\n")
    node = get_function_node("".join(source), "method_250")
    finder = FunctionCallFinder("method_250", "dummy.py", source)
    finder.current_class_stack.append("BigClass")
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 5.96μs -> 6.42μs (7.14% slower)

def test_large_function_with_blank_lines_and_comments():
    # Function with many blank lines and comments
    source = ["def foo():\n"]
    for i in range(500):
        source.append("    # comment\n")
        source.append("\n")
    source.append("    return 1\n")
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 187μs -> 19.3μs (870% faster)


def test_large_function_with_unicode_lines():
    # Function with many lines containing unicode
    source = ["def foo():\n"]
    for i in range(100):
        source.append(f"    s{i} = '你好{i}'\n")
    source.append("    return s99\n")
    node = get_function_node("".join(source), "foo")
    finder = FunctionCallFinder("foo", "dummy.py", source)
    codeflash_output = finder._extract_source_code(node); extracted = codeflash_output # 46.7μs -> 9.25μs (405% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-FunctionCallFinder._extract_source_code-mgzs0zww` and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 October 20, 2025 23:38
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 20, 2025
@KRRT7 KRRT7 closed this Oct 20, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-FunctionCallFinder._extract_source_code-mgzs0zww branch October 20, 2025 23:42
