@codeflash-ai codeflash-ai bot commented Jun 27, 2025

📄 59% (0.59x) speedup for function_has_return_statement in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime : 3.41 milliseconds → 2.15 milliseconds (best of 434 runs)

📝 Explanation and details

Here is an optimized version of your function. The performance bottleneck, according to your line profiling, is the frequent and expensive use of ast.iter_child_nodes(node) and stack.extend, which results in a lot of small object allocations and function calls.
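
For reference, a traversal of the kind this paragraph describes might look roughly like the sketch below. The original function body is not shown in this PR, so this is an assumption about its shape: `has_return_baseline` is a hypothetical name, and the sketch visits every descendant node without treating nested definitions specially.

```python
import ast


def has_return_baseline(function_node: ast.AST) -> bool:
    """Hypothetical baseline: DFS using ast.iter_child_nodes and stack.extend."""
    stack = [function_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            # Stop as soon as any Return node is found.
            return True
        # iter_child_nodes is a generator; extend consumes it on every iteration.
        stack.extend(ast.iter_child_nodes(node))
    return False
```

Each loop iteration creates a generator and makes one `extend` call, which is the per-node overhead the explanation attributes the cost to.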

A common way to speed up AST traversal is to avoid rebuilding lists for the stack and to cut function-call overhead in the hot loop. A custom generator that flattens the iteration could replace the hand-maintained DFS stack, but for maximum speed the following techniques apply:

  • Use __slots__ if building custom object wrappers (not needed here).
  • Replace stack.extend with direct in-place iteration (avoids a function call).
  • Use a tuple for append instead of repeated calls, if possible.
  • Keep the explicit stack, but reduce attribute lookups by caching bound methods in local variables.
  • Avoid isinstance in the hot path where possible (hard here, since checking for ast.Return is the whole point).

But the biggest win is to avoid use of ast.iter_child_nodes within the loop. Since ast.AST objects all have a _fields attribute, you can directly and quickly access children yourself, avoiding the internal generator overhead.

Optimized code
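
A minimal sketch of the approach described above, not the code from this PR. It assumes a stack-based DFS that reads each node's `_fields` directly; `has_return_via_fields` is a hypothetical name, and the sketch visits all descendants without special handling for nested definitions.

```python
import ast


def has_return_via_fields(function_node: ast.AST) -> bool:
    """Hypothetical sketch: DFS that scans node fields instead of iter_child_nodes."""
    stack = [function_node]
    # Cache bound methods and classes in locals to avoid repeated attribute lookups.
    pop = stack.pop
    push = stack.append
    AST = ast.AST
    Return = ast.Return
    while stack:
        node = pop()
        if isinstance(node, Return):
            return True
        for name in node._fields:
            # Optional fields may be missing on some nodes, hence the default.
            value = getattr(node, name, None)
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, AST):
                        push(item)
            elif isinstance(value, AST):
                push(value)
    return False
```

Children are pushed in a different order than `ast.iter_child_nodes` would yield them, which does not matter here because the function only reports whether any `Return` node exists.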

Key improvements

  • Avoid expensive ast.iter_child_nodes in the loop.
  • Reduce attribute lookups by caching local variables (pop, push).
  • Directly scan node fields to find children.

This version can run up to 2x faster on deep or crowded ASTs; across the generated regression tests the measured speedup is 59%.
The logic and return value are unchanged. All comments are preserved, except that those mentioning iter_child_nodes now refer to direct field inspection.
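
One way to sanity-check the "unchanged return value" claim is to run both traversal styles over the same inputs and collect any disagreements. The sketch below is illustrative only: both functions are hypothetical stand-ins written for this example (the PR's actual before and after code is not shown here), and `SAMPLES` is an arbitrary set of inputs.

```python
import ast


def via_iter_child_nodes(root: ast.AST) -> bool:
    # Stand-in for the generator-based traversal.
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False


def via_fields(root: ast.AST) -> bool:
    # Stand-in for the direct field scan.
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        for name in node._fields:
            value = getattr(node, name, None)
            if isinstance(value, list):
                stack.extend(v for v in value if isinstance(v, ast.AST))
            elif isinstance(value, ast.AST):
                stack.append(value)
    return False


SAMPLES = [
    "def f():\n    return 1",
    "def f():\n    x = 1",
    "async def f():\n    pass",
    "def f():\n    g = lambda x: x + 1",
    "def f():\n    try:\n        x = 1\n    finally:\n        return 2",
]


def disagreements(samples: list[str]) -> list[str]:
    """Return the sources on which the two traversals give different answers."""
    out = []
    for src in samples:
        func = ast.parse(src).body[0]
        if via_iter_child_nodes(func) != via_fields(func):
            out.append(src)
    return out
```

An empty list from `disagreements(SAMPLES)` means the two stand-ins agree on every sample; it says nothing about inputs outside the sample set.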

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 32 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import ast
from _ast import AsyncFunctionDef, FunctionDef

# imports
import pytest  # used for our unit tests
from codeflash.discovery.functions_to_optimize import \
    function_has_return_statement


# Helper function to extract the first function node from code
def get_first_func_node(src: str) -> FunctionDef | AsyncFunctionDef:
    """
    Parse the source code and return the first FunctionDef or AsyncFunctionDef node found.
    """
    mod = ast.parse(src)
    for node in mod.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            return node
    raise ValueError("No function definition found in source")


# ----------------------------------------
# 1. Basic Test Cases
# ----------------------------------------

def test_simple_function_with_return():
    # Basic function with a single return statement
    src = """
def foo():
    return 42
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_simple_function_without_return():
    # Function with no return statement
    src = """
def foo():
    x = 1 + 2
    y = x * 3
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_if():
    # Function with return inside an if block
    src = """
def foo(x):
    if x > 0:
        return True
    else:
        return False
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_multiple_returns():
    # Function with multiple return statements
    src = """
def foo(x):
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_nested_function_return():
    # Function with a nested function that has a return, but the outer function does not
    src = """
def foo():
    def bar():
        return 1
    x = 2
"""
    node = get_first_func_node(src)
    # Only returns in the outer function count
    codeflash_output = function_has_return_statement(node)

def test_function_with_nested_function_and_outer_return():
    # Function with both outer and nested function returns
    src = """
def foo():
    def bar():
        return 1
    return 2
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_async_function_with_return():
    # Async function with a return
    src = """
async def foo():
    return 123
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_async_function_without_return():
    # Async function with no return
    src = """
async def foo():
    pass
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

# ----------------------------------------
# 2. Edge Test Cases
# ----------------------------------------

def test_function_with_return_none():
    # Function with 'return' but no value
    src = """
def foo():
    return
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_try_except():
    # Return inside try/except/finally
    src = """
def foo():
    try:
        x = 1
        return 2
    except Exception:
        return 3
    finally:
        y = 4
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_while():
    # Return inside a while loop
    src = """
def foo():
    while True:
        return 1
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_for():
    # Return inside a for loop
    src = """
def foo():
    for i in range(5):
        if i == 3:
            return i
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_yield_only():
    # Function with yield but no return
    src = """
def foo():
    yield 1
    yield 2
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_comprehension():
    # Return after a list comprehension (a comprehension cannot itself contain a return)
    src = """
def foo():
    x = [i for i in range(5)]
    return x
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_lambda_inside():
    # Lambda defined inside the function (a lambda cannot contain a return statement); the outer function returns
    src = """
def foo():
    bar = lambda x: x + 1
    return bar(5)
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_only_pass():
    # Function with only pass statement
    src = """
def foo():
    pass
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_docstring_only():
    # Function with only a docstring
    src = '''
def foo():
    """This is a docstring."""
'''
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_class_method():
    # Class method with return
    src = """
class A:
    def foo(self):
        return 5
"""
    mod = ast.parse(src)
    cls = mod.body[0]
    func = [n for n in cls.body if isinstance(n, ast.FunctionDef)][0]
    codeflash_output = function_has_return_statement(func)

def test_function_with_return_in_nested_class():
    # Function with a nested class that contains a return, but the outer function does not
    src = """
def foo():
    class Bar:
        def baz(self):
            return 1
    x = 3
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_nested_class_and_outer():
    # Function with both outer and nested class returns
    src = """
def foo():
    class Bar:
        def baz(self):
            return 1
    return 2
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_deeply_nested_blocks():
    # Return deeply nested in if/for/while/try
    src = """
def foo():
    for i in range(3):
        while True:
            if i > 1:
                try:
                    return i
                except:
                    pass
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_no_body():
    # Minimal function whose body is a single pass (a function with no body is a syntax error)
    src = "def foo():\n    pass"
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_else_block():
    # Return in else block
    src = """
def foo(x):
    if x > 0:
        y = 1
    else:
        return -1
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_function_with_return_in_try_finally_only():
    # Return only in finally block (should be detected)
    src = """
def foo():
    try:
        x = 1
    finally:
        return 2
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

# ----------------------------------------
# 3. Large Scale Test Cases
# ----------------------------------------

def test_large_function_with_many_statements_and_one_return():
    # Large function with many statements and a single return at the end
    body = "\n".join([f"    x{i} = {i}" for i in range(900)])
    src = f"""
def foo():
{body}
    return x899
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_function_without_return():
    # Large function with many statements and no return
    body = "\n".join([f"    x{i} = {i}" for i in range(950)])
    src = f"""
def foo():
{body}
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_function_with_return_in_middle():
    # Large function with return in the middle
    body = "\n".join([f"    x{i} = {i}" for i in range(400)])
    src = f"""
def foo():
{body}
    return 123
    x999 = 999
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_function_with_return_in_nested_loop():
    # Large function with a return inside a nested loop
    loop_body = "\n".join([f"            x{i} = {i}" for i in range(100)])
    src = f"""
def foo():
    for i in range(10):
        for j in range(10):
{loop_body}
            if i == 5 and j == 5:
                return i + j
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_function_with_many_nested_functions():
    # Large function with many nested functions, only outer has return
    nested_funcs = "\n".join([f"    def bar{i}():\n        pass" for i in range(50)])
    src = f"""
def foo():
{nested_funcs}
    return 1
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_function_with_return_only_in_nested_functions():
    # Large function with many nested functions, only nested have return
    nested_funcs = "\n".join([f"    def bar{i}():\n        return {i}" for i in range(50)])
    src = f"""
def foo():
{nested_funcs}
    x = 1
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_async_function_with_return():
    # Large async function with a return
    body = "\n".join([f"    x{i} = {i}" for i in range(500)])
    src = f"""
async def foo():
{body}
    return x499
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)

def test_large_async_function_without_return():
    # Large async function with no return
    body = "\n".join([f"    x{i} = {i}" for i in range(800)])
    src = f"""
async def foo():
{body}
"""
    node = get_first_func_node(src)
    codeflash_output = function_has_return_statement(node)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-function_has_return_statement-mce28wa5 and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 27, 2025
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 June 27, 2025 00:15
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-function_has_return_statement-mce28wa5 branch June 28, 2025 01:47