⚡️ Speed up function `remove_functions_from_generated_tests` by 23% in PR #769 (`clean-async-branch`) #779

codeflash-ai · 2025-09-27T01:33:23Z

⚡️ This pull request contains optimizations for PR #769

If you approve this dependent PR, these changes will be merged into the original PR branch clean-async-branch.

This PR will be automatically closed if the original PR is merged.

📄 23% (0.23x) speedup for `remove_functions_from_generated_tests` in `codeflash/code_utils/edit_generated_tests.py`

⏱️ Runtime : 1.46 milliseconds → 1.19 milliseconds (best of 11 runs)

📝 Explanation and details

The optimization achieves a roughly 23% speedup (1.46 ms → 1.19 ms) by eliminating redundant regex compilation and avoiding unnecessary string operations.

Key optimizations:

1. Pre-compiled regex patterns: The original code compiled the same regex patterns repeatedly inside the nested loops (3,114 compile calls, accounting for 43.4% of total runtime). The optimized version compiles each pattern once, upfront, via `_compile_function_patterns()`, so this expensive step no longer runs inside the loops.
2. Efficient string manipulation: Instead of calling `re.sub()`, which rescans the entire string even though the match has already been located, the optimized version uses `finditer()` to get match positions directly and removes each matched function with string slicing (`source[:start] + source[end:]`). This avoids the overhead of regex substitution.
3. Early termination: After a function match is found and removed, the code breaks out of the inner loop, since at most one definition per function name is expected. This prevents unnecessary further iteration.

Performance impact by test case:

- The gains are largest when several test functions are removed across several generated tests, which is the typical use case (e.g. `test_multiple_removals`: 368μs → 269μs, 36.8% faster).
- For edge cases such as an empty test list, pre-compilation adds a small fixed overhead with no offsetting benefit (the empty-list regression test went from 9.29μs to 11.8μs).
- Behavior for decorated functions is unchanged: functions decorated with `@pytest.mark.parametrize` are still skipped, as intended.

In the line profiles, the original version spent 43.4% of its time on repeated pattern compilation and 51.7% on `re.sub()`. In the optimized version, the one-time upfront compilation accounts for 89.8% of the smaller total runtime, and the substitution cost is eliminated entirely.

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | ✅ 5 Passed |
| 🌀 Generated Regression Tests | ✅ 1 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `test_remove_functions_from_generated_tests.py::test_keep_parametrized_tests` | 486μs | 421μs | 15.4%✅ |
| `test_remove_functions_from_generated_tests.py::test_multiple_removals` | 368μs | 269μs | 36.8%✅ |
| `test_remove_functions_from_generated_tests.py::test_remove_complex_functions` | 348μs | 270μs | 29.0%✅ |
| `test_remove_functions_from_generated_tests.py::test_simple_removal` | 248μs | 216μs | 14.7%✅ |
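The Speedup column, and the headline figure, appear consistent with computing original / optimized - 1. This formula is inferred from the numbers above and is not documented in this PR. The small helper below makes the arithmetic explicit. It also shows that the overall change (1.46 ms → 1.19 ms) is about 22.7%, which rounds to the 23% in the title.

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Percentage by which `optimized` beats `original` (negative means slower).

    ASSUMPTION: speedup = original / optimized - 1, inferred from the table above.
    """
    return (original / optimized - 1.0) * 100.0


print(f"{speedup_pct(486, 421):.1f}%")    # 15.4%  (test_keep_parametrized_tests, μs)
print(f"{speedup_pct(368, 269):.1f}%")    # 36.8%  (test_multiple_removals, μs)
print(f"{speedup_pct(1.46, 1.19):.1f}%")  # 22.7%  (overall runtime, ms)
```

The μs values in the table are rounded, so recomputing from them can be off in the last digit. For example, 248 / 216 - 1 gives 14.8%, whereas the table reports 14.7%.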

🌀 Generated Regression Tests and Runtime

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List

import pytest  # used for our unit tests
from codeflash.code_utils.edit_generated_tests import remove_functions_from_generated_tests


# Simulate codeflash.models.models.GeneratedTestsList for testing
@dataclass
class GeneratedTest:
    generated_original_test_source: str


@dataclass
class GeneratedTestsList:
    generated_tests: List[GeneratedTest] = field(default_factory=list)


# unit tests

# ---- Basic Test Cases ----


def test_remove_function_with_decorator():
    # Should not remove functions decorated with @pytest.mark.parametrize
    source = (
        "@pytest.mark.parametrize('x', [1,2])\n"
        "def test_one(x):\n    assert x in [1,2]\n"
        "def test_two():\n    assert True\n"
    )
    tests_list = GeneratedTestsList([GeneratedTest(source)])
    codeflash_output = remove_functions_from_generated_tests(tests_list, ["test_one"]); result = codeflash_output


def test_empty_generated_tests_list():
    # Should handle empty GeneratedTestsList gracefully
    tests_list = GeneratedTestsList([])
    codeflash_output = remove_functions_from_generated_tests(tests_list, ["test_one"]); result = codeflash_output # 9.29μs -> 11.8μs (21.5% slower)
```
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List

import pytest  # used for our unit tests
from codeflash.code_utils.edit_generated_tests import remove_functions_from_generated_tests


# Simulate the GeneratedTestsList and its elements
@dataclass
class GeneratedTest:
    generated_original_test_source: str


@dataclass
class GeneratedTestsList:
    generated_tests: List[GeneratedTest]


# ==========================
# Unit Tests
# ==========================

# BASIC TEST CASES


def test_remove_function_with_decorator_edge():
    # Should NOT remove parametrized tests
    source = (
        "@pytest.mark.parametrize('x', [1,2,3])\ndef test_param(x):\n    assert x\n"
        "def test_normal():\n    assert True\n"
    )
    gtl = GeneratedTestsList([GeneratedTest(source)])
    codeflash_output = remove_functions_from_generated_tests(gtl, ["test_param", "test_normal"]); result = codeflash_output
    s = result.generated_tests[0].generated_original_test_source


def test_remove_function_with_multiline_decorator_edge():
    # Should NOT remove parametrized tests with multiline decorators
    source = (
        "@pytest.mark.parametrize(\n    'x', [1,2,3]\n)\ndef test_param(x):\n    assert x\n"
        "def test_normal():\n    assert True\n"
    )
    gtl = GeneratedTestsList([GeneratedTest(source)])
    codeflash_output = remove_functions_from_generated_tests(gtl, ["test_param", "test_normal"]); result = codeflash_output
    s = result.generated_tests[0].generated_original_test_source
```

To edit these changes, run `git checkout codeflash/optimize-pr769-2025-09-27T01.33.17` and push.


codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Sep 27, 2025

codeflash-ai bot mentioned this pull request in "enable async function optimization" #769 (Merged), Sep 27, 2025

Commit 7a6a432: mypy fix

KRRT7 merged commit e27c133 into clean-async-branch Sep 27, 2025
16 of 20 checks passed

codeflash-ai bot deleted the codeflash/optimize-pr769-2025-09-27T01.33.17 branch September 27, 2025 02:33
