
Conversation

@misrasaurabh1 (Contributor) commented Jun 6, 2025

User description

This change creates PRs like #293, where the generated tests show all the timings.


PR Type

Enhancement, Tests


Description

  • Extract test removal utility to separate module

  • Integrate runtime comments into generated tests

  • Refactor optimizer to call removal and comment functions

  • Update test import path for removal utility


Changes walkthrough 📝

Relevant files

Enhancement

codeflash/code_utils/edit_generated_tests.py: Extract test removal utility
  • Add remove_functions_from_generated_tests utility
  • Compile regex to strip specified test functions
  • Return filtered GeneratedTestsList
  • [link]

codeflash/optimization/function_optimizer.py: Integrate runtime comments into optimizer
  • Update import for remove_functions_from_generated_tests
  • Remove old removal call location
  • Insert calls to removal and runtime comment methods
  • Add add_runtime_comments_to_generated_tests method
  • +162/-7

Tests

tests/test_remove_functions_from_generated_tests.py: Update test import path
  • Update import path to edit_generated_tests
  • Remove outdated blank line
  • +1/-2

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

github-actions bot commented Jun 6, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Import

    The add_runtime_comments_to_generated_tests method uses cst (libcst) but there is no import libcst as cst in this file, causing a NameError at runtime.

    def add_runtime_comments_to_generated_tests(
        self,
        generated_tests: GeneratedTestsList,
        original_test_results: TestResults,
        optimized_test_results: TestResults,
    ) -> GeneratedTestsList:
        """Add runtime performance comments to function calls in generated tests."""
    
        def format_time(nanoseconds: int) -> str:
            """Format nanoseconds into a human-readable string with 3 significant digits when needed."""
    
            def count_significant_digits(num: int) -> int:
                """Count significant digits in an integer."""
                return len(str(abs(num)))
    
            def format_with_precision(value: float, unit: str) -> str:
                """Format a value with 3 significant digits precision."""
                if value >= 100:
                    return f"{value:.0f}{unit}"
                if value >= 10:
                    return f"{value:.1f}{unit}"
                return f"{value:.2f}{unit}"
    
            if nanoseconds < 1_000:
                return f"{nanoseconds}ns"
            if nanoseconds < 1_000_000:
                # Convert to microseconds
                microseconds_int = nanoseconds // 1_000
                if count_significant_digits(microseconds_int) >= 3:
                    return f"{microseconds_int}μs"
                microseconds_float = nanoseconds / 1_000
                return format_with_precision(microseconds_float, "μs")
            if nanoseconds < 1_000_000_000:
                # Convert to milliseconds
                milliseconds_int = nanoseconds // 1_000_000
                if count_significant_digits(milliseconds_int) >= 3:
                    return f"{milliseconds_int}ms"
                milliseconds_float = nanoseconds / 1_000_000
                return format_with_precision(milliseconds_float, "ms")
            # Convert to seconds
            seconds_int = nanoseconds // 1_000_000_000
            if count_significant_digits(seconds_int) >= 3:
                return f"{seconds_int}s"
            seconds_float = nanoseconds / 1_000_000_000
            return format_with_precision(seconds_float, "s")
    
        # Create dictionaries for fast lookup of runtime data
        original_runtime_by_test = original_test_results.usable_runtime_data_by_test_case()
        optimized_runtime_by_test = optimized_test_results.usable_runtime_data_by_test_case()
    
        class RuntimeCommentTransformer(cst.CSTTransformer):
            def __init__(self):
                self.in_test_function = False
                self.current_test_name = None
    
            def visit_FunctionDef(self, node: cst.FunctionDef) -> None:
                if node.name.value.startswith("test_"):
                    self.in_test_function = True
                    self.current_test_name = node.name.value
                else:
                    self.in_test_function = False
                    self.current_test_name = None
    
            def leave_FunctionDef(
                self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef
            ) -> cst.FunctionDef:
                if original_node.name.value.startswith("test_"):
                    self.in_test_function = False
                    self.current_test_name = None
                return updated_node
    
            def leave_SimpleStatementLine(
                self, original_node: cst.SimpleStatementLine, updated_node: cst.SimpleStatementLine
            ) -> cst.SimpleStatementLine:
                if not self.in_test_function or not self.current_test_name:
                    return updated_node
    
                # Look for assignment statements that assign to codeflash_output
                # Handle both single statements and multiple statements on one line
                codeflash_assignment_found = False
                for stmt in updated_node.body:
                    if isinstance(stmt, cst.Assign):
                        if (
                            len(stmt.targets) == 1
                            and isinstance(stmt.targets[0].target, cst.Name)
                            and stmt.targets[0].target.value == "codeflash_output"
                        ):
                            codeflash_assignment_found = True
                            break
    
                if codeflash_assignment_found:
                    # Find matching test cases by looking for this test function name in the test results
                    matching_original_times = []
                    matching_optimized_times = []
    
                    for invocation_id, runtimes in original_runtime_by_test.items():
                        if invocation_id.test_function_name == self.current_test_name:
                            matching_original_times.extend(runtimes)
    
                    for invocation_id, runtimes in optimized_runtime_by_test.items():
                        if invocation_id.test_function_name == self.current_test_name:
                            matching_optimized_times.extend(runtimes)
    
                    if matching_original_times and matching_optimized_times:
                        original_time = min(matching_original_times)
                        optimized_time = min(matching_optimized_times)
    
                        # Create the runtime comment
                        comment_text = f"# {format_time(original_time)} -> {format_time(optimized_time)}"
    
                        # Add comment to the trailing whitespace
                        new_trailing_whitespace = cst.TrailingWhitespace(
                            whitespace=cst.SimpleWhitespace(" "),
                            comment=cst.Comment(comment_text),
                            newline=updated_node.trailing_whitespace.newline,
                        )
    
                        return updated_node.with_changes(trailing_whitespace=new_trailing_whitespace)
    
                return updated_node
    
        # Process each generated test
        modified_tests = []
        for test in generated_tests.generated_tests:
            try:
                # Parse the test source code
                tree = cst.parse_module(test.generated_original_test_source)
    
                # Transform the tree to add runtime comments
                transformer = RuntimeCommentTransformer()
                modified_tree = tree.visit(transformer)
    
                # Convert back to source code
                modified_source = modified_tree.code
    
                # Create a new GeneratedTests object with the modified source
                modified_test = GeneratedTests(
                    generated_original_test_source=modified_source,
                    instrumented_behavior_test_source=test.instrumented_behavior_test_source,
                    instrumented_perf_test_source=test.instrumented_perf_test_source,
                    behavior_file_path=test.behavior_file_path,
                    perf_file_path=test.perf_file_path,
                )
                modified_tests.append(modified_test)
            except Exception as e:
                # If parsing fails, keep the original test
                logger.debug(f"Failed to add runtime comments to test: {e}")
                modified_tests.append(test)
    
        return GeneratedTestsList(generated_tests=modified_tests)
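As a quick sanity check on the formatting logic quoted above, the helper can be collapsed into a single standalone function and exercised at the unit cutoffs; this sketch reproduces the same comparisons and format strings as the PR's nested helpers:

```python
def format_time(nanoseconds: int) -> str:
    """Compact re-statement of the PR's helper: unit-scaled, ~3 significant digits."""
    def fmt(value: float, unit: str) -> str:
        if value >= 100:
            return f"{value:.0f}{unit}"
        if value >= 10:
            return f"{value:.1f}{unit}"
        return f"{value:.2f}{unit}"

    if nanoseconds < 1_000:
        return f"{nanoseconds}ns"
    if nanoseconds < 1_000_000:
        whole = nanoseconds // 1_000
        return f"{whole}μs" if whole >= 100 else fmt(nanoseconds / 1_000, "μs")
    if nanoseconds < 1_000_000_000:
        whole = nanoseconds // 1_000_000
        return f"{whole}ms" if whole >= 100 else fmt(nanoseconds / 1_000_000, "ms")
    whole = nanoseconds // 1_000_000_000
    return f"{whole}s" if whole >= 100 else fmt(nanoseconds / 1_000_000_000, "s")

print(format_time(500))            # 500ns
print(format_time(1_500))          # 1.50μs
print(format_time(123_456))        # 123μs
print(format_time(25_000_000))     # 25.0ms
print(format_time(2_500_000_000))  # 2.50s
```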
    Missing Tests

    The new runtime comment feature isn't covered by any unit tests; add tests to verify that comments are correctly injected into generated tests.

        )
        cleanup_paths(paths_to_cleanup)
        if hasattr(get_run_tmp_file, "tmpdir"):
            get_run_tmp_file.tmpdir.cleanup()
    
    def add_runtime_comments_to_generated_tests(
        self,
        generated_tests: GeneratedTestsList,
        original_test_results: TestResults,
        optimized_test_results: TestResults,
    ) -> GeneratedTestsList:
        ...  # full body identical to the method quoted under "Missing Import" above
    Regex Robustness

    The regex in remove_functions_from_generated_tests may overreach or skip decorated/parametrized functions; validate against edge cases and refine the pattern.

    function_pattern = re.compile(
        rf"(@pytest\.mark\.parametrize\(.*?\)\s*)?def\s+{re.escape(test_function)}\(.*?\):.*?(?=\ndef\s|$)",
        re.DOTALL,
    )
    
    match = function_pattern.search(generated_test.generated_original_test_source)
    
    if match is None or "@pytest.mark.parametrize" in match.group(0):

    github-actions bot commented Jun 6, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Suggestion: Enable multiline regex matching

    Add the MULTILINE flag to the regex compilation so that the lookahead (?=\ndef\s|$)
    can reliably match function boundaries across lines. This ensures functions are
    removed correctly even when definitions span multiple lines or include indentation.

    codeflash/code_utils/edit_generated_tests.py [12-15]

     function_pattern = re.compile(
    -    rf"(@pytest\.mark\.parametrize\(.*?\)\s*)?def\s+{re.escape(test_function)}\(.*?\):.*?(?=\ndef\s|$)",
    -    re.DOTALL,
    +    rf"(@pytest\.mark\.parametrize\(.*?\)\s*)?def\s+{re.escape(test_function)}\(.*?\):.*?(?=^def\s|\Z)",
    +    re.DOTALL | re.MULTILINE,
     )
    Suggestion importance [1-10]: 6

    Why: Adding re.MULTILINE improves the regex boundary detection across lines and aligns with the intended lookahead semantics without major functional changes.

    Impact: Low
    Category: General
    Suggestion: Collapse redundant blank lines

    Collapse any leftover empty lines after removing a function to avoid leaving large gaps in the test source. A simple post-substitution clean-up can ensure the generated tests remain well-formatted.

    codeflash/code_utils/edit_generated_tests.py [22-24]

    -generated_test.generated_original_test_source = function_pattern.sub(
    -    "", generated_test.generated_original_test_source
    -)
    +cleaned = function_pattern.sub("", generated_test.generated_original_test_source)
    +# remove extra blank lines
    +generated_test.generated_original_test_source = re.sub(r"\n{3,}", "\n\n", cleaned)
    Suggestion importance [1-10]: 5

    Why: Cleaning up extra blank lines after removal maintains test formatting and readability, though it's a stylistic improvement with moderate impact.

    Impact: Low

    @misrasaurabh1 misrasaurabh1 changed the title Add the test case timing to the generated test Add the test case timing to the generated test [CF-482] Jun 6, 2025
    codeflash-ai bot added a commit that referenced this pull request Jun 6, 2025
    …(`add-timing-info-to-generated-tests`)
    
    Here's an optimized and faster version of your program. The main performance inefficiencies in the original code are:
    
    - **Repeated attribute accesses with `dict.get()` inside loops:** Pre-collecting values boosts efficiency.
    - **Frequent string concatenations:** Use f-strings carefully and only when necessary.
    - **Unnecessary use of `sorted` on a set each run.** Build this directly from the data.
    - **Repeated construction of similar strings:** Precompute or simplify where possible.
    - **Using `textwrap.indent` in a loop:** Combine with minimal copies.
    - **No need for `textwrap.dedent` if formatting is already explicit.**
    
    Below is the refactored code following these optimizations.
    
    
    
    **Summary of the changes:**
    - **Single pass for collecting imports and function names.**
    - **Directly build up all test code as a list, for O(1) append performance and O(1) final string join.**
    - **Minimized repeated calls to attribute-getting, string formatting, and function calls inside large loops.**
    - **Efficient, manual indentation instead of `textwrap.indent`.**
    - **Templates are constants, dedented only once.**
    - **All constants precomputed outside the loop.**
    
    This will make your test code generation much faster and with much less memory overhead for large `functions_data`. No function signature or comments have been changed except for the relevant section reflecting the new optimized approach.
    codeflash-ai bot commented Jun 6, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 50% (0.50x) speedup for create_trace_replay_test_code in codeflash/benchmarking/replay_test.py

    ⏱️ Runtime: 11.5 milliseconds → 7.65 milliseconds (best of 348 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch add-timing-info-to-generated-tests).

    codeflash-ai bot added a commit that referenced this pull request Jun 6, 2025
    …o-to-generated-tests`)
    
    Certainly! Here is an optimized version of your program, keeping the function signature and return identical, but reducing the runtime by:
    
    - **Inlining helpers**: Removes function call overhead for `count_significant_digits`/`format_with_precision`.
    - **Avoiding `str(abs(num))`**: Uses math for the significant-digit check (`if num >= 100`), since the code only needs to know whether the value has at least 3 digits.
    - **Avoiding unnecessary assignments**: Avoids repeated string formatting expressions.
    - **Reduced code nesting**: Reduces the block levels for slightly improved branch prediction.
    - **No use of internal lists or expensive function calls in the critical path.**
    
    This should reduce the overall call count, remove slow string-conversion logic, and eliminate small value checks that are now obviously constant-time.
    
    
    
    ### Key improvements
    - No inner functions or per-call string allocations for digit counting.
    - Direct math comparison for integer digit checks.
    - Calls to string formatting are done only if needed with minimized branching.
    
    This version will be measurably faster per your line statistics, especially on the now inlined "digit count" and "format" logic.
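The claim that the string round-trip can be dropped rests on a simple equivalence: for the non-negative integers produced by the floor divisions here, len(str(abs(n))) >= 3 holds exactly when n >= 100. A quick brute-force check:

```python
# Verify the equivalence the bot's optimization relies on.
def digits_at_least_3(n: int) -> bool:
    return len(str(abs(n))) >= 3

assert all(digits_at_least_3(n) == (n >= 100) for n in range(10_000))
print("equivalence holds for 0..9999")
```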
    codeflash-ai bot added a commit that referenced this pull request Jun 6, 2025
     (`add-timing-info-to-generated-tests`)
    
    Here’s an optimized version of your program. The main bottleneck is not in the construction of `target_functions` (which is a tiny fraction of the runtime), but in the way you handle parsing and transformation with `libcst`.  
    However, gathering `target_functions` can be optimized using a list comprehension with tuple unpacking, while avoiding multiple attribute lookups.
    
    Also, the main time is spent in `module.visit(transformer)` and `cst.parse_module`. If you have control over how the transformer (i.e., `AddDecoratorTransformer`) is written, you should make it as restrictive and fast as possible, using `visit_`/`leave_` functions that early-exit on non-target nodes.  
    Below, I’ll only optimize what’s asked–rewriting this function to minimize unnecessary slow steps and any wasted computations, while preserving the code logic and interface.
    
    
    
    ### Changes
    - Combined the logic of extracting `(class_name, function_name)` into a set comprehension for fewer attribute accesses and tighter bytecode.
    - Added a check: if `target_functions` is empty, we just return the original code immediately (this prevents any parsing/visiting if there's nothing to decorate).
    - Comments left untouched unless relevant code was modified.
    - Retained function signature and interface.
    - Provided some minor micro-optimizations (generator expressions, less branching).
    - Eliminated redundant variable assignments.
    
    ### Notes
    - As per your profile, most time is spent in parsing and visiting. Optimize the `AddDecoratorTransformer` for early exits and to do as little as possible, in its own definition (not here), for further improvement.
    - Return early for the empty case: saves any parse/visit calls at all.
    - If you do ever get code objects, you could use `isinstance(module, cst.Module)` to skip reparsing, but as per the signature we always expect `str`.
    
    If you want even more speed, next steps are:  
    - Minimize the work that `AddDecoratorTransformer` does (match by qualified name to avoid visiting subtrees needlessly).
    - Use native AST parsing/writing if you have full control over decorator syntax constraints.
    
    Let me know if you want to see transformer optimizations as well!
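The early-exit structure the bot recommends can be sketched without libcst; this dependency-free analogue uses the stdlib ast module instead (names like codeflash_trace and add_decorator_to_code mirror the discussion but are illustrative, and ast does not preserve formatting the way libcst does):

```python
import ast

def add_decorator_to_code(code: str, target_functions: set[str]) -> str:
    """Add an illustrative @codeflash_trace decorator to the named functions.

    Stdlib-ast sketch of the early-exit idea: if there is nothing to decorate,
    skip parsing and visiting entirely and return the source unchanged.
    """
    if not target_functions:      # fast path: no parse/visit at all
        return code
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name in target_functions:
            node.decorator_list.insert(0, ast.Name(id="codeflash_trace", ctx=ast.Load()))
    return ast.unparse(tree)

out = add_decorator_to_code("def f():\n    return 1\n\ndef g():\n    return 2\n", {"f"})
print(out)
```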
    codeflash-ai bot commented Jun 6, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 17% (0.17x) speedup for add_codeflash_decorator_to_code in codeflash/benchmarking/instrument_codeflash_trace.py

    ⏱️ Runtime: 596 milliseconds → 510 milliseconds (best of 18 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch add-timing-info-to-generated-tests).

    Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
    codeflash-ai bot commented Jun 6, 2025

    This PR is now faster! 🚀 Saurabh Misra accepted my code suggestion above.

    @misrasaurabh1 (Contributor, Author) commented:

    codeflash just sped up vibe code claude code :)

    @misrasaurabh1 misrasaurabh1 merged commit 9be93b6 into main Jun 6, 2025
    15 of 16 checks passed

    3 participants