
Conversation

@alvin-r (Contributor) commented Mar 19, 2025

PR Type

Enhancement, Tests


Description

  • Added codeflash_trace decorator for function instrumentation (usage sketch below).

  • Introduced benchmark tracing with SQLite storage and replay tests.

  • Integrated a pytest plugin and CLI/config support for benchmarking.

  • Enhanced the optimizer pipeline to use benchmark and replay timing data.
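
As a rough illustration of the first item above, a benchmark test might apply the decorator like this (a minimal sketch, not code from this PR: the import path is inferred from the other codeflash.benchmarking modules, and the pytest-benchmark `benchmark` fixture is assumed):

    # Hypothetical example; function and test names are illustrative only.
    from codeflash.benchmarking.codeflash_trace import codeflash_trace  # import path assumed


    @codeflash_trace
    def bubble_sort(items):
        data = list(items)
        for i in range(len(data)):
            for j in range(len(data) - i - 1):
                if data[j] > data[j + 1]:
                    data[j], data[j + 1] = data[j + 1], data[j]
        return data


    def test_sort(benchmark):
        # The pytest plugin added in this PR is expected to enable tracing
        # (via its CODEFLASH_BENCHMARKING switch) while the benchmark runs.
        result = benchmark(bubble_sort, [5, 3, 1, 4, 2])
        assert result == [1, 2, 3, 4, 5]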


Changes walkthrough 📝

Relevant files

Formatting (1 file)
  • bubble_sort.py (+1/-1): Normalize return statement in bubble sort function

Enhancement (16 files)
  • bubble_sort_codeflash_trace.py (+46/-0): Added traced bubble sort functions and class methods
  • bubble_sort_multithread.py (+23/-0): Introduced multithreaded sorter using traced bubble sort
  • process_and_bubble_sort.py (+28/-0): Added computation and pairwise products with sorter call
  • process_and_bubble_sort_codeflash_trace.py (+28/-0): Added traced process-and-sort function variant
  • benchmark_database_utils.py (+296/-0): New module for managing benchmark trace data via SQLite
  • codeflash_trace.py (+80/-0): Introduced codeflash_trace decorator implementation
  • instrument_codeflash_trace.py (+109/-0): Added transformer to instrument functions with codeflash_trace
  • plugin.py (+62/-0): Added a pytest plugin to integrate Codeflash benchmark tracing
  • pytest_new_process_trace_benchmarks.py (+33/-0): New script to run benchmark tests and record trace data
  • replay_test.py (+282/-0): Added replay test generation from captured benchmark trace data
  • trace_benchmarks.py (+42/-0): Added function to trigger benchmark tracing via subprocess
  • utils.py (+123/-0): Added utilities to process and display benchmark timing data
  • functions_to_optimize.py (+10/-4): Enhanced static method detection for functions to optimize
  • explanation.py (+20/-8): Updated explanation to include benchmark details
  • function_optimizer.py (+51/-6): Integrated benchmark timing and replay test data into optimization
  • optimizer.py (+69/-6): Enhanced optimizer to run benchmarks and generate replay tests

Tests (9 files)
  • test_benchmark_bubble_sort.py (+13/-0): Added benchmark tests for traced bubble sort functionality
  • test_process_and_sort.py (+8/-0): Added benchmark tests for process-and-sort traced functions
  • test_multithread_sort.py (+4/-0): Added multithread benchmark test for sorter function
  • test_benchmark_bubble_sort.py (+20/-0): Added additional tests for the bubble sort trace decorator
  • test_process_and_sort.py (+8/-0): Added replay and benchmark tests for process-and-sort functions
  • test_codeflash_trace_decorator.py (+15/-0): Added tests for codeflash_trace decorator functionality
  • test_instrument_codeflash_trace.py (+246/-0): Added tests for AST-based instrumentation with the codeflash_trace decorator
  • test_trace_benchmarks.py (+212/-0): Added tests to validate benchmark trace and replay test generation
  • test_unit_test_discovery.py (+14/-1): Updated unit test discovery to handle benchmark test exclusion

Configuration changes (2 files)
  • cli.py (+9/-1): Extended CLI arguments to support benchmark options
  • config_parser.py (+2/-2): Integrated benchmarks-root into configuration parser

Additional files (10 files)
  • test_bubble_sort.py (+18/-18)
  • test_bubble_sort_parametrized.py (+18/-18)
  • __init__.py [link]
  • __init__.py [link]
  • PrComment.py (+6/-2)
  • models.py (+44/-0)
  • create_pr.py (+2/-0)
  • test_results.py (+29/-0)
  • test_runner.py (+2/-0)
  • verification_utils.py (+2/-1)

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
    @alvin-r marked this pull request as draft March 19, 2025 23:04

    github-actions bot commented Mar 19, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 77f43a5)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Benchmark Integration

    The PR adds several new parameters and processing steps for benchmarking (e.g. function_benchmark_timings, total_benchmark_timings, replay performance gain, and benchmark details). It is recommended to verify that the behavior is correct for both benchmark-enabled and disabled modes, and that fallback defaults are handled gracefully.

            best_optimization.candidate.explanation, title="Best Candidate Explanation", border_style="blue"
        )
    )
    processed_benchmark_info = None
    if self.args.benchmark:
        processed_benchmark_info = process_benchmark_data(
            replay_performance_gain=best_optimization.replay_performance_gain,
            fto_benchmark_timings=self.function_benchmark_timings,
            total_benchmark_timings=self.total_benchmark_timings
        )
    explanation = Explanation(
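
    For intuition, one way such data could be combined (an assumption for illustration only; it may not match what process_benchmark_data actually does) is an Amdahl-style projection of each benchmark's speedup from the replay performance gain:

    def project_benchmark_speedup(replay_performance_gain: float,
                                  function_time_ns: int,
                                  total_time_ns: int) -> float:
        """Hypothetical helper: estimate whole-benchmark speedup, assuming only
        the optimized function's share of the benchmark time gets faster."""
        optimized_function_time = function_time_ns / (1.0 + replay_performance_gain)
        projected_total = total_time_ns - function_time_ns + optimized_function_time
        return (total_time_ns - projected_total) / projected_total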
    Trace Overhead & Env Handling

    The new tracing decorator uses time.thread_time_ns() and reads environment variables to control benchmarking behavior. Verify that the measurement overhead is minimal and that the environment-based switching does not introduce unintended side effects in non-benchmark scenarios (a usage sketch follows the decorator code below).

    import functools
    import os
    import pickle
    import time
    from typing import Callable
    
    
    
    
    class CodeflashTrace:
        """A class that provides both a decorator for tracing function calls
        and a context manager for managing the tracing data lifecycle.
        """
    
        def __init__(self) -> None:
            self.function_calls_data = []
    
        def __exit__(self, exc_type, exc_val, exc_tb) -> None:
            # Cleanup is optional here
            pass
    
        def __call__(self, func: Callable) -> Callable:
            """Use as a decorator to trace function execution.
    
            Args:
                func: The function to be decorated
    
            Returns:
                The wrapped function
    
            """
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Measure execution time
                start_time = time.thread_time_ns()
                result = func(*args, **kwargs)
                end_time = time.thread_time_ns()
                # Calculate execution time
                execution_time = end_time - start_time
    
                # Measure overhead
                overhead_start_time = time.thread_time_ns()
    
                try:
                    # Check if currently in pytest benchmark fixture
                    if os.environ.get("CODEFLASH_BENCHMARKING", "False") == "False":
                        return result
    
                    # Pickle the arguments
                    pickled_args = pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL)
                    pickled_kwargs = pickle.dumps(kwargs, protocol=pickle.HIGHEST_PROTOCOL)
    
                    # Get benchmark info from environment
                    benchmark_function_name = os.environ.get("CODEFLASH_BENCHMARK_FUNCTION_NAME", "")
                    benchmark_file_name = os.environ.get("CODEFLASH_BENCHMARK_FILE_NAME", "")
                    benchmark_line_number = os.environ.get("CODEFLASH_BENCHMARK_LINE_NUMBER", "")
                    # Get class name
                    class_name = ""
                    qualname = func.__qualname__
                    if "." in qualname:
                        class_name = qualname.split(".")[0]
                    # Calculate overhead time
                    overhead_end_time = time.thread_time_ns()
                    overhead_time = overhead_end_time - overhead_start_time
    
    
                    self.function_calls_data.append(
                        (func.__name__, class_name, func.__module__, func.__code__.co_filename,
                         benchmark_function_name, benchmark_file_name, benchmark_line_number, execution_time,
                         overhead_time, pickled_args, pickled_kwargs)
                    )
                    print("appended")
                except Exception as e:
                    print(f"Error in codeflash_trace: {e}")
    
                return result
            return wrapper
    
    # Create a singleton instance
    codeflash_trace = CodeflashTrace()
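
    To make the environment-based switching concrete, here is a minimal sketch (illustrative only; the import path is assumed) that relies solely on the behavior visible above: with CODEFLASH_BENCHMARKING unset or "False" the wrapper returns early and records nothing, while with it set to "True" one tuple per call is appended to function_calls_data.

    import os

    from codeflash.benchmarking.codeflash_trace import codeflash_trace  # import path assumed


    @codeflash_trace
    def add(a, b):
        return a + b


    # Flag off (the default): the wrapper times the call but skips recording.
    os.environ["CODEFLASH_BENCHMARKING"] = "False"
    add(1, 2)
    assert codeflash_trace.function_calls_data == []

    # Flag on: benchmark metadata is read from the environment and a tuple of
    # (function, class, module, file, benchmark info, timings, pickled args/kwargs)
    # is appended for the call.
    os.environ["CODEFLASH_BENCHMARKING"] = "True"
    os.environ["CODEFLASH_BENCHMARK_FUNCTION_NAME"] = "test_add"
    os.environ["CODEFLASH_BENCHMARK_FILE_NAME"] = "test_add.py"
    os.environ["CODEFLASH_BENCHMARK_LINE_NUMBER"] = "1"
    add(1, 2)
    assert len(codeflash_trace.function_calls_data) == 1
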
    Test Robustness

    New tests are introduced for trace benchmark functionality using SQLite and replay tests. Please validate that the expected record counts and file output behavior remain robust under various conditions and edge cases (see the isolation sketch after the test listing below).

    import sqlite3
    
    from codeflash.benchmarking.benchmark_database_utils import BenchmarkDatabaseUtils
    from codeflash.benchmarking.trace_benchmarks import trace_benchmarks_pytest
    from codeflash.benchmarking.replay_test import generate_replay_test
    from pathlib import Path
    
    from codeflash.benchmarking.utils import print_benchmark_table, validate_and_format_benchmark_table
    import shutil
    
    
    def test_trace_benchmarks():
        # Test the trace_benchmarks function
        project_root = Path(__file__).parent.parent / "code_to_optimize"
        benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_test"
        tests_root = project_root / "tests" / "test_trace_benchmarks"
        tests_root.mkdir(parents=False, exist_ok=False)
        output_file = (tests_root / Path("test_trace_benchmarks.trace")).resolve()
        trace_benchmarks_pytest(benchmarks_root, tests_root, project_root, output_file)
        assert output_file.exists()
        try:
            # check contents of trace file
            # connect to database
            conn = sqlite3.connect(output_file.as_posix())
            cursor = conn.cursor()
    
            # Get the count of records
            # Get all records
            cursor.execute(
                "SELECT function_name, class_name, module_name, file_name, benchmark_function_name, benchmark_file_name, benchmark_line_number FROM function_calls ORDER BY benchmark_file_name, benchmark_function_name, function_name")
            function_calls = cursor.fetchall()
    
            # Assert the length of function calls
            assert len(function_calls) == 7, f"Expected 7 function calls, but got {len(function_calls)}"
    
            bubble_sort_path = (project_root / "bubble_sort_codeflash_trace.py").as_posix()
            process_and_bubble_sort_path = (project_root / "process_and_bubble_sort_codeflash_trace.py").as_posix()
            # Expected function calls
            expected_calls = [
                ("__init__", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_class_sort", "test_benchmark_bubble_sort.py", 20),
    
                ("sort_class", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_class_sort", "test_benchmark_bubble_sort.py", 18),
    
                ("sort_static", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_class_sort", "test_benchmark_bubble_sort.py", 19),
    
                ("sorter", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_class_sort", "test_benchmark_bubble_sort.py", 17),
    
                ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_sort", "test_benchmark_bubble_sort.py", 7),
    
                ("compute_and_sort", "", "code_to_optimize.process_and_bubble_sort_codeflash_trace",
                 f"{process_and_bubble_sort_path}",
                 "test_compute_and_sort", "test_process_and_sort.py", 4),
    
                ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_no_func", "test_process_and_sort.py", 8),
            ]
            for idx, (actual, expected) in enumerate(zip(function_calls, expected_calls)):
                assert actual[0] == expected[0], f"Mismatch at index {idx} for function_name"
                assert actual[1] == expected[1], f"Mismatch at index {idx} for class_name"
                assert actual[2] == expected[2], f"Mismatch at index {idx} for module_name"
                assert Path(actual[3]).name == Path(expected[3]).name, f"Mismatch at index {idx} for file_name"
                assert actual[4] == expected[4], f"Mismatch at index {idx} for benchmark_function_name"
                assert actual[5] == expected[5], f"Mismatch at index {idx} for benchmark_file_name"
                assert actual[6] == expected[6], f"Mismatch at index {idx} for benchmark_line_number"
            # Close connection
            conn.close()
            generate_replay_test(output_file, tests_root)
            test_class_sort_path = tests_root / Path("test_benchmark_bubble_sort_py_test_class_sort__replay_test_0.py")
            assert test_class_sort_path.exists()
            test_class_sort_code = f"""
    import dill as pickle
    
    from code_to_optimize.bubble_sort_codeflash_trace import \\
        Sorter as code_to_optimize_bubble_sort_codeflash_trace_Sorter
    from codeflash.benchmarking.replay_test import get_next_arg_and_return
    
    functions = ['sorter', 'sort_class', 'sort_static']
    trace_file_path = r"{output_file.as_posix()}"
    
    def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sorter():
        for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sorter", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
            args = pickle.loads(args_pkl)
            kwargs = pickle.loads(kwargs_pkl)
            function_name = "sorter"
            if not args:
                raise ValueError("No arguments provided for the method.")
            if function_name == "__init__":
                ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter(*args[1:], **kwargs)
            else:
                instance = args[0] # self
                ret = instance.sorter(*args[1:], **kwargs)
    
    def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sort_class():
        for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sort_class", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
            args = pickle.loads(args_pkl)
            kwargs = pickle.loads(kwargs_pkl)
            if not args:
                raise ValueError("No arguments provided for the method.")
            ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter.sort_class(*args[1:], **kwargs)
    
    def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sort_static():
        for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sort_static", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
            args = pickle.loads(args_pkl)
            kwargs = pickle.loads(kwargs_pkl)
            ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter.sort_static(*args, **kwargs)
    
    def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter___init__():
        for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="__init__", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
            args = pickle.loads(args_pkl)
            kwargs = pickle.loads(kwargs_pkl)
            function_name = "__init__"
            if not args:
                raise ValueError("No arguments provided for the method.")
            if function_name == "__init__":
                ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter(*args[1:], **kwargs)
            else:
                instance = args[0] # self
                ret = instance(*args[1:], **kwargs)
    
    """
            assert test_class_sort_path.read_text("utf-8").strip()==test_class_sort_code.strip()
    
            test_sort_path = tests_root / Path("test_benchmark_bubble_sort_py_test_sort__replay_test_0.py")
            assert test_sort_path.exists()
            test_sort_code = f"""
    import dill as pickle
    
    from code_to_optimize.bubble_sort_codeflash_trace import \\
        sorter as code_to_optimize_bubble_sort_codeflash_trace_sorter
    from codeflash.benchmarking.replay_test import get_next_arg_and_return
    
    functions = ['sorter']
    trace_file_path = r"{output_file}"
    
    def test_code_to_optimize_bubble_sort_codeflash_trace_sorter():
        for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sorter", file_name=r"{bubble_sort_path}", num_to_get=100):
            args = pickle.loads(args_pkl)
            kwargs = pickle.loads(kwargs_pkl)
            ret = code_to_optimize_bubble_sort_codeflash_trace_sorter(*args, **kwargs)
    
    """
            assert test_sort_path.read_text("utf-8").strip()==test_sort_code.strip()
        finally:
            # cleanup
            shutil.rmtree(tests_root)
            pass
    
    def test_trace_multithreaded_benchmark() -> None:
        project_root = Path(__file__).parent.parent / "code_to_optimize"
        benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_multithread"
        tests_root = project_root / "tests" / "test_trace_benchmarks"
        tests_root.mkdir(parents=False, exist_ok=False)
        output_file = (tests_root / Path("test_trace_benchmarks.trace")).resolve()
        trace_benchmarks_pytest(benchmarks_root, tests_root, project_root, output_file)
        assert output_file.exists()
        try:
            # check contents of trace file
            # connect to database
            conn = sqlite3.connect(output_file.as_posix())
            cursor = conn.cursor()
    
            # Get the count of records
            # Get all records
            cursor.execute(
                "SELECT function_name, class_name, module_name, file_name, benchmark_function_name, benchmark_file_name, benchmark_line_number FROM function_calls ORDER BY benchmark_file_name, benchmark_function_name, function_name")
            function_calls = cursor.fetchall()
    
            # Assert the length of function calls
            assert len(function_calls) == 10, f"Expected 10 function calls, but got {len(function_calls)}"
            function_benchmark_timings = BenchmarkDatabaseUtils.get_function_benchmark_timings(output_file)
            total_benchmark_timings = BenchmarkDatabaseUtils.get_benchmark_timings(output_file)
            function_to_results = validate_and_format_benchmark_table(function_benchmark_timings, total_benchmark_timings)
            assert "code_to_optimize.bubble_sort_codeflash_trace.sorter" in function_to_results
    
            test_name, total_time, function_time, percent = function_to_results["code_to_optimize.bubble_sort_codeflash_trace.sorter"][0]
            assert total_time > 0.0
            assert function_time > 0.0
            assert percent > 0.0
    
            bubble_sort_path = (project_root / "bubble_sort_codeflash_trace.py").as_posix()
            # Expected function calls
            expected_calls = [
                ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace",
                 f"{bubble_sort_path}",
                 "test_benchmark_sort", "test_multithread_sort.py", 4),
            ]
            for idx, (actual, expected) in enumerate(zip(function_calls, expected_calls)):
                assert actual[0] == expected[0], f"Mismatch at index {idx} for function_name"
                assert actual[1] == expected[1], f"Mismatch at index {idx} for class_name"
                assert actual[2] == expected[2], f"Mismatch at index {idx} for module_name"
                assert Path(actual[3]).name == Path(expected[3]).name, f"Mismatch at index {idx} for file_name"
                assert actual[4] == expected[4], f"Mismatch at index {idx} for benchmark_function_name"
                assert actual[5] == expected[5], f"Mismatch at index {idx} for benchmark_file_name"
                assert actual[6] == expected[6], f"Mismatch at index {idx} for benchmark_line_number"
            # Close connection
            conn.close()
    
        finally:
            # cleanup
            shutil.rmtree(tests_root)
            pass
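
    On the robustness point raised above: both tests create the same tests_root with exist_ok=False, so a directory left over from an interrupted run fails the mkdir before the test's own cleanup executes. One possible alternative (a sketch only, not what this PR does) is to place the trace output under pytest's tmp_path fixture, which gives each run a fresh, automatically managed directory:

    def test_trace_benchmarks_isolated(tmp_path):
        # tmp_path is a per-test temporary directory managed by pytest,
        # so no manual mkdir()/rmtree() bookkeeping is needed.
        project_root = Path(__file__).parent.parent / "code_to_optimize"
        benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_test"
        output_file = (tmp_path / "test_trace_benchmarks.trace").resolve()
        trace_benchmarks_pytest(benchmarks_root, tmp_path, project_root, output_file)
        assert output_file.exists()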


    github-actions bot commented Mar 19, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: General
    Suggestion: Remove debugging prints

    Remove or disable debug print statements to clean production logs.

    codeflash/discovery/functions_to_optimize.py [361-366]

     elif any(
                         isinstance(decorator, ast.Name) and decorator.id == "staticmethod"
                         for decorator in body_node.decorator_list
                     ):
                         self.is_staticmethod = True
    -                    print(f"static method found: {self.function_name}")
    Suggestion importance[1-10]: 5

    Why: The suggestion cleanly removes an unnecessary debug print, which improves production log quality without impacting functionality.

    Impact: Low

    @alvin-r had a problem deploying to external-trusted-contributors April 11, 2025 17:40 with GitHub Actions (four failures, one error)
    @alvin-r requested a review from misrasaurabh1 April 17, 2025 22:21
    @alvin-r merged commit 711ee5e into main Apr 17, 2025
    17 checks passed

    Labels

    Review effort 5/5, workflow-modified (This PR modifies GitHub Actions workflows)
