Conversation

@KRRT7 (Contributor) commented Apr 23, 2025

User description

Currently, if we want to benchmark a function whose arguments can be permuted, such as

def _aggregate_images_shape(
    images: List[np.ndarray], mode: Literal["min", "max", "avg"]
) -> Tuple[int, int]:
    if mode not in SHAPE_AGGREGATION_FUN:
        raise ValueError(
            f"Could not aggregate images shape - provided unknown mode: {mode}. "
            f"Supported modes: {list(SHAPE_AGGREGATION_FUN.keys())}."
        )
    return SHAPE_AGGREGATION_FUN[mode](images)

we would have to express it as

def test_calculate_aggregated_images_shape_min(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    benchmark(
        _aggregate_images_shape,
        images,
        "min",
    )


def test_calculate_aggregated_images_shape_max(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    benchmark(
        _aggregate_images_shape,
        images,
        "max",
    )


def test_calculate_aggregated_images_shape_avg(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    benchmark(
        _aggregate_images_shape,
        images,
        "avg",
    )

Instead, a more concise way of expressing it is:

def test_calculate_aggregated_images_shape(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    for mode in ["min", "max", "avg"]:
        benchmark(
            _aggregate_images_shape,
            images,
            mode,
            benchmark_name_suffix=f"mode_{mode}",
        )

and

def test_is_prediction_registration_forbidden(benchmark):
    predictions: list[Prediction] = [
        {"predictions": [{"x": 37}], "top": "cat"},
        {"top": "cat"},
        {"predictions": []},
        {"is_stub": True},
        {},
    ]
    persist_options = [True, False]
    roboflow_image_id_options = ["some_id", None]

    param_combinations = itertools.product(
        predictions,
        persist_options,
        roboflow_image_id_options,
    )

    for i, (prediction, persist, roboflow_image_id) in enumerate(param_combinations):
        benchmark(
            is_prediction_registration_forbidden,
            prediction=prediction,
            persist_predictions=persist,
            roboflow_image_id=roboflow_image_id,
            benchmark_name_suffix=(
                f"prediction_case_{i}_"
                f"persist_{persist}_"
                f"has_image_id_{roboflow_image_id is not None}"
            ),
        )

for more complicated cases.


PR Type

Enhancement, Bug fix, Tests


Description

- Replace prints with logger for consistent logging
- Support multiple benchmark calls in a single test
- Add call counters and benchmark name suffix
- Introduce cleanup logic for temporary artifacts

Changes walkthrough 📝

Relevant files:

Error handling (1 file)
- codeflash_trace.py — Use logger instead of print in tracing (+52/-25)

Formatting (2 files)
- instrument_codeflash_trace.py — Refactor decorator insertion formatting (+21/-41)
- pytest_new_process_trace_benchmarks.py — Adjust pytest invocation argument formatting (+14/-2)

Enhancement (6 files)
- plugin.py — Improve pytest plugin logging and fixture (+49/-67)
- trace_benchmarks.py — Simplify error logging in benchmark collection (+4/-4)
- utils.py — Add recursive cleanup and console integration (+48/-41)
- code_utils.py — Extend cleanup_paths to remove directories (+5/-1)
- functions_to_optimize.py — Log function count with console rule separator (+1/-0)
- optimizer.py — Refactor optimization cleanup logic (+77/-38)

Bug fix (1 file)
- replay_test.py — Fix SQL query execution and formatting (+42/-44)

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
@github-actions

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    SQL Execution

    Incorrect use of SQL query string as callable; should pass the query and parameters to execute() instead of invoking the string.

        query = "SELECT * FROM benchmark_function_timings WHERE benchmark_function_name = ? AND function_name = ? AND file_path = ? AND class_name = ? LIMIT ?"  # noqa: E501
        cursor = cur.execute(query(benchmark_function_name, function_name, file_path, class_name, limit))
    else:
    NameError

    Variable benchmarks_root is undefined in the pytest.main invocation; likely should use tests_root.

    [
        benchmarks_root,
        "--codeflash-trace",
        "-p",
    Env Var Duplicate

    CODEFLASH_BENCHMARKING is set to "True" twice in _run_benchmark; remove the redundant assignment.

    os.environ["CODEFLASH_BENCHMARKING"] = "True"
    os.environ["CODEFLASH_BENCHMARK_FUNCTION_NAME"] = call_identifier
    os.environ["CODEFLASH_BENCHMARK_MODULE_PATH"] = benchmark_module_path
    os.environ["CODEFLASH_BENCHMARK_LINE_NUMBER"] = str(line_number)
    os.environ["CODEFLASH_BENCHMARKING"] = "True"

@github-actions

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Fix SQL execution parameters

    Pass the SQL string and parameters separately to execute() instead of calling the
    query string as a function.

    codeflash/benchmarking/replay_test.py [31-32]

     query = "SELECT * FROM benchmark_function_timings WHERE benchmark_function_name = ? AND function_name = ? AND file_path = ? AND class_name = ? LIMIT ?"  # noqa: E501
    -cursor = cur.execute(query(benchmark_function_name, function_name, file_path, class_name, limit))
    +cursor = cur.execute(query, (benchmark_function_name, function_name, file_path, class_name, limit))
    Suggestion importance[1-10]: 8


    Why: This fixes a functional bug by passing the SQL string and its parameters separately to execute(), preventing a runtime error when treating the query string as a callable.

    Medium
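As a sanity check, the corrected call shape can be exercised against a throwaway in-memory database (the two-column schema here is a stand-in, not the real `benchmark_function_timings` schema):

```python
import sqlite3

# execute() takes the SQL string and a tuple of parameters as two separate
# arguments; the query string itself is never called.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE benchmark_function_timings "
    "(benchmark_function_name TEXT, function_name TEXT)"
)
cur.execute(
    "INSERT INTO benchmark_function_timings VALUES (?, ?)",
    ("test_a", "foo"),
)
query = (
    "SELECT * FROM benchmark_function_timings "
    "WHERE benchmark_function_name = ? LIMIT ?"
)
rows = cur.execute(query, ("test_a", 5)).fetchall()  # [("test_a", "foo")]
conn.close()
```

Parameter binding also avoids SQL injection and quoting issues, which is why the `?` placeholders are preferable to string formatting even once the callable bug is fixed.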
    General
    Support decorator usage

    Restore decorator support by returning a wrapper when call is used without
    direct args, so @benchmark still works.

    codeflash/benchmarking/plugin/plugin.py [230-232]

     def __call__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:  # noqa: ANN401
    -    benchmark_name_suffix = kwargs.pop("benchmark_name_suffix", None)
    -    return self._run_benchmark(func, args, kwargs, benchmark_name_suffix)
    +    if args or kwargs or kwargs.get("benchmark_name_suffix") is not None:
    +        suffix = kwargs.pop("benchmark_name_suffix", None)
    +        return self._run_benchmark(func, args, kwargs, suffix)
    +    def wrapped(*a: Any, **k: Any) -> Any:
    +        return self._run_benchmark(func, a, k)
    +    return wrapped
    Suggestion importance[1-10]: 6


    Why: The change reinstates decorator behavior for the Benchmark fixture, allowing both direct calls and @benchmark decorator usage without breaking existing functionality.

    Low
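For reference, the dual-mode behavior the suggestion describes can be sketched with a toy stand-in (the class and its `_run_benchmark` recorder are illustrative, not the plugin's actual fixture):

```python
from typing import Any, Callable


class Benchmark:
    """Toy fixture stand-in: runs the function and records (name, suffix)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, Any]] = []

    def _run_benchmark(self, func, args, kwargs, suffix=None):
        result = func(*args, **kwargs)
        self.calls.append((func.__name__, suffix))
        return result

    def __call__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if args or kwargs:  # direct-call form: benchmark(func, *args, ...)
            suffix = kwargs.pop("benchmark_name_suffix", None)
            return self._run_benchmark(func, args, kwargs, suffix)

        def wrapped(*a: Any, **k: Any) -> Any:  # decorator form: @benchmark
            return self._run_benchmark(func, a, k)

        return wrapped


benchmark = Benchmark()

# Direct-call form, as in the examples above.
direct = benchmark(len, [1, 2, 3], benchmark_name_suffix="case_0")

# Decorator form, which the suggestion aims to keep working.
@benchmark
def square(x: int) -> int:
    return x * x

decorated = square(4)
```

Here `direct` is 3 and `decorated` is 16, and both invocations land in `benchmark.calls`, showing that one `__call__` can serve both usage styles.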

    @KRRT7 KRRT7 force-pushed the roboflow-benchmarking-WIP branch 2 times, most recently from eafa637 to 94a3f73 Compare April 25, 2025 08:46
    @KRRT7 KRRT7 force-pushed the roboflow-benchmarking-WIP branch from 94a3f73 to 7673123 Compare April 28, 2025 06:46
    @KRRT7 KRRT7 closed this May 13, 2025
    @KRRT7 KRRT7 deleted the roboflow-benchmarking-WIP branch May 13, 2025 23:00