Conversation

@KRRT7 (Contributor) commented Apr 23, 2025

User description

Currently, if we want to benchmark a function whose arguments can be permuted, such as

def _aggregate_images_shape(
    images: List[np.ndarray], mode: Literal["min", "max", "avg"]
) -> Tuple[int, int]:
    if mode not in SHAPE_AGGREGATION_FUN:
        raise ValueError(
            f"Could not aggregate images shape - provided unknown mode: {mode}. "
            f"Supported modes: {list(SHAPE_AGGREGATION_FUN.keys())}."
        )
    return SHAPE_AGGREGATION_FUN[mode](images)

we would have to express it as

def test_calculate_aggregated_images_shape_min(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    benchmark(
        _aggregate_images_shape,
        images,
        "min",
    )


def test_calculate_aggregated_images_shape_max(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    benchmark(
        _aggregate_images_shape,
        images,
        "max",
    )


def test_calculate_aggregated_images_shape_avg(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    benchmark(
        _aggregate_images_shape,
        images,
        "avg",
    )

Instead, a more concise way of expressing it is:

def test_calculate_aggregated_images_shape(benchmark, dataset_reference):
    images, image_sizes = dataset_reference
    for mode in ["min", "max", "avg"]:
        benchmark(
            _aggregate_images_shape,
            images,
            mode,
            benchmark_name_suffix=f"mode_{mode}",
        )

and

def test_is_prediction_registration_forbidden(benchmark):
    predictions: list[Prediction] = [
        {"predictions": [{"x": 37}], "top": "cat"},
        {"top": "cat"},
        {"predictions": []},
        {"is_stub": True},
        {},
    ]
    persist_options = [True, False]
    roboflow_image_id_options = ["some_id", None]

    param_combinations = itertools.product(
        predictions,
        persist_options,
        roboflow_image_id_options,
    )

    for i, (prediction, persist, roboflow_image_id) in enumerate(param_combinations):
        benchmark(
            is_prediction_registration_forbidden,
            prediction=prediction,
            persist_predictions=persist,
            roboflow_image_id=roboflow_image_id,
            benchmark_name_suffix=(
                f"prediction_case_{i}_"
                f"persist_{persist}_"
                f"has_image_id_{roboflow_image_id is not None}"
            ),
        )

for more complicated cases.


PR Type

Enhancement, Bug fix, Tests


Description

- Replace prints with logger for consistent logging
- Support multiple benchmark calls in a single test
- Add call counters and benchmark name suffix
- Introduce cleanup logic for temporary artifacts

Changes walkthrough 📝

Relevant files:

Error handling (1 file)
- codeflash_trace.py — Use logger instead of print in tracing (+52/-25)

Formatting (2 files)
- instrument_codeflash_trace.py — Refactor decorator insertion formatting (+21/-41)
- pytest_new_process_trace_benchmarks.py — Adjust pytest invocation argument formatting (+14/-2)

Enhancement (6 files)
- plugin.py — Improve pytest plugin logging and fixture (+49/-67)
- trace_benchmarks.py — Simplify error logging in benchmark collection (+4/-4)
- utils.py — Add recursive cleanup and console integration (+48/-41)
- code_utils.py — Extend cleanup_paths to remove directories (+5/-1)
- functions_to_optimize.py — Log function count with console rule separator (+1/-0)
- optimizer.py — Refactor optimization cleanup logic (+77/-38)

Bug fix (1 file)
- replay_test.py — Fix SQL query execution and formatting (+42/-44)

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
@github-actions

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    SQL Execution

    Incorrect use of SQL query string as callable; should pass the query and parameters to execute() instead of invoking the string.

        query = "SELECT * FROM benchmark_function_timings WHERE benchmark_function_name = ? AND function_name = ? AND file_path = ? AND class_name = ? LIMIT ?"  # noqa: E501
        cursor = cur.execute(query(benchmark_function_name, function_name, file_path, class_name, limit))
    else:
    NameError

    Variable benchmarks_root is undefined in the pytest.main invocation; likely should use tests_root.

    [
        benchmarks_root,
        "--codeflash-trace",
        "-p",
    Env Var Duplicate

    CODEFLASH_BENCHMARKING is set to "True" twice in _run_benchmark; remove the redundant assignment.

    os.environ["CODEFLASH_BENCHMARKING"] = "True"
    os.environ["CODEFLASH_BENCHMARK_FUNCTION_NAME"] = call_identifier
    os.environ["CODEFLASH_BENCHMARK_MODULE_PATH"] = benchmark_module_path
    os.environ["CODEFLASH_BENCHMARK_LINE_NUMBER"] = str(line_number)
    os.environ["CODEFLASH_BENCHMARKING"] = "True"

@github-actions

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Fix SQL execution parameters

    Pass the SQL string and parameters separately to execute() instead of calling the
    query string as a function.

    codeflash/benchmarking/replay_test.py [31-32]

     query = "SELECT * FROM benchmark_function_timings WHERE benchmark_function_name = ? AND function_name = ? AND file_path = ? AND class_name = ? LIMIT ?"  # noqa: E501
    -cursor = cur.execute(query(benchmark_function_name, function_name, file_path, class_name, limit))
    +cursor = cur.execute(query, (benchmark_function_name, function_name, file_path, class_name, limit))
    Suggestion importance[1-10]: 8


    Why: This fixes a functional bug by passing the SQL string and its parameters separately to execute(), preventing a runtime error when treating the query string as a callable.

    Medium
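As a sanity check, the corrected call shape can be exercised against a throwaway in-memory database (the two-column schema here is a stand-in, not the real `benchmark_function_timings` schema):

```python
import sqlite3

# execute() takes the SQL string and a tuple of parameters as two separate
# arguments; the query string itself is never called.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE benchmark_function_timings "
    "(benchmark_function_name TEXT, function_name TEXT)"
)
cur.execute(
    "INSERT INTO benchmark_function_timings VALUES (?, ?)",
    ("test_a", "foo"),
)
query = (
    "SELECT * FROM benchmark_function_timings "
    "WHERE benchmark_function_name = ? LIMIT ?"
)
rows = cur.execute(query, ("test_a", 5)).fetchall()  # [("test_a", "foo")]
conn.close()
```

Parameter binding also avoids SQL injection and quoting issues, which is why the `?` placeholders are preferable to string formatting even once the callable bug is fixed.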
    General
    Support decorator usage

    Restore decorator support by returning a wrapper when call is used without
    direct args, so @benchmark still works.

    codeflash/benchmarking/plugin/plugin.py [230-232]

     def __call__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:  # noqa: ANN401
    -    benchmark_name_suffix = kwargs.pop("benchmark_name_suffix", None)
    -    return self._run_benchmark(func, args, kwargs, benchmark_name_suffix)
    +    if args or kwargs or kwargs.get("benchmark_name_suffix") is not None:
    +        suffix = kwargs.pop("benchmark_name_suffix", None)
    +        return self._run_benchmark(func, args, kwargs, suffix)
    +    def wrapped(*a: Any, **k: Any) -> Any:
    +        return self._run_benchmark(func, a, k)
    +    return wrapped
    Suggestion importance[1-10]: 6


    Why: The change reinstates decorator behavior for the Benchmark fixture, allowing both direct calls and @benchmark decorator usage without breaking existing functionality.

    Low
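For reference, the dual-mode behavior the suggestion describes can be sketched with a toy stand-in (the class and its `_run_benchmark` recorder are illustrative, not the plugin's actual fixture):

```python
from typing import Any, Callable


class Benchmark:
    """Toy fixture stand-in: runs the function and records (name, suffix)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, Any]] = []

    def _run_benchmark(self, func, args, kwargs, suffix=None):
        result = func(*args, **kwargs)
        self.calls.append((func.__name__, suffix))
        return result

    def __call__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if args or kwargs:  # direct-call form: benchmark(func, *args, ...)
            suffix = kwargs.pop("benchmark_name_suffix", None)
            return self._run_benchmark(func, args, kwargs, suffix)

        def wrapped(*a: Any, **k: Any) -> Any:  # decorator form: @benchmark
            return self._run_benchmark(func, a, k)

        return wrapped


benchmark = Benchmark()

# Direct-call form, as in the examples above.
direct = benchmark(len, [1, 2, 3], benchmark_name_suffix="case_0")

# Decorator form, which the suggestion aims to keep working.
@benchmark
def square(x: int) -> int:
    return x * x

decorated = square(4)
```

Here `direct` is 3 and `decorated` is 16, and both invocations land in `benchmark.calls`, showing that one `__call__` can serve both usage styles.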

    @KRRT7 KRRT7 force-pushed the roboflow-benchmarking-WIP branch 2 times, most recently from eafa637 to 94a3f73 Compare April 25, 2025 08:46
    @KRRT7 KRRT7 force-pushed the roboflow-benchmarking-WIP branch from 94a3f73 to 7673123 Compare April 28, 2025 06:46
    @KRRT7 KRRT7 closed this May 13, 2025
    @KRRT7 KRRT7 deleted the roboflow-benchmarking-WIP branch May 13, 2025 23:00