
Productionize the dataset we are using for BackendBench #93


Merged: 38 commits merged into main on Aug 19, 2025

Conversation

@PaliC (Contributor) commented on Aug 19, 2025

Due to a bug in gh I couldn't update this PR; the review history can be found at #57.

This looks like a much bigger PR than it actually is. It is most of the work we need to do on the repo end for #44.

This PR

  1. Adds a dataloaders folder to support loading things from parquet files, Hugging Face URLs, and trace files (BackendBench/data_loaders.py)
  2. Creates a script that lets one go back and forth between parquet and trace files (BackendBench/scripts/parquet_trace_converter.py)
  3. Defines a schema for what the final dataset ought to look like
  4. Adds a few filters to help filter out bad inputs (in this case, ops we likely don't want to benchmark because they are fill or view ops). This should scale to more filters, such as outputs being close to zero or runtimes that are too short (see the sketch after this list).
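
To make the shape of these filters concrete, here is a minimal sketch; the function name and op list are hypothetical stand-ins, not the actual contents of BackendBench/scripts/dataset_filters.py:

```python
# Hypothetical sketch of a fill/view-op filter. Rows are assumed to be dicts
# with an "op_name" key; the real filter logic lives in
# BackendBench/scripts/dataset_filters.py and may look different.
SKIP_OPS = {
    "aten.fill.Scalar",    # fill ops produce constant output, trivial to beat
    "aten.view.default",   # view ops do no real computation worth benchmarking
}

def filter_fill_and_view_ops(rows):
    """Drop trace rows whose op is a fill or view op."""
    return [row for row in rows if row["op_name"] not in SKIP_OPS]
```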

I think 3 and 4 definitely require the most review.
The schema is described in the comment at the top of BackendBench/scripts/parquet_trace_converter.py.

I'd also take a close look at the filters in BackendBench/scripts/dataset_filters.py, as they cover a bunch of ops that don't seem useful in a benchmark, but I'd like a second look.

BackendBench/scripts/parquet_trace_converter.py offers a trace-to-parquet mode and a parquet-to-trace mode. parquet-to-trace mode is self-explanatory. trace-to-parquet mode actually creates two parquet files: the first is a "dev" parquet, which contains a bunch of extra metadata on the inputs, while the final parquet (which I refer to as "prod") is the one that should be used in benchmarks and is the result of all the filtering.
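
Roughly, the trace-to-parquet flow looks like the following sketch (assuming pandas; the signature, column handling, and file names are illustrative, not the script's actual interface):

```python
import pandas as pd

def trace_to_parquet(trace_rows, dev_path, prod_path, filters):
    """Sketch: write a metadata-rich "dev" parquet, then apply the filters
    and write the "prod" parquet that benchmarks should consume."""
    dev = pd.DataFrame(trace_rows)   # keep every extra metadata column
    dev.to_parquet(dev_path)
    prod = dev
    for f in filters:                # e.g. drop fill/view ops
        prod = f(prod)
    prod.to_parquet(prod_path)
```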

You can find explanations of the trace files (this can be removed, as it should not be permanent) and the argument schema at https://huggingface.co/datasets/GPUMODE/huggingface_op_trace (I will add the parquet schema once it is finalized).

The result of creating and uploading a parquet to Hugging Face: https://huggingface.co/datasets/GPUMODE/huggingface_op_trace

As validation that this works, here is the roundtrip conversion of the tritonbench data, trace -> parquet (dev) -> trace:
https://www.diffchecker.com/YYiJ43cq/. The differences are attributable to the fact that we rename the op "aten.sum.SymInt" to "aten.sum.dim_IntList".
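
If we want to automate that check rather than pasting into diffchecker, a small difflib sketch would do (file paths are placeholders):

```python
import difflib

def show_roundtrip_diff(original_trace: str, roundtripped_trace: str) -> None:
    """Print a unified diff between the original trace and the result of
    trace -> parquet (dev) -> trace. The only expected differences are
    renamed ops such as aten.sum.SymInt -> aten.sum.dim_IntList."""
    with open(original_trace) as a, open(roundtripped_trace) as b:
        for line in difflib.unified_diff(
            a.readlines(), b.readlines(),
            fromfile=original_trace, tofile=roundtripped_trace,
        ):
            print(line, end="")
```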

@meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Aug 19, 2025
@@ -0,0 +1,36 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Member: This doesn't seem to be a script.

PaliC (Author): It's not a script, though the next PR I'm adding after this one adds runtime data, and I'd want to do that here. Also, we'll likely end up skipping more tests for the benchmark, so I think having all the filtering logic in one place would be smart.

@@ -152,4 +154,17 @@ def deserialize_args(inps):
# f-strings introduce quotations we don't want
for key in dtype_abbrs_parsing:
    inps = inps.replace(f"'{key}'", key)

# Handle torch.device strings - replace "torch.device(...)" with torch.device(...)
Member: I forget, why do we need this?

PaliC (Author), Aug 19, 2025: It's a corner case in deserialization: two traces in to_copy cannot be deserialized properly because they have device="torch.device("cpu")" in kwargs, and the current logic treats this as a string. Honestly, given how many tests (on the order of tens) we have for that op, plus the nature of that operator, I'm fine removing those.

However, for now let's keep it, since we should do a full run on all the ops/tests soon to figure out filtering rules. I'll file an issue on GitHub.
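
For context, the handling boils down to something like this sketch (illustrative; not the exact code in the diff):

```python
import re

def unquote_torch_device(inps: str) -> str:
    """Strip the quotes around a serialized torch.device(...) call so it is
    parsed as a call rather than a string, e.g.
    device='torch.device("cpu")'  ->  device=torch.device("cpu")."""
    return re.sub(r"""['"](torch\.device\([^)]*\))['"]""", r"\1", inps)
```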

]
# for details on the dataset read this:
# https://huggingface.co/datasets/GPUMODE/huggingface_op_trace
DEFAULT_HUGGINGFACE_URL = "https://huggingface.co/datasets/GPUMODE/huggingface_op_trace/resolve/main/backend_bench_problems.parquet"
Member: Just confirming, this is the non-augmented version?

PaliC (Author): Yes (at least, all of the synthetic code is specified to not be in the benchmark), and that logic is added; you can explore it here: https://huggingface.co/datasets/GPUMODE/huggingface_op_trace
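
For anyone poking at the dataset, the prod parquet should be readable straight from that URL; a sketch, assuming pandas plus an fsspec HTTP backend is installed:

```python
import pandas as pd

# DEFAULT_HUGGINGFACE_URL from the diff above.
URL = (
    "https://huggingface.co/datasets/GPUMODE/huggingface_op_trace"
    "/resolve/main/backend_bench_problems.parquet"
)
df = pd.read_parquet(URL)  # requires an HTTP-capable fsspec backend
print(df.head())
```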

@PaliC requested a review from msaroufim on Aug 19, 2025, 16:45
@msaroufim merged commit 4acfe6c into main on Aug 19, 2025
3 checks passed