Productionize the dataset we are using for BackendBench #93
Conversation
@@ -0,0 +1,36 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
This doesn't seem to be a script
It's not a script, though the next PR I'm adding after this one adds runtime data, and I'd want to do that here. Also, we'll likely end up skipping more tests for the benchmark, so I think having all the filtering logic in one place would be smart.
@@ -152,4 +154,17 @@ def deserialize_args(inps):
    # f strings introduce quotations we dont want
    for key in dtype_abbrs_parsing:
        inps = inps.replace(f"'{key}'", key)

    # Handle torch.device strings - replace "torch.device(...)" with torch.device(...)
I forget: why do we need this?
It's a corner case in deserialization: two traces for to_copy cannot be deserialized properly because they have device="torch.device("cpu")" in their kwargs, and the current logic treats that as a string. Honestly, given how many tests (on the order of tens) we have for that op, plus the nature of the operator, I'm fine removing those.
However, for now let's keep it, since we should do a full run on all the ops / tests soon to figure out the filtering rules. I'll file an issue on GitHub.
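For context on the workaround, here is a minimal sketch of the kind of rewrite being discussed; the helper name and regex are illustrative, not the code in this PR:

```python
import re

# Hypothetical helper: strip the quotes around a serialized torch.device(...)
# so that evaluating the string yields a device object rather than a str.
_DEVICE_RE = re.compile(r"""['"](torch\.device\([^)]*\))['"]""")

def unquote_torch_device(serialized: str) -> str:
    return _DEVICE_RE.sub(r"\1", serialized)

# 'torch.device("cpu")' in kwargs becomes torch.device("cpu")
print(unquote_torch_device("T([2, 2], f32, device='torch.device(\"cpu\")')"))
# -> T([2, 2], f32, device=torch.device("cpu"))
```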
]
# for details on the dataset read this:
# https://huggingface.co/datasets/GPUMODE/huggingface_op_trace
DEFAULT_HUGGINGFACE_URL = "https://huggingface.co/datasets/GPUMODE/huggingface_op_trace/resolve/main/backend_bench_problems.parquet"
Just confirming: this is the non-augmented version?
Yes (at least, all of the synthetic code is specified as not being in the benchmark), and that logic has been added; you can explore it here: https://huggingface.co/datasets/GPUMODE/huggingface_op_trace
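For anyone who wants to poke at the published parquet directly, a minimal sketch (assuming pandas with a parquet engine installed; I'm not asserting anything about the column names, just printing them):

```python
import pandas as pd

# Published "prod" parquet referenced by DEFAULT_HUGGINGFACE_URL above.
URL = (
    "https://huggingface.co/datasets/GPUMODE/huggingface_op_trace/"
    "resolve/main/backend_bench_problems.parquet"
)

df = pd.read_parquet(URL)          # pandas can read directly from an https URL
print(df.columns.tolist())         # inspect the schema rather than assuming it
print(f"{len(df)} rows")
```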
Due to a bug in GitHub I couldn't update this PR; the review history can be found at #57.
This looks like a much bigger PR than it actually is. It is most of the work we need to do on the repo end for #44.
This PR
I think 3 and 4 definitely require the most review.
The schema is described in the comment at the top of BackendBench/scripts/parquet_trace_converter.py
I'd also take a close look at the filters in BackendBench/scripts/dataset_filters.py, as they cover a bunch of ops that don't seem useful in a benchmark, but I'd like a second look.
BackendBench/scripts/parquet_trace_converter.py offers a trace-to-parquet mode and a parquet-to-trace mode. parquet-to-trace mode is self-explanatory. trace-to-parquet mode actually creates two parquet files: the first is a "dev" parquet that contains extra metadata about the inputs, while the final parquet (which I refer to as "prod") is the result of all the filtering and is the one that should be used in benchmarks.
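To make the dev vs. prod distinction concrete, here is a hedged sketch of the filtering step; the column name, skip list, and function are hypothetical stand-ins, not the converter's actual code:

```python
import pandas as pd

# Placeholder skip list; the real rules live in BackendBench/scripts/dataset_filters.py.
SKIP_OPS = {"aten.rand.default"}

def build_prod_parquet(dev_path: str, prod_path: str) -> None:
    # Dev parquet: every trace plus extra metadata about the inputs.
    dev = pd.read_parquet(dev_path)
    # Prod parquet: the filtered subset that benchmarks should consume.
    prod = dev[~dev["op_name"].isin(SKIP_OPS)]   # "op_name" is an assumed column
    prod.to_parquet(prod_path, index=False)
```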
You can find explanations of the trace files (these can be removed, as they are not meant to be permanent) and the argument schema at https://huggingface.co/datasets/GPUMODE/huggingface_op_trace (I will add the parquet schema once it is finalized).
The result of creating and uploading a parquet to Hugging Face: https://huggingface.co/datasets/GPUMODE/huggingface_op_trace
As validation that this works, here is the roundtrip conversion of the tritonbench data, trace -> parquet (dev) -> trace:
https://www.diffchecker.com/YYiJ43cq/. The differences are due to the fact that we rename the op "aten.sum.SymInt" to "aten.sum.dim_IntList".
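For reference, a minimal sketch of how the roundtrip check can be reproduced locally (file paths are placeholders; the actual comparison above was done with diffchecker):

```python
# Compare the original trace with the roundtripped one, canonicalizing the
# single known rename (aten.sum.SymInt -> aten.sum.dim_IntList) first.
def normalized_lines(path: str) -> list[str]:
    with open(path) as f:
        return [line.replace("aten.sum.SymInt", "aten.sum.dim_IntList") for line in f]

original = normalized_lines("tritonbench_trace.txt")       # placeholder path
roundtripped = normalized_lines("roundtrip_trace.txt")     # placeholder path
assert original == roundtripped, "trace -> parquet -> trace changed the content"
```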