[huggingface tracer] Add suite based off of results from tracer #20
Conversation
```
@@ -62,11 +79,17 @@ def cli(suite, backend, ops, llm_max_attempts):
        torch.bfloat16,
        filter=ops,
    ),
    "huggingface": lambda: HuggingFaceTracerTestSuite(
```
There's code duplication for HuggingFaceTracerTestSuite - also the file path is still linked to your personal repo
Yeah, I agree with that, though let's keep it for now and get rid of it in a separate PR. Thanks for the catch on the personal JSON (though an AWS bucket is probably better; it just doesn't make sense to run the tracer once per benchmark).
I'm not totally sure checking in sample_inputs.json makes sense in the repo, especially if we're making updates. Mind pushing it to some blob storage or HF datasets instead? Something we can visualize in the browser would be nice too, since it's too big to visualize in the GitHub UI.
@msaroufim Yeah, I agree. I stuck something on Hugging Face here.
This PR is an extension of #21, which is still being worked on since it's both pretty messy and needs more coverage (right now it only supports 20 models). In practice there are two parts to the tracer: 1) the actual tracer, which grabs a bunch of Hugging Face models and traces them, and 2) turning the output into a test suite. This PR does the latter.
A sample output of the tracer can be found in BackendBench/huggingface_tracer/tracer_ops_and_shapes/sample_inputs.json, with an explanation of the schema in BackendBench/huggingface_tracer/tracer_ops_and_shapes/README.md.
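For reviewers who don't want to open the full file, a single traced-input record might look roughly like the sketch below. The field names here are purely illustrative assumptions, not the actual schema; the README above is the authoritative reference.

```python
# Purely illustrative record, NOT the actual schema (see the README linked above).
# The idea it captures: which op was traced, the shape/dtype/device of each input,
# and how often that exact input signature showed up across the traced models.
example_record = {
    "op": "aten.add.Tensor",
    "args": [
        {"shape": [8, 128, 768], "dtype": "bfloat16", "device": "cuda"},
        {"shape": [8, 128, 768], "dtype": "bfloat16", "device": "cuda"},
    ],
    "count": 12,  # times this input signature appeared during tracing
}
```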
Effectively, the code here takes one of these JSON outputs and uses it to create test cases for a suite. Currently, I use the 5 most popular sets of inputs and the 5 largest sets of inputs to create up to 10 tests per op that we find. Generally this seems to work for correctness, though we still need to support performance.
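As a rough sketch of that selection step (hypothetical helper and field names; the real logic lives in BackendBench/huggingface_tracer/tracer_parser.py and suite.py):

```python
import json


def select_inputs_for_op(records, popular_k=5, largest_k=5):
    """Illustrative sketch: pick up to popular_k + largest_k unique input sets for one op.

    `records` is assumed to be a list of dicts with "count" (popularity) and
    "numel" (total input size) fields; the actual schema may differ.
    """
    by_popularity = sorted(records, key=lambda r: r.get("count", 0), reverse=True)
    by_size = sorted(records, key=lambda r: r.get("numel", 0), reverse=True)

    selected, seen = [], set()
    for record in by_popularity[:popular_k] + by_size[:largest_k]:
        key = json.dumps(record, sort_keys=True)  # de-duplicate identical input sets
        if key not in seen:
            seen.add(key)
            selected.append(record)
    return selected  # at most 10 test cases per op with the defaults
```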
In order to check dtype/device compatibility, I am using op_info; however, it is not fully comprehensive, so I add a few manual ops in BackendBench/huggingface_tracer/manual_ops_mapping.json. We probably need a better solution for this in the long run.
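A minimal sketch of that fallback, assuming op_info refers to PyTorch's OpInfo database (op_db) and simplifying the layout of manual_ops_mapping.json to {op_name: {"cpu": [...], "cuda": [...]}}; the real file and the op-name matching (aten names vs. OpInfo names) are glossed over here:

```python
import json

import torch
from torch.testing._internal.common_methods_invocations import op_db

# op_db can contain several OpInfo entries per name (variants);
# the last one wins here, which is good enough for a sketch.
OPINFO_BY_NAME = {info.name: info for info in op_db}

with open("BackendBench/huggingface_tracer/manual_ops_mapping.json") as f:
    MANUAL_OPS = json.load(f)  # assumed layout: {op_name: {"cpu": [...], "cuda": [...]}}

_DTYPE_NAMES = {"float16": torch.float16, "bfloat16": torch.bfloat16, "float32": torch.float32}


def op_supports(op_name, dtype, device):
    """Prefer OpInfo's dtype metadata; fall back to the manual JSON mapping."""
    info = OPINFO_BY_NAME.get(op_name)
    if info is not None:
        return dtype in info.supported_dtypes(device)
    manual = MANUAL_OPS.get(op_name, {})
    return dtype in {_DTYPE_NAMES[d] for d in manual.get(device, []) if d in _DTYPE_NAMES}
```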
Some weird corner cases
Some more todos
Copilot-generated summary to make reviewing this easier
This pull request introduces a comprehensive test suite for HuggingFace tracer data within the BackendBench module. The changes include the addition of new classes and methods for handling tracer operations, parsing JSON data, and generating test cases for PyTorch operations. The updates also include a schema definition for traced inputs and a manual mapping of unsupported operations.

Test Suite Implementation:
- BackendBench/huggingface_tracer/__init__.py: Added module-level documentation and exposed key classes and methods (HuggingFaceTracerTest, HuggingFaceTracerOpTest, HuggingFaceTracerTestSuite, build_huggingface_tracer_tests) for creating and running tracer tests.
- BackendBench/huggingface_tracer/suite.py: Implemented the HuggingFaceTracerTestSuite class and related functionality for generating tests based on tracer data, including handling unsupported operations and selecting unique inputs.

JSON Data Handling:
- BackendBench/huggingface_tracer/tracer_parser.py: Added utilities for loading JSON data, selecting relevant inputs based on popularity and size, and creating tensors and tensor lists from metadata. Special cases for certain operations requiring unique handling were also defined.

Manual Mapping of Unsupported Operations:
- BackendBench/huggingface_tracer/manual_ops_mapping.json: Introduced a JSON file mapping unsupported operations to their compatible data types on CPU and CUDA devices, enabling tests for these operations.

Documentation:
- BackendBench/huggingface_tracer/tracer_ops_and_shapes/README.md: Added a detailed schema for the structure of traced inputs, including field descriptions and examples, to guide developers in understanding and utilizing the tracer data.