
Conversation

@PaliC (Contributor) commented on Aug 21, 2025:

The issue this aims to solve is described in #104

Once this is merged I will update the tritonbench suite. This PR is a bit specific to tritonbench at the moment: it does not cover everything that needs to be accounted for, just what's in the tritonbench test set right now.

Some analysis

Right now we are using OpInfo as our ground truth for testing. However, it has some pretty bogus inputs and outputs (especially with the allclose check in our testing harness). Effectively, for random or fill ops it outputs empty tensors or watermarked inputs. Some examples are below.

randint.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at randint.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (10, torch.Size([0, 5, 0]))
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {'device': 'cuda'}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)

bernoulli.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at bernoulli.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)

empty_like.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: -6.71875

[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)

new_empty_strided.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at new_empty_strided.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16), (), ())
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] Error in allclose
[2025-08-22 15:05:16][INFO][eval.py] 
Exception raised for None:
    args: ((T([], bf16), T([], bf16),), {})
    exc: Scalars are not close!

Expected 0.0 but got -6.71875.
Absolute difference: 6.71875 (up to 0.01 allowed)
Relative difference: inf (up to 0.01 allowed)

[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: 0.0
[2025-08-22 15:05:16][INFO][eval.py] for new_empty_strided.default is_correct=False abs_error=6.71875 rel_error=1.0
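The failure above is expected: empty-style ops return uninitialized memory, so two calls with identical arguments need not produce the same values. A minimal repro of the nondeterminism (illustrative only, not part of the harness; assumes a CUDA device as in the logs):

import torch

x = torch.tensor(-6.7188, device="cuda", dtype=torch.bfloat16)
# new_empty_strided allocates memory without initializing it, so the
# values of a and b below are arbitrary and may differ between calls.
a = x.new_empty_strided((), ())
b = x.new_empty_strided((), ())
# Comparing a and b (or either against the aten reference) with allclose
# is therefore meaningless; only shape/dtype/device/strides are defined.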

This PR allows us to skip these tests for torchbench, as our allclose check does not handle them.
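A minimal sketch of how the exclusion could be applied, with a hypothetical should_skip helper (the PR itself only adds the UNTESTABLE_OPERATORS list, shown in the diff further down):

UNTESTABLE_OPERATORS = ["empty_like", "new_empty", "new_empty_strided", "bernoulli"]

def should_skip(op_name: str) -> bool:
    # Overload names look like "empty_like.default"; match on the base name.
    return op_name.split(".")[0] in UNTESTABLE_OPERATORS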

What to do later

For PyTorch, the testing of distributions and random ops can be found in test_distributions.py and test_random.

For fill / tensor creation ops, test_tensor_creation_ops.py is where we find those tests.

We need to add this kind of testing to BackendBench.
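As a rough sketch of what that could look like for a random op (hypothetical helper, not what those PyTorch test files do verbatim), bernoulli can be tested statistically instead of elementwise:

import torch

def check_bernoulli(p: torch.Tensor, out: torch.Tensor, atol: float = 0.05) -> bool:
    # Elementwise comparison is meaningless for independent random draws;
    # instead check that the empirical mean of the samples is close to the
    # mean of the probability tensor p (only sensible for large tensors).
    return torch.isclose(out.float().mean(), p.float().mean(), atol=atol).item()

# Usage sketch:
# p = torch.full((10000,), 0.3, device="cuda")
# assert check_bernoulli(p, torch.bernoulli(p))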

@meta-cla bot added the CLA Signed label on Aug 21, 2025
@jiannanWang (Contributor) commented:
I'm ok with excluding those ops. They always fail in the tests.

My concern is about the naming of these ops. I think bernoulli is a random op, but is it appropriate to also call empty_like, new_empty, and new_empty_strided random ops? I feel there's a difference.

@PaliC (Contributor, Author) commented on Aug 21, 2025:

@jiannanWang That's fair; I guess technically it is and it isn't. Let's go with untestable.

"empty_like",
"new_empty",
"new_empty_strided",
"bernoulli",
@msaroufim (Member) commented on Aug 21, 2025:

I feel like this one is different from the others; it is testable.

@PaliC (Contributor, Author) replied:

I think fixing the error message to say we don't support them yet is correct. There is a way to test these, but it would require some custom work.

@@ -20,6 +20,13 @@
"_fft_c2c.default", # cuFFT only supports dimensions whose sizes are powers of two when computing in half precision
]

UNTESTABLE_OPERATORS = [
"empty_like",
@msaroufim (Member) commented:

We can test metadata if not the values; e.g., we expect the output to have a certain shape.
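For illustration, a minimal sketch of such a metadata-only check (hypothetical helper, not part of this PR):

import torch

def check_metadata(ref: torch.Tensor, out: torch.Tensor) -> bool:
    # The values of empty_like and friends are uninitialized, but the
    # metadata (shape, dtype, device, strides) is fully determined.
    return (
        out.shape == ref.shape
        and out.dtype == ref.dtype
        and out.device == ref.device
        and out.stride() == ref.stride()
    )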

@PaliC (Contributor, Author) replied:

Same reply as above; I'll add comments to talk about this.

@msaroufim (Member) left a comment:

See CIL. Also, if we're making claims around untestable ops I'd like a more comprehensive list; it'll just be confusing to slowly iterate on this.

@PaliC (Contributor, Author) commented on Aug 21, 2025:

@msaroufim this is the entire set of ops for torchbench + opinfo that we don't currently support correctness checking for (assuming aten is correct). After merging #92 this should become much easier if more stuff comes up as we expand our suites.

@PaliC requested review from msaroufim and jiannanWang on August 21, 2025 at 23:02
@msaroufim (Member) left a comment:

Per our offline chat, please add more details on the logs you got and how OpInfo does testing for random operators (watermarking) and memory allocation (also likely watermarking).

@PaliC merged commit 289d8c6 into meta-pytorch:main on Aug 22, 2025 (3 checks passed)
@PaliC deleted the remove_random_ops branch on August 22, 2025 at 22:13
@msaroufim (Member) commented on Aug 23, 2025:

I was kinda hoping for a bit more detail before merge. In particular, when linking to testing of randomness or creation ops, I don't feel like the PR description adequately explains the current gaps in OpInfo and how we'd go about fixing them. It's easy to conceive of examples where memory allocation is being done incorrectly, so if you're not incorrect on some obvious cases then you're more likely to be correct. Since we have merge rights into PyTorch core itself, we have the ability to go make improvements there; this issue points out something "off" about our testing but stops short of scoping out how to fix it.
