Filter out operators we cannot test correctness for #105
Conversation
I'm ok with excluding those ops. They always fail in the tests. My concern is about the naming of these ops. I think bernoulli is a random op. But is it appropriate to also call empty_like, new_empty, new_empty_strided random ops? I feel there's a difference.
@jiannanWang That's fair, I guess technically it is and it isn't. Let's go with untestable.
"empty_like", | ||
"new_empty", | ||
"new_empty_strided", | ||
"bernoulli", |
I feel like this one is different from the others; it is testable
I think fixing the error message to say we don't support them yet is correct. There is a way to test these but it would require some custom work
@@ -20,6 +20,13 @@
"_fft_c2c.default", # cuFFT only supports dimensions whose sizes are powers of two when computing in half precision
]

UNTESTABLE_OPERATORS = [
"empty_like",
we can test metadata if not the values, like we expect the output to have a certain shape
same reply as above, I'll add comments to talk about this
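A hedged sketch of the metadata-only check suggested above, assuming we compare only shape, dtype, device, and strides for creation ops like empty_like; the function name is made up for illustration.

```python
import torch

def creation_metadata_matches(result: torch.Tensor, reference: torch.Tensor) -> bool:
    # The values of empty_like / new_empty outputs are uninitialized, but the
    # metadata is well defined and can be compared against the reference.
    return (
        result.shape == reference.shape
        and result.dtype == reference.dtype
        and result.device == reference.device
        and result.stride() == reference.stride()
    )

x = torch.randn(4, 8)
print(creation_metadata_matches(torch.empty_like(x), torch.empty_like(x)))  # True
```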
see CI; also, if we're making claims around untestable ops I'd like a more comprehensive list, otherwise it'll just be confusing to slowly iterate on this
@msaroufim this is the entire set of ops for torchbench + opinfo that we don't currently support correctness for (assuming aten is correct). After merging #92 this should become much easier if more stuff comes up as we expand our suites.
Per our offline chat, please add more details on the logs you got and on how OpInfo does testing for random operators (watermarking) and memory allocation (also likely watermarking)
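For what it's worth, one way to make a random op directly comparable (a sketch only; this is not a claim about OpInfo's actual watermarking scheme) is to pin the RNG seed so the candidate and reference draws are identical. This only works when both sides consume the generator the same way.

```python
import torch

def run_seeded(fn, seed: int = 0):
    # Reset the global RNG so repeated calls produce identical draws.
    torch.manual_seed(seed)
    return fn()

p = torch.full((1000,), 0.3)
ref = run_seeded(lambda: torch.bernoulli(p))
out = run_seeded(lambda: torch.bernoulli(p))
print(torch.equal(ref, out))  # True: same seed, same generator usage
```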
I was kinda hoping for a bit more detail before merge. In particular, when linking to testing of randomness or creation ops, I don't feel the PR description adequately explains the current gaps in OpInfo and how we'd go about fixing them. It's easy to conceive of examples where memory allocation is done incorrectly, so an implementation that isn't wrong on those obvious cases is more likely to be correct. Since we have merge rights into PyTorch core itself, we can go make improvements there; this issue points out something "off" about our testing but stops short of scoping out how to fix it.
The issue this aims to solve is described in #104.
Once this is merged I will update the tritonbench suite. This PR is a bit specific to tritonbench atm. It is not comprehensive of everything that needs to be accounted for, just what's in the tritonbench test set right now.
Some analysis
Right now we are using opinfo as our ground truth for testing. However, it has some pretty bogus inputs and outputs (especially with our testing harness in allclose). Effectively, for random or fill ops it outputs empty tensors or watermarked inputs. Some examples of this are below (see the sketch after this list):
- randint.default
- bernoulli.default
- empty_like.default
- new_empty_strided.default
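To make the failure mode concrete, here is a small illustration (assumed representative) of why a plain allclose comparison cannot pass for these ops.

```python
import torch

p = torch.full((8,), 0.5)
a = torch.bernoulli(p)
b = torch.bernoulli(p)
print(torch.allclose(a, b))  # usually False: each call draws fresh random values

x = torch.randn(8)
u = torch.empty_like(x)
v = torch.empty_like(x)
print(torch.allclose(u, v))  # meaningless: both tensors hold uninitialized memory
```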
This PR allows us to skip these tests for torchbench, as our allclose does not handle them.

What to do later
For PyTorch, the testing of distributions and random ops can be found in test_distributions.py and test_random. For fill / tensor creation ops, test_tensor_creation_ops.py is where those tests live. We need to add this testing to backendbench; a rough sketch of what that might look like is below.
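The following is a hedged sketch of the kinds of checks that could be added, assuming a statistical test for random ops and a metadata test for creation ops. Thresholds, sample sizes, and function names are placeholders, not settled BackendBench policy.

```python
import torch

def bernoulli_statistics_ok(impl, p: float = 0.3, n: int = 100_000) -> bool:
    # The empirical mean of a Bernoulli(p) sample should land close to p.
    probs = torch.full((n,), p)
    sample = impl(probs)
    return abs(sample.float().mean().item() - p) < 0.01

def creation_metadata_ok(impl, reference, *args) -> bool:
    # Compare only metadata, since the values of creation ops are uninitialized.
    out, ref = impl(*args), reference(*args)
    return out.shape == ref.shape and out.dtype == ref.dtype and out.device == ref.device

print(bernoulli_statistics_ok(torch.bernoulli))
print(creation_metadata_ok(torch.empty_like, torch.empty_like, torch.randn(4, 4)))
```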