Commit 5b10b0a
Slightly improve error message from repeat_interleave kernel (pytorch#157996)
Summary:
In many investigations relating to invalid feature values, the three-argument form of `repeat_interleave` currently prints the following message if there is an inconsistency between `sum(repeats)` and `output_size`:
```
Assertion `result_size == cumsum_ptr[size - 1]` failed.
```
This is a bit hard for model authors to understand so I made the error slightly more comprehensible. After the fix the stdout contains the actual values of these parameters: https://fburl.com/mlhub/cfyyhh3q
```
Invalid input! In `repeat_interleave`, the `output_size` argument (949487) must be the same as the sum of the elements in the `repeats` tensor (949687).
```
In many cases, this is potentially useful information since we know for example that the difference between the two values above (949687-949487=200) happens to be the lengths of one of the features.
## What are my concerns with this change?
1. Outputs from `__assert_fail` go to `stderr` whereas `printf` writes to `stdout`. This is not the usual debugging flow where all logs can be found in `stderr`. I could not find a way to redirect `printf` to stderr or `__assert_fail` to stdout
2. Two checks happen instead of one in the error path. I wanted to preserve the semantics of what happens inside `__assert_fail`.
3. I have not seen this pattern in other PyTorch kernels but `repeat_interleave` with three arguments seems special in other ways too.
Test Plan:
* Built an ephemeral package with my changes:
https://www.internalfb.com/intern/servicelab/build/736441058/
* Verified that a job with these changes indeed prints out the expected message to stdout: https://fburl.com/mlhub/jgbqk8eg
* I will export to GH and run CI/CD tests.
Rollback Plan:
steps:
- manual.note:
content: >-
Just reverting this diff should be sufficient. Since this change is in
CUDA kernels, I do not believe there is a way to change the error
message via a JK.
Reviewed By: mradmila
Differential Revision: D77904753
Pull Request resolved: pytorch#157996
Approved by: https://github.com/ngimel, https://github.com/eqy1 parent fb462ce commit 5b10b0a
1 file changed
+7
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
21 | 27 | | |
22 | 28 | | |
23 | 29 | | |
| |||
0 commit comments