Commit a63ab0b
[Inductor] Fix out-of-bounds indices in repeat_interleave decomposition (pytorch#165368)
When `repeat_interleave` is decomposed into:
```python
cumsum = repeat.cumsum(0)
pos = torch.arange(output_size, device=repeat.device)
indices = torch.searchsorted(cumsum, pos, right=True)
```
The `searchsorted` op with `right=True` returns the insertion point after any matching elements. When query values in `pos` are `>= cumsum[-1]`, `searchsorted` returns `len(cumsum)`, which is out of bounds for indexing (valid range: `[0, len(cumsum)-1]`). These invalid indices trigger CUDA device-side assert errors in downstream indexing operations.
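The out-of-bounds case can be reproduced without a GPU. The sketch below is a plain-Python stand-in, not the Inductor code: it assumes `bisect.bisect_right` as an analogue of `torch.searchsorted(..., right=True)` and `itertools.accumulate` as an analogue of `cumsum`. The input values and the oversized `output_size` are hypothetical, chosen so that some positions are `>= cumsum[-1]`.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical input: three elements repeated 1, 2 and 1 times.
repeat = [1, 2, 1]
cumsum = list(accumulate(repeat))  # running totals: [1, 3, 4]

# Hypothetical output_size that exceeds sum(repeat) == 4,
# so positions 4 and 5 are >= cumsum[-1].
output_size = 6

# bisect_right stands in for searchsorted(cumsum, pos, right=True).
indices = [bisect_right(cumsum, pos) for pos in range(output_size)]

# Any index equal to len(cumsum) cannot be used to index `repeat`.
out_of_bounds = [i for i in indices if i > len(repeat) - 1]

print(indices)        # [0, 1, 1, 2, 3, 3]
print(out_of_bounds)  # [3, 3]
```

Positions 0 through 3 map to valid indices, while positions 4 and 5 both map to `len(cumsum) == 3`, one past the last valid index.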
This fix clamps the result so that all indices stay within the valid range `[0, repeat.size(0)-1]`.
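The effect of the clamp can be illustrated with a minimal plain-Python sketch. This is not the patched Inductor decomposition: `clamped_indices` is a hypothetical helper, `bisect_right` again stands in for `searchsorted` with `right=True`, and `min(...)` stands in for clamping the upper bound to `repeat.size(0)-1`.

```python
from bisect import bisect_right
from itertools import accumulate


def clamped_indices(repeat, output_size):
    """Hypothetical stand-in for the decomposition followed by a clamp."""
    cumsum = list(accumulate(repeat))
    last = len(repeat) - 1  # largest valid index into `repeat`
    # min(...) caps the insertion point at `last`; bisect_right never
    # returns a negative value, so the lower bound 0 holds already.
    return [min(bisect_right(cumsum, pos), last) for pos in range(output_size)]


values = ["a", "b", "c"]
repeat = [1, 2, 1]

# output_size equal to sum(repeat): the clamp changes nothing.
exact = clamped_indices(repeat, output_size=4)
# output_size larger than sum(repeat): trailing positions map to the last element.
padded = clamped_indices(repeat, output_size=6)

print([values[i] for i in exact])   # ['a', 'b', 'b', 'c']
print([values[i] for i in padded])  # ['a', 'b', 'b', 'c', 'c', 'c']
```

When `output_size` matches `sum(repeat)`, clamping leaves the indices unchanged. It only alters positions that would otherwise index past the end, which then read the last element instead of tripping an assert.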
Pull Request resolved: pytorch#165368
Approved by: https://github.com/mlazos
1 parent 102b788 commit a63ab0b
File tree
2 files changed: +34, -1 lines changed
- test/inductor: 32 lines added (new lines 14271-14302), inserted after original line 14270
- torch/_inductor: original line 1191 replaced, and one line added (new line 1194)