[ROCm] fix torch.layer_norm invalid configuration problem when input is large tensor (pytorch#144007)
Fixes pytorch#136291
This PR fixes the `invalid configuration argument` error that occurs on ROCm when `torch.layer_norm` is called on a large input tensor:
```
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/functional.py", line 2573, in layer_norm
return torch.layer_norm
RuntimeError: HIP error: invalid configuration argument
```
After investigation, I found the root cause: the AMD compute language runtime checks whether `gridDim.x * blockDim.x` exceeds `std::numeric_limits<uint32_t>::max()`. If it does, the kernel launch fails with the `invalid configuration argument` error.
The fix is to split the whole task into several chunks so that no single launch triggers the failure condition. This preserves correctness and completeness given the current kernel implementation logic of `vectorized_layer_norm_kernel`.
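A minimal host-side sketch of the chunking idea (not the actual PyTorch code): split the total row count into launches whose `gridDim.x * blockDim.x` stays at or below the uint32 limit, so each individual launch passes the ROCm runtime check. `chunk_rows` and the fixed block width are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Split total_rows into per-launch chunk sizes such that each launch
// satisfies gridDim.x * block_x <= UINT32_MAX. Hypothetical helper.
std::vector<uint64_t> chunk_rows(uint64_t total_rows, uint64_t block_x) {
    const uint64_t limit = 4294967295ULL;       // uint32_t max
    const uint64_t max_rows = limit / block_x;  // largest safe gridDim.x
    std::vector<uint64_t> chunks;
    for (uint64_t done = 0; done < total_rows; done += max_rows) {
        chunks.push_back(std::min(max_rows, total_rows - done));
    }
    return chunks;  // the kernel would then be launched once per chunk,
                    // offsetting its row index by the rows already done
}
```

For the 144,000,000-row case above with an assumed block width of 32, this yields two launches, each within the limit.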
Also added a large-tensor layer_norm unit test, `test_layer_norm_large_tensor`, using the same shape `[16, 3000, 3000, 16]` as in pytorch#136291, so the test can check the expected output values to verify correctness.
Future work may include performance optimization of layer_norm and CK layer_norm integration.
Pull Request resolved: pytorch#144007
Approved by: https://github.com/eqy