Conversation

baskrahmer
Contributor

@baskrahmer baskrahmer commented Jun 20, 2025

Fixes #20644 (Computation graph not being built).

Note, however, that this would affect performance for other users, so the question is whether it is worth optimizing for this edge case, which is fundamentally a torch bug.
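For context, the workaround boils down to running the autocast forward context with the weight-cast cache disabled. A minimal sketch, assuming a CPU bf16 setup (the `forward_context` helper name is illustrative, not the actual plugin code):

```python
import torch

# Sketch of the workaround: disable autocast's cast cache so parameters
# are re-cast on every call. This avoids reusing a tensor that was cast
# (and detached) under torch.no_grad(), at the cost of repeating the
# casts on each forward pass.
def forward_context(device_type="cpu", dtype=torch.bfloat16):
    return torch.autocast(device_type, dtype=dtype, cache_enabled=False)

model = torch.nn.Linear(8, 8)
x = torch.randn(4, 8)

with forward_context():
    with torch.no_grad():
        model(x)            # a no_grad forward, e.g. a sanity-check step
    loss = model(x).sum()   # still tracked by autograd with the cache off

loss.backward()             # the graph exists, so gradients are populated
```

The performance cost mentioned above comes precisely from `cache_enabled=False`: with the cache on, autocast casts each parameter at most once per autocast region; with it off, every forward repeats the casts.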

cc @Borda


📚 Documentation preview 📚: https://pytorch-lightning--20921.org.readthedocs.build/en/20921/

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Jun 20, 2025
@baskrahmer baskrahmer force-pushed the fix/no-grad-amp-bug branch from 08508b6 to d18fb08 Compare June 20, 2025 13:46
@baskrahmer baskrahmer marked this pull request as ready for review June 20, 2025 15:33
@Borda
Contributor

Borda commented Jun 23, 2025

Note, however, that this would affect performance for other users, so the question is whether it is worth optimizing for this edge case, which is fundamentally a torch bug.

Then we shall report it and offer a fix in torch.
Then, if it is accepted and released, we shall have a version switch in our codebase, so newer torch versions won't need this workaround while older ones keep it. Does it...

BTW, have you measured the performance drop?
cc: @lantiga

@baskrahmer
Contributor Author

@Borda it is a long-standing issue in torch. I can try to make a fix if I have some time, but I think it could be complex.

But I agree with you that ideally it should be fixed in torch. I just wanted to open this PR to showcase what a workaround on our end would look like. Shall I close it?

I haven't measured the performance drop since it will vary strongly across architectures and probably also hardware setups.

@Borda
Contributor

Borda commented Jul 3, 2025

@baskrahmer let's also link the Torch issue here for visibility 🐰

@baskrahmer
Contributor Author

@baskrahmer let's also link the Torch issue here for visibility 🐰

Sure, there's pytorch/pytorch#65766, pytorch/pytorch#112583 and pytorch/pytorch#105211
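For readers landing here, the caching pitfall those issues track can be sketched in a few lines (exact behavior may vary across torch versions; the second block shows the `cache_enabled=False` workaround this PR applies):

```python
import torch

# Sketch of the autocast cast-cache pitfall: inside a single autocast
# region, a parameter first cast under no_grad is cached as a detached
# tensor, and a later grad-enabled forward can silently reuse it, so no
# computation graph is built. Behavior may vary by torch version.
model = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

with torch.autocast("cpu", dtype=torch.bfloat16):  # cache on by default
    with torch.no_grad():
        model(x)        # populates the cast cache without autograd tracking
    cached = model(x)   # may reuse the detached cached weight (the bug)

with torch.autocast("cpu", dtype=torch.bfloat16, cache_enabled=False):
    with torch.no_grad():
        model(x)
    fixed = model(x)    # weight is re-cast with autograd tracking

assert fixed.requires_grad and fixed.grad_fn is not None
```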

@Borda Borda merged commit 216f9ec into Lightning-AI:master Sep 2, 2025
82 of 88 checks passed
Borda added a commit that referenced this pull request Sep 3, 2025
* Disable cache for torch.autocast in amp
* Add a test
* Only test for bf16-mixed
* Implement test to reproduce the issue

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit 216f9ec)
lantiga pushed a commit that referenced this pull request Sep 5, 2025
(cherry picked from commit 216f9ec)