

@NeoLegends NeoLegends commented Dec 3, 2025

This switches the rel. pos. MHSA attention computation to torch's fused scaled_dot_product_attention, for efficiency in training and because scaled_dot_product_attention is automatically exported into an optimized ONNX attention op.

Open Qs:

  • Is this actually more efficient in practice?
  • The relative positional encoding is still integrated outside of the op, because the op only accepts a single additional summand (via the attn_mask parameter), not an additional factor plus sum (see the sketch below). Can this be done better?
  • Does this need a flag to turn it on and off (i.e. to switch back to a non-fused implementation)?

Tests pass, so the output continues to be torch.allclose(...) to the ESPNet output even when the fused op is used.
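
A minimal sketch of the approach (standalone, with hypothetical shapes and names, not the actual module code): the relative positional term is computed outside the fused op and passed in as the additive attn_mask, since scaled_dot_product_attention only accepts one extra summand on the scaled scores.

import torch
import torch.nn.functional as F

batch, heads, time, dim = 2, 4, 10, 64
q = torch.randn(batch, heads, time, dim)
k = torch.randn(batch, heads, time, dim)
v = torch.randn(batch, heads, time, dim)

# Additive rel. pos. scores (the B/D-matrix terms), computed outside the op.
# They must be scaled consistently with the 1/sqrt(dim) applied inside SDPA.
rel_pos_bias = torch.randn(batch, heads, time, time)

# Key padding folded in as an additive float mask (0 = keep, -inf = masked),
# then combined with the rel. pos. bias into the single attn_mask summand.
key_padding = torch.zeros(batch, 1, 1, time)
attn_mask = rel_pos_bias + key_padding

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)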

@NeoLegends NeoLegends self-assigned this Dec 3, 2025
@NeoLegends NeoLegends force-pushed the moritz-rel-pos-conf-sdpa branch from 324fc4a to 358de26 on December 3, 2025 13:35
@NeoLegends NeoLegends force-pushed the moritz-rel-pos-conf-sdpa branch from eac4313 to 58699f6 on December 3, 2025 13:47
@NeoLegends NeoLegends marked this pull request as ready for review December 3, 2025 13:57
@NeoLegends NeoLegends changed the title MHSA: use fused SDPA for attention computation Rel. pos. MHSA: use fused SDPA for attention computation Dec 3, 2025
@NeoLegends NeoLegends changed the title Rel. pos. MHSA: use fused SDPA for attention computation Rel. pos. MHSA: use fused op for attention computation Dec 3, 2025

albertz commented Dec 3, 2025

Did you check which SDPA backend it would actually use? And which one it does use in practice?

I looked a bit into the backend-selection logic. I think you can see it here:

https://github.com/pytorch/pytorch/blob/7ba4680f3755a560af81aa0f688791e367aa3609/aten/src/ATen/native/transformers/attention.cpp#L718
https://github.com/pytorch/pytorch/blob/e3f24fd73ad74c6e7176687986436956c7c18235/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp#L764

For example, I think Flash Attention will not be used, because it does not support attn_mask.
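
For reference, one way to confirm this on a GPU box (a standalone sketch, not code from this PR): restrict SDPA to the flash backend via torch.nn.attention.sdpa_kernel and pass an explicit attn_mask; the backend's input checks then fail, PyTorch warns with the reason, and the call raises because no kernel is available.

import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(2, 4, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
mask = torch.randn(2, 4, 128, 128, device="cuda", dtype=torch.float16)

# Expected to fail: the flash backend does not accept an explicit attn_mask.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask)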


NeoLegends commented Dec 22, 2025

Yes, Flash Attention is not used due to the mask. In profiling I found that it uses "memory efficient" attention instead.

I've now made some changes that should make the code compatible with cuDNN attention on the relevant GPUs (mainly around proper bfloat16/float16 support), but it seems like the AppTek runtime image is built in such a way that cuDNN attention is disabled (or it's disabled for H200s?), so I wasn't able to test properly.

I think I can merge once tests pass, but I'll leave some time in case you want to re-review the dtype changes.

test-output.txt
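
As an aside, the dispatched kernel can also be checked with the profiler (a hypothetical standalone snippet, not the PR's test setup): the kernel names in the trace show whether a memory-efficient (cutlass), flash, or cuDNN attention kernel actually ran.

import torch
import torch.nn.functional as F
from torch.profiler import ProfilerActivity, profile

q = torch.randn(2, 4, 128, 64, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)
mask = torch.randn(2, 4, 128, 128, device="cuda", dtype=torch.bfloat16)

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Sort by GPU time; the attention kernel's name reveals the chosen backend.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))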


albertz commented Dec 22, 2025

it seems like the AppTek runtime image is built in such a way that cuDNN is disabled

Are you sure? That would be very suboptimal for everything you compute there. You should use cuDNN for a lot of other things as well.


NeoLegends commented Dec 22, 2025

Ah, sorry, I was unclear. I was referring just to cuDNN fused attention. For cuDNN in general I don't know, but it's probably fine. As you can see in the test output, it unfortunately does not print the reason why cuDNN attention is not being used.

Python 3.13.7 (main, Sep 18 2025, 16:28:29) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.8.0'
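
For what it's worth, the global per-backend toggles can be queried directly in the same session (recent PyTorch versions provide these torch.backends.cuda helpers; the returned booleans depend on how the runtime image configures them, so no output is shown here):

>>> torch.backends.cuda.flash_sdp_enabled()
>>> torch.backends.cuda.mem_efficient_sdp_enabled()
>>> torch.backends.cuda.cudnn_sdp_enabled()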


albertz commented Dec 22, 2025

As you can see in the test output, it unfortunately does not print the reason why cuDNN attention is not being used.

You can go through all of the conditions and just check (print) them yourself.
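
One way to do that from Python without stepping through the C++ (a hypothetical helper, not part of this PR, extending the flash check sketched above to all fused backends; SDPBackend.CUDNN_ATTENTION assumes a recent PyTorch): restrict SDPA to one backend at a time, and for most failing checks PyTorch warns with the violated condition before raising.

import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

def probe_sdpa_backends(q, k, v, attn_mask=None):
    # Try each fused backend in isolation; when a backend's input checks fail,
    # PyTorch (usually) warns with the reason and raises "No available kernel".
    for backend in (SDPBackend.FLASH_ATTENTION,
                    SDPBackend.EFFICIENT_ATTENTION,
                    SDPBackend.CUDNN_ATTENTION):
        try:
            with sdpa_kernel(backend):
                F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
            print(backend, "usable")
        except RuntimeError as err:
            print(backend, "not usable:", err)

Call it with the exact tensors the model produces (same device, dtype, shapes, and requires_grad), since the checks depend on all of these.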


NeoLegends commented Dec 22, 2025

I searched for SDPA input-validation functions that don't print a debug message when their check fails, and found the one that checks the attention mask: in some cases it fails without printing anything.
https://github.com/pytorch/pytorch/blob/229d33f7f9b8abcdd9ba17777eff2f2dbbe4afc9/aten/src/ATen/native/transformers/sdp_utils_cpp.h#L269

It seems that if the attention mask requires a gradient (which it does here, since we add the B and D matrices to it), it cannot be used with cuDNN attention. So I guess we can only use memory-efficient attention here.
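
Concretely (an illustrative snippet with hypothetical shapes, not the module code): because the additive mask depends on the learnable rel. pos. parameters, it ends up with requires_grad=True, which the cuDNN check linked above rejects.

import torch

# Stand-in for the B + D rel. pos. score matrices, which depend on learnable parameters.
pos_bias = torch.nn.Parameter(torch.randn(1, 4, 32, 32))

# Folding them into the additive mask makes the mask part of the autograd
# graph, so it requires a gradient; cuDNN attention then refuses it.
attn_mask = pos_bias.expand(8, -1, -1, -1)
print(attn_mask.requires_grad)  # True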

@NeoLegends NeoLegends merged commit 46fe27b into main Jan 20, 2026
2 checks passed
@NeoLegends NeoLegends deleted the moritz-rel-pos-conf-sdpa branch January 20, 2026 10:53