[06-fused-attention] Temporarily modify the source code to get back performance (#4399)
This PR relands the recent commits to tutorial 06-fused-attention and makes the minimal source-code changes needed to restore the original performance.
@etiotto identified that the FP16 performance regression is caused by a different implementation of the transpose in the source code:
#4283 (comment).
This PR reverts to the old implementation. The last commit of this PR should itself be reverted once the new transpose implementation can be handled efficiently.
Fixes #4283