
Commit 2489ff1

[06-fused-attention] Temporarily modify the source code to get back performance (#4399)
This PR relands the recent commits to tutorial 06-fused-attention and makes the minimal source-code changes needed to recover the original performance. @etiotto identified that the cause of the FP16 performance regression is a different implementation of the transpose in the source code: #4283 (comment). This PR reverts to the old implementation. The last commit of this PR should itself be reverted once the new transpose implementation can be handled efficiently. Fixes #4283
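To illustrate the mechanism in general terms, here is a conceptual NumPy stand-in (not Triton code; which variant corresponds to the "old" and "new" tutorial code is an assumption based on the linked comment): a kernel can obtain K^T either by reading the tile through swapped strides (a view, no data movement) or by loading normally and applying an explicit transpose op, which the compiler must then lower efficiently.

```python
import numpy as np

# Conceptual sketch only: two ways to obtain K^T for the QK^T product.
K = np.arange(12, dtype=np.float16).reshape(3, 4)

# Strided-load style: a transposed *view* built by swapping strides,
# analogous to loading K through transposed pointers (no extra op).
kT_view = np.lib.stride_tricks.as_strided(
    K, shape=(4, 3), strides=(K.strides[1], K.strides[0]))

# Explicit-transpose style: load, then transpose as a separate operation.
kT_op = K.T

# Both produce the same values; the performance difference lies in how
# well the backend lowers the explicit transpose.
assert np.array_equal(kT_view, kT_op)
```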
2 parents ca7b655 + db09298 commit 2489ff1

File tree

3 files changed: +143 −724 lines changed


Makefile

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ test-cpp:
 
 .PHONY: test-unit
 test-unit: all
-	cd python/test/unit && $(PYTEST) -s -n 8 --ignore=cuda/test_flashattention.py \
-		--ignore=language/test_line_info.py --ignore=language/test_subprocess.py --ignore=test_debug.py
+	cd python/test/unit && $(PYTEST) -s -n 8 --ignore=language/test_line_info.py \
+		--ignore=language/test_subprocess.py --ignore=test_debug.py
 	$(PYTEST) -s -n 8 python/test/unit/language/test_subprocess.py
 	$(PYTEST) -s -n 8 python/test/unit/test_debug.py --forked
 	$(PYTEST) -s -n 8 python/triton_kernels/tests/
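The updated test-unit target relies on pytest's `--ignore` flag to prune files from collection. A minimal, self-contained sketch of that behavior (the throwaway file names are hypothetical; assumes pytest is installed):

```python
import pathlib
import subprocess
import sys
import tempfile

# Build a throwaway test tree: one passing file, one failing file.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "test_keep.py").write_text("def test_ok():\n    assert True\n")
(tmp / "test_skip.py").write_text("def test_bad():\n    assert False\n")

# --ignore drops test_skip.py from collection, so the run passes overall,
# mirroring how the Makefile excludes test_line_info.py and friends from
# the parallel run, then exercises some of them separately.
result = subprocess.run(
    [sys.executable, "-m", "pytest", "-q", str(tmp),
     f"--ignore={tmp / 'test_skip.py'}"],
    capture_output=True, text=True,
)
print(result.returncode)  # 0 when only test_keep.py is collected and passes
```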
