Commit a7d0af6

[TUTORIAL] Remove block pointer from fused attention + run in CI (#6839)
This unifies all architectures to use the `_tma` variant (with the suffix now removed). When TMA is not natively supported, we use device-side tensor descriptors, which fall back to normal pointer-based loads. I also removed `cuda/test_flashattention.py`, which seems to be just a clone of the tutorial, and instead run the tutorial itself in CI. Also fixes #6242.
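For context, here is a minimal sketch (not the tutorial's actual code) of a device-side tensor descriptor in Triton; the kernel name, shapes, and the copy it performs are illustrative assumptions. On hardware with native TMA, descriptor loads and stores lower to TMA transfers; on other architectures the compiler falls back to ordinary pointer-based accesses, which is what lets a single tutorial variant run everywhere.

import triton
import triton.language as tl


@triton.jit
def copy_2d_kernel(in_ptr, out_ptr, M, N,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Descriptors are built on the device from a plain pointer plus
    # shape / stride / block-shape metadata (innermost stride must be 1).
    in_desc = tl.make_tensor_descriptor(in_ptr, shape=[M, N], strides=[N, 1],
                                        block_shape=[BLOCK_M, BLOCK_N])
    out_desc = tl.make_tensor_descriptor(out_ptr, shape=[M, N], strides=[N, 1],
                                         block_shape=[BLOCK_M, BLOCK_N])
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    # Load one tile and store it back out; with native TMA these become
    # hardware copies, otherwise plain loads/stores.
    tile = in_desc.load([pid_m * BLOCK_M, pid_n * BLOCK_N])
    out_desc.store([pid_m * BLOCK_M, pid_n * BLOCK_N], tile)

On CUDA, the descriptor path may additionally require a host-side scratch allocator registered via `triton.set_allocator` before launch; the tutorial sets one up.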
1 parent 622e05b commit a7d0af6

File tree

3 files changed (+104, -714 lines)

Makefile

Lines changed: 4 additions & 4 deletions
@@ -30,14 +30,14 @@ test-cpp:
 
 .PHONY: test-python
 test-unit: all
-	cd python/test/unit && $(PYTEST) -s -n 8 --ignore=cuda/test_flashattention.py \
-		--ignore=language/test_line_info.py --ignore=language/test_subprocess.py --ignore=test_debug.py
+	cd python/test/unit && $(PYTEST) -s -n 8 --ignore=language/test_line_info.py \
+		--ignore=language/test_subprocess.py --ignore=test_debug.py
 	$(PYTEST) -s -n 8 python/test/unit/language/test_subprocess.py
 	$(PYTEST) -s -n 8 python/test/unit/test_debug.py --forked
 	$(PYTEST) -s -n 8 python/triton_kernels/tests/
 	TRITON_DISABLE_LINE_INFO=0 $(PYTEST) -s python/test/unit/language/test_line_info.py
-	# Run cuda/test_flashattention.py separately to avoid out of gpu memory
-	$(PYTEST) -s python/test/unit/cuda/test_flashattention.py
+	# Run attention separately to avoid out of gpu memory
+	$(PYTEST) -s python/tutorials/06-fused-attention.py
 	TRITON_ALWAYS_COMPILE=1 TRITON_DISABLE_LINE_INFO=0 LLVM_PASS_PLUGIN_PATH=python/triton/instrumentation/libGPUInstrumentationTestLib.so \
 		$(PYTEST) --capture=tee-sys -rfs -vvv python/test/unit/instrumentation/test_gpuhello.py
 
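Running the tutorial file directly under pytest works presumably because it contains pytest-collectable correctness checks against a PyTorch reference. The snippet below is a hedged, self-contained illustration of that style of check, comparing a naive softmax(QK^T / sqrt(d)) V implementation against torch's scaled_dot_product_attention; in the actual tutorial the fused Triton kernel would take the place of one side of the comparison, and the function names here are made up for the example.

import math

import pytest
import torch
import torch.nn.functional as F


def naive_attention(q, k, v, causal):
    # Reference softmax(Q K^T / sqrt(d)) V, the kind of baseline a fused
    # attention kernel is checked against.
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if causal:
        L, S = q.shape[-2], k.shape[-2]
        # Mask out future key positions (j > i).
        future = torch.arange(S)[None, :] > torch.arange(L)[:, None]
        scores = scores.masked_fill(future, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)


@pytest.mark.parametrize("causal", [False, True])
def test_attention_reference(causal):
    torch.manual_seed(0)
    q, k, v = (torch.randn(1, 2, 64, 32) for _ in range(3))
    ref = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    out = naive_attention(q, k, v, causal)
    torch.testing.assert_close(out, ref, atol=1e-4, rtol=1e-4)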
