2 parents e1245da + b0633a4 commit d31692c
python/perf-kernels/README.md
@@ -42,6 +42,7 @@ This script contains the Flash Attention kernel with the following support
 - Multi and Grouped Query attention
 - ALiBi bias
 - Matrix bias
+- Persistent kernels. Useful when sequence lengths are up to a moderate length, especially when doing causal attention.
 - Int8 quantization

 These are currently supported for the forward kernel only.
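
The added line refers to the persistent-kernel launch pattern: instead of launching one program per output tile, a fixed number of programs (roughly one per compute unit) is launched and each program loops over work tiles. The sketch below illustrates that general pattern on a simple elementwise kernel; it is not the repo's flash-attention implementation, and names such as `persistent_add_kernel` and `NUM_PROGRAMS` are illustrative assumptions.

```python
# Minimal sketch of the persistent-kernel launch pattern in Triton.
# Each program strides over tiles instead of handling exactly one tile.
import torch
import triton
import triton.language as tl


@triton.jit
def persistent_add_kernel(x_ptr, y_ptr, out_ptr, n_elements,
                          BLOCK_SIZE: tl.constexpr, NUM_PROGRAMS: tl.constexpr):
    pid = tl.program_id(0)
    num_tiles = tl.cdiv(n_elements, BLOCK_SIZE)
    # Program pid handles tiles pid, pid + NUM_PROGRAMS, pid + 2*NUM_PROGRAMS, ...
    for tile in range(pid, num_tiles, NUM_PROGRAMS):
        offsets = tile * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)


def persistent_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # Launch roughly one program per compute unit; the exact count is a tuning choice.
    num_programs = torch.cuda.get_device_properties(x.device).multi_processor_count
    persistent_add_kernel[(num_programs,)](x, y, out, n,
                                           BLOCK_SIZE=1024, NUM_PROGRAMS=num_programs)
    return out
```

With causal attention, later tiles carry less work than earlier ones; a persistent grid lets programs pick up additional tiles as they finish, which is one reason the pattern helps most in that setting.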