1 parent e6d0ffd commit 61d50a1
README.md
@@ -106,7 +106,7 @@ CUDA_VISIBLE_DEVICES=0 python generate.py \
 
 Please treat the current inference implementation as just a proof of concept! There are a few limitations:
 - Only FP16 is supported, as Triton does not currently support BF16 `atomic_add`.
-- Block-wise greedy sparsities are not currently supported.
+- Block-wise greedy sparsities are not currently supported (expect to have this very soon!).
 - Quantized sparse kernels are not currently supported (though, would love a PR!).
 - Speculative decoding is untested
 
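For context on the FP16-only limitation noted in the diff, here is a minimal, hypothetical Triton sketch (not from this repository; `scatter_add_kernel` and all names are illustrative) of the kind of `atomic_add` accumulation such kernels rely on. It assumes an FP16 output buffer because, as the README states, Triton's BF16 `atomic_add` is not currently supported.

```python
# Hypothetical sketch (names and shapes are illustrative, not from this repo):
# a Triton kernel that scatter-adds FP16 values with tl.atomic_add.
import torch
import triton
import triton.language as tl


@triton.jit
def scatter_add_kernel(out_ptr, val_ptr, idx_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    vals = tl.load(val_ptr + offs, mask=mask, other=0.0)
    idx = tl.load(idx_ptr + offs, mask=mask, other=0)
    # FP16 accumulation; per the limitation above, a BF16 `out` buffer
    # would not work, since Triton lacks BF16 atomic_add support.
    tl.atomic_add(out_ptr + idx, vals, mask=mask)


n = 4096
out = torch.zeros(256, device="cuda", dtype=torch.float16)  # FP16 accumulator
vals = torch.randn(n, device="cuda", dtype=torch.float16)
idx = torch.randint(0, 256, (n,), device="cuda", dtype=torch.int32)
scatter_add_kernel[(triton.cdiv(n, 1024),)](out, vals, idx, n, BLOCK=1024)
```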