Commit 61d50a1

Update README.md
1 parent e6d0ffd commit 61d50a1

1 file changed (+1, -1 lines changed)

README.md

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ CUDA_VISIBLE_DEVICES=0 python generate.py \
 
 Please treat the current inference implementation as just a proof of concept! There are a few limitations:
 - Only FP16 is supported, as Triton does not currently support BF16 `atomic_add`.
-- Block-wise greedy sparsities are not currently supported.
+- Block-wise greedy sparsities are not currently supported (expect to have this very soon!).
 - Quantized sparse kernels are not currently supported (though, would love a PR!).
 - Speculative decoding is untested
 
