1 parent e6d0ffd commit 61d50a1
README.md
@@ -106,7 +106,7 @@ CUDA_VISIBLE_DEVICES=0 python generate.py \
 
 Please treat the current inference implementation as just a proof of concept! There are a few limitations:
 - Only FP16 is supported, as Triton does not currently support BF16 `atomic_add`.
-- Block-wise greedy sparsities are not currently supported.
+- Block-wise greedy sparsities are not currently supported (expect to have this very soon!).
 - Quantized sparse kernels are not currently supported (though, would love a PR!).
 - Speculative decoding is untested
 
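For context on the FP16-only limitation noted in the diff, here is a minimal, hypothetical Triton sketch (not from this repository; `scatter_add_kernel` and all names are illustrative) of the kind of `atomic_add` accumulation such kernels rely on. It assumes an FP16 output buffer because, as the README states, Triton's BF16 `atomic_add` is not currently supported.

```python
# Hypothetical sketch (names and shapes are illustrative, not from this repo):
# a Triton kernel that scatter-adds FP16 values with tl.atomic_add.
import torch
import triton
import triton.language as tl


@triton.jit
def scatter_add_kernel(out_ptr, val_ptr, idx_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    vals = tl.load(val_ptr + offs, mask=mask, other=0.0)
    idx = tl.load(idx_ptr + offs, mask=mask, other=0)
    # FP16 accumulation; per the limitation above, a BF16 `out` buffer
    # would not work, since Triton lacks BF16 atomic_add support.
    tl.atomic_add(out_ptr + idx, vals, mask=mask)


n = 4096
out = torch.zeros(256, device="cuda", dtype=torch.float16)  # FP16 accumulator
vals = torch.randn(n, device="cuda", dtype=torch.float16)
idx = torch.randint(0, 256, (n,), device="cuda", dtype=torch.int32)
scatter_add_kernel[(triton.cdiv(n, 1024),)](out, vals, idx, n, BLOCK=1024)
```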