1 parent 7c90f61 commit ecbee1e
README.md
@@ -14,7 +14,7 @@ The current release supports:
 
 - Llama-2 and Mistral based models.
 - Memory efficient 16-bit + 1-bit Δ Linear in PyTorch
-- Triton kernel for fast inference
+- Triton kernel for fast inference (TODO: Update repo with faster [BitBLAS](https://github.com/microsoft/BitBLAS) W1A16 kernel)
 - Gradio demo showcasing batched inference over 6 Mistral-7B based models, using only **30 GB** of GPU memory!
 
 ## News
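The "16-bit + 1-bit Δ Linear" bullet above can be sketched as follows. This is a minimal illustrative PyTorch module, not the repo's actual API: it assumes the common 1-bit-delta formulation W_ft ≈ W_base + α · sign(W_ft − W_base), where the 16-bit base weight is shared across fine-tunes and each fine-tune only adds a 1-bit sign mask plus one scalar scale. The class name and constructor signature are hypothetical.

```python
import torch
import torch.nn as nn


class BinaryDeltaLinear(nn.Module):
    """Illustrative sketch (not the repo's API) of a memory-efficient linear
    layer storing a shared 16-bit base weight plus a 1-bit delta per fine-tune.

    Approximation: W_ft ~= W_base + scale * sign(W_ft - W_base), so each
    fine-tuned model only adds one bit per weight plus a scalar scale.
    """

    def __init__(self, base_weight: torch.Tensor, finetuned_weight: torch.Tensor):
        super().__init__()
        delta = finetuned_weight - base_weight
        # Per-layer scale: the mean absolute delta minimizes the L2 error of
        # the sign approximation for a single shared scale.
        self.scale = delta.abs().mean()
        # 1-bit mask: True where delta >= 0. Stored as bool for clarity here;
        # a real kernel (e.g. Triton/BitBLAS W1A16) would pack bits and fuse
        # the dequantize into the matmul.
        self.register_buffer("mask", delta >= 0)
        self.register_buffer("base", base_weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the approximate fine-tuned weight on the fly.
        sign = torch.where(self.mask, 1.0, -1.0)
        w = self.base + self.scale * sign
        return x @ w.t()
```

Because `base` is shared, serving several fine-tunes of the same base model mostly costs one extra bit per weight per model, which is what makes batching many 7B fine-tunes in tens of GB plausible.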