1 parent 7c90f61 commit ecbee1e
README.md
@@ -14,7 +14,7 @@ The current release supports:
 
 - Llama-2 and Mistral based models.
 - Memory efficient 16-bit + 1-bit Δ Linear in PyTorch
-- Triton kernel for fast inference
+- Triton kernel for fast inference (TODO: Update repo with faster [BitBLAS](https://github.com/microsoft/BitBLAS) W1A16 kernel)
 - Gradio demo showcasing batched inference over 6 Mistral-7B based models, using only **30 GB** of GPU memory!
 
 ## News
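The "16-bit + 1-bit Δ Linear" bullet above can be sketched as follows. This is a minimal illustrative PyTorch module, not the repo's actual API: it assumes the common 1-bit-delta formulation W_ft ≈ W_base + α · sign(W_ft − W_base), where the 16-bit base weight is shared across fine-tunes and each fine-tune only adds a 1-bit sign mask plus one scalar scale. The class name and constructor signature are hypothetical.

```python
import torch
import torch.nn as nn


class BinaryDeltaLinear(nn.Module):
    """Illustrative sketch (not the repo's API) of a memory-efficient linear
    layer storing a shared 16-bit base weight plus a 1-bit delta per fine-tune.

    Approximation: W_ft ~= W_base + scale * sign(W_ft - W_base), so each
    fine-tuned model only adds one bit per weight plus a scalar scale.
    """

    def __init__(self, base_weight: torch.Tensor, finetuned_weight: torch.Tensor):
        super().__init__()
        delta = finetuned_weight - base_weight
        # Per-layer scale: the mean absolute delta minimizes the L2 error of
        # the sign approximation for a single shared scale.
        self.scale = delta.abs().mean()
        # 1-bit mask: True where delta >= 0. Stored as bool for clarity here;
        # a real kernel (e.g. Triton/BitBLAS W1A16) would pack bits and fuse
        # the dequantize into the matmul.
        self.register_buffer("mask", delta >= 0)
        self.register_buffer("base", base_weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the approximate fine-tuned weight on the fly.
        sign = torch.where(self.mask, 1.0, -1.0)
        w = self.base + self.scale * sign
        return x @ w.t()
```

Because `base` is shared, serving several fine-tunes of the same base model mostly costs one extra bit per weight per model, which is what makes batching many 7B fine-tunes in tens of GB plausible.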