1 parent 8b21aa6 commit f523cb7
README.md
@@ -102,6 +102,8 @@ CUDA_VISIBLE_DEVICES=0 python generate.py \
     --interactive
 ```
 
+To benchmark inference speed, remove `--interactive`.
+
 Please treat the current inference implementation as just a proof of concept! There are a few limitations:
 - Only FP16 is supported, as Triton does not currently support BF16 `atomic_add`.
 - Block-wise greedy sparsities are not currently supported (expect to have this very soon!).