Commit f523cb7

Update README.md
1 parent 8b21aa6 commit f523cb7

1 file changed, +2 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -102,6 +102,8 @@ CUDA_VISIBLE_DEVICES=0 python generate.py \
 --interactive
 ```
 
+To benchmark inference speed, remove `--interactive`.
+
 Please treat the current inference implementation as just a proof of concept! There are a few limitations:
 - Only FP16 is supported, as Triton does not currently support BF16 `atomic_add`.
 - Block-wise greedy sparsities are not currently supported (expect to have this very soon!).
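
The added README line says that dropping `--interactive` benchmarks inference speed, but this diff does not show how `generate.py` measures it. As a rough illustration only, a throughput measurement could be sketched as below. The names `tokens_per_second` and `fake_generate` are hypothetical and do not come from the repository; `fake_generate` is a stand-in that sleeps instead of running a model.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[int], List[int]],
                      max_new_tokens: int,
                      warmup: int = 1,
                      runs: int = 3) -> float:
    """Return the mean throughput of `generate` in tokens per second."""
    # Warm-up calls are excluded from timing (caches, lazy init, compilation).
    for _ in range(warmup):
        generate(max_new_tokens)

    total_tokens = 0
    total_time = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        out = generate(max_new_tokens)
        total_time += time.perf_counter() - start
        total_tokens += len(out)
    return total_tokens / total_time


def fake_generate(n: int) -> List[int]:
    """Hypothetical stand-in for a model call: sleeps about 1 ms per token."""
    out = []
    for i in range(n):
        time.sleep(0.001)
        out.append(i)
    return out


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_generate, 32):.1f} tokens/s")
```

When timing real GPU inference with PyTorch, kernels launch asynchronously, so a call to `torch.cuda.synchronize()` before reading the timer is needed for the wall-clock numbers to be meaningful.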
