Commit a076272 ("init")
1 parent 37036b3

3 files changed (+2, -4 lines)


.ci/scripts/test_llama_torchao_lowbit.sh

Lines changed: 0 additions & 1 deletion
@@ -78,7 +78,6 @@ ${PYTHON_EXECUTABLE} -m examples.models.llama.export_llama \
     -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
     --group_size ${QLINEAR_GROUP_SIZE} \
     -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
-    --disable_dynamic_shape \
     -d fp32
 
 # Test run
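For context, this hunk references quantization variables that are defined earlier in the script, outside the diff. A minimal sketch of what those definitions might look like, with assumed placeholder values (the real ones live near the top of .ci/scripts/test_llama_torchao_lowbit.sh and are not part of this commit):

```bash
# Assumed placeholder values, not taken from this commit.
QLINEAR_BITWIDTH=3        # weight bit width for linear layers ("8da3w")
QLINEAR_GROUP_SIZE=128    # quantization group size for linear weights
QEMBEDDING_BITWIDTH=4     # bit width for embedding-table quantization
QEMBEDDING_GROUP_SIZE=32  # quantization group size for embeddings
```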

examples/models/llama/README.md

Lines changed: 1 addition & 2 deletions
@@ -382,7 +382,7 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 
 ## Running with low-bit kernels
 
-We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results. Currently dynamic shapes must be disabled when exporting a model with these kernels.
+We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.
 
 First export your model for lowbit quantization (step 2 above):
 
@@ -408,7 +408,6 @@ python -m examples.models.llama.export_llama \
     -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
     --group_size ${QLINEAR_GROUP_SIZE} \
     -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
-    --disable_dynamic_shape \
     -d fp32
 ```
 
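Put together, the README's export step after this commit resolves to something like the sketch below. The bit widths and group sizes are illustrative assumptions, and the model/checkpoint flags shown elsewhere in the README are elided; in torchao's naming, "8daXw" means 8-bit dynamically quantized activations with X-bit weights.

```bash
# Illustrative values only; pick bit widths and group sizes for your model.
QLINEAR_BITWIDTH=4
QLINEAR_GROUP_SIZE=256
QEMBEDDING_BITWIDTH=8
QEMBEDDING_GROUP_SIZE=0

# --disable_dynamic_shape is no longer passed, so the exported model
# keeps dynamic sequence-length support.
python -m examples.models.llama.export_llama \
    -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
    --group_size ${QLINEAR_GROUP_SIZE} \
    -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
    -d fp32
```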
