Commit a076272 ("init")
1 parent 37036b3

3 files changed (+2, -4 lines)


.ci/scripts/test_llama_torchao_lowbit.sh

Lines changed: 0 additions & 1 deletion
@@ -78,7 +78,6 @@ ${PYTHON_EXECUTABLE} -m examples.models.llama.export_llama \
     -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
     --group_size ${QLINEAR_GROUP_SIZE} \
     -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
-    --disable_dynamic_shape \
     -d fp32
 
 # Test run
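For context, this hunk references quantization variables that are defined earlier in the script, outside the diff. A minimal sketch of what those definitions might look like, with assumed placeholder values (the real ones live near the top of .ci/scripts/test_llama_torchao_lowbit.sh and are not part of this commit):

```bash
# Assumed placeholder values, not taken from this commit.
QLINEAR_BITWIDTH=3        # weight bit width for linear layers ("8da3w")
QLINEAR_GROUP_SIZE=128    # quantization group size for linear weights
QEMBEDDING_BITWIDTH=4     # bit width for embedding-table quantization
QEMBEDDING_GROUP_SIZE=32  # quantization group size for embeddings
```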

examples/models/llama/README.md

Lines changed: 1 addition & 2 deletions
@@ -382,7 +382,7 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 
 ## Running with low-bit kernels
 
-We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results. Currently dynamic shapes must be disabled when exporting a model with these kernels.
+We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.
 
 First export your model for lowbit quantization (step 2 above):
 
@@ -408,7 +408,6 @@ python -m examples.models.llama.export_llama \
     -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
     --group_size ${QLINEAR_GROUP_SIZE} \
     -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
-    --disable_dynamic_shape \
     -d fp32
 ```
 
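Put together, the README's export step after this commit resolves to something like the sketch below. The bit widths and group sizes are illustrative assumptions, and the model/checkpoint flags shown elsewhere in the README are elided; in torchao's naming, "8daXw" means 8-bit dynamically quantized activations with X-bit weights.

```bash
# Illustrative values only; pick bit widths and group sizes for your model.
QLINEAR_BITWIDTH=4
QLINEAR_GROUP_SIZE=256
QEMBEDDING_BITWIDTH=8
QEMBEDDING_GROUP_SIZE=0

# --disable_dynamic_shape is no longer passed, so the exported model
# keeps dynamic sequence-length support.
python -m examples.models.llama.export_llama \
    -qmode "torchao:8da${QLINEAR_BITWIDTH}w" \
    --group_size ${QLINEAR_GROUP_SIZE} \
    -E "torchao:${QEMBEDDING_BITWIDTH},${QEMBEDDING_GROUP_SIZE}" \
    -d fp32
```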
