This PR:

* Renames `EXECUTORCH_BUILD_TORCHAO` to `EXECUTORCH_BUILD_KERNELS_TORCHAO` to be more in line with other kernel options (e.g., `EXECUTORCH_BUILD_KERNELS_OPTIMIZED`)
* Fixes torchao lowbit kernel dependencies in xcframeworks
* Adds torchao lowbit kernels to the Swift package
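With the rename, the option is enabled at configure time the same way as the other kernel flags. A hypothetical CMake invocation is sketched below; the build directory and the combination of flags are placeholders, not a verified recipe from this PR:

```shell
# Sketch: enable the renamed torchao kernels option alongside the other
# kernel libraries. Build dir and flag combination are illustrative only.
cmake -S . -B cmake-out \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_TORCHAO=ON
```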
**`docs/source/using-executorch-ios.md`** (1 addition, 0 deletions)
```diff
@@ -14,6 +14,7 @@ The ExecuTorch Runtime for iOS and macOS (ARM64) is distributed as a collection
 * `kernels_llm` - Custom kernels for LLMs
 * `kernels_optimized` - Accelerated generic CPU kernels
 * `kernels_quantized` - Quantized kernels
+* `kernels_torchao` - Quantized CPU kernels from torchao
 
 Link your binary with the ExecuTorch runtime and any backends or kernels used by the exported ML model. It is recommended to link the core runtime to the components that use ExecuTorch directly, and link kernels and backends against the main app target.
```
**`examples/models/llama/README.md`** (8 additions, 8 deletions)
```diff
@@ -340,11 +340,13 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 
 ## Running with low-bit kernels
 
-We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac, and install executorch from source with the environment variable EXECUTORCH_BUILD_TORCHAO=1 defined:
+We now give instructions for quantizating and running your model with low-bit kernels. These are still experimental, and require you do development on an Arm-based Mac, and install executorch from source with the environment variable EXECUTORCH_BUILD_KERNELS_TORCHAO=1 defined:
+(If you'd like lowbit to use KleidiAI when available, you can instead install with `EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_KLEIDIAI=1 python install_executorch.py`.)
 
 Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.
 
 First export your model for lowbit quantization (step 2 above):
```
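Putting the updated README text together, a from-source install on an Arm-based Mac that enables the low-bit kernels would run roughly as follows. This is a sketch assuming a checkout of the executorch repository; only the environment variables and the `install_executorch.py` entry point come from the diff above:

```shell
# From a checkout of the executorch repo on an Arm-based Mac:
# enable the experimental torchao low-bit kernels at install time.
EXECUTORCH_BUILD_KERNELS_TORCHAO=1 python install_executorch.py

# Or, to let lowbit use KleidiAI when available:
EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_KLEIDIAI=1 python install_executorch.py
```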