
Commit 659db4f

Scott comments

1 parent 1b2fd42 commit 659db4f


docs/source/llm/export-llm.md

Lines changed: 6 additions & 4 deletions
````diff
@@ -72,9 +72,9 @@ Quantization options are defined by [`QuantizationConfig`](https://github.com/py
 1. TorchAO [`quantize_`](https://docs.pytorch.org/ao/stable/generated/torchao.quantization.quantize_.html) API
 2. [pt2e quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html)
 
-### TorchAO
+### TorchAO (XNNPACK)
 TorchAO quantizes at the source code level, swapping out Linear modules for QuantizedLinear modules.
-This is the recommended quantization path for running on CPU.
+**To quantize on the XNNPACK backend, this is the quantization path to follow.**
 The quantization modes are defined [here](https://github.com/pytorch/executorch/blob/main/extension/llm/export/config/llm_config.py#L306).
 
 Common ones to use are:
@@ -106,11 +106,13 @@ python -m extension.llm.export.export_llm \
   --config path/to/config.yaml
 ```
 
-### pt2e
+### pt2e (QNN, CoreML, and Vulkan)
 pt2e quantizes at the post-export graph level, swapping nodes and injecting quant/dequant nodes.
-Used mainly for non-CPU backends (QNN, CoreML, Vulkan).
+**To quantize on non-CPU backends (QNN, CoreML, Vulkan), this is the quantization path to follow.**
 Read more about pt2e [here](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html), and how ExecuTorch uses pt2e [here](https://github.com/pytorch/executorch/blob/main/docs/source/quantization-overview.md).
 
+*CoreML and Vulkan support for export_llm is currently experimental and limited. To read more about QNN export, please read [Running on Android (Qualcomm)](build-run-llama3-qualcomm-ai-engine-direct-backend.md).*
+
 
 ## Backend support
 Backend options are defined by [`BackendConfig`](https://github.com/pytorch/executorch/blob/main/extension/llm/export/config/llm_config.py#L434). Each backend has its own backend configuration options. Here is an example of lowering the LLM to XNNPACK for CPU acceleration:
````
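For context on the two paths this commit distinguishes: the TorchAO path quantizes eager-mode source modules in place, before export. A minimal sketch, assuming a recent torchao release (older releases spell the config as `int8_weight_only()`); the toy model and the chosen config are illustrative, not export_llm defaults:

```python
import torch
from torchao.quantization import quantize_, Int8WeightOnlyConfig

# Toy stand-in for an LLM; quantize_ walks the module tree and swaps
# Linear weights for quantized equivalents in place.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
quantize_(model, Int8WeightOnlyConfig())

# The module still runs eagerly, so it can be exported afterwards.
out = model(torch.randn(1, 256))
```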
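The pt2e path, by contrast, rewrites the exported graph. A minimal sketch, assuming PyTorch >= 2.5 for `torch.export.export_for_training`; the XNNPACKQuantizer is used only as a stand-in because it ships with core PyTorch, whereas a real QNN, CoreML, or Vulkan flow substitutes that backend's own quantizer (import paths vary by backend and ExecuTorch version):

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

model = Toy().eval()
example_inputs = (torch.randn(1, 16),)

# pt2e operates on an exported graph, not on source modules.
graph = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(graph, quantizer)  # inject observer nodes
prepared(*example_inputs)                  # calibration pass on sample data
converted = convert_pt2e(prepared)         # fold observers into quant/dequant nodes
```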
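Both paths are driven by the same YAML config passed to export_llm via `--config`. A hypothetical sketch only; the field names here are assumptions drawn from `LlmConfig`, so verify them against extension/llm/export/config/llm_config.py for your version:

```yaml
# Illustrative config.yaml selecting a TorchAO quantization mode and the
# XNNPACK backend; field names are assumptions, check llm_config.py.
model:
  use_kv_cache: true
quantization:
  qmode: 8da4w        # TorchAO 8-bit dynamic activation, 4-bit weight
backend:
  xnnpack:
    enabled: true
```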
