pt2e quantizes at the post-export graph level, swapping nodes and injecting quant/dequant nodes.
**To quantize on non-CPU backends (QNN, CoreML, Vulkan), this is the quantization path to follow.**
Read more about pt2e [here](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html), and how ExecuTorch uses pt2e [here](https://github.com/pytorch/executorch/blob/main/docs/source/quantization-overview.md).
*CoreML and Vulkan support for export_llm is currently experimental and limited. For more on QNN export, see [Running on Android (Qualcomm)](build-run-llama3-qualcomm-ai-engine-direct-backend.md).*
## Backend support
Backend options are defined by [`BackendConfig`](https://github.com/pytorch/executorch/blob/main/extension/llm/export/config/llm_config.py#L434). Each backend has its own configuration options. Here is an example of lowering the LLM to XNNPACK for CPU acceleration:
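As one plausible illustration, a YAML config fragment enabling the XNNPACK backend might look like the following. The field names mirror the XNNPACK fields in `BackendConfig` in `llm_config.py`, but verify them against your ExecuTorch version:

```yaml
# Hypothetical export_llm config fragment: enable the XNNPACK backend.
backend:
  xnnpack:
    enabled: True
    extended_ops: True  # lower a wider set of ops to XNNPACK
```

Such a config would typically be passed to the `export_llm` entry point via its `--config` argument.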