Commit 58bd91c

[doc] Update quantization instructions for clarity (pytorch#15284)
4-bit quantization was a bit buried; noticed during the hackathon
1 parent fe3c1dc commit 58bd91c

File tree

1 file changed: +2 -2 lines changed


docs/source/backends/xnnpack/xnnpack-quantization.md

Lines changed: 2 additions & 2 deletions
@@ -61,8 +61,8 @@ See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/ma
 
 The XNNPACK backend also supports quantizing models with the [torchao](https://github.com/pytorch/ao) quantize_ API. This is most commonly used for LLMs, requiring more advanced quantization. Since quantize_ is not backend aware, it is important to use a config that is compatible with CPU/XNNPACK:
 
-* Quantize embeedings with IntxWeightOnlyConfig (with weight_dtype torch.int2, torch.int4, or torch.int8, using PerGroup or PerAxis granularity)
-* Quantize linear layers with Int8DynamicActivationIntxWeightConfig (with weight_dtype=torch.int4, using PerGroup or PerAxis granularity)
+* Quantize embeedings with `IntxWeightOnlyConfig` (with weight_dtype torch.int2, torch.int4, or torch.int8, using PerGroup or PerAxis granularity)
+* Quantize linear layers with 4 bit weight and 8bit dynamic activation, use `Int8DynamicActivationIntxWeightConfig` (with weight_dtype=torch.int4, using PerGroup or PerAxis granularity)
 
 Below is a simple example, but a more detailed tutorial including accuracy evaluation on popular LLM benchmarks can be found in the [torchao documentation](https://docs.pytorch.org/ao/main/serving.html#mobile-deployment-with-executorch).
 
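For reference, a minimal sketch of the flow the changed lines describe, using torchao's quantize_ API: a weight-only config for embeddings and 8-bit dynamic activation with 4-bit grouped weights for linear layers. The toy model, the group size of 32, and the per-channel choice for embeddings are illustrative assumptions; parameter names follow recent torchao releases and should be verified against the installed version.

```python
import torch
import torch.nn as nn
from torchao.quantization.granularity import PerAxis, PerGroup
from torchao.quantization.quant_api import (
    Int8DynamicActivationIntxWeightConfig,
    IntxWeightOnlyConfig,
    quantize_,
)

# Toy stand-in for an eager-mode LLM (illustrative only).
model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64))

# Weight-only quantization of embedding tables (8-bit, per channel here;
# torch.int2 / torch.int4 with PerGroup are also XNNPACK-compatible).
quantize_(
    model,
    IntxWeightOnlyConfig(weight_dtype=torch.int8, granularity=PerAxis(0)),
    lambda m, fqn: isinstance(m, nn.Embedding),
)

# 8-bit dynamic activations with 4-bit grouped weights for linear layers.
quantize_(
    model,
    Int8DynamicActivationIntxWeightConfig(
        weight_dtype=torch.int4,
        weight_granularity=PerGroup(32),
    ),
)
```

The quantized model can then go through the usual ExecuTorch export and XNNPACK lowering flow; the linked torchao tutorial walks through an end-to-end LLM example with accuracy evaluation.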