examples/quantizing_moe/README.md (16 additions & 17 deletions)
@@ -17,17 +17,17 @@ pip install -e .

The provided example script demonstrates an end-to-end process for applying the quantization algorithm:

```bash
-python3 mixtral_moe_w8a8_fp8.py
+python3 mixtral_example.py
```

## Creating a Quantized MoE Model

-This example leverages `llm-compressor` and `compressed-tensors` to create an FP8-quantized `Mixtral-8x7B-Instruct-v0.1` model. The model is calibrated and trained using the `open_platypus` dataset.
+This example leverages `llm-compressor` and `compressed-tensors` to create an FP8-quantized `Mixtral-8x7B-Instruct-v0.1` model. The model is calibrated and trained using the `ultrachat_200k` dataset.

You can follow the detailed steps below or simply run the example script with:

```bash
-python mixtral_moe_w8a8_fp8.py
+python mixtral_example.py
```

### Step 1: Select a Model, Dataset, and Recipe
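For orientation, Step 1 amounts to loading the model and tokenizer, picking the calibration dataset, and defining the recipe. A minimal sketch is below; the model, dataset, and `FP8` scheme are taken from this diff, while the exact `ignore` pattern for the MoE router/gate layers is an assumption rather than the script's literal code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"
DATASET_ID = "HuggingFaceH4/ultrachat_200k"  # calibration dataset named in this PR

# Load the model and tokenizer with the standard Transformers API
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 recipe applied to Linear layers; the MoE gate/router modules are
# typically left unquantized (the regex below is an assumed pattern)
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8",
    ignore=["lm_head", "re:.*block_sparse_moe.gate"],
)
```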
@@ -61,7 +61,6 @@ oneshot(
    recipe=recipe,
    save_compressed=True,
    output_dir=output_dir,
-
    max_seq_length=2048,
    num_calibration_samples=512,
)
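Reassembled, and continuing the sketch from Step 1, the `oneshot` call shown in this hunk would look roughly as follows. The keyword arguments are the ones visible in the diff; the import path and the `output_dir` value are assumptions (both vary across `llm-compressor` versions).

```python
from llmcompressor.transformers import oneshot

output_dir = "Mixtral-8x7B-Instruct-v0.1-FP8"  # illustrative output path

oneshot(
    model=model,                  # model loaded in Step 1
    dataset=DATASET_ID,           # calibration dataset
    recipe=recipe,                # quantization recipe from Step 1
    save_compressed=True,         # save in compressed-tensors format
    output_dir=output_dir,
    max_seq_length=2048,          # truncate calibration sequences
    num_calibration_samples=512,  # number of calibration samples
)
```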
@@ -74,7 +73,7 @@ NOTE: Only per-tensor quantization is supported in vLLM as of now (`vllm==0.6.1`)

The repository supports multiple quantization techniques configured via a recipe. Supported strategies include `tensor`, `group`, and `channel` quantization.

-In the above example, FP8 per-tensor quantization is used as specified by the `FP8` scheme. For other preset schemes, refer to the [quantization schemes](https://github.com/neuralmagic/compressed-tensors/blob/main/src/compressed_tensors/quantization/quant_scheme.py) in the `compressed-tensors` library.
+In the above example, quantization is specified by the `FP8` scheme. For other preset schemes, refer to the [quantization schemes](https://github.com/neuralmagic/compressed-tensors/blob/main/src/compressed_tensors/quantization/quant_scheme.py) in the `compressed-tensors` library.

A custom scheme can also be specified using `config_groups`:

@@ -84,18 +83,18 @@ A custom scheme can also be specified using `config_groups`:

from llmcompressor.modifiers.quantization.gptq import GPTQModifier
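The rest of this hunk is truncated above; to make the `config_groups` idea concrete, a custom recipe built around `GPTQModifier` might look like the sketch below. The field names follow the `compressed-tensors` `QuantizationArgs` schema, and the specific values (4-bit weights, group size 128) are illustrative rather than taken from this PR.

```python
from llmcompressor.modifiers.quantization.gptq import GPTQModifier

# Illustrative custom scheme: 4-bit grouped weight quantization, activations untouched
config_groups = {
    "group_0": {
        "targets": ["Linear"],       # apply to all Linear layers
        "input_activations": None,   # leave activations unquantized
        "output_activations": None,
        "weights": {
            "num_bits": 4,
            "type": "int",
            "symmetric": True,
            "strategy": "group",
            "group_size": 128,
        },
    }
}

recipe = GPTQModifier(config_groups=config_groups, ignore=["lm_head"])
```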