**docs/source_en/LLM/Command-line-parameters.md** (1 addition, 1 deletion)
Export parameters inherit from infer parameters, with the following added parameters (a combined usage sketch follows the list):
- `--merge_lora`: Default is `False`. This parameter is already defined in InferArguments; it is not a new parameter. Whether to merge the LoRA weights into the base model and save the full weights. The weights are saved in a directory at the same level as `ckpt_dir`, e.g. the `'/path/to/your/vx-xxx/checkpoint-xxx-merged'` directory.
- `--quant_bits`: Number of bits for quantization. Default is `0`, i.e. no quantization. If you set `--quant_method awq`, you can set this to `4` for 4-bit quantization. If you set `--quant_method gptq`, you can set this to `2`, `3`, `4`, or `8` for the corresponding bit width. When quantizing the original model, the weights are saved in the `f'{args.model_type}-{args.quant_method}-int{args.quant_bits}'` directory. When quantizing a fine-tuned model, the weights are saved in a directory at the same level as `ckpt_dir`, e.g. the `f'/path/to/your/vx-xxx/checkpoint-xxx-{args.quant_method}-int{args.quant_bits}'` directory.
- `--quant_method`: Quantization method. Default is `'awq'`. Options are `'awq'` and `'gptq'`.
- `--dataset`: This parameter is already defined in InferArguments; for export it specifies the quantization dataset. Default is `[]`. More details, including how to customize the quantization dataset, can be found in the [LLM Quantization Documentation](LLM-quantization.md).
- `--quant_n_samples`: Quantization parameter. Default is `256`. With `--quant_method awq`, if OOM occurs during quantization, you can moderately reduce `--quant_n_samples` and `--quant_seqlen`. `--quant_method gptq` generally does not run out of memory during quantization.
- `--quant_seqlen`: Quantization parameter. Default is `2048`.
- `--quant_device_map`: Default is `'cpu'`, to save memory. You can specify `'cuda:0'`, `'auto'`, `'cpu'`, etc., representing the device on which the model is loaded during quantization.
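For orientation, a minimal sketch of how these parameters combine in a single `swift export` call; the checkpoint path is a placeholder, and the flag values are assumptions based on the descriptions above:

```shell
# Sketch: merge LoRA weights, then AWQ-quantize the result to 4 bits.
# `--ckpt_dir` is a placeholder; point it at your own checkpoint.
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
    --merge_lora true \
    --quant_bits 4 \
    --quant_method awq \
    --quant_n_samples 256 \
    --quant_seqlen 2048 \
    --quant_device_map cpu
# The quantized weights are saved next to ckpt_dir,
# e.g. .../checkpoint-xxx-awq-int4
```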
```shell
# AWQ: Use a custom quantization dataset (do not use the `--custom_val_dataset_path` parameter)
# The same applies to GPTQ
```
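A hedged sketch of what a custom-dataset call might look like; `--custom_train_dataset_path` and the file name `my-quant-data.jsonl` are assumptions here (the comment above only rules out `--custom_val_dataset_path`), so check the [LLM Quantization Documentation](LLM-quantization.md) for the supported flags:

```shell
# Hypothetical: AWQ-quantize using a local jsonl file as the calibration set.
# `--custom_train_dataset_path` is assumed by analogy with the infer parameters.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model_type qwen1half-7b-chat \
    --quant_bits 4 --quant_method awq \
    --custom_train_dataset_path my-quant-data.jsonl
```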
```shell
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```
**Comparison of quantization effects**:

The comparison shows inference results from the AWQ-INT4 model, the GPTQ-INT4 model, and the original unquantized model. The quantized models maintain high-quality output while enabling faster inference.
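To reproduce such a comparison, a minimal sketch, assuming the quantized weights sit in a local directory following the naming pattern from the parameter list and that `--model_id_or_path` accepts a local path:

```shell
# Sketch: run inference against the AWQ-INT4 weights produced by `swift export`.
# `qwen1half-7b-chat-awq-int4` is the assumed output directory name.
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model_type qwen1half-7b-chat \
    --model_id_or_path qwen1half-7b-chat-awq-int4
```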
## Fine-tuned Model
Assume you fine-tuned qwen1half-4b-chat using LoRA, and the model weights directory is: `output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx`.
Here we only introduce using the AWQ technique to quantize the fine-tuned model.
**Merge-LoRA & Quantization**
```shell
# Use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
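# NOTE: the export command itself is truncated in this excerpt. A minimal
# sketch, assuming the flags documented in Command-line-parameters.md and
# the placeholder checkpoint path used earlier in this document:
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
    --merge_lora true --quant_bits 4 \
    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
```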