
Commit db7ad12

update docs (#850)
1 parent 3f3fa7e commit db7ad12

12 files changed: +70 -67 lines changed

docs/source/LLM/LLM量化文档.md

Lines changed: 6 additions & 6 deletions
@@ -38,16 +38,16 @@ pip install -r requirements/llm.txt -U
 # If OOM occurs during quantization, moderately reduce `--quant_n_samples` (default 256) and `--quant_seqlen` (default 2048).
 # gptq-int4 quantization (takes about 20 minutes on an A100, GPU memory usage: 7GB)
 
-# awq: use `ms-bench-mini` as the quantization dataset
+# awq: use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
     --model_type qwen1half-7b-chat --quant_bits 4 \
-    --dataset ms-bench-mini --quant_method awq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
 
-# gptq: use `ms-bench-mini` as the quantization dataset
+# gptq: use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
 # For gptq quantization, first see this issue: https://github.com/AutoGPTQ/AutoGPTQ/issues/439
 OMP_NUM_THREADS=14 CUDA_VISIBLE_DEVICES=0 swift export \
     --model_type qwen1half-7b-chat --quant_bits 4 \
-    --dataset ms-bench-mini --quant_method gptq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method gptq
 
 # awq: use a custom quantization dataset (the `--custom_val_dataset_path` parameter is not used)
 # Same for gptq
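The hunk above also mentions quantizing with a custom dataset. As a minimal sketch of that variant, assuming swift's custom-dataset flag `--custom_train_dataset_path` (the flag usage and the `xxx.jsonl` path here are illustrative placeholders, not part of this commit):

```shell
# Sketch: calibrate AWQ quantization on a local jsonl file instead of a
# built-in dataset; `--custom_train_dataset_path` and xxx.jsonl are assumptions.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model_type qwen1half-7b-chat --quant_bits 4 \
    --custom_train_dataset_path xxx.jsonl \
    --quant_method awq
```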
@@ -167,11 +167,11 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
 
 **Merge-LoRA & quantization**
 ```shell
-# Use `ms-bench-mini` as the quantization dataset
+# Use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
     --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
     --merge_lora true --quant_bits 4 \
-    --dataset ms-bench-mini --quant_method awq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
 
 # Use the dataset used during fine-tuning as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
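Per the `--quant_bits` description in 命令行参数.md (later in this commit), quantizing a fine-tuned model writes weights to a sibling `checkpoint-xxx-{quant_method}-int{quant_bits}` directory. A minimal inference sketch under that assumption, with placeholder paths carried over from the commands above:

```shell
# Sketch: the -awq-int4 suffix follows the documented output pattern;
# vx-xxx/checkpoint-xxx is a placeholder.
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx-awq-int4'
```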

docs/source/LLM/Qwen1.5全流程最佳实践.md

Lines changed: 9 additions & 9 deletions
@@ -178,16 +178,16 @@ CUDA_VISIBLE_DEVICES=0 swift app-ui \
 
 Using Python:
 ```python
-# Experimental environment: A100
-# 26GB GPU memory
+# Experimental environment: 3090
+# 24GB GPU memory
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
 from swift.llm import DatasetName, ModelType, SftArguments, sft_main
 
 sft_args = SftArguments(
     model_type=ModelType.qwen1half_7b_chat,
-    dataset=[DatasetName.ms_bench_mini],
+    dataset=[DatasetName.alpaca_zh, DatasetName.alpaca_en],
     train_dataset_sample=1000,
     logging_steps=5,
     max_length=2048,
@@ -208,11 +208,11 @@ print(f'best_model_checkpoint: {best_model_checkpoint}')
 Using model parallelism:
 ```shell
 # Experimental environment: 2 * 3090
-# 2 * 19GB GPU memory
+# 2 * 18GB GPU memory
 CUDA_VISIBLE_DEVICES=0,1 \
 swift sft \
     --model_type qwen1half-7b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 2048 \
@@ -225,15 +225,15 @@ swift sft \
     --model_author 魔搭 ModelScope \
 ```
 
-Script for distributed training with **zero3**:
+Script for distributed training with **zero2**:
 ```shell
 # Experimental environment: 4 * 3090
 # 4 * 24GB GPU memory
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-7b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 2048 \
@@ -244,7 +244,7 @@ swift sft \
     --self_cognition_sample 500 \
     --model_name 小黄 'Xiao Huang' \
     --model_author 魔搭 ModelScope \
-    --deepspeed default-zero3 \
+    --deepspeed default-zero2 \
 ```
 
 If you want to **train via the web UI**, enter the following command and fill in the corresponding values:
@@ -484,7 +484,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-72b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 4096 \

docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 1 deletion
@@ -241,7 +241,7 @@ export parameters inherit from infer parameters, and add the following:
 - `--merge_lora`: Default is `False`. This parameter is already defined in InferArguments and is not a new one. Whether to merge the lora weights into the base model and save the full weights. The weights are saved in a directory at the same level as `ckpt_dir`, e.g. `'/path/to/your/vx-xxx/checkpoint-xxx-merged'`.
 - `--quant_bits`: Number of bits for quantization. Default is `0`, i.e. no quantization. With `--quant_method awq` you can set it to `4` for 4-bit quantization. With `--quant_method gptq` you can set it to `2`, `3`, `4`, or `8` for the corresponding bit width. When quantizing the original model, the weights are saved in the `f'{args.model_type}-{args.quant_method}-int{args.quant_bits}'` directory. When quantizing a fine-tuned model, the weights are saved in a directory at the same level as `ckpt_dir`, e.g. `f'/path/to/your/vx-xxx/checkpoint-xxx-{args.quant_method}-int{args.quant_bits}'`.
 - `--quant_method`: Quantization method, default is `'awq'`. Options are 'awq' and 'gptq'.
-- `--dataset`: This parameter is already defined in InferArguments; for export it means the quantization dataset. Default is `[]`. Recommended: `--dataset ms-bench-mini`. This dataset contains high-quality multilingual content (mainly Chinese) and works well for quantizing Chinese models. You can also set `--dataset pileval` to use autoawq's default quantization dataset, which is in English. For more details, including how to customize the quantization dataset, see the [LLM quantization docs](LLM量化文档.md).
+- `--dataset`: This parameter is already defined in InferArguments; for export it means the quantization dataset. Default is `[]`. For more details, including how to customize the quantization dataset, see the [LLM quantization docs](LLM量化文档.md).
 - `--quant_n_samples`: Quantization parameter, default is `256`. With `--quant_method awq`, if OOM occurs during quantization you can moderately reduce `--quant_n_samples` and `--quant_seqlen`. `--quant_method gptq` usually does not hit quantization OOM.
 - `--quant_seqlen`: Quantization parameter, default is `2048`.
 - `--quant_device_map`: Default is `'cpu'` to save GPU memory. You can specify 'cuda:0', 'auto', 'cpu', etc., the device onto which the model is loaded during quantization.
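Putting these parameters together, a minimal OOM-safe awq export might look as follows; the halved `--quant_n_samples`/`--quant_seqlen` values are illustrative, not prescribed by this commit:

```shell
# Sketch: reduce calibration samples and sequence length per the OOM advice above.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model_type qwen1half-7b-chat --quant_bits 4 \
    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq \
    --quant_n_samples 128 --quant_seqlen 1024
```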

docs/source/LLM/自我认知微调最佳实践.md

Lines changed: 9 additions & 8 deletions
@@ -75,15 +75,15 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-4b-chat
 Using Python:
 ```python
 # Experimental environment: A10, 3090, V100, ...
-# 23GB GPU memory
+# 22GB GPU memory
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
 from swift.llm import DatasetName, ModelType, SftArguments, sft_main
 
 sft_args = SftArguments(
     model_type=ModelType.qwen1half_4b_chat,
-    dataset=[DatasetName.ms_bench_mini],
+    dataset=[DatasetName.alpaca_zh, DatasetName.alpaca_en],
     train_dataset_sample=1000,
     logging_steps=5,
     max_length=2048,
@@ -132,11 +132,11 @@ Val: 100%|███████████████████████
 Using the CLI (single GPU):
 ```bash
 # Experimental environment: A10, 3090, V100, ...
-# 23GB GPU memory
+# 22GB GPU memory
 CUDA_VISIBLE_DEVICES=0 \
 swift sft \
     --model_type qwen1half-4b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 2048 \
@@ -149,16 +149,16 @@ swift sft \
     --model_author 魔搭 ModelScope \
 ```
 
-Using the CLI (DDP):
+Using the CLI (DeepSpeed-ZeRO2):
 > If you are using a 3090 or similar card, you can lower `max_length` to reduce memory usage.
 ```bash
-# Experimental environment: 4 * A100
-# 4 * 32GB GPU memory
+# Experimental environment: 4 * 3090
+# 4 * 24GB GPU memory
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-4b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 2048 \
@@ -169,6 +169,7 @@ swift sft \
     --self_cognition_sample 500 \
     --model_name 小黄 'Xiao Huang' \
     --model_author 魔搭 ModelScope \
+    --deepspeed default-zero2
 ```
 
 ## Inference After Fine-Tuning
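For the "Inference After Fine-Tuning" section this hunk ends on, a minimal sketch of the usual swift pattern (the checkpoint path is a placeholder in the style of the docs above):

```shell
# Sketch: infer from the fine-tuned checkpoint; vx-xxx/checkpoint-xxx is a placeholder.
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx'
```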

docs/source_en/LLM/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
@@ -237,7 +237,7 @@ export parameters inherit from infer parameters, with the following added parameters:
 - `--merge_lora`: Default is `False`. This parameter is already defined in InferArguments, not a new parameter. Whether to merge lora weights into the base model and save full weights. Weights will be saved in the same level directory as `ckpt_dir`, e.g. the `'/path/to/your/vx-xxx/checkpoint-xxx-merged'` directory.
 - `--quant_bits`: Number of bits for quantization. Default is `0`, i.e. no quantization. If you set `--quant_method awq`, you can set this to `4` for 4-bit quantization. If you set `--quant_method gptq`, you can set this to `2`, `3`, `4`, or `8` for the corresponding bit width. If quantizing the original model, weights will be saved in the `f'{args.model_type}-{args.quant_method}-int{args.quant_bits}'` directory. If quantizing a fine-tuned model, weights will be saved in the same level directory as `ckpt_dir`, e.g. the `f'/path/to/your/vx-xxx/checkpoint-xxx-{args.quant_method}-int{args.quant_bits}'` directory.
 - `--quant_method`: Quantization method, default is `'awq'`. Options are 'awq', 'gptq'.
-- `--dataset`: This parameter is already defined in InferArguments, for export it means quantization dataset. Default is `[]`. Recommended to set `--dataset ms-bench-mini`. This dataset contains multilingual content (mainly Chinese) of high quality, with good effect for quantizing Chinese models. You can also set `--dataset pileval`, using autoawq default quantization dataset, the language of this dataset is English. More details: including how to customize quantization dataset, can be found in [LLM Quantization Documentation](LLM-quantization.md).
+- `--dataset`: This parameter is already defined in InferArguments, for export it means the quantization dataset. Default is `[]`. More details, including how to customize the quantization dataset, can be found in the [LLM Quantization Documentation](LLM-quantization.md).
 - `--quant_n_samples`: Quantization parameter, default is `256`. When set to `--quant_method awq`, if OOM occurs during quantization, you can moderately reduce `--quant_n_samples` and `--quant_seqlen`. `--quant_method gptq` generally does not encounter quantization OOM.
 - `--quant_seqlen`: Quantization parameter, default is `2048`.
 - `--quant_device_map`: Default is `'cpu'`, to save memory. You can specify 'cuda:0', 'auto', 'cpu', etc., representing the device to load the model onto during quantization.
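Since `--quant_device_map` defaults to `'cpu'`, loading the model onto a GPU during calibration is opt-in. A minimal sketch (flag values are illustrative, not from this commit):

```shell
# Sketch: place the model on GPU 0 during quantization instead of the default 'cpu'.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model_type qwen1half-7b-chat --quant_bits 4 \
    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq \
    --quant_device_map 'cuda:0'
```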

docs/source_en/LLM/LLM-quantization.md

Lines changed: 6 additions & 10 deletions
@@ -35,16 +35,16 @@ Here we demonstrate AWQ and GPTQ quantization on the qwen1half-7b-chat model.
 # If OOM occurs during quantization, you can appropriately reduce `--quant_n_samples` (default 256) and `--quant_seqlen` (default 2048).
 # GPTQ-INT4 quantization (takes about 20 minutes using A100, memory usage: 7GB)
 
-# AWQ: Use `ms-bench-mini` as the quantization dataset
+# AWQ: Use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
     --model_type qwen1half-7b-chat --quant_bits 4 \
-    --dataset ms-bench-mini --quant_method awq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
 
-# GPTQ: Use `ms-bench-mini` as the quantization dataset
+# GPTQ: Use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
 # For GPTQ quantization, please first refer to this issue: https://github.com/AutoGPTQ/AutoGPTQ/issues/439
 OMP_NUM_THREADS=14 CUDA_VISIBLE_DEVICES=0 swift export \
     --model_type qwen1half-7b-chat --quant_bits 4 \
-    --dataset ms-bench-mini --quant_method gptq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method gptq
 
 # AWQ: Use a custom quantization dataset (don't use the `--custom_val_dataset_path` parameter)
 # Same for GPTQ
@@ -67,10 +67,6 @@ CUDA_VISIBLE_DEVICES=0 swift infer \
 CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
 ```
 
-**Comparison of quantization effects**:
-
-The comparison shows inference results from the AWQ-INT4 model, GPTQ-INT4 model, and the original unquantized model. The quantized models maintain high quality output while enabling faster inference speeds.
-
 ## Fine-tuned Model
 
 Assume you fine-tuned qwen1half-4b-chat using LoRA, and the model weights directory is: `output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx`.
@@ -79,11 +75,11 @@ Here we only introduce using the AWQ technique to quantize the fine-tuned model.
 
 **Merge-LoRA & Quantization**
 ```shell
-# Use `ms-bench-mini` as the quantization dataset
+# Use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
     --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
     --merge_lora true --quant_bits 4 \
-    --dataset ms-bench-mini --quant_method awq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
 
 # Use the dataset from fine-tuning as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \

docs/source_en/LLM/Qwen1.5-best-practice.md

Lines changed: 22 additions & 19 deletions
@@ -175,16 +175,16 @@ Next, we perform self-cognition fine-tuning on the model to train your own large
 
 Using Python:
 ```python
-# Experimental environment: A100
-# 26GB GPU memory
+# Experimental environment: 3090
+# 24GB GPU memory
 import os
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
 from swift.llm import DatasetName, ModelType, SftArguments, sft_main
 
 sft_args = SftArguments(
     model_type=ModelType.qwen1half_7b_chat,
-    dataset=[DatasetName.ms_bench_mini],
+    dataset=[DatasetName.alpaca_zh, DatasetName.alpaca_en],
     train_dataset_sample=1000,
     logging_steps=5,
     max_length=2048,
@@ -193,8 +193,8 @@ sft_args = SftArguments(
     output_dir='output',
     lora_target_modules=['ALL'],
     self_cognition_sample=500,
-    model_name=['Xiao Huang', 'Xiao Huang'],
-    model_author=['ModelScope', 'ModelScope'])
+    model_name=['小黄', 'Xiao Huang'],
+    model_author=['魔搭', 'ModelScope'])
 output = sft_main(sft_args)
 best_model_checkpoint = output['best_model_checkpoint']
 print(f'best_model_checkpoint: {best_model_checkpoint}')
@@ -207,11 +207,11 @@ Using model parallelism:
 ```shell
 
 # Experimental environment: 2 * 3090
-# 2 * 19GB GPU memory
+# 2 * 18GB GPU memory
 CUDA_VISIBLE_DEVICES=0,1 \
 swift sft \
     --model_type qwen1half-7b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 2048 \
@@ -220,17 +220,19 @@ swift sft \
     --output_dir output \
     --lora_target_modules ALL \
     --self_cognition_sample 500 \
-    --model_name Xiao Huang 'Xiao Huang' \
-    --model_author ModelScope ModelScope \```
+    --model_name 小黄 'Xiao Huang' \
+    --model_author 魔搭 ModelScope \
+```
 
-Script for distributed training using **zero3**:```shell
+Script for distributed training using **zero2**:
+```shell
 # Experimental environment: 4 * 3090
 # 4 * 24GB GPU memory
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-7b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 2048 \
@@ -239,9 +241,10 @@ swift sft \
     --output_dir output \
     --lora_target_modules ALL \
     --self_cognition_sample 500 \
-    --model_name Xiao Huang 'Xiao Huang' \
-    --model_author ModelScope ModelScope \
-    --deepspeed default-zero3 \```
+    --model_name 小黄 'Xiao Huang' \
+    --model_author 魔搭 ModelScope \
+    --deepspeed default-zero2 \
+```
 
 If you want to **train via the web UI**, enter the following command and fill in the corresponding values:

@@ -408,7 +411,7 @@ for query in ['Who are you?', "what's your name?", 'Who developed you?']:
     messages.append({'role': 'assistant', 'content': response})
 
 # streaming
-for query in ['78654+657=?', 'What to do if I can't fall asleep at night']:
+for query in ['78654+657=?', "What to do if I can't fall asleep at night"]:
     messages.append({'role': 'user', 'content': query})
     stream_resp = client.chat.completions.create(
         model=model_type,
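The client calls in this hunk target an OpenAI-compatible endpoint; as a minimal sketch of the same request over plain HTTP, assuming the model is served locally via `swift deploy` (the URL, port, and model name are assumptions, not part of this commit):

```shell
# Sketch: non-streaming chat completion against an assumed local endpoint.
curl http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "qwen1half-7b-chat", "messages": [{"role": "user", "content": "78654+657=?"}]}'
```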
@@ -483,7 +486,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-72b-chat \
-    --dataset ms-bench-mini \
+    --dataset alpaca-zh alpaca-en \
     --train_dataset_sample 1000 \
     --logging_steps 5 \
     --max_length 4096 \
@@ -492,8 +495,8 @@ swift sft \
     --output_dir output \
     --lora_target_modules ALL \
     --self_cognition_sample 500 \
-    --model_name Xiao Huang 'Xiao Huang' \
-    --model_author ModelScope ModelScope \
+    --model_name 小黄 'Xiao Huang' \
+    --model_author 魔搭 ModelScope \
     --deepspeed default-zero3 \
 ```
 
@@ -570,7 +573,7 @@ for query in ['Who are you?', "what's your name?", 'Who developed you?']:
     messages.append({'role': 'assistant', 'content': response})
 
 # streaming
-for query in ['78654+657=?', 'What to do if I can't fall asleep at night']:
+for query in ['78654+657=?', "What to do if I can't fall asleep at night"]:
     messages.append({'role': 'user', 'content': query})
     stream_resp = client.chat.completions.create(
         model=model_type,
