Commit 03b8d9e: Feat 1121 (#165)
1 parent: a95da87
86 files changed: +279 / -265 lines
README.md (20 additions & 3 deletions)

@@ -141,7 +141,7 @@ sft_args = SftArguments(
     dataset=[DatasetName.blossom_math_zh],
     output_dir='output',
     gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
 print(f'best_ckpt_dir: {best_ckpt_dir}')
 torch.cuda.empty_cache()
 infer_args = InferArguments(
@@ -159,7 +159,11 @@ web_ui_main(infer_args)
 ```bash
 # Experimental environment: A10, 3090, A100, ...
 # 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --dataset blossom-math-zh \
+    --output_dir output \

 # Using DDP
 # Experimental environment: 2 * 3090
@@ -169,18 +173,31 @@ NPROC_PER_NODE=2 \
 swift sft \
     --model_id_or_path qwen/Qwen-7B-Chat \
     --dataset blossom-math-zh \
+    --output_dir output \

 # Using custom dataset
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --custom_train_dataset_path chatml.jsonl \
+    --output_dir output \
 ```

 **Inference**:
 ```bash
+# Original Model
+CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+
+# Fine-tuned Model
 CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```

 **Web-UI**:
 ```bash
+# Original Model
+CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat
+
+# Fine-tuned Model
 CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```

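The `best_model_checkpoint` change above alters what callers do with `sft_main`'s return value: it now returns a dict rather than the checkpoint path itself. A minimal sketch of the new access pattern, using a hypothetical stub in place of the real `swift.llm.run.sft_main` (which actually trains a model):

```python
# Hypothetical stub standing in for swift.llm.run.sft_main; it only mimics
# the new return shape shown in the diff above (a dict, not a bare path).
def sft_main_stub(sft_args):
    # The real function runs fine-tuning; the placeholder path mirrors the
    # 'xxx/vx_xxx/checkpoint-xxx' convention used elsewhere in the README.
    return {'best_model_checkpoint': 'xxx/vx_xxx/checkpoint-xxx'}

result = sft_main_stub({'dataset': ['blossom-math-zh']})
best_ckpt_dir = result['best_model_checkpoint']  # new: index the returned dict
print(f'best_ckpt_dir: {best_ckpt_dir}')
```

Callers that previously assigned `sft_main(...)` directly to `best_ckpt_dir` must now index the dict, as the README diffs in this commit do.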
README_CN.md (21 additions & 4 deletions)

@@ -138,7 +138,7 @@ sft_args = SftArguments(
     dataset=[DatasetName.blossom_math_zh],
     output_dir='output',
     gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
 print(f'best_ckpt_dir: {best_ckpt_dir}')
 torch.cuda.empty_cache()
 infer_args = InferArguments(
@@ -156,7 +156,11 @@ web_ui_main(infer_args)
 ```bash
 # Experimental environment: A10, 3090, A100, ...
 # 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --dataset blossom-math-zh \
+    --output_dir output \

 # 使用DDP
 # Experimental environment: 2 * 3090
@@ -166,18 +170,31 @@ NPROC_PER_NODE=2 \
 swift sft \
     --model_id_or_path qwen/Qwen-7B-Chat \
     --dataset blossom-math-zh \
+    --output_dir output \

 # 使用自己的数据集
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --custom_train_dataset_path chatml.jsonl \
+    --output_dir output \
 ```

 **推理**:
 ```bash
+# 原始模型
+CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+
+# 微调后的模型
 CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```

-**Web-UI**
+**Web-UI**:
 ```bash
+# 原始模型
+CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat
+
+# 微调后的模型
 CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```

examples/pytorch/llm/README.md (24 additions & 7 deletions)

@@ -104,7 +104,7 @@ sft_args = SftArguments(
     dataset=[DatasetName.blossom_math_zh],
     output_dir='output',
     gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
 print(f'best_ckpt_dir: {best_ckpt_dir}')
 torch.cuda.empty_cache()
 infer_args = InferArguments(
@@ -122,7 +122,11 @@ web_ui_main(infer_args)
 ```bash
 # Experimental environment: A10, 3090, A100, ...
 # 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --dataset blossom-math-zh \
+    --output_dir output \

 # Using DDP
 # Experimental environment: 2 * 3090
@@ -132,18 +136,31 @@ NPROC_PER_NODE=2 \
 swift sft \
     --model_id_or_path qwen/Qwen-7B-Chat \
     --dataset blossom-math-zh \
+    --output_dir output \

 # Using custom dataset
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --custom_train_dataset_path chatml.jsonl \
+    --output_dir output \
 ```

 **Inference**:
 ```bash
+# Original Model
+CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+
+# Fine-tuned Model
 CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```

 **Web-UI**:
 ```bash
+# Original Model
+CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat
+
+# Fine-tuned Model
 CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```
@@ -574,9 +591,9 @@ The template initialization function retrieves the complete chat template based
 - `--check_model_is_latest`: Check if the model is the latest, default is `True`. If you need to train without internet connection, please set this parameter to `False`.
 - `--max_new_tokens`: The maximum number of new tokens to generate. The default value is `2048`. This parameter only takes effect when `predict_with_generate` is set to True.
 - `--do_sample`: Whether to use sampling during generation. The default value is `True`. This parameter only takes effect when `predict_with_generate` is set to True.
-- `--temperature`: The temperature value for sampling during generation. The default value is `0.9`. This parameter only takes effect when `predict_with_generate` is set to True.
+- `--temperature`: The temperature value for sampling during generation. The default value is `0.3`. This parameter only takes effect when `predict_with_generate` is set to True.
 - `--top_k`: The value of k for top-k sampling during generation. The default value is `20`. This parameter only takes effect when `predict_with_generate` is set to True.
-- `--top_p`: The cumulative probability threshold for top-p sampling during generation. The default value is `0.9`. This parameter only takes effect when `predict_with_generate` is set to True.
+- `--top_p`: The cumulative probability threshold for top-p sampling during generation. The default value is `0.7`. This parameter only takes effect when `predict_with_generate` is set to True.
 - `--repetition_penalty`: The repetition penalty applied during generation. The default value is `1.05`. This parameter only takes effect when `predict_with_generate` is set to True.

@@ -606,9 +623,9 @@ The template initialization function retrieves the complete chat template based
 - `--bnb_4bit_use_double_quant`: Default value is `True`. For specific parameter details, please refer to the `sft.sh Command Line Arguments`. This parameter is not effective if `quantization_bit` is set to 0.
 - `--max_new_tokens`: Maximum number of new tokens to generate. Default value is `2048`.
 - `--do_sample`: Whether to use greedy decoding or sampling for generation. Default value is `True`.
-- `--temperature`: Default value is `0.9`. This parameter only takes effect when `do_sample` is set to True.
+- `--temperature`: Default value is `0.3`. This parameter only takes effect when `do_sample` is set to True.
 - `--top_k`: Default value is `20`. This parameter only takes effect when `do_sample` is set to True.
-- `--top_p`: Default value is `0.9`. This parameter only takes effect when `do_sample` is set to True.
+- `--top_p`: Default value is `0.7`. This parameter only takes effect when `do_sample` is set to True.
 - `--repetition_penalty`: Default value is `1.05`.
 - `--use_flash_attn`: Default value is `None`, which means 'auto'. For specific parameter details, please refer to the `sft.sh Command Line Arguments`. The models that support 'flash_attn' include: qwen series, qwen-vl series, llama series, openbuddy series, mistral series, yi series, ziya series.
 - `--ignore_args_error`: Default value is `False`. For specific parameter details, please refer to the `sft.sh Command Line Arguments`.

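The documentation hunks above change the sampling defaults (temperature 0.9 to 0.3, top_p 0.9 to 0.7). The full set of generation defaults can be summarized in one place; this dataclass is only an illustrative summary of the documented values, not swift's actual configuration class:

```python
from dataclasses import dataclass


# Illustrative summary of the generation defaults documented in the diff above;
# the class itself is hypothetical, not part of swift.
@dataclass
class GenerationDefaults:
    max_new_tokens: int = 2048
    do_sample: bool = True
    temperature: float = 0.3      # lowered from 0.9 in this commit
    top_k: int = 20
    top_p: float = 0.7            # lowered from 0.9 in this commit
    repetition_penalty: float = 1.05


cfg = GenerationDefaults()
print(cfg.temperature, cfg.top_p)
```

Lower temperature and top_p both narrow the sampling distribution, so the new defaults make generation noticeably more deterministic.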
examples/pytorch/llm/README_CN.md (25 additions & 8 deletions)

@@ -103,7 +103,7 @@ sft_args = SftArguments(
     dataset=[DatasetName.blossom_math_zh],
     output_dir='output',
     gradient_checkpointing=True)
-best_ckpt_dir = sft_main(sft_args)
+best_ckpt_dir = sft_main(sft_args)['best_model_checkpoint']
 print(f'best_ckpt_dir: {best_ckpt_dir}')
 torch.cuda.empty_cache()
 infer_args = InferArguments(
@@ -121,7 +121,11 @@ web_ui_main(infer_args)
 ```bash
 # Experimental environment: A10, 3090, A100, ...
 # 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --dataset blossom-math-zh \
+    --output_dir output \

 # 使用DDP
 # Experimental environment: 2 * 3090
@@ -131,18 +135,31 @@ NPROC_PER_NODE=2 \
 swift sft \
     --model_id_or_path qwen/Qwen-7B-Chat \
     --dataset blossom-math-zh \
+    --output_dir output \

 # 使用自己的数据集
-CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat --custom_train_dataset_path chatml.jsonl
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --custom_train_dataset_path chatml.jsonl \
+    --output_dir output \
 ```

 **推理**:
 ```bash
+# 原始模型
+CUDA_VISIBLE_DEVICES=0 swift infer --model_id_or_path qwen/Qwen-7B-Chat --dataset blossom-math-zh
+
+# 微调后的模型
 CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```

-**Web-UI**
+**Web-UI**:
 ```bash
+# 原始模型
+CUDA_VISIBLE_DEVICES=0 swift web-ui --model_id_or_path qwen/Qwen-7B-Chat
+
+# 微调后的模型
 CUDA_VISIBLE_DEVICES=0 swift web-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
 ```
@@ -577,9 +594,9 @@ if __name__ == '__main__':
 - `--check_model_is_latest`: 检查模型是否是最新, 默认为`True`. 如果你需要断网进行训练, 请将该参数设置为`False`.
 - `--max_new_tokens`: 默认为`2048`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
 - `--do_sample`: 默认为`True`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
-- `--temperature`: 默认为`0.9`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
+- `--temperature`: 默认为`0.3`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
 - `--top_k`: 默认为`20`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
-- `--top_p`: 默认为`0.9`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
+- `--top_p`: 默认为`0.7`. 该参数只有在`predict_with_generate`设置为True的时候才生效.
 - `--repetition_penalty`: 默认为`1.05`. 该参数只有在`predict_with_generate`设置为True的时候才生效.

@@ -609,9 +626,9 @@ if __name__ == '__main__':
 - `--bnb_4bit_use_double_quant`: 默认值为`True`. 具体的参数介绍可以在`sft.sh命令行参数`中查看. 若`quantization_bit`设置为0, 则该参数失效.
 - `--max_new_tokens`: 生成新token的最大数量, 默认值为`2048`.
 - `--do_sample`: 是使用贪婪生成的方式还是采样生成的方式, 默认值为`True`.
-- `--temperature`: 默认值为`0.9`. 该参数只有在`do_sample`设置为True时才生效.
+- `--temperature`: 默认值为`0.3`. 该参数只有在`do_sample`设置为True时才生效.
 - `--top_k`: 默认值为`20`. 该参数只有在`do_sample`设置为True时才生效.
-- `--top_p`: 默认值为`0.9`. 该参数只有在`do_sample`设置为True时才生效.
+- `--top_p`: 默认值为`0.7`. 该参数只有在`do_sample`设置为True时才生效.
 - `--repetition_penalty`: 默认值为`1.05`.
 - `--use_flash_attn`: 默认值为`None`, 即为'auto'. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--ignore_args_error`: 默认值为`False`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.

examples/pytorch/llm/app.py (1 addition & 1 deletion)

@@ -12,5 +12,5 @@
 # or chat
 args = InferArguments(model_type=ModelType.qwen_7b_chat_int4)
 # or load from ckpt dir
-# args = InferArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx', load_args_from_ckpt_dir=True)
+# args = InferArguments(ckpt_dir='xxx/vx_xxx/checkpoint-xxx')
 web_ui_main(args)

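The app.py change drops `load_args_from_ckpt_dir=True` from the commented example, which suggests (an assumption inferred from this diff, not confirmed by it) that passing `ckpt_dir` now implies loading arguments from the checkpoint. A hypothetical stub illustrating that reading of the change:

```python
# Hypothetical stand-in for swift's InferArguments; it assumes the new
# behavior implied by the app.py diff: load_args_from_ckpt_dir defaults to
# True whenever ckpt_dir is supplied, so callers no longer pass it explicitly.
class InferArgumentsStub:
    def __init__(self, ckpt_dir=None, load_args_from_ckpt_dir=None):
        self.ckpt_dir = ckpt_dir
        if load_args_from_ckpt_dir is None:
            # Assumed default: follow ckpt_dir's presence.
            load_args_from_ckpt_dir = ckpt_dir is not None
        self.load_args_from_ckpt_dir = load_args_from_ckpt_dir


args = InferArgumentsStub(ckpt_dir='xxx/vx_xxx/checkpoint-xxx')
print(args.load_args_from_ckpt_dir)
```

If swift's actual default differs, the explicit keyword argument can still be passed as before.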
examples/pytorch/llm/llm_infer.py (2 additions & 1 deletion)

@@ -4,4 +4,5 @@
 from swift.llm.run import infer_main

 if __name__ == '__main__':
-    infer_main()
+    result = infer_main()
+    print(f'infer_main result: {result}')

examples/pytorch/llm/llm_sft.py (2 additions & 2 deletions)

@@ -4,5 +4,5 @@
 from swift.llm.run import sft_main

 if __name__ == '__main__':
-    best_ckpt_dir = sft_main()
-    print(f'best_ckpt_dir: {best_ckpt_dir}')
+    output = sft_main()
+    print(f'sft_main output: {output}')

examples/pytorch/llm/scripts/baichuan2_13b_chat/lora_ddp_ds/infer.sh (2 additions & 3 deletions)

@@ -8,9 +8,8 @@ python llm_infer.py \
     --eval_human false \
     --max_length 4096 \
     --max_new_tokens 2048 \
-    --temperature 0.9 \
-    --top_k 20 \
-    --top_p 0.9 \
+    --temperature 0.1 \
+    --top_p 0.7 \
     --repetition_penalty 1.05 \
     --do_sample true \
     --merge_lora_and_save false \

examples/pytorch/llm/scripts/baichuan2_13b_chat/lora_mp_ddp/infer.sh (2 additions & 3 deletions)

@@ -8,9 +8,8 @@ python llm_infer.py \
     --eval_human false \
     --max_length 2048 \
     --max_new_tokens 2048 \
-    --temperature 0.9 \
-    --top_k 20 \
-    --top_p 0.9 \
+    --temperature 0.1 \
+    --top_p 0.7 \
     --repetition_penalty 1.05 \
     --do_sample true \
     --merge_lora_and_save false \

examples/pytorch/llm/scripts/baichuan2_13b_chat/qlora_ddp_ds/infer.sh (2 additions & 3 deletions)

@@ -7,9 +7,8 @@ python llm_infer.py \
     --eval_human false \
     --max_length 4096 \
     --max_new_tokens 2048 \
-    --temperature 0.9 \
-    --top_k 20 \
-    --top_p 0.9 \
+    --temperature 0.1 \
+    --top_p 0.7 \
     --repetition_penalty 1.05 \
     --do_sample true \
     --merge_lora_and_save false \

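The three infer.sh edits above make the same flag change: `--top_k` is dropped and `--temperature`/`--top_p` are lowered. A sketch of the resulting sampling flags (the echoed `llm_infer.py` invocation is illustrative; per-script flags such as `--ckpt_dir` and `--max_length` are omitted):

```shell
# Sampling flags shared by the updated infer.sh scripts in this commit.
ARGS=(
    --max_new_tokens 2048
    --temperature 0.1
    --top_p 0.7
    --repetition_penalty 1.05
    --do_sample true
)
echo "python llm_infer.py ${ARGS[*]}"
```

Note the scripts use an even lower temperature (0.1) than the new documented default (0.3), favoring near-deterministic output for evaluation runs.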
0 commit comments