Commit acf3665

Jintao-Huang authored and tastelikefeet committed
update multi-line input (infer) (#196)
(cherry picked from commit f3fcadf)
1 parent 8a4a955 commit acf3665

27 files changed, +195 -49 lines changed

README.md

Lines changed: 12 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,10 @@
1515
<p align="center">
1616
<img src="https://img.shields.io/badge/python-%E2%89%A53.8-5be.svg">
1717
<img src="https://img.shields.io/badge/pytorch-%E2%89%A51.12%20%7C%20%E2%89%A52.0-orange.svg">
18-
<a href="https://github.com/modelscope/modelscope/"><img src="https://img.shields.io/badge/modelscope-%E2%89%A51.9.3-5D91D4.svg"></a>
18+
<a href="https://github.com/modelscope/modelscope/"><img src="https://img.shields.io/badge/modelscope-%E2%89%A51.9.5-5D91D4.svg"></a>
19+
<a href="https://pypi.org/project/ms-swift/"><img src="https://badge.fury.io/py/ms-swift.svg"></a>
20+
<a href="https://github.com/modelscope/swift/blob/main/LICENSE"><img src="https://img.shields.io/github/license/modelscope/swift"></a>
21+
<a href="https://pepy.tech/project/ms-swift"><img src="https://pepy.tech/badge/ms-swift"></a>
1922
<a href="https://github.com/modelscope/swift/"><img src="https://img.shields.io/badge/ms--swift-Build from source-6FEBB9.svg"></a>
2023
</p>
2124

@@ -70,6 +73,8 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
7073
- 🔥 2023.11.10: Support for **bluelm** series models: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. The corresponding shell script can be found in [bluelm_7b_chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/bluelm_7b_chat).
7174
- 🔥 2023.11.08: Support the finetuning of **xverse-65b** model, scripts can be found at: [xverse_65b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_65b).
7275
- 🔥 2023.11.07: Support the finetuning of **yi-6b**, **yi-34b** model, scripts can be found at: [yi_6b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_6b), [yi_34b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_34b).
76+
<details><summary>More</summary>
77+
7378
- 🔥 2023.10.30: Support **QA-LoRA** and **LongLoRA** to decrease memory usage in training.
7479
- 🔥 2023.10.30: Support **ROME** (Rank One Model Editing) to add or modify knowledge; no training is needed!
7580
- 2023.10.30: Support for **skywork-13b** series models: skywork-13b, skywork-13b-chat. The corresponding shell script can be found in [skywork_13b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/skywork_13b).
@@ -79,6 +84,12 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
7984
- 2023.10.12: Supported **mistral-7b** model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat.
8085
- 🔥 2023.10.7: Supported **DeepSpeed ZeRO-2**, enabling LoRA (not just QLoRA) to run DDP on 2*A10.
8186
- 2023.10.4: Supported datasets in the fields of mathematics, law, SQL, and coding: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.
87+
- 🔥 2023.9.25: Supported **qwen-14b** model series: qwen-14b, qwen-14b-chat.
88+
- 2023.9.18: Supported **internlm-20b** model series: internlm-20b, internlm-20b-chat.
89+
- 2023.9.12: Supported training with **MP+DDP** to accelerate full-parameter fine-tuning speed.
90+
- 2023.9.5: Supported **openbuddy-llama2-70b-chat** model.
91+
- 2023.9.3: Supported **baichuan2** model series: baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat.
92+
</details>
8293

8394

8495
## ✨ LLM Training and Inference

README_CN.md

Lines changed: 12 additions & 1 deletion
@@ -15,7 +15,10 @@
1515
<p align="center">
1616
<img src="https://img.shields.io/badge/python-%E2%89%A53.8-5be.svg">
1717
<img src="https://img.shields.io/badge/pytorch-%E2%89%A51.12%20%7C%20%E2%89%A52.0-orange.svg">
18-
<a href="https://github.com/modelscope/modelscope/"><img src="https://img.shields.io/badge/modelscope-%E2%89%A51.9.3-5D91D4.svg"></a>
18+
<a href="https://github.com/modelscope/modelscope/"><img src="https://img.shields.io/badge/modelscope-%E2%89%A51.9.5-5D91D4.svg"></a>
19+
<a href="https://pypi.org/project/ms-swift/"><img src="https://badge.fury.io/py/ms-swift.svg"></a>
20+
<a href="https://github.com/modelscope/swift/blob/main/LICENSE"><img src="https://img.shields.io/github/license/modelscope/swift"></a>
21+
<a href="https://pepy.tech/project/ms-swift"><img src="https://pepy.tech/badge/ms-swift"></a>
1922
<a href="https://github.com/modelscope/swift/"><img src="https://img.shields.io/badge/ms--swift-Build from source-6FEBB9.svg"></a>
2023
</p>
2124

@@ -68,6 +71,8 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
6871
- 🔥 2023.11.10: Support for the **bluelm** model series: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. The corresponding shell script can be found in [bluelm_7b_chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/bluelm_7b_chat).
6972
- 🔥 2023.11.08: Support the training and inference workflow for the **xverse-65b** model; scripts are in [xverse_65b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_65b).
7073
- 🔥 2023.11.07: Support the training and inference workflow for the **yi-6b** and **yi-34b** models; scripts are in [yi_6b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_6b), [yi_34b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_34b).
74+
<details><summary>More</summary>
75+
7176
- 🔥 2023.10.30: Support two new tuners: **QA-LoRA** and **LongLoRA**.
7277
- 🔥 2023.10.30: Support using **ROME** (Rank One Model Editing) to edit models, injecting new knowledge into a model without any training!
7378
- 2023.10.30: Support for the **skywork-13b** model series: skywork-13b, skywork-13b-chat. The corresponding shell script can be found in [skywork_13b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/skywork_13b).
@@ -77,6 +82,12 @@ SWIFT(Scalable lightWeight Infrastructure for Fine-Tuning)是一个可扩展
7782
- 2023.10.12: Support for the **mistral-7b** model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat.
7883
- 🔥 2023.10.7: Support for **DeepSpeed ZeRO-2**, enabling LoRA (not just QLoRA) to run DDP on two A10 GPUs.
7984
- 2023.10.4: Support for more datasets in the fields of mathematics, law, SQL, and coding: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.
85+
- 🔥 2023.9.25: Supported **qwen-14b** model series: qwen-14b, qwen-14b-chat.
86+
- 2023.9.18: Supported **internlm-20b** model series: internlm-20b, internlm-20b-chat.
87+
- 2023.9.12: Supported training with **MP+DDP** to accelerate full-parameter fine-tuning speed.
88+
- 2023.9.5: Supported **openbuddy-llama2-70b-chat** model.
89+
- 2023.9.3: Supported **baichuan2** model series: baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat.
90+
</details>
8091

8192

8293
## ✨ LLM Training and Inference

docs/source/LLM/支持的模型和数据集.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@
9696
- Dataset Name: the dataset\_name under which the dataset is registered in swift.
9797
- Dataset ID: the dataset\_id of the dataset on [ModelScope](https://www.modelscope.cn/my/overview).
9898
- Size: the number of data samples in the dataset.
99-
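- Statistic: dataset statistics. We count tokens, which helps when tuning the `max_length` hyperparameter. We concatenate the training and validation sets of the dataset and then compute the statistics. We tokenize the dataset with the qwen tokenizer. Statistics differ between tokenizers; to obtain token statistics for another model's tokenizer, you can compute them yourself with this [script](https://github.com/modelscope/swift/tree/main/benchmark/run_dataset.py).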
99+
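- Statistic: dataset statistics. We count tokens, which helps when tuning the `max_length` hyperparameter. We concatenate the training and validation sets of the dataset and then compute the statistics. We tokenize the dataset with the qwen tokenizer. Statistics differ between tokenizers; to obtain token statistics for another model's tokenizer, you can compute them yourself with this [script](https://github.com/modelscope/swift/tree/main/scripts/utils/run_dataset_info.py).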
100100

101101
| Dataset Name | Dataset ID | Train Size | Val Size | Statistic (token) | Tags |
102102
| ------------ | ---------- | ---------- | -------- | ----------------- | ---- |

docs/source/LLM/自定义与拓展.md

Lines changed: 9 additions & 9 deletions
@@ -30,15 +30,6 @@
3030

3131
**Format 1:**
3232

33-
```csv
34-
instruction,input,output
35-
11111,22222,33333
36-
aaaaa,bbbbb,ccccc
37-
AAAAA,BBBBB,CCCCC
38-
```
39-
40-
**Format 2:**
41-
4233
Pretraining
4334

4435
```csv
@@ -83,6 +74,15 @@ Multi-Round Dialogue
8374
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]]}]
8475
```
8576

77+
**Format 2:**
78+
79+
```csv
80+
instruction,input,output
81+
11111,22222,33333
82+
aaaaa,bbbbb,ccccc
83+
AAAAA,BBBBB,CCCCC
84+
```
85+
8686
**Format 3:**
8787

8888
```jsonl

docs/source/LLM/自我认知微调最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,6 @@ pip install -r requirements/llm.txt -U
2525
```
2626

2727
## Inference Before Fine-Tuning
28-
For single-sample inference, see the [LLM inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md#qwen-7b-chat).
2928

3029
Using Python:
3130
```python
@@ -69,6 +68,7 @@ My name is QianWen, developed by Alibaba Cloud. I am designed to answer various
6968
If none of the above methods help you improve your sleep, consider consulting a doctor or a professional sleep therapist.
7069
"""
7170
```
71+
For single-sample inference, see the [LLM inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md#qwen-7b-chat).
7272

7373
Using CLI:
7474
```bash

scripts/utils/test_readme.py

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
import os
2+
import re
3+
4+
import torch
5+
from modelscope import snapshot_download
6+
7+
from swift.llm import MODEL_MAPPING
8+
9+
10+
def test_readme():
11+
for model_type in MODEL_MAPPING.keys():
12+
model_id = MODEL_MAPPING[model_type]['model_id_or_path']
13+
model_dir = snapshot_download(model_id, revision='master')
14+
readme_path = os.path.join(model_dir, 'README.md')
15+
assert os.path.exists(readme_path)
16+
with open(readme_path, 'r') as f:
17+
text = f.read()
18+
19+
code_list = re.findall(r'```python\n(.+?)\n```', text, re.M | re.S)
20+
print(f'model_type: {model_type}')
21+
for code in code_list:
22+
if 'import' not in code or 'modelscope' not in code:
23+
continue
24+
try:
25+
exec(code)
26+
except Exception:
27+
print(code)
28+
input('[ENTER]')
29+
torch.cuda.empty_cache()
30+
31+
32+
if __name__ == '__main__':
33+
test_readme()
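
The block extraction in `test_readme.py` can be tried without downloading any model. The following is a minimal sketch, assuming only the regular expression and the `import`/`modelscope` filter shown in the diff; the helper names `extract_python_blocks` and `is_runnable_candidate` and the sample README text are hypothetical and not part of the commit.

```python
import re
from typing import List

# Build the fence marker programmatically so this example can itself live
# inside a fenced block.
FENCE = '`' * 3

# Same pattern as in the diff: a non-greedy match between a python fence and
# the next closing fence. re.S lets `.` match newlines.
PATTERN = FENCE + r'python\n(.+?)\n' + FENCE


def extract_python_blocks(text: str) -> List[str]:
    """Return the bodies of all python fenced blocks in `text` (hypothetical helper)."""
    return re.findall(PATTERN, text, re.M | re.S)


def is_runnable_candidate(code: str) -> bool:
    """Mirror the filter in the diff: skip blocks lacking `import` or `modelscope`."""
    return 'import' in code and 'modelscope' in code


# Hypothetical README content used only for illustration.
sample = '\n'.join([
    '# Demo model',
    FENCE + 'python',
    'from modelscope import snapshot_download',
    "print('hello')",
    FENCE,
    FENCE + 'python',
    'x = 1 + 1',
    FENCE,
    FENCE + 'bash',
    'pip install modelscope',
    FENCE,
])

blocks = extract_python_blocks(sample)
candidates = [b for b in blocks if is_runnable_candidate(b)]
print(len(blocks), len(candidates))
```

Only the first block passes the filter, so a script like the one in the diff would pass just that block to `exec`. The bash block is never matched because the pattern requires the `python` info string.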

swift/llm/infer.py

Lines changed: 35 additions & 11 deletions
@@ -7,11 +7,12 @@
77
import json
88
import torch
99
from modelscope import BitsAndBytesConfig, GenerationConfig
10+
from tqdm import tqdm
1011
from transformers import PreTrainedModel
1112

1213
from swift.tuners import Swift
1314
from swift.utils import (append_to_jsonl, get_logger, print_model_info,
14-
seed_everything, show_layers)
15+
read_multi_line, seed_everything, show_layers)
1516
from .utils import (InferArguments, Template, get_dataset, get_model_tokenizer,
1617
get_template, inference, inference_stream)
1718

@@ -157,21 +158,35 @@ def llm_infer(args: InferArguments) -> None:
157158
if args.save_result and args.ckpt_dir is not None:
158159
time = dt.datetime.now().strftime('%Y%m%d-%H%M%S')
159160
jsonl_path = os.path.join(args.ckpt_dir, f'infer_result_{time}.jsonl')
161+
input_mode: Literal['S', 'M'] = 'S'
160162
if args.eval_human:
161-
print_str = 'Input `exit` to exit the conversation'
163+
logger.info('Input `exit` to exit the conversation.')
164+
logger.info('Input `multi-line` to switch to multi-line input mode.')
162165
if template.support_multi_round:
163-
print_str += ', input `clear` to clear the history.'
166+
logger.info('Input `clear` to clear the history.')
164167
else:
165-
print_str += ', The current template only supports single-round dialogues.'
166-
logger.info(print_str)
168+
logger.info(
169+
'The current template only supports single-round dialogues.')
167170
history = []
168171
while True:
169-
query = input('<<< ')
172+
if input_mode == 'S':
173+
query = input('<<< ')
174+
else:
175+
query = read_multi_line()
170176
if query.strip().lower() == 'exit':
171177
break
172178
elif query.strip().lower() == 'clear':
173179
history = []
174180
continue
181+
if input_mode == 'S' and query.strip().lower() == 'multi-line':
182+
input_mode = 'M'
183+
logger.info('End multi-line input with `#`.')
184+
logger.info(
185+
'Input `single-line` to switch to single-line input mode.')
186+
continue
187+
if input_mode == 'M' and query.strip().lower() == 'single-line':
188+
input_mode = 'S'
189+
continue
175190
if not template.support_multi_round:
176191
history = []
177192
gen = inference_stream(model, template, query, history)
@@ -198,24 +213,33 @@ def llm_infer(args: InferArguments) -> None:
198213
val_dataset = val_dataset.select(
199214
range(min(args.val_dataset_sample, val_dataset.shape[0])))
200215
logger.info(f'val_dataset: {val_dataset}')
216+
if args.verbose is None:
217+
if len(val_dataset) >= 100:
218+
args.verbose = False
219+
else:
220+
args.verbose = True
221+
logger.info(f'Setting args.verbose: {args.verbose}')
222+
if not args.verbose:
223+
val_dataset = tqdm(val_dataset)
201224
for data in val_dataset:
202225
_, history = inference(
203226
model,
204227
template,
205228
data.get('query'),
206229
data.get('history'),
207230
data.get('system'),
208-
stream=args.stream,
209-
verbose=True)
231+
stream=args.stream and args.verbose,
232+
verbose=args.verbose)
210233
label = data.get('response')
211234
item = history[0]
212235
obj = {'query': item[0], 'response': item[1], 'label': label}
213236
if jsonl_path is not None:
214237
append_to_jsonl(jsonl_path, obj)
215238
result.append(obj)
216-
print()
217-
print(f'[LABELS]{label}')
218-
print('-' * 50)
239+
if args.verbose:
240+
print()
241+
print(f'[LABELS]{label}')
242+
print('-' * 50)
219243
if args.save_result and args.ckpt_dir is not None:
220244
logger.info(f'save_result_path: {jsonl_path}')
221245
return {'result': result}
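
The single-line/multi-line switching added to `llm_infer` can be sketched in isolation. The commit imports `read_multi_line` from `swift.utils`, but its body is not shown in this diff, so the version below is a hypothetical stand-in that assumes input ends at a line whose last character is `#` (as the logged hint suggests). `collect_queries` and `scripted` are likewise illustrative names, and the mode switch back to single-line is written as an assignment.

```python
from typing import Callable, Iterable, List


def read_multi_line(read: Callable[[str], str] = input,
                    terminator: str = '#') -> str:
    """Hypothetical stand-in: read lines until one ends with `terminator`."""
    lines = []
    prompt = '<<<[M] '
    while True:
        line = read(prompt)
        prompt = ''
        stripped = line.rstrip()
        if stripped.endswith(terminator):
            # Drop the terminator and stop reading.
            lines.append(stripped[:-len(terminator)])
            break
        lines.append(line)
    return '\n'.join(lines)


def collect_queries(read: Callable[[str], str]) -> List[str]:
    """Run the mode-switching loop and return the queries a model would receive."""
    input_mode = 'S'  # 'S' = single-line, 'M' = multi-line
    queries = []
    while True:
        query = read('<<< ') if input_mode == 'S' else read_multi_line(read)
        command = query.strip().lower()
        if command == 'exit':
            break
        if input_mode == 'S' and command == 'multi-line':
            input_mode = 'M'
            continue
        if input_mode == 'M' and command == 'single-line':
            input_mode = 'S'  # assignment, not comparison
            continue
        queries.append(query)
    return queries


def scripted(lines: Iterable[str]) -> Callable[[str], str]:
    """Replace interactive input with a fixed script, for demonstration."""
    it = iter(lines)
    return lambda prompt='': next(it)


queries = collect_queries(scripted([
    'hello', 'multi-line', 'first', 'second#', 'single-line#', 'bye', 'exit'
]))
print(queries)
```

Under these assumptions, commands typed in multi-line mode also need the trailing `#`, which is why the script sends `single-line#`.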

swift/llm/sft.py

Lines changed: 1 addition & 2 deletions
@@ -141,8 +141,7 @@ def llm_sft(args: SftArguments) -> str:
141141
if val_dataset is not None and val_dataset_sample is not None and val_dataset_sample >= 0:
142142
if val_dataset.shape[0] > val_dataset_sample:
143143
logger.info(f'val_dataset_sample: {val_dataset_sample}')
144-
val_idxs = random_state.permutation(val_dataset_sample)
145-
val_dataset = val_dataset.select(val_idxs)
144+
val_dataset = val_dataset.select(range(val_dataset_sample))
146145
# add self-cognition dataset
147146
if args.self_cognition_sample > 0:
148147
train_dataset = add_self_cognition_dataset(train_dataset,
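
The effect of the `val_dataset.select` change above can be illustrated with plain index lists. This sketch assumes that `random_state` in the removed line is a NumPy `RandomState` (its definition is outside this hunk), whose `permutation(n)` returns the integers `0..n-1` in shuffled order; the standard-library shuffle below stands in for it, and both function names are hypothetical.

```python
import random
from typing import List


def old_style_indices(n: int, seed: int = 42) -> List[int]:
    """Stand-in for `random_state.permutation(n)`: 0..n-1 in shuffled order."""
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)
    return idxs


def new_style_indices(n: int) -> List[int]:
    """The replacement in the diff: `range(n)`, i.e. 0..n-1 in original order."""
    return list(range(n))


old = old_style_indices(5)
new = new_style_indices(5)
# Both variants pick the same rows (the first n); only the order differs.
print(sorted(old) == new)
```

If that assumption holds, neither version samples from the whole validation set: both keep its first `val_dataset_sample` rows, and the new code simply keeps them in their original order.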

swift/llm/utils/argument.py

Lines changed: 17 additions & 7 deletions
@@ -1,4 +1,5 @@
11
# Copyright (c) Alibaba, Inc. and its affiliates.
2+
import math
23
import os
34
from dataclasses import dataclass, field
45
from typing import List, Optional, Set, Tuple, Union
@@ -101,7 +102,7 @@ class SftArguments:
101102
optim: str = 'adamw_torch'
102103
learning_rate: Optional[float] = None
103104
weight_decay: float = 0.01
104-
gradient_accumulation_steps: int = 16
105+
gradient_accumulation_steps: Optional[int] = None
105106
max_grad_norm: float = 0.5
106107
predict_with_generate: bool = False
107108
lr_scheduler_type: str = 'cosine'
@@ -186,8 +187,9 @@ def __post_init__(self) -> None:
186187
logger.info(f'output_dir: {self.output_dir}')
187188

188189
self.torch_dtype, self.fp16, self.bf16 = select_dtype(self)
190+
world_size = 1
189191
if is_dist():
190-
rank, local_rank, _, _ = get_dist_setting()
192+
rank, local_rank, world_size, _ = get_dist_setting()
191193
torch.cuda.set_device(local_rank)
192194
self.seed += rank # Avoid the same dropout
193195
if self.ddp_backend == 'gloo' and self.quantization_bit != 0:
@@ -267,6 +269,9 @@ def __post_init__(self) -> None:
267269
self.logging_dir = f'{self.output_dir}/runs'
268270
if self.report_to is None:
269271
self.report_to = ['all']
272+
if self.gradient_accumulation_steps is None:
273+
self.gradient_accumulation_steps = math.ceil(16 / self.batch_size
274+
/ world_size)
270275

271276

272277
@dataclass
@@ -334,6 +339,7 @@ class InferArguments:
334339
stream: bool = True
335340
merge_lora_and_save: bool = False
336341
overwrite_generation_config: bool = False
342+
verbose: Optional[bool] = None
337343
# compatibility
338344
show_dataset_sample: int = 10
339345

@@ -485,11 +491,15 @@ def set_model_type(args: Union[SftArguments, InferArguments]) -> None:
485491
model_id_or_path = args.model_id_or_path
486492
model_id_or_path_lower = model_id_or_path.lower()
487493
if model_id_or_path_lower not in model_mapping_reversed:
488-
error_msg = f"`model_id_or_path`: '{model_id_or_path}' is not registered."
489-
if os.path.exists(model_id_or_path):
490-
error_msg += (
491-
' Please use `--model_id_or_path <model_id> --model_cache_dir <local_path>` '
492-
'to specify the local cache path for the model.')
494+
if isinstance(args,
495+
InferArguments) and 'checkpoint' in model_id_or_path:
496+
error_msg = 'Please use `--ckpt_dir vx_xxx/checkpoint-xxx` to use the checkpoint.'
497+
else:
498+
error_msg = f"`model_id_or_path`: '{model_id_or_path}' is not registered."
499+
if os.path.exists(model_id_or_path):
500+
error_msg += (
501+
' Please use `--model_id_or_path <model_id> --model_cache_dir <local_path>` '
502+
'to specify the local cache path for the model.')
493503
raise ValueError(error_msg)
494504
args.model_type = model_mapping_reversed[model_id_or_path_lower]
495505
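
The new default for `gradient_accumulation_steps` in `SftArguments.__post_init__` keeps the effective batch size at roughly 16 samples per optimizer step. The following is a worked sketch of that formula; the function name and the `target` parameter are illustrative, with the value 16 taken from the diff.

```python
import math


def default_gradient_accumulation_steps(batch_size: int,
                                        world_size: int = 1,
                                        target: int = 16) -> int:
    """Same arithmetic as the diff: ceil(16 / batch_size / world_size)."""
    return math.ceil(target / batch_size / world_size)


# (batch_size, world_size) pairs chosen for illustration.
for batch_size, world_size in [(1, 1), (1, 2), (4, 2), (3, 2), (32, 1)]:
    steps = default_gradient_accumulation_steps(batch_size, world_size)
    effective = steps * batch_size * world_size
    print(f'batch_size={batch_size} world_size={world_size} '
          f'steps={steps} effective_batch={effective}')
```

Because of the ceiling, the effective batch size is never below 16, but it can exceed it when 16 is not divisible by `batch_size * world_size` (for example, 18 for `batch_size=3, world_size=2`), and the step count never drops below 1.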

swift/llm/utils/dataset.py

Lines changed: 1 addition & 0 deletions
@@ -912,6 +912,7 @@ def get_dataset(
912912
train_subset_split_list=dataset_info['train_subset_split_list'],
913913
val_subset_split_list=dataset_info['val_subset_split_list'],
914914
preprocess_func=dataset_info['preprocess_func'])
915+
train_d: HfDataset
915916
if isinstance(dataset, (list, tuple)):
916917
train_d, val_d = dataset
917918
else:
