
Commit cb33ecd

Merge branch 'main' into release/2.2
2 parents: 99cb37f + 53c14d2

123 files changed: +4540 −1799 lines

README.md

Lines changed: 174 additions & 165 deletions (large diff not rendered)

README_CN.md

Lines changed: 146 additions & 142 deletions (large diff not rendered)

asset/discord_qr.jpg

Binary file, 70.7 KB

docs/source/.readthedocs.yaml

Lines changed: 30 additions & 0 deletions

```yaml
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#   - pdf
#   - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: requirements/docs.txt
    - requirements: requirements/framework.txt
    - requirements: requirements/llm.txt
```
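To reproduce locally roughly what Read the Docs will run with this config (a sketch; the `sphinx-build` invocation and output directory are conventions assumed here, not part of the commit):

```bash
# Install the same requirement sets the config above declares
pip install -r requirements/docs.txt -r requirements/framework.txt -r requirements/llm.txt
# Build HTML docs with the Sphinx configuration declared above (docs/source/conf.py)
sphinx-build -b html docs/source docs/build/html
```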

docs/source/GetStarted/界面训练推理.md

Lines changed: 3 additions & 1 deletion

```diff
@@ -8,7 +8,7 @@ swift web-ui
 
 Starts UI-based training and inference.
 
-web-ui takes no command-line arguments; everything controllable is in the UI. However, a few environment variables are available:
+web-ui behavior can be controlled through environment variables or arguments. The environment variables are as follows:
 
 > WEBUI_SHARE=1/0, default 0. Controls whether gradio runs in share mode
 >
@@ -19,3 +19,5 @@
 > WEBUI_PORT: the port number for web-ui
 >
 > USE_INFERENCE=1/0, default 0. Controls whether the gradio inference page loads the model for direct inference or deploys it (USE_INFERENCE=0)
+
+If you use arguments instead, refer to [Command-line arguments](../LLM/命令行参数.md#web-ui-参数)
```
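For example, a launch using the variables documented above might look like this (a minimal sketch; the port value is illustrative):

```bash
# Share the gradio app, pin the port, and load the model directly on the inference page
WEBUI_SHARE=1 WEBUI_PORT=7860 USE_INFERENCE=1 swift web-ui
```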

docs/source/LLM/Agent微调最佳实践.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -165,7 +165,7 @@ Final Answer: If you want a phone with excellent camera performance, I recommend
 | ms-bench | 60000 (sampled) |
 | self-recognition | 3000 (repeated sampling) |
 
-We also support using your own Agent dataset. The dataset format must meet the requirements of [Custom datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86). More specifically, the Agent response/system should follow the Action/Action Input/Observation format above.
+We also support using your own Agent dataset. The dataset format must meet the requirements of [Custom datasets](%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86). More specifically, the Agent response/system should follow the Action/Action Input/Observation format above.
 
 We added **MLP** and **Embedder** to lora_target_modules. You can add LoRA to all linear layers (including qkvo as well as mlp and embedder) by specifying `--lora_target_modules ALL`, which is **usually the most effective**.
 
```
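As a sketch of the `--lora_target_modules ALL` flag mentioned in that context line (the model and dataset choices here are illustrative assumptions, not part of the commit):

```bash
# Apply LoRA to all linear layers: qkvo as well as MLP and embedder
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-7b-chat \
    --dataset ms-bench \
    --lora_target_modules ALL
```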

docs/source/LLM/LLM微调文档.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -37,7 +37,7 @@ pip install -r requirements/llm.txt -U
 ```
 
 ## Fine-tuning
-To fine-tune and run inference through the UI, see the [UI training and inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
+To fine-tune and run inference through the UI, see the [UI training and inference documentation](../GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
 
 ### Using python
 ```python
@@ -100,6 +100,7 @@ swift sft \
     --output_dir output \
 
 # Multi-node multi-GPU
+# If the nodes share a disk, additionally specify `--save_on_each_node false` in each node's script.
 # node0
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NNODES=2 \
@@ -246,6 +247,7 @@ print(f'history: {history}')
 
 Evaluate using a **dataset**:
 ```bash
+# To run inference on all dataset samples, additionally specify `--show_dataset_sample -1`
 # Direct inference
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' \
````
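Putting the newly added comment together with the command it annotates (the checkpoint path is the doc's own placeholder), the full invocation would be:

```bash
# Run inference on every sample of the evaluation dataset
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' \
    --show_dataset_sample -1
```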

docs/source/LLM/LLM量化文档.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -305,7 +305,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 ```
 
 **Note**
-- hqq supports more custom parameters, such as specifying different quantization configurations for different network layers; for details see [Command-line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md)
+- hqq supports more custom parameters, such as specifying different quantization configurations for different network layers; for details see [Command-line arguments](命令行参数.md)
 - eetq quantization is 8-bit quantization; there is no need to specify quantization_bit. bf16 is currently not supported; dtype must be set to fp16
 - eetq qlora is currently rather slow; hqq is recommended instead. See this [issue](https://github.com/NetEase-FuXi/EETQ/issues/17)
````
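A minimal sketch of what the eetq notes imply, assuming the `--quant_method` flag documented in the linked command-line arguments page (the model choice is illustrative):

```bash
# eetq is fixed at 8 bit, so no --quantization_bit is needed;
# bf16 is unsupported, so dtype must be fp16
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-7b-chat \
    --sft_type lora \
    --quant_method eetq \
    --dtype fp16
```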

docs/source/LLM/LmDeploy推理加速与部署.md

Lines changed: 146 additions & 0 deletions

# LmDeploy Inference Acceleration and Deployment

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference Acceleration](#inference-acceleration)
- [Deployment](#deployment)
- [Multimodal](#multimodal)

## Environment Setup
GPU devices: A10, 3090, V100, and A100 are all supported.
```bash
# Set the global pip mirror (speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

pip install lmdeploy
```
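A quick way to sanity-check the installation (a generic verification step assumed here, not taken from the doc):

```bash
# lmdeploy should import cleanly after the steps above
python -c "import lmdeploy; print(lmdeploy.__version__)"
```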
## Inference Acceleration

### Using python

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    ModelType, get_lmdeploy_engine, get_default_template_type,
    get_template, inference_lmdeploy, inference_stream_lmdeploy
)

model_type = ModelType.qwen_7b_chat
lmdeploy_engine = get_lmdeploy_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
# An interface similar to `transformers.GenerationConfig`
lmdeploy_engine.generation_config.max_new_tokens = 256
generation_info = {}

request_list = [{'query': '你好!'}, {'query': '浙江的省会在哪?'}]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
print(generation_info)

# stream
history1 = resp_list[1]['history']
request_list = [{'query': '这有什么好吃的', 'history': history1}]
gen = inference_stream_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
query = request_list[0]['query']
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()

history = resp_list[0]['history']
print(f'history: {history}')
print(generation_info)
"""
query: 你好!
response: 你好!有什么我能帮助你的吗?
query: 浙江的省会在哪?
response: 浙江省会是杭州市。
{'num_prompt_tokens': 46, 'num_generated_tokens': 13, 'num_samples': 2, 'runtime': 0.2037766759749502, 'samples/s': 9.81466593480922, 'tokens/s': 63.79532857625993}
query: 这有什么好吃的
response: 杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油炸臭豆腐等,都是当地非常有名的传统名菜。此外,当地的点心也非常有特色,比如桂花糕、马蹄酥、绿豆糕等。
history: [['浙江的省会在哪?', '浙江省会是杭州市。'], ['这有什么好吃的', '杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油炸臭豆腐等,都是当地非常有名的传统名菜。此外,当地的点心也非常有特色,比如桂花糕、马蹄酥、绿豆糕等。']]
{'num_prompt_tokens': 44, 'num_generated_tokens': 53, 'num_samples': 1, 'runtime': 0.6306625790311955, 'samples/s': 1.5856339558566632, 'tokens/s': 84.03859966040315}
"""
```
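Since `generation_config` mirrors the `transformers.GenerationConfig` interface, other sampling knobs can presumably be set the same way (the attribute names below, beyond the `max_new_tokens` shown above, are assumptions):

```python
# Hypothetical: tune sampling in the same way max_new_tokens is set above
lmdeploy_engine.generation_config.temperature = 0.3
lmdeploy_engine.generation_config.top_p = 0.8
```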
**TP:**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

from swift.llm import (
    ModelType, get_lmdeploy_engine, get_default_template_type,
    get_template, inference_lmdeploy, inference_stream_lmdeploy
)

model_type = ModelType.qwen_7b_chat
lmdeploy_engine = get_lmdeploy_engine(model_type, tp=2)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
# An interface similar to `transformers.GenerationConfig`
lmdeploy_engine.generation_config.max_new_tokens = 256
generation_info = {}

request_list = [{'query': '你好!'}, {'query': '浙江的省会在哪?'}]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
print(generation_info)

# stream
history1 = resp_list[1]['history']
request_list = [{'query': '这有什么好吃的', 'history': history1}]
gen = inference_stream_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
query = request_list[0]['query']
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()

history = resp_list[0]['history']
print(f'history: {history}')
print(generation_info)
"""
query: 你好!
response: 你好!有什么我能帮助你的吗?
query: 浙江的省会在哪?
response: 浙江省会是杭州市。
{'num_prompt_tokens': 46, 'num_generated_tokens': 13, 'num_samples': 2, 'runtime': 0.2080078640137799, 'samples/s': 9.61502109298861, 'tokens/s': 62.497637104425955}
query: 这有什么好吃的
response: 杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油焖笋等等。杭州的特色小吃也很有风味,比如桂花糕、叫花鸡、油爆虾等。此外,杭州还有许多美味的甜品,如月饼、麻薯、绿豆糕等。
history: [['浙江的省会在哪?', '浙江省会是杭州市。'], ['这有什么好吃的', '杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油焖笋等等。杭州的特色小吃也很有风味,比如桂花糕、叫花鸡、油爆虾等。此外,杭州还有许多美味的甜品,如月饼、麻薯、绿豆糕等。']]
{'num_prompt_tokens': 44, 'num_generated_tokens': 64, 'num_samples': 1, 'runtime': 0.5715192809584551, 'samples/s': 1.7497222461558426, 'tokens/s': 111.98222375397393}
"""
```
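The only differences from the single-GPU example are that `CUDA_VISIBLE_DEVICES` lists two devices and `get_lmdeploy_engine` is called with `tp=2`, which shards the model across both GPUs via tensor parallelism.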
### Using the CLI
Coming soon...

## Deployment
Coming soon...

## Multimodal
Coming soon...

docs/source/LLM/Qwen1.5全流程最佳实践.md

Lines changed: 0 additions & 2 deletions

```diff
@@ -128,7 +128,6 @@ gen = inference_stream_vllm(llm_engine, template, request_list)
 print_idx = 0
 print(f'query: {query}\nresponse: ', end='')
 for resp_list in gen:
-    request = request_list[0]
     resp = resp_list[0]
     response = resp['response']
     delta = response[print_idx:]
@@ -346,7 +345,6 @@ gen = inference_stream_vllm(llm_engine, template, request_list)
 print_idx = 0
 print(f'query: {query}\nresponse: ', end='')
 for resp_list in gen:
-    request = request_list[0]
     resp = resp_list[0]
     response = resp['response']
     delta = response[print_idx:]
```
