
Commit c7b9651

update readme (#137)
(cherry picked from commit e65f96c)
1 parent 5d70fc2 commit c7b9651

File tree

6 files changed: +105 -3 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Users can check the [documentation of Swift](docs/source/GetStarted/Introduction

### 🎉 News

- 🔥 2023.11.07: Support fine-tuning of the yi-6b model. Scripts can be found at `scripts/yi_6b`.
- 🔥 2023.10.30: Support QA-LoRA and LongLoRA to decrease memory usage in training.
- 🔥 2023.10.30: Support ROME (Rank One Model Editing) to add or modify knowledge without any training!
- 🔥 2023.10.27: Support for the chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found in `scripts/chatglm3_6b_32k`.

README_CN.md

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible

## 🎉 News

- 🔥 2023.11.07: Support the training and inference pipeline of the yi-6b model. Scripts are in `scripts/yi_6b`.
- 🔥 2023.10.30: Support two new tuners: QA-LoRA and LongLoRA.
- 🔥 2023.10.30: Support editing models with ROME (Rank One Model Editing) to inject new knowledge into a model without any training!
- 🔥 2023.10.27: Support the chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding sh scripts can be found in `scripts/chatglm3_6b_32k`.

docs/source/GetStarted/Deployment.md

Lines changed: 97 additions & 0 deletions
@@ -66,3 +66,100 @@ curl http://localhost:8000/v1/completions \
```

vllm also supports launching the model and calling it from Python code; for details, see the [official vllm documentation](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html).

## chatglm.cpp

This inference-optimization framework supports:

- ChatGLM series models
- BaiChuan series models
- CodeGeeX series models

The GitHub address of chatglm.cpp is: https://github.com/li-plus/chatglm.cpp

First, initialize the corresponding repo:

```shell
git clone --recursive https://github.com/li-plus/chatglm.cpp.git && cd chatglm.cpp
python3 -m pip install torch tabulate tqdm transformers accelerate sentencepiece
cmake -B build
cmake --build build -j --config Release
```

If the model trained with SWIFT is a LoRA model, the LoRA weights need to be merged into the original model first:

```shell
# First cd into the swift root directory
python tools/merge_lora_weights_to_model.py --model_id_or_path /dir/to/your/base/model --model_revision master --ckpt_dir /dir/to/your/lora/model
```

The merged model is written to the `{ckpt_dir}-merged` folder.
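Conceptually, the merge step folds the low-rank LoRA update back into the base weights: W' = W + (alpha / r) * B @ A. A minimal plain-Python sketch of that arithmetic (the function names and tiny matrices are illustrative only, not SWIFT's actual merge implementation):

```python
# Sketch of LoRA weight merging: W' = W + (alpha / r) * B @ A.
# Plain-Python matrices; the real script operates on model checkpoints.

def matmul(B, A):
    # (d x r) @ (r x k) -> (d x k)
    return [[sum(B[i][t] * A[t][j] for t in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]

def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank update B @ A into the base weight W."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight, 2x2
B = [[1.0], [0.0]]             # d x r, with rank r = 1
A = [[0.0, 2.0]]               # r x k
merged = merge_lora(W, A, B, alpha=4, r=1)
print(merged)  # [[1.0, 8.0], [0.0, 1.0]]
```

After merging, the adapter is no longer needed at inference time, which is why the cpp conversion below operates on the merged checkpoint.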

Next, convert the merged `{ckpt_dir}-merged` model weights into a bin file supported by cpp:

```shell
# First cd into the chatglm.cpp root directory
python3 chatglm_cpp/convert.py -i {ckpt_dir}-merged -t q4_0 -o chatglm-ggml.bin
```

chatglm.cpp supports converting models at various precisions; for details, refer to: https://github.com/li-plus/chatglm.cpp#getting-started
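The `-t q4_0` target stores weights as 4-bit integers in fixed-size blocks, with one floating-point scale per block. A simplified sketch of the idea (the real ggml `q4_0` format packs two codes per byte and uses a block size of 32; the helper names here are hypothetical):

```python
# Simplified block-wise 4-bit quantization in the spirit of ggml's q4_0:
# each block of weights shares one scale; values become small integers.

def quantize_block(values):
    """Quantize a block of floats to 4-bit codes in [-8, 7] plus one scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 7 if amax else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the codes and the shared scale."""
    return [scale * v for v in q]

block = [0.1, -0.4, 0.7, 0.2]
scale, q = quantize_block(block)
restored = dequantize_block(scale, q)
print(q)         # small integer codes
print(restored)  # close approximation of the original block
```

Lower-bit targets trade accuracy for memory in exactly this way: fewer code levels per block mean larger rounding error but a smaller bin file.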

The model can then be launched for inference:

```shell
./build/bin/main -m chatglm-ggml.bin -i
# The dialogue below shows the model after training on an agent dataset
# Prompt > how are you?
# ChatGLM3 > <|startofthink|>```JSON
# {"api_name": "greeting", "apimongo_instance": "ddb1e34-0406-42a3-a547a220a2", "parameters": {"text": "how are you?"}}}
# ```<|endofthink|>
#
# I'm an AI assistant and I can only respond to text input. I don't have the ability to respond to audio or video input.
```
121+
## XInference
122+
123+
XInference是XOrbits开源的推理框架,支持大多数LLM模型的python格式和cpp格式高效推理。github链接在:https://github.com/xorbitsai/inference,在使用chatglm.cpp转换成ggml格式之后就可以使用XInference进行推理。
124+
125+
首先安装依赖:
126+
127+
```shell
128+
pip install git+https://github.com/li-plus/chatglm.cpp.git@main
129+
pip install xinference -U
130+
```
131+
132+
之后启动xinference:
133+
134+
```shell
135+
xinference -p 9997
136+
```
137+
138+
在浏览器界面上选择Register Model选项卡,添加chatglm.cpp章节中转换成功的ggml模型:
139+
140+
![image.png](../resources/xinference.jpg)
141+
142+
注意:
143+
144+
- 模型能力选择Chat
145+
146+
之后再Launch Model中搜索刚刚创建的模型名称,点击火箭标识运行即可使用。
147+
148+
调用可以使用如下代码:
149+
150+
```python
151+
from xinference.client import Client
152+
153+
client = Client("http://localhost:9997")
154+
model_uid = client.launch_model(model_name="custom-chatglm")
155+
model = client.get_model(model_uid)
156+
157+
chat_history = []
158+
prompt = "What is the largest animal?"
159+
model.chat(
160+
prompt,
161+
chat_history,
162+
generate_config={"max_tokens": 1024}
163+
)
164+
# {'id': 'chatcmpl-df3c2c28-f8bc-4e79-9c99-2ae3950fd459', 'object': 'chat.completion', 'created': 1699367362, 'model': '021c2b74-7d7a-11ee-b1aa-ead073d837c1', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "According to records kept by the Guinness World Records, the largest animal in the world is the Blue Whale, specifically, the Right and Left Whales, which were both caught off the coast of Newfoundland. The two whales measured a length of 105.63 meters, or approximately 346 feet long, and had a corresponding body weight of 203,980 pounds, or approximately 101 tons. It's important to note that this was an extremely rare event and the whales that size don't commonly occur."}, 'finish_reason': None}], 'usage': {'prompt_tokens': -1, 'completion_tokens': -1, 'total_tokens': -1}}
165+
```
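For multi-turn conversations, the caller keeps appending messages to `chat_history` between calls. A sketch of that pattern using a stand-in stub instead of a live server (the `StubModel` class and `chat_turn` helper are illustrative only, not part of the XInference API):

```python
# Sketch of the multi-turn pattern behind model.chat(prompt, chat_history, ...):
# each turn appends the user prompt and assistant reply to chat_history.

class StubModel:
    """Stand-in for the XInference model handle (illustration only)."""
    def chat(self, prompt, chat_history, generate_config=None):
        reply = f"echo: {prompt}"
        # a real client returns an OpenAI-style chat.completion dict
        return {"choices": [{"message": {"role": "assistant", "content": reply}}]}

def chat_turn(model, prompt, chat_history):
    resp = model.chat(prompt, chat_history, generate_config={"max_tokens": 1024})
    reply = resp["choices"][0]["message"]["content"]
    chat_history.append({"role": "user", "content": prompt})
    chat_history.append({"role": "assistant", "content": reply})
    return reply

model = StubModel()
history = []
chat_turn(model, "What is the largest animal?", history)
chat_turn(model, "How heavy is it?", history)
print(len(history))  # 4: two user turns and two assistant replies
```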

examples/pytorch/llm/README.md

Lines changed: 3 additions & 2 deletions
@@ -83,8 +83,9 @@ cd examples/pytorch/llm
 pip install deepspeed -U

 # If you want to use qlora training based on auto_gptq (recommended, better performance than bnb):
-# Models using auto_gptq: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8
-pip install auto_gptq optimum -U
+# auto_gptq has a version mapping with CUDA versions, please refer to https://github.com/PanQiWei/AutoGPTQ#quick-installation
+pip install auto_gptq
+pip install optimum -U

 # If you want to use qlora training based on bnb:
 pip install bitsandbytes -U

examples/pytorch/llm/README_CN.md

Lines changed: 3 additions & 1 deletion
@@ -84,7 +84,9 @@ pip install deepspeed -U

 # If you want to use qlora training based on auto_gptq. (Recommended, works better than bnb)
 # Models using auto_gptq: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8
-pip install auto_gptq optimum -U
+# auto_gptq has a version mapping with CUDA versions, please choose a version according to https://github.com/PanQiWei/AutoGPTQ#quick-installation
+pip install auto_gptq
+pip install optimum -U

 # If you want to use qlora training based on bnb.
 pip install bitsandbytes -U
