
Commit 8a28429

Support xverse-13b-256k (#332)
1 parent 59ea1a7 commit 8a28429

File tree

11 files changed (+88 / -15 lines)

README.md

Lines changed: 2 additions & 1 deletion
@@ -62,6 +62,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用

 ## 🎉 News
+- 2023.1.20: Support [xverse-13b-256k](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_13b_256k), xverse-65b-v2, xverse-65b-chat.
 - 🔥2023.1.17: Support **internlm2** series: internlm2-7b-base, internlm2-7b, [internlm2-7b-sft-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm2_7b_sft_chat), internlm2-7b-chat, internlm2-20b-base, internlm2-20b, internlm2-20b-sft-chat, internlm2-20b-chat.
 - 2023.1.15: Support yuan series: yuan2-2b-instruct, [yuan2-2b-janus-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yuan2_2b_janus_instruct), yuan2-51b-instruct, yuan2-102b-instruct.
 - 🔥2023.1.12: Support **deepseek-moe** series: deepseek-moe-16b, [deepseek-moe-16b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat).
@@ -163,7 +164,7 @@ Here is a simple introduction of web-ui:
 - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary), [mistral-7b-instruct-v2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary), [mixtral-moe-7b](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary), [mixtral-moe-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)
 - baichuan series: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary)
 - yuan series: [yuan2-2b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary), [yuan2-2b-janus-instruct](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary), [yuan2-51b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary), [yuan2-102b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)
-- xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
+- xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary), [xverse-65b-v2](https://modelscope.cn/models/xverse/XVERSE-65B-2/summary), [xverse-65b-chat](https://modelscope.cn/models/xverse/XVERSE-65B-Chat/summary), [xverse-13b-256k](https://modelscope.cn/models/xverse/XVERSE-13B-256K/summary)
 - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
 - zephyr series: [zephyr-7b-beta-chat](https://modelscope.cn/models/modelscope/zephyr-7b-beta/summary)
 - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)

README_CN.md

Lines changed: 2 additions & 1 deletion
@@ -60,6 +60,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is a scalable
 Users can check the [official SWIFT documentation](docs/source/GetStarted/快速使用.md) for details.

 ## 🎉 News
+- 2023.1.20: Support [xverse-13b-256k](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_13b_256k), xverse-65b-v2, xverse-65b-chat.
 - 🔥2023.1.17: Support internlm2 series: internlm2-7b-base, internlm2-7b, [internlm2-7b-sft-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm2_7b_sft_chat), internlm2-7b-chat, internlm2-20b-base, internlm2-20b, internlm2-20b-sft-chat, internlm2-20b-chat.
 - 2023.1.15: Support yuan series: yuan2-2b-instruct, [yuan2-2b-janus-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yuan2_2b_janus_instruct), yuan2-51b-instruct, yuan2-102b-instruct.
 - 🔥2023.1.12: Support **deepseek-moe** series: deepseek-moe-16b, [deepseek-moe-16b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/deepseek_moe_16b_chat).
@@ -163,7 +164,7 @@ swift web-ui
 - mistral series: [mistral-7b](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary), [mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary), [mistral-7b-instruct-v2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary), [mixtral-moe-7b](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary), [mixtral-moe-7b-instruct](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)
 - baichuan series: [baichuan-7b](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary), [baichuan-13b](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary), [baichuan-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary), [baichuan2-7b](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary), [baichuan2-7b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary), [baichuan2-13b](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary), [baichuan2-13b-chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary), [baichuan2-7b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary), [baichuan2-13b-chat-int4](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary)
 - yuan series: [yuan2-2b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary), [yuan2-2b-janus-instruct](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary), [yuan2-51b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary), [yuan2-102b-instruct](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)
-- xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary)
+- xverse series: [xverse-7b](https://modelscope.cn/models/xverse/XVERSE-7B/summary), [xverse-7b-chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary), [xverse-13b](https://modelscope.cn/models/xverse/XVERSE-13B/summary), [xverse-13b-chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary), [xverse-65b](https://modelscope.cn/models/xverse/XVERSE-65B/summary), [xverse-65b-v2](https://modelscope.cn/models/xverse/XVERSE-65B-2/summary), [xverse-65b-chat](https://modelscope.cn/models/xverse/XVERSE-65B-Chat/summary), [xverse-13b-256k](https://modelscope.cn/models/xverse/XVERSE-13B-256K/summary)
 - bluelm series: [bluelm-7b](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary), [bluelm-7b-chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary), [bluelm-7b-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary), [bluelm-7b-chat-32k](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)
 - zephyr series: [zephyr-7b-beta-chat](https://modelscope.cn/models/modelscope/zephyr-7b-beta/summary)
 - ziya series: [ziya2-13b](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary), [ziya2-13b-chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)

docs/source/LLM/支持的模型和数据集.md

Lines changed: 11 additions & 8 deletions
@@ -58,14 +58,14 @@
 |internlm-7b-chat-8k|[Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||
 |internlm-20b|[Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary)|q_proj, k_proj, v_proj|default-generation-bos|✘|✔||
 |internlm-20b-chat|[Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||
-|internlm2-7b-base|[Shanghai_AI_Laboratory/internlm2-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-7b/summary)|wqkv|default-generation-bos|✔|✔||
-|internlm2-7b|[Shanghai_AI_Laboratory/internlm2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-7b/summary)|wqkv|default-generation-bos|✔|✔||
-|internlm2-7b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b-sft/summary)|wqkv|internlm2|✔|✔||
-|internlm2-7b-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b/summary)|wqkv|internlm2|✔|✔||
-|internlm2-20b-base|[Shanghai_AI_Laboratory/internlm2-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-20b/summary)|wqkv|default-generation-bos|✔|✔||
-|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation-bos|✔|✔||
-|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|✔|✔||
-|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|✔|✔||
+|internlm2-7b-base|[Shanghai_AI_Laboratory/internlm2-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-7b/summary)|wqkv|default-generation-bos|✔|✘||
+|internlm2-7b|[Shanghai_AI_Laboratory/internlm2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-7b/summary)|wqkv|default-generation-bos|✔|✘||
+|internlm2-7b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b-sft/summary)|wqkv|internlm2|✔|✘||
+|internlm2-7b-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b/summary)|wqkv|internlm2|✔|✘||
+|internlm2-20b-base|[Shanghai_AI_Laboratory/internlm2-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-20b/summary)|wqkv|default-generation-bos|✔|✘||
+|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation-bos|✔|✘||
+|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|✔|✘||
+|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|✔|✘||
 |deepseek-7b|[deepseek-ai/deepseek-llm-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✔||
 |deepseek-7b-chat|[deepseek-ai/deepseek-llm-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek|✔|✔||
 |deepseek-moe-16b|[deepseek-ai/deepseek-moe-16b-base](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✘||
@@ -101,6 +101,9 @@
 |xverse-13b|[xverse/XVERSE-13B](https://modelscope.cn/models/xverse/XVERSE-13B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||
 |xverse-13b-chat|[xverse/XVERSE-13B-Chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary)|q_proj, k_proj, v_proj|xverse|✘|✘||
 |xverse-65b|[xverse/XVERSE-65B](https://modelscope.cn/models/xverse/XVERSE-65B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||
+|xverse-65b-v2|[xverse/XVERSE-65B-2](https://modelscope.cn/models/xverse/XVERSE-65B-2/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||
+|xverse-65b-chat|[xverse/XVERSE-65B-Chat](https://modelscope.cn/models/xverse/XVERSE-65B-Chat/summary)|q_proj, k_proj, v_proj|xverse|✘|✘||
+|xverse-13b-256k|[xverse/XVERSE-13B-256K](https://modelscope.cn/models/xverse/XVERSE-13B-256K/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||
 |bluelm-7b|[vivo-ai/BlueLM-7B-Base](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary)|q_proj, k_proj, v_proj|default-generation-bos|✘|✘||
 |bluelm-7b-32k|[vivo-ai/BlueLM-7B-Base-32K](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary)|q_proj, k_proj, v_proj|default-generation-bos|✘|✘||
 |bluelm-7b-chat|[vivo-ai/BlueLM-7B-Chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary)|q_proj, k_proj, v_proj|bluelm|✘|✘||

docs/source/LLM/自我认知微调最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ result = infer_main(infer_args)
 My name is Xiao Huang, developed by ModelScope. I am an artificial intelligence language model capable of answering questions, providing information, engaging in conversation, and assisting you with problems. If you have any questions or need help, feel free to let me know.
 --------------------------------------------------
 <<< Who developed you?
-I am an artificial intelligence language model developed by ModelScope, known as Xiao Huang. ModelScope is an organization focused on artificial intelligence research and development, dedicated to advancing the development and application of AI technology.
+I am an artificial intelligence language model developed by ModelScope, known as Xiao Huang.
 --------------------------------------------------
 <<< Where is the capital of Zhejiang?
 The capital of Zhejiang Province is Hangzhou.

examples/pytorch/llm/scripts/xverse_13b_256k (new infer script)

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Experimental environment: A100
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/xverse-13b-256k/vx_xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 2048 \
+    --max_new_tokens 2048 \
+    --temperature 0.7 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora_and_save false \

examples/pytorch/llm/scripts/xverse_13b_256k (new sft script)

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Experimental environment: A100
+# 40GB GPU memory
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_type xverse-13b-256k \
+    --sft_type lora \
+    --tuner_backend swift \
+    --template_type default-generation \
+    --dtype AUTO \
+    --output_dir output \
+    --dataset advertise-gen-zh \
+    --train_dataset_sample 20000 \
+    --num_train_epochs 1 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules ALL \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.01 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
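
For readers who prefer swift's Python API over the shell scripts, the same run can be sketched roughly as below. This is only a sketch: it assumes swift.llm exposes SftArguments and sft_main the way the repository's other LLM examples do, and the field names simply mirror the CLI flags above rather than anything verified in this commit.

# Rough Python equivalent of the sft script above (illustrative sketch only).
# Assumes `from swift.llm import SftArguments, sft_main` works as in swift's
# other LLM examples; field names mirror the CLI flags of the script.
from swift.llm import SftArguments, sft_main

sft_args = SftArguments(
    model_type='xverse-13b-256k',   # model type registered by this commit
    sft_type='lora',
    template_type='default-generation',
    dataset=['advertise-gen-zh'],   # same dataset as the shell script
    train_dataset_sample=20000,
    max_length=2048,
    lora_target_modules=['ALL'],
    output_dir='output')
result = sft_main(sft_args)
print(result['best_model_checkpoint'])  # usable as --ckpt_dir for `swift infer`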

swift/llm/sft.py

Lines changed: 4 additions & 4 deletions
@@ -9,8 +9,7 @@
 import torch
 from modelscope import BitsAndBytesConfig, GenerationConfig

-from swift.trainers import (IntervalStrategy, Seq2SeqTrainer,
-                            Seq2SeqTrainingArguments)
+from swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments
 from swift.utils import (check_json_format, compute_acc_metrics,
                          compute_nlg_metrics, get_dist_setting, get_logger,
                          get_main, get_model_info, is_ddp_plus_mp, is_dist,
@@ -145,10 +144,11 @@ def llm_sft(args: SftArguments) -> Dict[str, Union[str, Any]]:
         tokenizer=tokenizer,
         padding_to=args.max_length if args.sft_type == 'longlora' else None)
     # Setting training_args
-    evaluation_strategy = IntervalStrategy.STEPS
+    evaluation_strategy = args.evaluation_strategy
     load_best_model_at_end = True
     if val_dataset is None:
-        evaluation_strategy = IntervalStrategy.NO
+        evaluation_strategy = 'no'
+    if evaluation_strategy == 'no':
         load_best_model_at_end = False
     additional_saved_files = []
     if args.sft_type == 'full':
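
The change above is small but worth spelling out: evaluation_strategy now comes from SftArguments as a plain string ('steps' or 'no') instead of the IntervalStrategy enum, it is forced to 'no' when there is no validation set, and 'no' in turn disables load_best_model_at_end because there is no eval metric to pick a best checkpoint from. A minimal standalone sketch of that control flow (illustrative only, not the library code):

from typing import Optional, Tuple


def resolve_eval_settings(evaluation_strategy: str,
                          val_dataset: Optional[object]) -> Tuple[str, bool]:
    # Mirrors the logic added to llm_sft in this commit.
    load_best_model_at_end = True
    if val_dataset is None:
        evaluation_strategy = 'no'
    if evaluation_strategy == 'no':
        load_best_model_at_end = False
    return evaluation_strategy, load_best_model_at_end


# --evaluation_strategy no now skips evaluation even when a val set exists:
assert resolve_eval_settings('no', object()) == ('no', False)
assert resolve_eval_settings('steps', None) == ('no', False)
assert resolve_eval_settings('steps', object()) == ('steps', True)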

swift/llm/utils/argument.py

Lines changed: 2 additions & 0 deletions
@@ -154,6 +154,7 @@ class SftArguments:
     report_to: List[str] = field(default_factory=lambda: ['all'])
     acc_strategy: Literal['token', 'sentence'] = 'token'
     save_on_each_node: bool = True
+    evaluation_strategy: Literal['steps', 'no'] = 'steps'
     save_strategy: Literal['steps', 'no'] = 'steps'
     save_safetensors: bool = True
@@ -672,6 +673,7 @@ def set_model_type(args: Union[SftArguments, InferArguments]) -> None:
         args.model_revision = model_info['revision']
     else:
         model_info['revision'] = args.model_revision
+        logger.info(f"Setting model_info['revision']: {args.model_revision}")
     args.model_id_or_path = model_info['model_id_or_path']
     requires = model_info['requires']
     for require in requires:

swift/llm/utils/model.py

Lines changed: 13 additions & 0 deletions
@@ -133,6 +133,9 @@ class ModelType:
     xverse_13b = 'xverse-13b'
     xverse_13b_chat = 'xverse-13b-chat'
     xverse_65b = 'xverse-65b'
+    xverse_65b_v2 = 'xverse-65b-v2'
+    xverse_65b_chat = 'xverse-65b-chat'
+    xverse_13b_256k = 'xverse-13b-256k'
     # vivo
     bluelm_7b = 'bluelm-7b'
     bluelm_7b_32k = 'bluelm-7b-32k'
@@ -299,6 +302,16 @@ def _register_model(
                 TemplateType.default_generation)
 @register_model(ModelType.xverse_65b, 'xverse/XVERSE-65B', LoRATM.llama2,
                 TemplateType.default_generation)
+@register_model(ModelType.xverse_65b_v2, 'xverse/XVERSE-65B-2', LoRATM.llama2,
+                TemplateType.default_generation)
+@register_model(ModelType.xverse_65b_chat, 'xverse/XVERSE-65B-Chat',
+                LoRATM.llama2, TemplateType.xverse)
+@register_model(
+    ModelType.xverse_13b_256k,
+    'xverse/XVERSE-13B-256K',
+    LoRATM.llama2,
+    TemplateType.default_generation,
+    revision='v1.0.0')
 @register_model(ModelType.xverse_7b_chat, 'xverse/XVERSE-7B-Chat',
                 LoRATM.llama2, TemplateType.xverse)
 @register_model(ModelType.xverse_7b, 'xverse/XVERSE-7B', LoRATM.llama2,
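
The decorator stack above is what ties a ModelType constant to a ModelScope model id, the LoRA target modules, a chat template and, for xverse-13b-256k, a pinned revision ('v1.0.0'). The sketch below illustrates that registration pattern with a toy registry; it is not swift's actual register_model implementation, and the names register_model_sketch and MODEL_MAPPING are invented for illustration.

from typing import Any, Callable, Dict, List, Optional

# Toy registry standing in for swift's internal model mapping (illustrative).
MODEL_MAPPING: Dict[str, Dict[str, Any]] = {}


def register_model_sketch(model_type: str,
                          model_id_or_path: str,
                          lora_target_modules: List[str],
                          template: str,
                          revision: Optional[str] = None) -> Callable:

    def _wrap(get_function: Callable) -> Callable:
        # Record everything needed to later download and build the model.
        MODEL_MAPPING[model_type] = {
            'model_id_or_path': model_id_or_path,
            'lora_target_modules': lora_target_modules,
            'template': template,
            'revision': revision,
            'get_function': get_function,
        }
        return get_function

    return _wrap


@register_model_sketch('xverse-13b-256k', 'xverse/XVERSE-13B-256K',
                       ['q_proj', 'k_proj', 'v_proj'], 'default-generation',
                       revision='v1.0.0')
def get_xverse_13b_256k(**kwargs):
    ...  # would load the model and tokenizer from ModelScope


print(MODEL_MAPPING['xverse-13b-256k']['revision'])  # v1.0.0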

swift/trainers/callback.py

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,12 @@

 class ProgressCallbackNew(ProgressCallback):

+    def on_train_begin(self, args, state, control, **kwargs):
+        if state.is_local_process_zero:
+            self.training_bar = tqdm(
+                desc='Train', total=state.max_steps, dynamic_ncols=True)
+        self.current_step = 0
+
     def on_prediction_step(self,
                            args,
                            state: TrainerState,
@@ -24,6 +30,7 @@ def on_prediction_step(self,
             if self.training_bar is not None:
                 self.training_bar.fp.write('\n')
             self.prediction_bar = tqdm(
+                desc='Val',
                 total=len(eval_dataloader),
                 leave=True,
                 dynamic_ncols=True,
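
Taken together, the callback change just labels the two tqdm bars ('Train' and 'Val') so interleaved training and evaluation output is easier to read. Below is a self-contained sketch of the training-bar half of that idea, written against the public transformers ProgressCallback API; the real class is swift's ProgressCallbackNew shown above, and the trainer wiring in the trailing comment is a hypothetical usage note.

from tqdm.auto import tqdm
from transformers.trainer_callback import ProgressCallback


class LabelledProgressCallback(ProgressCallback):
    """Stock ProgressCallback behaviour, but with a named training bar."""

    def on_train_begin(self, args, state, control, **kwargs):
        # Mirrors the override added in this commit: same bar, plus desc='Train'.
        if state.is_local_process_zero:
            self.training_bar = tqdm(
                desc='Train', total=state.max_steps, dynamic_ncols=True)
        self.current_step = 0


# Hypothetical usage with a transformers Trainer instance:
# trainer.remove_callback(ProgressCallback)
# trainer.add_callback(LabelledProgressCallback())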
