
Commit df45dbd

Merge branch 'main' into release/1.7

2 parents b23a040 + db24a6f

19 files changed: +375 −55 lines changed

.dev_scripts/dockerci.sh

Lines changed: 7 additions & 1 deletion
```diff
@@ -4,11 +4,15 @@ CODE_DIR=$PWD
 CODE_DIR_IN_CONTAINER=/swift
 echo "$USER"
 gpus='0,1 2,3 4,5 6,7'
-cpu_sets='45-58 31-44 16-30 0-15'
+cpu_sets='0-15 16-31 32-47 48-63'
 cpu_sets_arr=($cpu_sets)
 is_get_file_lock=false
 CI_COMMAND=${CI_COMMAND:-bash .dev_scripts/ci_container_test.sh python tests/run.py --parallel 2 --run_config tests/run_config.yaml}
 echo "ci command: $CI_COMMAND"
+PR_CHANGED_FILES="${PR_CHANGED_FILES:-}"
+echo "PR modified files: $PR_CHANGED_FILES"
+PR_CHANGED_FILES=${PR_CHANGED_FILES//[ ]/#}
+echo "PR_CHANGED_FILES: $PR_CHANGED_FILES"
 idx=0
 for gpu in $gpus
 do
@@ -43,6 +47,7 @@ do
     -e TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN \
     -e MODEL_TAG_URL=$MODEL_TAG_URL \
     -e MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN \
+    -e PR_CHANGED_FILES=$PR_CHANGED_FILES \
     --workdir=$CODE_DIR_IN_CONTAINER \
     ${IMAGE_NAME}:${IMAGE_VERSION} \
     $CI_COMMAND
@@ -66,6 +71,7 @@ do
     -e TEST_UPLOAD_MS_TOKEN=$TEST_UPLOAD_MS_TOKEN \
     -e MODEL_TAG_URL=$MODEL_TAG_URL \
     -e MODELSCOPE_API_TOKEN=$MODELSCOPE_API_TOKEN \
+    -e PR_CHANGED_FILES=$PR_CHANGED_FILES \
     --workdir=$CODE_DIR_IN_CONTAINER \
     ${IMAGE_NAME}:${IMAGE_VERSION} \
     $CI_COMMAND
```
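The `${PR_CHANGED_FILES//[ ]/#}` expansion in this script encodes spaces as `#` so the whole file list survives as a single `docker run -e` value. A minimal sketch of how a consumer inside the container could reverse the encoding (the helper name is hypothetical; the actual parsing lives in `ci_container_test.sh` / `tests/run.py`):

```python
import os


def decode_pr_changed_files(raw: str) -> list:
    """Split a '#'-joined PR file list back into individual paths.

    Empty entries are dropped so an unset/empty env var yields [].
    """
    return [path for path in raw.split('#') if path]


# Simulate what the CI container would see after the host-side encoding.
os.environ['PR_CHANGED_FILES'] = 'swift/llm/infer.py#requirements/framework.txt'
print(decode_pr_changed_files(os.environ['PR_CHANGED_FILES']))
```

Using `#` as the separator works here because repository paths in this project do not contain `#`; any character guaranteed absent from the paths would do.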

docs/source/LLM/命令行参数.md (Command-line arguments)

Lines changed: 4 additions & 2 deletions
```diff
@@ -50,6 +50,8 @@
 - `--lora_bias_trainable`: Default is `'none'`. Options: 'none', 'all'. If you want all biases to be trainable, set this to `'all'`.
 - `--lora_modules_to_save`: Default is `[]`. If you want to train embedding, lm_head, or layer_norm, you can set this argument, e.g. `--lora_modules_to_save wte ln_1 ln_2 ln_f lm_head`. This argument applies to the training of any adapter.
 - `--lora_dtype`: Default is `'fp32'`; specifies the dtype of the lora modules. If `'AUTO'`, it follows the dtype of the original module. Options: 'fp16', 'bf16', 'fp32', 'AUTO'.
+- `--use_dora`: Default is `False`; whether to use `DoRA`.
+- `--use_rslora`: Default is `False`; whether to use `RS-LoRA`.
 - `--neftune_noise_alpha`: Noise coefficient added by `NEFTune`, which can improve model performance in instruction fine-tuning. Default is `None`. It is usually set to 5, 10, or 15. See the [related paper](https://arxiv.org/abs/2310.05914).
 - `--gradient_checkpointing`: Whether to enable gradient checkpointing. Default is `True`. This saves GPU memory at a slight cost in training speed, and its effect is most significant when max_length and batch_size are large.
 - `--deepspeed`: Path to a deepspeed configuration file, or the configuration passed in directly as JSON. Default is `None`, i.e. deepspeed is disabled. Deepspeed can save GPU memory. We provide default [ZeRO-2](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero2.json) and [ZeRO-3](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero3.json) configuration files: specify 'default-zero2' to use the default ZeRO-2 config file, or 'default-zero3' for the default ZeRO-3 config file.
@@ -105,7 +107,7 @@

 - `--lora_lr_ratio`: Default is `None`, recommended value `10~16`; specify this argument when using lora to enable lora+.

-### LLaMA PRO fine-tuning arguments
+### LLaMA-PRO fine-tuning arguments

 - `--llamapro_num_new_blocks`: Default is `4`; the total number of new layers inserted.
 - `--llamapro_num_groups`: Default is `None`; the number of groups the new_blocks are divided into for insertion. If `None`, it equals `llamapro_num_new_blocks`, i.e. each new layer is inserted into the original model individually.
@@ -181,14 +183,14 @@ The dpo arguments inherit the sft arguments, plus the following:
 - `--ignore_args_error`: Default is `False`; a detailed description can be found under `sft.sh command-line arguments`.
 - `--stream`: Whether to use streaming output. Default is `True`. This argument only takes effect when evaluating with a dataset and verbose is True.
 - `--merge_lora`: Whether to merge the lora weights into the base model and save the full weights. Default is `False`. The weights are saved in a directory at the same level as `ckpt_dir`, e.g. under `'/path/to/your/vx-xxx/checkpoint-xxx-merged'`.
+- `--merge_device_map`: The device_map used during merge-lora. Default is `None`. To reduce GPU memory usage, `auto` is used when merge-lora is the only step; in other cases `cpu` is used by default.
 - `--save_safetensors`: Whether to save `safetensors` files or `bin` files. Default is `True`.
 - `--overwrite_generation_config`: Whether to save the generation_config used for evaluation as a `generation_config.json` file. Default is `None`: it is set to `True` if `ckpt_dir` is specified, otherwise `False`. The generation_config file saved during training will be overwritten.
 - `--verbose`: If set to False, inference uses the tqdm style. If set to True, the query, response, and label of the inference are printed. Default is `None`, which selects automatically: False when `len(val_dataset) >= 100`, True otherwise. This argument only takes effect when evaluating with a dataset.
 - `--gpu_memory_utilization`: Argument for initializing the vllm engine `EngineArgs`. Default is `0.9`. This argument only takes effect when vllm is used. For VLLM inference acceleration and deployment, see [VLLM inference acceleration and deployment](VLLM推理加速与部署.md).
 - `--tensor_parallel_size`: Argument for initializing the vllm engine `EngineArgs`. Default is `1`. This argument only takes effect when vllm is used.
 - `--max_model_len`: Overrides the model's max_model_len. Default is `None`. This argument only takes effect when vllm is used.

-
 ## export arguments

 The export arguments inherit the infer arguments, plus the following:
```

docs/source/LLM/支持的模型和数据集.md (Supported models and datasets)

Lines changed: 8 additions & 0 deletions
```diff
@@ -42,6 +42,14 @@
 |qwen1half-7b-chat|[qwen/Qwen1.5-7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|
 |qwen1half-14b-chat|[qwen/Qwen1.5-14B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|
 |qwen1half-72b-chat|[qwen/Qwen1.5-72B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|
+|qwen1half-0_5b-chat-awq|[qwen/Qwen1.5-0.5B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37, autoawq|
+|qwen1half-1_8b-chat-awq|[qwen/Qwen1.5-1.8B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37, autoawq|
+|qwen1half-4b-chat-awq|[qwen/Qwen1.5-4B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37, autoawq|
+|qwen1half-7b-chat-awq|[qwen/Qwen1.5-7B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37, autoawq|
+|qwen1half-14b-chat-awq|[qwen/Qwen1.5-14B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37, autoawq|
+|qwen1half-72b-chat-awq|[qwen/Qwen1.5-72B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37, autoawq|
+|llama2-7b-aqlm-2bit-1x16|[AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✘|transformers>=4.38, aqlm, torch>=2.2.0|
+|mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✘|transformers>=4.38, aqlm, torch>=2.2.0|
 |qwen1half-0_5b-chat-int4|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|
 |qwen1half-1_8b-chat-int4|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|
 |qwen1half-4b-chat-int4|[qwen/Qwen1.5-4B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|
```

requirements/framework.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@ nltk
 numpy
 optimum
 pandas
-peft>=0.8.0,<0.9.0
+peft>=0.9.0,<0.10.0
 requests
 rouge
 safetensors
```

swift/llm/app_ui.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -111,7 +111,7 @@ def llm_app_ui(args: AppUIArguments) -> None:
     logger.info(f'args: {args}')
     args.eval_human = True
     if args.merge_lora:
-        merge_lora(args, device_map='cpu')
+        merge_lora(args, device_map=args.merge_device_map)
     if args.template_type.endswith('generation'):
         gradio_generation_demo(args)
     else:
```

swift/llm/deploy.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -475,7 +475,7 @@ def llm_deploy(args: DeployArguments) -> None:
     global llm_engine, model, template, _args
     _args = args
     if args.merge_lora:
-        merge_lora(args, device_map='cpu')
+        merge_lora(args, device_map=args.merge_device_map)
     if args.infer_backend == 'vllm':
         from .utils import prepare_vllm_engine_template
         llm_engine, template = prepare_vllm_engine_template(
```

swift/llm/export.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -159,7 +159,7 @@ def llm_export(args: ExportArguments) -> None:
     global _args, template
     logger.info(f'args: {args}')
     if args.merge_lora:
-        merge_lora(args, device_map='cpu')
+        merge_lora(args, device_map=args.merge_device_map)
     if args.quant_bits > 0:
         _args = args
         assert args.quantization_bit == 0
```

swift/llm/infer.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -223,7 +223,7 @@ def read_media_file(
 def llm_infer(args: InferArguments) -> None:
     logger.info(f'args: {args}')
     if args.merge_lora:
-        merge_lora(args, device_map='cpu')
+        merge_lora(args, device_map=args.merge_device_map)
     if args.infer_backend == 'vllm':
         from .utils import prepare_vllm_engine_template, inference_stream_vllm, inference_vllm
         llm_engine, template = prepare_vllm_engine_template(args)
```

swift/llm/tuner.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -56,6 +56,7 @@ def prepare_model(model, args: SftArguments):
         'rank_pattern': args.lora_rank_pattern,
         'alpha_pattern': args.lora_alpha_pattern,
         'loftq_config': args.lora_loftq_config,
+        'use_dora': args.use_dora,
     }
     if args.sft_type == 'lora':
         if args.tuner_backend == 'swift':
```

swift/llm/utils/argument.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -113,6 +113,8 @@ class SftArguments:
     lora_rank_pattern: Dict = field(default_factory=dict)
     lora_alpha_pattern: Dict = field(default_factory=dict)
     lora_loftq_config: Dict = field(default_factory=dict)
+    use_dora: bool = False
+
     # adalora
     adalora_target_r: int = 8
     adalora_init_r: int = 12
@@ -565,6 +567,7 @@ class InferArguments:
     ignore_args_error: bool = False  # True: notebook compatibility
     stream: bool = True
     merge_lora: bool = False
+    merge_device_map: Optional[str] = None
     save_safetensors: bool = True
     overwrite_generation_config: Optional[bool] = None
     verbose: Optional[bool] = None
@@ -659,6 +662,8 @@ def __post_init__(self) -> None:
             self.stream = False
             logger.info('Setting self.stream: False')
         self.infer_media_type = template_info.get('infer_media_type', 'none')
+        if self.merge_device_map is None:
+            self.merge_device_map = 'cpu'

     @staticmethod
     def check_ckpt_dir_correct(ckpt_dir) -> bool:
@@ -723,6 +728,8 @@ class ExportArguments(InferArguments):
     commit_message: str = 'update files'

     def __post_init__(self):
+        if self.merge_device_map is None:
+            self.merge_device_map = 'cpu' if self.quant_bits != 0 else 'auto'
         super().__post_init__()
         if len(self.dataset) == 0:
             self.dataset = ['ms-bench-mini']
```
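Taken together, the two `__post_init__` hunks form a small defaulting cascade for `merge_device_map`: `ExportArguments` picks `'auto'` for a merge-only export and `'cpu'` when quantization follows, and `InferArguments` falls back to `'cpu'` for everything else. A simplified, standalone sketch of that cascade (the class names here are invented for illustration; the real logic sits in `InferArguments` / `ExportArguments`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferArgsSketch:
    merge_device_map: Optional[str] = None

    def resolve(self) -> str:
        # InferArguments.__post_init__: plain inference merges lora on CPU
        # to keep GPU memory free for the inference engine.
        if self.merge_device_map is None:
            self.merge_device_map = 'cpu'
        return self.merge_device_map


@dataclass
class ExportArgsSketch(InferArgsSketch):
    quant_bits: int = 0

    def resolve(self) -> str:
        # ExportArguments.__post_init__ runs before the parent's logic:
        # a merge-only export can use 'auto' to place weights across devices;
        # when quantization follows, stay on 'cpu'.
        if self.merge_device_map is None:
            self.merge_device_map = 'cpu' if self.quant_bits != 0 else 'auto'
        return super().resolve()


print(ExportArgsSketch().resolve())              # merge-only export -> 'auto'
print(ExportArgsSketch(quant_bits=4).resolve())  # quantization follows -> 'cpu'
```

An explicit `--merge_device_map` always wins, since both branches only fill in the value when it is still `None`.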
