
Commit 44b0386

[train] support target_parameters (#5340)
Parent: e85fd56


11 files changed: +16 / -7 lines


README.md

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ Running Environment:
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |

README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ pip install -e .
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |

docs/source/GetStarted/SWIFT安装.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |

docs/source/Instruction/Megatron-SWIFT训练.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | transformers | >=4.33 | 4.51.3 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | LoRA |
+| peft | >=0.11,<0.18 | | LoRA |
 | trl | >=0.15,<0.21 | | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | |

docs/source/Instruction/命令行参数.md

Lines changed: 1 addition & 0 deletions
@@ -212,6 +212,7 @@
 - 🔥target_modules: Specifies the LoRA modules; defaults to `['all-linear']`. You can also pass module-name suffixes, e.g. `--target_modules q_proj k_proj v_proj`. This argument is not limited to LoRA and can be used with other tuners.
   - Note: 'all-linear' behaves differently for LLMs and multimodal LLMs. For an LLM, it automatically finds all linear layers except lm_head and attaches the tuner; for a multimodal LLM, it attaches the tuner only to the LLM component by default, and this behavior can be controlled with `freeze_llm`, `freeze_vit` and `freeze_aligner`.
 - 🔥target_regex: Specifies a regular expression for the LoRA modules; defaults to `None`. If a value is passed, the target_modules argument is ignored. This argument is not limited to LoRA and can be used with other tuners.
+- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves like `target_modules`, but parameter names should be passed instead. It requires `peft>=0.17.0`. For example, many Mixture-of-Experts (MoE) layers in Hugging Face Transformers use `nn.Parameter` rather than `nn.Linear`; in that case target_parameters can be used (see the sketch after this diff).
 - init_weights: Method for initializing the weights. For LoRA this can be `true`, `false`, `gaussian`, `pissa`, or `pissa_niter_[number of iters]`; for Bone it can be `true`, `false`, or `bat`. Default is `true`.
 - 🔥modules_to_save: After a tuner has been attached, additionally specifies original model modules to participate in training and be saved. Defaults to `[]`. This argument is not limited to LoRA and can be used with other tuners.
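To make the MoE case concrete, below is a minimal sketch of such a layer. The class and the parameter names `gate_up_proj`/`down_proj` are hypothetical, loosely modeled on stacked-expert MoE blocks; the point is that there is no `nn.Linear` submodule for `target_modules` to match, only `nn.Parameter` tensors that `target_parameters` can name.

```python
import torch
from torch import nn


class ToyMoEExperts(nn.Module):
    """Hypothetical MoE expert block: expert weights live in stacked
    nn.Parameter tensors rather than per-expert nn.Linear layers."""

    def __init__(self, num_experts: int = 4, d_model: int = 8, d_ff: int = 16):
        super().__init__()
        self.gate_up_proj = nn.Parameter(torch.empty(num_experts, d_model, d_ff))
        self.down_proj = nn.Parameter(torch.empty(num_experts, d_ff, d_model))

    def forward(self, x: torch.Tensor, expert_idx: int) -> torch.Tensor:
        h = x @ self.gate_up_proj[expert_idx]                # (batch, d_ff)
        return torch.relu(h) @ self.down_proj[expert_idx]    # (batch, d_model)


# `--target_modules gate_up_proj down_proj` matches nothing here, because no
# submodules with those names exist; `--target_parameters gate_up_proj down_proj`
# matches the nn.Parameter names directly.
```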

docs/source_en/GetStarted/SWIFT-installation.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ More images can be found [here](https://modelscope.cn/docs/intro/environment-set
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions
@@ -216,6 +216,7 @@ Other important parameters:
 - 🔥target_modules: Specifies the LoRA modules. The default is `['all-linear']`, but you can also pass layer-name suffixes, e.g. `--target_modules q_proj k_proj v_proj`. This argument is not restricted to LoRA and can be used with other tuners as well.
   - Note: The behavior of the special value `'all-linear'` differs between plain LLMs and multimodal LLMs. For a standard LLM, it automatically locates every linear layer except `lm_head` and attaches a tuner. For a multimodal LLM, it attaches the tuner only to the LLM component by default. This default can be changed with the `freeze_llm`, `freeze_vit`, and `freeze_aligner` options.
 - 🔥target_regex: Specifies a regular expression for selecting LoRA modules; the default is `None`. If a value is provided, the target_modules parameter is ignored. This parameter is not limited to LoRA and can be used with other tuners.
+- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to target_modules, but parameter names should be passed instead. It requires `peft>=0.17.0`. For example, many Mixture-of-Experts (MoE) layers in Hugging Face Transformers do not use `nn.Linear`; they use `nn.Parameter` instead. In such cases, the `target_parameters` argument can be used to apply LoRA (a sketch follows this diff).
 - init_weights: Specifies the method for initializing weights. LoRA accepts `true`, `false`, `gaussian`, `pissa`, `pissa_niter_[number of iters]`; Bone accepts `true`, `false`, `bat`. The default is `true`.
 - 🔥modules_to_save: After attaching a tuner, explicitly specifies additional original-model modules to participate in training and be saved. The default is `[]`. This parameter is not limited to LoRA and can be used with other tuners.
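Below is a minimal sketch of the same idea using peft directly, assuming `peft>=0.17.0`; the model id and the parameter-name suffixes `experts.gate_up_proj`/`experts.down_proj` are placeholders rather than values taken from this commit. With the swift CLI, the analogous invocation would presumably be `--target_parameters experts.gate_up_proj experts.down_proj`.

```python
# Sketch: apply LoRA to nn.Parameter weights via peft's target_parameters.
# Requires peft>=0.17.0; the model id and parameter names are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("org/some-moe-model")

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    # Stacked expert weights stored as nn.Parameter are selected by parameter-name
    # suffix, much like target_modules selects modules by name suffix.
    target_parameters=["experts.gate_up_proj", "experts.down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```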

docs/source_en/Instruction/Megatron-SWIFT-Training.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ Recommended Operating Environment:
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | transformers | >=4.33 | 4.51.3 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | LoRA |
+| peft | >=0.11,<0.18 | | LoRA |
 | trl | >=0.15,<0.21 | | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | |

swift/llm/argument/tuner_args.py

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ class TunerArguments:
     # tuners
     target_modules: List[str] = field(default_factory=lambda: ['all-linear'])
     target_regex: Optional[str] = None
+    target_parameters: Optional[List[str]] = None
     # e.g. ['wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head']
     modules_to_save: List[str] = field(default_factory=list)
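For illustration, here is a toy sketch of how an `Optional[List[str]]` dataclass field of this shape maps onto a space-separated CLI flag when parsed with transformers' `HfArgumentParser`. The dataclass below is a stand-in, not swift's real `TunerArguments`, and swift's actual argument-parsing path may differ.

```python
# Toy stand-in for the dataclass above; not swift's real TunerArguments.
from dataclasses import dataclass, field
from typing import List, Optional

from transformers import HfArgumentParser


@dataclass
class ToyTunerArguments:
    target_modules: List[str] = field(default_factory=lambda: ['all-linear'])
    target_regex: Optional[str] = None
    target_parameters: Optional[List[str]] = None


parser = HfArgumentParser(ToyTunerArguments)
(args,) = parser.parse_args_into_dataclasses(
    ['--target_parameters', 'experts.gate_up_proj', 'experts.down_proj'])
print(args.target_parameters)  # ['experts.gate_up_proj', 'experts.down_proj']
```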

swift/llm/train/tuner.py

Lines changed: 2 additions & 0 deletions
@@ -173,6 +173,8 @@ def prepare_adapter(args: TrainArguments, model, *, template=None, train_dataset
         task_type = 'SEQ_CLS'
     elif task_type == 'GENERATIVE_RERANKER':
         task_type = 'CAUSAL_LM'
+    if args.target_parameters is not None:
+        lora_kwargs['target_parameters'] = args.target_parameters
     lora_config = LoraConfig(task_type=task_type, lora_dtype=args.lora_dtype, **lora_kwargs)
     if args.init_weights == 'lora-ga':
         try:
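The guard means the new keyword reaches `LoraConfig` only when `target_parameters` is actually set, so existing setups on `peft<0.17` (which does not accept the keyword) keep working unchanged. Below is a standalone sketch of the same forwarding pattern, using peft's plain `LoraConfig` rather than swift's wrapper and a hypothetical helper name.

```python
from typing import List, Optional

from peft import LoraConfig


def build_lora_config(task_type: str,
                      lora_kwargs: dict,
                      target_parameters: Optional[List[str]] = None) -> LoraConfig:
    # Hypothetical helper mirroring prepare_adapter: forward target_parameters
    # only when it is set, so peft<0.17 never sees the unknown keyword unless
    # the feature is explicitly requested.
    if target_parameters is not None:
        lora_kwargs['target_parameters'] = target_parameters
    return LoraConfig(task_type=task_type, **lora_kwargs)


# e.g. build_lora_config('CAUSAL_LM', {'r': 8, 'lora_alpha': 32},
#                        target_parameters=['experts.gate_up_proj'])
```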
