[train] support target_parameters #5340

Status: Open. Wants to merge 1 commit into base `main`.

1 change: 1 addition & 0 deletions docs/source/Instruction/Megatron-SWIFT训练.md
@@ -432,6 +432,7 @@ LoRA training:
- adapter_load: Path to the adapter weights to load, used for resuming LoRA training from a checkpoint. Defaults to None. Resuming LoRA training works the same way as for full-parameter training; please note the meaning of the `--finetune` parameter.
- 🔥target_modules: Suffixes of the modules to apply LoRA to. Defaults to `['all-linear']`.
- 🔥target_regex: Regex expression specifying the LoRA modules. Defaults to `None`. If this value is provided, the target_modules parameter is ignored.
- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to `target_modules`, but the values passed should be parameter names. This feature requires "peft>=0.17.0".
- 🔥modules_to_save: After a tuner has been attached, additionally specifies a set of original model modules to participate in training and storage. Defaults to `[]`.
- 🔥lora_rank: Defaults to `8`.
- 🔥lora_alpha: Defaults to `32`.
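
For reference, a minimal sketch of the difference between the two targeting options, assuming peft>=0.17.0 (the release that adds `target_parameters` to `LoraConfig`); the module and parameter names below are illustrative placeholders, not taken from any specific model:

```python
# Hedged sketch: target_modules matches module names/suffixes, while
# target_parameters matches nn.Parameter names. All names are placeholders.
from peft import LoraConfig

# Adapt nn.Linear modules whose names end with these suffixes.
module_config = LoraConfig(target_modules=['q_proj', 'k_proj', 'v_proj'])

# Adapt parameters by name instead, e.g. fused expert weights stored as plain
# nn.Parameter tensors rather than separate Linear modules.
param_config = LoraConfig(target_parameters=['mlp.experts.gate_up_proj', 'mlp.experts.down_proj'])

print(module_config.target_modules, param_config.target_parameters)
```
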
1 change: 1 addition & 0 deletions docs/source/Instruction/命令行参数.md
@@ -212,6 +212,7 @@
- 🔥target_modules: Specifies the LoRA modules. Defaults to `['all-linear']`. You can also set module suffixes, e.g. `--target_modules q_proj k_proj v_proj`. This parameter is not limited to LoRA and can be used with other tuners.
- Note: 'all-linear' behaves differently for LLMs and multimodal LLMs. For an LLM, it automatically finds every linear layer except lm_head and attaches a tuner; for a multimodal LLM, it attaches the tuner only to the LLM component by default. This behavior can be controlled with `freeze_llm`, `freeze_vit`, and `freeze_aligner`.
- 🔥target_regex: Regex expression specifying the LoRA modules. Defaults to `None`. If this value is provided, the target_modules parameter is ignored. This parameter is not limited to LoRA and can be used with other tuners.
- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to `target_modules`, but the values passed should be parameter names. This feature requires "peft>=0.17.0".
- init_weights: Method for initializing the weights. LoRA accepts `true`, `false`, `gaussian`, `pissa`, `pissa_niter_[number of iters]`; Bone accepts `true`, `false`, `bat`. Defaults to `true`.
- 🔥modules_to_save: After a tuner has been attached, additionally specifies a set of original model modules to participate in training and storage. Defaults to `[]`. This parameter is not limited to LoRA and can be used with other tuners.

1 change: 1 addition & 0 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -216,6 +216,7 @@ Other important parameters:
- 🔥 target_modules: Specifies the LoRA modules. The default is `['all-linear']`, but you can also pass layer-name suffixes, e.g. `--target_modules q_proj k_proj v_proj`. This argument is not restricted to LoRA and can be used with other tuners as well.
- Note: The behavior of the special value `'all-linear'` differs between plain LLMs and multimodal LLMs. For a standard LLM, it automatically locates every linear layer except `lm_head` and attaches a tuner. For a multimodal LLM, it attaches the tuner only to the LLM component by default. This default can be changed with the `freeze_llm`, `freeze_vit`, and `freeze_aligner` options.
- 🔥target_regex: Specifies a regex expression for LoRA modules, with a default of `None`. If this value is provided, the target_modules parameter becomes ineffective. This parameter is not limited to LoRA and can be used for other tuners.
- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to target_modules, but you should pass parameter names instead. This feature requires "peft>=0.17.0".
- init_weights: Specifies the method for initializing weights. LoRA can specify `true`, `false`, `gaussian`, `pissa`, `pissa_niter_[number of iters]`. Bone can specify `true`, `false`, `bat`. The default is `true`.
- 🔥modules_to_save: After attaching a tuner, explicitly specifies additional original model modules to participate in training and storage. The default is `[]`. This parameter is not limited to LoRA and can be used for other tuners.

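
Since `--target_parameters` expects parameter names rather than module suffixes, one way to find candidate names is to list them from the model. A small sketch assuming the `transformers` package; the model ID is only an example:

```python
# Hedged sketch: print parameter names that could be passed via
# --target_parameters. Substitute the model you actually fine-tune.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2-0.5B-Instruct')

for name, param in model.named_parameters():
    # target_modules matches module suffixes; target_parameters matches these
    # fully qualified parameter names (e.g. '...mlp.down_proj.weight').
    print(name, tuple(param.shape))
```
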
1 change: 1 addition & 0 deletions docs/source_en/Instruction/Megatron-SWIFT-Training.md
@@ -453,6 +453,7 @@ LoRA Training:
- adapter_load: The path to the adapter weights for loading, used for resuming LoRA training from a checkpoint. The default is None. The method for resuming LoRA training from a checkpoint is the same as for full-parameter training. Please pay attention to the meaning of the `--finetune` parameter.
- 🔥target_modules: Suffixes of modules to apply LoRA to. Default is `['all-linear']`.
- 🔥target_regex: Regex expression to specify LoRA modules. Default is `None`. If this value is provided, the `target_modules` parameter will be ignored.
- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to target_modules, but you should pass parameter names instead. This feature requires "peft>=0.17.0".
- 🔥modules_to_save: After attaching a tuner, explicitly specifies additional original model modules to participate in training and storage. The default is `[]`.
- 🔥lora_rank: Default is `8`.
- 🔥lora_alpha: Default is `32`.
2 changes: 1 addition & 1 deletion requirements/framework.txt
@@ -19,7 +19,7 @@ numpy
openai
oss2
pandas
-peft>=0.11,<0.17
+peft>=0.11,<0.18
pillow
PyYAML>=5.4
requests
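
The pin is relaxed from `<0.17` to `<0.18` so that a peft release with `target_parameters` support can be installed, while the base requirement stays at `>=0.11`. A sketch of the runtime check this implies (the error message is illustrative):

```python
# Hedged sketch: verify the installed peft is new enough for target_parameters.
from importlib.metadata import version

from packaging.version import Version

if Version(version('peft')) < Version('0.17.0'):
    raise RuntimeError('target_parameters requires peft>=0.17.0; installed: ' + version('peft'))
```
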
1 change: 1 addition & 0 deletions swift/llm/argument/tuner_args.py
@@ -108,6 +108,7 @@ class TunerArguments:
    # tuners
    target_modules: List[str] = field(default_factory=lambda: ['all-linear'])
    target_regex: Optional[str] = None
    target_parameters: Optional[list[str]] = None
Contributor (review comment, severity: medium):
For consistency with other type hints in this file, such as target_modules and modules_to_save, it's better to use List[str] from the typing module instead of the built-in list[str]. This will ensure uniformity across the codebase.

Suggested change:
-    target_parameters: Optional[list[str]] = None
+    target_parameters: Optional[List[str]] = None

    # e.g. ['wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head']
    modules_to_save: List[str] = field(default_factory=list)

2 changes: 2 additions & 0 deletions swift/llm/train/tuner.py
@@ -173,6 +173,8 @@ def prepare_adapter(args: TrainArguments, model, *, template=None, train_dataset
        task_type = 'SEQ_CLS'
    elif task_type == 'GENERATIVE_RERANKER':
        task_type = 'CAUSAL_LM'
    if args.target_parameters is not None:
        lora_kwargs['target_parameters'] = args.target_parameters
    lora_config = LoraConfig(task_type=task_type, lora_dtype=args.lora_dtype, **lora_kwargs)
    if args.init_weights == 'lora-ga':
        try:
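
Both call sites forward `target_parameters` only when it is set, so `LoraConfig` construction is unchanged on peft versions older than 0.17.0. A standalone sketch of that pattern, with `args` as a stand-in namespace rather than the real argument classes:

```python
# Hedged sketch of the guarded-kwarg pattern used in prepare_adapter.
from types import SimpleNamespace

from peft import LoraConfig

args = SimpleNamespace(lora_rank=8, lora_alpha=32, target_parameters=None)

lora_kwargs = {'r': args.lora_rank, 'lora_alpha': args.lora_alpha}
if args.target_parameters is not None:
    # Only forwarded when set; the key exists only in peft>=0.17.0.
    lora_kwargs['target_parameters'] = args.target_parameters

lora_config = LoraConfig(task_type='CAUSAL_LM', **lora_kwargs)
print(lora_config)
```
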
1 change: 1 addition & 0 deletions swift/megatron/argument/megatron_args.py
@@ -40,6 +40,7 @@ class MegatronTunerMixin:
    adapter_load: Optional[str] = None
    target_modules: List[str] = field(default_factory=lambda: ['all-linear'])
    target_regex: Optional[str] = None
    target_parameters: Optional[List[str]] = None
    modules_to_save: List[str] = field(default_factory=list)

    # lora
2 changes: 2 additions & 0 deletions swift/megatron/utils/utils.py
@@ -71,6 +71,8 @@ def prepare_adapter(model):
        'modules_to_save': modules_to_save,
        'use_rslora': args.use_rslora,
    }
    if args.target_parameters is not None:
        lora_kwargs['target_parameters'] = args.target_parameters
    lora_config = LoraConfig(task_type='CAUSAL_LM', lora_dtype=args.lora_dtype, **lora_kwargs)
    logger.info(f'lora_config: {lora_config}')
    return Swift.prepare_model(model, lora_config)