diff --git "a/docs/source/Instruction/Megatron-SWIFT\350\256\255\347\273\203.md" "b/docs/source/Instruction/Megatron-SWIFT\350\256\255\347\273\203.md" index 282ba9df4f..6c30747b8a 100644 --- "a/docs/source/Instruction/Megatron-SWIFT\350\256\255\347\273\203.md" +++ "b/docs/source/Instruction/Megatron-SWIFT\350\256\255\347\273\203.md" @@ -432,6 +432,7 @@ lora训练: - adapter_load: 加载adapter的权重路径,用于lora断点续训,默认为None。lora断点续训方式与全参数一致,请关注`--finetune`参数的含义。 - 🔥target_modules: 指定lora模块的后缀, 默认为`['all-linear']`。 - 🔥target_regex: 指定lora模块的regex表达式,默认为`None`。如果该值传入,则target_modules参数失效。 +- target_parameters: 要替换为LoRA的参数名称列表。该参数的行为与 `target_modules` 类似,但传入的应是参数名称。该特性需要安装"peft>=0.17.0"。 - 🔥modules_to_save: 在已附加tuner后,额外指定一部分原模型模块参与训练和存储。默认为`[]`。 - 🔥lora_rank: 默认为`8`。 - 🔥lora_alpha: 默认为`32`。 diff --git "a/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" "b/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" index 0f06bcec2a..e7f1a93770 100644 --- "a/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" +++ "b/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" @@ -212,6 +212,7 @@ - 🔥target_modules: 指定lora模块, 默认为`['all-linear']`。你也可以设置为module的后缀,例如:`--target_modules q_proj k_proj v_proj`。该参数不限于LoRA,可用于其他tuners。 - 注意:在LLM和多模态LLM中,'all-linear'的行为有所不同。若是LLM则自动寻找除lm_head外的linear并附加tuner;若是多模态LLM,则默认只在LLM上附加tuner,该行为可以被`freeze_llm`、`freeze_vit`、`freeze_aligner`控制。 - 🔥target_regex: 指定lora模块的regex表达式,默认为`None`。如果该值传入,则target_modules参数失效。该参数不限于LoRA,可用于其他tuners。 +- target_parameters: 要替换为LoRA的参数名称列表。该参数的行为与 `target_modules` 类似,但传入的应是参数名称。该特性需要安装"peft>=0.17.0"。 - init_weights: 初始化weights的方法,LoRA可以指定为`true`、`false`、`gaussian`、`pissa`、`pissa_niter_[number of iters]`,Bone可以指定为`true`、`false`、`bat`。默认值`true`。 - 🔥modules_to_save: 在已附加tuner后,额外指定一部分原模型模块参与训练和存储。默认为`[]`。该参数不限于LoRA,可用于其他tuners。 diff --git a/docs/source_en/Instruction/Command-line-parameters.md b/docs/source_en/Instruction/Command-line-parameters.md index f0a67f63a1..211c116940 100644 --- a/docs/source_en/Instruction/Command-line-parameters.md +++ b/docs/source_en/Instruction/Command-line-parameters.md @@ -216,6 +216,7 @@ Other important parameters: - 🔥 target_modules: Specifies the LoRA modules. The default is `['all-linear']`, but you can also pass layer-name suffixes, e.g. `--target_modules q_proj k_proj v_proj`. This argument is not restricted to LoRA and can be used with other tuners as well. - Note: The behavior of the special value `'all-linear'` differs between plain LLMs and multimodal LLMs. For a standard LLM, it automatically locates every linear layer except `lm_head` and attaches a tuner. For a multimodal LLM, it attaches the tuner only to the LLM component by default. This default can be changed with the `freeze_llm`, `freeze_vit`, and `freeze_aligner` options. - 🔥target_regex: Specifies a regex expression for LoRA modules, with a default of `None`. If this value is provided, the target_modules parameter becomes ineffective. This parameter is not limited to LoRA and can be used for other tuners. +- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to target_modules, but you should pass parameter names instead. This feature requires "peft>=0.17.0". - init_weights: Specifies the method for initializing weights. LoRA can specify `true`, `false`, `gaussian`, `pissa`, `pissa_niter_[number of iters]`. Bone can specify `true`, `false`, `bat`. 
The default is `true`. - 🔥modules_to_save: After attaching a tuner, explicitly specifies additional original model modules to participate in training and storage. The default is `[]`. This parameter is not limited to LoRA and can be used for other tuners. diff --git a/docs/source_en/Instruction/Megatron-SWIFT-Training.md b/docs/source_en/Instruction/Megatron-SWIFT-Training.md index d813322f02..a0641ce142 100644 --- a/docs/source_en/Instruction/Megatron-SWIFT-Training.md +++ b/docs/source_en/Instruction/Megatron-SWIFT-Training.md @@ -453,6 +453,7 @@ LoRA Training: - adapter_load: The path to the adapter weights for loading, used for resuming LoRA training from a checkpoint. The default is None. The method for resuming LoRA training from a checkpoint is the same as for full-parameter training. Please pay attention to the meaning of the `--finetune` parameter. - 🔥target_modules: Suffixes of modules to apply LoRA to. Default is `['all-linear']`. - 🔥target_regex: Regex expression to specify LoRA modules. Default is `None`. If this value is provided, the `target_modules` parameter will be ignored. +- target_parameters: List of parameter names to be replaced with LoRA. This argument behaves similarly to target_modules, but you should pass parameter names instead. This feature requires "peft>=0.17.0". - 🔥modules_to_save: After attaching a tuner, explicitly specifies additional original model modules to participate in training and storage. The default is `[]`. - 🔥lora_rank: Default is `8`. - 🔥lora_alpha: Default is `32`. diff --git a/requirements/framework.txt b/requirements/framework.txt index d147e5080d..fef91f96fa 100644 --- a/requirements/framework.txt +++ b/requirements/framework.txt @@ -19,7 +19,7 @@ numpy openai oss2 pandas -peft>=0.11,<0.17 +peft>=0.11,<0.18 pillow PyYAML>=5.4 requests diff --git a/swift/llm/argument/tuner_args.py b/swift/llm/argument/tuner_args.py index 35351f3c19..fb5e038613 100644 --- a/swift/llm/argument/tuner_args.py +++ b/swift/llm/argument/tuner_args.py @@ -108,6 +108,7 @@ class TunerArguments: # tuners target_modules: List[str] = field(default_factory=lambda: ['all-linear']) target_regex: Optional[str] = None + target_parameters: Optional[list[str]] = None # e.g. 
['wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head'] modules_to_save: List[str] = field(default_factory=list) diff --git a/swift/llm/train/tuner.py b/swift/llm/train/tuner.py index e7c0f4ef2f..57459c49f4 100644 --- a/swift/llm/train/tuner.py +++ b/swift/llm/train/tuner.py @@ -173,6 +173,8 @@ def prepare_adapter(args: TrainArguments, model, *, template=None, train_dataset task_type = 'SEQ_CLS' elif task_type == 'GENERATIVE_RERANKER': task_type = 'CAUSAL_LM' + if args.target_parameters is not None: + lora_kwargs['target_parameters'] = args.target_parameters lora_config = LoraConfig(task_type=task_type, lora_dtype=args.lora_dtype, **lora_kwargs) if args.init_weights == 'lora-ga': try: diff --git a/swift/megatron/argument/megatron_args.py b/swift/megatron/argument/megatron_args.py index 585899670c..1f23b337c8 100644 --- a/swift/megatron/argument/megatron_args.py +++ b/swift/megatron/argument/megatron_args.py @@ -40,6 +40,7 @@ class MegatronTunerMixin: adapter_load: Optional[str] = None target_modules: List[str] = field(default_factory=lambda: ['all-linear']) target_regex: Optional[str] = None + target_parameters: Optional[List[str]] = None modules_to_save: List[str] = field(default_factory=list) # lora diff --git a/swift/megatron/utils/utils.py b/swift/megatron/utils/utils.py index d233474652..c09286ad7d 100644 --- a/swift/megatron/utils/utils.py +++ b/swift/megatron/utils/utils.py @@ -71,6 +71,8 @@ def prepare_adapter(model): 'modules_to_save': modules_to_save, 'use_rslora': args.use_rslora, } + if args.target_parameters is not None: + lora_kwargs['target_parameters'] = args.target_parameters lora_config = LoraConfig(task_type='CAUSAL_LM', lora_dtype=args.lora_dtype, **lora_kwargs) logger.info(f'lora_config: {lora_config}') return Swift.prepare_model(model, lora_config)
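
Reviewer note (not part of the patch): the change only forwards `target_parameters` into peft's `LoraConfig` when the argument is set, so environments with `peft<0.17.0` remain unaffected as long as the new flag is not used. The sketch below mirrors that forwarding pattern in isolation with plain `peft`; it is a minimal example under assumptions — the parameter-name suffixes, rank, and alpha values are placeholders, and `peft>=0.17.0` is assumed to be installed.

```python
# Minimal sketch, not swift's actual code path: build lora_kwargs as the patch
# does and only add `target_parameters` when the user supplied it.
from peft import LoraConfig

target_parameters = ['experts.gate_up_proj', 'experts.down_proj']  # placeholder suffixes

lora_kwargs = {
    'r': 8,            # corresponds to --lora_rank
    'lora_alpha': 32,  # corresponds to --lora_alpha
}
if target_parameters is not None:
    # This keyword only exists in peft>=0.17.0, matching the relaxed version pin above.
    lora_kwargs['target_parameters'] = target_parameters

lora_config = LoraConfig(task_type='CAUSAL_LM', **lora_kwargs)
print(lora_config.target_parameters)
```

On the command line, the list would presumably be passed space-separated, following the same convention the docs already show for `--target_modules` (e.g. `--target_parameters experts.gate_up_proj experts.down_proj`), since both the `swift` and Megatron-SWIFT entry points pick the value up through the dataclass fields added above.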