
Commit 70abbe7

Support peft 0.11.0 (#953)
1 parent 9074a2f commit 70abbe7

11 files changed: +247 −34 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 SWIFT has rich documentations for users, please check [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM).

 ## 🎉 News
+- 🔥2024.05.17: Support peft=0.11.0. Meanwhile, support 3 new tuners: `BOFT`, `Vera` and `Pissa`. Use `--sft_type boft/vera` for BOFT or Vera, and use `--init_lora_weights pissa` together with `--sft_type lora` for Pissa.
 - 2024.05.16: Supports Llava-Next (Stronger) series models. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
 - 🔥2024.05.13: Support Yi-1.5 series models, use `--model_type yi-1_5-9b-chat` to begin!
 - 2024.05.11: Support for qlora training and quantized inference using [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ). For more information, see the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/LLM-quantization.md).
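
To make the news entry above concrete, here is a minimal launch sketch for the new tuners through SWIFT's Python entry point. It assumes the `sft_main`/`SftArguments` API exported from `swift.llm`; the model type and dataset are placeholders, not part of this commit.

from swift.llm import SftArguments, sft_main  # assumed public entry points

# BOFT fine-tuning; swap 'boft' for 'vera' to try Vera instead.
# model_type/dataset are placeholders -- use whatever you normally train with.
args = SftArguments(model_type='qwen1half-7b-chat', dataset=['alpaca-zh'], sft_type='boft')
# Pissa variant: SftArguments(..., sft_type='lora', init_lora_weights='pissa')
sft_main(args)

The equivalent CLI flags are exactly the ones quoted in the news entry: `--sft_type boft/vera`, or `--sft_type lora --init_lora_weights pissa`.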

README_CN.md

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ SWIFT supports training, inference, evaluation and deployment of nearly **200 LLMs and MLLMs** (multimodal large models)
 SWIFT has a rich documentation system; if you have usage questions, please check [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM).

 ## 🎉 News
+- 🔥2024.05.17: Support peft=0.11.0, along with 3 new tuners: `BOFT`, `Vera` and `Pissa`. Use `--sft_type boft/vera` for BOFT or Vera, and use `--init_lora_weights pissa` together with `--sft_type lora` for Pissa.
 - 2024.05.16: Support the Llava-Next (Stronger) series models; for best practice, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.05.13: Support the Yi-1.5 series models; use `--model_type yi-1_5-9b-chat` etc. to get started!
 - 2024.05.11: Support qlora training and quantized inference with [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ); see the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/LLM量化文档.md).

docs/source/LLM/命令行参数.md

Lines changed: 18 additions & 0 deletions
@@ -63,6 +63,7 @@
 - `--lora_rank`: Default `8`. Only takes effect when `sft_type` is 'lora'.
 - `--lora_alpha`: Default `32`. Only takes effect when `sft_type` is 'lora'.
 - `--lora_dropout_p`: Default `0.05`. Only takes effect when `sft_type` is 'lora'.
+- `--init_lora_weights`: Method for initializing the LoRA weights. Can be `true`, `false`, `gaussian`, `pissa`, or `pissa_niter_[number of iters]`. Default `true`.
 - `--lora_bias_trainable`: Default `'none'`; options: 'none', 'all'. Set to `'all'` to make all biases trainable.
 - `--lora_modules_to_save`: Default `[]`. If you want to train embedding, lm_head, or layer_norm, set this parameter, e.g. `--lora_modules_to_save EMBEDDING LN lm_head`. Passing `'EMBEDDING'` adds the Embedding layer to `lora_modules_to_save`; passing `'LN'` adds `RMSNorm` and `LayerNorm` to `lora_modules_to_save`.
 - `--lora_dtype`: Default `'AUTO'`; the dtype of the lora modules. If `AUTO`, follow the dtype of the original module. Options: 'fp16', 'bf16', 'fp32', 'AUTO'.
@@ -136,6 +137,23 @@
 - `--sequence_parallel_size`: Default `1`. Values greater than 1 split a sequence across multiple GPUs to save memory; the value must evenly divide the DDP world size.

+### BOFT Parameters
+
+- `--boft_block_size`: BOFT block size, default 4.
+- `--boft_block_num`: Number of BOFT blocks; cannot be used together with `boft_block_size`.
+- `--boft_target_modules`: BOFT target modules. Default `['DEFAULT']`. If `'DEFAULT'` or `'AUTO'` is passed, `boft_target_modules` is looked up in `MODEL_MAPPING` based on `model_type` (defaults to qkv). If `'ALL'` is passed, all Linear layers (excluding the head) are used as BOFT modules.
+- `--boft_dropout`: Dropout value for BOFT, default 0.0.
+- `--boft_modules_to_save`: Additional modules to train and save, default `None`.
+
+### Vera Parameters
+
+- `--vera_rank`: Rank of Vera, default 256.
+- `--vera_projection_prng_key`: PRNG key used to initialize the Vera projection matrices, default 0.
+- `--vera_target_modules`: Vera target modules. Default `['DEFAULT']`. If `'DEFAULT'` or `'AUTO'` is passed, `vera_target_modules` is looked up in `MODEL_MAPPING` based on `model_type` (defaults to qkv). If `'ALL'` is passed, all Linear layers (excluding the head) are used as Vera modules. Note that Vera target modules must share the same shape.
+- `--vera_dropout`: Dropout value for Vera, default `0.0`.
+- `--vera_d_initial`: Initial value of Vera's d matrix, default `0.1`.
+- `--vera_modules_to_save`: Additional modules to train and save, default `None`.
+
 ### LoRA+ Fine-tuning Parameters

 - `--lora_lr_ratio`: Default `None`, recommended value `10~16`. Specify this parameter when using lora to enable lora+.
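
For orientation, `--init_lora_weights` is forwarded to the LoRA config (see the `lora_kwargs` change in `swift/llm/tuner.py` below); with the peft backend this corresponds roughly to the following sketch, where the target module names are placeholders:

from peft import LoraConfig  # peft >= 0.11.0

# '--sft_type lora --init_lora_weights pissa_niter_16' roughly corresponds to:
config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=['q_proj', 'k_proj', 'v_proj'],  # placeholder module names
    init_lora_weights='pissa_niter_16')  # or True/False, 'gaussian', 'pissa', 'loftq'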

docs/source_en/LLM/Command-line-parameters.md

Lines changed: 18 additions & 0 deletions
@@ -63,6 +63,7 @@
 - `--lora_rank`: Default is `8`. Only takes effect when `sft_type` is 'lora'.
 - `--lora_alpha`: Default is `32`. Only takes effect when `sft_type` is 'lora'.
 - `--lora_dropout_p`: Default is `0.05`, only takes effect when `sft_type` is 'lora'.
+- `--init_lora_weights`: Method to initialize LoRA weights, can be specified as `true`, `false`, `gaussian`, `pissa`, or `pissa_niter_[number of iters]`. Default value `true`.
 - `--lora_bias_trainable`: Default is `'none'`, options: 'none', 'all'. Set to `'all'` to make all biases trainable.
 - `--lora_modules_to_save`: Default is `[]`. If you want to train embedding, lm_head, or layer_norm, you can set this parameter, e.g. `--lora_modules_to_save EMBEDDING LN lm_head`. If `'EMBEDDING'` is passed, the Embedding layer will be added to `lora_modules_to_save`. If `'LN'` is passed, `RMSNorm` and `LayerNorm` will be added to `lora_modules_to_save`.
 - `--lora_dtype`: Default is `'AUTO'`, specifies the dtype for lora modules. If `AUTO`, follow the dtype of the original module. Options: 'fp16', 'bf16', 'fp32', 'AUTO'.
@@ -135,6 +136,23 @@
 - `--sequence_parallel_size`: Default value `1`. Values greater than 1 split a sequence across multiple GPUs to reduce memory usage; the value must evenly divide the number of DDP processes.

+### BOFT Parameters
+
+- `--boft_block_size`: BOFT block size, default value is 4.
+- `--boft_block_num`: Number of BOFT blocks, cannot be used simultaneously with `boft_block_size`.
+- `--boft_target_modules`: BOFT target modules. Default is `['DEFAULT']`. If `boft_target_modules` is set to `'DEFAULT'` or `'AUTO'`, it is looked up in `MODEL_MAPPING` based on `model_type` (defaults to qkv). If set to `'ALL'`, all Linear layers (excluding the head) are designated as BOFT modules.
+- `--boft_dropout`: Dropout value for BOFT, default is 0.0.
+- `--boft_modules_to_save`: Additional modules to be trained and saved, default is `None`.
+
+### Vera Parameters
+
+- `--vera_rank`: Rank of Vera, default value is 256.
+- `--vera_projection_prng_key`: PRNG key used to initialize the Vera projection matrices, default is 0.
+- `--vera_target_modules`: Vera target modules. Default is `['DEFAULT']`. If `vera_target_modules` is set to `'DEFAULT'` or `'AUTO'`, it is looked up in `MODEL_MAPPING` based on `model_type` (defaults to qkv). If set to `'ALL'`, all Linear layers (excluding the head) are designated as Vera modules. Vera target modules must share the same shape.
+- `--vera_dropout`: Dropout value for Vera, default is 0.0.
+- `--vera_d_initial`: Initial value for Vera's d matrix, default is 0.1.
+- `--vera_modules_to_save`: Additional modules to be trained and saved, default is `None`.
+
 ### LoRA+ Fine-tuning Parameters

 - `--lora_lr_ratio`: Default `None`, recommended value `10~16`. Specify this parameter when using lora to enable lora+.
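
As a companion to the parameter lists above: the commit builds these configs through `swift.tuners` in `swift/llm/tuner.py` (further down); with stock peft 0.11.0 the equivalent objects look roughly like this sketch, where the target module names are placeholders:

from peft import BOFTConfig, VeraConfig  # peft >= 0.11.0

# '--sft_type boft' with the defaults listed above (boft_block_num is left unset
# because it cannot be combined with boft_block_size):
boft = BOFTConfig(boft_block_size=4, boft_dropout=0.0,
                  target_modules=['q_proj', 'k_proj', 'v_proj'])

# '--sft_type vera' with the defaults listed above; all target Linears must share a shape:
vera = VeraConfig(r=256, projection_prng_key=0, vera_dropout=0.0, d_initial=0.1,
                  target_modules=['q_proj', 'k_proj', 'v_proj'])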

requirements/framework.txt

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ nltk
 numpy
 optimum>=1.17.0
 pandas
-peft>=0.9.0,<0.11.0
+peft>=0.11.0,<0.12.0
 requests
 rouge
 safetensors
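
A quick sanity check that an existing environment satisfies the new constraint (a sketch; it assumes the `packaging` helper is available, as it is in most transformers installs):

from importlib.metadata import version
from packaging.version import Version

peft_version = Version(version('peft'))
assert Version('0.11.0') <= peft_version < Version('0.12.0'), (
    f'peft {peft_version} is outside the range pinned by this commit')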

swift/llm/tuner.py

Lines changed: 67 additions & 2 deletions
@@ -9,8 +9,8 @@
 from swift.torchacc_utils import consolidate_checkpoint
 from swift.trainers import TrainerCallback
-from swift.tuners import (AdaLoraConfig, AdapterConfig, IA3Config, LongLoRAModelType, LoraConfig, LoRAConfig,
-                          NEFTuneConfig, Swift)
+from swift.tuners import (AdaLoraConfig, AdapterConfig, BOFTConfig, IA3Config, LongLoRAModelType, LoraConfig,
+                          LoRAConfig, NEFTuneConfig, Swift, VeraConfig)
 from swift.tuners.llamapro import LLaMAProConfig
 from swift.tuners.module_mapping import MODEL_KEYS_MAPPING
 from swift.utils import activate_model_parameters, freeze_model_parameters, get_logger, use_torchacc
@@ -24,6 +24,10 @@ def handle_target_modules(model, args: SftArguments) -> None:
         target_modules = args.ia3_target_modules
         assert len(args.ia3_feedforward_modules) > 0, ('Setting ia3_target_modules to `ALL` '
                                                        'need to pass MLP linear names to `ia3_feedforward_modules`')
+    elif args.sft_type == 'vera':
+        target_modules = args.vera_target_modules
+    elif args.sft_type == 'boft':
+        target_modules = args.boft_target_modules
     else:
         target_modules = args.lora_target_modules
     if args.lora_use_embedding:
@@ -33,14 +37,43 @@ def handle_target_modules(model, args: SftArguments) -> None:
     if args.sft_type == 'ia3':
         args.ia3_target_modules = target_modules
         logger.info(f'ia3_target_modules: {args.ia3_target_modules}')
+    elif args.sft_type == 'vera':
+        args.vera_target_modules = target_modules
+        logger.info(f'vera_target_modules: {args.vera_target_modules}')
+    elif args.sft_type == 'boft':
+        args.boft_target_modules = target_modules
+        logger.info(f'boft_target_modules: {args.boft_target_modules}')
     else:
         args.lora_target_modules = target_modules
         logger.info(f'lora_target_modules: {args.lora_target_modules}')


+def handle_same_dim_target_modules(model: torch.nn.Module, config: VeraConfig):
+    target_modules = config.target_modules
+    modules_dict = {
+        name: module.weight.shape
+        for name, module in model.named_modules()
+        if isinstance(module, torch.nn.Linear) and any([t in name for t in target_modules])
+    }  # only Linear for now
+    if len(set(modules_dict.values())) > 1:
+        v = [t for t in target_modules if 'v' in t]
+        if not v:
+            raise ValueError('Please manually pass in `vera_target_modules`, do not use `DEFAULT` or `ALL`, '
+                             'because Vera needs all target linears to be the same size.')
+        v = v[0]
+        shape = [shape for name, shape in modules_dict.items() if v in name][0]
+        names = [_name for _name, _shape in modules_dict.items() if _shape == shape]
+        config.target_modules = [t for t in target_modules if any([t in name for name in names])]
+    return config
+
+
 def handle_modules_to_save(model, args: SftArguments) -> None:
     if args.sft_type == 'ia3':
         modules_to_save = args.ia3_modules_to_save
+    elif args.sft_type == 'vera':
+        modules_to_save = args.vera_modules_to_save
+    elif args.sft_type == 'boft':
+        modules_to_save = args.boft_modules_to_save
     else:
         modules_to_save = args.lora_modules_to_save
     if args.lora_m2s_use_embedding:
@@ -51,6 +84,12 @@ def handle_modules_to_save(model, args: SftArguments) -> None:
     if args.sft_type == 'ia3':
         args.ia3_modules_to_save = modules_to_save
         logger.info(f'ia3_modules_to_save: {args.ia3_modules_to_save}')
+    elif args.sft_type == 'vera':
+        args.vera_modules_to_save = modules_to_save
+        logger.info(f'vera_modules_to_save: {args.vera_modules_to_save}')
+    elif args.sft_type == 'boft':
+        args.boft_modules_to_save = modules_to_save
+        logger.info(f'boft_modules_to_save: {args.boft_modules_to_save}')
     else:
         args.lora_modules_to_save = modules_to_save
         logger.info(f'lora_modules_to_save: {args.lora_modules_to_save}')
@@ -62,6 +101,8 @@ def prepare_model(model, args: SftArguments):
     if args.resume_from_checkpoint is None:
         handle_target_modules(model, args)
         handle_modules_to_save(model, args)
+        if args.init_lora_weights and args.init_lora_weights.lower() in ('true', 'false'):
+            args.init_lora_weights = args.init_lora_weights.lower() == 'true'
         lora_kwargs = {
             'r': args.lora_rank,
             'target_modules': args.lora_target_modules,
@@ -72,6 +113,7 @@ def prepare_model(model, args: SftArguments):
             'use_rslora': args.use_rslora,
             'use_dora': args.use_dora,
             'lorap_lr_ratio': args.lora_lr_ratio,
+            'init_lora_weights': args.init_lora_weights,
         }
         if args.sft_type in ('lora', 'longlora'):
             if args.lora_dtype == 'AUTO':
@@ -158,6 +200,29 @@ def prepare_model(model, args: SftArguments):
                 act_layer=args.adapter_act)
             model = Swift.prepare_model(model, adapter_config)
             logger.info(f'adapter_config: {adapter_config}')
+        elif args.sft_type == 'vera':
+            vera_config = VeraConfig(
+                r=args.vera_rank,
+                target_modules=args.vera_target_modules,
+                projection_prng_key=args.vera_projection_prng_key,
+                vera_dropout=args.vera_dropout,
+                d_initial=args.vera_d_initial,
+                modules_to_save=args.vera_modules_to_save,
+            )
+            vera_config = handle_same_dim_target_modules(model, vera_config)
+            model = Swift.prepare_model(model, vera_config)
+            logger.info(f'vera_config: {vera_config}')
+        elif args.sft_type == 'boft':
+            boft_config = BOFTConfig(
+                boft_block_size=args.boft_block_size,
+                boft_block_num=args.boft_block_num,
+                boft_n_butterfly_factor=args.boft_n_butterfly_factor,
+                target_modules=args.boft_target_modules,
+                boft_dropout=args.boft_dropout,
+                modules_to_save=args.boft_modules_to_save,
+            )
+            model = Swift.prepare_model(model, boft_config)
+            logger.info(f'boft_config: {boft_config}')
     else:
         if use_torchacc():
             consolidate_checkpoint(args.resume_from_checkpoint, 'adapter_model')
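
Why `handle_same_dim_target_modules` exists: Vera shares one pair of frozen random projection matrices across all adapted layers, so every target Linear must have the same weight shape; when the resolved target set mixes shapes, the helper keeps only the shape group containing the 'v' module. A toy illustration (module names and sizes are hypothetical, not repo code):

import torch

class Toy(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.q_proj = torch.nn.Linear(64, 64)
        self.v_proj = torch.nn.Linear(64, 64)
        self.down_proj = torch.nn.Linear(128, 64)  # weight shape (64, 128), unlike the others

shapes = {name: m.weight.shape for name, m in Toy().named_modules()
          if isinstance(m, torch.nn.Linear)}
# q_proj/v_proj -> (64, 64) but down_proj -> (64, 128): mixed shapes, so the helper
# would narrow target_modules to the (64, 64) group that contains 'v_proj'.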

swift/llm/utils/argument.py

Lines changed: 26 additions & 3 deletions
@@ -32,7 +32,7 @@


 def is_adapter(sft_type: str) -> bool:
-    return sft_type in {'lora', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter'}
+    return sft_type in {'lora', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft'}


 class ArgumentsBase:
@@ -404,7 +404,7 @@ class SftArguments(ArgumentsBase):
         default=None,
         metadata={'help': "Decoder Class name of model, e.g. 'QWenBlock' for QWen, 'LlamaDecoderLayer' for LLama"})

-    sft_type: Literal['lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter'] = 'lora'
+    sft_type: Literal['lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft'] = 'lora'
     freeze_parameters: float = 0.  # 0 ~ 1
     additional_trainable_parameters: List[str] = field(default_factory=list)
     tuner_backend: Literal['swift', 'peft', 'unsloth'] = 'peft'
@@ -457,6 +457,23 @@ class SftArguments(ArgumentsBase):
     lora_lr_ratio: float = None
     use_rslora: bool = False
     use_dora: bool = False
+    init_lora_weights: Literal['gaussian', 'pissa', 'pissa_niter_[number of iters]', 'loftq', 'true', 'false'] = 'true'
+
+    # BOFT
+    boft_block_size: int = 4
+    boft_block_num: int = 0
+    boft_n_butterfly_factor: int = 1
+    boft_target_modules: Optional[Union[List[str], str]] = field(default_factory=lambda: ['DEFAULT'])
+    boft_dropout: float = 0.0
+    boft_modules_to_save: List[str] = field(default_factory=list)
+
+    # Vera
+    vera_rank: int = 256
+    vera_target_modules: Optional[Union[List[str], str]] = field(default_factory=lambda: ['DEFAULT'])
+    vera_projection_prng_key: int = 0
+    vera_dropout: float = 0.0
+    vera_d_initial: float = 0.1
+    vera_modules_to_save: List[str] = field(default_factory=list)

     # adapter
     adapter_act: str = 'gelu'
@@ -684,6 +701,12 @@ def __post_init__(self) -> None:
             self.ia3_feedforward_modules = self._prepare_target_modules(self.ia3_feedforward_modules)
             self.ia3_target_modules = self._prepare_target_modules(self.ia3_target_modules)
             self.ia3_modules_to_save = self._prepare_modules_to_save(self.ia3_modules_to_save)
+        elif self.sft_type == 'vera':
+            self.vera_target_modules = self._prepare_target_modules(self.vera_target_modules)
+            self.vera_modules_to_save = self._prepare_modules_to_save(self.vera_modules_to_save)
+        elif self.sft_type == 'boft':
+            self.boft_target_modules = self._prepare_target_modules(self.boft_target_modules)
+            self.boft_modules_to_save = self._prepare_modules_to_save(self.boft_modules_to_save)
         else:
             self.lora_target_modules = self._prepare_target_modules(self.lora_target_modules)
             self.lora_modules_to_save = self._prepare_modules_to_save(self.lora_modules_to_save)
@@ -926,7 +949,7 @@ class InferArguments(ArgumentsBase):
     model_id_or_path: Optional[str] = None
     model_revision: Optional[str] = None

-    sft_type: Literal['lora', 'longlora', 'full', 'adalora', 'ia3', 'llamapro'] = 'lora'
+    sft_type: Literal['lora', 'longlora', 'full', 'adalora', 'ia3', 'llamapro', 'vera', 'boft'] = 'lora'
     template_type: str = field(
         default='AUTO', metadata={'help': f"template_type choices: {list(TEMPLATE_MAPPING.keys()) + ['AUTO']}"})
     infer_backend: Literal['AUTO', 'vllm', 'pt'] = 'AUTO'
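
For context on `_prepare_target_modules` (already present in this file and unchanged by the commit): per the docs above, resolving `'ALL'` means collecting every Linear layer except the output head. A rough, hypothetical sketch of that idea, not the repo's implementation:

import torch

def find_all_linear_names(model: torch.nn.Module, head_name: str = 'lm_head'):
    # Trailing attribute name of every nn.Linear, excluding the output head.
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear) and head_name not in full_name:
            names.add(full_name.rsplit('.', 1)[-1])
    return sorted(names)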

swift/trainers/mixin.py

Lines changed: 4 additions & 1 deletion
@@ -11,6 +11,7 @@

 import json
 import numpy as np
+import peft
 import safetensors
 import torch
 import transformers
@@ -250,6 +251,8 @@ def __init__(self,
             optimizers=optimizers,
             preprocess_logits_for_metrics=preprocess_logits_for_metrics,
             **kwargs)
+        if not self.label_names:
+            self.label_names = ['labels']
         if is_quantized and use_swift:
             model._hf_peft_config_loaded = _hf_peft_config_loaded

@@ -381,7 +384,7 @@ def _save(self, output_dir: Optional[str] = None, state_dict=None):
         from swift import SWIFT_MAPPING
         addtional_module_tuners = [
             name.lower() for name, (config, cls) in SWIFT_MAPPING.items() if cls.has_additional_modules()
-        ]
+        ] + list(peft.PEFT_TYPE_TO_CONFIG_MAPPING.keys())
         if self.tokenizer is not None and sft_args.sft_type not in addtional_module_tuners:
             self.tokenizer.save_pretrained(output_dir)
         # training_args.bin
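
Context for the `label_names` fallback in `__init__`: `transformers.Trainer` infers label argument names from the model's forward signature, and for wrapped or tuned models that inference can come back empty, which silently disables eval-loss bookkeeping; defaulting to `['labels']` avoids that. The same behavior can be pinned explicitly through `TrainingArguments`, as in this small sketch:

from transformers import TrainingArguments

# Pinning label_names explicitly has the same effect as the fallback added above.
training_args = TrainingArguments(output_dir='output', label_names=['labels'])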
