Commit 2accb9b

fix merge_lora_dtype (#842)
1 parent: 339e0ff

10 files changed: +10 -11 lines changed

docs/source/LLM/Grok训练和推理.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ torchrun \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \
- --lora_dtype bf16 \
+ --lora_dtype AUTO \
  --lora_target_modules DEFAULT \
  --gradient_checkpointing true \
  --batch_size 2 \

docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@
  - `--lora_dropout_p`: Default is `0.05`, only takes effect when `sft_type` is 'lora'.
  - `--lora_bias_trainable`: Default is `'none'`, options: 'none', 'all'. If you want to make all biases trainable, set it to `'all'`.
  - `--lora_modules_to_save`: Default is `[]`. If you want to train embedding, lm_head, or layer_norm, you can set this parameter, e.g. `--lora_modules_to_save EMBEDDING LN lm_head`. If `'EMBEDDING'` is passed, the Embedding layer is added to `lora_modules_to_save`. If `'LN'` is passed, `RMSNorm` and `LayerNorm` are added to `lora_modules_to_save`.
- - `--lora_dtype`: Default is `'fp32'`, specifies the dtype of the LoRA modules. If `AUTO`, it follows the dtype of the original module. Available values: 'fp16', 'bf16', 'fp32', 'AUTO'.
+ - `--lora_dtype`: Default is `'AUTO'`, specifies the dtype of the LoRA modules. If `AUTO`, it follows the dtype of the original module. Available values: 'fp16', 'bf16', 'fp32', 'AUTO'.
  - `--use_dora`: Default is `False`, whether to use `DoRA`.
  - `--use_rslora`: Default is `False`, whether to use `RS-LoRA`.
  - `--neftune_noise_alpha`: The noise coefficient added by `NEFTune`, which can improve the model's performance in instruction fine-tuning; default is `None`. Usually it can be set to 5, 10, or 15. See the [related paper](https://arxiv.org/abs/2310.05914).

docs/source_en/LLM/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@
  - `--lora_dropout_p`: Default is `0.05`, only takes effect when `sft_type` is 'lora'.
  - `--lora_bias_trainable`: Default is `'none'`, options: 'none', 'all'. Set to `'all'` to make all biases trainable.
  - `--lora_modules_to_save`: Default is `[]`. If you want to train embedding, lm_head, or layer_norm, you can set this parameter, e.g. `--lora_modules_to_save EMBEDDING LN lm_head`. If passed `'EMBEDDING'`, the Embedding layer will be added to `lora_modules_to_save`. If passed `'LN'`, `RMSNorm` and `LayerNorm` will be added to `lora_modules_to_save`.
- - `--lora_dtype`: Default is `'fp32'`, specifies dtype for lora modules. If `AUTO`, follow dtype of original module. Options: 'fp16', 'bf16', 'fp32', 'AUTO'.
+ - `--lora_dtype`: Default is `'AUTO'`, specifies dtype for lora modules. If `AUTO`, follow dtype of original module. Options: 'fp16', 'bf16', 'fp32', 'AUTO'.
  - `--use_dora`: Default is `False`, whether to use `DoRA`.
  - `--use_rslora`: Default is `False`, whether to use `RS-LoRA`.
  - `--neftune_noise_alpha`: The noise coefficient added by `NEFTune` can improve performance of instruction fine-tuning, default is `None`. Usually can be set to 5, 10, 15. See [related paper](https://arxiv.org/abs/2310.05914).
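
To make the `AUTO` behaviour described above concrete, here is a minimal, illustrative sketch (it assumes a hypothetical helper `resolve_lora_dtype`; it is not the library's actual implementation): with `'AUTO'`, the LoRA weights simply follow the dtype of the base module they attach to, instead of being forced to the previous `fp32` default.

import torch
import torch.nn as nn

# Hypothetical helper, for illustration only: choose the dtype a LoRA module
# should use for a given base layer, mirroring the documented option values.
def resolve_lora_dtype(base_module: nn.Module, lora_dtype: str) -> torch.dtype:
    mapping = {'fp16': torch.float16, 'bf16': torch.bfloat16, 'fp32': torch.float32}
    if lora_dtype == 'AUTO':
        # 'AUTO': follow the dtype of the original (base) module.
        return next(base_module.parameters()).dtype
    return mapping[lora_dtype]

base = nn.Linear(16, 16, dtype=torch.bfloat16)
print(resolve_lora_dtype(base, 'AUTO'))  # torch.bfloat16 (follows the base layer)
print(resolve_lora_dtype(base, 'fp32'))  # torch.float32 (the old default)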

docs/source_en/LLM/Grok-1-best-practice.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ torchrun \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \
- --lora_dtype bf16 \
+ --lora_dtype AUTO \
  --lora_target_modules DEFAULT \
  --gradient_checkpointing true \
  --batch_size 2 \

examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ swift sft \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \
  --lora_target_modules ALL \
- --lora_dtype bf16 \
+ --lora_dtype AUTO \
  --gradient_checkpointing false \
  --batch_size 1 \
  --weight_decay 0.1 \

examples/pytorch/llm/scripts/grok-1/lora_ddp_ds/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ torchrun \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \
- --lora_dtype bf16 \
+ --lora_dtype AUTO \
  --lora_target_modules DEFAULT \
  --gradient_checkpointing true \
  --batch_size 2 \

examples/pytorch/llm/scripts/llama2_70b_chat/qlora_fsdp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ accelerate launch --config_file "./scripts/llama2_70b_chat/qlora_fsdp/fsdp_offlo
  --bnb_4bit_quant_storage bfloat16 \
  --lora_rank 8 \
  --lora_alpha 32 \
- --lora_dtype bf16 \
+ --lora_dtype AUTO \
  --lora_dropout_p 0.05 \
  --lora_target_modules DEFAULT \
  --gradient_checkpointing true \

examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ swift sft \
  --num_train_epochs 1 \
  --max_length 1024 \
  --check_dataset_strategy warning \
- --lora_dtype fp16 \
+ --lora_dtype AUTO \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \

swift/llm/infer.py

Lines changed: 1 addition & 2 deletions
@@ -199,8 +199,7 @@ def prepare_model_template(
      if is_adapter(args.sft_type) and args.ckpt_dir is not None:
          model = Swift.from_pretrained(
              model, args.ckpt_dir, inference_mode=True)
-         if args.sft_type == 'adalora':
-             model = model.to(model.dtype)
+         model = model.to(model.dtype)

      if verbose:
          show_layers(model)

swift/llm/utils/argument.py

Lines changed: 1 addition & 1 deletion
@@ -366,7 +366,7 @@ class SftArguments(ArgumentsBase):
      lora_bias_trainable: Literal['none', 'all'] = 'none'
      # e.g. ['wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head']
      lora_modules_to_save: List[str] = field(default_factory=list)
-     lora_dtype: Literal['fp16', 'bf16', 'fp32', 'AUTO'] = 'fp32'
+     lora_dtype: Literal['fp16', 'bf16', 'fp32', 'AUTO'] = 'AUTO'
      lora_lr_ratio: float = None
      use_rslora: bool = False
      use_dora: bool = False
