README.md
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
## 🎉 News
- 2024.04.04: Support **QLoRA+FSDP** for training a 70B model on two 24GB GPUs; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama2_70b_chat/qlora_fsdp/sft.sh) to train.
- 🔥2024.04.03: Support **Qwen1.5-32B** series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh) to start training!
- 🔥2024.04.02: Support fine-tuning and inference of the Mengzi3-13B-Base model; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
- 🔥2024.04.01: Support **dbrx** series: dbrx-base and dbrx-instruct; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh) to start training!
docs/source_en/LLM/Command-line-parameters.md
@@ -43,6 +43,7 @@
- `--bnb_4bit_comp_dtype`: When doing 4bit quantization, we need to dequantize during model forward and backward passes. This specifies the torch_dtype after dequantization. Default is `'AUTO'`, i.e. consistent with `dtype`. Options: 'fp16', 'bf16', 'fp32'. Has no effect when quantization_bit is 0.
- `--bnb_4bit_quant_type`: Quantization method for 4bit quantization, default is `'nf4'`. Options: 'nf4', 'fp4'. Has no effect when quantization_bit is 0.
- `--bnb_4bit_use_double_quant`: Whether to enable double quantization for 4bit quantization, default is `True`. Has no effect when quantization_bit is 0.
- `--bnb_4bit_quant_storage`: Default value `None`. Sets the storage type used to pack the quantized 4-bit parameters. Has no effect when quantization_bit is 0. (See the combined usage sketch below.)
- `--lora_target_modules`: Specify lora modules, default is `['DEFAULT']`. If `'DEFAULT'` or `'AUTO'` is passed, the lora modules are looked up in `MODEL_MAPPING` based on `model_type` (by default, qkv is specified). If `'ALL'` is passed, all Linear layers (excluding the head) are specified as lora modules. If `'EMBEDDING'` is passed, the Embedding layer is specified as a lora module. If memory allows, setting 'ALL' is recommended. You can also set `['ALL', 'EMBEDDING']` to specify all Linear and embedding layers as lora modules. This parameter only takes effect when `sft_type` is 'lora'.
- `--lora_rank`: Default is `8`. Only takes effect when `sft_type` is 'lora'.
- `--lora_alpha`: Default is `32`. Only takes effect when `sft_type` is 'lora'.
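As a rough illustration of how the 4bit quantization flags combine with the LoRA flags in a single run, here is a minimal sketch of an `sft` invocation. The `--model_type` and `--dataset` values are placeholders chosen for the example, and the boolean/dtype value formats are assumptions, not taken from this diff:

```shell
# Sketch of a QLoRA-style fine-tuning run: 4-bit bitsandbytes quantization plus LoRA.
# model_type and dataset values are illustrative placeholders.
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset ms-bench \
    --sft_type lora \
    --lora_target_modules ALL \
    --lora_rank 8 \
    --lora_alpha 32 \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype AUTO \
    --bnb_4bit_quant_type nf4 \
    --bnb_4bit_use_double_quant true \
    --bnb_4bit_quant_storage bfloat16
```

Setting `--bnb_4bit_quant_storage` to a dtype such as `bfloat16` is typically only needed when the quantized weights have to be wrapped by FSDP (see the FSDP Parameters section below); otherwise the default `None` is fine.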
@@ -104,6 +105,12 @@
- `--train_dataset_mix_ds`: Default is `ms-bench`. General knowledge dataset used to prevent knowledge forgetting.
- `--use_loss_scale`: Default is `False`. When enabled, strengthens the loss weight of some Agent fields (the Action/Action Input part) to enhance CoT; has no effect in regular SFT scenarios.
### FSDP Parameters
- `--fsdp`: Default value `''`, the FSDP type; please check [this documentation](https://huggingface.co/docs/transformers/v4.39.3/en/main_classes/trainer#transformers.TrainingArguments.fsdp) for details.
- `--fsdp_config`: Default value `None`, the FSDP config file path; `fsdp_offload` is a special value, check [here](https://github.com/modelscope/swift/tree/main/swift/llm/fsdp_config/fsdp_offload.json) for details.
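To show how these two flags fit together with the QLoRA flags above (echoing the 2024.04.04 QLoRA+FSDP news entry), here is a hedged sketch of a two-GPU run. The `--model_type` value, the `NPROC_PER_NODE` launch variable, and the exact `--fsdp` option string are assumptions for illustration and may differ from the referenced sft.sh script:

```shell
# Sketch of a QLoRA + FSDP fine-tuning run on two GPUs (illustrative values).
# --fsdp_config fsdp_offload points at the bundled fsdp_offload.json config.
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type llama2-70b-chat \
    --sft_type lora \
    --quantization_bit 4 \
    --bnb_4bit_quant_storage bfloat16 \
    --fsdp 'full_shard auto_wrap' \
    --fsdp_config fsdp_offload
```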
### LoRA+ Fine-tuning Parameters
- `--lora_lr_ratio`: Default `None`, recommended value `10~16`; specify this parameter when using lora to enable lora+.
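For completeness, a minimal sketch of turning an ordinary lora run into a LoRA+ run via this single flag; the other argument values are placeholders, not taken from this diff:

```shell
# LoRA+ is enabled simply by adding --lora_lr_ratio to a lora fine-tuning run.
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset ms-bench \
    --sft_type lora \
    --lora_lr_ratio 16
```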
@@ -184,6 +191,7 @@ dpo parameters inherit from sft parameters, with the following added parameters:
- `--bnb_4bit_comp_dtype`: Default is `'AUTO'`. See `sft.sh command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--bnb_4bit_quant_type`: Default is `'nf4'`. See `sft.sh command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--bnb_4bit_use_double_quant`: Default is `True`. See `sft.sh command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--bnb_4bit_quant_storage`: Default value `None`. See `sft.sh command line arguments` for parameter details. If `quantization_bit` is set to 0, this parameter has no effect.
- `--max_new_tokens`: Maximum number of new tokens to generate, default is `2048`.
- `--do_sample`: Whether to use sampling generation (`True`) or greedy generation (`False`), default is `True`.
- `--temperature`: Default is `0.3`. This parameter only takes effect when `do_sample` is set to True. This parameter will be used as the default value in deployment parameters.
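As a rough illustration, assuming these generation flags are passed to the inference entry point, a run might look like the sketch below; the `swift infer` subcommand and the `--model_type` value are assumptions for the example, not confirmed by this diff:

```shell
# Hypothetical inference call exercising the generation parameters above.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --model_type qwen1half-7b-chat \
    --max_new_tokens 2048 \
    --do_sample true \
    --temperature 0.3
```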