docs/source/Instruction/命令行参数.md (+1 −0)

@@ -148,6 +148,7 @@
- Note: For multimodal models trained with LoRA, when `--freeze_vit false` is set and the following warning appears on the command line: `UserWarning: None of the inputs have requires_grad=True. Gradients will be None`, please set `--vit_gradient_checkpointing false` or open a related issue. This problem does not occur with full-parameter training.
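
For concreteness, a minimal sketch of the workaround described in the note above. It assumes ms-swift's `swift sft` entry point; the model name is illustrative, not prescribed by this change:

```shell
# Hypothetical LoRA run on a multimodal model with the ViT unfrozen.
# Per the note above, disabling gradient checkpointing for the ViT part
# avoids the "None of the inputs have requires_grad=True" warning.
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --train_type lora \
    --freeze_vit false \
    --vit_gradient_checkpointing false
```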
docs/source_en/Instruction/Command-line-parameters.md (+2 −1)

@@ -150,7 +150,8 @@ This parameter list inherits from transformers `Seq2SeqTrainingArguments`, with
- 🔥vit_gradient_checkpointing: Whether to enable gradient_checkpointing for the vit part during multi-modal model training. Defaults to None, meaning it is set to `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).
- Note: For multimodal models trained with LoRA, when `--freeze_vit false` is set and the following warning appears on the command line: `UserWarning: None of the inputs have requires_grad=True. Gradients will be None`, please set `--vit_gradient_checkpointing false` or open a related issue. This problem does not occur with full-parameter training.
- 🔥deepspeed: Defaults to None. It can be set to 'zero0', 'zero1', 'zero2', 'zero3', 'zero2_offload', 'zero3_offload' to use the built-in deepspeed configuration file of ms-swift. You can also provide a path to a custom DeepSpeed configuration file.
- - zero_hpz_partition_size: Default is `None`. This parameter is a feature of `ZeRO++`, which implements model sharding within nodes and data sharding between nodes. If you encounter grad_norm `NaN` issues, please try using `--torch_dtype float16`
+ - zero_hpz_partition_size: Default is `None`. This parameter is a feature of `ZeRO++`, which implements model sharding within nodes and data sharding between nodes. If you encounter grad_norm `NaN` issues, please try using `--torch_dtype float16`.
+ - deepspeed_autotp_size: DeepSpeed tensor parallelism size, default is 1. When using DeepSpeed AutoTP, the argument `--deepspeed` must be set to 'zero0', 'zero1', or 'zero2'. (Note: This feature only supports full-parameter training.)
- 🔥per_device_train_batch_size: Default is 1.
- 🔥per_device_eval_batch_size: Default is 1.
- 🔥gradient_accumulation_steps: Gradient accumulation, default is None, meaning set gradient_accumulation_steps such that total_batch_size >= 16. The total_batch_size equals `per_device_train_batch_size * gradient_accumulation_steps * world_size`. In GRPO Training, the default is 1.
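
To illustrate how the DeepSpeed and batch-size flags above compose, here is a hedged sketch. The `NPROC_PER_NODE` launcher variable and the model name are assumptions for illustration; the flag semantics are as documented above:

```shell
# Hypothetical 4-GPU full-parameter run. With deepspeed_autotp_size=2,
# --deepspeed must be one of zero0/zero1/zero2, and AutoTP only supports
# full-parameter training. Tensor parallelism of 2 over 4 GPUs leaves
# 2 data-parallel ranks; taking world_size as those 2 ranks (an assumption),
# total_batch_size = 1 * 8 * 2 = 16, meeting the >= 16 default target.
NPROC_PER_NODE=4 \
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type full \
    --deepspeed zero2 \
    --deepspeed_autotp_size 2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8
```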
docs/source_en/Megatron-SWIFT/Multimodal-Model.md (+1 −3)

@@ -1,6 +1,6 @@
# Multimodal Models

- ms-swift introduces Megatron's parallelization techniques to accelerate the training of large multimodal models. Currently, it supports CPT/SFT/DPO for models such as Qwen2.5-VL, Qwen2.5-Omni. For a complete list of supported models, please refer to the [Supported Models and Datasets documentation](../Instruction/Supported-models-and-datasets.md).
+ ms-swift introduces Megatron's parallelization techniques to accelerate the training of large multimodal models. Currently, it supports CPT/SFT/DPO for models such as Qwen2.5-VL, Qwen2.5-Omni, InternVL3.5, GLM4.5v. For a complete list of supported models, please refer to the [Supported Models and Datasets documentation](../Instruction/Supported-models-and-datasets.md).
For environment setup, please refer to the Megatron-SWIFT [Quick Start guide](./Quick-start.md).
@@ -167,7 +167,6 @@ The model conversion steps for MoE models are the same as those for Dense models