docs/sphinx_doc/source/tutorial/example_megatron.md (17 additions, 4 deletions)
@@ -140,6 +140,10 @@ actor_rollout_ref:
 # Use mBridge for parameter import/export (optional)
 use_mbridge: false
 
+# Use Megatron checkpoint
+use_dist_checkpointing: false
+dist_checkpointing_path: null
+
 # Recomputation settings (helps save memory during training)
 override_transformer_config:
   recompute_granularity: full
@@ -155,6 +159,8 @@ actor_rollout_ref:
 grad_offload: false
 optimizer_offload: false
 use_mbridge: false
+use_dist_checkpointing: false
+dist_checkpointing_path: null
 override_transformer_config:
   recompute_granularity: full
   recompute_method: uniform
@@ -171,6 +177,8 @@ critic:
 grad_offload: false
 optimizer_offload: false
 use_mbridge: false
+use_dist_checkpointing: false
+dist_checkpointing_path: null
 override_transformer_config:
   recompute_granularity: full
   recompute_method: uniform
@@ -182,9 +190,14 @@ critic:
 
 ### Training Mixture-of-Experts (MoE) Models
 
-If you're training an MoE model like **Qwen/Qwen3-30B-A3B**, you have two options:
+If you're training an MoE model like **Qwen/Qwen3-30B-A3B**, you’ll need to take one of the following two approaches to ensure it works properly:
+
+1. **Use mBridge (Recommended)**:
+   Simply set `use_mbridge: true` in your configuration file. This enables the necessary support for MoE models directly.
 
-1. **Enable mBridge**: Set `use_mbridge: true` in the config.
-2. **Convert the model first**: Use the [Hugging Face to MCore converter](https://github.com/volcengine/verl/blob/main/scripts/converter_hf_to_mcore.py) from the **verl** to convert your model before training.
+2. **Convert the model manually**:
+   If you prefer not to use mBridge, set `use_mbridge: false`. Before training, you must first convert your Hugging Face model to the MCore format using the [Hugging Face to MCore converter](https://github.com/volcengine/verl/blob/main/scripts/converter_hf_to_mcore.py) from the **verl** repository. After conversion, update your config with:
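For example, a minimal sketch of what those settings could look like after conversion, assuming the converter wrote the MCore checkpoint to a hypothetical directory `/path/to/qwen3_30b_a3b_mcore` (substitute the converter's actual output path):

```yaml
# Sketch only: load the converted MCore (Megatron) checkpoint instead of importing from Hugging Face.
use_dist_checkpointing: true
# Hypothetical path; point this at the directory produced by converter_hf_to_mcore.py.
dist_checkpointing_path: /path/to/qwen3_30b_a3b_mcore
```

As in the hunks above, these two keys appear under both the `actor_rollout_ref` and `critic` sections, so each would point at its own converted checkpoint.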