docs: add usage example for mcore --> hf converter (#807)

ashors1 · web-flow · commit 436540239601 · 2025-07-31T17:03:04.000Z
Signed-off-by: ashors1 <ashors@nvidia.com>
diff --git a/README.md b/README.md
@@ -412,6 +412,17 @@ uv run python examples/converters/convert_dcp_to_hf.py \
     --dcp-ckpt-path results/grpo/step_170/policy/weights/ \
     --hf-ckpt-path results/grpo/hf
 ```
+
+If you have a model saved in Megatron format, you can use the following command to convert it to Hugging Face format prior to running evaluation:
+
+```sh
+# Example for a GRPO checkpoint at step 170
+uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
+    --config results/grpo/step_170/config.yaml \
+    --megatron-ckpt-path results/grpo/step_170/policy/weights/iter_0000000 \
+    --hf-ckpt-path results/grpo/hf
+```
+
 > **Note:** Adjust the paths according to your training output directory structure.
 
 For an in-depth explanation of checkpointing, refer to the [Checkpointing documentation](docs/design-docs/checkpointing.md).
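The Megatron examples point at a specific `iter_*` directory under `policy/weights` (written as `iter_xxxxx` in the converter's own usage note). If you are unsure which one a checkpoint contains, a small helper can select it before you run the converter. This is a minimal sketch: `latest_iter_dir` is a hypothetical name, and it assumes the weights directory holds one or more subdirectories named `iter_<digits>`.

```python
import re
from pathlib import Path

# Matches Megatron-style iteration directories such as "iter_0000000".
_ITER_RE = re.compile(r"^iter_(\d+)$")


def latest_iter_dir(weights_dir):
    """Hypothetical helper: return the highest-numbered iter_* subdirectory."""
    candidates = []
    for child in Path(weights_dir).iterdir():
        match = _ITER_RE.match(child.name)
        # Only directories count; stray files named iter_* are ignored.
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        raise FileNotFoundError(f"no iter_* directories under {weights_dir}")
    # Pick the directory with the largest iteration number.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `latest_iter_dir("results/grpo/step_170/policy/weights")` returns a path you can pass to the converter's checkpoint-path argument.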
diff --git a/docs/design-docs/checkpointing.md b/docs/design-docs/checkpointing.md
@@ -1,8 +1,10 @@
-# Checkpointing with Hugging Face Models 
+# Exporting Checkpoints to Hugging Face Format
 
 NeMo RL provides two checkpoint formats for Hugging Face models: Torch distributed and Hugging Face format. Torch distributed is used by default for efficiency, and Hugging Face format is provided for compatibility with Hugging Face's `AutoModel.from_pretrained` API. Note that Hugging Face format checkpoints save only the model weights, ignoring the optimizer states. It is recommended to use Torch distributed format to save intermediate checkpoints and to save a Hugging Face checkpoint only at the end of training. 
 
-A checkpoint converter is provided to convert a Torch distributed checkpoint checkpoint to Hugging Face format after training:
+## Converting Torch Distributed Checkpoints to Hugging Face Format
+
+A checkpoint converter is provided to convert a Torch distributed checkpoint to Hugging Face format after training:
 
 ```sh
 uv run examples/converters/convert_dcp_to_hf.py --config=<YAML CONFIG USED DURING TRAINING> <ANY CONFIG OVERRIDES USED DURING TRAINING> --dcp-ckpt-path=<PATH TO DIST CHECKPOINT TO CONVERT> --hf-ckpt-path=<WHERE TO SAVE HF CHECKPOINT>
@@ -17,3 +19,13 @@ CKPT_DIR=results/sft/step_10
 uv run examples/converters/convert_dcp_to_hf.py --config=$CKPT_DIR/config.yaml --dcp-ckpt-path=$CKPT_DIR/policy/weights --hf-ckpt-path=${CKPT_DIR}-hf
 rsync -ahP $CKPT_DIR/policy/tokenizer ${CKPT_DIR}-hf/
 ```
+
+## Converting Megatron Checkpoints to Hugging Face Format
+
+For models trained with the Megatron backend, a separate converter is available to convert Megatron checkpoints to Hugging Face format. This script requires Megatron-Core, so launch the conversion with the `mcore` extra. For example:
+
+```sh
+CKPT_DIR=results/sft/step_10
+
+uv run --extra mcore examples/converters/convert_megatron_to_hf.py --config=$CKPT_DIR/config.yaml --megatron-ckpt-path=$CKPT_DIR/policy/weights/iter_0000000/ --hf-ckpt-path=<path_to_save_hf_ckpt>
+```
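Before loading an exported directory with `from_pretrained`, it can help to check that the files a Hugging Face checkpoint normally contains are present; note that the DCP example above copies the tokenizer separately with `rsync`. This is a sketch under stated assumptions: `check_hf_checkpoint` is a hypothetical helper, and it assumes weights are saved as `model*.safetensors` (single or sharded) or `pytorch_model*.bin`, and that the tokenizer, when present, includes `tokenizer_config.json`.

```python
from pathlib import Path

# Assumed weight-file patterns: single/sharded safetensors or legacy .bin files.
_WEIGHT_GLOBS = ("model*.safetensors", "pytorch_model*.bin")
_TOKENIZER_MSG = "missing tokenizer_config.json (copy the tokenizer directory)"


def check_hf_checkpoint(hf_dir, require_tokenizer=True):
    """Hypothetical helper: list problems found in an exported HF directory."""
    root = Path(hf_dir)
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # At least one weight file must match one of the assumed patterns.
    if not any(p.is_file() for pattern in _WEIGHT_GLOBS for p in root.glob(pattern)):
        problems.append("no model weight files found")
    if require_tokenizer and not (root / "tokenizer_config.json").is_file():
        problems.append(_TOKENIZER_MSG)
    return problems
```

An empty list means the basic layout looks complete; loading the directory with `transformers.AutoModelForCausalLM.from_pretrained` remains the real test.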
diff --git a/examples/converters/convert_megatron_to_hf.py b/examples/converters/convert_megatron_to_hf.py
@@ -18,6 +18,13 @@
 
 from nemo_rl.models.megatron.community_import import export_model_from_megatron
 
+# NOTE: this script requires Megatron-Core (mcore). Make sure to launch it with
+# the mcore extra, for example:
+#   uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
+#     --config <path_to_ckpt>/config.yaml \
+#     --megatron-ckpt-path <path_to_ckpt>/policy/weights/iter_xxxxx \
+#     --hf-ckpt-path <path_to_save_hf_ckpt>
+
 
 def parse_args():
     """Parse command line arguments."""
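If you drive the conversion from Python (for example, at the end of a training script), you can assemble the same command that the script's usage note documents as an argument list. This is a minimal sketch: `megatron_to_hf_argv` is a hypothetical helper, and it assumes the layout used in the examples (`<ckpt_dir>/config.yaml` and `<ckpt_dir>/policy/weights/iter_<7-digit iteration>`, as in `iter_0000000`) and that it is run from the repository root.

```python
import shlex
from pathlib import Path


def megatron_to_hf_argv(ckpt_dir, hf_ckpt_path, iteration=0):
    """Hypothetical helper: build the mcore conversion command as an argv list."""
    ckpt = Path(ckpt_dir)
    # Assumed Megatron naming: iteration number zero-padded to 7 digits.
    weights = ckpt / "policy" / "weights" / f"iter_{iteration:07d}"
    return [
        "uv", "run", "--extra", "mcore", "python",
        "examples/converters/convert_megatron_to_hf.py",
        "--config", str(ckpt / "config.yaml"),
        "--megatron-ckpt-path", str(weights),
        "--hf-ckpt-path", str(hf_ckpt_path),
    ]


if __name__ == "__main__":
    argv = megatron_to_hf_argv("results/sft/step_10", "results/sft/step_10-hf")
    # Print a shell-quoted version for inspection before executing it.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues and raises if the conversion fails.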