Commit 60eaacd

committed: update
Signed-off-by: yaoyu-33 <[email protected]>
1 parent 28a4d39 commit 60eaacd

File tree

2 files changed: +38, -27 lines changed


examples/models/vlm/gemma3_vl/README.md

Lines changed: 32 additions & 21 deletions
@@ -21,6 +21,36 @@ See the [conversion.sh](conversion.sh) script for commands to:
 - Export Megatron checkpoints back to Hugging Face format
 - Run multi-GPU round-trip validation between formats
 
+
+## Inference
+
+See the [inference.sh](inference.sh) script for commands to:
+- Run inference with Hugging Face checkpoints
+- Run inference with imported Megatron checkpoints
+- Run inference with exported Hugging Face checkpoints
+
+**Expected output:**
+```
+...
+Generation step 46
+Generation step 47
+Generation step 48
+Generation step 49
+======== GENERATED TEXT OUTPUT ========
+Image: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png
+Prompt: Describe this image.
+Generated: <bos><bos><start_of_turn>user
+...
+Describe this image.<end_of_turn>
+<start_of_turn>model
+Here's a description of the image you sent, breaking down the technical specifications of the H100 SXM and H100 NVL server cards:
+
+**Overall:**
+
+The image is a table comparing the technical specifications of two
+=======================================
+```
+
 ## Pretrain
 
 Pretraining is not verified for this model.
@@ -37,25 +67,6 @@ See the [peft.sh](peft.sh) script for LoRA fine-tuning with configurable tensor
 
 [W&B Report](TODO)
 
-## Inference
-
-See the [inference.sh](inference.sh) script for commands to:
-- Run inference with Hugging Face checkpoints
-- Run inference with imported Megatron checkpoints
-- Run inference with exported Hugging Face checkpoints
-
-**Example output:**
-```
-Describe this image.<end_of_turn>
-<start_of_turn>model
-Here's a description of the image you sent, breaking down the technical specifications of the H100 SXM and H100 NVL server cards:
-
-**Overall:**
+## Evaluation
 
-The image is a table comparing the technical specifications of two NVIDIA server cards: the H100 SXM and the H100 NVL. It's designed to highlight the performance differences between the two cards, particularly in terms of compute power and memory.
-
-**Column Breakdown:**
-
-*
-=======================================
-```
+TBD
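The `<start_of_turn>`/`<end_of_turn>` markers in the expected output above come from Gemma's chat turn format, which the generation script applies to the prompt before decoding. A minimal hand-rolled sketch of that single-turn layout (an illustration only, not the tokenizer's actual chat template):

```python
def gemma_single_turn(user_text: str) -> str:
    """Lay out one user turn in the Gemma chat format visible in the
    expected output: BOS, the user turn, then the model-turn opener."""
    return (
        "<bos><start_of_turn>user\n"
        f"{user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(gemma_single_turn("Describe this image."))
```

In practice the tokenizer's chat template builds this string; the sketch only shows why the transcript ends with `<start_of_turn>model` right before the generated description begins.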

examples/models/vlm/gemma3_vl/inference.sh

Lines changed: 6 additions & 6 deletions
@@ -2,29 +2,29 @@
 WORKSPACE=${WORKSPACE:-/workspace}
 
 # Inference with Hugging Face checkpoints
-uv run torchrun --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
   --hf_model_path google/gemma-3-4b-it \
   --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
   --prompt "Describe this image." \
-  --max_new_tokens 100 \
+  --max_new_tokens 50 \
   --tp 2 \
   --pp 2
 
 # Inference with imported Megatron checkpoints
-uv run torchrun --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
   --hf_model_path google/gemma-3-4b-it \
   --megatron_model_path ${WORKSPACE}/models/gemma-3-4b-it/iter_0000000 \
   --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
   --prompt "Describe this image." \
-  --max_new_tokens 100 \
+  --max_new_tokens 50 \
   --tp 2 \
   --pp 2
 
 # Inference with exported HF checkpoints
-uv run torchrun --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
+uv run python -m torch.distributed.run --nproc_per_node=4 examples/conversion/hf_to_megatron_generate_vlm.py \
   --hf_model_path ${WORKSPACE}/models/gemma-3-4b-it-hf-export \
   --image_path "https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/resolve/main/images/table.png" \
   --prompt "Describe this image." \
-  --max_new_tokens 100 \
+  --max_new_tokens 50 \
   --tp 2 \
   --pp 2
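Two details of the script above are worth unpacking: `WORKSPACE=${WORKSPACE:-/workspace}` uses POSIX default expansion, so an existing value wins over the `/workspace` fallback, and the launcher change is cosmetic, since `torchrun` is the console-script entry point for `torch.distributed.run`. A small sketch of the expansion behavior (with a placeholder `train.py` in the commented launcher lines):

```shell
#!/bin/sh
# Default expansion: the fallback applies only when the variable is unset or empty.
unset WORKSPACE
echo "${WORKSPACE:-/workspace}"   # -> /workspace

WORKSPACE=/scratch
echo "${WORKSPACE:-/workspace}"   # -> /scratch

# Equivalent launcher forms (not executed here; both start the same
# elastic runner with 4 processes on this node):
# torchrun --nproc_per_node=4 train.py
# python -m torch.distributed.run --nproc_per_node=4 train.py
```

Note that `tp 2 × pp 2 = 4` model-parallel ranks, which is why every command pairs `--tp 2 --pp 2` with `--nproc_per_node=4`.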
