Feat: Eagle3 HF Online - support nemotron models #463
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
```
    device = self.model.layers[-1].self_attn.q_proj.weight.device
elif hasattr(self.model.layers[-1].self_attn, "qkv_proj"):
    device = self.model.layers[-1].self_attn.qkv_proj.weight.device
self.eagle_module.to(self.dtype).to(device)
```
TODO: confirm this device detection with @yeyu-nvidia
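For context, a minimal self-contained sketch of the detection logic quoted above; the helper name is an assumption, and it simply mirrors the q_proj / qkv_proj fallback from the diff.

```python
import torch
import torch.nn as nn


def detect_last_layer_device(layers: nn.ModuleList) -> torch.device:
    """Return the device of the last decoder layer's attention projection.

    Llama-style attention exposes separate q/k/v projections (q_proj), while
    fused implementations expose a single qkv_proj, so try both.
    """
    attn = layers[-1].self_attn
    if hasattr(attn, "q_proj"):
        return attn.q_proj.weight.device
    if hasattr(attn, "qkv_proj"):
        return attn.qkv_proj.weight.device
    raise AttributeError("Unrecognized attention projection layout")
```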
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #463   +/-   ##
=======================================
  Coverage   73.38%   73.38%
=======================================
  Files         180      180
  Lines       18110    18110
=======================================
  Hits        13290    13290
  Misses       4820     4820
```
```
input_ids = output.input_ids[0]
attention_mask = output.attention_mask[0]
loss_mask = torch.ones_like(input_ids)
labels = torch.full_like(input_ids, IGNORE_TOKEN_ID)
```
So all labels are IGNORE_TOKEN_ID?
Yes. This is aligned with previous behavior:
```
labels = torch.full_like(input_ids, IGNORE_TOKEN_ID)
```
but previously we would update labels here:
```
labels[indices] = input_ids[indices]
```
I see. Do you think we will use labels in the future? Otherwise we can get rid of loss_mask and labels and simplify data loading.
Labels should only be useful when we want to tune the base model. They're currently not used in our code.
We will need labels (even if they're dummy) because the HF Trainer needs them for training. We definitely need loss_mask to exclude padded tokens. I would say we keep them.
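To make that concrete, here is a minimal sketch of the target construction being discussed, assuming IGNORE_TOKEN_ID = -100 (the index HF losses ignore by default) and a hypothetical helper name; the explicit padding exclusion is written out here for illustration.

```python
import torch

IGNORE_TOKEN_ID = -100  # assumed value; HF cross-entropy ignores this index


def build_targets(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    """Hypothetical helper mirroring the preprocessing in the hunk above."""
    # Dummy labels: every position is ignored, since the base model is not tuned
    # and the HF Trainer only needs a `labels` field to drive the training loop.
    labels = torch.full_like(input_ids, IGNORE_TOKEN_ID)
    # loss_mask marks positions that count toward the draft-head loss; padded
    # positions (attention_mask == 0) are excluded, per the discussion above.
    loss_mask = torch.ones_like(input_ids) * attention_mask
    return labels, loss_mask
```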
```
    return ret


class OfflineSupervisedDataset(Dataset):
```
Does this support VLM data?
Not yet. Will add that to this PR later.
```
if wandb and is_master():
    wandb.init()


def on_log(self, args, state, control, **kwargs):
```
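For reference, a self-contained sketch of a logging callback along these lines; the is_master helper and the callback class name are assumptions, not the PR's actual code.

```python
import torch.distributed as dist
from transformers import TrainerCallback

try:
    import wandb
except ImportError:
    wandb = None


def is_master() -> bool:
    """Assumed helper: True on rank 0, or when not running distributed."""
    return not dist.is_initialized() or dist.get_rank() == 0


class DraftMetricsCallback(TrainerCallback):
    """Logs trainer metrics to Weights & Biases from the master process only."""

    def __init__(self):
        if wandb and is_master():
            wandb.init()  # create a single run on the master process

    def on_log(self, args, state, control, **kwargs):
        logs = kwargs.get("logs") or {}
        if wandb and is_master():
            wandb.log(logs, step=state.global_step)
```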
Can you explain how you estimate AR? I'm not sure it's a good idea to expose "estimated AR" as it may mislead users.
This is calculated as 1 + step_1_acc + step_1_acc*step_2_acc + step_1_acc*step_2_acc*step_3_acc.
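In code, that formula looks roughly like the sketch below; the function name and example numbers are made up.

```python
def estimated_ar(step_accs: list[float]) -> float:
    """Estimated acceptance rate from per-step acceptance accuracies."""
    ar, running = 1.0, 1.0
    for acc in step_accs:
        running *= acc  # probability that all draft steps up to this one are accepted
        ar += running   # expected additional accepted tokens contributed by this step
    return ar


# Example with three draft steps:
# 1 + 0.8 + 0.8*0.7 + 0.8*0.7*0.6 = 2.696
print(estimated_ar([0.8, 0.7, 0.6]))
```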
I think we have discussed this and agreed not to use estimated AR. We can either just use acc or run real AR validation.
```
    metadata={"help": "Path to the d2t cache directory."},
)
vlm_img_dir: str = field(default=None, metadata={"help": "Path to the VLM image directory."})
vlm_processor: str = field(default=None, metadata={"help": "Path to the VLM processor."})
```
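For context, a hedged sketch of how these fields plug into argument parsing; the dataclass name and the script name in the comment are assumptions.

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class DataArguments:
    """Hypothetical subset of the training arguments shown in the diff above."""

    vlm_img_dir: str = field(default=None, metadata={"help": "Path to the VLM image directory."})
    vlm_processor: str = field(default=None, metadata={"help": "Path to the VLM processor."})


if __name__ == "__main__":
    # e.g. python train.py --vlm_processor <hf_model_path> --vlm_img_dir <path to images>
    (data_args,) = HfArgumentParser(DataArguments).parse_args_into_dataclasses()
    print(data_args.vlm_processor, data_args.vlm_img_dir)
```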
What is the VLM processor?
It's the processor that nano-vl uses to preprocess the image and text into tokens. It is defined here:
https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16/blob/main/processing.py#L43
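Roughly, it is loaded and used like a standard HF processor; the snippet below is a sketch under that assumption, with the prompt text and image path as placeholders.

```python
from PIL import Image
from transformers import AutoProcessor

# trust_remote_code is needed because the processor class lives in the model repo.
processor = AutoProcessor.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16", trust_remote_code=True
)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text="Describe this image.", images=image, return_tensors="pt")
print(inputs["input_ids"].shape)
```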
```
for param in self.model.embed_tokens.parameters():
# find base model, lm head, and embeddings paths
self._find_base_model_parts()
self.eagle_module.to(self._base_model.dtype).to(self._base_model_lm_head.weight.device)
```
Need to check whether PTQ/inference fails. We want to make sure eagle_module's device is the same as the last base-model decoder layer's device, but that is not necessarily the same as lm_head.device.
Can you confirm this?
I see the idea. I think it makes sense to put the eagle module on the last layer's device.
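Something along these lines would do that; it is a sketch only, and the helper name is an assumption.

```python
import torch
import torch.nn as nn


def move_eagle_to_last_layer(
    base_layers: nn.ModuleList, eagle_module: nn.Module, dtype: torch.dtype
) -> nn.Module:
    """Place the draft (eagle) module on the device of the last base decoder layer.

    Under multi-GPU device maps this device can differ from lm_head's device,
    which is the concern raised above.
    """
    last_layer_device = next(base_layers[-1].parameters()).device
    return eagle_module.to(dtype).to(last_layer_device)
```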
```
dtypemin = torch.finfo(self._base_llm_config.dtype).min
q_len = seq_length
kv_len = seq_length * (2 + ttt_step)
```
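A minimal sketch of the shape arithmetic implied by this snippet; the sequence length and dtype below are made up for illustration.

```python
import torch

seq_length, dtype = 8, torch.bfloat16
dtype_min = torch.finfo(dtype).min

for ttt_step in range(3):
    q_len = seq_length
    kv_len = seq_length * (2 + ttt_step)
    # Additive attention mask initialized to fully masked; the real code would
    # then unmask the positions each query is allowed to attend to.
    attn_mask = torch.full((q_len, kv_len), dtype_min, dtype=dtype)
    print(ttt_step, tuple(attn_mask.shape))  # step 0 -> (8, 16), step 1 -> (8, 24), step 2 -> (8, 32)
```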
Why 2 + ttt_step?
At the 0th ttt step, we have kv_len = 2 * seq_len.
Why is that?
What does this PR do?
Type of change: New feature
Overview:
- Find base model, lm_head, and embeddings to adapt to different base model naming structures;
- fall back to sdpa in case flex_attn doesn't work;
- use BlockMask for flex_attn, or tensor masks for regular attn.

Usage
For a VLM base model, pass the extra arguments --vlm_processor <hf_model_path> --vlm_img_dir <path to images> in the original launch command. Other usage is unchanged. E.g.
Testing
Tested short training with HF Online training on the following models:
- llama-3.2-1b (data: daring-anteater)

Saw loss decreasing and AR > 1.
Before your PR is "Ready for review"
Additional Information