
Conversation

h-guo18
Contributor

@h-guo18 h-guo18 commented Oct 1, 2025

What does this PR do?

Type of change: Bug fix

Overview:
This PR contains two minor fixes to support GPT-OSS eagle training:

  • Added head_dim to the default eagle config to prevent Llama from inferring head_dim as hidden_size / num_heads. That inference yields the wrong head dim for models like GPT-OSS, where hidden_size != num_heads * head_dim (see the sketch after this list).
  • Refactored eagle checkpoint export to avoid passing the speculative decoding model to _export_hf_checkpoint, which triggers errors for offline-training checkpoints.
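
For illustration, here is a minimal sketch of the inference fallback the first fix guards against; the helper below and the GPT-OSS-style shapes are assumptions for illustration, not the actual Hugging Face code:

    def resolve_head_dim(config: dict) -> int:
        # Hypothetical fallback: if head_dim is absent, infer it as
        # hidden_size / num_attention_heads, which is what the fix prevents.
        if "head_dim" in config:
            return config["head_dim"]
        return config["hidden_size"] // config["num_attention_heads"]

    # GPT-OSS-style geometry (illustrative values): hidden_size is not
    # num_heads * head_dim, so the inferred value is wrong.
    cfg = {"hidden_size": 2880, "num_attention_heads": 64}
    print(resolve_head_dim(cfg))                      # 45: inferred, wrong
    print(resolve_head_dim({**cfg, "head_dim": 64}))  # 64: explicit, correct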

Usage

Not changed.

Testing

Tested gpt-oss-120b with offline training and export, and evaluated the exported checkpoint on spec-bench.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added a dedicated speculative-decoding export path that writes safetensors and an updated config for the “eagle” mode, plus public export helpers for speculative checkpoints.
  • Documentation

    • Clarified evaluation support: in-framework evaluation is only for online training; offline-training checkpoints must be evaluated via serving frameworks.
  • Refactor

    • Removed legacy speculative-decoding post-processing and enforced a single supported “eagle” mode for simpler, more predictable exports.
  • Config

    • Default speculative-decoding config updated: head_dim set to 128.

Signed-off-by: h-guo18 <[email protected]>

copy-pr-bot bot commented Oct 1, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


coderabbitai bot commented Oct 1, 2025

Walkthrough

Adds an eagle-only speculative-decoding export path with new spec_opt_only gating and dedicated exporters for the state_dict and config; export_hf_checkpoint now early-returns for speculative-only models, writing model.safetensors and config.json; removes the prior spec-decoding rename/prune and config-adjust hooks; adds head_dim: 128 to the default eagle config; updates README guidance on evaluation.

Changes

  • Documentation (examples/speculative_decoding/README.md): Clarifies evaluation support: in-framework evaluation is only for online training; offline-training checkpoints must be evaluated via serving frameworks.
  • Speculative export plugin API (modelopt/torch/export/plugins/hf_spec_export.py): Adds spec_opt_only, export_spec_ckpt_state_dict, and export_spec_ckpt_config; removes rename_and_prune_if_spec_decoding and set_config_if_spec_decoding; enforces eagle-only assertions and gates the existing key-mapping/state_dict transforms behind the new exporters.
  • Unified HF export flow (modelopt/torch/export/unified_export_hf.py): Adds an early-exit speculative path: if spec_opt_only(model), write model.safetensors via safetensors.save_file and config.json via export_spec_ckpt_config, then return; removes the prior speculative post-processing from the normal export path.
  • Eagle default config (modelopt/torch/speculative/eagle/default_config.py): Adds head_dim: 128 to default_eagle_config.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Caller
  participant E as export_hf_checkpoint
  participant P as hf_spec_export (plugin)
  participant S as safetensors.save_file
  participant W as _export_hf_checkpoint

  U->>E: export_hf_checkpoint(model, out_dir)
  E->>P: spec_opt_only(model)
  alt Speculative-only (eagle)
    E->>P: export_spec_ckpt_state_dict(model)
    P-->>E: state_dict
    E->>S: save_file(state_dict, "out_dir/model.safetensors")
    E->>P: export_spec_ckpt_config(model)
    P-->>E: config_json
    E->>E: write "out_dir/config.json"
    E-->>U: return (early exit)
  else Non-speculative or mixed
    E->>W: _export_hf_checkpoint(model, out_dir)
    W-->>E: standard artifacts
    E-->>U: return
  end

  note over E,P: Prior rename/prune and config-adjust hooks removed from normal path
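Read alongside the diagram, the early-exit path amounts to roughly the following; this is a sketch assembled from the diagram and the plugin functions it names (the import paths and the safetensors.torch.save_file call are assumptions), not the verbatim implementation:

    import json

    from safetensors.torch import save_file

    from modelopt.torch.export.plugins.hf_spec_export import (
        export_spec_ckpt_config,
        export_spec_ckpt_state_dict,
        spec_opt_only,
    )

    def export_hf_checkpoint(model, export_dir):
        # Speculative-only (eagle) models take the dedicated path and return early.
        if spec_opt_only(model):
            save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
            with open(f"{export_dir}/config.json", "w") as file:
                json.dump(export_spec_ckpt_config(model), file, indent=4)
            return
        # Non-speculative or mixed models fall through to the standard flow
        # (_export_hf_checkpoint in the real code).
        ...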

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Thump-thump, I hop and peep with glee,
An eagle-only export now sets me free.
Safetensors saved, config tucked in neat,
Old hooks have hopped away, no repeat.
Head_dim is 128—my whiskers twitch with pride! 🐰✨

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely indicates that this pull request provides fixes to enable GPT-OSS support in the Hugging Face “eagle” speculative decoding workflow, which aligns directly with the two main changes of adding head_dim and refactoring the checkpoint export.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8b5007 and 36222c4.

📒 Files selected for processing (1)
  • modelopt/torch/speculative/eagle/default_config.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/speculative/eagle/default_config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality


@h-guo18 h-guo18 self-assigned this Oct 1, 2025

codecov bot commented Oct 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (cb44c55) to head (36222c4).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #398   +/-   ##
=======================================
  Coverage   73.79%   73.79%           
=======================================
  Files         171      171           
  Lines       17591    17591           
=======================================
  Hits        12982    12982           
  Misses       4609     4609           

☔ View full report in Codecov by Sentry.

@h-guo18 h-guo18 requested a review from yeyu-nvidia October 1, 2025 21:21
@h-guo18 h-guo18 marked this pull request as ready for review October 1, 2025 21:21
@h-guo18 h-guo18 requested review from a team as code owners October 1, 2025 21:21
@h-guo18 h-guo18 requested a review from cjluo-nv October 1, 2025 21:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
modelopt/torch/export/unified_export_hf.py (1)

512-518: Consider honoring save_modelopt_state parameter and adding error handling.

The early-exit path correctly prevents errors with offline training checkpoints, but consider these improvements:

  1. The save_modelopt_state parameter (line 499) is unused in this path. If users request modelopt state preservation, should it be saved separately?
  2. File write operations lack error handling, unlike the try-except block in the standard export path (lines 520-550).
  3. Consider using Path operations for consistency: export_dir / "model.safetensors" instead of f-strings.

Optional refactor to use Path operations:

     if spec_opt_only(model):
-        save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
-        with open(f"{export_dir}/config.json", "w") as file:
+        save_file(export_spec_ckpt_state_dict(model), export_dir / "model.safetensors")
+        with open(export_dir / "config.json", "w") as file:
             json.dump(export_spec_ckpt_config(model), file, indent=4)
         return
modelopt/torch/export/plugins/hf_spec_export.py (3)

77-79: Track the TODO for cleaner lm_head.weight handling.

The temporary fix for handling missing eagle_lm_head.weight works but should be addressed. The fallback to model.state_dict()["lm_head.weight"] could fail if the key doesn't exist in the base model either.

Do you want me to open a new issue to track this technical debt?


141-141: Fix typo in comment.

Minor typo: "load fron eagle config" should be "load from eagle config".

-            # First, we try to load fron eagle config.
+            # First, we try to load from eagle config.

62-62: Consider more descriptive assertion message.

The assertion message "Not purely eagle model." could be more helpful for debugging. Consider providing information about what optimization modes were found.

-    assert spec_opt_only(model), "Not purely eagle model."
+    opt_modes = getattr(model, "_modelopt_state", None)
+    assert spec_opt_only(model), (
+        f"Expected purely eagle model but found optimization modes: {opt_modes}. "
+        "This export path only supports models with a single 'eagle' optimization."
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb44c55 and b8b5007.

📒 Files selected for processing (4)
  • examples/speculative_decoding/README.md (1 hunks)
  • modelopt/torch/export/plugins/hf_spec_export.py (2 hunks)
  • modelopt/torch/export/unified_export_hf.py (3 hunks)
  • modelopt/torch/speculative/eagle/default_config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/export/unified_export_hf.py (1)
modelopt/torch/export/plugins/hf_spec_export.py (3)
  • export_spec_ckpt_config (84-148)
  • export_spec_ckpt_state_dict (59-81)
  • spec_opt_only (51-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
🔇 Additional comments (1)
modelopt/torch/speculative/eagle/default_config.py (1)

50-50: Verify head_dim in eagle default_config
In modelopt/torch/speculative/eagle/default_config.py (line 50), head_dim is set to 64. Confirm that this matches hidden_size / num_attention_heads in the same file (or document why it intentionally differs) to avoid silent mis-inference.
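
A minimal sketch of the consistency check being asked for here; the config keys and the example values are assumptions based on how the eagle default config is described above:

    def check_head_dim(cfg: dict) -> None:
        # Warn when an explicit head_dim diverges from the Llama-style
        # inferred value (hidden_size / num_attention_heads).
        inferred = cfg["hidden_size"] // cfg["num_attention_heads"]
        if cfg.get("head_dim", inferred) != inferred:
            print(
                f"head_dim={cfg['head_dim']} differs from inferred {inferred}; "
                "confirm this is intentional (e.g., GPT-OSS-style geometry)."
            )

    check_head_dim({"hidden_size": 2880, "num_attention_heads": 64, "head_dim": 64})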

@h-guo18 h-guo18 requested a review from yeyu-nvidia October 3, 2025 20:43