
Conversation

h-guo18
Contributor

@h-guo18 h-guo18 commented Oct 1, 2025

What does this PR do?

Type of change: Bug fix

Overview:
This PR contains two minor fixes to support GPT-OSS eagle training:

  • Added head_dim to the default eagle config to prevent Llama from inferring head_dim as hidden_size / num_heads. That inference yields the wrong head dim for models like GPT-OSS, where hidden_size != num_heads * head_dim (see the sketch after this list).
  • Refactored eagle checkpoint export to avoid passing the speculative decoding model to _export_hf_checkpoint, which triggers errors for offline-training checkpoints.
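
For illustration, here is a minimal sketch of the inference fallback the first fix guards against; the helper below and the GPT-OSS-style shapes are assumptions for illustration, not the actual Hugging Face code:

    def resolve_head_dim(config: dict) -> int:
        # Hypothetical fallback: if head_dim is absent, infer it as
        # hidden_size / num_attention_heads, which is what the fix prevents.
        if "head_dim" in config:
            return config["head_dim"]
        return config["hidden_size"] // config["num_attention_heads"]

    # GPT-OSS-style geometry (illustrative values): hidden_size is not
    # num_heads * head_dim, so the inferred value is wrong.
    cfg = {"hidden_size": 2880, "num_attention_heads": 64}
    print(resolve_head_dim(cfg))                      # 45: inferred, wrong
    print(resolve_head_dim({**cfg, "head_dim": 64}))  # 64: explicit, correct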

Usage

Not changed.

Testing

Tested gpt-oss-120b with offline training and export, and evaluated the exported checkpoint on spec-bench.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added a dedicated speculative-decoding export path that writes safetensors and an updated config for the “eagle” mode, plus public export helpers for speculative checkpoints.
  • Documentation

    • Clarified evaluation support: in-framework evaluation is only for online training; offline-training checkpoints must be evaluated via serving frameworks.
  • Refactor

    • Removed legacy speculative-decoding post-processing and enforced a single supported “eagle” mode for simpler, more predictable exports.
  • Config

    • Default speculative-decoding config updated: head_dim set to 128.

Signed-off-by: h-guo18 <[email protected]>

copy-pr-bot bot commented Oct 1, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


coderabbitai bot commented Oct 1, 2025

Walkthrough

Adds an eagle-only speculative-decoding export path with new spec_opt_only gating and dedicated exporters for the state_dict and config; export_hf_checkpoint now early-returns for speculative-only models, writing model.safetensors and config.json; removes the prior spec-decoding rename/prune and config-adjust hooks; adds head_dim: 128 to the default eagle config; updates README guidance on evaluation.

Changes

  • Documentation (examples/speculative_decoding/README.md): Clarifies evaluation support: in-framework evaluation is only for online training; offline-training checkpoints must be evaluated via serving frameworks.
  • Speculative export plugin API (modelopt/torch/export/plugins/hf_spec_export.py): Adds spec_opt_only, export_spec_ckpt_state_dict, and export_spec_ckpt_config; removes rename_and_prune_if_spec_decoding and set_config_if_spec_decoding; enforces eagle-only assertions and gates the existing key-mapping/state_dict transforms behind the new exporters.
  • Unified HF export flow (modelopt/torch/export/unified_export_hf.py): Adds an early-exit speculative path: if spec_opt_only(model), write model.safetensors via safetensors.save_file and config.json via export_spec_ckpt_config, then return; removes the prior speculative post-processing from the normal export path.
  • Eagle default config (modelopt/torch/speculative/eagle/default_config.py): Adds head_dim: 128 to default_eagle_config.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Caller
  participant E as export_hf_checkpoint
  participant P as hf_spec_export (plugin)
  participant S as safetensors.save_file
  participant W as _export_hf_checkpoint

  U->>E: export_hf_checkpoint(model, out_dir)
  E->>P: spec_opt_only(model)
  alt Speculative-only (eagle)
    E->>P: export_spec_ckpt_state_dict(model)
    P-->>E: state_dict
    E->>S: save_file(state_dict, "out_dir/model.safetensors")
    E->>P: export_spec_ckpt_config(model)
    P-->>E: config_json
    E->>E: write "out_dir/config.json"
    E-->>U: return (early exit)
  else Non-speculative or mixed
    E->>W: _export_hf_checkpoint(model, out_dir)
    W-->>E: standard artifacts
    E-->>U: return
  end

  note over E,P: Prior rename/prune and config-adjust hooks removed from normal path
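Read alongside the diagram, the early-exit path amounts to roughly the following; this is a sketch assembled from the diagram and the plugin functions it names (the import paths and the safetensors.torch.save_file call are assumptions), not the verbatim implementation:

    import json

    from safetensors.torch import save_file

    from modelopt.torch.export.plugins.hf_spec_export import (
        export_spec_ckpt_config,
        export_spec_ckpt_state_dict,
        spec_opt_only,
    )

    def export_hf_checkpoint(model, export_dir):
        # Speculative-only (eagle) models take the dedicated path and return early.
        if spec_opt_only(model):
            save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
            with open(f"{export_dir}/config.json", "w") as file:
                json.dump(export_spec_ckpt_config(model), file, indent=4)
            return
        # Non-speculative or mixed models fall through to the standard flow
        # (_export_hf_checkpoint in the real code).
        ...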

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Thump-thump, I hop and peep with glee,
An eagle-only export now sets me free.
Safetensors saved, config tucked in neat,
Old hooks have hopped away, no repeat.
Head_dim is 128—my whiskers twitch with pride! 🐰✨

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely indicates that this pull request provides fixes to enable GPT-OSS support in the Hugging Face “eagle” speculative decoding workflow, which aligns directly with the two main changes of adding head_dim and refactoring the checkpoint export.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8b5007 and 36222c4.

📒 Files selected for processing (1)
  • modelopt/torch/speculative/eagle/default_config.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/speculative/eagle/default_config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality


@h-guo18 h-guo18 self-assigned this Oct 1, 2025

codecov bot commented Oct 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (cb44c55) to head (36222c4).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #398   +/-   ##
=======================================
  Coverage   73.79%   73.79%           
=======================================
  Files         171      171           
  Lines       17591    17591           
=======================================
  Hits        12982    12982           
  Misses       4609     4609           

☔ View full report in Codecov by Sentry.

@h-guo18 h-guo18 requested a review from yeyu-nvidia October 1, 2025 21:21
@h-guo18 h-guo18 marked this pull request as ready for review October 1, 2025 21:21
@h-guo18 h-guo18 requested review from a team as code owners October 1, 2025 21:21
@h-guo18 h-guo18 requested a review from cjluo-nv October 1, 2025 21:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
modelopt/torch/export/unified_export_hf.py (1)

512-518: Consider honoring save_modelopt_state parameter and adding error handling.

The early-exit path correctly prevents errors with offline training checkpoints, but consider these improvements:

  1. The save_modelopt_state parameter (line 499) is unused in this path. If users request modelopt state preservation, should it be saved separately?
  2. File write operations lack error handling, unlike the try-except block in the standard export path (lines 520-550).
  3. Consider using Path operations for consistency: export_dir / "model.safetensors" instead of f-strings.

Optional refactor to use Path operations:

     if spec_opt_only(model):
-        save_file(export_spec_ckpt_state_dict(model), f"{export_dir}/model.safetensors")
-        with open(f"{export_dir}/config.json", "w") as file:
+        save_file(export_spec_ckpt_state_dict(model), export_dir / "model.safetensors")
+        with open(export_dir / "config.json", "w") as file:
             json.dump(export_spec_ckpt_config(model), file, indent=4)
         return
modelopt/torch/export/plugins/hf_spec_export.py (3)

77-79: Track the TODO for cleaner lm_head.weight handling.

The temporary fix for handling missing eagle_lm_head.weight works but should be addressed. The fallback to model.state_dict()["lm_head.weight"] could fail if the key doesn't exist in the base model either.

Do you want me to open a new issue to track this technical debt?


141-141: Fix typo in comment.

Minor typo: "load fron eagle config" should be "load from eagle config".

-            # First, we try to load fron eagle config.
+            # First, we try to load from eagle config.

62-62: Consider more descriptive assertion message.

The assertion message "Not purely eagle model." could be more helpful for debugging. Consider providing information about what optimization modes were found.

-    assert spec_opt_only(model), "Not purely eagle model."
+    opt_modes = getattr(model, "_modelopt_state", None)
+    assert spec_opt_only(model), (
+        f"Expected purely eagle model but found optimization modes: {opt_modes}. "
+        "This export path only supports models with a single 'eagle' optimization."
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb44c55 and b8b5007.

📒 Files selected for processing (4)
  • examples/speculative_decoding/README.md (1 hunks)
  • modelopt/torch/export/plugins/hf_spec_export.py (2 hunks)
  • modelopt/torch/export/unified_export_hf.py (3 hunks)
  • modelopt/torch/speculative/eagle/default_config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/export/unified_export_hf.py (1)
modelopt/torch/export/plugins/hf_spec_export.py (3)
  • export_spec_ckpt_config (84-148)
  • export_spec_ckpt_state_dict (59-81)
  • spec_opt_only (51-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
🔇 Additional comments (1)
modelopt/torch/speculative/eagle/default_config.py (1)

50-50: Verify head_dim in eagle default_config
In modelopt/torch/speculative/eagle/default_config.py (line 50), head_dim is set to 64. Confirm that this matches hidden_size / num_attention_heads in the same file (or document why it intentionally differs) to avoid silent mis-inference.
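
A minimal sketch of the consistency check being asked for here; the config keys and the example values are assumptions based on how the eagle default config is described above:

    def check_head_dim(cfg: dict) -> None:
        # Warn when an explicit head_dim diverges from the Llama-style
        # inferred value (hidden_size / num_attention_heads).
        inferred = cfg["hidden_size"] // cfg["num_attention_heads"]
        if cfg.get("head_dim", inferred) != inferred:
            print(
                f"head_dim={cfg['head_dim']} differs from inferred {inferred}; "
                "confirm this is intentional (e.g., GPT-OSS-style geometry)."
            )

    check_head_dim({"hidden_size": 2880, "num_attention_heads": 64, "head_dim": 64})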

@h-guo18 h-guo18 requested a review from yeyu-nvidia October 3, 2025 20:43