Skip to content

Conversation

h-guo18
Copy link
Contributor

@h-guo18 h-guo18 commented Sep 23, 2025

What does this PR do?

Type of change: Bug fix

Overview:

This PR fix two minor bugs of speculative decoding:

  • Updates tests/examples/speculative_decoding after PR 300;
  • Disabled AR validation during offline training to avoid error;

Usage

# Add a code snippet demonstrating how to use this

Testing

  • Tested with pytest -s tests/examples/speculative_decoding
  • Dummy online/offline training runs without errors;

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • When an offline data path is provided, the training command now also sets --ar_validate_steps to -1 automatically.
  • Tests

    • Test runs updated to invoke the training launcher.
    • Removed explicit --do_eval False flag from test invocations, changing evaluation behavior during those tests.

Copy link

copy-pr-bot bot commented Sep 23, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Copy link

coderabbitai bot commented Sep 23, 2025

Warning

Rate limit exceeded

@h-guo18 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 25 minutes and 43 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c1f38e3 and 98c4de4.

📒 Files selected for processing (3)
  • examples/speculative_decoding/launch_train.sh (1 hunks)
  • tests/examples/speculative_decoding/test_eagle.py (1 hunks)
  • tests/examples/speculative_decoding/test_medusa.py (1 hunks)

Walkthrough

Appends --ar_validate_steps -1 to OFFLINE_TRAINING_ARGS in examples/speculative_decoding/launch_train.sh when offline data is provided. Updates tests to call ./launch_train.sh instead of ./launch.sh and removes the --do_eval False flag.

Changes

Cohort / File(s) Summary of changes
Speculative decoding training script
examples/speculative_decoding/launch_train.sh
When --offline-data-path is used, OFFLINE_TRAINING_ARGS now includes --ar_validate_steps -1 in addition to the path.
Speculative decoding tests
tests/examples/speculative_decoding/test_eagle.py, tests/examples/speculative_decoding/test_medusa.py
Test invocations changed from ./launch.sh to ./launch_train.sh; removed the explicit --do_eval False argument from both test commands.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my whiskers, scripts refined,
A tiny flag, a test aligned.
Hop-hop! the launch path nudges through,
Offline steps told what to do.
Carrots counted, quiet cheer—🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly and accurately summarizes the main changes — updating the eagle test and disabling evaluation for offline runs — which align with the modified test files and the offline-training argument change in launch_train.sh; it is concise, specific, and relevant to the changeset.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@h-guo18 h-guo18 self-assigned this Sep 23, 2025
@h-guo18 h-guo18 marked this pull request as ready for review September 23, 2025 18:19
@h-guo18 h-guo18 requested a review from a team as a code owner September 23, 2025 18:19
@kevalmorabia97
Copy link
Collaborator

Does test_medusa.py pass as well?

@h-guo18 h-guo18 marked this pull request as draft September 23, 2025 18:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/examples/speculative_decoding/test_eagle.py (1)

47-47: Cast tmp_path to str for output_dir to avoid type issues.

Align with how config_file is handled and prevent potential arg serialization issues.

Apply:

-            "--output_dir", tmp_path / "eagle-tinyllama",
+            "--output_dir", str(tmp_path / "eagle-tinyllama"),
examples/speculative_decoding/launch_train.sh (1)

132-132: Quote OFFLINE_DATA_PATH to handle spaces safely.

Prevents word-splitting if the path contains spaces.

Apply:

-    OFFLINE_TRAINING_ARGS="--offline-data-path $OFFLINE_DATA_PATH --ar_validate_steps -1"
+    OFFLINE_TRAINING_ARGS="--offline-data-path \"$OFFLINE_DATA_PATH\" --ar_validate_steps -1"

(Optional follow-up) Consider quoting other path-like vars in CMD (MODEL, OUTPUT_DIR, DATA).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d5f636 and 18b073f.

📒 Files selected for processing (2)
  • examples/speculative_decoding/launch_train.sh (1 hunks)
  • tests/examples/speculative_decoding/test_eagle.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: benchislett
PR: NVIDIA/TensorRT-Model-Optimizer#300
File: examples/speculative_decoding/main.py:150-159
Timestamp: 2025-09-17T14:14:44.961Z
Learning: In examples/speculative_decoding/main.py, setting num_hidden_layers=0 for offline training has been successfully tested by benchislett with past models and works in their use case for EAGLE3 offline training.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (2)
examples/speculative_decoding/launch_train.sh (1)

127-134: Approve — disabling AR validation for offline training is supported

examples/speculative_decoding/main.py declares ar_validate_steps, ARValidationCallback returns early when ar_validate_steps <= 0, and the trainer is constructed with ARValidationCallback(training_args.ar_validate_steps); offline_data_path toggles offline mode. Using --ar_validate_steps -1 is valid.

tests/examples/speculative_decoding/test_eagle.py (1)

37-41: Approved — ./launch_train.sh path and executable verified

run_example_command sets cwd to MODELOPT_ROOT / "examples" / example_path, and examples/speculative_decoding/launch_train.sh exists with git mode 100755 (executable).

@h-guo18 h-guo18 marked this pull request as ready for review September 23, 2025 18:27
Copy link

codecov bot commented Sep 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.46%. Comparing base (115b145) to head (98c4de4).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #358      +/-   ##
==========================================
- Coverage   73.46%   73.46%   -0.01%     
==========================================
  Files         172      172              
  Lines       17640    17640              
==========================================
- Hits        12960    12959       -1     
- Misses       4680     4681       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@h-guo18 h-guo18 enabled auto-merge (squash) September 23, 2025 19:08
@h-guo18 h-guo18 merged commit 26c203a into main Sep 23, 2025
27 checks passed
@h-guo18 h-guo18 deleted the haoguo/eagle3-fix branch September 23, 2025 19:26
yeyu-nvidia pushed a commit that referenced this pull request Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants