fix:eagle3 offline #456
Conversation
Walkthrough

Refactors the speculative decoding examples: adds a disable-tqdm flag, updates the README support matrix, converts TRTLLM hidden-state post-processing to async with validation and cleanup, and strengthens data loading and conversation ID resolution and validation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Collector as Collector
    participant TRT_dump as TRTLLM Dump File
    participant PostProc as _post_process_trtllm_dumped (async)
    participant Output as Conversation Output File
    note over Collector,TRT_dump: Hidden states collection produces a dump file
    Collector->>TRT_dump: write dump (idx N)
    Collector->>PostProc: await post_process(trt_dump, conv_id)
    alt file missing
        PostProc-->>Collector: return False
    else file present
        PostProc->>PostProc: validate structure == [ { id, hidden_state } ]
        PostProc->>Output: write conversation-specific output
        PostProc->>TRT_dump: unlink original dump
        PostProc-->>Collector: return True
    end
    alt success
        Collector->>Collector: increment num_success and progress
    else failure
        Collector->>Collector: skip increment, log/warn
    end
```
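To make the flow concrete, here is a minimal synchronous sketch of the post-processing step the diagram describes. The dump format (`[{"id": ..., "hidden_state": ...}]`), output layout, and helper name are inferred from the diagram and review notes, so treat the details as assumptions rather than the repository's exact code.

```python
# Minimal sketch of the diagrammed post-processing flow; names and the dump
# format are assumptions based on the diagram, not the repo's implementation.
from pathlib import Path

import torch


def post_process_dump(trtllm_dumped_file: Path, conversation_id: int, output_dir: Path) -> bool:
    # "file missing" branch: signal failure so the caller skips num_success.
    if not trtllm_dumped_file.exists():
        return False
    dumped = torch.load(trtllm_dumped_file)
    # Validate structure == [ { id, hidden_state } ].
    assert isinstance(dumped, list) and len(dumped) == 1, "expected a single-entry list"
    assert "id" in dumped[0] and "hidden_state" in dumped[0], "unexpected dump format"
    # Write the conversation-specific output, then unlink the original dump.
    record = {"conversation_id": conversation_id, "hidden_state": dumped[0]["hidden_state"]}
    torch.save(record, output_dir / f"{conversation_id}.pt")
    trtllm_dumped_file.unlink()
    return True
```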
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #456   +/-   ##
=======================================
  Coverage   73.39%   73.39%
=======================================
  Files         180      180
  Lines       17976    17976
=======================================
+ Hits        13193    13194       +1
+ Misses       4783     4782       -1
```

☔ View full report in Codecov by Sentry.
Force-pushed: 4d4c153 → 885e017
Force-pushed: 885e017 → 70d4eee
Signed-off-by: h-guo18 <[email protected]>
Force-pushed: 70d4eee → 313d3b1
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/speculative_decoding/eagle_utils.py (1)
316-316: Remove duplicate log message. The log message "Loading input conversations..." appears twice: once at line 256 (in the offline path) and again here at line 316 (in the non-offline path). Consider removing this duplicate since the message at line 256 already covers the loading step.
Apply this diff to remove the duplicate:
```diff
-    print_rank_0("Loading input conversations...")
     dataset_cls = LazySupervisedDataset if data_args.lazy_preprocess else SupervisedDataset
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- examples/speculative_decoding/README.md (1 hunks)
- examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (3 hunks)
- examples/speculative_decoding/eagle_utils.py (3 hunks)
- examples/speculative_decoding/launch_train.sh (3 hunks)
- examples/speculative_decoding/main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/speculative_decoding/eagle_utils.py (1)
modelopt/torch/utils/logging.py (1)
- print_rank_0 (92-95)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: linux
🔇 Additional comments (12)
examples/speculative_decoding/README.md (1)
315-315: LGTM! Documentation updated to reflect expanded model support. The support matrix now correctly includes Qwen versions 1.5, 2, 2.5, and 3, aligning with the PR's goal of adding Qwen3 model support.
examples/speculative_decoding/main.py (1)

95-95: LGTM! Clean addition of progress bar control. The new `disable_tqdm` field follows the established pattern for training arguments and addresses the PR objective to prevent CI logging overflow.

examples/speculative_decoding/launch_train.sh (3)

81-84: LGTM! Flag parsing follows established pattern. The `--disable_tqdm` flag is parsed consistently with other command-line arguments in the script.

117-117: LGTM! Default value aligns with main.py. The default value of `False` matches the field default in `TrainingArguments`.

173-173: LGTM! Flag correctly forwarded to training command. The `--disable_tqdm` flag is properly injected into the accelerate launch command.
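Taken together, these main.py and launch_train.sh comments describe simple flag plumbing. A hedged sketch of the dataclass side follows; the field shape and help text are assumptions, not the repository's exact code:

```python
# Sketch only (assumed shape): a training-arguments field that launch_train.sh
# can set via --disable_tqdm; the False default matches the review's notes.
from dataclasses import dataclass, field


@dataclass
class TrainingArguments:
    disable_tqdm: bool = field(
        default=False,
        metadata={"help": "Disable tqdm progress bars (avoids CI log overflow)."},
    )
```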
examples/speculative_decoding/eagle_utils.py (3)

63-65: LGTM! Removed hard-coded role alternation check. This change enables more flexible conversation formats, including system prompts, which aligns with the PR's objective to preserve system prompts during dataset preprocessing.

256-269: LGTM! Directory support for multi-file datasets. The enhanced data loading logic now accepts a directory containing multiple `.jsonl` files, which fulfills the PR objective. The implementation correctly handles both single files and directories with appropriate path-based branching, as sketched below.
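A minimal sketch of that branching, with illustrative names (`load_conversations`, `data_path`) that are not the repository's exact API:

```python
# Illustrative sketch of file-or-directory .jsonl loading as described above.
import json
from pathlib import Path


def load_conversations(data_path: str) -> list[dict]:
    path = Path(data_path)
    # Directory: read every .jsonl file inside; otherwise treat as one file.
    files = sorted(path.glob("*.jsonl")) if path.is_dir() else [path]
    records: list[dict] = []
    for f in files:
        with f.open() as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    return records
```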
286-293: LGTM! Robust conversation ID resolution with proper validation. The enhanced ID resolution now tries `conversation_id`, then `uuid`, then `id`, and raises a descriptive `ValueError` if none are found. This aligns with the PR's objective to accept the "uuid" field and ensure an ID is present.
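In code, the described resolution order looks roughly like this (the helper name is hypothetical):

```python
# Hypothetical helper mirroring the described conversation_id -> uuid -> id order.
def resolve_conversation_id(entry: dict):
    for key in ("conversation_id", "uuid", "id"):
        if entry.get(key) is not None:
            return entry[key]
    raise ValueError(
        "Conversation entry has no id; expected one of: conversation_id, uuid, id"
    )
```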
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (4)

211-239: LGTM! Robust async post-processing with validation. The refactored `_post_process_trtllm_dumped` function correctly:

- Returns early if the dump file doesn't exist (lines 219-220)
- Validates the expected format with clear assertions (lines 223-230)
- Transforms the data to match HF format conventions
- Cleans up the temporary dump file (line 238)
The async implementation is appropriate for I/O-heavy operations.
247-248: LGTM! Correct async usage with success tracking. The code properly awaits the async post-processing function and converts the boolean result to an integer for the success counter.
255-274: LGTM! Fixes index mismatch for skipped conversations. The key fix here is initializing `idx = 0` (line 255) and incrementing only for valid conversations (line 274). This ensures that the dump file index aligns with the actual number of processed conversations, resolving the sample index mismatch mentioned in the PR objectives when conversations are skipped.
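In sketch form, the pattern this comment endorses looks like the following; every name here is a placeholder for the script's actual validation and dump logic, not its real code:

```python
# Sketch of the index-tracking pattern: skipped conversations do not consume
# an index, so dump-file numbering stays aligned with processed conversations.
def is_valid(conversations: list[dict]) -> bool:
    # Placeholder for the script's real checks (e.g., max_seq_len validation).
    return len(conversations) > 0


def process(conversations: list[dict], dump_index: int) -> None:
    # Placeholder for the hidden-state dump keyed by dump_index.
    print(f"dumping conversation at index {dump_index}")


def run(dataset: list[dict]) -> None:
    idx = 0
    for entry in dataset:
        conversations = entry["conversations"]
        if not is_valid(conversations):
            continue  # skipped entries do not increment idx
        process(conversations, dump_index=idx)
        idx += 1
```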
264-264: LGTM! Full conversation preservation. Removing the `[:256]` slice ensures that complete conversations are processed, with proper length validation occurring downstream at line 268.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (2)
211-239: Blocking I/O operations in async function will block the event loop. The function is declared `async` but performs synchronous blocking I/O operations (`torch.load`, `torch.save`, `Path.unlink`) without using `await`. This means these operations will block the event loop when the function is awaited, defeating the purpose of async concurrency and potentially degrading performance when processing multiple conversations.

Consider one of these solutions:

Option 1 (simpler): Remove `async` and call it synchronously since post-processing is relatively fast:

```diff
-async def _post_process_trtllm_dumped(trtllm_dumped_file: str, conversation_id: int):
+def _post_process_trtllm_dumped(trtllm_dumped_file: str, conversation_id: int):
     """
     Post-process the TRTLLM dumped file to same format as HF dumped:
     1. Remove id field, replace it with conversation_id
```

Then update the call site:

```diff
-    dump_success = await _post_process_trtllm_dumped(trtllm_dumped_file, conversation_id)
+    dump_success = _post_process_trtllm_dumped(trtllm_dumped_file, conversation_id)
```

Option 2 (proper async I/O): Use `asyncio.to_thread` to run blocking I/O in a thread pool:

```python
async def _post_process_trtllm_dumped(trtllm_dumped_file: str, conversation_id: int):
    return await asyncio.to_thread(
        _post_process_trtllm_dumped_sync, trtllm_dumped_file, conversation_id
    )


def _post_process_trtllm_dumped_sync(trtllm_dumped_file: str, conversation_id: int):
    # existing implementation
    ...
```
152-153: Replace assertion with proper validation in filter function. Using `assert` for data validation in a filter function is risky. If a conversation lacks both `conversation_id` and `uuid` fields, the assertion will raise an exception that could crash the entire filtering operation. Assertions should be used for invariants, not data validation.

```diff
 def keep_conversation(entry):
     conversation_id = entry.get("conversation_id", entry.get("uuid", None))
-    assert conversation_id is not None, "conversation_id is required"
+    if conversation_id is None:
+        print("Warning: Skipping conversation without conversation_id or uuid")
+        return False
     output_file = args.output_dir / f"{conversation_id}.pt"
     return not output_file.exists()
```
🧹 Nitpick comments (2)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (2)
219-220: Consider logging when the TRTLLM dump file is missing. The early existence check is good defensive programming, but silently returning `False` when the file doesn't exist could make debugging difficult. If TRTLLM generation fails to create the expected file, it would be helpful to log this for troubleshooting.

```diff
     if not trtllm_dumped_file.exists():
+        print(f"Warning: TRTLLM dump file not found: {trtllm_dumped_file}")
         return False
```
238-239: Add error handling for file deletion. The `unlink()` operation could raise exceptions (e.g., permission errors, filesystem issues). While unlikely, a failure here shouldn't necessarily fail the entire post-processing since the data was already successfully saved.

```diff
-    trtllm_dumped_file.unlink()
-    return True
+    try:
+        trtllm_dumped_file.unlink()
+    except OSError as e:
+        print(f"Warning: Failed to delete temporary file {trtllm_dumped_file}: {e}")
+    return True
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- examples/speculative_decoding/README.md (1 hunks)
- examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (3 hunks)
- examples/speculative_decoding/eagle_utils.py (3 hunks)
- examples/speculative_decoding/launch_train.sh (3 hunks)
- examples/speculative_decoding/main.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- examples/speculative_decoding/README.md
- examples/speculative_decoding/launch_train.sh
- examples/speculative_decoding/eagle_utils.py
- examples/speculative_decoding/main.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (3)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (3)
241-249: Logic correctly tracks post-processing success. The function properly awaits the asynchronous operations and only increments `num_success` when post-processing succeeds. The file indexing (`idx + 1`) correctly accounts for TRTLLM's 1-based file naming.

Note: The effectiveness of this implementation depends on fixing the blocking I/O issue in `_post_process_trtllm_dumped` (see earlier comment).
255-274: Index tracking fix correctly aligns with TRTLLM dump files. The explicit `idx` counter that only increments for valid conversations (line 274) is the key fix mentioned in the PR objectives. This ensures the dump file index matches the actual conversations processed, preventing index mismatches when some conversations are skipped.

The removal of the `[:256]` slice on line 264 is intentional and allows processing full conversations up to `max_seq_len`.

Optional defensive improvement: Add a safety check for `conversation_id` even though the earlier filter should have removed None values:

```diff
 for entry in dataset:
     conversation_id = entry.get("conversation_id", entry.get("uuid"))
+    if conversation_id is None:
+        num_invalid += 1
+        continue
     conversations = entry["conversations"]
```
264-267: Full conversation processing is correctly implemented. Removing the `[:256]` slice allows the script to process complete conversations up to `max_seq_len`, as intended by the PR. The subsequent length validation (lines 268-270) ensures conversations stay within bounds.
What does this PR do?
Type of change: Bug fix
Overview:
A few minor fixes for eagle3 offline:
- Accept `uuid` as the conversation id field and assert there is a conversation id; (Same as above)

Usage
No change;
Testing
- `Qwen3-30B-A3B` for training, export, and eval;

Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes
Documentation