
Conversation


@h-guo18 h-guo18 commented Oct 21, 2025

What does this PR do?

Type of change: Bug fix

Overview:

A few minor fixes for eagle3 offline:

  • Fixed a sample index mismatch when dumping hidden states with TRTLLM when some conversations are skipped.
  • Keep the system prompt when preprocessing the dataset.
  • Accept a dataset from a directory of multiple JSONL files (to support data synthesized from here).
  • Accept the conversation ID from the uuid field and assert that a conversation ID is present (same as above).
  • Optionally disable the training progress bar to avoid CI logging overflow.
  • Add Qwen3 to the support matrix.
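The conversation-ID handling described above can be sketched as a small fallback lookup. This is a hypothetical helper (the name `resolve_conversation_id` is not from the PR); the actual code lives in `eagle_utils.py` and may differ in details:

```python
def resolve_conversation_id(entry: dict):
    """Return the conversation ID, trying `conversation_id` first, then `uuid`.

    Raises ValueError when neither field is present, mirroring the PR's
    requirement that every conversation carries an ID.
    """
    for key in ("conversation_id", "uuid"):
        if key in entry and entry[key] is not None:
            return entry[key]
    raise ValueError("Each conversation must carry a `conversation_id` or `uuid` field.")
```

Failing fast here keeps later per-conversation output files (named after the ID) from silently colliding.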

Usage

No change.

Testing

  • Offline/online dummy training tested.
  • Tested Qwen3-30B-A3B for training, export, and eval.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for Qwen model versions 1.5, 2, 2.5, and 3
    • Added a --disable_tqdm option to turn off the training progress bar
    • Data loading can now ingest entire directories of JSONL files
  • Bug Fixes

    • More robust handling and validation for missing data, conversation IDs, and hidden-state dumps
    • Removed strict role-order enforcement during data preprocessing
  • Documentation

    • Updated Support Matrix to include new Qwen versions

@h-guo18 h-guo18 self-assigned this Oct 21, 2025

copy-pr-bot bot commented Oct 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Oct 21, 2025

Walkthrough

Refactors speculative decoding examples: adds a disable-tqdm flag, updates README support matrix, converts TRTLLM hidden-state post-processing to async with validation and cleanup, and strengthens data-loading and conversation ID resolution and validation.

Changes

Cohort / File(s) Summary
tqdm Progress Bar Control
examples/speculative_decoding/launch_train.sh, examples/speculative_decoding/main.py
Added --disable_tqdm CLI flag parsing in the launch script and a disable_tqdm: bool = False field on TrainingArguments; flag is forwarded into the accelerator command.
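A minimal sketch of the new field, assuming the `TrainingArguments` in `main.py` is a dataclass (the surrounding fields shown here are placeholders, not the real ones):

```python
from dataclasses import dataclass


@dataclass
class TrainingArguments:
    """Hypothetical minimal subset of the training arguments in main.py."""

    output_dir: str = "out"  # placeholder field for illustration
    disable_tqdm: bool = False  # new flag: silences the progress bar to keep CI logs small
```

With the default of `False`, existing invocations keep their progress bar; CI pipelines pass `--disable_tqdm True` (forwarded by `launch_train.sh` into the accelerate command) to suppress it.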
Documentation
examples/speculative_decoding/README.md
Updated Support Matrix to list Qwen versions 1.5, 2, 2.5, and 3.
Hidden States Post-processing
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py
Made _post_process_trtllm_dumped asynchronous and return a bool. Added an existence check and strict validation expecting a single-element list containing a dict with id and hidden_state; the function now writes per-conversation output and unlinks the original dump on success, and the caller awaits the result and increments the success count only on True. Removed input-id slicing and aligned dump indexing with valid conversations.
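The validation and ID rewrite described above can be sketched as a pure helper operating on an already-loaded dump. This is an illustrative function (`post_process_dump` is a made-up name; the real code also handles file I/O and cleanup):

```python
def post_process_dump(dump: list, conversation_id: int) -> dict:
    """Validate a TRTLLM dump and rewrite its ID field to match the HF format.

    The dump must be a single-element list whose dict carries `id` and
    `hidden_state`; the `id` field is replaced by the conversation ID.
    """
    assert isinstance(dump, list) and len(dump) == 1, "expected a single-element list"
    record = dump[0]
    assert "id" in record and "hidden_state" in record, "missing required keys"
    out = dict(record)  # copy so the caller's dump is untouched
    out.pop("id")
    out["conversation_id"] = conversation_id
    return out
```

In the actual pipeline the result would then be saved as `{conversation_id}.pt` and the temporary TRTLLM dump deleted.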
Data Loading & Validation
examples/speculative_decoding/eagle_utils.py
Removed the enforced alternating-role assumption in preprocessing; allow loading all .jsonl files from a directory; resolve conversation IDs from conversation_id, then uuid, then id, raising ValueError if none is present; added logging and validation for offline_data_path and .pt presence, with descriptive error/warning handling.
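The directory-aware loading can be sketched as follows. This is a hypothetical `load_jsonl` helper, not the exact function in `eagle_utils.py`; it only shows the single-file-or-directory branching described above:

```python
import json
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Load records from a .jsonl file, or from every .jsonl file in a directory."""
    p = Path(path)
    # A directory yields all of its .jsonl files, sorted for determinism;
    # a plain file path is loaded as-is.
    files = sorted(p.glob("*.jsonl")) if p.is_dir() else [p]
    records: list[dict] = []
    for f in files:
        with f.open() as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    return records
```

Sorting the glob results keeps sample order stable across runs, which matters when dump indices must line up with conversations.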

Sequence Diagram(s)

sequenceDiagram
  participant Collector as Collector
  participant TRT_dump as TRTLLM Dump File
  participant PostProc as _post_process_trtllm_dumped (async)
  participant Output as Conversation Output File
  note over Collector,TRT_dump: Hidden states collection produces a dump file

  Collector->>TRT_dump: write dump (idx N)
  Collector->>PostProc: await post_process(trt_dump, conv_id)
  alt file missing
    PostProc-->>Collector: return False
  else file present
    PostProc->>PostProc: validate structure == [ { id, hidden_state } ]
    PostProc->>Output: write conversation-specific output
    PostProc->>TRT_dump: unlink original dump
    PostProc-->>Collector: return True
  end
  alt success
    Collector->>Collector: increment num_success and progress
  else failure
    Collector->>Collector: skip increment, log/warn
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through flags and async night,

Hidden dumps validated in moonlight,
QWen versions lined in rows,
Data IDs found where no one knows,
Tqdm rests — the rabbit nods in delight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 28.57%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title Check — ❓ Inconclusive: The PR title "fix:eagle3 offline" identifies the area being modified but not what is actually fixed. The changeset covers index mismatch fixes, dataset preprocessing improvements, conversation ID handling, progress bar control, and Qwen3 support, none of which the title conveys. Consider a more specific title, e.g. "fix: eagle3 offline index mismatch and dataset handling".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.



codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.39%. Comparing base (4476f21) to head (313d3b1).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #456   +/-   ##
=======================================
  Coverage   73.39%   73.39%           
=======================================
  Files         180      180           
  Lines       17976    17976           
=======================================
+ Hits        13193    13194    +1     
+ Misses       4783     4782    -1     


@h-guo18 h-guo18 force-pushed the haoguo/fix-eagle-1021 branch from 4d4c153 to 885e017 Compare October 21, 2025 20:05
@h-guo18 h-guo18 requested a review from yeyu-nvidia October 21, 2025 20:07
@h-guo18 h-guo18 force-pushed the haoguo/fix-eagle-1021 branch from 885e017 to 70d4eee Compare October 21, 2025 20:19
@h-guo18 h-guo18 marked this pull request as ready for review October 21, 2025 20:28
@h-guo18 h-guo18 requested a review from a team as a code owner October 21, 2025 20:28
@h-guo18 h-guo18 requested a review from ChenhanYu October 21, 2025 20:28
Signed-off-by: h-guo18 <[email protected]>
@h-guo18 h-guo18 force-pushed the haoguo/fix-eagle-1021 branch from 70d4eee to 313d3b1 Compare October 21, 2025 20:31

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
examples/speculative_decoding/eagle_utils.py (1)

316-316: Remove duplicate log message.

The log message "Loading input conversations..." appears twice—once at line 256 (in the offline path) and again here at line 316 (in the non-offline path). Consider removing this duplicate since the message at line 256 already covers the loading step.

Apply this diff to remove the duplicate:

-    print_rank_0("Loading input conversations...")
     dataset_cls = LazySupervisedDataset if data_args.lazy_preprocess else SupervisedDataset
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d19f4f5 and 70d4eee.

📒 Files selected for processing (5)
  • examples/speculative_decoding/README.md (1 hunks)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (3 hunks)
  • examples/speculative_decoding/eagle_utils.py (3 hunks)
  • examples/speculative_decoding/launch_train.sh (3 hunks)
  • examples/speculative_decoding/main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/speculative_decoding/eagle_utils.py (1)
modelopt/torch/utils/logging.py (1)
  • print_rank_0 (92-95)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: linux
🔇 Additional comments (12)
examples/speculative_decoding/README.md (1)

315-315: LGTM! Documentation updated to reflect expanded model support.

The support matrix now correctly includes Qwen versions 1.5, 2, 2.5, and 3, aligning with the PR's goal of adding Qwen3 model support.

examples/speculative_decoding/main.py (1)

95-95: LGTM! Clean addition of progress bar control.

The new disable_tqdm field follows the established pattern for training arguments and addresses the PR objective to prevent CI logging overflow.

examples/speculative_decoding/launch_train.sh (3)

81-84: LGTM! Flag parsing follows established pattern.

The --disable_tqdm flag is parsed consistently with other command-line arguments in the script.


117-117: LGTM! Default value aligns with main.py.

The default value of False matches the field default in TrainingArguments.


173-173: LGTM! Flag correctly forwarded to training command.

The --disable_tqdm flag is properly injected into the accelerate launch command.

examples/speculative_decoding/eagle_utils.py (3)

63-65: LGTM! Removed hard-coded role alternation check.

This change enables more flexible conversation formats, including system prompts, which aligns with the PR's objective to preserve system prompts during dataset preprocessing.


256-269: LGTM! Directory support for multi-file datasets.

The enhanced data loading logic now accepts a directory containing multiple .jsonl files, which fulfills the PR objective. The implementation correctly handles both single files and directories with appropriate path-based branching.


286-293: LGTM! Robust conversation ID resolution with proper validation.

The enhanced ID resolution now tries conversation_id, then uuid, then id, and raises a descriptive ValueError if none are found. This aligns with the PR's objective to accept the "uuid" field and ensure an ID is present.

examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (4)

211-239: LGTM! Robust async post-processing with validation.

The refactored _post_process_trtllm_dumped function correctly:

  • Returns early if the dump file doesn't exist (line 219-220)
  • Validates the expected format with clear assertions (lines 223-230)
  • Transforms the data to match HF format conventions
  • Cleans up the temporary dump file (line 238)

The async implementation is appropriate for I/O-heavy operations.


247-248: LGTM! Correct async usage with success tracking.

The code properly awaits the async post-processing function and converts the boolean result to an integer for the success counter.


255-274: LGTM! Fixes index mismatch for skipped conversations.

The key fix here is initializing idx = 0 (line 255) and incrementing only for valid conversations (line 274). This ensures that the dump file index aligns with the actual number of processed conversations, resolving the sample index mismatch mentioned in the PR objectives when conversations are skipped.
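The counting scheme this comment describes can be isolated into a small sketch. The helper name `align_dump_indices` and the `is_valid` predicate are illustrative, not from the PR; only the counter behavior matches the fix:

```python
def align_dump_indices(conversations, is_valid):
    """Pair each valid conversation with the 1-based index of its dump file.

    The counter advances only for valid conversations, so a skipped
    conversation no longer shifts the indices of everything after it.
    """
    pairs = []
    idx = 0
    for conv in conversations:
        if not is_valid(conv):
            continue  # skipped: no dump file was written for this one
        idx += 1  # TRTLLM names dump files with a 1-based index
        pairs.append((idx, conv))
    return pairs
```

Under the old scheme, iterating with a plain `enumerate` over all conversations would leave gaps in the index sequence whenever one was skipped, so dump file N could end up attributed to the wrong sample.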


264-264: LGTM! Full conversation preservation.

Removing the [:256] slice ensures that complete conversations are processed, with proper length validation occurring downstream at line 268.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (2)

211-239: Blocking I/O operations in async function will block the event loop.

The function is declared async but performs synchronous blocking I/O operations (torch.load, torch.save, Path.unlink) without using await. This means these operations will block the event loop when the function is awaited, defeating the purpose of async concurrency and potentially degrading performance when processing multiple conversations.

Consider one of these solutions:

Option 1 (simpler): Remove async and call it synchronously since post-processing is relatively fast:

-    async def _post_process_trtllm_dumped(trtllm_dumped_file: str, conversation_id: int):
+    def _post_process_trtllm_dumped(trtllm_dumped_file: str, conversation_id: int):
         """
         Post-process the TRTLLM dumped file to same format as HF dumped:
         1. Remove id field, replace it with conversation_id

Then update the call site:

-        dump_success = await _post_process_trtllm_dumped(trtllm_dumped_file, conversation_id)
+        dump_success = _post_process_trtllm_dumped(trtllm_dumped_file, conversation_id)

Option 2 (proper async I/O): Use asyncio.to_thread to run blocking I/O in a thread pool:

async def _post_process_trtllm_dumped(trtllm_dumped_file: str, conversation_id: int):
    return await asyncio.to_thread(_post_process_trtllm_dumped_sync, trtllm_dumped_file, conversation_id)

def _post_process_trtllm_dumped_sync(trtllm_dumped_file: str, conversation_id: int):
    # existing implementation

152-153: Replace assertion with proper validation in filter function.

Using assert for data validation in a filter function is risky. If a conversation lacks both conversation_id and uuid fields, the assertion will raise an exception that could crash the entire filtering operation. Assertions should be used for invariants, not data validation.

     def keep_conversation(entry):
         conversation_id = entry.get("conversation_id", entry.get("uuid", None))
-        assert conversation_id is not None, "conversation_id is required"
+        if conversation_id is None:
+            print("Warning: Skipping conversation without conversation_id or uuid")
+            return False
         output_file = args.output_dir / f"{conversation_id}.pt"
         return not output_file.exists()
🧹 Nitpick comments (2)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (2)

219-220: Consider logging when the TRTLLM dump file is missing.

The early existence check is good defensive programming, but silently returning False when the file doesn't exist could make debugging difficult. If TRTLLM generation fails to create the expected file, it would be helpful to log this for troubleshooting.

         if not trtllm_dumped_file.exists():
+            print(f"Warning: TRTLLM dump file not found: {trtllm_dumped_file}")
             return False

238-239: Add error handling for file deletion.

The unlink() operation could raise exceptions (e.g., permission errors, filesystem issues). While unlikely, a failure here shouldn't necessarily fail the entire post-processing since the data was already successfully saved.

-        trtllm_dumped_file.unlink()
-        return True
+        try:
+            trtllm_dumped_file.unlink()
+        except OSError as e:
+            print(f"Warning: Failed to delete temporary file {trtllm_dumped_file}: {e}")
+        return True
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70d4eee and 313d3b1.

📒 Files selected for processing (5)
  • examples/speculative_decoding/README.md (1 hunks)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (3 hunks)
  • examples/speculative_decoding/eagle_utils.py (3 hunks)
  • examples/speculative_decoding/launch_train.sh (3 hunks)
  • examples/speculative_decoding/main.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • examples/speculative_decoding/README.md
  • examples/speculative_decoding/launch_train.sh
  • examples/speculative_decoding/eagle_utils.py
  • examples/speculative_decoding/main.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (3)
examples/speculative_decoding/collect_hidden_states/compute_hidden_states_trtllm.py (3)

241-249: Logic correctly tracks post-processing success.

The function properly awaits the asynchronous operations and only increments num_success when post-processing succeeds. The file indexing (idx + 1) correctly accounts for TRTLLM's 1-based file naming.

Note: The effectiveness of this implementation depends on fixing the blocking I/O issue in _post_process_trtllm_dumped (see earlier comment).


255-274: Index tracking fix correctly aligns with TRTLLM dump files.

The explicit idx counter that only increments for valid conversations (line 274) is the key fix mentioned in the PR objectives. This ensures the dump file index matches the actual conversations processed, preventing index mismatches when some conversations are skipped.

The removal of the [:256] slice on line 264 is intentional and allows processing full conversations up to max_seq_len.

Optional defensive improvement: Add a safety check for conversation_id even though the earlier filter should have removed None values:

         for entry in dataset:
             conversation_id = entry.get("conversation_id", entry.get("uuid"))
+            if conversation_id is None:
+                num_invalid += 1
+                continue

             conversations = entry["conversations"]

264-267: Full conversation processing is correctly implemented.

Removing the [:256] slice allows the script to process complete conversations up to max_seq_len, as intended by the PR. The subsequent length validation (lines 268-270) ensures conversations stay within bounds.

@h-guo18 h-guo18 merged commit ff8a1ed into main Oct 21, 2025
27 checks passed
@h-guo18 h-guo18 deleted the haoguo/fix-eagle-1021 branch October 21, 2025 21:01