
Conversation

kevalmorabia97 (Collaborator) commented Oct 6, 2025

What does this PR do?

Type of change: Mcore test utility fix

Testing

  • CI tests passing

Summary by CodeRabbit

  • Tests

    • Enabled previously skipped test for heterogeneous sharded state dict to run across supported versions.
    • Cleaned up unused imports and simplified version checks.
  • Chores

    • Updated test utilities to use a context-aware inference setup for better alignment with current inference workflows.
    • Clarified docstrings to reflect supported model types when using dummy inputs.


coderabbitai bot commented Oct 6, 2025

Walkthrough

Introduces a StaticInferenceContext derived from InferenceWrapperConfig and routes inference through a context-aware GPTInferenceWrapper. Adjusts the prep_model_for_inference call signature, sets materialize_only_last_token_logits=False, and keeps downstream steps unchanged. Separately, removes a version-based skip and unused imports from a GPU quantization test so the affected test runs unconditionally.
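
For readers who want the shape of the new utility code, here is a minimal sketch of the context-aware flow named in the walkthrough and diagrammed below. It assumes Mcore 0.14+; model, prompt_tokens, and all InferenceWrapperConfig field values are placeholders, and StaticInferenceContext.from_config plus the exact method signatures should be checked against the installed Mcore release rather than read as the PR's literal diff:

import torch
from megatron.core.inference.contexts import StaticInferenceContext
from megatron.core.inference.model_inference_wrappers.gpt.gpt_inference_wrapper import (
    GPTInferenceWrapper,
)
from megatron.core.inference.model_inference_wrappers.inference_wrapper_config import (
    InferenceWrapperConfig,
)


def run_one_forward(model, prompt_tokens: torch.Tensor) -> torch.Tensor:
    """Sketch of the context-aware inference path; field values are placeholders."""
    config = InferenceWrapperConfig(
        hidden_size=model.config.hidden_size,
        inference_batch_times_seqlen_threshold=-1,
        fp32_residual_connection=False,
        params_dtype=torch.bfloat16,
        padded_vocab_size=32000,  # placeholder value
    )
    # New in the 0.14+ flow: derive a static context from the wrapper config,
    # and keep logits for every token (field name taken from the walkthrough).
    context = StaticInferenceContext.from_config(config)
    context.materialize_only_last_token_logits = False

    wrapper = GPTInferenceWrapper(model, config, context)
    wrapper.prep_model_for_inference()  # no longer takes prompt_tokens

    inference_input = wrapper.prep_inference_input(prompt_tokens)
    batch = wrapper.get_batch_for_context_window(
        inference_input, 0, prompt_tokens.size(1)
    )
    # Under pipeline parallelism, the result would then be broadcast from the
    # last stage (broadcast_from_last_pipeline_stage in the diagram below).
    return wrapper.run_one_forward_step(batch)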

Changes

  • Inference context integration — tests/_test_utils/torch_dist/plugins/megatron_common.py: Switches to a StaticInferenceContext created from InferenceWrapperConfig; initializes GPTInferenceWrapper with the context; calls prep_model_for_inference without prompt_tokens; sets materialize_only_last_token_logits=False; downstream inference steps are unchanged; docstring wording updated for the dummy-input path.
  • Test cleanup and guard removal — tests/gpu/torch/quantization/plugins/test_megatron.py: Removes the version guard and unused imports so test_heterogenous_sharded_state_dict runs unconditionally (see the sketch below); no other logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor T as Test
  participant C as StaticInferenceContext
  participant W as GPTInferenceWrapper
  participant M as Model

  Note over T,C: Build inference context from InferenceWrapperConfig
  T->>C: create from InferenceWrapperConfig
  C-->>T: inference_context

  Note over T,W: Initialize wrapper with context
  T->>W: init(inference_context)

  Note over W,M: Preparation phase
  T->>W: prep_model_for_inference()
  W->>M: configure(materialize_only_last_token_logits=false)

  Note over W,M: Inference flow (unchanged steps, context-aware)
  T->>W: prep_inference_input(...)
  W->>W: get_batch_for_context_window(...)
  W->>M: run_one_forward_step(...)
  M-->>W: logits/output
  W-->>T: broadcast_from_last_pipeline_stage(...)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I tuned my ears to context’s hum,
Wrapped in whispers, logits come.
No skip today—the tests all run!
Carrots queued, the work is done.
Hop, compile, and swiftly see—
A tidy trail of victory. 🥕🐇

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title clearly and succinctly states that the GPU test inference utility for Mcore is being fixed to support Mcore version 0.14 and above, directly reflecting the primary change introduced in the pull request.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed between the PR base (340eb7a) and head (1f0080a).

📒 Files selected for processing (2)
  • tests/_test_utils/torch_dist/plugins/megatron_common.py (3 hunks)
  • tests/gpu/torch/quantization/plugins/test_megatron.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/gpu/torch/quantization/plugins/test_megatron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (3)
tests/_test_utils/torch_dist/plugins/megatron_common.py (3)

27-27: LGTM: Import addition for context-based inference.

The import of StaticInferenceContext is correctly placed and necessary for the Mcore 0.14+ context-based inference flow implemented below.


385-385: Minor docstring simplification.

The removal of "wrapped" from the docstring is appropriate since the wrapping is now an internal implementation detail of the function rather than a user-facing concern.


353-365: LGTM: Verified inference function signatures unchanged and all callers remain compatible.



@kevalmorabia97 force-pushed the kmorabia/mcore-inference-0.14 branch from 2f2e9a8 to 1f0080a on Oct 6, 2025 at 11:01

codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (340eb7a) to head (1f0080a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #401      +/-   ##
==========================================
- Coverage   73.79%   73.79%   -0.01%     
==========================================
  Files         171      171              
  Lines       17591    17591              
==========================================
- Hits        12982    12981       -1     
- Misses       4609     4610       +1     

☔ View full report in Codecov by Sentry.
