
Conversation

kevalmorabia97 (Collaborator) commented Oct 6, 2025

What does this PR do?

Type of change: Mcore test utility fix

Testing

  • CI tests passing

Summary by CodeRabbit

  • Tests

    • Enabled previously skipped test for heterogeneous sharded state dict to run across supported versions.
    • Cleaned up unused imports and simplified version checks.
  • Chores

    • Updated test utilities to use a context-aware inference setup for better alignment with current inference workflows.
    • Clarified docstrings to reflect supported model types when using dummy inputs.


coderabbitai bot commented Oct 6, 2025

Walkthrough

Introduces a StaticInferenceContext derived from InferenceWrapperConfig and routes inference through a context-aware GPTInferenceWrapper. Adjusts the prep_model_for_inference call signature, sets materialize_only_last_token_logits=False, and keeps downstream steps unchanged. Separately, removes a version-based skip and unused imports from a GPU quantization test so the affected test runs unconditionally.
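
For readers who want the shape of the new utility code, here is a minimal sketch of the context-aware flow named in the walkthrough and diagrammed below. It assumes Mcore 0.14+; model, prompt_tokens, and all InferenceWrapperConfig field values are placeholders, and StaticInferenceContext.from_config plus the exact method signatures should be checked against the installed Mcore release rather than read as the PR's literal diff:

import torch
from megatron.core.inference.contexts import StaticInferenceContext
from megatron.core.inference.model_inference_wrappers.gpt.gpt_inference_wrapper import (
    GPTInferenceWrapper,
)
from megatron.core.inference.model_inference_wrappers.inference_wrapper_config import (
    InferenceWrapperConfig,
)


def run_one_forward(model, prompt_tokens: torch.Tensor) -> torch.Tensor:
    """Sketch of the context-aware inference path; field values are placeholders."""
    config = InferenceWrapperConfig(
        hidden_size=model.config.hidden_size,
        inference_batch_times_seqlen_threshold=-1,
        fp32_residual_connection=False,
        params_dtype=torch.bfloat16,
        padded_vocab_size=32000,  # placeholder value
    )
    # New in the 0.14+ flow: derive a static context from the wrapper config,
    # and keep logits for every token (field name taken from the walkthrough).
    context = StaticInferenceContext.from_config(config)
    context.materialize_only_last_token_logits = False

    wrapper = GPTInferenceWrapper(model, config, context)
    wrapper.prep_model_for_inference()  # no longer takes prompt_tokens

    inference_input = wrapper.prep_inference_input(prompt_tokens)
    batch = wrapper.get_batch_for_context_window(
        inference_input, 0, prompt_tokens.size(1)
    )
    # Under pipeline parallelism, the result would then be broadcast from the
    # last stage (broadcast_from_last_pipeline_stage in the diagram below).
    return wrapper.run_one_forward_step(batch)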

Changes

  • Inference context integration — tests/_test_utils/torch_dist/plugins/megatron_common.py: Switches to a StaticInferenceContext created from InferenceWrapperConfig; initializes GPTInferenceWrapper with the context; calls prep_model_for_inference without prompt_tokens; sets materialize_only_last_token_logits=False; downstream inference steps are unchanged; docstring wording updated for the dummy-input path.
  • Test cleanup and guard removal — tests/gpu/torch/quantization/plugins/test_megatron.py: Removes the version guard and unused imports so test_heterogenous_sharded_state_dict runs unconditionally (see the sketch below); no other logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor T as Test
  participant C as StaticInferenceContext
  participant W as GPTInferenceWrapper
  participant M as Model

  Note over T,C: Build inference context from InferenceWrapperConfig
  T->>C: create from InferenceWrapperConfig
  C-->>T: inference_context

  Note over T,W: Initialize wrapper with context
  T->>W: init(inference_context)

  Note over W,M: Preparation phase
  T->>W: prep_model_for_inference()
  W->>M: configure(materialize_only_last_token_logits=false)

  Note over W,M: Inference flow (unchanged steps, context-aware)
  T->>W: prep_inference_input(...)
  W->>W: get_batch_for_context_window(...)
  W->>M: run_one_forward_step(...)
  M-->>W: logits/output
  W-->>T: broadcast_from_last_pipeline_stage(...)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I tuned my ears to context’s hum,
Wrapped in whispers, logits come.
No skip today—the tests all run!
Carrots queued, the work is done.
Hop, compile, and swiftly see—
A tidy trail of victory. 🥕🐇

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title clearly and succinctly states that the GPU test inference utility for Mcore is being fixed to support Mcore version 0.14 and above, directly reflecting the primary change introduced in the pull request.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed between the PR base (340eb7a) and head (1f0080a).

📒 Files selected for processing (2)
  • tests/_test_utils/torch_dist/plugins/megatron_common.py (3 hunks)
  • tests/gpu/torch/quantization/plugins/test_megatron.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/gpu/torch/quantization/plugins/test_megatron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (3)
tests/_test_utils/torch_dist/plugins/megatron_common.py (3)

27-27: LGTM: Import addition for context-based inference.

The import of StaticInferenceContext is correctly placed and necessary for the Mcore 0.14+ context-based inference flow implemented below.


385-385: Minor docstring simplification.

The removal of "wrapped" from the docstring is appropriate since the wrapping is now an internal implementation detail of the function rather than a user-facing concern.


353-365: LGTM: Verified inference function signatures unchanged and all callers remain compatible.



@kevalmorabia97 force-pushed the kmorabia/mcore-inference-0.14 branch from 2f2e9a8 to 1f0080a on Oct 6, 2025 at 11:01

codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (340eb7a) to head (1f0080a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #401      +/-   ##
==========================================
- Coverage   73.79%   73.79%   -0.01%     
==========================================
  Files         171      171              
  Lines       17591    17591              
==========================================
- Hits        12982    12981       -1     
- Misses       4609     4610       +1     

☔ View full report in Codecov by Sentry.
