
Conversation


@kinjalpatel27 kinjalpatel27 commented Oct 17, 2025

What does this PR do?

Bug fix

Overview:
For the amax sync test, set the weights explicitly to constant values based on each rank, which ensures each rank produces different amax values without synchronization.
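
A minimal sketch of the approach, reconstructed from the review notes further down in this thread (the helper name `init_rank_dependent_weights` is illustrative, not the test's literal structure):

```python
import torch

def init_rank_dependent_weights(model, rank):
    """Fill every trainable weight matrix (excluding embeddings) with a
    constant derived from the rank and the weight's index, so each rank
    sees different amax values before synchronization."""
    weight_idx = 0
    for name, param in model.named_parameters():
        # Target weight matrices only: skip embeddings, biases, and 1-D params.
        if "weight" in name and "embedding" not in name and param.requires_grad and param.dim() >= 2:
            with torch.no_grad():
                # Rank- and index-dependent constant (formula quoted in the review below).
                param.fill_(0.1 + rank * 0.5 + weight_idx * 0.05)
            weight_idx += 1
    # Rank-aware prompt tokens on GPU, cast to long for embedding lookup.
    return (torch.ones((2, model.max_sequence_length)) * 0.05 + rank * 0.5).cuda().long()
```

Constant-filled weights make each tensor's amax equal its fill value, so no two ranks agree until synchronization runs.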

Usage

pytest -svvv gpu/torch/quantization/plugins/test_megatron.py::test_expert_parallel_sync

Testing

Ran the pytest command above.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: NA
  • Did you update Changelog?: NA

Additional Information

Summary by CodeRabbit

  • New Features

    • Added mention of MCore MoE support for PTQ/QAT/QAD in release notes.
  • Tests

    • Enhanced distributed quantization tests to use rank-aware data initialization, improving validation of activation-statistic synchronization across expert-parallel model configurations.


coderabbitai bot commented Oct 17, 2025

Walkthrough

Test initialization was changed to set per-parameter, rank-dependent weight values (excluding embeddings) and to create a rank-scaled prompt_tokens; this produces distinct amax values across ranks to exercise amax synchronization in expert-parallel quantization tests.

Changes

  • Test initialization refactor — tests/gpu/torch/quantization/plugins/test_megatron.py: Replaced the single prompt_tokens initialization with per-parameter weight initialization: iterates model parameters, targets weight matrices (names containing "weight" and not "embedding"), fills trainable params with dim ≥ 2 using rank- and index-dependent constants, and sets prompt_tokens to a rank-aware long CUDA tensor. This produces distinct amax values across ranks for synchronization testing.
  • Changelog update — CHANGELOG.rst: Added a bullet under "New Features" noting support for MCore MoE PTQ/QAT/QAD. No other structural changes.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Process
    participant Model as Model (per-rank)
    participant CUDA as CUDA tensors
    participant Sync as amax Sync/Check

    Note over Test,Model: Initialization phase (per-rank)
    Test->>Model: iterate parameters
    alt param is weight && not embedding && dim >= 2
        Test->>Model: fill param with rank- and index-dependent constant
        Model-->>CUDA: parameter placed on device
    else other params
        Test->>Model: leave unchanged or default init
    end

    Note over Test,CUDA: prompt_tokens setup
    Test->>CUDA: create rank-aware sequence -> to long CUDA tensor

    Note over Test,Sync: validation phase
    Test->>Sync: compute local amax values
    Sync->>Sync: aggregate/compare amax across ranks
    Sync-->>Test: synchronization result (pass/fail)
```
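
The validation phase in the diagram can be sketched conceptually: amax synchronization amounts to a MAX reduction over the parallel group, so an already-synced tensor equals the group-wide maximum on every rank. A hedged sketch of that idea (this is not modelopt's actual sync code; `amax_is_synced` is a hypothetical helper):

```python
import torch
import torch.distributed as dist

def amax_is_synced(local_amax: torch.Tensor, group=None) -> bool:
    """Return True if this rank's amax already equals the group-wide max.

    With rank-dependent constant weights, this check should fail on at
    least one rank before synchronization and pass on all ranks after.
    """
    synced = local_amax.clone()
    # amax synchronization is, conceptually, a MAX all-reduce over the group.
    dist.all_reduce(synced, op=dist.ReduceOp.MAX, group=group)
    return torch.equal(local_amax, synced)
```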

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through tensors, row by row,
Per-rank I painted weights to show,
Amaxes shimmer, not the same,
Experts nod — they join the game,
Testing hops, and off I go! 🎩🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The pull request title "Updated amax_sync test to set const weights based on rank" accurately captures the main change in the changeset. The primary modification updates the _test_expert_model_parallel_amax_sync test function to initialize weight tensors with rank-dependent constant values instead of a single prompt_tokens initialization, which aligns directly with what the title describes. The title is specific, concise (10 words), and uses clear technical terminology that a reviewer can quickly understand. While the PR also includes a changelog entry as a secondary change, the core substance of the PR is the test modification, which the title appropriately highlights.
  • Docstring Coverage — ✅ Passed: No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bed2ecd and d7dd6ad.

📒 Files selected for processing (1)
  • CHANGELOG.rst (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.rst
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/gpu/torch/quantization/plugins/test_megatron.py (1)

691-691: Consider improving prompt_tokens variation across ranks.

The current formula (torch.ones(...) * 0.05 + rank * 0.5).long() produces identical values for consecutive rank pairs (ranks 0-1 get 0, ranks 2-3 get 1, etc.). While this doesn't break the test (since weights already differ), you could improve variation:

```diff
-    prompt_tokens = (torch.ones((2, model.max_sequence_length)) * 0.05 + rank * 0.5).cuda().long()
+    prompt_tokens = (torch.ones((2, model.max_sequence_length)) * rank).cuda().long()
```

This creates unique token indices for each rank: 0, 1, 2, 3, etc., providing better per-rank distinction.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef9954 and bed2ecd.

📒 Files selected for processing (1)
  • tests/gpu/torch/quantization/plugins/test_megatron.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/gpu/torch/quantization/plugins/test_megatron.py (2)
modelopt/torch/utils/distributed.py (2)
  • rank (68-72)
  • rank (200-202)
modelopt/torch/quantization/src/tensor_quant_mx.cu (4)
  • cuda (75-101)
  • cuda (75-76)
  • cuda (105-137)
  • cuda (105-105)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (1)
tests/gpu/torch/quantization/plugins/test_megatron.py (1)

677-690: Excellent approach to create rank-dependent weight values.

The weight initialization logic correctly:

  • Filters for trainable weight matrices (excluding embeddings and biases)
  • Assigns distinct constant values per rank and parameter index
  • Ensures different amax values across ranks for proper synchronization testing

The filtering conditions are appropriate and the rank-dependent formula 0.1 + (rank * 0.5) + (weight_idx * 0.05) effectively creates the needed variation.
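
As a quick illustration of the variation this formula produces (hypothetical case: two ranks, two qualifying weights):

```python
# 0.1 + rank * 0.5 + weight_idx * 0.05 for ranks 0-1 and weight indices 0-1:
for rank in range(2):
    print(rank, [round(0.1 + rank * 0.5 + i * 0.05, 2) for i in range(2)])
# rank 0 -> [0.1, 0.15], rank 1 -> [0.6, 0.65]
# A constant-filled weight's amax equals its fill value, so no two ranks
# share an amax before synchronization.
```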


codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.39%. Comparing base (6ef9954) to head (d7dd6ad).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
```text
@@            Coverage Diff             @@
##             main     #451      +/-   ##
==========================================
+ Coverage   73.38%   73.39%   +0.01%
==========================================
  Files         180      180
  Lines       17942    17976      +34
==========================================
+ Hits        13166    13194      +28
- Misses       4776     4782       +6
```

☔ View full report in Codecov by Sentry.

@kinjalpatel27 kinjalpatel27 merged commit d19f4f5 into main Oct 21, 2025
27 checks passed
@kinjalpatel27 kinjalpatel27 deleted the kinjal/fix_test_amax_sync branch October 21, 2025 19:32