Conversation

@yueshen2016 (Contributor) commented Sep 17, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix the issue where attention.core_attention is None during Megatron checkpoint export.

Usage

# Add a code snippet demonstrating how to use this
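
A minimal sketch of the guard this PR relies on, using types.SimpleNamespace stand-ins rather than the real Megatron layer objects (the rules mapping and layer shape below are illustrative assumptions, not the actual export code):

from types import SimpleNamespace

# Stand-in for a transformer layer whose softmax_offset attribute exists but is unset.
layer = SimpleNamespace(
    self_attention=SimpleNamespace(
        core_attention=SimpleNamespace(softmax_offset=None)
    )
)
rules = {"softmax_offset": lambda offset, layer_id: print("rule applied to layer", layer_id)}

# The old hasattr(...) check is True here even though the value is None,
# so the rule would have been invoked with a None offset.
# The new check skips the rule when the offset is unset:
if getattr(layer.self_attention.core_attention, "softmax_offset", None) is not None:
    rules["softmax_offset"](layer.self_attention.core_attention.softmax_offset, 0)
else:
    print("softmax_offset is None; rule skipped")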

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Improved stability when exporting Megatron models by better detecting when softmax offset logic should apply.
    • Prevents rare crashes or incorrect export behavior in configurations where the offset is present but unset.
    • Results in fewer export errors and smoother runs for mixed or custom model setups.

@yueshen2016 yueshen2016 requested a review from a team as a code owner September 17, 2025 23:28

coderabbitai bot commented Sep 17, 2025

Walkthrough

Updates the softmax_offset presence check in unified_export_megatron.py to use a guarded getattr(..., "softmax_offset", None) is not None check before invoking the softmax_offset rule. No public interfaces or declarations changed.
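
Why hasattr alone was not sufficient: in PyTorch an attribute can exist while its value is None, for example when a parameter is registered as None because the feature is disabled. A toy illustration (not the actual Megatron CoreAttention, whose implementation may differ):

import torch
import torch.nn as nn

class ToyCoreAttention(nn.Module):
    def __init__(self, learnable_offset: bool):
        super().__init__()
        if learnable_offset:
            self.softmax_offset = nn.Parameter(torch.zeros(1))
        else:
            # Attribute exists on the module, but its value is None.
            self.register_parameter("softmax_offset", None)

attn = ToyCoreAttention(learnable_offset=False)
print(hasattr(attn, "softmax_offset"))                    # True  -> old check would proceed
print(getattr(attn, "softmax_offset", None) is not None)  # False -> new check skips the rule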

Changes

Cohort: Megatron export guard update
File: modelopt/torch/export/unified_export_megatron.py
Summary: Replace the hasattr(...) check with getattr(layer.self_attention.core_attention, "softmax_offset", None) is not None to gate invocation of the softmax_offset rule.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Exporter
    participant Layer
    participant Core as CoreAttention

    Exporter->>Layer: access self_attention.core_attention
    Layer-->>Exporter: CoreAttention ref
    Exporter->>Core: getattr("softmax_offset", None)
    alt softmax_offset is not None
        Note over Exporter,Core: guarded non-None path
        Exporter->>Core: invoke softmax_offset rule
        Core-->>Exporter: rule applied
    else softmax_offset is None or missing
        Exporter-->>Exporter: skip softmax_offset rule
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A whisk of code, a careful hop,
I peek for None before I stop—
If offset’s found, I nudge the gate,
If not, I hop on, light of gait.
Hops secure, the export’s calm. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title "Fix issue of attention.core_attention is None" accurately and concisely reflects the primary fix: guarding attribute access (switching to a getattr check for softmax_offset) where attention.core_attention can be None. It is specific enough for a reviewer scanning the history to understand the PR's purpose.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f482c39 and 804a075.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_megatron.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/export/unified_export_megatron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 146d1d9 and f482c39.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_megatron.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs

Comment on lines +1088 to 1094
if (
    getattr(layer.self_attention.core_attention, "softmax_offset", None)
    is not None
):
    self.rules["softmax_offset"](
        layer.self_attention.core_attention.softmax_offset, layer_id
    )

⚠️ Potential issue

❓ Verification inconclusive

Guard both core_attention and softmax_offset to avoid AttributeError

getattr(layer.self_attention.core_attention, ...) still dereferences core_attention. If self_attention lacks core_attention, this will crash. Also, avoid re-accessing the attribute after the check.

Apply this diff:

-                        if (
-                            getattr(layer.self_attention.core_attention, "softmax_offset", None)
-                            is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                layer.self_attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(layer.self_attention, "core_attention", None)
+                        softmax_offset = getattr(core_attn, "softmax_offset", None)
+                        if softmax_offset is not None:
+                            self.rules["softmax_offset"](softmax_offset, layer_id)

Would you add a quick unit/integration test covering:

  • core_attention is None
  • core_attention present but softmax_offset is None
  • core_attention + softmax_offset tensor present (rule invoked once)?

I can draft a minimal test double for layer.self_attention to exercise these branches if helpful.
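
A rough sketch of what such a test double could look like (pytest-style; _apply_softmax_offset_rule below is a hypothetical stand-in for the guarded export code path, not the project's actual function):

from types import SimpleNamespace

def _apply_softmax_offset_rule(layer, rules, layer_id):
    # Mirrors the suggested guard: cache core_attention, then getattr on it.
    core_attn = getattr(layer.self_attention, "core_attention", None)
    softmax_offset = getattr(core_attn, "softmax_offset", None)
    if softmax_offset is not None:
        rules["softmax_offset"](softmax_offset, layer_id)

def test_softmax_offset_guard():
    calls = []
    rules = {"softmax_offset": lambda offset, layer_id: calls.append((offset, layer_id))}

    # core_attention is None -> no AttributeError, rule not invoked
    _apply_softmax_offset_rule(
        SimpleNamespace(self_attention=SimpleNamespace(core_attention=None)), rules, 0
    )
    # core_attention present but softmax_offset is None -> rule not invoked
    _apply_softmax_offset_rule(
        SimpleNamespace(
            self_attention=SimpleNamespace(core_attention=SimpleNamespace(softmax_offset=None))
        ),
        rules,
        1,
    )
    # softmax_offset present -> rule invoked exactly once
    _apply_softmax_offset_rule(
        SimpleNamespace(
            self_attention=SimpleNamespace(core_attention=SimpleNamespace(softmax_offset=[0.1]))
        ),
        rules,
        2,
    )
    assert calls == [([0.1], 2)]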


📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                        if (
-                            getattr(layer.self_attention.core_attention, "softmax_offset", None)
-                            is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                layer.self_attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(layer.self_attention, "core_attention", None)
+                        softmax_offset = getattr(core_attn, "softmax_offset", None)
+                        if softmax_offset is not None:
+                            self.rules["softmax_offset"](softmax_offset, layer_id)
🤖 Prompt for AI Agents
In modelopt/torch/export/unified_export_megatron.py around lines 1088-1094, the
code accesses layer.self_attention.core_attention directly inside getattr which
can raise AttributeError if core_attention is missing; cache core_attention into
a local variable first, check it is not None, then use getattr(core_attention,
"softmax_offset", None) and only invoke self.rules["softmax_offset"] when the
cached softmax_offset is not None; also add unit tests for three cases:
core_attention is None, core_attention present but softmax_offset is None, and
core_attention with a softmax_offset tensor (assert the rule is invoked exactly
once).

@yueshen2016 yueshen2016 force-pushed the yueshen/fix-megatron-export branch from f482c39 to 804a075 Compare September 17, 2025 23:43
@yueshen2016 yueshen2016 enabled auto-merge (squash) September 17, 2025 23:44
@yueshen2016 (Contributor, Author)

/ok to test 804a075

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.82%. Comparing base (146d1d9) to head (804a075).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #334      +/-   ##
==========================================
- Coverage   73.82%   73.82%   -0.01%     
==========================================
  Files         172      172              
  Lines       17438    17438              
==========================================
- Hits        12874    12873       -1     
- Misses       4564     4565       +1     

☔ View full report in Codecov by Sentry.

@yueshen2016 yueshen2016 merged commit bbb2304 into main Sep 18, 2025
22 checks passed
@yueshen2016 yueshen2016 deleted the yueshen/fix-megatron-export branch September 18, 2025 01:19
yeyu-nvidia pushed a commit that referenced this pull request Sep 18, 2025