Conversation

@yueshen2016 (Contributor) commented Sep 17, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix the issue where attention.core_attention is None during Megatron checkpoint export.

Usage

# Add a code snippet demonstrating how to use this
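
A minimal sketch of the guard this PR relies on, using types.SimpleNamespace stand-ins rather than the real Megatron layer objects (the rules mapping and layer shape below are illustrative assumptions, not the actual export code):

from types import SimpleNamespace

# Stand-in for a transformer layer whose softmax_offset attribute exists but is unset.
layer = SimpleNamespace(
    self_attention=SimpleNamespace(
        core_attention=SimpleNamespace(softmax_offset=None)
    )
)
rules = {"softmax_offset": lambda offset, layer_id: print("rule applied to layer", layer_id)}

# The old hasattr(...) check is True here even though the value is None,
# so the rule would have been invoked with a None offset.
# The new check skips the rule when the offset is unset:
if getattr(layer.self_attention.core_attention, "softmax_offset", None) is not None:
    rules["softmax_offset"](layer.self_attention.core_attention.softmax_offset, 0)
else:
    print("softmax_offset is None; rule skipped")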

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Improved stability when exporting Megatron models by better detecting when softmax offset logic should apply.
    • Prevents rare crashes or incorrect export behavior in configurations where the offset is present but unset.
    • Results in fewer export errors and smoother runs for mixed or custom model setups.

@yueshen2016 yueshen2016 requested a review from a team as a code owner September 17, 2025 23:28

coderabbitai bot commented Sep 17, 2025

Walkthrough

Updates the softmax_offset presence check in unified_export_megatron.py to use a guarded getattr(..., "softmax_offset", None) is not None check before invoking the softmax_offset rule. No public interfaces or declarations changed.
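
Why hasattr alone was not sufficient: in PyTorch an attribute can exist while its value is None, for example when a parameter is registered as None because the feature is disabled. A toy illustration (not the actual Megatron CoreAttention, whose implementation may differ):

import torch
import torch.nn as nn

class ToyCoreAttention(nn.Module):
    def __init__(self, learnable_offset: bool):
        super().__init__()
        if learnable_offset:
            self.softmax_offset = nn.Parameter(torch.zeros(1))
        else:
            # Attribute exists on the module, but its value is None.
            self.register_parameter("softmax_offset", None)

attn = ToyCoreAttention(learnable_offset=False)
print(hasattr(attn, "softmax_offset"))                    # True  -> old check would proceed
print(getattr(attn, "softmax_offset", None) is not None)  # False -> new check skips the rule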

Changes

Cohort: Megatron export guard update
File: modelopt/torch/export/unified_export_megatron.py
Summary: Replace the hasattr(...) check with getattr(layer.self_attention.core_attention, "softmax_offset", None) is not None to gate invocation of the softmax_offset rule.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Exporter
    participant Layer
    participant Core as CoreAttention

    Exporter->>Layer: access self_attention.core_attention
    Layer-->>Exporter: CoreAttention ref
    Exporter->>Core: getattr("softmax_offset", None)
    alt softmax_offset is not None
        Note over Exporter,Core: guarded non-None path
        Exporter->>Core: invoke softmax_offset rule
        Core-->>Exporter: rule applied
    else softmax_offset is None or missing
        Exporter-->>Exporter: skip softmax_offset rule
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A whisk of code, a careful hop,
I peek for None before I stop—
If offset’s found, I nudge the gate,
If not, I hop on, light of gait.
Hops secure, the export’s calm. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title "Fix issue of attention.core_attention is None" accurately and concisely reflects the primary fix: guarding attribute access (switching to a getattr check for softmax_offset) where attention.core_attention can be None. It is specific enough for a reviewer scanning the history to understand the PR's purpose.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f482c39 and 804a075.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_megatron.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/export/unified_export_megatron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 146d1d9 and f482c39.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_megatron.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs

Comment on lines +1088 to 1094
if (
    getattr(layer.self_attention.core_attention, "softmax_offset", None)
    is not None
):
    self.rules["softmax_offset"](
        layer.self_attention.core_attention.softmax_offset, layer_id
    )

⚠️ Potential issue

❓ Verification inconclusive

Guard both core_attention and softmax_offset to avoid AttributeError

getattr(layer.self_attention.core_attention, ...) still dereferences core_attention. If self_attention lacks core_attention, this will crash. Also, avoid re-accessing the attribute after the check.

Apply this diff:

-                        if (
-                            getattr(layer.self_attention.core_attention, "softmax_offset", None)
-                            is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                layer.self_attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(layer.self_attention, "core_attention", None)
+                        softmax_offset = getattr(core_attn, "softmax_offset", None)
+                        if softmax_offset is not None:
+                            self.rules["softmax_offset"](softmax_offset, layer_id)

Would you add a quick unit/integration test covering:

  • core_attention is None
  • core_attention present but softmax_offset is None
  • core_attention + softmax_offset tensor present (rule invoked once)?

I can draft a minimal test double for layer.self_attention to exercise these branches if helpful.
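
A rough sketch of what such a test double could look like (pytest-style; _apply_softmax_offset_rule below is a hypothetical stand-in for the guarded export code path, not the project's actual function):

from types import SimpleNamespace

def _apply_softmax_offset_rule(layer, rules, layer_id):
    # Mirrors the suggested guard: cache core_attention, then getattr on it.
    core_attn = getattr(layer.self_attention, "core_attention", None)
    softmax_offset = getattr(core_attn, "softmax_offset", None)
    if softmax_offset is not None:
        rules["softmax_offset"](softmax_offset, layer_id)

def test_softmax_offset_guard():
    calls = []
    rules = {"softmax_offset": lambda offset, layer_id: calls.append((offset, layer_id))}

    # core_attention is None -> no AttributeError, rule not invoked
    _apply_softmax_offset_rule(
        SimpleNamespace(self_attention=SimpleNamespace(core_attention=None)), rules, 0
    )
    # core_attention present but softmax_offset is None -> rule not invoked
    _apply_softmax_offset_rule(
        SimpleNamespace(
            self_attention=SimpleNamespace(core_attention=SimpleNamespace(softmax_offset=None))
        ),
        rules,
        1,
    )
    # softmax_offset present -> rule invoked exactly once
    _apply_softmax_offset_rule(
        SimpleNamespace(
            self_attention=SimpleNamespace(core_attention=SimpleNamespace(softmax_offset=[0.1]))
        ),
        rules,
        2,
    )
    assert calls == [([0.1], 2)]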


📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                        if (
-                            getattr(layer.self_attention.core_attention, "softmax_offset", None)
-                            is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                layer.self_attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(layer.self_attention, "core_attention", None)
+                        softmax_offset = getattr(core_attn, "softmax_offset", None)
+                        if softmax_offset is not None:
+                            self.rules["softmax_offset"](softmax_offset, layer_id)
🤖 Prompt for AI Agents
In modelopt/torch/export/unified_export_megatron.py around lines 1088-1094, the
code accesses layer.self_attention.core_attention directly inside getattr which
can raise AttributeError if core_attention is missing; cache core_attention into
a local variable first, check it is not None, then use getattr(core_attention,
"softmax_offset", None) and only invoke self.rules["softmax_offset"] when the
cached softmax_offset is not None; also add unit tests for three cases:
core_attention is None, core_attention present but softmax_offset is None, and
core_attention with a softmax_offset tensor (assert the rule is invoked exactly
once).

@yueshen2016 yueshen2016 force-pushed the yueshen/fix-megatron-export branch from f482c39 to 804a075 Compare September 17, 2025 23:43
@yueshen2016 yueshen2016 enabled auto-merge (squash) September 17, 2025 23:44
@yueshen2016 (Contributor, Author)

/ok to test 804a075

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.82%. Comparing base (146d1d9) to head (804a075).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #334      +/-   ##
==========================================
- Coverage   73.82%   73.82%   -0.01%     
==========================================
  Files         172      172              
  Lines       17438    17438              
==========================================
- Hits        12874    12873       -1     
- Misses       4564     4565       +1     

☔ View full report in Codecov by Sentry.

@yueshen2016 yueshen2016 merged commit bbb2304 into main Sep 18, 2025
22 checks passed
@yueshen2016 yueshen2016 deleted the yueshen/fix-megatron-export branch September 18, 2025 01:19
yeyu-nvidia pushed a commit that referenced this pull request Sep 18, 2025