Fix issue of attention.core_attention is None #334
Conversation
Walkthrough

Updates the softmax_offset presence check in modelopt/torch/export/unified_export_megatron.py so the softmax_offset rule is only invoked when the attribute is present and not None.

Sequence Diagram(s)

sequenceDiagram
autonumber
participant Exporter
participant Layer
participant Core as CoreAttention
Exporter->>Layer: access self_attention.core_attention
Layer-->>Exporter: CoreAttention ref
Exporter->>Core: getattr("softmax_offset", None)
alt softmax_offset is not None
Note over Exporter,Core: guarded non-None path
Exporter->>Core: invoke softmax_offset rule
Core-->>Exporter: rule applied
else softmax_offset is None or missing
Exporter-->>Exporter: skip softmax_offset rule
end
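Rendered as plain Python, the guarded check the diagram depicts amounts to the sketch below. This is illustrative only: the function name maybe_apply_softmax_offset_rule and the rules/layer parameters are stand-ins, not names from the codebase.

def maybe_apply_softmax_offset_rule(rules, layer, layer_id):
    # Look up softmax_offset with a None default so a missing attribute does not raise.
    offset = getattr(layer.self_attention.core_attention, "softmax_offset", None)
    if offset is not None:
        # Guarded non-None path: invoke the softmax_offset rule.
        rules["softmax_offset"](offset, layer_id)
    # Otherwise (softmax_offset is None or missing) the rule is skipped.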
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modelopt/torch/export/unified_export_megatron.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
if (
    getattr(layer.self_attention.core_attention, "softmax_offset", None)
    is not None
):
    self.rules["softmax_offset"](
        layer.self_attention.core_attention.softmax_offset, layer_id
    )
❓ Verification inconclusive
Guard both core_attention and softmax_offset to avoid AttributeError
getattr(layer.self_attention.core_attention, ...) still dereferences core_attention. If self_attention lacks core_attention, this will crash. Also, avoid re-accessing the attribute after the check.
Apply this diff:
- if (
- getattr(layer.self_attention.core_attention, "softmax_offset", None)
- is not None
- ):
- self.rules["softmax_offset"](
- layer.self_attention.core_attention.softmax_offset, layer_id
- )
+ core_attn = getattr(layer.self_attention, "core_attention", None)
+ softmax_offset = getattr(core_attn, "softmax_offset", None)
+ if softmax_offset is not None:
+ self.rules["softmax_offset"](softmax_offset, layer_id)
Would you add a quick unit/integration test covering:
- core_attention is None
- core_attention present but softmax_offset is None
- core_attention + softmax_offset tensor present (rule invoked once)?
I can draft a minimal test double for layer.self_attention to exercise these branches if helpful.
Guard core_attention before accessing softmax_offset
getattr(layer.self_attention.core_attention, "softmax_offset", None) dereferences core_attention and can raise AttributeError — cache core_attention and then getattr on it.
File: modelopt/torch/export/unified_export_megatron.py lines 1088-1094
- if (
- getattr(layer.self_attention.core_attention, "softmax_offset", None)
- is not None
- ):
- self.rules["softmax_offset"](
- layer.self_attention.core_attention.softmax_offset, layer_id
- )
+ core_attn = getattr(layer.self_attention, "core_attention", None)
+ softmax_offset = getattr(core_attn, "softmax_offset", None)
+ if softmax_offset is not None:
+ self.rules["softmax_offset"](softmax_offset, layer_id)
Add unit tests covering:
- core_attention is None
- core_attention present but softmax_offset is None
- core_attention + softmax_offset tensor present (rule invoked once)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
core_attn = getattr(layer.self_attention, "core_attention", None)
softmax_offset = getattr(core_attn, "softmax_offset", None)
if softmax_offset is not None:
    self.rules["softmax_offset"](softmax_offset, layer_id)
🤖 Prompt for AI Agents
In modelopt/torch/export/unified_export_megatron.py around lines 1088-1094, the
code accesses layer.self_attention.core_attention directly inside getattr which
can raise AttributeError if core_attention is missing; cache core_attention into
a local variable first, check it is not None, then use getattr(core_attention,
"softmax_offset", None) and only invoke self.rules["softmax_offset"] when the
cached softmax_offset is not None; also add unit tests for three cases:
core_attention is None, core_attention present but softmax_offset is None, and
core_attention with a softmax_offset tensor (assert the rule is invoked exactly
once).
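Taking up the offer above, a minimal pytest-style sketch of such test doubles could look like the following. All names here (_FakeCoreAttention, _FakeSelfAttention, _FakeLayer, _apply_softmax_offset_rule) are illustrative stand-ins; the helper mirrors the suggested guarded lookup rather than calling the real exporter method in unified_export_megatron.py.

class _FakeCoreAttention:
    # Stand-in for core_attention; may or may not carry a softmax_offset.
    def __init__(self, softmax_offset=None):
        self.softmax_offset = softmax_offset


class _FakeSelfAttention:
    # Stand-in for self_attention holding a core_attention reference (possibly None).
    def __init__(self, core_attention):
        self.core_attention = core_attention


class _FakeLayer:
    # Stand-in for a transformer layer exposing the self_attention.core_attention chain.
    def __init__(self, core_attention):
        self.self_attention = _FakeSelfAttention(core_attention)


def _apply_softmax_offset_rule(rules, layer, layer_id):
    # Same guarded lookup as in the suggestion above.
    core_attn = getattr(layer.self_attention, "core_attention", None)
    softmax_offset = getattr(core_attn, "softmax_offset", None)
    if softmax_offset is not None:
        rules["softmax_offset"](softmax_offset, layer_id)


def test_softmax_offset_rule_branches():
    calls = []
    rules = {"softmax_offset": lambda offset, layer_id: calls.append((offset, layer_id))}

    # 1) core_attention is None -> rule not invoked.
    _apply_softmax_offset_rule(rules, _FakeLayer(core_attention=None), layer_id=0)
    assert calls == []

    # 2) core_attention present but softmax_offset is None -> rule not invoked.
    _apply_softmax_offset_rule(rules, _FakeLayer(_FakeCoreAttention()), layer_id=1)
    assert calls == []

    # 3) softmax_offset present -> rule invoked exactly once with the right arguments.
    _apply_softmax_offset_rule(rules, _FakeLayer(_FakeCoreAttention(softmax_offset=1.25)), layer_id=2)
    assert calls == [(1.25, 2)]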
Signed-off-by: Yue <[email protected]>
f482c39 to 804a075

/ok to test 804a075
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #334   +/-   ##
=======================================
- Coverage   73.82%   73.82%   -0.01%
=======================================
  Files         172      172
  Lines       17438    17438
=======================================
- Hits        12874    12873       -1
- Misses       4564     4565       +1

☔ View full report in Codecov by Sentry.
Signed-off-by: Yue <[email protected]>
Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
Type of change: Bug fix

Overview: Fix the issue where attention.core_attention is None during Megatron checkpoint export.
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information