5 changes: 4 additions & 1 deletion modelopt/torch/export/unified_export_megatron.py
@@ -1085,7 +1085,10 @@ def _get_state_dict(self):
self.rules["k_layernorm"](layer.self_attention.k_layernorm, layer_id)
self.rules["linear_qkv"](layer.self_attention.linear_qkv, layer_id)
self.rules["linear_proj"](layer.self_attention.linear_proj, layer_id)
if hasattr(layer.self_attention.core_attention, "softmax_offset"):
if (
getattr(layer.self_attention.core_attention, "softmax_offset", None)
is not None
):
self.rules["softmax_offset"](
layer.self_attention.core_attention.softmax_offset, layer_id
)
Comment on lines +1088 to 1094

⚠️ Potential issue

❓ Verification inconclusive

Guard both core_attention and softmax_offset to avoid AttributeError

getattr(layer.self_attention.core_attention, ...) still dereferences core_attention, so if self_attention lacks core_attention this raises AttributeError before getattr's default can apply. Also, cache the looked-up value instead of re-accessing the attribute after the check.
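
For illustration, a minimal repro with a hypothetical stand-in object (not the real Megatron module):

from types import SimpleNamespace

# Hypothetical stand-in for layer.self_attention that has no core_attention attribute.
self_attention = SimpleNamespace()

# getattr's default only protects the final lookup; the chained access raises first:
#   getattr(self_attention.core_attention, "softmax_offset", None)  # AttributeError

# Chaining getattr (or caching the intermediate object) never raises:
core_attn = getattr(self_attention, "core_attention", None)
softmax_offset = getattr(core_attn, "softmax_offset", None)  # None, no exception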

Apply this diff:

-                        if (
-                            getattr(layer.self_attention.core_attention, "softmax_offset", None)
-                            is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                layer.self_attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(layer.self_attention, "core_attention", None)
+                        softmax_offset = getattr(core_attn, "softmax_offset", None)
+                        if softmax_offset is not None:
+                            self.rules["softmax_offset"](softmax_offset, layer_id)

Would you add a quick unit/integration test covering:

  • core_attention is None
  • core_attention present but softmax_offset is None
  • core_attention + softmax_offset tensor present (rule invoked once)?

I can draft a minimal test double for layer.self_attention to exercise these branches if helpful.
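
A rough sketch of such doubles: the _run_guard helper, the SimpleNamespace fakes, and the MagicMock-based rules dict are all hypothetical, and _run_guard re-implements the guarded lookup from the suggested diff rather than driving the real exporter:

from types import SimpleNamespace
from unittest.mock import MagicMock

import torch


def _run_guard(self_attention, rules, layer_id=0):
    # Mirrors the suggested guard: cache core_attention, then getattr the offset.
    core_attn = getattr(self_attention, "core_attention", None)
    softmax_offset = getattr(core_attn, "softmax_offset", None)
    if softmax_offset is not None:
        rules["softmax_offset"](softmax_offset, layer_id)


def test_no_core_attention():
    rules = {"softmax_offset": MagicMock()}
    _run_guard(SimpleNamespace(), rules)  # self_attention without core_attention
    rules["softmax_offset"].assert_not_called()


def test_core_attention_without_softmax_offset():
    rules = {"softmax_offset": MagicMock()}
    sa = SimpleNamespace(core_attention=SimpleNamespace(softmax_offset=None))
    _run_guard(sa, rules)
    rules["softmax_offset"].assert_not_called()


def test_softmax_offset_invokes_rule_once():
    rules = {"softmax_offset": MagicMock()}
    offset = torch.zeros(4)
    sa = SimpleNamespace(core_attention=SimpleNamespace(softmax_offset=offset))
    _run_guard(sa, rules, layer_id=3)
    rules["softmax_offset"].assert_called_once_with(offset, 3)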


📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                        if (
-                            getattr(layer.self_attention.core_attention, "softmax_offset", None)
-                            is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                layer.self_attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(layer.self_attention, "core_attention", None)
+                        softmax_offset = getattr(core_attn, "softmax_offset", None)
+                        if softmax_offset is not None:
+                            self.rules["softmax_offset"](softmax_offset, layer_id)
🤖 Prompt for AI Agents
In modelopt/torch/export/unified_export_megatron.py around lines 1088-1094, the
code accesses layer.self_attention.core_attention directly inside getattr, which
can raise AttributeError if core_attention is missing; cache core_attention with
getattr(layer.self_attention, "core_attention", None) in a local variable, use
getattr(core_attention, "softmax_offset", None) on the cached value, and only
invoke self.rules["softmax_offset"] when the cached softmax_offset is not None;
also add unit tests for three cases: core_attention is None, core_attention
present but softmax_offset is None, and core_attention with a softmax_offset
tensor (assert the rule is invoked exactly once).
