Commit 6a6d047

kimishpatel authored and facebook-github-bot committed
Leverage __call__ impl of nn Module instead of calling forward on attention
Summary: In the current llama transformer definition we explicitly invoke the forward method on the various attention impls. This prevents us from leveraging register_forward_hook, which is only dispatched via the __call__ override here: https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/module.py#L1781. By removing the explicit call to forward we allow hooks to execute as expected. Created from CodeHub with https://fburl.com/edit-in-codehub. Differential Revision: D83156099
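A minimal sketch (not part of the commit) illustrating the behavior the summary describes: a hook registered with register_forward_hook fires only when the module is invoked via __call__, not when forward is called directly. TinyAttention is a hypothetical stand-in for the real attention impl.

import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    # Hypothetical stand-in for the real attention module.
    def forward(self, x):
        return x * 2

attn = TinyAttention()
hook_calls = []
attn.register_forward_hook(lambda mod, inp, out: hook_calls.append(out))

x = torch.ones(1, 4)

attn.forward(x)        # bypasses __call__, so the hook never runs
print(len(hook_calls)) # 0

attn(x)                # goes through __call__, which dispatches registered hooks
print(len(hook_calls)) # 1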
1 parent e252353 commit 6a6d047

File tree

1 file changed: +1 −1 lines changed

examples/models/llama/llama_transformer.py

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ def from_type(cls, layer_id, args, rope) -> "TransformerBlock":
         return TransformerBlock(args, attention)

     def forward(self, x, freqs_cos, freqs_sin, attn_options: ForwardOptions):  # x: 1xN
-        h, attn_options_update = self.attention.forward(
+        h, attn_options_update = self.attention(
             self.attention_norm(x), freqs_cos, freqs_sin, **attn_options
         )
