[fix] call init_attention_mask inside of training loop #76

ebsmothers · 2025-08-26T23:25:05Z

After the changes in pytorch/torchtitan#1616, we need to explicitly initialize the attention mask in our trainer code.

Test plan: modified Llama3 8B in my local torchtitan checkout to enable FlexAttention, as in this config. Then ran:

forge run --nproc_per_node 2 apps/sft/main.py --config apps/sft/llama3_8b.yaml

On main:

...
[rank0]:     output = self.sdpa(xq, xk, xv)
[rank0]:   File "/home/ebs/.fbpkg_conda_envs/forge-6f4168f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/ebs/.fbpkg_conda_envs/forge-6f4168f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/ebs/torchtitan/torchtitan/models/attention.py", line 88, in forward
[rank0]:     block_mask = FlexAttention.block_masks[self.mask_key]
[rank0]: KeyError: ('block_causal', None)

On this branch:

...
4|Loss: 12.06763744354248:   0%|▉                    | 5/1000 [00:08<23:58,  1.45s/it]
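
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of why the `KeyError` on main can occur. It is a toy model, not torchtitan's implementation: `ToyAttention`, `train_step`, the placeholder string mask, and the signature given to `init_attention_mask` are all assumptions made for illustration. Only the names `block_masks`, `mask_key`, and `init_attention_mask` are taken from the traceback and the PR title. The idea is that the attention module reads from a class-level mask cache in `forward`, so something in the training loop has to populate that cache first.

```python
from typing import ClassVar, Dict, Optional, Tuple

MaskKey = Tuple[str, Optional[int]]


class ToyAttention:
    """Hypothetical stand-in for an attention module with a class-level mask cache."""

    # Shared across all instances, keyed by (mask_type, fixed_block_size).
    block_masks: ClassVar[Dict[MaskKey, str]] = {}

    def __init__(self, mask_type: str, fixed_block_size: Optional[int] = None) -> None:
        self.mask_key: MaskKey = (mask_type, fixed_block_size)

    @classmethod
    def init_attention_mask(cls, key: MaskKey, batch_id: int) -> None:
        # Assumption: a real model would build a block mask from the batch here.
        # A placeholder string keeps the sketch dependency-free.
        cls.block_masks[key] = f"mask-for-batch-{batch_id}"

    def forward(self) -> str:
        # Raises KeyError if nothing populated the cache for this key.
        return ToyAttention.block_masks[self.mask_key]


def train_step(attn: ToyAttention, batch_id: int, init_mask: bool) -> str:
    """One toy training step; init_mask toggles the explicit initialization."""
    if init_mask:
        ToyAttention.init_attention_mask(attn.mask_key, batch_id)
    return attn.forward()
```

In this sketch, calling `train_step(ToyAttention("block_causal"), 0, init_mask=False)` on an empty cache raises `KeyError: ('block_causal', None)`, mirroring the traceback, while passing `init_mask=True` lets the forward pass find its mask.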

[fix] call init_attention_mask inside of training loop

28d5fd3

meta-cla bot added the CLA Signed label Aug 26, 2025

ebsmothers requested review from Jack-Khuu, allenwang28, joecummings and pbontrager August 26, 2025 23:25

Jack-Khuu approved these changes Aug 26, 2025


ebsmothers merged commit 89903fa into meta-pytorch:main Aug 26, 2025
4 checks passed

ebsmothers mentioned this pull request Aug 27, 2025

RLTrainer #40

Merged
