Actually correct training w/ packed datasets defaults #2610

Draft
joecummings wants to merge 3 commits into pytorch:fix-pos-id from joecummings:flex-block-causal-defaults

@joecummings
Member

No description provided.

Add a position buffer that tracks per-document RoPE positions,
resetting at each document boundary. These positions are yielded
alongside input tokens and used when block_causal attention is
configured.
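
A minimal sketch of the position-buffer idea described above, in plain Python rather than the PR's actual code. The function name `pack_with_positions`, the list-based buffers, and the toy token IDs are all assumptions made for illustration; the real dataset presumably works on tokenizer output and tensors.

```python
def pack_with_positions(documents, seq_len):
    """Yield (tokens, positions) chunks of length seq_len from a document stream.

    Hypothetical helper: positions restart at 0 for each document and are
    buffered in lockstep with the tokens they belong to.
    """
    token_buffer, position_buffer = [], []
    for doc in documents:
        token_buffer.extend(doc)
        # Per-document positions: 0, 1, 2, ... for every new document.
        position_buffer.extend(range(len(doc)))
        # Emit full chunks; leftovers stay buffered for the next document.
        while len(token_buffer) >= seq_len:
            yield token_buffer[:seq_len], position_buffer[:seq_len]
            token_buffer = token_buffer[seq_len:]
            position_buffer = position_buffer[seq_len:]


# Toy documents with made-up token IDs.
docs = [[11, 12, 13], [21, 22, 23, 24, 25], [31, 32]]
chunks = list(pack_with_positions(docs, seq_len=4))
for tokens, positions in chunks:
    print(tokens, positions)
# [11, 12, 13, 21] [0, 1, 2, 0]
# [22, 23, 24, 25] [1, 2, 3, 4]
```

In this sketch the second chunk starts at position 1 because it continues the second document, which began in the previous chunk.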

Also add is_packed validation to catch misconfigured attention
backends at trainer init time: packed dataloaders require flex or
varlen with block_causal to prevent cross-document attention leakage.
- Add is_packed attribute on HuggingFaceTextDataset, delegated through
  the dataloader to the trainer for early validation that packed
  datasets use flex/varlen + block_causal attention.
- Remove the % seq_len clamp on position IDs. Document positions are
  valid indices into the RoPE cache (sized to max_seq_len), and
  unclamped positions preserve correct absolute offsets for document
  fragments spanning chunk boundaries.
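
The early validation described above could look roughly like the following. This is a sketch under stated assumptions: `AttentionConfig`, its field names, and `validate_packed_attention` are hypothetical stand-ins, not the trainer's real config or API; only the rule itself (packed data requires flex or varlen with block_causal) comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    """Hypothetical config container; field names are assumptions."""
    backend: str    # e.g. "sdpa", "flex", "varlen"
    mask_type: str  # e.g. "causal", "block_causal"


def validate_packed_attention(is_packed: bool, cfg: AttentionConfig) -> None:
    """Raise at init time if a packed dataset is paired with leaky attention."""
    if not is_packed:
        return  # Unpacked data has no cross-document boundaries to protect.
    if cfg.backend not in ("flex", "varlen") or cfg.mask_type != "block_causal":
        raise ValueError(
            "Packed datasets require flex or varlen attention with a "
            f"block_causal mask, got {cfg.backend}/{cfg.mask_type}."
        )


# A valid pairing passes silently; an invalid one fails before training starts.
validate_packed_attention(True, AttentionConfig("flex", "block_causal"))
try:
    validate_packed_attention(True, AttentionConfig("sdpa", "causal"))
except ValueError as err:
    print(f"rejected: {err}")
```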
…causal

With packed datasets, sdpa/causal allows cross-document attention
leakage and uses sequential positions across document boundaries.
flex/block_causal isolates documents in attention and enables
per-document RoPE position IDs.
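
To illustrate what block_causal isolation means, here is a small pure-Python sketch that builds the mask from per-token document IDs. The helper `block_causal_mask` and the `doc_ids` list are assumptions for illustration; a real flex or varlen backend would express the same predicate on tensors instead of materializing nested lists.

```python
def block_causal_mask(doc_ids):
    """Return mask[q][k] == True where query q may attend to key k.

    A query sees a key only if the key is not in the future (causal)
    and both tokens belong to the same document (block).
    """
    n = len(doc_ids)
    return [
        [k <= q and doc_ids[q] == doc_ids[k] for k in range(n)]
        for q in range(n)
    ]


# Two packed documents: three tokens of document 0, then two of document 1.
doc_ids = [0, 0, 0, 1, 1]
mask = block_causal_mask(doc_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# xxx..
# ...x.
# ...xx
```

With a plain causal mask the last two rows would be `xxxx.` and `xxxxx`, which is the cross-document leakage the commit message refers to.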

llama4, deepseek_v3, and gpt_oss already used flex/block_causal.
@meta-cla meta-cla bot added the CLA Signed label Mar 16, 2026
@joecummings joecummings changed the title from "Flex block causal defaults" to "Actually correct training w/ padded datasets defaults" Mar 16, 2026
@joecummings joecummings changed the title from "Actually correct training w/ padded datasets defaults" to "Actually correct training w/ packed datasets defaults" Mar 16, 2026
Labels

ciflow/8gpu, CLA Signed