fix: correct misleading TODO comments about positions guard
The comments blamed DTensor+FSDP for the positions guard, but the
actual issue is an out-of-bounds RoPE cache index: per-document
position IDs from packed datasets can exceed max_seq_len (e.g. 6545
vs cache size 2048). The guard is also semantically correct — causal
attention treats the packed sequence as one document, so sequential
positions via the None path are what we want.
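A minimal sketch of the guard this commit describes. The function name `guard_positions` and the exact fallback mechanics are assumptions for illustration, not the repo's actual code; the point is that per-document position IDs from a packed batch can exceed the RoPE cache size, and passing `None` instead yields sequential positions 0..seq_len-1, which is what causal attention over a single packed sequence wants.

```python
def guard_positions(input_pos, max_seq_len):
    """Hypothetical guard: drop per-document position IDs that would
    index past the RoPE cache of size `max_seq_len`. Returning None
    makes downstream code use sequential positions 0..seq_len-1,
    which is correct when the packed sequence is treated as one
    document under causal attention."""
    if input_pos is not None and max(input_pos) >= max_seq_len:
        return None  # fall back to the sequential-positions path
    return input_pos

# A packed dataset can emit a position ID like 6545 against a
# RoPE cache of size 2048, which would index out of bounds:
assert guard_positions([0, 1, 2, 6545], 2048) is None
# In-range positions pass through unchanged:
assert guard_positions([0, 1, 2], 2048) == [0, 1, 2]
```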