fix: wrap per-document RoPE positions at seq_len to prevent OOB gather
Documents longer than seq_len produce position IDs that exceed the RoPE
cache size, causing an index-out-of-bounds error in torch.gather during
apply_rotary_emb. Wrap positions with modulo seq_len in the dataloader,
which effectively chunks long documents for RoPE purposes while
preserving all tokens for training.
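A minimal sketch of the wrapping, using illustrative names (the real dataloader and its helpers differ), assuming per-document position IDs restart at 0 at each document boundary:

```python
def per_document_positions(doc_lens, seq_len):
    """Build flat position IDs that restart at 0 per document and wrap
    at seq_len, so they never index past the RoPE cache of size seq_len.

    doc_lens: lengths of the documents packed into the sequence.
    seq_len:  size of the RoPE cache (maximum position + 1).
    """
    positions = []
    for n in doc_lens:
        # Modulo keeps every position in [0, seq_len), effectively
        # chunking a long document for RoPE while keeping all tokens.
        positions.extend(i % seq_len for i in range(n))
    return positions
```

With `seq_len = 4` and documents of lengths 3 and 6, this yields `[0, 1, 2, 0, 1, 2, 3, 0, 1]`: the second document exceeds `seq_len`, so its positions wrap back to 0 instead of reaching 4 and triggering the out-of-bounds gather.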
Also update comments to clarify: per-document positions are dropped for
causal attention (the whole sequence is treated as one document), and kept
for block_causal to match inference frameworks (e.g. vLLM) that reset
positions to 0 per request.