Fix double sequence partition during training with context-parallel #3498

lorenzbaraldi wants to merge 1 commit into axolotl-ai-cloud:main
Conversation
📝 Walkthrough

A monkeypatch file is modified to replace the active context parallel (CP) setup logic with a no-op context manager, and import statements are updated.
🧹 Nitpick comments (1)
src/axolotl/monkeypatch/accelerate/parallelism_config.py (1)
Lines 89-95: Add a comment explaining the intentional no-op. The fix correctly prevents double sequence partitioning by replacing Accelerate's CP context with a no-op. Consider adding a brief comment to clarify the rationale for future maintainers.
📝 Suggested documentation
```diff
+ # No-op context manager to prevent double sequence partitioning when
+ # SequenceParallelContextManager is already handling the split.
  @contextlib.contextmanager
  def _noop_cp_context(
      buffers=None, buffer_seq_dims=None, no_restore_buffers=None
  ):
      yield
```
Description
This PR fixes an issue caused by double context partitioning when both Accelerate native Context Parallelism (CP) and the SequenceParallelContextManager are applied simultaneously.
Motivation and Context
During training with context parallelism enabled, each rank's token sequence was unintentionally reduced to 1 / cp_size² of the full sequence length, instead of the intended 1 / cp_size.
This happened because:
• SequenceParallelContextManager already partitions the sequence by 1 / cp_size.
• At the same time, Accelerate applies additional context partitioning through `maybe_context_parallel`.
As a result, the sequence was partitioned twice, leading to an incorrect effective sequence length.
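The arithmetic above can be sketched with a toy example (the sequence length, `cp_size`, and helper function here are illustrative, not taken from the actual code path):

```python
cp_size = 4
seq_len = 4096
tokens = list(range(seq_len))

def shard_sequence(seq, cp_size, rank):
    """Split seq into cp_size contiguous chunks and return this rank's chunk."""
    chunk = len(seq) // cp_size
    return seq[rank * chunk:(rank + 1) * chunk]

rank = 0
# First split: SequenceParallelContextManager partitions the sequence by 1/cp_size
once = shard_sequence(tokens, cp_size, rank)
# Second, unintended split: Accelerate's CP context partitions the shard again
twice = shard_sequence(once, cp_size, rank)

print(len(once))   # 1024 -> seq_len / cp_size (intended)
print(len(twice))  # 256  -> seq_len / cp_size**2 (the bug)
```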
This patch prevents the double partitioning and ensures the sequence is split only once as intended.
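In outline, the patch replaces Accelerate's CP context with a context manager that yields without touching the buffers. The following is a minimal sketch of that idea; the `FakeAccelerator` stand-in and its wiring are illustrative, not the actual Accelerate class:

```python
import contextlib

@contextlib.contextmanager
def _noop_cp_context(buffers=None, buffer_seq_dims=None, no_restore_buffers=None):
    # Intentionally a no-op: SequenceParallelContextManager has already
    # partitioned the sequence, so partitioning again here would shrink
    # each rank's slice to 1/cp_size**2 of the full length.
    yield

class FakeAccelerator:  # hypothetical stand-in for the patched object
    pass

acc = FakeAccelerator()
acc._cp_context = _noop_cp_context  # the attribute the monkeypatch targets

buffers = [[1, 2, 3, 4]]
with acc._cp_context(buffers=buffers, buffer_seq_dims=[0]):
    pass  # forward/backward would run here, sequence already partitioned once
print(buffers)  # buffers are left unchanged by the no-op context
```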
How has this been tested?
The fix was tested using a CP configuration with 8 GPUs.
Testing consisted of debugging the `apply_sequence_parallelism` function with and without the patch. Without the fix, the training loss was consistently higher than the evaluation loss, indicating incorrect training behavior. After applying the patch, the losses behaved as expected.
AI Usage Disclaimer
Yes: Opus was used to assist with debugging.
Screenshots (if appropriate)
Types of changes
Bug fix
Social Handles (Optional)