Fix double sequence partition during training with context-parallel#3498

Open
lorenzbaraldi wants to merge 1 commit into axolotl-ai-cloud:main from lorenzbaraldi:fix/context-parallelism

Conversation

@lorenzbaraldi
Contributor

@lorenzbaraldi lorenzbaraldi commented Mar 15, 2026

Description

This PR fixes an issue caused by double context partitioning when both Accelerate's native Context Parallelism (CP) and the SequenceParallelContextManager are applied simultaneously.

Motivation and Context

During training with context parallelism enabled, each rank's token sequence was unintentionally reduced to 1/cp_size² of the full sequence length, rather than the intended 1/cp_size.

This happened because:
• SequenceParallelContextManager already partitions the sequence by 1 / cp_size.
• At the same time, Accelerate applies additional context partitioning through maybe_context_parallel.

As a result, the sequence was partitioned twice, leading to an incorrect effective sequence length.

This patch prevents the double partitioning and ensures the sequence is split only once as intended.
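The arithmetic behind the bug can be sketched as follows (the numbers are hypothetical, chosen only to illustrate the 1/cp_size² effect described above):

```python
# Illustration of the double-partitioning bug: the sequence is split once by
# SequenceParallelContextManager and then again by Accelerate's CP context.
# Numbers are hypothetical; only the arithmetic matters.

seq_len = 8192   # full sequence length
cp_size = 4      # context-parallel world size

# Intended: a single split, one shard per CP rank.
per_rank_intended = seq_len // cp_size            # 2048 tokens per rank

# Bug: the already-split shard is partitioned a second time.
per_rank_buggy = per_rank_intended // cp_size     # 512 tokens per rank

# The effective shard is 1/cp_size**2 of the full sequence instead of 1/cp_size.
assert per_rank_buggy == seq_len // cp_size**2
print(per_rank_intended, per_rank_buggy)  # 2048 512
```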

How has this been tested?

The fix was tested using a CP configuration with 8 GPUs.

Testing consisted of debugging the apply_sequence_parallelism function with and without the patch. Without the fix, the training loss was consistently higher than the evaluation loss, indicating incorrect training behavior. After applying the patch, the losses behaved as expected.

AI Usage Disclaimer

Yes: Opus was used to assist with debugging.

Types of changes

Bug fix

Summary by CodeRabbit

  • Refactor
    • Updated context parallel initialization handling in the accelerate integration for parallelism configuration.

@coderabbitai

coderabbitai bot commented Mar 15, 2026

📝 Walkthrough

A monkeypatch file is modified to replace active context parallel (CP) setup logic with a no-op context manager. Import statements are updated to use contextlib instead of functools, and multiple CP-related imports and functionality are removed while introducing a simple no-operation context manager for self._cp_context.

Changes

Context Parallel Setup Simplification (src/axolotl/monkeypatch/accelerate/parallelism_config.py):
Replaced active CP context initialization with a no-op context manager. Removed imports and usage of accelerate.big_modeling._attach_context_parallel_hooks, torch.distributed.tensor.experimental.context_parallel, and set_rotate_method. Updated imports from functools to contextlib while preserving function signature and structure.
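Based on this summary, the patched setup can be sketched roughly as below. This is an illustration, not the actual patch: the class name and surrounding structure are hypothetical, and only the no-op context manager assigned to self._cp_context reflects the described change.

```python
import contextlib


class PatchedCPSetupSketch:
    """Hypothetical sketch of the described change: Accelerate's CP context
    is replaced with a no-op, so only SequenceParallelContextManager
    partitions the sequence."""

    def __init__(self):
        # Instead of installing torch's context_parallel context, install a
        # context manager with the same call shape that does nothing.
        @contextlib.contextmanager
        def _noop_cp_context(
            buffers=None, buffer_seq_dims=None, no_restore_buffers=None
        ):
            yield

        self._cp_context = _noop_cp_context


setup = PatchedCPSetupSketch()
with setup._cp_context(buffers=[], buffer_seq_dims=[]):
    pass  # buffers are left untouched; no second sequence split happens
```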

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

under review

Suggested reviewers

  • winglian
  • NanoCode012
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped, since CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title directly and clearly describes the main fix, preventing double sequence partition when using context-parallel, which is the core issue addressed in the PR.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/axolotl/monkeypatch/accelerate/parallelism_config.py (1)

89-95: Add a comment explaining the intentional no-op.

The fix correctly prevents double sequence partitioning by replacing Accelerate's CP context with a no-op. Consider adding a brief comment to clarify the rationale for future maintainers.

📝 Suggested documentation
+        # No-op context manager to prevent double sequence partitioning when
+        # SequenceParallelContextManager is already handling the split.
         @contextlib.contextmanager
         def _noop_cp_context(
             buffers=None, buffer_seq_dims=None, no_restore_buffers=None
         ):
             yield

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cba8a82f-3154-4add-8a7b-6e4561fb28d0

📥 Commits

Reviewing files that changed from the base of the PR and between d8a0574 and 80edaf5.

📒 Files selected for processing (1)
  • src/axolotl/monkeypatch/accelerate/parallelism_config.py

@codecov

codecov bot commented Mar 15, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Files with missing lines: ...olotl/monkeypatch/accelerate/parallelism_config.py (patch coverage 0.00%, 5 lines missing ⚠️)
