Fix double sequence partition during training with context-parallel #3498

lorenzbaraldi wants to merge 1 commit into axolotl-ai-cloud:main
Conversation
📝 Walkthrough

A monkeypatch file is modified to replace the active context parallel (CP) setup logic with a no-op context manager, and import statements are updated.
🧹 Nitpick comments (1)
src/axolotl/monkeypatch/accelerate/parallelism_config.py (1)
Lines 89-95: Add a comment explaining the intentional no-op. The fix correctly prevents double sequence partitioning by replacing Accelerate's CP context with a no-op. Consider adding a brief comment to clarify the rationale for future maintainers.
📝 Suggested documentation
```diff
+ # No-op context manager to prevent double sequence partitioning when
+ # SequenceParallelContextManager is already handling the split.
  @contextlib.contextmanager
  def _noop_cp_context(
      buffers=None, buffer_seq_dims=None, no_restore_buffers=None
  ):
      yield
```
Description
This PR fixes an issue caused by double context partitioning when both Accelerate native Context Parallelism (CP) and the SequenceParallelContextManager are applied simultaneously.
Motivation and Context
During training with context parallelism enabled, each rank's token sequence was unintentionally reduced to 1 / cp_size² of the full sequence length, instead of the intended 1 / cp_size.
This happened because:
• SequenceParallelContextManager already partitions the sequence by 1 / cp_size.
• At the same time, Accelerate applies additional context partitioning through `maybe_context_parallel`.
As a result, the sequence was partitioned twice, leading to an incorrect effective sequence length.
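The arithmetic above can be sketched with a toy example (the sequence length, `cp_size`, and helper function here are illustrative, not taken from the actual code path):

```python
cp_size = 4
seq_len = 4096
tokens = list(range(seq_len))

def shard_sequence(seq, cp_size, rank):
    """Split seq into cp_size contiguous chunks and return this rank's chunk."""
    chunk = len(seq) // cp_size
    return seq[rank * chunk:(rank + 1) * chunk]

rank = 0
# First split: SequenceParallelContextManager partitions the sequence by 1/cp_size
once = shard_sequence(tokens, cp_size, rank)
# Second, unintended split: Accelerate's CP context partitions the shard again
twice = shard_sequence(once, cp_size, rank)

print(len(once))   # 1024 -> seq_len / cp_size (intended)
print(len(twice))  # 256  -> seq_len / cp_size**2 (the bug)
```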
This patch prevents the double partitioning and ensures the sequence is split only once as intended.
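In outline, the patch replaces Accelerate's CP context with a context manager that yields without touching the buffers. The following is a minimal sketch of that idea; the `FakeAccelerator` stand-in and its wiring are illustrative, not the actual Accelerate class:

```python
import contextlib

@contextlib.contextmanager
def _noop_cp_context(buffers=None, buffer_seq_dims=None, no_restore_buffers=None):
    # Intentionally a no-op: SequenceParallelContextManager has already
    # partitioned the sequence, so partitioning again here would shrink
    # each rank's slice to 1/cp_size**2 of the full length.
    yield

class FakeAccelerator:  # hypothetical stand-in for the patched object
    pass

acc = FakeAccelerator()
acc._cp_context = _noop_cp_context  # the attribute the monkeypatch targets

buffers = [[1, 2, 3, 4]]
with acc._cp_context(buffers=buffers, buffer_seq_dims=[0]):
    pass  # forward/backward would run here, sequence already partitioned once
print(buffers)  # buffers are left unchanged by the no-op context
```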
How has this been tested?
The fix was tested using a CP configuration with 8 GPUs.
Testing consisted of debugging the `apply_sequence_parallelism` function with and without the patch. Without the fix, the training loss was consistently higher than the evaluation loss, indicating incorrect training behavior. After applying the patch, the losses behaved as expected.
AI Usage Disclaimer
Yes: Opus was used to assist with debugging.
Screenshots (if appropriate)
Types of changes
Bug fix
Social Handles (Optional)