[fix] Add cu_seqlens_argmin to vlm packed sequence #2246

Open

cuichenx wants to merge 6 commits into main from chcui/fix_vlm_packed_sequence

Conversation

cuichenx (Contributor) commented Feb 5, 2026

What does this PR do?

#1997 added support for in-batch sequence packing for VLMs but introduced a performance degradation.
#2180 resolved the performance issue but introduced a bug for in-batch sequence packing.
This PR fixes the bug by passing cu_seqlens_argmin in vlm_step.py, so there is no performance degradation.
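
Roughly, the fix amounts to computing the argmin index on the host at packing time and shipping it along with the rest of the packed-sequence metadata. Below is a minimal sketch of that idea; the helper name and the surrounding metadata dictionary are illustrative assumptions rather than the exact code in vlm_step.py (only the cu_seqlens_argmin line mirrors the change quoted in the review comment further down):

```python
import torch


def build_packed_seq_metadata(cu_seqlens: torch.Tensor) -> dict:
    """Hypothetical helper mirroring the in-batch packing path."""
    # In-batch packing builds cu_seqlens without -1 padding, so the boundary of
    # the "real" entries is simply the tensor's length. Supplying it up front
    # lets downstream code skip a torch.argmin on a device-resident tensor.
    cu_seqlens_argmin = torch.tensor(len(cu_seqlens))  # no padding in cu_seqlens since packing is done in-batch
    return {
        "cu_seqlens": cu_seqlens,
        "cu_seqlens_argmin": cu_seqlens_argmin,
    }
```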

Changelog

  • Document in src/megatron/bridge/training/utils/packed_seq_utils.py that calling torch.argmin on cu_seqlens incurs a device-to-host synchronization when the argmin is not pre-computed.
  • Pass a pre-computed cu_seqlens_argmin in src/megatron/bridge/training/vlm_step.py so in-batch packed sequences avoid that torch.argmin call.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Documentation

    • Added a note to the packed-sequence utilities documenting the performance cost (a device-to-host synchronization) of torch.argmin-based padding detection during training.
  • Improvements

    • Packed-sequence metadata for VLM training now carries a pre-computed cu_seqlens_argmin, avoiding a device-to-host synchronization during sequence processing.

cuichenx added the r0.3.0 label (Cherry-pick label for r0.3.0 release branch) on Feb 5, 2026

coderabbitai bot (Contributor) commented Feb 5, 2026

📝 Walkthrough

The pull request adds documentation about potential device-to-host synchronization costs from torch.argmin calls in packed sequence utilities and introduces a new cu_seqlens_argmin scalar tensor parameter to the packed sequence metadata handling in VLM training.

Changes

Cohort: Packed Sequence Support Enhancement
File(s): src/megatron/bridge/training/utils/packed_seq_utils.py, src/megatron/bridge/training/vlm_step.py
Summary: Added a documentation note warning about torch.argmin device-to-host synchronization overhead and introduced cu_seqlens_argmin as a new parameter in packed sequence metadata to enable pre-computed argmin values.
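
For context, here is a hedged sketch of the consumer-side pattern the new documentation note warns about; the function and parameter names are illustrative, not the actual code in packed_seq_utils.py. Converting the result of torch.argmin on a device-resident cu_seqlens into a Python index forces a device-to-host copy and stalls the host until the GPU catches up, which a pre-computed cu_seqlens_argmin avoids:

```python
from typing import Optional

import torch


def trim_cu_seqlens(cu_seqlens: torch.Tensor, cu_seqlens_argmin: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Drop trailing -1 pad entries from a fixed-length cu_seqlens tensor."""
    if cu_seqlens_argmin is not None:
        # Pre-computed on the host (typically a CPU scalar tensor): .item() is cheap.
        end = cu_seqlens_argmin.item()
    else:
        # torch.argmin runs on cu_seqlens's device; converting its result to a
        # Python int for slicing triggers a device-to-host synchronization.
        end = torch.argmin(cu_seqlens).item()
    return cu_seqlens[:end]
```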

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • Revert packed seq extra checks #2180 — Directly related as it removes argmin-based padding checks and uses provided argmin values, complementing this PR's introduction of the cu_seqlens_argmin parameter.

Suggested reviewers

  • yaoyu-33
  • ko3n1g

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Test Results For Major Changes: ⚠️ Warning
Explanation: The PR introduces a performance optimization (cu_seqlens_argmin) to avoid device-to-host synchronization. Commits and code explicitly reference the performance impact, yet the PR description contains only template placeholders with no test results, performance metrics, or validation data.
Resolution: Add a PR description with performance comparison data, test results confirming correctness, test configurations, and confirmation that numerics/convergence are unaffected by this change.

✅ Passed checks (3 passed)

Title check: ✅ Passed. The PR title accurately summarizes the main change: adding cu_seqlens_argmin to VLM packed sequence support, which is directly reflected in the file changes.
Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/training/utils/packed_seq_utils.py`:
- Around line 46-47: Remove the trailing space at the end of the comment line
that starts with "# note: if argmin is not pre-computed in the dataloader,
torch.argmin here will incur a" in
src/megatron/bridge/training/utils/packed_seq_utils.py; edit that comment to end
without any trailing whitespace (and optionally run the repo's pre-commit hooks
or a trim-whitespace action to ensure no other trailing spaces remain).
🧹 Nitpick comments (1)
src/megatron/bridge/training/vlm_step.py (1)

405-412: LGTM! The implementation correctly avoids device-to-host sync.

The logic is correct: since pack_batch_sequences creates cu_seqlens without -1 padding, the argmin index should be the full length of the tensor. Providing this pre-computed value avoids the torch.argmin call and the associated device-to-host synchronization mentioned in the documentation note.

Optional: Consider specifying dtype=torch.int32 for consistency with max_seqlen and other scalar metadata tensors created in pack_batch_sequences.

♻️ Optional consistency improvement
-        cu_seqlens_argmin = torch.tensor(len(cu_seqlens))  # no padding in cu_seqlens since packing is done in-batch
+        cu_seqlens_argmin = torch.tensor(len(cu_seqlens), dtype=torch.int32)  # no padding in cu_seqlens since packing is done in-batch
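
To make the reasoning concrete, here is a toy illustration (values are made up) of why the full tensor length is the right pre-computed index when cu_seqlens carries no -1 padding:

```python
import torch

padded = torch.tensor([0, 5, 9, 12, -1, -1])  # dataloader-side packing pads cu_seqlens to a fixed length
in_batch = torch.tensor([0, 5, 9, 12])         # in-batch packing adds no padding

torch.argmin(padded)    # tensor(4): index of the first -1, so padded[:4] trims the pad entries
torch.argmin(in_batch)  # tensor(0): the leading zero is the minimum, not a useful boundary
len(in_batch)           # 4: slicing with this keeps the whole, already-unpadded tensor
```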

Add validation for micro_batch_size when packing sequences

Labels

r0.3.0 (Cherry-pick label for r0.3.0 release branch)
