@cuichenx cuichenx commented Jan 20, 2026

What does this PR do?

Support THD Training in VLMs

TODOs

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Gemma 3 VL: Loss and grad norm both match closely between cp1 and cp2, for both BSHD and THD.

image

Ministral 3: Loss matches very closely between cp1 and cp2, for both BSHD and THD. Grad norm differs between cp1 and cp2, and I'm not sure why yet. It might be a display issue. I will look into this further.
image
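
For reviewers who want a numeric check alongside the plots, here is a minimal sketch of how two logged curves (for example, loss under cp1 and cp2) could be compared. It is not part of this PR: the helper names `max_relative_difference` and `curves_match` are hypothetical, and the default tolerance `rtol=0.01` is an arbitrary assumption that should be tuned per model and precision.

```python
def max_relative_difference(reference, candidate, eps=1e-12):
    """Largest per-step relative gap between two equally long curves.

    Hypothetical helper for illustration; `eps` guards against division by zero.
    """
    if len(reference) != len(candidate):
        raise ValueError("curves must have the same number of steps")
    if not reference:
        return 0.0
    return max(
        abs(r - c) / max(abs(r), eps)
        for r, c in zip(reference, candidate)
    )


def curves_match(reference, candidate, rtol=0.01):
    """True when every step agrees within `rtol` (an assumed tolerance)."""
    return max_relative_difference(reference, candidate) <= rtol
```

Calling `curves_match(loss_cp1, loss_cp2)` on the per-step values exported from the training logs would give a pass/fail signal, and `max_relative_difference` reports how far apart the worst step is.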

More plots to come

  • Related to # (issue)

Summary by CodeRabbit

  • New Features

    • Added batch-level sequence packing support for optimized dataset processing
    • Introduced context-parallel distributed training support for Gemma3, Ministral3, GLM, and Qwen vision-language models
  • Refactor

    • Updated model forward signatures to support packed sequence parameters across vision-language models
  • Tests

    • Added comprehensive test coverage for sequence packing utilities, distributed training configurations, and attention scaling algorithms
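
As a rough illustration of what sequence packing into a THD-style layout involves, the sketch below concatenates variable-length token sequences into one token stream and records the cumulative boundaries of each sequence, instead of padding every sample to a common length as a BSHD batch would. It is not the code in this PR: the function name `pack_sequences_thd`, its parameters, and its return values are hypothetical. The `pad_multiple` argument reflects the assumption that context-parallel setups may need each packed sequence padded to some multiple so it can be split evenly across ranks.

```python
from itertools import accumulate


def pack_sequences_thd(sequences, pad_token_id=0, pad_multiple=1):
    """Pack variable-length token lists into a single stream (hypothetical helper).

    Returns:
        tokens: flat list holding every sequence, each padded to `pad_multiple`.
        cu_seqlens: cumulative (padded) boundaries, so sequence i occupies
            tokens[cu_seqlens[i]:cu_seqlens[i + 1]].
        seqlens: original, unpadded length of each sequence.
    """
    if pad_multiple < 1:
        raise ValueError("pad_multiple must be >= 1")

    tokens = []
    seqlens = []
    padded_lens = []
    for seq in sequences:
        # Number of pad tokens needed to reach the next multiple.
        pad = (-len(seq)) % pad_multiple
        tokens.extend(seq)
        tokens.extend([pad_token_id] * pad)
        seqlens.append(len(seq))
        padded_lens.append(len(seq) + pad)

    cu_seqlens = [0] + list(accumulate(padded_lens))
    return tokens, cu_seqlens, seqlens
```

An attention kernel that understands packed input can then use the boundary list to keep tokens from attending across sequence boundaries, which is the role the packed sequence parameters play in the updated forward signatures.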

Signed-off-by: Chen Cui <chcui@nvidia.com>