[cogview4][feat] Support attention mechanism with variable-length support and batch packing #11349

OleehyO · 2025-04-17T05:54:21Z

What does this PR do?

This PR adds native multi-resolution and packing support for CogView4 training, to better meet user needs. We have tested the changes in this PR.
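For readers unfamiliar with why multi-resolution training leads to packing, here is a minimal sketch of the arithmetic. The downscale factor and patch size below are illustrative assumptions, not values taken from this PR, and `num_image_tokens` is a hypothetical helper.

```python
# Hypothetical values for illustration only; not taken from this PR.
VAE_SCALE = 8   # assumed pixel-to-latent downscale factor
PATCH_SIZE = 2  # assumed latent patch size


def num_image_tokens(height: int, width: int) -> int:
    """Number of patch tokens for an image of the given pixel size."""
    stride = VAE_SCALE * PATCH_SIZE
    if height % stride or width % stride:
        raise ValueError(f"height and width must be multiples of {stride}")
    return (height // stride) * (width // stride)


resolutions = [(512, 512), (768, 1024), (1024, 1024)]
token_counts = [num_image_tokens(h, w) for h, w in resolutions]
print(token_counts)  # [1024, 3072, 4096]

# Padding every sample to the longest one spends compute on pad tokens;
# packing concatenates the samples so only real tokens are processed.
padded = max(token_counts) * len(token_counts)
packed = sum(token_counts)
print(padded, packed)  # 12288 8192
```

Under these assumed values, a mixed-resolution batch of three images needs 8192 token slots when packed versus 12288 when padded to the longest sample.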

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@a-r-r-o-w
@zRzRzRzRzRzRzR

…nd batch packing Add support for variable-length attention between text and vision tokens while maintaining the original attention pattern. Implement batch packing capability to improve computational efficiency during inference and training.
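The general idea of batch packing with variable-length attention can be sketched as follows. This is a generic illustration in plain Python, not the implementation in `transformer_cogview4.py`; `pack_sequences` and `block_diagonal_mask` are hypothetical helper names. Samples of different lengths are concatenated into one sequence, and a block-diagonal mask keeps each token attending only to tokens from its own sample.

```python
from itertools import accumulate


def pack_sequences(sequences):
    """Concatenate variable-length token lists and record their boundaries."""
    packed = [tok for seq in sequences for tok in seq]
    lengths = [len(seq) for seq in sequences]
    # Cumulative offsets, e.g. lengths [3, 2] -> [0, 3, 5].
    offsets = [0, *accumulate(lengths)]
    return packed, offsets


def block_diagonal_mask(offsets):
    """mask[i][j] is True when tokens i and j belong to the same sample."""
    total = offsets[-1]
    sample_id = [0] * total
    for idx, (start, end) in enumerate(zip(offsets, offsets[1:])):
        for pos in range(start, end):
            sample_id[pos] = idx
    return [[sample_id[i] == sample_id[j] for j in range(total)] for i in range(total)]


# Two samples: text tokens followed by image tokens, with different lengths.
sample_a = ["t0", "t1", "i0"]
sample_b = ["t0", "i0"]
packed, offsets = pack_sequences([sample_a, sample_b])
mask = block_diagonal_mask(offsets)

print(offsets)  # [0, 3, 5]
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real model the same boolean mask would typically be built as a tensor and passed to the attention kernel; the list-of-lists form here is only meant to make the pattern visible.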

src/diffusers/models/transformers/transformer_cogview4.py

yiyixuxu · 2025-04-17T23:36:35Z

let me know what you think too @a-r-r-o-w
I wonder if it would be easier to make a new attention processor for this?

src/diffusers/models/transformers/transformer_cogview4.py

OleehyO · 2025-04-18T09:16:54Z

@a-r-r-o-w It has been renamed to CogView4TrainingAttnProcessor, and a bug in the original CogView4AttnProcessor has been fixed. Both CogView4AttnProcessor and CogView4TrainingAttnProcessor now use the same naming format. Please take a look.

src/diffusers/models/transformers/transformer_cogview4.py

OleehyO · 2025-04-19T02:54:20Z

I have changed back to using attention_kwargs in CogView4Transformer2DModel to pass attention parameters.
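As a rough illustration of the pattern described here, the sketch below shows how a single `attention_kwargs` dict can carry processor-specific arguments while the model's forward signature stays unchanged. All classes are toy stand-ins, not diffusers classes, and the `seq_lengths` key is a hypothetical parameter name chosen for this example.

```python
from typing import Any, Dict, List, Optional


class ToyAttnProcessor:
    """Hypothetical inference-style processor: ignores packing metadata."""

    def __call__(self, tokens: List[str], **kwargs: Any) -> str:
        return f"dense attention over {len(tokens)} tokens"


class ToyTrainingAttnProcessor:
    """Hypothetical training-style processor: needs per-sample lengths."""

    def __call__(
        self,
        tokens: List[str],
        seq_lengths: Optional[List[int]] = None,
        **kwargs: Any,
    ) -> str:
        if seq_lengths is None:
            raise ValueError("seq_lengths is required for packed batches")
        if sum(seq_lengths) != len(tokens):
            raise ValueError("seq_lengths must sum to the packed length")
        return f"packed attention over {len(seq_lengths)} samples"


class ToyModel:
    def __init__(self, processor) -> None:
        self.processor = processor

    def forward(
        self,
        tokens: List[str],
        attention_kwargs: Optional[Dict[str, Any]] = None,
    ) -> str:
        # The forward signature stays fixed; extras ride along in one dict.
        return self.processor(tokens, **(attention_kwargs or {}))


tokens = ["a", "b", "c", "d", "e"]
dense = ToyModel(ToyAttnProcessor()).forward(tokens)
packed_result = ToyModel(ToyTrainingAttnProcessor()).forward(
    tokens, attention_kwargs={"seq_lengths": [3, 2]}
)
print(dense)          # dense attention over 5 tokens
print(packed_result)  # packed attention over 2 samples
```

The design point is that swapping the processor changes which extra arguments are needed, while callers that do not use packing never have to pass them.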

a-r-r-o-w

Thanks for addressing the comments!

yiyixuxu · 2025-04-21T18:30:42Z

@bot /style

HuggingFaceDocBuilderDev · 2025-04-21T18:35:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

src/diffusers/models/transformers/transformer_cogview4.py

yiyixuxu · 2025-04-21T18:36:01Z

@bot /style

github-actions · 2025-04-21T18:36:54Z

Style fixes have been applied. View the workflow run here.

OleehyO added 4 commits April 9, 2025 10:24

[cogview4] Fix rope

1fae35e

[cogview4] Fix tensor type after qk norm

255cb5a

[cogview4] Add docs for attn processor

1a48dcd

OleehyO mentioned this pull request Apr 17, 2025

Major update: Add QLoRA & multi-resolution packing support THUDM/CogKit#26

Merged

yiyixuxu reviewed Apr 17, 2025

a-r-r-o-w reviewed Apr 18, 2025

OleehyO added 2 commits April 18, 2025 05:49

[chore] Change type hint

f2a6e5c

Rename as CogView4TrainingAttnProcessor

ccf6752

a-r-r-o-w reviewed Apr 18, 2025

[refactor] Back to original signature, using attention_kwargs instead

fe0c30b

a-r-r-o-w approved these changes Apr 19, 2025

a-r-r-o-w requested a review from yiyixuxu April 19, 2025 15:06

yiyixuxu added the close-to-merge label Apr 21, 2025

yiyixuxu reviewed Apr 21, 2025

Update src/diffusers/models/transformers/transformer_cogview4.py

b70f208

Apply style fixes

1f657ac

Merge branch 'main' into main

ddab008

yiyixuxu merged commit 0434db9 into huggingface:main Apr 21, 2025
12 checks passed

yiyixuxu removed the close-to-merge label Apr 21, 2025


4 participants
