
Conversation

OleehyO (Collaborator) commented Apr 17, 2025

Description

This PR introduces significant changes to enable:

  1. Fine-tuning on resource-constrained devices via QLoRA implementation
  2. Native multi-resolution training through packed sequences

Key improvements:

  • QLoRA integration, combined with offload strategies, reduces the minimum VRAM requirement for CogView4 fine-tuning from 30GB to 9GB (a minimal setup sketch follows this list).
  • Multi-resolution packing achieves a ~4x training-efficiency gain on the pixelart dataset compared to the resize approach, while also improving generation quality.
  • Fixed bugs in LoRA loading/unloading and introduced a user-friendly interface based on PEFT's LoRA injection.
  • Refactored the trainer to improve validation speed and code readability.
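
Below is a minimal, hypothetical sketch of what a QLoRA-style setup for the CogView4 transformer could look like, using diffusers' bitsandbytes NF4 quantization together with PEFT-style LoRA injection. The model id, rank, and target modules are assumptions for illustration, not the exact configuration used in this PR.

```python
# Hypothetical sketch only -- model id, rank, and target modules are assumptions,
# not the configuration used in this PR.
import torch
from diffusers import BitsAndBytesConfig, CogView4Transformer2DModel
from peft import LoraConfig

# Load the transformer with 4-bit NF4 quantization (bitsandbytes).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = CogView4Transformer2DModel.from_pretrained(
    "THUDM/CogView4-6B",          # assumed model id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Inject trainable LoRA adapters into the attention projections.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)

trainable = sum(p.numel() for p in transformer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.1f}M")
```

The frozen text encoder and VAE would typically be offloaded to CPU when not in use, which is where the additional VRAM savings beyond weight quantization come from.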

Notes

  • Multi-resolution packing currently supports only the CogView series (a conceptual sketch appears after this list).

  • For CogVideo, QLoRA offers limited benefits because memory usage is dominated by activations; it is typically suitable only for batch size = 1 scenarios.

  • On the pixelart dataset, QLoRA fine-tuning may negatively impact final generation quality.

  • Changes include modifications to the CogView4 attention processor in diffusers to support packed multi-resolution sequences (see this PR).
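
To illustrate what packed multi-resolution sequences mean in practice, here is a purely conceptual sketch (not the PR's actual implementation): latent token sequences from images of different resolutions are concatenated into one long sequence, and per-sample boundaries are recorded so attention can be restricted to each sample. Token counts and hidden size below are made up.

```python
# Conceptual sketch of sequence packing; token counts and hidden size are made up.
import torch

def pack_sequences(latent_tokens):
    """latent_tokens: list of (seq_len_i, dim) tensors from differently sized images."""
    lengths = torch.tensor([t.shape[0] for t in latent_tokens])
    packed = torch.cat(latent_tokens, dim=0)                     # (sum(seq_len_i), dim)
    cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long),
                            lengths.cumsum(0)])                  # per-sample boundaries
    return packed, cu_seqlens

# A smaller and a larger image yield different token counts after VAE encoding
# and patchification, yet can be trained together in the same packed batch.
tokens_a = torch.randn(1024, 4096)
tokens_b = torch.randn(3072, 4096)
packed, cu_seqlens = pack_sequences([tokens_a, tokens_b])
print(packed.shape, cu_seqlens.tolist())   # torch.Size([4096, 4096]) [0, 1024, 4096]
```

The attention processor then needs such boundaries (or an equivalent block-diagonal mask) so that tokens from different images do not attend to each other.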

OleehyO added 10 commits April 16, 2025 07:19
- Fix issues in previous LoRA implementation
- Implement custom LoRA interface based on PEFT library
- Add proper injection, saving, loading, and unloading functions for LoRA adapters
- Expose LoRA utility functions in package exports
- Implement multi-resolution packing for CogView4 to improve training efficiency
- Add QLoRA support for both CogView and CogVideo models
- Refactor trainers to fix training bugs and optimize computation pipeline
- Update dataset utilities and fine-tuning base components

This update significantly improves model training efficiency and flexibility.
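
The commits above mention a custom LoRA interface with injection, saving, loading, and unloading. As a rough illustration of such a lifecycle built on PEFT's generic APIs (not the interface added in this PR; the stand-in model and paths are placeholders):

```python
# Rough illustration with PEFT's generic APIs; names and paths are placeholders,
# not the interface added in this PR.
import torch.nn as nn
from peft import LoraConfig, PeftModel, get_peft_model

base = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # stand-in model

# Injection: wrap the base model with LoRA adapters on its Linear layers.
config = LoraConfig(r=8, lora_alpha=8, target_modules=["0", "2"])
model = get_peft_model(base, config)

# Saving: persist only the adapter weights.
model.save_pretrained("my_lora_adapter")

# Loading: re-attach the saved adapter onto a fresh copy of the base model.
base2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
model2 = PeftModel.from_pretrained(base2, "my_lora_adapter")

# Unloading: drop the adapters and recover the original base weights.
restored = model2.unload()
```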
OleehyO (Collaborator, Author) commented Apr 17, 2025

We will follow up with a comparison of the effects of resize training, packing training, and QLoRA+packing training on the pixelart dataset.

@OleehyO OleehyO requested review from zRzRzRzRzRzRzR and zhangch9 and removed request for zRzRzRzRzRzRzR April 17, 2025 02:38
OleehyO (Collaborator, Author) commented Apr 17, 2025

  • From left to right: multi-resolution + packing + q4 quantization; multi-resolution + packing; resize.
  • From top to bottom: increasing number of training steps (at intervals of 30 steps).

OleehyO (Collaborator, Author) commented Apr 17, 2025

prompt: A black and white photograph of a stone carving on a wall, featuring a grotesque face with a beard and a crown-like structure above it. The carving is set within a stone archway, and the image has a vintage appearance with a faded and aged look.

original: [screenshot]

[four comparison screenshots]

OleehyO (Collaborator, Author) commented Apr 17, 2025

prompt: a beautiful photorealistic painting of cemetery urbex unfinished building building industrial architecture nature abandoned by thomas cole, nature extraterrestial tron forest darkacademia thermal vision futuristic tokyo, archdaily, wallpaper, highly detailed, trending on artstation.

original: [screenshot]

[four comparison screenshots]

OleehyO (Collaborator, Author) commented Apr 17, 2025

prompt: doom eternal, evangelion, game concept art, veins and worms, muscular, crustacean exoskeleton, chiroptera head, chiroptera ears, mecha, ferocious, fierce, hyperrealism, fine details, artstation, cgsociety, zbrush, no background

original: [screenshot]

[four comparison screenshots]

OleehyO (Collaborator, Author) commented Apr 17, 2025

TL;DR: Based on the pixelart results, we observe that: 1) packing converges faster than resize; 2) packing yields better generation quality than resize, reflected in the detail and sharpness of the generated images (results from the resize approach tend to be blurry); 3) although QLoRA lowers the hardware threshold for fine-tuning, it significantly degrades model quality on the pixelart dataset (producing grid-like artifacts).

zhangch9 (Collaborator) left a comment:
LGTM

@zhangch9 zhangch9 merged commit 888b45f into main Apr 27, 2025
1 check passed
