
Conversation

OleehyO (Collaborator) commented Apr 17, 2025

Description

This PR introduces significant changes to enable:

  1. Fine-tuning on resource-constrained devices via QLoRA implementation
  2. Native multi-resolution training through packed sequences

Key improvements:

  • QLoRA integration, combined with offload strategies, reduces the minimum VRAM requirement for CogView4 fine-tuning from 30GB to 9GB (a minimal setup sketch follows this list).
  • Multi-resolution packing achieves a ~4x training-efficiency gain on the pixelart dataset compared to the resize approach, while also improving generation quality.
  • Fixed bugs in LoRA loading/unloading and introduced a user-friendly interface based on PEFT's LoRA injection.
  • Refactored the trainer to improve validation speed and code readability.
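
Below is a minimal, hypothetical sketch of what a QLoRA-style setup for the CogView4 transformer could look like, using diffusers' bitsandbytes NF4 quantization together with PEFT-style LoRA injection. The model id, rank, and target modules are assumptions for illustration, not the exact configuration used in this PR.

```python
# Hypothetical sketch only -- model id, rank, and target modules are assumptions,
# not the configuration used in this PR.
import torch
from diffusers import BitsAndBytesConfig, CogView4Transformer2DModel
from peft import LoraConfig

# Load the transformer with 4-bit NF4 quantization (bitsandbytes).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = CogView4Transformer2DModel.from_pretrained(
    "THUDM/CogView4-6B",          # assumed model id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Inject trainable LoRA adapters into the attention projections.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)

trainable = sum(p.numel() for p in transformer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.1f}M")
```

The frozen text encoder and VAE would typically be offloaded to CPU when not in use, which is where the additional VRAM savings beyond weight quantization come from.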

Notes

  • Multi-resolution packing currently supports only the CogView series (a conceptual sketch appears after this list).

  • For CogVideo, QLoRA offers limited benefits because memory usage is dominated by activations; it is typically suitable only for batch size = 1 scenarios.

  • On the pixelart dataset, QLoRA fine-tuning may negatively impact final generation quality.

  • Changes include modifications to the CogView4 attention processor in diffusers to support packed multi-resolution sequences (see this PR).
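
To illustrate what packed multi-resolution sequences mean in practice, here is a purely conceptual sketch (not the PR's actual implementation): latent token sequences from images of different resolutions are concatenated into one long sequence, and per-sample boundaries are recorded so attention can be restricted to each sample. Token counts and hidden size below are made up.

```python
# Conceptual sketch of sequence packing; token counts and hidden size are made up.
import torch

def pack_sequences(latent_tokens):
    """latent_tokens: list of (seq_len_i, dim) tensors from differently sized images."""
    lengths = torch.tensor([t.shape[0] for t in latent_tokens])
    packed = torch.cat(latent_tokens, dim=0)                     # (sum(seq_len_i), dim)
    cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long),
                            lengths.cumsum(0)])                  # per-sample boundaries
    return packed, cu_seqlens

# A smaller and a larger image yield different token counts after VAE encoding
# and patchification, yet can be trained together in the same packed batch.
tokens_a = torch.randn(1024, 4096)
tokens_b = torch.randn(3072, 4096)
packed, cu_seqlens = pack_sequences([tokens_a, tokens_b])
print(packed.shape, cu_seqlens.tolist())   # torch.Size([4096, 4096]) [0, 1024, 4096]
```

The attention processor then needs such boundaries (or an equivalent block-diagonal mask) so that tokens from different images do not attend to each other.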

OleehyO added 10 commits April 16, 2025 07:19
- Fix issues in previous LoRA implementation
- Implement custom LoRA interface based on PEFT library
- Add proper injection, saving, loading, and unloading functions for LoRA adapters
- Expose LoRA utility functions in package exports
- Implement multi-resolution packing for CogView4 to improve training efficiency
- Add QLoRA support for both CogView and CogVideo models
- Refactor trainers to fix training bugs and optimize computation pipeline
- Update dataset utilities and fine-tuning base components

This update significantly improves model training efficiency and flexibility.
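
The commits above mention a custom LoRA interface with injection, saving, loading, and unloading. As a rough illustration of such a lifecycle built on PEFT's generic APIs (not the interface added in this PR; the stand-in model and paths are placeholders):

```python
# Rough illustration with PEFT's generic APIs; names and paths are placeholders,
# not the interface added in this PR.
import torch.nn as nn
from peft import LoraConfig, PeftModel, get_peft_model

base = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # stand-in model

# Injection: wrap the base model with LoRA adapters on its Linear layers.
config = LoraConfig(r=8, lora_alpha=8, target_modules=["0", "2"])
model = get_peft_model(base, config)

# Saving: persist only the adapter weights.
model.save_pretrained("my_lora_adapter")

# Loading: re-attach the saved adapter onto a fresh copy of the base model.
base2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
model2 = PeftModel.from_pretrained(base2, "my_lora_adapter")

# Unloading: drop the adapters and recover the original base weights.
restored = model2.unload()
```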
OleehyO (Collaborator, Author) commented Apr 17, 2025

We will follow up with a comparison of the effects of resize training, packing training, and QLoRA+packing training on the pixelart dataset.

@OleehyO OleehyO requested review from zRzRzRzRzRzRzR and zhangch9 and removed request for zRzRzRzRzRzRzR April 17, 2025 02:38
OleehyO (Collaborator, Author) commented Apr 17, 2025

  • From left to right: multi-resolution + packing + q4 quantization; multi-resolution + packing; resize.
  • From top to bottom: increasing number of training steps (at intervals of 30 steps).

OleehyO (Collaborator, Author) commented Apr 17, 2025

prompt: A black and white photograph of a stone carving on a wall, featuring a grotesque face with a beard and a crown-like structure above it. The carving is set within a stone archway, and the image has a vintage appearance with a faded and aged look.

original: [screenshot]

[four comparison screenshots]

OleehyO (Collaborator, Author) commented Apr 17, 2025

prompt: a beautiful photorealistic painting of cemetery urbex unfinished building building industrial architecture nature abandoned by thomas cole, nature extraterrestial tron forest darkacademia thermal vision futuristic tokyo, archdaily, wallpaper, highly detailed, trending on artstation.

original: [screenshot]

[four comparison screenshots]

OleehyO (Collaborator, Author) commented Apr 17, 2025

prompt: doom eternal, evangelion, game concept art, veins and worms, muscular, crustacean exoskeleton, chiroptera head, chiroptera ears, mecha, ferocious, fierce, hyperrealism, fine details, artstation, cgsociety, zbrush, no background

original: [screenshot]

[four comparison screenshots]

OleehyO (Collaborator, Author) commented Apr 17, 2025

TL;DR: Based on the pixelart results, we observe that: 1) packing converges faster than resize; 2) packing yields better generation quality than resize, reflected in the detail and sharpness of the generated images (results from the resize approach tend to be blurry); 3) although QLoRA lowers the hardware threshold for fine-tuning, it significantly degrades model quality on the pixelart dataset (producing grid-like artifacts).

zhangch9 (Collaborator) left a comment:
LGTM

@zhangch9 zhangch9 merged commit 888b45f into main Apr 27, 2025
1 check passed
