Major update: Add QLoRA & multi-resolution packing support #26
Conversation
- Fix issues in the previous LoRA implementation
- Implement a custom LoRA interface based on the PEFT library
- Add proper injection, saving, loading, and unloading functions for LoRA adapters (see the sketch after this list)
- Expose LoRA utility functions in the package exports
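As a rough illustration of that adapter interface, the following is a minimal sketch built directly on PEFT; the function names, `target_modules`, and hyperparameters are assumptions for illustration and are not the repository's actual API.

```python
import torch
from peft import LoraConfig, PeftModel, get_peft_model


def inject_lora(model, rank=16, alpha=32):
    # Wrap the base model so only low-rank adapter weights are trainable.
    config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        # Hypothetical attention projection names; the real targets depend on the model.
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],
        lora_dropout=0.0,
    )
    return get_peft_model(model, config)


def save_lora(peft_model, path):
    # Persists only the adapter weights, not the full base model.
    peft_model.save_pretrained(path)


def load_lora(base_model, path):
    # Re-attaches previously saved adapter weights to a fresh base model.
    return PeftModel.from_pretrained(base_model, path)


def unload_lora(peft_model):
    # Removes the adapters and returns the plain base model.
    return peft_model.unload()
```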
- Implement multi-resolution packing for CogView4 to improve training efficiency
- Add QLoRA support for both CogView and CogVideo models (see the sketch after this list)
- Refactor trainers to fix training bugs and optimize the computation pipeline
- Update dataset utilities and fine-tuning base components

This update significantly improves model training efficiency and flexibility.
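For context, a typical QLoRA setup loads the frozen base weights in 4-bit NF4 precision and trains LoRA adapters on top. The sketch below assumes the diffusers bitsandbytes integration and the CogView4 transformer class; the checkpoint id, `target_modules`, and hyperparameters are illustrative and may differ from this repository's implementation.

```python
import torch
from diffusers import BitsAndBytesConfig, CogView4Transformer2DModel
from peft import LoraConfig, get_peft_model

# Base weights are stored in 4-bit NF4; compute happens in bf16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = CogView4Transformer2DModel.from_pretrained(
    "THUDM/CogView4-6B",  # illustrative checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Gradients flow only through the LoRA adapters added here; the quantized
# base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # illustrative
)
transformer = get_peft_model(transformer, lora_config)
```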
We will follow up with a comparison of resize training, packing training, and QLoRA+packing training on the pixelart dataset.
TL;DR: Based on the pixelart results, we observe that:
1) Packing converges faster than resize.
2) Packing yields better generation quality than resize, reflected in the detail and sharpness of the generated images (results from the resize approach tend to be blurry).
3) Although QLoRA lowers the hardware threshold for fine-tuning, it significantly degrades model quality on the pixelart dataset (producing grid-like artifacts in the images).
LGTM
Description
This PR introduces significant changes to enable QLoRA fine-tuning and multi-resolution packing.
Key improvements:
Notes
- Multi-resolution packing currently supports only the CogView series.
- For CogVideo, QLoRA offers limited benefits because memory usage is dominated by activations; it is typically suitable only for batch size = 1 scenarios.
- On the pixelart dataset, QLoRA fine-tuning may negatively impact final generation quality.
- Changes include modifications to the CogView4 attention processor in diffusers to support packed multi-resolution sequences (see this PR); a simplified illustration of the packed-attention idea appears below.
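The snippet below is a hedged illustration of the core idea behind packed multi-resolution attention, not the PR's actual implementation: tokens from several images of different resolutions are concatenated into one sequence, and a block-diagonal mask restricts attention to each image's own tokens.

```python
import torch


def block_diagonal_attention_mask(token_counts):
    """token_counts: number of latent tokens contributed by each packed image."""
    total = sum(token_counts)
    mask = torch.zeros(total, total, dtype=torch.bool)
    offset = 0
    for n in token_counts:
        # Allow attention only within each image's own block of tokens.
        mask[offset:offset + n, offset:offset + n] = True
        offset += n
    return mask  # True where attention is allowed


# Example: pack three images of different resolutions into one training
# sequence (token counts are illustrative).
mask = block_diagonal_attention_mask([4096, 1536, 1536])
```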