
Commit 8c6edb3

Merge branch 'main' into refactor/condense-group-offloading

2 parents 447e881 + ba2ba90

20 files changed: +2758 −16 lines changed


.github/workflows/nightly_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -333,7 +333,7 @@ jobs:
             additional_deps: ["peft"]
           - backend: "gguf"
             test_location: "gguf"
-            additional_deps: ["peft"]
+            additional_deps: ["peft", "kernels"]
           - backend: "torchao"
             test_location: "torchao"
             additional_deps: []

docs/source/en/api/loaders/lora.md

Lines changed: 5 additions & 0 deletions
@@ -30,6 +30,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
+- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen)
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

 <Tip>
@@ -105,6 +106,10 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

 [[autodoc]] loaders.lora_pipeline.HiDreamImageLoraLoaderMixin

+## QwenImageLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.QwenImageLoraLoaderMixin
+
 ## LoraBaseMixin

 [[autodoc]] loaders.lora_base.LoraBaseMixin
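As a rough sketch of what the new mixin enables (the LoRA repo id and weight filename below are placeholders; `load_lora_weights` and `unload_lora_weights` are the standard loader methods these mixins expose):

```python
import torch
from diffusers import DiffusionPipeline

# "Qwen/Qwen-Image" resolves to QwenImagePipeline, which mixes in QwenImageLoraLoaderMixin.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)

# Placeholder repo/file: any Qwen Image LoRA saved in the diffusers format.
pipe.load_lora_weights(
    "your-username/qwen-image-lora", weight_name="pytorch_lora_weights.safetensors"
)

# ...run inference as usual, then drop the adapter when done:
pipe.unload_lora_weights()
```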

docs/source/en/api/pipelines/qwenimage.md

Lines changed: 4 additions & 2 deletions
@@ -14,7 +14,9 @@

 # QwenImage

-<!-- TODO: update this section when model is out -->
+Qwen-Image from the Qwen team is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.
+
+Check out the model card [here](https://huggingface.co/Qwen/Qwen-Image) to learn more.

 <Tip>

@@ -28,6 +30,6 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
 - all
 - __call__

-## QwenImagePipeline
+## QwenImagePipelineOutput

 [[autodoc]] pipelines.qwenimage.pipeline_output.QwenImagePipelineOutput
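A minimal inference sketch for the pipeline documented on this page (the prompt, dtype, and device handling are illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# Loads QwenImagePipeline via the auto class; assumes a CUDA device with enough memory.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(prompt='A coffee shop sign that reads "Qwen Coffee"').images[0]
image.save("qwen-image.png")
```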

docs/source/en/quantization/gguf.md

Lines changed: 10 additions & 0 deletions
@@ -53,6 +53,16 @@ image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
 image.save("flux-gguf.png")
 ```

+## Using Optimized CUDA Kernels with GGUF
+
+Optimized CUDA kernels can accelerate GGUF quantized model inference by approximately 10%. This functionality requires a compatible GPU with `torch.cuda.get_device_capability` greater than 7 and the `kernels` library:
+
+```shell
+pip install -U kernels
+```
+
+Once installed, set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to use optimized kernels when available. Note that CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images. To disable CUDA kernel usage, set the environment variable `DIFFUSERS_GGUF_CUDA_KERNELS=false`.
+
 ## Supported Quantization Types

 - BF16
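To make the new toggle concrete, here is a rough sketch combining it with the FLUX GGUF example this file already contains (the checkpoint URL and quantization level are illustrative):

```python
import os

# Opt in before loading the model; per the section above, the kernels are only
# used on GPUs with compute capability greater than 7.
os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "true"

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative GGUF checkpoint; any supported quantization type loads the same way.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
image.save("flux-gguf.png")
```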

examples/dreambooth/README_qwen.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# DreamBooth training example for Qwen Image

[DreamBooth](https://huggingface.co/papers/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject.

The `train_dreambooth_lora_qwen_image.py` script shows how to implement the training procedure with [LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) and adapt it for [Qwen Image](https://huggingface.co/Qwen/Qwen-Image).

This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
## Running locally with PyTorch

### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date, as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then cd into the `examples/dreambooth` folder and run

```bash
pip install -r requirements_sana.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or for a default accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or if your environment doesn't support an interactive shell (e.g., a notebook):

```python
from accelerate.utils import write_basic_config

write_basic_config()
```

When running `accelerate config`, setting torch compile mode to True can give dramatic speedups.
Note also that we use the PEFT library as the backend for LoRA training, so make sure to have `peft>=0.14.0` installed in your environment.
### Dog toy example

Now let's get our dataset. For this example we will use some dog images: https://huggingface.co/datasets/diffusers/dog-example.

Let's first download it locally:

```python
from huggingface_hub import snapshot_download

local_dir = "./dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
```

This will also allow us to push the trained LoRA parameters to the Hugging Face Hub platform.

Now, we can launch training using:
```bash
export MODEL_NAME="Qwen/Qwen-Image"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-qwen-image-lora"

accelerate launch train_dreambooth_lora_qwen_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --use_8bit_adam \
  --learning_rate=2e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```
For using `push_to_hub`, make sure you're logged into your Hugging Face account:

```bash
hf auth login
```

To better track our training experiments, we're using the following flags in the command above:

* `--report_to="wandb"` will ensure the training runs are tracked on [Weights and Biases](https://wandb.ai/site). To use it, be sure to install `wandb` with `pip install wandb`. Don't forget to call `wandb login <your_api_key>` before training if you haven't done it before.
* `--validation_prompt` and `--validation_epochs` allow the script to do a few validation inference runs, letting us qualitatively check whether training is progressing as expected.
## Notes

Additionally, we welcome you to explore the following CLI arguments:

* `--lora_layers`: The transformer modules to apply LoRA training on. Please specify the layers as a comma-separated string, e.g. `--lora_layers="to_k,to_q,to_v"` will result in LoRA training of the attention layers only.
* `--max_sequence_length`: Maximum sequence length to use for text embeddings.

We provide several options for reducing memory usage (see the sketch after this list):

* `--offload`: When enabled, the text encoder and VAE are offloaded to the CPU when they are not being used.
* `--cache_latents`: When enabled, the latents are pre-computed from the input images with the VAE, and the VAE is removed from memory once done.
* `--use_8bit_adam`: When enabled, the 8-bit version of AdamW provided by the `bitsandbytes` library is used.
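As a sketch of how these flags combine with the launch command above (values are illustrative, not tuned recommendations):

```bash
accelerate launch train_dreambooth_lora_qwen_image.py \
  --pretrained_model_name_or_path="Qwen/Qwen-Image" \
  --instance_data_dir="dog" \
  --output_dir="trained-qwen-image-lora" \
  --instance_prompt="a photo of sks dog" \
  --lora_layers="to_k,to_q,to_v" \
  --max_sequence_length=256 \
  --offload \
  --cache_latents \
  --use_8bit_adam \
  --max_train_steps=500
```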
Refer to the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage) of the `QwenImagePipeline` to learn more about the models available under the Qwen Image family and their preferred dtypes during inference.
## Using quantization

You can quantize the base model with [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/index) to reduce memory usage. To do so, pass a JSON file path to `--bnb_quantization_config_path`. This file should hold the configuration to initialize `BitsAndBytesConfig`. Below is an example JSON file:

```json
{
  "load_in_4bit": true,
  "bnb_4bit_quant_type": "nf4"
}
```
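For instance, if the config above is saved as `bnb_nf4.json` (a hypothetical filename), it can be passed to the same launch command:

```bash
accelerate launch train_dreambooth_lora_qwen_image.py \
  --pretrained_model_name_or_path="Qwen/Qwen-Image" \
  --instance_data_dir="dog" \
  --output_dir="trained-qwen-image-lora" \
  --instance_prompt="a photo of sks dog" \
  --bnb_quantization_config_path="bnb_nf4.json" \
  --max_train_steps=500
```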
