
Conversation

@sayakpaul (Member) commented Aug 4, 2025

What does this PR do?

Still testing. Needs a custom token to test (refer to Slack). We support quantization through the bnb_quantization_config_path CLI argument.
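
For illustration, a minimal sketch of what such a quantization config file might contain; the file name and keys here are hypothetical, assuming the script maps the JSON keys onto BitsAndBytesConfig arguments:

import json

# Hypothetical config for --bnb_quantization_config_path; keys are assumed to
# map onto BitsAndBytesConfig arguments (not confirmed against the script here).
bnb_config = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",
}
with open("bnb_config.json", "w") as f:
    json.dump(bnb_config, f, indent=2)
# then pass --bnb_quantization_config_path=bnb_config.json to the launch command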

Test command:
export MODEL_NAME="Qwen/Qwen-Image"
export INSTANCE_DIR="linoyts/3d_icon"
export OUTPUT_DIR="trained-qwen-image-lora"

accelerate launch train_dreambooth_lora_qwen_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --dataset_name=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="3dicon" \
  --caption_column="prompt" \
  --validation_prompt="a 3dicon, a llama eating ramen" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --use_8bit_adam \
  --rank=8 \
  --learning_rate=2e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant_with_warmup" \
  --lr_warmup_steps=100 \
  --max_train_steps=1000 \
  --cache_latents \
  --gradient_checkpointing \
  --validation_epochs=25 \
  --seed="0"

TODOs:

  • Button up the README.
  • Add tests for the pipeline, trainer, and LoRA (for tests, we need to be able to work with small model sizes).

I prefer to tackle the tests in a separate PR. Some tests are already in the tests/qwen-image branch, I think.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +865 to +867
# Qwen expects a `num_frames` dimension too.
if pixel_values.ndim == 4:
    pixel_values = pixel_values.unsqueeze(2)
Member Author:
🧠

Collaborator:
nice

Contributor:
Let's refactor the AutoencoderKLQwenImage methods to not use the frame dimension. I think that code was copy-pasted from Wan, but we don't need the frame dimension here. cc @naykun

Member Author:
I can wait for your refactor PR to come through. Or do you prefer this PR? 👀

Contributor:
Yeah, for now the frame dimension is not needed.

Contributor:
Feel free to take it up in this PR, as I am logging off for a few hours.

Member Author:
Ah, I was taking a stab at this, but we also need to consider the design of QwenImageCausalConv3d, which inherits from nn.Conv3d. So it's a more involved PR than I had originally thought, and I would prefer to do that in a separate PR to not block this one.

Contributor:
Okay, sounds good, let's do it in a separate PR.
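
As an aside, a minimal runnable sketch of what the unsqueeze quoted above does to the batch shape (sizes illustrative):

import torch

pixel_values = torch.randn(1, 3, 1024, 1024)  # (B, C, H, W) image batch
if pixel_values.ndim == 4:
    pixel_values = pixel_values.unsqueeze(2)  # insert a singleton num_frames axis
print(pixel_values.shape)  # torch.Size([1, 3, 1, 1024, 1024]) -> (B, C, F=1, H, W)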

(1, args.resolution // vae_scale_factor // 2, args.resolution // vae_scale_factor // 2)
] * bsz
# swap dims 1 and 2 (the channel and frame axes)
noisy_model_input = noisy_model_input.permute(0, 2, 1, 3, 4)
Member Author:
🧠
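
A small runnable sketch of the permute above, with illustrative latent sizes (16 channels, one frame):

import torch

noisy_model_input = torch.randn(1, 16, 1, 64, 64)  # (B, C, F, H, W), sizes illustrative
noisy_model_input = noisy_model_input.permute(0, 2, 1, 3, 4)  # swap dims 1 and 2
print(noisy_model_input.shape)  # torch.Size([1, 1, 16, 64, 64])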

Comment on lines 472 to 476
parser.add_argument(
    "--guidance_scale",
    type=float,
    default=0.0,
    help="Qwen image is a guidance distilled model",
Member Author:
Took this value from the official doc example. Correct?

Contributor:
It's not a guidance-distilled model; the guidance is actually None no matter what guidance_scale is set to. Only true_cfg_scale works.

Member Author:
Should we supply any guidance value at all during training? My reference is:

Collaborator:
it seems this:

if self.transformer.config.guidance_embeds:

is false by default, so guidance is never actually used, as @haofanwang mentioned. So we can probably remove it altogether?
also, this seems relevant: https://github.com/huggingface/diffusers/pull/12057/files#r2250725231
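
For context, a runnable sketch of the conditional pattern under discussion; guidance_embeds, guidance_scale, and bsz are stand-ins for the real transformer.config and args values:

import torch

guidance_embeds = False  # what transformer.config.guidance_embeds reports for Qwen-Image
guidance_scale, bsz = 1.0, 4
if guidance_embeds:
    guidance = torch.full((bsz,), guidance_scale, dtype=torch.float32)
else:
    guidance = None  # this branch is taken, so the flag never has an effect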

Member Author:
guidance is gone in the latest commit:
cb1b6b4

Contributor:
@haofanwang @naykun Let us know if we should remove the guidance embed config from the transformer implementation if it's not used: #12057 (comment)

Also, instead of calling it true_cfg_scale, we should just remove it and use guidance_scale to mean the actual CFG scale. For guidance-distilled models like Flux, guidance_scale means the embedded guidance scale and true_cfg_scale the true CFG scale. But for most normally released models, we default to naming the CFG parameter guidance_scale, not true_cfg_scale.

Member Author:
+1 to this. Totally agree.

Contributor:
This time, we are releasing the raw model without guidance distillation. However, we hope a distilled version will become available soon, either from the community or from us. To ensure future compatibility, we may want to keep this unchanged?

Contributor:
Oh, if a guidance-distilled release is planned (and I think it will be highly expected in the community too, so someone might take the initiative), then I think it's okay to keep as-is. Thanks for letting us know!

Member Author:
For the purposes of this PR, I have just removed the option of configuring guidance_scale from the training script. I think that should do the trick?

Comment on lines 1276 to 1279
with offload_models(text_encoding_pipeline, device=accelerator.device, offload=args.offload):
    instance_prompt_embeds, instance_prompt_embeds_mask, _ = compute_text_embeddings(
        args.instance_prompt, text_encoding_pipeline
    )
Member Author:
This uses the offload_models() utility to easily offload and onload modules that we don't always want present on the accelerator device.
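
For readers unfamiliar with it, a minimal sketch of what such a context manager does; the actual offload_models() utility may differ in signature and details:

import contextlib
import torch

@contextlib.contextmanager
def offload_models(*modules, device, offload=True):
    # Sketch only: move modules onto the accelerator for the duration of the
    # block, then park them back on CPU to free memory.
    if offload:
        for m in modules:
            m.to(device)
    try:
        yield
    finally:
        if offload:
            for m in modules:
                m.to("cpu")
            torch.cuda.empty_cache()  # no-op if CUDA was never initialized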

pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
model_input = vae.encode(pixel_values).latent_dist.sample()

model_input = (model_input - latents_mean) * latents_std
Member Author (@sayakpaul, Aug 4, 2025):

Reversal of

latents_std = 1.0 / torch.tensor(self.vae.config.latents_std).view(1, self.vae.config.z_dim, 1, 1, 1).to(
    latents.device, latents.dtype
)
latents = latents / latents_std + latents_mean
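
A quick runnable check that the two transforms are inverses (shapes and values illustrative):

import torch

latents_mean = torch.randn(1, 16, 1, 1, 1)
latents_std = 1.0 / (torch.rand(1, 16, 1, 1, 1) + 0.5)  # reciprocal, as in both snippets

x = torch.randn(2, 16, 1, 64, 64)
z = (x - latents_mean) * latents_std  # training script: normalize
x_rec = z / latents_std + latents_mean  # pipeline: denormalize
assert torch.allclose(x, x_rec, atol=1e-5)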

Comment on lines +1012 to +1017
vae = AutoencoderKLQwenImage.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="vae",
    revision=args.revision,
    variant=args.variant,
)
Member Author:
Keeping it in FP32 for numerical stability. Haven't yet verified if using BF16 is alright.
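
The pattern this implies, sketched with a Linear standing in for the VAE (runnable, purely illustrative):

import torch

vae_stub = torch.nn.Linear(8, 8).to(torch.float32)  # stands in for the FP32 VAE
pixels = torch.randn(2, 8).to(vae_stub.weight.dtype)  # cast inputs to the VAE dtype
latents = vae_stub(pixels)  # encode in FP32 for numerical stability
latents = latents.to(torch.bfloat16)  # then drop back to the bf16 training dtype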

Comment on lines +511 to +517
parser.add_argument(
    "--weighting_scheme",
    type=str,
    default="none",
    choices=["sigma_sqrt", "logit_normal", "mode", "cosmap", "none"],
    help=('We default to the "none" weighting scheme for uniform sampling and uniform loss'),
)
Member Author:
This is a reasonable default. However, we know that this can impact training significantly. For example, SD3 and LTX use logit_normal, whereas for Flux and SANA, "none" works.

I think we should check this with the Qwen Image authors.
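
For reference, a sketch of how this flag is typically wired in the sibling SD3/Flux scripts via diffusers.training_utils; with "none", timestep sampling is uniform and the loss weight is 1:

import torch
from diffusers.training_utils import (
    compute_density_for_timestep_sampling,
    compute_loss_weighting_for_sd3,
)

# Pick timestep-sampling density according to the scheme; "none" is uniform.
u = compute_density_for_timestep_sampling(
    weighting_scheme="none", batch_size=4, logit_mean=0.0, logit_std=1.0, mode_scale=1.29
)
sigmas = u.view(-1, 1, 1, 1)  # broadcast shape is illustrative
weighting = compute_loss_weighting_for_sd3(weighting_scheme="none", sigmas=sigmas)  # all ones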

@sayakpaul (Member Author) commented Aug 4, 2025

Tested with the following:

export MODEL_NAME="Qwen/Qwen-Image"
export INSTANCE_DIR="linoyts/3d_icon"
export OUTPUT_DIR="trained-qwen-image-lora"

accelerate launch train_dreambooth_lora_qwen_image.py \
  --pretrained_model_name_or_path $MODEL_NAME \
  --dataset_name             $INSTANCE_DIR \
  --output_dir               $OUTPUT_DIR \
  --mixed_precision          bf16 \
  --instance_prompt          "3dicon" \
  --caption_column           prompt \
  --resolution               1024 \
  --train_batch_size         1 \
  --gradient_accumulation_steps 4 \
  --use_8bit_adam \
  --rank                     8 \
  --learning_rate            2e-4 \
  --guidance_scale           1.0 \
  --report_to                wandb \
  --lr_scheduler             constant \
  --lr_warmup_steps          100 \
  --max_train_steps          1000 \
  --cache_latents \
  --gradient_checkpointing \
  --validation_epochs        25 \
  --seed                     0

Inference:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("trained-qwen-image-lora")

image = pipe(
    "a 3dicon, a llama with a signboard saying 'Qwen is awesome'", guidance_scale=1.0, num_inference_steps=50
).images[0]
image.save("llama.png")

@linoytsaban (Collaborator) left a comment:

thanks @sayakpaul 🙌🏻
Left one comment re: guidance; other than that, looking good!

@sayakpaul sayakpaul marked this pull request as ready for review August 4, 2025 11:12
@a-r-r-o-w (Contributor) left a comment:

Thanks, LGTM!



if is_wandb_available():
    import wandb
Contributor:
import trackio as wandb 😛 We should do this sometime soon :)
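
The idea, sketched; this assumes trackio is installed and that its wandb-compatible surface covers init/log/finish (the project name is illustrative):

import trackio as wandb  # drop-in alias; the rest of the script stays unchanged

wandb.init(project="qwen-image-lora")
wandb.log({"train/loss": 0.123})
wandb.finish()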

Member Author:
Full support!


@sayakpaul sayakpaul merged commit 9c1d4e3 into main Aug 5, 2025
31 of 32 checks passed
@sayakpaul sayakpaul deleted the qwen-image-training branch August 5, 2025 01:36
Beinsezii pushed a commit to Beinsezii/diffusers that referenced this pull request Aug 7, 2025
…ce#12056)

* feat: support lora in qwen image and training script

* up

* up

* up

* up

* up

* up

* add lora tests

* fix

* add tests

* fix

* reviewer feedback

* up[

* Apply suggestions from code review

Co-authored-by: Aryan <[email protected]>

---------

Co-authored-by: Aryan <[email protected]>
Beinsezii pushed a commit to Beinsezii/diffusers that referenced this pull request Aug 7, 2025
…ce#12056)

* feat: support lora in qwen image and training script

* up

* up

* up

* up

* up

* up

* add lora tests

* fix

* add tests

* fix

* reviewer feedback

* up[

* Apply suggestions from code review

Co-authored-by: Aryan <[email protected]>

---------

Co-authored-by: Aryan <[email protected]>