Description
Hi, thanks for your great work.
I am using the qwen-image-finetune framework to fine-tune the qwen-image-edit-2509 model. I followed the instructions in the README, starting with the example face-segmentation task.
However, after training completes, the model's inference results show none of the expected changes or improvements; the output is nearly identical to the base model's.
Could you please help me look into this issue? Is it due to the hyperparameter settings?
Thank you so much!
I ran inference with https://github.com/tsiendragon/qwen-image-finetune/blob/5019d46278555e154b035ea9e5811502441223b9/tests/src/trainer/test_qwen_image_edit_plus.ipynb. The pretrained LoRA https://huggingface.co/TsienDragon/qwen-image-edit-plus-lora-face-seg works as expected, so I assume the inference process is not the issue.
Here is the output generated with the LoRA I trained myself; there is no improvement.
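One quick sanity check (a minimal sketch of my own, not part of the repo) is to inspect the saved adapter tensors: with `init_lora_weights: gaussian`, PEFT initializes the `lora_B` matrices to zero, so if they are still (near) zero after training, the adapter contributes nothing and the output will match the base model exactly. The parameter names below are illustrative; real names come from the adapter's `.safetensors` file.

```python
import numpy as np

def lora_appears_trained(state_dict, tol=1e-8):
    """Return True if any lora_B tensor has moved away from its zero init.

    state_dict: mapping of parameter name -> numpy array, e.g. loaded
    from the trained adapter's .safetensors file.
    """
    b_keys = [k for k in state_dict if "lora_B" in k]
    if not b_keys:
        raise ValueError("no lora_B tensors found in state dict")
    # lora_B starts at zero, so a trained adapter should have nonzero entries
    return any(np.abs(state_dict[k]).max() > tol for k in b_keys)

# dummy example: an untrained adapter (B still zero) vs. a trained one
untrained = {"to_q.lora_A.weight": np.random.randn(16, 64),
             "to_q.lora_B.weight": np.zeros((64, 16))}
trained = {"to_q.lora_A.weight": np.random.randn(16, 64),
           "to_q.lora_B.weight": 0.01 * np.random.randn(64, 16)}
print(lora_appears_trained(untrained))  # False
print(lora_appears_trained(trained))    # True
```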

I run the training with the following run.sh:

```bash
config_file='../tests/test_configs/test_example_qwen_image_edit_plus_fp16.yaml'
echo "Used config file: $config_file"
cd src/
accelerate launch \
  --num_processes 1 \
  --mixed_precision bf16 \
  -m qflux.main --config $config_file
```
Here are my training hyperparameters:
```yaml
model:
  pretrained_model_name_or_path: Qwen/Qwen-Image-Edit-2509
  quantize: false
  lora:
    r: 16  # LoRA rank; can be adjusted (8, 16, 32); larger r means more parameters
    lora_alpha: 16  # LoRA alpha, usually equal to r
    init_lora_weights: gaussian
    target_modules: [to_k, to_q, to_v, to_out.0]
    pretrained_weight: null
    adapter_name: lora_edit
data:
  class_path: qflux.data.dataset.ImageDataset
  init_args:
    dataset_path:
      - repo_id: TsienDragon/face_segmentation_20
        split: train
    caption_dropout_rate: 0.0
    prompt_image_dropout_rate: 0.0
    use_edit_mask: true  # if true, the dataset outputs an edit mask
    selected_control_indexes: [1]
    processor:
      class_path: qflux.data.preprocess.ImageProcessor
      init_args:
        process_type: center_crop
        resize_mode: bilinear
        target_size: [832, 576]  # [768, 1344]
        controls_size: [[832, 576]]
  batch_size: 2  # adjust to the available memory; can be set to 1, 2, or 4
  num_workers: 1  # set to 1 to avoid potential deadlock issues with multiprocessing
  shuffle: true
logging:
  output_dir: /qwen-image-finetune/image_edit_lora/  # change to your own output path
  # report_to: tensorboard
  report_to: wandb
  # report_to: swanlab
  tracker_project_name: faceSegQwenImageEditPlusFp16
  tags:
    - test
    - QwenImageEditPlus
    - FaceSeg
  notes: "This is a test configuration for QwenImageEditPlus on FaceSeg dataset"
validation:
  enabled: false
  steps: 100
  max_samples: 2
  seed: 42
  samples:
    - prompt: "change the image from the face to the face segmentation mask"
      images:
        - /data6/wangshengyuan/3Dbuilding/training/qwen-image-finetune/test_person.png
      controls_size: [[832, 576]]
      height: 832
      width: 576
    - prompt: "change the image from the face to the face segmentation mask"
      images:
        - /data6/wangshengyuan/3Dbuilding/training/qwen-image-finetune/test_male.png
      controls_size: [[832, 576]]
      height: 832
      width: 576
optimizer:
  # class_path: bitsandbytes.optim.Adam8bit  # 8-bit Adam optimizer to save memory
  # init_args:
  #   lr: 0.0001  # the face-segmentation task uses a smaller learning rate
  #   betas: [0.9, 0.999]
  class_path: prodigyopt.Prodigy
  init_args:
    lr: 0.0001
    use_bias_correction: True
    safeguard_warmup: True
    weight_decay: 0.01
lr_scheduler:
  scheduler_type: cosine  # cosine scheduler, better for fine-grained tasks
  warmup_steps: 50  # increase warmup steps
  num_cycles: 0.5
  power: 1.0
loss:
  mask_loss: true
  forground_weight: 2.0
  background_weight: 1.0
train:
  gradient_accumulation_steps: 1  # increase gradient accumulation to simulate a larger batch size
  max_train_steps: 600  # the face-segmentation dataset is small, so fewer training steps
  checkpointing_steps: 100  # save checkpoints more frequently
  max_grad_norm: 1.0
  mixed_precision: bf16
  gradient_checkpointing: True  # enable gradient checkpointing to save memory
  low_memory: True  # in low_memory mode, the model is loaded on the specified devices;
                    # otherwise, the model is loaded on all the GPUs
  fit_device:
    vae: cuda:1
    text_encoder: cuda:1
trainer: QwenImageEditPlus
resume: null
cache:
  devices:
    vae: cuda:3
    text_encoder: cuda:3
  cache_dir: ${logging.output_dir}/${logging.tracker_project_name}/cache
  use_cache: true
  prompt_empty_drop_keys:
    - prompt_embeds
    - pooled_prompt_embeds
predict:
  devices:
    vae: cuda:3
    text_encoder: cuda:3  # CLIP
    dit: cuda:3
```
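For reference, each LoRA layer applies the update W' = W + (lora_alpha / r) · B · A; with r = lora_alpha = 16 as above, the scale is 1.0, and because B is zero-initialized the adapter is a no-op until training actually moves B. A minimal numpy sketch of this math (dimensions are illustrative, not the model's real sizes):

```python
import numpy as np

r, lora_alpha = 16, 16   # values from the config above
d_out, d_in = 128, 128   # illustrative dims, not the real model's
scale = lora_alpha / r   # 1.0 here

W = np.random.randn(d_out, d_in)      # frozen base weight
A = np.random.randn(r, d_in) * 0.01   # gaussian init (trainable)
B = np.zeros((d_out, r))              # zero init (trainable)

x = np.random.randn(d_in)
base_out = W @ x
lora_out = (W + scale * (B @ A)) @ x

# with B = 0 the LoRA branch contributes nothing: outputs are identical
print(np.allclose(base_out, lora_out))  # True
```

This is why an adapter whose B matrices never receive meaningful gradient updates produces outputs indistinguishable from the base model.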
