
No observable improvement on Face Segmentation task when fine-tuning qwen-image-edit-2509 #58

@Wang-Shengyuan

Description


Hi, thanks for your great work.

I am using the qwen-image-finetune framework to fine-tune the qwen-image-edit-2509 model, following the instructions in the README to start with the example face segmentation task.

However, after training completes, the model's inference results show none of the expected changes: the output appears nearly identical to the base model's.

Could you please help me look into this issue? Could it be due to the hyperparameter settings?

Thank you so much!

I ran inference with https://github.com/tsiendragon/qwen-image-finetune/blob/5019d46278555e154b035ea9e5811502441223b9/tests/src/trainer/test_qwen_image_edit_plus.ipynb. The pretrained LoRA https://huggingface.co/TsienDragon/qwen-image-edit-plus-lora-face-seg works as expected, so I assume the problem is not in the inference process.
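Before blaming the hyperparameters, it may be worth confirming that the saved LoRA checkpoint actually contains trained (non-zero) weights. A minimal sketch, assuming PEFT-style `lora_A`/`lora_B` key names in the checkpoint (the helper name and key patterns are assumptions, not the repo's API):

```python
import numpy as np

def lora_health_report(state_dict, targets=("to_q", "to_k", "to_v", "to_out.0")):
    """Count LoRA tensors per target module and flag all-zero lora_B matrices.

    In most PEFT setups lora_B (the "up" projection) is initialized to zeros,
    so a checkpoint whose lora_B tensors are still all zero is a no-op adapter.
    """
    report = {t: {"count": 0, "zero_B": 0} for t in targets}
    for name, tensor in state_dict.items():
        for t in targets:
            if t in name:
                report[t]["count"] += 1
                if "lora_B" in name and not np.any(np.asarray(tensor)):
                    report[t]["zero_B"] += 1
    return report
```

On a real checkpoint one could populate `state_dict` via `safetensors.torch.load_file(...)` on the saved LoRA weights file; if training had an effect, every target should show `count > 0` and `zero_B == 0`.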

Here is the output generated with the LoRA I trained myself; there is no improvement.
Image

I ran the training with the following run.sh:

config_file='../tests/test_configs/test_example_qwen_image_edit_plus_fp16.yaml'
echo "Used config file: $config_file"
cd src/
accelerate launch \
  --num_processes 1 \
  --mixed_precision bf16 \
  -m qflux.main --config $config_file

Here are my training hyperparameters,

model:
  pretrained_model_name_or_path: Qwen/Qwen-Image-Edit-2509
  quantize: false
  lora:
    r: 16  # LoRA rank; can be adjusted (8, 16, 32); larger r means more parameters
    lora_alpha: 16  # LoRA alpha, usually equal to r
    init_lora_weights: gaussian
    target_modules: [to_k, to_q, to_v, to_out.0]
    pretrained_weight: null
    adapter_name: lora_edit

data:
  class_path: qflux.data.dataset.ImageDataset
  init_args:
    dataset_path:
      - repo_id: TsienDragon/face_segmentation_20
        split: train
    caption_dropout_rate: 0.0
    prompt_image_dropout_rate: 0.0
    use_edit_mask: true  # if true, the dataset outputs an edit mask
    selected_control_indexes: [1]
    processor:
      class_path: qflux.data.preprocess.ImageProcessor
      init_args:
        process_type: center_crop
        resize_mode: bilinear
        target_size:  [832, 576] # [768, 1344]
        controls_size: [[832, 576]]

  batch_size: 2  # adjust batch size according to the available memory, can be set to 1, 2, 4
  num_workers: 1  # set to 1 to avoid potential deadlock issues with multiprocessing
  shuffle: true

logging:
  output_dir: /qwen-image-finetune/image_edit_lora/  # change the path to your own output path
  # report_to: tensorboard
  report_to: wandb
  # report_to: swanlab
  tracker_project_name: faceSegQwenImageEditPlusFp16
  tags:
    - test
    - QwenImageEditPlus
    - FaceSeg
  notes: "This is a test configuration for QwenImageEditPlus on FaceSeg dataset"

validation:
  enabled: false
  steps: 100
  max_samples: 2
  seed: 42
  samples:
    - prompt: "change the image from the face to the face segmentation mask"
      images:
        - /data6/wangshengyuan/3Dbuilding/training/qwen-image-finetune/test_person.png
      controls_size: [[832, 576]]
      height: 832
      width: 576
    - prompt: "change the image from the face to the face segmentation mask"
      images:
        - /data6/wangshengyuan/3Dbuilding/training/qwen-image-finetune/test_male.png
      controls_size: [[832, 576]]
      height: 832
      width: 576

optimizer:
  # class_path: bitsandbytes.optim.Adam8bit  # 8bit Adam optimizer to save memory
  # init_args:
  #   lr: 0.0001  # face segmentation task uses smaller learning rate
  #   betas: [0.9, 0.999]
  class_path: prodigyopt.Prodigy
  init_args:
    lr: 0.0001
    use_bias_correction: True
    safeguard_warmup: True
    weight_decay: 0.01

lr_scheduler:
  scheduler_type: cosine  # cosine scheduler, better for fine-grained tasks
  warmup_steps: 50  # increase warmup steps
  num_cycles: 0.5
  power: 1.0

loss:
  mask_loss: true
  forground_weight: 2.0
  background_weight: 1.0

train:
  gradient_accumulation_steps: 1  # increase gradient accumulation to simulate larger batch size
  max_train_steps: 600  # face segmentation data is small, reduce training steps
  checkpointing_steps: 100  # save checkpoints more frequently
  max_grad_norm: 1.0
  mixed_precision: bf16
  gradient_checkpointing: True  # enable gradient checkpointing to save memory
  low_memory: True  # in low_memory mode, components are loaded on the specified devices;
  # otherwise the model is loaded on all GPUs
  fit_device:
    vae: cuda:1
    text_encoder: cuda:1

trainer: QwenImageEditPlus
resume: null

cache:
  devices:
    vae: cuda:3
    text_encoder: cuda:3
  cache_dir: ${logging.output_dir}/${logging.tracker_project_name}/cache
  use_cache: true
  prompt_empty_drop_keys:
    - prompt_embeds
    - pooled_prompt_embeds

predict:
  devices:
    vae: cuda:3
    text_encoder: cuda:3 # CLIP
    dit: cuda:3
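One setting in the config above worth double-checking: the prodigyopt README recommends leaving `lr` at 1.0, because Prodigy estimates the step size adaptively and `lr` only scales that estimate. With `lr: 0.0001` (likely carried over from the commented-out Adam8bit settings), the effective updates may be orders of magnitude smaller than intended, which would be consistent with outputs identical to the base model. A hedged variant of the optimizer section, same keys as above:

```yaml
optimizer:
  class_path: prodigyopt.Prodigy
  init_args:
    lr: 1.0  # prodigyopt's README suggests lr = 1.0; the optimizer adapts the step size itself
    use_bias_correction: True
    safeguard_warmup: True
    weight_decay: 0.01
```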

Here is the training loss curve.
Image
