Qwen-Image-Edit Inferior Results Compared to ComfyUI #12216

### Describe the bug

I am trying to do multi-image editing with Qwen-Image-Edit, following a simplified version of [this workflow](https://x.com/hellorob/status/1958197227135906087). The ComfyUI workflow and the diffusers script are shared below for reproducibility. I am using the same (unquantized) models and the same parameters in both.

The results I get with diffusers are noticeably worse than those from ComfyUI, especially in how well details are preserved.

Here are the input image and prompt:
<img width="1317" height="796" alt="Image" src="https://github.com/user-attachments/assets/1cf175d5-572a-4511-bdb8-0ea360caf07a" />
Prompt: `The woman is displaying a plush toy product in her hand, while preserving her exact facial features, expression, clothing, and pose. Maintain the same background, natural lighting, and overall photographic composition and style.`
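
The input image above is 1317x796, while the reproduction script below requests a 1024x1024 output. As far as I can tell from the diffusers 0.35 source, `QwenImageEditPipeline` resizes the conditioning image to roughly 1024x1024 total pixels while keeping its aspect ratio, snapping both sides to multiples of 32. The sketch below re-implements that rule in isolation (it does not import the pipeline, and `conditioning_size` is a hypothetical name) so the effective conditioning size can be compared with the requested output size; it should be checked against the installed version.

```python
import math


def conditioning_size(width: int, height: int,
                      target_area: int = 1024 * 1024,
                      multiple: int = 32) -> tuple:
    """Aspect-preserving resize to about `target_area` pixels, snapped to `multiple`.

    Assumption: this mirrors the dimension rule used by diffusers'
    QwenImageEditPipeline; verify against your installed version.
    """
    ratio = width / height
    new_w = math.sqrt(target_area * ratio)
    new_h = new_w / ratio
    return round(new_w / multiple) * multiple, round(new_h / multiple) * multiple


if __name__ == "__main__":
    # Input image from this report: 1317x796
    print(conditioning_size(1317, 796))  # (1312, 800)
```

If the conditioning aspect ratio (about 1.64 here) differs from the requested 1:1 output, re-running with `height`/`width` set to the computed values may help rule resizing in or out as a factor.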

Here are the outputs from ComfyUI:
<img width="1024" height="1024" alt="Image" src="https://github.com/user-attachments/assets/1d6f4267-96cd-4cf7-b10b-2121d1cd6e8b" />
<img width="1024" height="1024" alt="Image" src="https://github.com/user-attachments/assets/c51b1a46-cdce-4f97-a507-65990613dd22" />

Here are the outputs from Diffusers:
<img width="1024" height="1024" alt="Image" src="https://github.com/user-attachments/assets/30f3530f-f43c-4c87-92d8-99438a58824d" />
<img width="1024" height="1024" alt="Image" src="https://github.com/user-attachments/assets/f190dba8-5885-4e27-8890-5c9dfee01f59" />

Diffusers changes both the woman and the toy much more than the ComfyUI implementation does.
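
To make "preservation of details" less subjective, a simple pixel-level metric could accompany the visual comparison, for example PSNR between each output and a reference image of the same size. The NumPy sketch below is one way to compute it; the file names in the usage comment are hypothetical, and PSNR is only a rough proxy for perceived detail preservation (it penalizes small pose shifts heavily).

```python
import numpy as np


def psnr(a: np.ndarray, b: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Higher means the images are closer; identical images give infinity.
    """
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Hypothetical usage with Pillow (file names are placeholders):
#   from PIL import Image
#   ref = np.asarray(Image.open("reference.png").convert("RGB"))
#   out = np.asarray(Image.open("diffusers_qwen_image_edit_out_43.png").convert("RGB"))
#   print(psnr(ref, out))
```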

### Reproduction

Diffusers reproduction script:
```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16, device_map="cuda")
pipeline.set_progress_bar_config(disable=None)

input_image = Image.open("qwen_image_edit_input.png").convert("RGB")  # drop any alpha channel

seed = 43
prompt = "The woman is displaying a plush toy product in her hand, while preserving her exact facial features, expression, clothing, and pose. Maintain the same background, natural lighting, and overall photographic composition and style."
inputs = {
    "image": input_image,
    "prompt": prompt,
    "generator": torch.manual_seed(seed),
    "true_cfg_scale": 4.5,
    "negative_prompt": " ",
    "num_inference_steps": 50,
    "height": 1024,
    "width": 1024,
}

with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save(f"diffusers_qwen_image_edit_out_{seed}.png")
```
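
One implementation detail that may be worth ruling out: as I read the diffusers source for the Qwen-Image pipelines, `true_cfg_scale` is applied as classifier-free guidance between the `prompt` and `negative_prompt` predictions, after which the combined prediction is rescaled to the per-token norm of the conditional prediction. I have not verified whether the ComfyUI workflow applies the same rescale, so this is a candidate difference, not a diagnosis. The NumPy sketch below contrasts plain CFG with that norm-preserving variant on toy arrays; `plain_cfg` and `norm_rescaled_cfg` are hypothetical names.

```python
import numpy as np


def plain_cfg(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return uncond + scale * (cond - uncond)


def norm_rescaled_cfg(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    """CFG followed by rescaling each vector (last axis) to the norm of `cond`.

    Assumption: this mirrors the rescale step in diffusers' Qwen-Image pipelines.
    """
    combined = plain_cfg(cond, uncond, scale)
    cond_norm = np.linalg.norm(cond, axis=-1, keepdims=True)
    combined_norm = np.linalg.norm(combined, axis=-1, keepdims=True)
    return combined * (cond_norm / combined_norm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "noise predictions": 4 tokens with 8 channels each
    cond = rng.standard_normal((4, 8))
    uncond = rng.standard_normal((4, 8))
    for name, fn in [("plain", plain_cfg), ("rescaled", norm_rescaled_cfg)]:
        out = fn(cond, uncond, 4.5)
        print(name, np.linalg.norm(out, axis=-1).round(3))
    print("cond", np.linalg.norm(cond, axis=-1).round(3))
```

Both variants point in the same direction per token; they differ only in magnitude, so a sampler that skips the rescale would take larger steps at the same `true_cfg_scale`.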

ComfyUI workflow file and screenshot:
[qwen_image_edit-multi_image-v1.0-compare.json](https://github.com/user-attachments/files/21935108/qwen_image_edit-multi_image-v1.0-compare.json)
<img width="1301" height="728" alt="Image" src="https://github.com/user-attachments/assets/4e8786fe-c4f0-432d-81df-03205e087cd6" />
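
To double-check the "same parameters" claim, the sampler settings can be read straight from the attached workflow JSON and compared with the `inputs` dict in the diffusers script. The sketch below assumes ComfyUI's UI-format export, where each entry in `nodes` has a `type` and a `widgets_values` list, and where a `KSampler` node's values are ordered seed, control_after_generate, steps, cfg, sampler_name, scheduler, denoise. That ordering is an assumption to confirm against the actual file; `ksampler_settings` and `load_workflow` are hypothetical helpers, and `KSamplerAdvanced` nodes are not handled.

```python
import json
from typing import Any, Dict, List

# Assumed order of KSampler widget values in ComfyUI's UI-format workflow export.
KSAMPLER_FIELDS = ["seed", "control_after_generate", "steps", "cfg",
                   "sampler_name", "scheduler", "denoise"]


def ksampler_settings(workflow: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the widget values of every KSampler node, keyed by field name."""
    settings = []
    for node in workflow.get("nodes", []):
        if node.get("type") == "KSampler":
            values = node.get("widgets_values", [])
            entry = dict(zip(KSAMPLER_FIELDS, values))
            entry["node_id"] = node.get("id")
            settings.append(entry)
    return settings


def load_workflow(path: str) -> Dict[str, Any]:
    """Load a ComfyUI workflow JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical usage:
#   wf = load_workflow("qwen_image_edit-multi_image-v1.0-compare.json")
#   for s in ksampler_settings(wf):
#       print(s)
```

The `sampler_name` and `scheduler` fields are worth comparing in particular, since they may not correspond one-to-one with the scheduler configured in the diffusers pipeline.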

### Logs

```shell

```

### System Info

- 🤗 Diffusers version: 0.35.1
- Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.18
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.34.4
- Transformers version: 4.55.2
- Accelerate version: 1.2.1
- PEFT version: 0.17.0
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA A100-SXM4-80GB, 81920 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

### Who can help?

@asomoza @yiyixuxu 