---
title: "Qwen-Image-Layered ComfyUI Workflow Example"
description: "Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers, enabling inherent editability through layer decomposition."
sidebarTitle: "Qwen-Image-Layered"
---

import UpdateReminder from '/snippets/tutorials/update-reminder.mdx'

**Qwen-Image-Layered** is a model developed by Alibaba's Qwen team that can decompose an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be edited on its own without disturbing the rest of the image.

**Key Features**:
- **Inherent Editability**: Each layer can be independently manipulated without affecting other content
- **High-Fidelity Elementary Operations**: Supports resizing, repositioning, and recoloring with physical isolation of semantic components
- **Variable-Layer Decomposition**: Not limited to a fixed number of layers; decompose into 3, 4, 8, or more layers as needed
- **Recursive Decomposition**: Any layer can be further decomposed, enabling arbitrary decomposition depth

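To make "inherent editability" concrete, here is a minimal sketch (using Pillow; the layer file names are hypothetical) that repositions one decomposed RGBA layer and recomposes the result without touching the other layers:

```python
# Minimal sketch: edit one RGBA layer independently, then recompose.
# Assumes the decomposed layers were saved as layer_0.png ... layer_3.png
# (hypothetical names), ordered back-to-front. Requires: pip install pillow
from PIL import Image

layers = [Image.open(f"layer_{i}.png").convert("RGBA") for i in range(4)]

# Reposition layer 2 by pasting it onto a fresh transparent canvas;
# the other layers stay untouched -- this is the point of the decomposition.
shifted = Image.new("RGBA", layers[2].size, (0, 0, 0, 0))
shifted.paste(layers[2], (40, 0), layers[2])
layers[2] = shifted

# Composite back-to-front to get the edited image.
canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
for layer in layers:
    canvas = Image.alpha_composite(canvas, layer)
canvas.save("recomposed.png")
```
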
**Related Links**:
- [Hugging Face](https://huggingface.co/Qwen/Qwen-Image-Layered)
- [Research Paper](https://arxiv.org/abs/2512.15603)
- [Blog](https://qwenlm.github.io/blog/qwen-image-layered/)

## Qwen-Image-Layered workflow

<a className="prose" target='_blank' href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/image_qwen_image_layered.json" style={{ display: 'inline-block', backgroundColor: '#0078D6', color: '#ffffff', padding: '10px 20px', borderRadius: '8px', borderColor: "transparent", textDecoration: 'none', fontWeight: 'bold', marginRight: '10px'}}>
  <p className="prose" style={{ margin: 0, fontSize: "0.8rem" }}>Download JSON Workflow File</p>
</a>

<UpdateReminder />

## Model links

**text_encoders**

- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/HunyuanVideo_1.5_repackaged/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)

**diffusion_models**

- [qwen_image_layered_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_layered_bf16.safetensors)

**vae**

- [qwen_image_layered_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/vae/qwen_image_layered_vae.safetensors)

**Model Storage Location**

```
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │   └── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   ├── 📂 diffusion_models/
│   │   └── qwen_image_layered_bf16.safetensors
│   └── 📂 vae/
│       └── qwen_image_layered_vae.safetensors
```

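If you'd rather fetch these files from a script than a browser, here is a minimal sketch using the `huggingface_hub` Python package, assuming ComfyUI is installed at `./ComfyUI` (the repo IDs and file paths come from the links above):

```python
# Minimal sketch: download the three checkpoints into ComfyUI's model folders.
# Assumes ComfyUI lives at ./ComfyUI and huggingface_hub is installed
# (pip install huggingface_hub).
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

FILES = [
    ("Comfy-Org/HunyuanVideo_1.5_repackaged",
     "split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
     "ComfyUI/models/text_encoders"),
    ("Comfy-Org/Qwen-Image-Layered_ComfyUI",
     "split_files/diffusion_models/qwen_image_layered_bf16.safetensors",
     "ComfyUI/models/diffusion_models"),
    ("Comfy-Org/Qwen-Image-Layered_ComfyUI",
     "split_files/vae/qwen_image_layered_vae.safetensors",
     "ComfyUI/models/vae"),
]

for repo_id, filename, target_dir in FILES:
    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # lands in the HF cache
    dest = Path(target_dir)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest / Path(filename).name)  # place it where ComfyUI looks
```
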
## FP8 version

The workflow uses the bf16 model by default, which requires high VRAM. For lower VRAM usage, you can use the fp8 version instead:

- [qwen_image_layered_fp8mixed.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image-Layered_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_layered_fp8mixed.safetensors)

Then update the **Load Diffusion Model** node inside the [Subgraph](/interface/features/subgraph) to use it.

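You can make the swap in the ComfyUI interface, or patch the downloaded workflow file directly. The sketch below is one way to do the latter, under two assumptions: that the "Load Diffusion Model" node is stored with its internal type name `UNETLoader`, and that subgraph nodes live under a `definitions.subgraphs` key (the exact layout can vary between ComfyUI versions, so inspect your JSON first):

```python
# Minimal sketch: point every "Load Diffusion Model" (UNETLoader) node at the
# fp8 checkpoint. The subgraph layout below is an assumption -- check your JSON.
import json

with open("image_qwen_image_layered.json") as f:
    wf = json.load(f)

def all_node_lists(graph):
    """Yield the top-level node list and, if present, each subgraph's nodes."""
    yield graph.get("nodes", [])
    for sub in graph.get("definitions", {}).get("subgraphs", []):
        yield sub.get("nodes", [])

for nodes in all_node_lists(wf):
    for node in nodes:
        if node.get("type") == "UNETLoader":
            # widgets_values[0] holds the checkpoint file name
            node["widgets_values"][0] = "qwen_image_layered_fp8mixed.safetensors"

with open("image_qwen_image_layered_fp8.json", "w") as f:
    json.dump(wf, f, indent=2)
```
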
## Workflow settings

### Sampler settings

This model is slow to sample. The model authors' original settings are 50 steps with CFG 4.0; using them instead of the template's defaults will at least double the generation time.

### Input size

An input size of 640px is recommended; use 1024px for high-resolution output.

### Prompt (optional)

The text prompt is meant to describe the overall content of the input image, including elements that may be partially occluded (for example, you may specify text hidden behind a foreground object). It is not designed to explicitly control the semantic content of individual layers.