Release Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more · huggingface/diffusers

This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the rest of the release notes fully to not miss out on the fun stuff.

New pipelines 🧨

We welcomed new pipelines in this release:

Wan 2.2
Flux-Kontext
Qwen-Image
Qwen-Image-Edit

Wan 2.2 📹

This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.

Flux-Kontext 🎇

Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.

Qwen-Image 🌅

After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.

Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.

New training scripts 🎛️

Make these newly added models your own with our training scripts:

Single-file modeling implementations

Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.

Attention refactor

We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.

Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.

Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.

Regional compilation

Regional compilation trims cold-start latency by only compiling the small and frequently-repeated block(s) of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x. Refer to this doc to learn more.

Thanks to @anijain2305 for contributing this feature in this PR.

We have also authored a number of posts that center around the use of torch.compile. You can check them out at the links below:

Faster pipeline loading ⚡️

Users can now load pipelines directly on an accelerator device leading to significantly faster load times. This particularly becomes evident when loading large pipelines like Wan and Qwen-Image.

from diffusers import DiffusionPipeline
import torch 

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
-    ckpt_id, torch_dtype=torch.bfloat16
- ).to("cuda")
+    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
+ )

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you’re working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.

import os
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

# rest of the loading code
....

Better GGUF integration

@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.

We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.

We now support loading of Diffusers format GGUF checkpoints.

You can learn more about all of this in our GGUF official docs.

Modular Diffusers (Experimental)

Modular Diffusers is a system for building diffusion pipelines pipelines with individual pipeline blocks. It is highly customisable, with blocks that can be mixed and matched to adapt to or create a pipeline for a specific workflow or multiple workflows.

The API is currently in active development and is being released as an experimental feature. Learn more in our docs.

All commits

[tests] skip instead of returning. by @sayakpaul in #11793
adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
[chore] post release v0.34.0 by @sayakpaul in #11800
Follow up for Group Offload to Disk by @DN6 in #11760
[rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
[tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in #11809
Flux Kontext by @a-r-r-o-w in #11812
Kontext training by @sayakpaul in #11813
Kontext fixes by @a-r-r-o-w in #11815
remove syncs before denoising in Kontext by @sayakpaul in #11818
[CI] disable onnx, mps, flax from the CI by @sayakpaul in #11803
TorchAO compile + offloading tests by @a-r-r-o-w in #11697
Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in #11804
[lora] fix: lora unloading behvaiour by @sayakpaul in #11822
[lora]feat: use exclude modules to loraconfig. by @sayakpaul in #11806
ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in #11834
Remove print statement in SCM Scheduler by @a-r-r-o-w in #11836
[tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in #11825
reset deterministic in tearDownClass by @jiqing-feng in #11785
[tests] Fix failing float16 cuda tests by @a-r-r-o-w in #11835
[single file] Cosmos by @a-r-r-o-w in #11801
[docs] fix single_file example. by @sayakpaul in #11847
Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in #11649
[docs] Batch generation by @stevhliu in #11841
[docs] Deprecated pipelines by @stevhliu in #11838
fix norm not training in train_control_lora_flux.py by @Luo-Yihang in #11832
[From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in #11807
[lora] tests for exclude_modules with Wan VACE by @sayakpaul in #11843
update: FluxKontextInpaintPipeline support by @vuongminh1907 in #11820
[Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in #11823
[docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in #11851
[benchmarks] overhaul benchmarks by @sayakpaul in #11565
FIX set_lora_device when target layers differ by @BenjaminBossan in #11844
Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in #11856
[chore] deprecate blip controlnet pipeline. by @sayakpaul in #11877
[docs] fix references in flux pipelines. by @sayakpaul in #11857
[tests] remove tests for deprecated pipelines. by @sayakpaul in #11879
[docs] LoRA metadata by @stevhliu in #11848
[training ] add Kontext i2i training by @sayakpaul in #11858
[CI] Fix big GPU test marker by @DN6 in #11786
First Block Cache by @a-r-r-o-w in #11180
[tests] annotate compilation test classes with bnb by @sayakpaul in #11715
Update chroma.md by @shm4r7 in #11891
[CI] Speed up GPU PR Tests by @DN6 in #11887
Pin k-diffusion for CI by @sayakpaul in #11894
[Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in #11728
[tests] Remove more deprecated tests by @sayakpaul in #11895
[tests] mark the wanvace lora tester flaky by @sayakpaul in #11883
[tests] add compile + offload tests for GGUF. by @sayakpaul in #11740
feat: add multiple input image support in Flux Kontext by @Net-Mist in #11880
Fix unique memory address when doing group-offloading with disk by @sayakpaul in #11767
[SD3] CFG Cutoff fix and official callback by @asomoza in #11890
The Modular Diffusers by @yiyixuxu in #9672
[quant] QoL improvements for pipeline-level quant config by @sayakpaul in #11876
Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in #11429
[LoRA] fix: disabling hooks when loading loras. by @sayakpaul in #11896
[utils] account for MPS when available in get_device(). by @sayakpaul in #11905
[ControlnetUnion] Multiple Fixes by @asomoza in #11888
Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in #11761)
[tests] Unify compilation + offloading tests in quantization by @sayakpaul in #11910
Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in #11904
[docs] torch.compile blog post by @stevhliu in #11837
Flux: pass joint_attention_kwargs when using gradient_checkpointing by @piercus in #11814
Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in #11909
Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in #11924
[tests] Improve Flux tests by @a-r-r-o-w in #11919
Remove device synchronization when loading weights by @a-r-r-o-w in #11927
Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in #11054
Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in #11923
[ControlnetUnion] Propagate #11888 to img2img by @asomoza in #11929
enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in #11908
[training] add an offload utility that can be used as a context manager. by @sayakpaul in #11775
Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in #11518
[refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in #11916
[docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in #11947
[Modular] Updates for Custom Pipeline Blocks by @DN6 in #11940
[docs] Update toctree by @stevhliu in #11936
[docs] include bp link. by @sayakpaul in #11952
Fix kontext finetune issue when batch size >1 by @mymusise in #11921
[tests] Add test slices for Hunyuan Video by @a-r-r-o-w in #11954
[tests] Add test slices for Cosmos by @a-r-r-o-w in #11955
[tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in #11953
[Modular] update the collection behavior by @yiyixuxu in #11963
fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in #11690
Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in #11967
[Examples] Uniform notations in train_flux_lora by @tomguluson92 in #10011
fix style by @yiyixuxu in #11975
[tests] Add test slices for Wan by @a-r-r-o-w in #11920
[docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in #11935
[tests] enforce torch version in the compilation tests. by @sayakpaul in #11979
[modular diffusers] Wan by @a-r-r-o-w in #11913
[compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in #11987
enable quantcompile test on xpu by @yao-matrix in #11988
[WIP] Wan2.2 by @yiyixuxu in #12004
[refactor] some shared parts between hooks + docs by @a-r-r-o-w in #11968
[refactor] Wan single file implementation by @a-r-r-o-w in #11918
Fix huggingface-hub failing tests by @asomoza in #11994
feat: add flux kontext by @jlonge4 in #11985
[modular] add Modular flux for text-to-image by @sayakpaul in #11995
[docs] include lora fast post. by @sayakpaul in #11993
[docs] quant_kwargs by @stevhliu in #11712
[docs] Fix link by @stevhliu in #12018
[wan2.2] add 5b i2v by @yiyixuxu in #12006
wan2.2 i2v FirstBlockCache fix by @okaris in #12013
[core] support attention backends for LTX by @sayakpaul in #12021
[docs] Update index by @stevhliu in #12020
[Fix] huggingface-cli to hf missed files by @asomoza in #12008
[training-scripts] Make pytorch examples UV-compatible by @sayakpaul in #12000
[wan2.2] fix vae patches by @yiyixuxu in #12041
Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in #12015
[LoRA] support lightx2v lora in wan by @sayakpaul in #12040
Fix type of force_upcast to bool by @BerndDoser in #12046
Update autoencoder_kl_cosmos.py by @tanuj-rai in #12045
Qwen-Image by @naykun in #12055
[wan2.2] follow-up by @yiyixuxu in #12024
tests + minor refactor for QwenImage by @a-r-r-o-w in #12057
Cross attention module to Wan Attention by @samuelt0 in #12058
fix(qwen-image): update vae license by @naykun in #12063
CI fixing by @paulinebm in #12059
enable all gpus when running ci. by @sayakpaul in #12062
fix the rest for all GPUs in CI by @sayakpaul in #12064
[docs] Install by @stevhliu in #12026
[wip] feat: support lora in qwen image and training script by @sayakpaul in #12056
[docs] small corrections to the example in the Qwen docs by @sayakpaul in #12068
[tests] Fix Qwen test_inference slices by @a-r-r-o-w in #12070
[tests] deal with the failing AudioLDM2 tests by @sayakpaul in #12069
optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in #12072
Add cuda kernel support for GGUF inference by @Isotr0py in #11869
fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in #12081
[refactor] condense group offloading by @a-r-r-o-w in #11990
Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in #12077
Helper functions to return skip-layer compatible layers by @a-r-r-o-w in #12048
Make prompt_2 optional in Flux Pipelines by @DN6 in #12073
[tests] tighten compilation tests for quantization by @sayakpaul in #12002
Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in #11976
fix flux type hint by @DefTruth in #12089
[qwen] device typo by @yiyixuxu in #12099
[lora] adapt new LoRA config injection method by @sayakpaul in #11999
lora_conversion_utils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in #12101
[tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in #12103
[Modular] Fast Tests by @yiyixuxu in #11937
[GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in #11684
[docs] diffusers gguf checkpoints by @sayakpaul in #12092
[core] add modular support for Flux I2I by @sayakpaul in #12086
[lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in #12119
[Modular] More Updates for Custom Code Loading by @DN6 in #11969
enable compilation in qwen image. by @sayakpaul in #12061
[tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in #12106
[chore] complete the licensing statement. by @sayakpaul in #12001
[docs] Cache link by @stevhliu in #12105
[Modular] Add experimental feature warning for Modular Diffusers by @DN6 in #12127
Add low_cpu_mem_usage option to from_single_file to align with from_pretrained by @IrisRainbowNeko in #12114
[docs] Modular diffusers by @stevhliu in #11931
[Bugfix] typo fix in NPU FA by @leisuzz in #12129
Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in #12117
[core] parallel loading of shards by @sayakpaul in #12028
try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in #12032
[docs] Refresh effective and efficient doc by @stevhliu in #12134
Fix bf15/fp16 for pipeline_wan_vace.py by @SlimRG in #12143
make parallel loading flag a part of constants. by @sayakpaul in #12137
[docs] Parallel loading of shards by @stevhliu in #12135
feat: cuda device_map for pipelines. by @sayakpaul in #12122
[core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in #12005
support hf_quantizer in cache warmup. by @sayakpaul in #12043
make test_gguf all pass on xpu by @yao-matrix in #12158
[docs] Quickstart by @stevhliu in #12128
Qwen Image Edit Support by @naykun in #12164
remove silu for CogView4 by @lambertwjh in #12150
[qwen] Qwen image edit followups by @sayakpaul in #12166
Minor modification to support DC-AE-turbo by @chenjy2003 in #12169
[Docs] typo error in qwen image by @leisuzz in #12144
fix: caching allocator behaviour for quantization. by @sayakpaul in #12172
fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in #12178
[docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in #12181
[LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in #12170
Update README.md by @Taechai in #12182
[chore] add lora button to qwenimage docs by @sayakpaul in #12183
[Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in #12074
Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@vuongminh1907
- update: FluxKontextInpaintPipeline support (#11820)
@Net-Mist
- feat: add multiple input image support in Flux Kontext (#11880)
@tolgacangoz
- Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
@naykun
- Qwen-Image (#12055)
- fix(qwen-image): update vae license (#12063)
- Qwen Image Edit Support (#12164)
@Trgtuan10
- Add QwenImage Inpainting and Img2Img pipeline (#12117)
@SamYuan1990
- try to use deepseek with an agent to auto i18n to zh (#12032)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more