feat: Add ZImage/LongCat/Sana diffusion support + LLM VL improvements… by er6y · Pull Request #4165 · alibaba/MNN

er6y · 2026-02-12T09:26:14Z

… + OpenCL fixes

Diffusion Engine:

- Add ZImageDiffusion subclass: FlowMatch Euler scheduler, PhiloxRNG noise, CLIP text encoder
- Add LongCatDiffusion subclass: LLM text encoder (lazy load), Flux-like latent packing, VAE enc/dec, T2I + Image Edit modes
- Integrate SanaDiffusion with SanaLlm into unified API, fix Euler sampling (dt/1000->dt) and CFG order
- Add DiffusionConfig/LLMEncoderConfig, GPU config (BUFFER mode, FP32, Memory_Low, OpenCL cache)
- Add DiffusionGpuMemoryMode/PrecisionMode/CFGMode enums
- Extend createDiffusion factory with full parameters for all model types
- Unify diffusion_demo to support SD1.5/Taiyi/Sana/ZImage/LongCat with cfg_scale and input_image args
- Add image processing utilities (resize, crop, colorspace, pack/unpack latents)
- Implement FlowMatchEuler scheduler
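For readers unfamiliar with flow-matching samplers, here is a minimal sketch of one Euler update combined with classifier-free guidance. It is not the code in this PR: `makeSigmas`, `applyCfg`, and `eulerStep` are hypothetical names, and the linear sigma schedule is an assumption (real schedulers often apply a shift to it). With sigmas already normalized to [0, 1], the step size is simply the difference between consecutive sigmas, with no extra division by 1000; whether that is exactly what the `dt/1000->dt` fix refers to is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: steps + 1 sigmas spaced linearly from 1.0 down to 0.0.
std::vector<float> makeSigmas(int steps) {
    assert(steps > 0);
    std::vector<float> sigmas(static_cast<std::size_t>(steps) + 1);
    for (int i = 0; i <= steps; ++i) {
        sigmas[static_cast<std::size_t>(i)] =
            1.0f - static_cast<float>(i) / static_cast<float>(steps);
    }
    return sigmas;
}

// Classifier-free guidance: uncond + scale * (cond - uncond).
// Swapping cond and uncond silently inverts the guidance direction.
std::vector<float> applyCfg(const std::vector<float>& cond,
                            const std::vector<float>& uncond,
                            float scale) {
    assert(cond.size() == uncond.size());
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        out[i] = uncond[i] + scale * (cond[i] - uncond[i]);
    }
    return out;
}

// One Euler step along the predicted velocity: x <- x + (sigmaNext - sigma) * v.
void eulerStep(std::vector<float>& latent,
               const std::vector<float>& velocity,
               float sigma, float sigmaNext) {
    assert(latent.size() == velocity.size());
    const float dt = sigmaNext - sigma;  // negative while denoising from 1 to 0
    for (std::size_t i = 0; i < latent.size(); ++i) {
        latent[i] += dt * velocity[i];
    }
}
```

In a sampling loop, the model would be evaluated twice per step (conditional and unconditional), the two outputs merged with `applyCfg`, and the result passed to `eulerStep` with `sigmas[i]` and `sigmas[i + 1]`.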

LLM / Vision:

- omni.cpp: Qwen3-VL vision fixes (floor rounding, nullptr check)
- omni.hpp: mrope position ids fix max(T,H,W)+1
- tokenizer.hpp: public wrapper header, MNN_PUBLIC export
- llm_demo: LLM_DEMO_ONELINE mode
- pymnn llm.h: forward_all() binding
- Fix Qwen2_5Vision transformer_fuse for window attention compatibility
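The mrope item can be illustrated with a small sketch of how three-axis position ids are commonly assigned in Qwen2-VL-style models. This assumes the usual convention that a vision block occupies positions `(start + t, start + h, start + w)` and that the following text token resumes at the largest id used on any axis plus one; the function names are hypothetical and this is not the code in omni.hpp.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Pos3 = std::array<int, 3>;  // (temporal, height, width) position ids

// Text tokens share one id across all three axes. Returns the next free position.
int appendTextPositions(std::vector<Pos3>& ids, int start, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        ids.push_back(Pos3{start + i, start + i, start + i});
    }
    return start + count;
}

// A T x H x W vision grid gets per-axis ids offset by `start`.
// Returns max id over all axes + 1, i.e. start + max(T, H, W).
int appendVisionPositions(std::vector<Pos3>& ids, int start, int T, int H, int W) {
    assert(T > 0 && H > 0 && W > 0);
    int maxId = start;
    for (int t = 0; t < T; ++t) {
        for (int h = 0; h < H; ++h) {
            for (int w = 0; w < W; ++w) {
                ids.push_back(Pos3{start + t, start + h, start + w});
                maxId = std::max(maxId, std::max(start + t, std::max(start + h, start + w)));
            }
        }
    }
    return maxId + 1;
}
```

Advancing by the number of vision tokens instead of by the per-axis maximum would leave a gap in the position ids of the text that follows the image.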

OpenCL fixes:

- BinaryBufExecution: localWorkSize divisibility check
- binary_buf.cl: float4/int4 init, per-element ReLU (NVIDIA compiler bug workaround)
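As background for the divisibility check: when an explicit local work size is passed to `clEnqueueNDRangeKernel`, OpenCL 1.x requires the global work size to be an exact multiple of it in every dimension. The sketch below shows the check and one common remedy (rounding the global size up, which then requires a bounds check inside the kernel). It is plain C++ with hypothetical helper names; how BinaryBufExecution actually reacts to a failed check is not stated in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if every global dimension is an exact multiple of the matching local dimension.
bool localSizeDivides(const std::vector<std::size_t>& global,
                      const std::vector<std::size_t>& local) {
    if (global.size() != local.size()) {
        return false;
    }
    for (std::size_t i = 0; i < global.size(); ++i) {
        if (local[i] == 0 || global[i] % local[i] != 0) {
            return false;
        }
    }
    return true;
}

// Rounds each global dimension up to the next multiple of the local dimension.
// The kernel must then skip work items that fall outside the real extent.
std::vector<std::size_t> roundUpGlobal(const std::vector<std::size_t>& global,
                                       const std::vector<std::size_t>& local) {
    assert(global.size() == local.size());
    std::vector<std::size_t> out(global.size());
    for (std::size_t i = 0; i < global.size(); ++i) {
        assert(local[i] > 0);
        out[i] = ((global[i] + local[i] - 1) / local[i]) * local[i];
    }
    return out;
}
```

An alternative to rounding up is to pass no local size at all when the check fails and let the driver choose one.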

Other fixes:

- ShapeSliceTf/ShapeWhere: shape calculation rewrite
- OnnxEinsum: outer product broadcast _Unsqueeze fix
- Pipeline: NaN/Inf debug check macro (disabled by default)
- Fix LongCat unpackLatentsGPU to match master implementation
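A debug check that is disabled by default is usually written as a compile-time switch, so release builds pay no cost. The following is a generic sketch of that pattern, not the macro added to Pipeline: `SKETCH_CHECK_NAN_INF`, `CHECK_TENSOR_FINITE`, and `countNonFinite` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical switch: off unless the build defines it to 1.
#ifndef SKETCH_CHECK_NAN_INF
#define SKETCH_CHECK_NAN_INF 0
#endif

// Counts NaN and +/-Inf values in a float buffer.
std::size_t countNonFinite(const float* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::size_t bad = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (!std::isfinite(data[i])) {
            ++bad;
        }
    }
    return bad;
}

#if SKETCH_CHECK_NAN_INF
// Enabled: report any buffer that contains non-finite values.
#define CHECK_TENSOR_FINITE(name, data, size)                                  \
    do {                                                                       \
        const std::size_t bad_ = countNonFinite((data), (size));               \
        if (bad_ != 0) {                                                       \
            std::fprintf(stderr, "%s: %zu non-finite values\n", (name), bad_); \
        }                                                                      \
    } while (0)
#else
// Disabled (default): expands to nothing, so there is no runtime cost.
#define CHECK_TENSOR_FINITE(name, data, size) do { } while (0)
#endif
```

Placing such a check after each op's execution narrows down which op first produces a NaN or Inf, which is useful when debugging FP16 overflow on GPU backends.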

CLAassistant · 2026-02-12T09:26:22Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

wangzhaode self-assigned this Feb 12, 2026

er6y wants to merge 1 commit into alibaba:master from er6y:master
