
@Pfannkuchensack
Collaborator

Summary

Add support for Z-Image ControlNet V2.0 alongside the existing V1 support.

Key changes:

  • Auto-detect control_in_dim from adapter weights (16 for V1, 33 for V2.0)
  • Auto-detect n_refiner_layers from state dict
  • Add zero-padding for V2.0's additional control channels (diffusers approach)
  • Use accelerate.init_empty_weights() for more efficient model creation
  • Add ControlNet_Checkpoint_ZImage_Config to frontend schema
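The first two bullets infer the adapter configuration from tensor shapes and key names instead of requiring a version flag. The sketch below shows one way to do that. It assumes that the control patch embedder is a linear layer whose `in_features` equals `patch_size * patch_size * f_patch_size * control_in_dim` (with assumed sizes of 2, 2 and 1). Under that assumption, 64 input features map to V1 (16 channels) and 132 map to V2.0 (33 channels). The key fragments `control_all_x_embedder` and `control_noise_refiner.<i>.` are hypothetical and are not taken from the PR, so real checkpoints may name these layers differently.

```python
import re
from typing import Mapping, Tuple

# HYPOTHETICAL key names; real adapter checkpoints may use different prefixes.
EMBEDDER_FRAGMENT = "control_all_x_embedder"
REFINER_PATTERN = re.compile(r"^control_noise_refiner\.(\d+)\.")


def detect_control_in_dim(
    shapes: Mapping[str, Tuple[int, ...]],
    patch_size: int = 2,
    f_patch_size: int = 1,
) -> int:
    """Infer control_in_dim from the control patch-embedder weight shape.

    A linear layer's weight has shape (out_features, in_features); here we assume
    in_features = patch_size * patch_size * f_patch_size * control_in_dim.
    """
    values_per_channel = patch_size * patch_size * f_patch_size
    for key, shape in shapes.items():
        if EMBEDDER_FRAGMENT in key and key.endswith(".weight"):
            in_features = shape[1]
            if in_features % values_per_channel != 0:
                raise ValueError(f"{key}: in_features={in_features} not divisible by {values_per_channel}")
            return in_features // values_per_channel
    raise KeyError("no control patch-embedder weight found")


def detect_n_refiner_layers(shapes: Mapping[str, Tuple[int, ...]]) -> int:
    """Count refiner blocks as (highest block index found in the keys) + 1."""
    indices = set()
    for key in shapes:
        match = REFINER_PATTERN.match(key)
        if match:
            indices.add(int(match.group(1)))
    return max(indices) + 1 if indices else 0
```

With a loaded PyTorch state dict, the shape mapping can be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`. Working on shapes alone means detection never has to materialize the weights.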

Related Issues / Discussions

Part of Z-Image feature implementation.

QA Instructions

  1. Load a Z-Image ControlNet V1 model (control_in_dim=16) and verify it works
  2. Load a Z-Image ControlNet V2.0 model (control_in_dim=33) and verify it works
  3. Test with different control types: Canny, Depth, Pose
  4. Recommended control_context_scale: 0.65-0.80

Merge Plan

Can be merged after review. No special considerations needed.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions bot added the api, python, Root, invocations, backend, frontend, and python-deps labels on Dec 14, 2025
@blessedcoolant
Collaborator

Merged the Z Image PR. Can you rebase this against main now so we can go through the checks? Thank you.

feat: Add Z-Image ControlNet support with spatial conditioning

Add comprehensive ControlNet support for Z-Image models including:

Backend:
- New ControlNet_Checkpoint_ZImage_Config for Z-Image control adapter models
- Z-Image control key detection (_has_z_image_control_keys) to identify control layers
- ZImageControlAdapter loader for standalone control models
- ZImageControlTransformer2DModel combining base transformer with control layers
- Memory-efficient model loading by building combined state dict
VRAM usage is high.

- Auto-detect control_in_dim from adapter weights (16 for V1, 33 for V2.0)
- Auto-detect n_refiner_layers from state dict
- Add zero-padding for V2.0's additional channels
- Use accelerate.init_empty_weights() for efficient model creation
- Add ControlNet_Checkpoint_ZImage_Config to frontend schema
- Add missing ControlNet_Checkpoint_ZImage_Config import
- Remove unused imports (Any, Dict, ADALN_EMBED_DIM, is_torch_version)
- Add strict=True to zip() calls
- Replace mutable list defaults with immutable tuples
- Replace dict() calls with literal syntax
- Sort imports in z_image_denoise.py

Implement Z-Image ControlNet as an Extension pattern (similar to FLUX ControlNet)
instead of merging control weights into the base transformer. This provides:
- Lower memory usage (no weight duplication)
- Flexibility to enable/disable control per step
- Cleaner architecture with separate control adapter
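The "enable/disable control per step" benefit follows from keeping the adapter outside the base transformer: the denoise loop can ask the extension for hints and simply get nothing back on steps where control is off. The sketch below shows that idea. `ControlStepWindow`, `ControlExtensionSketch` and the `begin_step_percent`/`end_step_percent` parameters are hypothetical names used for illustration. They are not the PR's `ZImageControlNetExtension` API.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ControlStepWindow:
    """HYPOTHETICAL helper: decides whether control applies at a given step."""

    begin_step_percent: float = 0.0
    end_step_percent: float = 1.0

    def is_active(self, step_index: int, total_steps: int) -> bool:
        if total_steps <= 0:
            raise ValueError("total_steps must be positive")
        progress = step_index / total_steps
        return self.begin_step_percent <= progress <= self.end_step_percent


class ControlExtensionSketch(Generic[T]):
    """Keeps the control adapter separate from the base transformer and only
    runs it on steps inside its window; returns None when control is off."""

    def __init__(self, compute_hints: Callable[[], T], window: ControlStepWindow) -> None:
        self._compute_hints = compute_hints
        self._window = window

    def hints_for_step(self, step_index: int, total_steps: int) -> Optional[T]:
        if not self._window.is_active(step_index, total_steps):
            return None  # base transformer runs unmodified on this step
        return self._compute_hints()
```

Because the adapter's weights are never merged into the transformer, turning control off costs nothing beyond skipping `compute_hints`.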

Key implementation details:
- ZImageControlNetExtension: computes control hints per denoising step
- z_image_forward_with_control: custom forward pass with hint injection
- patchify_control_context: utility for control image patchification
- ZImageControlAdapter: standalone adapter with control_layers and noise_refiner
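`patchify_control_context` turns the control latent into transformer tokens. The sketch below shows the general technique on a plain `[C][H][W]` nested list: the latent is cut into non-overlapping 2x2 patches, and each patch is flattened into one token of length `2 * 2 * C`. For 33 channels that gives 132 values per token, and for 16 channels it gives 64. The patch size of 2 and the in-token ordering (row offset, column offset, then channel) are assumptions. The actual utility operates on tensors and may order values differently.

```python
from typing import List

Channel = List[List[float]]  # one latent channel as rows of values


def patchify_sketch(x: List[Channel], patch_size: int = 2) -> List[List[float]]:
    """Split a [C][H][W] latent into patch_size x patch_size patches.

    Tokens are emitted row-major over the patch grid; inside a token the order
    is (row offset, column offset, channel), with channel varying fastest.
    """
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    tokens: List[List[float]] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token: List[float] = []
            for dy in range(patch_size):
                for dx in range(patch_size):
                    for c in range(channels):
                        token.append(x[c][top + dy][left + dx])
            tokens.append(token)
    return tokens
```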

Architecture matches original VideoX-Fun implementation:
- Hints computed ONCE using INITIAL unified state (before main layers)
- Hints injected at every other main transformer layer (15 control blocks)
- Control signal added after each designated layer's forward pass
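The control flow described above can be sketched with scalars standing in for token tensors. Hints are computed once from the initial unified state. Every other main layer is designated as a control place, which gives 15 places for an assumed 30-layer transformer. Each designated layer adds `control_context_scale * hint` after its own forward pass. The sketch assumes the control blocks are chained, with each block refining the previous block's output and emitting one hint. It illustrates ordering only and is not the PR's `z_image_forward_with_control`.

```python
from typing import Callable, Dict, List, Sequence

# Scalars stand in for token tensors so the control flow is easy to follow.
Layer = Callable[[float], float]
ControlBlock = Callable[[float, float], float]  # (control_state, initial_state) -> control_state


def control_layer_places(n_layers: int = 30, stride: int = 2) -> List[int]:
    """Main-layer indices that receive a hint: every other layer."""
    return list(range(0, n_layers, stride))


def compute_hints(control_context: float, initial_state: float, blocks: Sequence[ControlBlock]) -> List[float]:
    """Run the chained control blocks ONCE against the initial unified state."""
    hints: List[float] = []
    state = control_context
    for block in blocks:
        state = block(state, initial_state)
        hints.append(state)
    return hints


def forward_with_control(
    x: float,
    layers: Sequence[Layer],
    hints: Sequence[float],
    places: Sequence[int],
    control_context_scale: float,
) -> float:
    if len(hints) != len(places):
        raise ValueError("need exactly one hint per control place")
    hint_for_layer: Dict[int, int] = {layer_idx: i for i, layer_idx in enumerate(places)}
    for idx, layer in enumerate(layers):
        x = layer(x)  # main transformer layer
        if idx in hint_for_layer:
            # Control signal is added after the designated layer's forward pass.
            x = x + control_context_scale * hints[hint_for_layer[idx]]
    return x
```

Computing the hints before the main loop means the control branch runs once per denoising step regardless of how many layers consume its output.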

V2.0 ControlNet support (control_in_dim=33):
- Channels 0-15: control image latents
- Channels 16-31: reference image (zeros for pure control)
- Channel 32: inpaint mask (1.0 = don't inpaint, use control signal)
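The 33-channel layout above can be assembled from the 16 control latent channels as follows. Channels 16-31 are zeros (no reference image), and channel 32 is a mask filled with 1.0. A V1 adapter receives the 16 control channels unchanged. This is a sketch on nested lists rather than tensors, and `build_control_context` is a hypothetical name. A plain zero-pad would differ from this layout only in channel 32.

```python
from typing import List

Channel = List[List[float]]  # one latent channel as rows of values


def build_control_context(control_latents: List[Channel], control_in_dim: int) -> List[Channel]:
    """Assemble the control input for a V1 (16) or V2.0 (33) adapter."""
    if len(control_latents) != 16:
        raise ValueError("expected 16 control latent channels")
    if control_in_dim == 16:
        return list(control_latents)  # V1: control latents only
    if control_in_dim != 33:
        raise ValueError(f"unsupported control_in_dim: {control_in_dim}")
    height, width = len(control_latents[0]), len(control_latents[0][0])
    reference = [[[0.0] * width for _ in range(height)] for _ in range(16)]  # channels 16-31
    mask = [[[1.0] * width for _ in range(height)]]  # channel 32: 1.0 = use control signal
    return list(control_latents) + reference + mask
```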
@blessedcoolant
Collaborator

blessedcoolant commented Dec 22, 2025

Seems to be working fine. I didn't have the last commit pulled when I tested the last time. My bad.

Also, I'm not sure whether it's a LoRA problem or a ControlNet issue, but the results with the Arcane LoRA I linked in the other PR are quite muddy.

This is the output I get with the scaled safetensors file. I'm guessing that's because scaled weights aren't supported yet with the ControlNet?

[Screenshot of the output]

@Pfannkuchensack
Collaborator Author

So V1 and V2 are really bad. V2.1 works fine: https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1

During testing, we found that applying ControlNet to Z-Image-Turbo caused the model to lose its acceleration capability and become blurry. We performed 8-step distillation on the version 2.1 model, and the distilled model demonstrates better performance when using 8-step prediction. Additionally, we have uploaded a tile model that can be used for super-resolution generation. [2025.12.22]

@blessedcoolant
Collaborator

Ah, brand new. I'll check out both the regular and the tile versions in a bit. If they're good to go, we can set them as the suggested starter models, merge this one, and move on to the regional guidance part.

@blessedcoolant
Collaborator

Tested with the newer models. Definitely better performance. The quality of the ControlNet models themselves is alright. LoRA functionality is much better, but not yet at the level of the Z-Image base model.

But I think this PR is good to go. The ControlNet models seem to be working, both tile and union.

I synced up with main and fixed the ruff checks. If there's nothing else to add to this one, let me know. I can merge this and we can move on to the next one.

Great job overall implementing Z Image. Looking great.

@blessedcoolant blessedcoolant marked this pull request as ready for review December 23, 2025 00:24
@blessedcoolant blessedcoolant merged commit aa764f8 into invoke-ai:main Dec 23, 2025
25 checks passed
@Pfannkuchensack Pfannkuchensack deleted the feature/z-image-control branch December 23, 2025 00:30