
Commit 90e3400

fix(z-image): Fix padding token shape mismatch for GGUF models (#8690)
## Summary

Fix a shape mismatch when loading GGUF-quantized Z-Image transformer models. GGUF Z-Image models store `x_pad_token` and `cap_pad_token` with shape `[3840]`, but diffusers' `ZImageTransformer2DModel` expects `[1, 3840]` (with a batch dimension). This caused a `RuntimeError` on Linux systems when loading models like `z_image_turbo-Q4_K.gguf`.

The fix:

- Dequantizes GGMLTensors first (since they don't support `unsqueeze`)
- Reshapes the tensors to add the missing batch dimension

A minimal sketch of the shape change appears after the checklist below.

## Related Issues / Discussions

Reported by a Linux user using:

- https://huggingface.co/leejet/Z-Image-Turbo-GGUF/resolve/main/z_image_turbo-Q4_K.gguf
- https://huggingface.co/worstplayer/Z-Image_Qwen_3_4b_text_encoder_GGUF/resolve/main/Qwen_3_4b-Q6_K.gguf

## QA Instructions

1. Install a GGUF-quantized Z-Image model (e.g., `z_image_turbo-Q4_K.gguf`)
2. Install a Qwen3 GGUF text encoder
3. Run a Z-Image generation
4. Verify that no `RuntimeError: size mismatch for x_pad_token` error occurs

## Merge Plan

None; this is a straightforward fix.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [ ] _Tests added / updated (if applicable)_
- [ ] _❗Changes to a redux slice have a corresponding migration_
- [ ] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
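For illustration, here is a minimal sketch of the shape change the fix performs, using a plain tensor in place of the GGMLTensor loaded from the checkpoint (the `3840` width matches the pad-token size mentioned above):

```python
import torch

# GGUF checkpoints store the pad tokens as a 1-D tensor of shape [3840].
pad_token = torch.zeros(3840)

# The converter adds the batch dimension that ZImageTransformer2DModel expects.
fixed = torch.as_tensor(pad_token).reshape(1, -1)

assert fixed.shape == (1, 3840)  # [dim] -> [1, dim]
```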
2 parents aa764f8 + 7068cf9 commit 90e3400

File tree

1 file changed: +12 −0 lines

  • invokeai/backend/model_manager/load/model_loaders/z_image.py


invokeai/backend/model_manager/load/model_loaders/z_image.py

Lines changed: 12 additions & 0 deletions
```diff
@@ -42,6 +42,7 @@ def _convert_z_image_gguf_to_diffusers(sd: dict[str, Any]) -> dict[str, Any]:
     - x_embedder.* -> all_x_embedder.2-1.*
     - final_layer.* -> all_final_layer.2-1.*
     - norm_final.* -> skipped (diffusers uses non-learnable LayerNorm)
+    - x_pad_token, cap_pad_token: [dim] -> [1, dim] (diffusers expects batch dimension)
     """
     new_sd: dict[str, Any] = {}
 
@@ -50,6 +51,17 @@ def _convert_z_image_gguf_to_diffusers(sd: dict[str, Any]) -> dict[str, Any]:
             new_sd[key] = value
             continue
 
+        # Handle padding tokens: GGUF has shape [dim], diffusers expects [1, dim]
+        if key in ("x_pad_token", "cap_pad_token"):
+            if hasattr(value, "shape") and len(value.shape) == 1:
+                # GGMLTensor doesn't support unsqueeze, so dequantize first if needed
+                if hasattr(value, "get_dequantized_tensor"):
+                    value = value.get_dequantized_tensor()
+                # Use reshape instead of unsqueeze for better compatibility
+                value = torch.as_tensor(value).reshape(1, -1)
+            new_sd[key] = value
+            continue
+
         # Handle x_embedder -> all_x_embedder.2-1
         if key.startswith("x_embedder."):
            suffix = key[len("x_embedder.") :]
```
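For QA beyond the manual steps in the summary, a unit test of the new branch might look like the sketch below. The import path mirrors the file shown above; it is assumed here that the helper accepts a partial state dict containing only the pad tokens.

```python
import torch

from invokeai.backend.model_manager.load.model_loaders.z_image import (
    _convert_z_image_gguf_to_diffusers,
)


def test_pad_tokens_gain_batch_dimension():
    # Simulate a GGUF-style state dict where the pad tokens are stored as [3840].
    sd = {
        "x_pad_token": torch.zeros(3840),
        "cap_pad_token": torch.zeros(3840),
    }

    converted = _convert_z_image_gguf_to_diffusers(sd)

    # After conversion, both tokens should carry the batch dimension diffusers expects.
    assert converted["x_pad_token"].shape == (1, 3840)
    assert converted["cap_pad_token"].shape == (1, 3840)
```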
