
ZImage output is just noise after hack to make zimage work in fp16 #11087

@mamei16

Description


Expected Behavior

The output should roughly resemble what the model produced before, back when "manual cast: FP32" was printed in the logs.
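For context, one plausible explanation (my assumption, not verified against the ComfyUI source) is fp16 overflow: without the manual fp32 cast, any intermediate activation above fp16's maximum of roughly 65504 turns into inf, and the resulting NaNs propagate through the rest of the network as pure noise. A minimal PyTorch sketch of the effect:

import torch

# Illustrative only, not ComfyUI code: fp16 tops out at ~65504, so any
# value beyond that range overflows to inf, and later arithmetic on it
# produces NaNs that show up in the decoded image as noise.
x = torch.tensor([70000.0, 1.0])
x_fp16 = x.to(torch.float16)
print(x_fp16)                 # tensor([inf, 1.], dtype=torch.float16)
print(x_fp16 - x_fp16.max())  # tensor([nan, -inf], dtype=torch.float16)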

Actual Behavior

The output is just noise.

[Screenshot: generated image consisting only of noise]

Steps to Reproduce

I can reproduce it by simply running the following workflow:

[Screenshot: the workflow used to reproduce the issue]

Debug Logs

Checkpoint files will always be loaded safely.
Total VRAM 12272 MB, total RAM 31494 MB
pytorch version: 2.4.1+rocm6.1
AMD arch: gfx1030
ROCm version: (6, 1)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6800M : native
Enabled pinned memory 29919.0
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.11.13 (main, Jun  5 2025, 13:12:00) [GCC 11.2.0]
ComfyUI version: 0.3.76
ComfyUI frontend version: 1.33.10
[Prompt Server] web root: /home/user/envs/stable_diffusion/lib/python3.11/site-packages/comfyui_frontend_package/static
Total VRAM 12272 MB, total RAM 31494 MB
pytorch version: 2.4.1+rocm6.1
AMD arch: gfx1030
ROCm version: (6, 1)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6800M : native
Enabled pinned memory 29919.0
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ZImageTEModel_
loaded completely; 11028.80 MB usable, 7672.25 MB loaded, full load: True
model weight dtype torch.float16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded partially; 4795.20 MB usable, 4720.20 MB loaded, 7019.35 MB offloaded, 75.00 MB buffer reserved, lowvram patches: 0
100%|██████████████████████████████████████████████████████████| 8/8 [01:21<00:00, 10.14s/it]
Requested to load AutoencodingEngine
0 models unloaded.
loaded completely; 4773.80 MB usable, 319.75 MB loaded, full load: True
Prompt executed in 114.16 seconds

Other

I am fairly certain that the issue stems from commit #11057.
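As a possible stopgap (untested on my setup, and assuming the standard ComfyUI CLI flags), forcing fp32 at launch should bring back the old "manual cast: FP32" behavior:

python main.py --force-fp32

This trades speed and VRAM for numerical range, so it only works around the regression rather than fixing it.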
