All notable changes to this project will be documented in this file.
- 33 Resolution Presets: Instruct resolution dropdown now includes all model-native bucket resolutions (~1MP each), ordered tallest portrait (512×2048) → square (1024×1024) → widest landscape (2048×512).
- Multi-Image Fusion 5-input support: Added `image_4` and `image_5` optional inputs (experimental: the model officially supports up to 3, though the pipeline accepts more).
- Issue #16 (NF4 Low VRAM OOM): Two-stage `max_memory` estimation in the quantized loader replaces the one-shot approach that left no headroom for inference tensors.
- Issue #15 (Multi-GPU device mismatch): Explicit `.to(device)` on `freqs_cis`/`image_pos_id` prevents cross-device errors during the block-swap forward pass.
- Issue #12 (Transformers 5.x compatibility): `_lookup` dict guard in block swap, `BitsAndBytesConfig` import path, and `modeling_utils` attribute checks updated for forward compatibility.
- Instruct Image Edit / Multi-Fusion: Added missing `torch.cuda.OutOfMemoryError` handlers with actionable error messages.
- Instruct Multi-Fusion: Applied multi-GPU block-swap device patch (was missing from instruct nodes).
- Instruct Multi-Fusion `fuse()` method refactored: image path conversion uses a loop instead of separate if-blocks for each image.
- Resolution tooltips updated across all Instruct generate nodes.
- Multi-Fusion workflow diagram updated for 3+ images with `think_recaption` recommendation.
- Removed dead `gc` import from `hunyuan_highres_nodes.py`.
- `hunyuan_cache_v2.py`: Added `clear_generation_cache()` helper used by all generate nodes for KV cache cleanup.
- `hunyuan_shared.py`: Centralized `_aggressive_vram_cleanup()` with stale KV-cache detection.
- `hunyuan_block_swap.py`: `_lookup` guard for INT8 `Module._apply` hook (transformers 5.x).
- `hunyuan_quantized_nodes.py`: Two-stage `max_memory` with headroom for inference VRAM.
- `hunyuan_loader_clean.py`: Multi-GPU device-mismatch fix for `freqs_cis`/`image_pos_id`.
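The two-stage `max_memory` idea behind the Issue #16 fix can be illustrated with a minimal sketch. It is not the code in `hunyuan_quantized_nodes.py`: the function name, its parameters, and the headroom figure are assumptions made for illustration. The only external convention relied on is that a `max_memory` mapping uses GPU indices and `"cpu"` as keys.

```python
GIB = 1024 ** 3


def two_stage_max_memory(free_bytes, headroom_bytes, cpu_bytes):
    """Build a max_memory-style dict that leaves room for inference tensors.

    free_bytes:     {gpu_index: bytes currently free on that GPU}
    headroom_bytes: bytes to keep free on every GPU (illustrative value)
    cpu_bytes:      bytes of system RAM the loader may use for offload
    """
    # Stage 1: start from what is free right now. A one-shot estimate
    # would hand all of this to the weights and stop here.
    stage_one = dict(free_bytes)

    # Stage 2: carve out inference headroom, never going below zero.
    budget = {
        gpu: max(free - headroom_bytes, 0)
        for gpu, free in stage_one.items()
    }
    budget["cpu"] = cpu_bytes
    return budget


# Example: a 24 GiB card with 6 GiB held back for activations.
budget = two_stage_max_memory({0: 24 * GIB}, headroom_bytes=6 * GIB, cpu_bytes=64 * GIB)
```

Whatever does not fit in the reduced GPU budget spills to the `"cpu"` entry, which is what keeps the weights from consuming the memory that inference later needs.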
- Instruct Model Nodes: 5 new nodes for HunyuanImage-3.0-Instruct and Instruct-Distil models
  - Hunyuan Instruct Loader: Load any Instruct variant (BF16/INT8/NF4, Distil/Full). Auto-detects quant type from folder name.
  - Hunyuan Instruct Generate: Text-to-image with `bot_task` modes (`image`/`recaption`/`think_recaption`). Returns CoT reasoning text.
  - Hunyuan Instruct Image Edit: Edit images with natural language instructions.
  - Hunyuan Instruct Multi-Image Fusion: Combine 2–3 reference images with instructions.
  - Hunyuan Instruct Unload: Free cached Instruct model from VRAM/RAM.
- Block Swap: Async GPU↔CPU transformer block swapping for all loaders. Enables running BF16 (~160GB) and INT8 (~81GB) models on 48–96GB GPUs.
- HighRes Efficient Node: Loop-based MoE expert routing uses ~75× less VRAM than dispatch_mask. Generates 3MP–4K+ images on 96GB GPUs.
- Unified V2 Node: Single auto-detecting generate node with integrated block swap, VAE management, and VRAM budget.
- Flexible Model Paths: All loaders now use ComfyUI's `folder_paths` system. Models can be stored anywhere via `extra_model_paths.yaml` (`hunyuan` and `hunyuan_instruct` categories).
- Pre-quantized Instruct models on Hugging Face: INT8 and NF4 variants for both Instruct and Instruct-Distil.
- INT8 bitsandbytes fix: Guard hooks that fix `Module._apply` discarding `Int8Params.CB`/`SCB` during `.to()` calls. Enables block swap with INT8 models.
- Soft Unload node: Move model to CPU (keep cached) for fast restore without full reload.
- Force Unload node: Complete VRAM + RAM cleanup with aggressive garbage collection.
- Clear Downstream node: Clear other models from VRAM while preserving cached Hunyuan model.
- Instruct Loader model discovery uses `folder_paths.get_folder_paths()` instead of hardcoded paths
- All base loaders (NF4, INT8, BF16, Multi-GPU, HighRes) migrated to centralized `get_available_hunyuan_models()` and `resolve_hunyuan_model_path()` in `hunyuan_shared.py`
- Updated README with comprehensive Instruct documentation, HuggingFace links, hardware tables, and workflow diagrams
- Instruct (full) INT8 with block swap: OOM during inference. Distil-INT8 works fine. Under investigation.
- RAM accumulation: Successive model loads may leak RAM. Restart ComfyUI if needed.
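The Instruct Loader entry above says the quantization type is auto-detected from the folder name. A minimal sketch of that kind of detection follows. The function name, the substring rules, and the example folder names are assumptions, not the loader's actual logic.

```python
def detect_quant_type(folder_name: str) -> str:
    """Guess the quantization type from a model folder name.

    Hypothetical rule: look for "nf4" or "int8" anywhere in the name
    (case-insensitive) and fall back to BF16 when neither is present.
    """
    name = folder_name.lower()
    if "nf4" in name:
        return "nf4"
    if "int8" in name:
        return "int8"
    return "bf16"


# Illustrative folder names only; real folder names may differ.
examples = {
    "HunyuanImage-3.0-Instruct-Distil-NF4": detect_quant_type("HunyuanImage-3.0-Instruct-Distil-NF4"),
    "HunyuanImage-3.0-Instruct-INT8": detect_quant_type("HunyuanImage-3.0-Instruct-INT8"),
    "HunyuanImage-3.0-Instruct": detect_quant_type("HunyuanImage-3.0-Instruct"),
}
```

A name-based rule like this only works if the folder keeps the quantization tag after download, so a renamed folder would silently be treated as BF16 under this sketch.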
- Rewritten Prompt Output: Both `HunyuanImage3Generate` and `HunyuanImage3GenerateLarge` now output the rewritten prompt used for generation
  - Useful for saving to EXIF metadata
  - Can be reused for regeneration or variations
  - Contains the LLM-enhanced prompt when prompt rewriting is enabled
- Status Output: Both generation nodes now provide a status message indicating:
  - Whether prompt rewriting was used, and which style
  - Whether prompt rewriting failed, with the error message
  - Large image mode settings (CPU offload status)
- Generation nodes now return 3 outputs: `(image, rewritten_prompt, status)` instead of just `(image,)`
- Status messages provide better feedback about generation settings
- Low VRAM NF4 Loader: Resolved validation errors on 24GB/32GB cards by implementing a custom device map strategy that forces NF4 layers to GPU while allowing other components to offload to CPU.
- Device Mapping: Added logic to prevent `bitsandbytes` from seeing 4-bit layers on CPU, which was causing crashes in Low VRAM mode.
- `rewritten_prompt`: STRING - The final prompt used for generation (either original or LLM-rewritten)
- `status`: STRING - Human-readable status message about the generation process
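The three-output contract can be sketched as a stripped-down node class. `ExampleGenerateNode`, its `rewriter` argument, and the status strings are hypothetical. Only the `(image, rewritten_prompt, status)` tuple shape and the `STRING` output types come from the entries above, and the `RETURN_TYPES`/`RETURN_NAMES`/`FUNCTION` attributes follow the usual ComfyUI custom-node convention.

```python
class ExampleGenerateNode:
    """Hypothetical node illustrating the (image, rewritten_prompt, status) outputs."""

    RETURN_TYPES = ("IMAGE", "STRING", "STRING")
    RETURN_NAMES = ("image", "rewritten_prompt", "status")
    FUNCTION = "generate"

    def generate(self, prompt, rewriter=None):
        # Default: no rewriting, so the original prompt is the final prompt.
        rewritten_prompt = prompt
        status = "Prompt rewriting: disabled"

        if rewriter is not None:
            try:
                rewritten_prompt = rewriter(prompt)
                status = "Prompt rewriting: used"
            except Exception as exc:
                # On failure, fall back to the original prompt and report why.
                rewritten_prompt = prompt
                status = f"Prompt rewriting failed: {exc}"

        image = None  # placeholder: a real node would run the model here
        return (image, rewritten_prompt, status)


node = ExampleGenerateNode()
outputs = node.generate("a red fox in snow", rewriter=lambda p: p + ", detailed fur")
```

Because the node previously returned `(image,)`, workflows that only wire the first output keep working, while the two new string outputs can feed a metadata or text-display node.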
- Full BF16 and NF4 quantized model loading
- Multi-GPU support with smart memory management
- Official HunyuanImage-3.0 prompt enhancement with LLM APIs
- Large image generation with CPU offload
- Professional resolution presets with megapixel indicators
- Resolved validation errors on 24GB/32GB cards by implementing a custom device map strategy that forces NF4 layers to GPU while allowing other components to offload to CPU.
- Added logic to prevent `bitsandbytes` from seeing 4-bit layers on CPU, which was causing crashes in Low VRAM mode.
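A custom device map of the kind described for the Low VRAM NF4 loader could look roughly like the sketch below. It pins every quantized module to the GPU and sends everything else to the CPU. The helper name, the module names, and the prefix used to identify NF4 layers are all assumptions. Only the general shape (a dict from module name to a GPU index or `"cpu"`) matches the `device_map` format that Hugging Face loaders accept.

```python
def build_low_vram_device_map(module_names, quantized_prefixes, gpu=0):
    """Map quantized modules to the GPU and all other modules to the CPU.

    module_names:       iterable of top-level module names (hypothetical here)
    quantized_prefixes: name prefixes identifying 4-bit layers (assumed)
    gpu:                index of the GPU that must hold the 4-bit layers
    """
    device_map = {}
    for name in module_names:
        is_quantized = any(name.startswith(prefix) for prefix in quantized_prefixes)
        # 4-bit layers must never land on the CPU; everything else may offload.
        device_map[name] = gpu if is_quantized else "cpu"
    return device_map


# Hypothetical module layout, for illustration only.
modules = ["model.layers.0", "model.layers.1", "vae", "vision_model"]
device_map = build_low_vram_device_map(modules, quantized_prefixes=("model.layers",))
```

Deciding placement by name rather than by an automatic size-based split is what guarantees that no 4-bit layer is assigned to the CPU, at the cost of needing to know which modules are quantized.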