Could not move pipeline object to CUDA: Allocation on device . #28

@Harbinger79

Description


Hey all, I'm new to all of this and can't get it running. Here is what I have done so far:

Steps I have done:

Since I'm using a 5070 Ti, I downloaded the ComfyUI nightly from Comfy-Org/ComfyUI#6643 and extracted it.
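
To make sure the bundled nightly PyTorch actually supports the card before going further, something like this can be run with the embedded Python (my own ad-hoc check, not from any of the repos):

# my own ad-hoc check that the bundled nightly PyTorch build supports
# the RTX 5070 Ti (Blackwell, sm_120)
import torch

print(torch.__version__)                    # should be a 2.8.0.dev nightly built against cu128
print(torch.cuda.is_available())            # should be True
print(torch.cuda.get_device_name(0))        # should name the 5070 Ti
print(torch.cuda.get_device_capability(0))  # Blackwell should report (12, 0)
print(torch.cuda.get_arch_list())           # the build should include 'sm_120'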

Cloned the repository https://github.com/SanDiegoDude/ComfyUI-HiDream-Sampler/ into my ComfyUI\custom_nodes\ directory via
git clone https://github.com/SanDiegoDude/ComfyUI-HiDream-Sampler.git

Installed the requirements using
.\python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-HiDream-Sampler\requirements.txt

Tried to run a prompt and got this error:
ImportError: Loading a GPTQ quantized model requires gptqmodel (pip install gptqmodel) or auto-gptq (pip install auto-gptq) library.

Installed gptqmodel using
.\python_embeded\python.exe -m pip install gptqmodel

Tried to run a prompt again and got this error:
!!! ERROR during execution: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

Installed triton using
.\python_embeded\python.exe -m pip install -U --pre triton-windows

Tried to run test_triton.py and it was successful.
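
For completeness, a rough import check like the following (my own script, saved as e.g. check_deps.py, the name is arbitrary) can also be run with .\python_embeded\python.exe to confirm the newly installed packages are importable:

# rough ad-hoc check (mine, not part of any repo) that the GPTQ/Triton
# dependencies are importable inside ComfyUI's embedded Python
import importlib

for name in ("torch", "triton", "gptqmodel", "transformers"):
    try:
        mod = importlib.import_module(name)
        print(f"{name}: OK ({getattr(mod, '__version__', 'unknown version')})")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")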

Tried to run a prompt, but it just keeps crashing from here.

The log file shows this:

g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
Checkpoint files will always be loaded safely.
Total VRAM 16302 MB, total RAM 16323 MB
pytorch version: 2.8.0.dev20250321+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 5070 Ti : cudaMallocAsync
Using pytorch attention
ComfyUI version: 0.3.27
ComfyUI frontend version: 1.14.5
[Prompt Server] web root: g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\python_embeded\Lib\site-packages\comfyui_frontend_package\static
Flash Attention 2 is not available, will use PyTorch's native attention if possible.
PyTorch SDPA (Scaled Dot Product Attention) is available.
GPTQModel (transformers) support is available (recommended).
auto_gptq not available.
GPTQ support composite: (GPTQModel: True | auto_gptq: False) -> True
Flash Attention is not available for HiDream, will use PyTorch's native attention.
PyTorch SDPA (Scaled Dot Product Attention) is available for HiDream.
GPTQ dependencies available - all models should work
HiDream: Successfully registered with ComfyUI memory management

HiDream Sampler Node Initialized
Available Models: ['full-nf4', 'dev-nf4', 'fast-nf4', 'full', 'dev', 'fast']

Import times for custom nodes:
0.0 seconds: G:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\custom_nodes\websocket_image_save.py
2.8 seconds: G:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\custom_nodes\ComfyUI-HiDream-Sampler

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Using resolution: 1360×768 from aspect ratio: 16:9 (1360×768)
HiDream: Initial VRAM usage: 0.00 MB
Loading model for fast-nf4...
--- Loading Model Type: fast-nf4 ---
Model Path: azaneko/HiDream-I1-Fast-nf4
NF4: True, Requires BNB: False, Requires GPTQ deps: True
Using alternate LLM: False
(Start VRAM: 0.00 MB)
Cache check for key: fast-nf4_standard
Cache contains: []

[1a] Preparing LLM (GPTQ): hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
Setting max memory limit: 6GiB of 15.9GiB
Using device_map='auto'.
[1b] Loading Tokenizer: hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4...
Tokenizer loaded.
✅ Fixed rope_scaling to: {'type': 'linear', 'factor': 1.0}
[GPTQ] Using transformers.GPTQConfig for quantization
g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\python_embeded\Lib\site-packages\transformers\quantizers\auto.py:212: UserWarning: You passed quantization_config or equivalent parameters to from_pretrained but the model you're loading already has a quantization_config attribute. The quantization_config from the model will be used.However, loading attributes (e.g. ['backend', 'use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to from_pretrained. The rest will be ignored.
warnings.warn(warning_msg)
PyTorch version 2.8.0.dev20250321+cu128 available.

INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
INFO Kernel: Auto-selection: adding candidate TritonV2QuantLinear
loss_type=None was set in the config but it is unrecognised.Using the default loss: ForCausalLMLoss.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:15<00:00, 7.91s/it]
Some weights of the model checkpoint at hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 were not used when initializing LlamaForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11.mlp.gate_proj.bias', 'model.layers.11.mlp.up_proj.bias', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.12.mlp.down_proj.bias', 'model.layers.12.mlp.gate_proj.bias', 'model.layers.12.mlp.up_proj.bias', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.13.mlp.down_proj.bias', 'model.layers.13.mlp.gate_proj.bias', 'model.layers.13.mlp.up_proj.bias', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.14.mlp.down_proj.bias', 'model.layers.14.mlp.gate_proj.bias', 'model.layers.14.mlp.up_proj.bias', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.15.mlp.down_proj.bias', 'model.layers.15.mlp.gate_proj.bias', 'model.layers.15.mlp.up_proj.bias', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.16.mlp.down_proj.bias', 'model.layers.16.mlp.gate_proj.bias', 'model.layers.16.mlp.up_proj.bias', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.17.mlp.down_proj.bias', 'model.layers.17.mlp.gate_proj.bias', 'model.layers.17.mlp.up_proj.bias', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.18.mlp.down_proj.bias', 'model.layers.18.mlp.gate_proj.bias', 'model.layers.18.mlp.up_proj.bias', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.19.mlp.down_proj.bias', 'model.layers.19.mlp.gate_proj.bias', 'model.layers.19.mlp.up_proj.bias', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.2.mlp.down_proj.bias', 'model.layers.2.mlp.gate_proj.bias', 'model.layers.2.mlp.up_proj.bias', 
'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.20.mlp.down_proj.bias', 'model.layers.20.mlp.gate_proj.bias', 'model.layers.20.mlp.up_proj.bias', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.21.mlp.down_proj.bias', 'model.layers.21.mlp.gate_proj.bias', 'model.layers.21.mlp.up_proj.bias', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.22.mlp.down_proj.bias', 'model.layers.22.mlp.gate_proj.bias', 'model.layers.22.mlp.up_proj.bias', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.23.mlp.down_proj.bias', 'model.layers.23.mlp.gate_proj.bias', 'model.layers.23.mlp.up_proj.bias', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.24.mlp.down_proj.bias', 'model.layers.24.mlp.gate_proj.bias', 'model.layers.24.mlp.up_proj.bias', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.25.mlp.down_proj.bias', 'model.layers.25.mlp.gate_proj.bias', 'model.layers.25.mlp.up_proj.bias', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.26.mlp.down_proj.bias', 'model.layers.26.mlp.gate_proj.bias', 'model.layers.26.mlp.up_proj.bias', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.27.mlp.down_proj.bias', 'model.layers.27.mlp.gate_proj.bias', 'model.layers.27.mlp.up_proj.bias', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.28.mlp.down_proj.bias', 'model.layers.28.mlp.gate_proj.bias', 'model.layers.28.mlp.up_proj.bias', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.29.mlp.down_proj.bias', 'model.layers.29.mlp.gate_proj.bias', 'model.layers.29.mlp.up_proj.bias', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.3.mlp.down_proj.bias', 'model.layers.3.mlp.gate_proj.bias', 'model.layers.3.mlp.up_proj.bias', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.30.mlp.down_proj.bias', 'model.layers.30.mlp.gate_proj.bias', 'model.layers.30.mlp.up_proj.bias', 'model.layers.30.self_attn.k_proj.bias', 'model.layers.30.self_attn.o_proj.bias', 'model.layers.30.self_attn.q_proj.bias', 'model.layers.30.self_attn.v_proj.bias', 'model.layers.31.mlp.down_proj.bias', 'model.layers.31.mlp.gate_proj.bias', 
'model.layers.31.mlp.up_proj.bias', 'model.layers.31.self_attn.k_proj.bias', 'model.layers.31.self_attn.o_proj.bias', 'model.layers.31.self_attn.q_proj.bias', 'model.layers.31.self_attn.v_proj.bias', 'model.layers.4.mlp.down_proj.bias', 'model.layers.4.mlp.gate_proj.bias', 'model.layers.4.mlp.up_proj.bias', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.5.mlp.down_proj.bias', 'model.layers.5.mlp.gate_proj.bias', 'model.layers.5.mlp.up_proj.bias', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.6.mlp.down_proj.bias', 'model.layers.6.mlp.gate_proj.bias', 'model.layers.6.mlp.up_proj.bias', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.7.mlp.down_proj.bias', 'model.layers.7.mlp.gate_proj.bias', 'model.layers.7.mlp.up_proj.bias', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.8.mlp.down_proj.bias', 'model.layers.8.mlp.gate_proj.bias', 'model.layers.8.mlp.up_proj.bias', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.9.mlp.down_proj.bias', 'model.layers.9.mlp.gate_proj.bias', 'model.layers.9.mlp.up_proj.bias', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.o_proj.bias', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.v_proj.bias']

  • This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    INFO Format: Converting checkpoint_format from gptq to internal gptq_v2.
    INFO Format: Converting GPTQ v1 to v2
    INFO Format: Conversion complete: 0.06823968887329102s
    INFO Optimize: TritonV2QuantLinear compilation triggered.
    ✅ Text encoder loaded! (VRAM: 5467.26 MB)

[2] Preparing Transformer from: azaneko/HiDream-I1-Fast-nf4
Type: NF4
Loading Transformer... (May download files)
Moving Transformer to CUDA...
✅ Transformer loaded! (VRAM: 14646.96 MB)

[3] Preparing Scheduler: FlashFlowMatchEulerDiscreteScheduler
Using Scheduler: FlashFlowMatchEulerDiscreteScheduler

[4] Loading Pipeline from: azaneko/HiDream-I1-Fast-nf4
Passing pre-loaded components...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.00s/it]
Loading pipeline components...: 100%|██████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.04it/s]
Pipeline structure loaded.

[5] Finalizing Pipeline...
Assigning transformer...
Moving pipeline object to CUDA (final check)...
Warning: Could not move pipeline object to CUDA: Allocation on device .
Attempting CPU offload for NF4...

g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch>pause

I don't know how to proceed from here to troubleshoot further; any ideas are welcome!
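
Reading the log back, the transformer alone already reports ~14.6 GB used on the 16 GB card right before the failing move, so it looks like the final .to("cuda") may simply be running out of VRAM. If it helps, a small check like this (my own sketch, untested) would show how much memory is actually free at that point:

# rough sketch (mine, untested) to report free/total VRAM, e.g. right before
# the pipeline is moved to CUDA, to confirm this is an out-of-memory situation
import torch

free, total = torch.cuda.mem_get_info()  # bytes on the current CUDA device
print(f"free:  {free / 1024**2:.0f} MB")
print(f"total: {total / 1024**2:.0f} MB")
print(f"torch allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
print(f"torch reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MB")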

Thanks
