Could not move pipeline object to CUDA: Allocation on device . #28

@Harbinger79

Description


Hey all, I'm new to all of this and can't get it running. Here is what I have done so far:

Steps I have done:

Since I'm using a 5070 Ti, I downloaded the ComfyUI nightly from Comfy-Org/ComfyUI#6643 and extracted it.
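
To make sure the bundled nightly PyTorch actually supports the card before going further, something like this can be run with the embedded Python (my own ad-hoc check, not from any of the repos):

# my own ad-hoc check that the bundled nightly PyTorch build supports
# the RTX 5070 Ti (Blackwell, sm_120)
import torch

print(torch.__version__)                    # should be a 2.8.0.dev nightly built against cu128
print(torch.cuda.is_available())            # should be True
print(torch.cuda.get_device_name(0))        # should name the 5070 Ti
print(torch.cuda.get_device_capability(0))  # Blackwell should report (12, 0)
print(torch.cuda.get_arch_list())           # the build should include 'sm_120'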

Cloned the repository https://github.com/SanDiegoDude/ComfyUI-HiDream-Sampler/ into my ComfyUI\custom_nodes\ directory via
git clone https://github.com/SanDiegoDude/ComfyUI-HiDream-Sampler.git

Installed the requirements using
.\python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-HiDream-Sampler\requirements.txt

Tried to run a prompt and got this error:
ImportError: Loading a GPTQ quantized model requires gptqmodel (pip install gptqmodel) or auto-gptq (pip install auto-gptq) library.

Installed gptqmodel using
.\python_embeded\python.exe -m pip install gptqmodel

Tried to run a prompt again and got this error:
!!! ERROR during execution: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at: https://github.com/triton-lang/triton

Installed triton using
.\python_embeded\python.exe -m pip install -U --pre triton-windows

Tried to run test_triton.py and it was successful.
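
For completeness, a rough import check like the following (my own script, saved as e.g. check_deps.py, the name is arbitrary) can also be run with .\python_embeded\python.exe to confirm the newly installed packages are importable:

# rough ad-hoc check (mine, not part of any repo) that the GPTQ/Triton
# dependencies are importable inside ComfyUI's embedded Python
import importlib

for name in ("torch", "triton", "gptqmodel", "transformers"):
    try:
        mod = importlib.import_module(name)
        print(f"{name}: OK ({getattr(mod, '__version__', 'unknown version')})")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")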

Tried to run a prompt, but it just keeps crashing from here.

The log file shows this:

g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
Checkpoint files will always be loaded safely.
Total VRAM 16302 MB, total RAM 16323 MB
pytorch version: 2.8.0.dev20250321+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 5070 Ti : cudaMallocAsync
Using pytorch attention
ComfyUI version: 0.3.27
ComfyUI frontend version: 1.14.5
[Prompt Server] web root: g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\python_embeded\Lib\site-packages\comfyui_frontend_package\static
Flash Attention 2 is not available, will use PyTorch's native attention if possible.
PyTorch SDPA (Scaled Dot Product Attention) is available.
GPTQModel (transformers) support is available (recommended).
auto_gptq not available.
GPTQ support composite: (GPTQModel: True | auto_gptq: False) -> True
Flash Attention is not available for HiDream, will use PyTorch's native attention.
PyTorch SDPA (Scaled Dot Product Attention) is available for HiDream.
GPTQ dependencies available - all models should work
HiDream: Successfully registered with ComfyUI memory management

HiDream Sampler Node Initialized
Available Models: ['full-nf4', 'dev-nf4', 'fast-nf4', 'full', 'dev', 'fast']

Import times for custom nodes:
0.0 seconds: G:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\custom_nodes\websocket_image_save.py
2.8 seconds: G:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\custom_nodes\ComfyUI-HiDream-Sampler

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Using resolution: 1360×768 from aspect ratio: 16:9 (1360×768)
HiDream: Initial VRAM usage: 0.00 MB
Loading model for fast-nf4...
--- Loading Model Type: fast-nf4 ---
Model Path: azaneko/HiDream-I1-Fast-nf4
NF4: True, Requires BNB: False, Requires GPTQ deps: True
Using alternate LLM: False
(Start VRAM: 0.00 MB)
Cache check for key: fast-nf4_standard
Cache contains: []

[1a] Preparing LLM (GPTQ): hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
Setting max memory limit: 6GiB of 15.9GiB
Using device_map='auto'.
[1b] Loading Tokenizer: hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4...
Tokenizer loaded.
✅ Fixed rope_scaling to: {'type': 'linear', 'factor': 1.0}
[GPTQ] Using transformers.GPTQConfig for quantization
g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch\python_embeded\Lib\site-packages\transformers\quantizers\auto.py:212: UserWarning: You passed quantization_config or equivalent parameters to from_pretrained but the model you're loading already has a quantization_config attribute. The quantization_config from the model will be used.However, loading attributes (e.g. ['backend', 'use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to from_pretrained. The rest will be ignored.
warnings.warn(warning_msg)
PyTorch version 2.8.0.dev20250321+cu128 available.

INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
INFO Kernel: Auto-selection: adding candidate TritonV2QuantLinear
loss_type=None was set in the config but it is unrecognised.Using the default loss: ForCausalLMLoss.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:15<00:00, 7.91s/it]
Some weights of the model checkpoint at hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 were not used when initializing LlamaForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11.mlp.gate_proj.bias', 'model.layers.11.mlp.up_proj.bias', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.12.mlp.down_proj.bias', 'model.layers.12.mlp.gate_proj.bias', 'model.layers.12.mlp.up_proj.bias', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.13.mlp.down_proj.bias', 'model.layers.13.mlp.gate_proj.bias', 'model.layers.13.mlp.up_proj.bias', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.14.mlp.down_proj.bias', 'model.layers.14.mlp.gate_proj.bias', 'model.layers.14.mlp.up_proj.bias', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.15.mlp.down_proj.bias', 'model.layers.15.mlp.gate_proj.bias', 'model.layers.15.mlp.up_proj.bias', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.16.mlp.down_proj.bias', 'model.layers.16.mlp.gate_proj.bias', 'model.layers.16.mlp.up_proj.bias', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.17.mlp.down_proj.bias', 'model.layers.17.mlp.gate_proj.bias', 'model.layers.17.mlp.up_proj.bias', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.18.mlp.down_proj.bias', 'model.layers.18.mlp.gate_proj.bias', 'model.layers.18.mlp.up_proj.bias', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.19.mlp.down_proj.bias', 'model.layers.19.mlp.gate_proj.bias', 'model.layers.19.mlp.up_proj.bias', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.2.mlp.down_proj.bias', 'model.layers.2.mlp.gate_proj.bias', 'model.layers.2.mlp.up_proj.bias', 
'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.20.mlp.down_proj.bias', 'model.layers.20.mlp.gate_proj.bias', 'model.layers.20.mlp.up_proj.bias', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.21.mlp.down_proj.bias', 'model.layers.21.mlp.gate_proj.bias', 'model.layers.21.mlp.up_proj.bias', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.22.mlp.down_proj.bias', 'model.layers.22.mlp.gate_proj.bias', 'model.layers.22.mlp.up_proj.bias', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.23.mlp.down_proj.bias', 'model.layers.23.mlp.gate_proj.bias', 'model.layers.23.mlp.up_proj.bias', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.24.mlp.down_proj.bias', 'model.layers.24.mlp.gate_proj.bias', 'model.layers.24.mlp.up_proj.bias', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.25.mlp.down_proj.bias', 'model.layers.25.mlp.gate_proj.bias', 'model.layers.25.mlp.up_proj.bias', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.26.mlp.down_proj.bias', 'model.layers.26.mlp.gate_proj.bias', 'model.layers.26.mlp.up_proj.bias', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.27.mlp.down_proj.bias', 'model.layers.27.mlp.gate_proj.bias', 'model.layers.27.mlp.up_proj.bias', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.28.mlp.down_proj.bias', 'model.layers.28.mlp.gate_proj.bias', 'model.layers.28.mlp.up_proj.bias', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.29.mlp.down_proj.bias', 'model.layers.29.mlp.gate_proj.bias', 'model.layers.29.mlp.up_proj.bias', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.3.mlp.down_proj.bias', 'model.layers.3.mlp.gate_proj.bias', 'model.layers.3.mlp.up_proj.bias', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.30.mlp.down_proj.bias', 'model.layers.30.mlp.gate_proj.bias', 'model.layers.30.mlp.up_proj.bias', 'model.layers.30.self_attn.k_proj.bias', 'model.layers.30.self_attn.o_proj.bias', 'model.layers.30.self_attn.q_proj.bias', 'model.layers.30.self_attn.v_proj.bias', 'model.layers.31.mlp.down_proj.bias', 'model.layers.31.mlp.gate_proj.bias', 
'model.layers.31.mlp.up_proj.bias', 'model.layers.31.self_attn.k_proj.bias', 'model.layers.31.self_attn.o_proj.bias', 'model.layers.31.self_attn.q_proj.bias', 'model.layers.31.self_attn.v_proj.bias', 'model.layers.4.mlp.down_proj.bias', 'model.layers.4.mlp.gate_proj.bias', 'model.layers.4.mlp.up_proj.bias', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.5.mlp.down_proj.bias', 'model.layers.5.mlp.gate_proj.bias', 'model.layers.5.mlp.up_proj.bias', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.6.mlp.down_proj.bias', 'model.layers.6.mlp.gate_proj.bias', 'model.layers.6.mlp.up_proj.bias', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.7.mlp.down_proj.bias', 'model.layers.7.mlp.gate_proj.bias', 'model.layers.7.mlp.up_proj.bias', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.8.mlp.down_proj.bias', 'model.layers.8.mlp.gate_proj.bias', 'model.layers.8.mlp.up_proj.bias', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.9.mlp.down_proj.bias', 'model.layers.9.mlp.gate_proj.bias', 'model.layers.9.mlp.up_proj.bias', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.o_proj.bias', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.v_proj.bias']

  • This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    INFO Format: Converting checkpoint_format from gptq to internal gptq_v2.
    INFO Format: Converting GPTQ v1 to v2
    INFO Format: Conversion complete: 0.06823968887329102s
    INFO Optimize: TritonV2QuantLinear compilation triggered.
    ✅ Text encoder loaded! (VRAM: 5467.26 MB)

[2] Preparing Transformer from: azaneko/HiDream-I1-Fast-nf4
Type: NF4
Loading Transformer... (May download files)
Moving Transformer to CUDA...
✅ Transformer loaded! (VRAM: 14646.96 MB)

[3] Preparing Scheduler: FlashFlowMatchEulerDiscreteScheduler
Using Scheduler: FlashFlowMatchEulerDiscreteScheduler

[4] Loading Pipeline from: azaneko/HiDream-I1-Fast-nf4
Passing pre-loaded components...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.00s/it]
Loading pipeline components...: 100%|██████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.04it/s]
Pipeline structure loaded.

[5] Finalizing Pipeline...
Assigning transformer...
Moving pipeline object to CUDA (final check)...
Warning: Could not move pipeline object to CUDA: Allocation on device .
Attempting CPU offload for NF4...

g:\Downloads_AI\ComfyUI_windows_portable_nightly_pytorch>pause

I don't know how to proceed from here to troubleshoot further; any ideas are welcome!
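
Reading the log back, the transformer alone already reports ~14.6 GB used on the 16 GB card right before the failing move, so it looks like the final .to("cuda") may simply be running out of VRAM. If it helps, a small check like this (my own sketch, untested) would show how much memory is actually free at that point:

# rough sketch (mine, untested) to report free/total VRAM, e.g. right before
# the pipeline is moved to CUDA, to confirm this is an out-of-memory situation
import torch

free, total = torch.cuda.mem_get_info()  # bytes on the current CUDA device
print(f"free:  {free / 1024**2:.0f} MB")
print(f"total: {total / 1024**2:.0f} MB")
print(f"torch allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MB")
print(f"torch reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MB")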

Thanks
