Getting issue when running hf llama2_13b chat that data is on cpu and not on cuda #474

e2eNAK · 2024-04-29T07:33:20Z

e2eNAK
Apr 29, 2024

The exact issue it states is : :/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py:1535: UserWarning: You are calling .generate() with the ⁠ input_ids ⁠ being on a device type different than your model's device. ⁠ input_ids ⁠ is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put ⁠ input_ids ⁠ to the correct device by calling for example input_ids = input_ids.to('cuda') before running ⁠ .generate() ⁠.
warnings.warn(
WARNING:nemoguardrails.actions.action_dispatcher:Error while execution generate_user_intent: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

when i tried messages.to("cuda") it gives me :
LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(32000, 5120)
(layers): ModuleList(
(0-39): 40 x LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear(in_features=5120, out_features=5120, bias=False)
(k_proj): Linear(in_features=5120, out_features=5120, bias=False)
(v_proj): Linear(in_features=5120, out_features=5120, bias=False)
(o_proj): Linear(in_features=5120, out_features=5120, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=5120, out_features=13824, bias=False)
(up_proj): Linear(in_features=5120, out_features=13824, bias=False)
(down_proj): Linear(in_features=13824, out_features=5120, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): Linear(in_features=5120, out_features=32000, bias=False)
)
Fetching 7 files: 100%|██████████| 7/7 [00:00<00:00, 6462.72it/s]
ERROR:nemoguardrails.server.api:'list' object has no attribute 'to'
Traceback (most recent call last):
File "/nemoguardrails/nemoguardrails/server/api.py", line 337, in chat_completion
messages = messages.to("cuda")
AttributeError: 'list' object has no attribute 'to'

drazvan · 2024-04-29T20:04:30Z

drazvan
Apr 29, 2024
Maintainer

Hi @e2eNAK ! Can you provide the exact config for reproducing this. Thanks!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Getting issue when running hf llama2_13b chat that data is on cpu and not on cuda #474

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Getting issue when running hf llama2_13b chat that data is on cpu and not on cuda #474

Uh oh!

e2eNAK Apr 29, 2024

Replies: 1 comment

Uh oh!

drazvan Apr 29, 2024 Maintainer

e2eNAK
Apr 29, 2024

drazvan
Apr 29, 2024
Maintainer