I followed https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl
However, the example seems to assume that:
- the user has a GPU large enough to load the entire model onto a single device;
- no customization is done.
If I follow the example as-is with a 4B/7B model, it throws:

```
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device
```
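For context, this error is PyTorch refusing to copy parameters that still live on the `meta` device, where tensors have shapes but no data. A minimal sketch reproducing it with a plain `nn.Linear`, independent of TRL (the cookbook model is not needed to trigger it):

```python
import torch
import torch.nn as nn

# Modules built under the "meta" device have parameters with no storage.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

try:
    # .to() tries to *copy* the weights, but there is nothing to copy.
    layer.to("cpu")
except (NotImplementedError, RuntimeError) as e:
    print(e)  # Cannot copy out of meta tensor; no data! ...

# to_empty() allocates fresh (uninitialized) storage on the target device
# instead of copying, which is why the error message suggests it.
layer = layer.to_empty(device="cpu")
print(layer.weight.device)
```

In the fine-tuning script itself the fix is usually not to call `to_empty()` by hand but to avoid moving the model manually at all, e.g. by letting `from_pretrained(..., device_map="auto")` place the weights.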
If I try to customize with a `BitsAndBytesConfig` for 4-bit loading (this is probably a TRL problem; I will open an issue there):

```python
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True,
)
```
it throws:

```
Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: struct c10::Half and value.dtype: struct c10::Half instead.
```