
Non-generic example causes "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device." #292

@hgftrdw45ud67is8o89


I followed https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl

However, the example seems to assume that:

  • the user has a GPU large enough to load the whole model onto a single device;
  • the user does not customize anything.

If I follow the example with a 4B/7B model, it throws:
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device
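
For context, here is a minimal sketch of how this error can arise in plain PyTorch. It is not the cookbook's code: the tiny `nn.Linear` layer and the CPU target are assumptions chosen only to reproduce the message without a large model. A module whose parameters live on the `meta` device has shapes and dtypes but no data, so `.to()` has nothing to copy.

```python
import torch
from torch import nn

# Parameters created on the "meta" device carry shape and dtype but no storage.
layer = nn.Linear(4, 2, device="meta")
was_meta = layer.weight.is_meta

# .to() has to copy data, and a meta tensor has none, so this call raises.
try:
    layer.to("cpu")
    error_message = None
except RuntimeError as err:  # NotImplementedError is a RuntimeError subclass
    error_message = str(err)

# to_empty() allocates uninitialized storage on the target device instead.
# The values are arbitrary until real weights are loaded (e.g. load_state_dict).
layer = layer.to_empty(device="cpu")

print(error_message)
print(layer.weight.device, layer.weight.is_meta)
```

Note that `to_empty()` only allocates memory; it does not restore the pretrained weights, so on its own it is unlikely to be a complete fix for the cookbook example.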

If I customize it with a bitsandbytes config for 4-bit loading (this is probably a trl problem; I will open an issue there):
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True,
)
it throws:
Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: struct c10::Half and value.dtype: struct c10::Half instead.
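
The following sketch shows the kind of call that produces this message, assuming PyTorch 2.0 or later and that the error comes from `torch.nn.functional.scaled_dot_product_attention`. The tensor shapes are arbitrary, and the issue does not establish which component of the real training run produces the float32 query, so treat this as an illustration of the mechanism, not a diagnosis.

```python
import torch
import torch.nn.functional as F

# Shapes are (batch, heads, sequence, head_dim); the sizes are arbitrary.
query = torch.randn(1, 2, 3, 8, dtype=torch.float32)
key = torch.randn(1, 2, 3, 8).to(torch.float16)
value = torch.randn(1, 2, 3, 8).to(torch.float16)

# Mixed dtypes are rejected before any attention computation runs.
try:
    F.scaled_dot_product_attention(query, key, value)
    dtype_error = None
except RuntimeError as err:
    dtype_error = str(err)

# Casting all three inputs to one dtype removes the mismatch.
out = F.scaled_dot_product_attention(
    query, key.to(query.dtype), value.to(query.dtype)
)

print(dtype_error)
print(out.dtype, tuple(out.shape))
```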
