I followed https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl
However, the example seems to assume that:
- the user has a GPU large enough to load the entire model onto a single device;
- no customization is done.
If I follow the example as-is with a 4B/7B model, it throws:

```
Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device
```
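For context, this error is PyTorch refusing to copy parameters that still live on the `meta` device, where tensors have shapes but no data. A minimal sketch reproducing it with a plain `nn.Linear`, independent of TRL (the cookbook model is not needed to trigger it):

```python
import torch
import torch.nn as nn

# Modules built under the "meta" device have parameters with no storage.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

try:
    # .to() tries to *copy* the weights, but there is nothing to copy.
    layer.to("cpu")
except (NotImplementedError, RuntimeError) as e:
    print(e)  # Cannot copy out of meta tensor; no data! ...

# to_empty() allocates fresh (uninitialized) storage on the target device
# instead of copying, which is why the error message suggests it.
layer = layer.to_empty(device="cpu")
print(layer.weight.device)
```

In the fine-tuning script itself the fix is usually not to call `to_empty()` by hand but to avoid moving the model manually at all, e.g. by letting `from_pretrained(..., device_map="auto")` place the weights.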
If I try to customize with a `BitsAndBytesConfig` for 4-bit loading (this is probably a TRL problem; I will open an issue there):

```python
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True,
)
```
it throws:

```
Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: struct c10::Half and value.dtype: struct c10::Half instead.
```