Description
huggingface/transformers#42805 drastically changed the default behaviour of loading models with AutoModelForCausalLM.from_pretrained.
Following up on fixes and suggestions in #4770:
Longer-term: TRL should provide training-oriented defaults
[...]
However, it looks like the load dtype often follows the model dtype, which can implicitly put users/tests into fp16/bf16 without intent:
[...]
- Make the default load dtype fp32: when the user passes a model ID
[...]
The key idea is: we should not end up training in the model dtype unless it's intentional, especially in tests that are not meant to validate this specific (and likely unstable) case.
Originally posted by @qgallouedec in #4770 (comment)
It would be nice if the documentation were updated accordingly for the cases where model initialization is handled by the user rather than by TRL. In practice, that means everywhere AutoModelForCausalLM.from_pretrained is called without an explicit dtype argument, for example on the Training Customization docs page:
trl/docs/source/customization.md, line 19 at 4fea6d1:

```python
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
```
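A possible shape of the fix is to pass an explicit dtype at load time, so the training dtype is intentional rather than inherited from the checkpoint. The sketch below uses a tiny randomly initialized GPT-2 config (an illustrative stand-in, not the docs' model) so it runs without downloading weights; with a real model ID the same keyword goes to `from_pretrained`. `torch_dtype` is the long-standing parameter name; recent transformers releases also accept `dtype`.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# With a real checkpoint, the fix would look like:
#   model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
# Here we build a tiny random config instead, so the sketch runs offline.
config = AutoConfig.for_model(
    "gpt2", n_embd=8, n_head=2, n_layer=1, vocab_size=16, n_positions=8
)

# Requesting fp32 explicitly means we never silently train in the
# checkpoint's half-precision dtype.
model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float32)
print(model.dtype)
```

The same explicit-dtype pattern would apply to every `from_pretrained` call in the docs.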
The documentation should also warn users that, due to this change in transformers, they may unintentionally end up training entirely in fp16/bf16, which can hurt training stability and convergence.
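To illustrate the stability concern numerically, here is a small plain-PyTorch sketch (no assumptions about TRL or transformers internals): gradient-scale values underflow to zero in fp16, and bf16 loses precision on small parameter updates.

```python
import torch

# fp16 has a tiny dynamic range: values below ~6e-8 underflow to zero,
# which silently kills small gradients.
g = 1e-8
print(torch.tensor(g, dtype=torch.float32).item())  # survives in fp32
print(torch.tensor(g, dtype=torch.float16).item())  # 0.0 (underflow)

# bf16 keeps fp32's range but only ~8 mantissa bits, so a small update
# to a parameter near 1.0 can be rounded away entirely.
print(torch.tensor(1.001, dtype=torch.bfloat16).item())  # rounds to 1.0
```

This is why an unintended half-precision load can degrade convergence even when nothing visibly errors out.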