diff --git a/docs/hub/model-card-annotated.md b/docs/hub/model-card-annotated.md
index f52e260d6..7b612a182 100644
--- a/docs/hub/model-card-annotated.md
+++ b/docs/hub/model-card-annotated.md
@@ -158,6 +158,9 @@ _Write 1-2 sentences on what the training data is. Ideally this links to a Datas
 
 ## Training Procedure [optional]
 
+_To estimate the hardware you'll need to train or fine-tune a model, consider two factors: the number of parameters in the model and the training regime you plan to use._
+
+_e.g., a model with 3B parameters in fp32 precision needs at least 48GB of GPU memory, while bf16 requires at least 24GB of memory on Ampere or newer hardware. Mixed fp16 requires at least 54GB of GPU memory._
 
 ### Preprocessing
 
@@ -166,6 +169,12 @@ _Write 1-2 sentences on what the training data is. Ideally this links to a Datas
 
 _Detail tokenization, resizing/rewriting (depending on the modality), etc._
 
+### Training Hyperparameters
+
+* **Training regime:** `training_regime`
+
+_Detail the model training process, specifically the type of precision used - **fp32/fp16/bf16** - and whether it is **mixed or non-mixed precision**._
+
 ### Speeds, Sizes, Times
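
As a sanity check on the example figures added above, here is a minimal Python sketch (not part of the patch) of the underlying bytes-per-parameter arithmetic. The constants and helper names are assumptions for illustration, chosen to reproduce the 48/24/54GB figures for a 3B-parameter model; actual memory use also depends on batch size, sequence length, activations, and the optimizer.

```python
# Back-of-the-envelope GPU memory needed to train a model, using
# bytes-per-parameter figures implied by the examples above (weights +
# gradients + optimizer state). These constants are assumptions that
# reproduce the doc's numbers, not measured values.
BYTES_PER_PARAM = {
    "fp32": 16,        # e.g. 4B weights + 4B grads + 8B Adam state
    "bf16": 8,         # everything kept in 2-byte bf16
    "fp16 mixed": 18,  # fp16 compute copies plus fp32 master weights/state
}

def training_memory_gb(n_params: float, regime: str) -> float:
    """Approximate GPU memory requirement in GB for a given training regime."""
    return n_params * BYTES_PER_PARAM[regime] / 1e9

# Reproduces the example: 48GB (fp32), 24GB (bf16), 54GB (fp16 mixed)
# for a 3B-parameter model.
for regime in BYTES_PER_PARAM:
    print(f"{regime}: {training_memory_gb(3e9, regime):.0f} GB")
```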