
Commit 219b914

osanseviero authored and pcuenca committed
Add some info on dtypes (huggingface#1425)
* Add some info on dtypes

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <[email protected]>

* Update codellama.md

---------

Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 4b54e49 commit 219b914

File tree

1 file changed: +11 -2 lines changed


codellama.md

Lines changed: 11 additions & 2 deletions
@@ -39,6 +39,7 @@ Code LLMs are an exciting development for software engineers because they can bo
 - [How to use Code Llama?](#how-to-use-code-llama)
 - [Demo](#demo)
 - [Transformers](#transformers)
+- [A Note on dtypes](#a-note-on-dtypes)
 - [Code Completion](#code-completion)
 - [Code Infilling](#code-infilling)
 - [Conversational Instructions](#conversational-instructions)
@@ -76,9 +77,8 @@ You can easily try the Code Llama Model (13 billion parameters!) in **[this Spa

 Under the hood, this playground uses Hugging Face's [Text Generation Inference](https://github.com/huggingface/text-generation-inference), the same technology that powers [HuggingChat](https://huggingface.co/chat/), and we'll share more in the following sections.

-You can also check [this chat-based demo](https://huggingface.co/spaces/codellama/codellama-13b-chat) and clone it for your use – it's self-contained so you can examine the source code and adapt it as you wish!
+If you want to try out the bigger instruct-tuned 34B model, it is now available on **HuggingChat**! You can try it out here: [hf.co/chat](https://hf.co/chat). Make sure to specify the Code Llama model. You can also check [this chat-based demo](https://huggingface.co/spaces/codellama/codellama-13b-chat) and duplicate it for your use – it's self-contained, so you can examine the source code and adapt it as you wish!

-If you want to try out the bigger instruct-tuned 34B model, it is now available on **HuggingChat**! You can try it out here: [hf.co/chat](https://hf.co/chat)
 ### Transformers

 With the upcoming release of `transformers` 4.33, you can use Code Llama and leverage all the tools within the HF ecosystem, such as:
@@ -94,6 +94,15 @@ Until `transformers` 4.33 is released, please install it from the main branch.
 ```bash
 !pip install git+https://github.com/huggingface/transformers.git@main accelerate
 ```
+#### A Note on dtypes
+
+When using models like Code Llama, it's important to take a look at the data types of the models.
+
+* 32-bit floating point (`float32`): PyTorch convention on model initialization is to load models in `float32`, no matter with which precision the model weights were stored. `transformers` also follows this convention for consistency with PyTorch.
+* 16-bit Brain floating point (`bfloat16`): Code Llama was trained with this precision, so we recommend using it for further training or fine-tuning.
+* 16-bit floating point (`float16`): We recommend running inference using this precision, as it's usually faster than `bfloat16`, and evaluation metrics show no discernible degradation with respect to `bfloat16`. You can also run inference using `bfloat16`, and we recommend you check inference results with both `float16` and `bfloat16` after fine-tuning.
+
+As mentioned above, `transformers` loads weights using `float32` (no matter with which precision the models are stored), so it's important to specify the desired `dtype` when loading the models. If you want to fine-tune Code Llama, it's recommended to use `bfloat16`, as using `float16` can lead to overflows and NaNs. If you run inference, we recommend using `float16` because `bfloat16` can be slower.

 #### Code Completion
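For reference, here is a minimal sketch of what the dtype guidance added in this commit looks like in practice. This is an editor's illustration, not part of the commit: it assumes the `codellama/CodeLlama-7b-hf` checkpoint on the Hub and the `accelerate` install from the code block above.

```python
# Editor's sketch (not part of this commit): loading Code Llama with an
# explicit dtype, per the note above. The checkpoint name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Without torch_dtype, transformers loads the weights in float32 (the
# PyTorch default), no matter how the checkpoint was stored on the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # float16 for inference; prefer torch.bfloat16 for fine-tuning
    device_map="auto",          # dispatches to GPU if available; requires accelerate
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(inputs.input_ids, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```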
