**README.md**
# Llama Recipes: Examples to get started using the Llama models from Meta
<!-- markdown-link-check-disable -->
The 'llama-recipes' repository is a companion to the [Meta Llama](https://github.com/meta-llama/llama-models) models. We support the latest version, [Llama 3.1](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md), in this repository. The goal is to provide a scalable library for fine-tuning Meta Llama models, along with some example scripts and notebooks to quickly get started with using the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Llama and other tools in the LLM ecosystem. The examples here showcase how to run Llama locally, in the cloud, and on-prem.
<!-- markdown-link-check-enable -->
> [!IMPORTANT]
> Meta Llama 3.1 has a new prompt template and special tokens.
> | Token | Description |
> |---|---|
> | `<\|begin_of_text\|>` | Specifies the start of the prompt. |
> | `<\|eot_id\|>` | This token signifies the end of a turn, i.e. the end of the model's interaction with either the user or the tool executor. |
> | `<\|eom_id\|>` | End of Message. A message represents a possible stopping point where the model can inform the execution environment that a tool call needs to be made. |
> | `<\|python_tag\|>` | A special tag used in the model's response to signify a tool call. |
> | `<\|finetune_right_pad_id\|>` | Used for padding text sequences in a batch to the same length. |
> | `<\|start_header_id\|>{role}<\|end_header_id\|>` | These tokens enclose the role for a particular message. The possible roles are: system, user, assistant and ipython. |
> | `<\|end_of_text\|>` | This is equivalent to the EOS token. It is usually unused in multiturn conversations and is expected to be generated only by the base models. |
>
> A multiturn conversation with Meta Llama 3.1 that includes tool-calling follows this structure:
> Each message is trailed by an `<|eot_id|>` token before a new header starts, signaling a role change.
>
> More details on the new tokenizer and prompt template can be found [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1).
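The token layout described above can be sketched in plain Python. This is only an illustrative helper, not the official tokenizer API (the real chat format is produced by the tiktoken-based Llama 3.1 tokenizer), so treat the function below as an assumption-laden sketch:

```python
# Illustrative sketch of assembling a Llama 3.1 multiturn prompt string
# from the special tokens described above. Not the official tokenizer API.

def encode_dialog(messages):
    """Render a list of {'role', 'content'} dicts into a prompt string."""
    prompt = "<|begin_of_text|>"  # start of the prompt
    for msg in messages:
        # Each message is wrapped in role headers and terminated with <|eot_id|>.
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # Open an assistant header to cue the model to generate the next turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

dialog = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(encode_dialog(dialog))
```

For tool-calling turns, the model may additionally emit `<|python_tag|>` in its response and terminate with `<|eom_id|>` instead of `<|eot_id|>`, as described in the table above.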
>
> [!NOTE]
> The llama-recipes repository was recently refactored to improve the developer experience of using the examples. Some files have been moved to new locations. The `src/` folder has NOT been modified, so the functionality of this repo and package is not impacted.
Examples are organized in folders by topic:

[use_cases](./recipes/use_cases) | Scripts showing common applications of Meta Llama 3
[3p_integrations](./recipes/3p_integrations) | Partner-owned folder showing common applications of Meta Llama 3
[responsible_ai](./recipes/responsible_ai) | Scripts to use PurpleLlama for safeguarding model outputs
[experimental](./recipes/experimental) | Meta Llama implementations of experimental LLM techniques
### `src/`
Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct.
## License
<!-- markdown-link-check-disable -->
See the License file for Meta Llama 3.1 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/USE_POLICY.md)
See the License file for Meta Llama 3 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3/USE_POLICY.md)
See the License file for Meta Llama 2 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama2/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama2/USE_POLICY.md)
---

**docs/LLM_finetuning.md**
## LLM Fine-Tuning
Here we discuss fine-tuning Meta Llama with a couple of different recipes, covering two scenarios:
## 1. **Parameter Efficient Model Fine-Tuning**
The HF [PEFT](https://github.com/huggingface/peft) library provides an easy way of using these methods, which we make use of here. Please read more [here](https://huggingface.co/blog/peft).
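To see why such methods are parameter-efficient, consider LoRA, one of the techniques PEFT implements: instead of updating a full weight matrix, it trains a low-rank pair of adapter matrices. A back-of-the-envelope comparison (the dimensions and rank below are illustrative, not those of any specific Llama layer):

```python
# Back-of-the-envelope comparison of trainable parameters for full
# fine-tuning vs. a LoRA adapter on a single weight matrix.
# Dimensions are illustrative, not those of any specific Llama layer.

d, k = 4096, 4096   # weight matrix W has shape (d, k)
r = 8               # LoRA rank

full_params = d * k          # full fine-tuning updates every entry of W
lora_params = r * (d + k)    # LoRA trains only B (d x r) and A (r x k)

print(f"full: {full_params:,}")
print(f"lora: {lora_params:,}")
print(f"{full_params // lora_params}x fewer trainable parameters")
```

The frozen base weights can additionally be kept in a quantized format, which is the idea behind QLoRA.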
## 2. **Full/Partial Parameter Fine-Tuning**
Full parameter fine-tuning has its own advantages; with this method there are multiple strategies that can help:
---

**docs/multi_gpu.md**

To run fine-tuning on multiple GPUs, we will make use of two packages:
2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html), which helps us parallelize the training over multiple GPUs. [More details](LLM_finetuning.md/#2-full-partial-parameter-finetuning).
Given the combination of PEFT and FSDP, we can fine-tune a Meta Llama 8B model on multiple GPUs in one node.
For big models like the 405B variant, we will need to fine-tune in a multi-node setup even if 4-bit quantization is enabled.
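A quick weight-only memory estimate makes the multi-node requirement plausible (illustrative arithmetic; gradients, optimizer states and activations add substantially more on top of the weights):

```python
# Rough weight-only memory arithmetic for a 405B-parameter model.
# This counts only the model weights; fine-tuning also needs memory for
# gradients, optimizer states and activations.

params = 405e9  # Llama 3.1 405B

bytes_per_param = {"bf16": 2, "int8": 1, "int4": 0.5}
for fmt, nbytes in bytes_per_param.items():
    gb = params * nbytes / 1e9
    print(f"{fmt}: ~{gb:.0f} GB for the weights alone")
```

Even at 4-bit precision the weights alone occupy on the order of 200 GB, before any training state is allocated.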
## Requirements
To run the examples, make sure to install the llama-recipes package and clone the GitHub repository in order to use the provided [`finetuning.py`](../recipes/quickstart/finetuning/finetuning.py) script with torchrun (see [README.md](../README.md) for details).
## How to run it
Get access to a machine with multiple GPUs (in this case we tested with four A100s and A10s).
This runs with the `samsum_dataset` for the summarization application by default.
We use `torchrun` here to spawn multiple processes for FSDP.
Setting `use_fast_kernels` will enable using Flash Attention or Xformer memory-efficient kernels based on the hardware being used. This can speed up the fine-tuning job. It has been enabled in the `optimum` library from HuggingFace as a one-liner API; please read more [here](https://pytorch.org/blog/out-of-the-box-acceleration/).
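As a sketch, a multi-GPU launch command can be assembled like this. The flag names follow the options discussed in this document, but treat them as assumptions and verify against `finetuning.py --help` before use:

```python
# Sketch: assemble a torchrun launch command for multi-GPU FSDP + PEFT
# fine-tuning. Flag names follow this document and are not guaranteed to
# match the current finetuning.py; verify with --help before running.
import shlex

launcher_args = ["--nnodes", "1", "--nproc_per_node", "4"]  # one process per GPU
script = "recipes/quickstart/finetuning/finetuning.py"
script_args = [
    "--enable_fsdp",          # shard model/optimizer state across GPUs
    "--use_peft",
    "--peft_method", "lora",
    "--use_fast_kernels",     # Flash Attention / Xformer kernels
]

cmd = ["torchrun", *launcher_args, script, *script_args]
print(shlex.join(cmd))
```

The printed command is meant to be run from the repository root on a machine where the GPUs are visible.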
It lets us specify the training settings for everything from `model_name` to the dataset.
* `fsdp_activation_checkpointing` enables activation checkpointing for FSDP. This saves a significant amount of memory at the trade-off of recomputing intermediate activations during the backward pass. The saved memory can be re-invested in higher batch sizes to increase throughput. We recommend you use this option.
* `fsdp_config.pure_bf16` moves the model to `BFloat16`; if `optimizer` is set to `anyprecision`, the optimizer states will be kept in `BFloat16` as well. You can use this option if necessary.
---

**docs/single_gpu.md**

To run the examples, make sure to install the llama-recipes package (see [README.md](../README.md) for details).
Get access to a machine with one GPU, or, if using a multi-GPU machine, make sure only one GPU is visible via `export CUDA_VISIBLE_DEVICES=GPU:id`, and run the following. It runs by default with the `samsum_dataset` for the summarization application.
**NOTE** To run the fine-tuning with `QLORA`, make sure to set `--peft_method lora` and `--quantization int4`.
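Putting the two notes above together, a single-GPU QLoRA run can be sketched as follows. The environment variable and flags are the ones mentioned in this document; the script path is an assumption based on the repository layout, so verify it before use:

```python
# Sketch: restrict training to a single visible GPU and assemble the QLoRA
# flags mentioned above. The script path is assumed from the repo layout.
import os

# Must be set before any CUDA-using library (e.g. torch) is imported,
# so that only GPU 0 is visible to the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

qlora_flags = ["--peft_method", "lora", "--quantization", "int4"]
cmd = ["python", "recipes/quickstart/finetuning/finetuning.py",
       "--use_peft", *qlora_flags]
print(" ".join(cmd))
```

Equivalently, set `export CUDA_VISIBLE_DEVICES=0` in the shell before launching the script directly.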
---

## [Running Llama 3 On-Prem with vLLM and TGI](llama_on_prem.md)

This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference) to build Llama 3 on-prem apps.

---
## Llama-Recipes 3P Integrations
This folder contains example scripts showcasing the use of Meta Llama with popular platforms and tooling in the LLM ecosystem.
Each folder is maintained by the platform owner.
> [!NOTE]
> If you'd like to add your platform here, please open a new issue with details of your examples.