
Commit 91fd013

Merge branch 'main' into raft
2 parents: 4756ffb + bc936dd

75 files changed (+13872 / -364 lines)

.github/scripts/spellcheck_conf/wordlist.txt

Lines changed: 25 additions & 0 deletions
@@ -1351,6 +1351,12 @@ Weaviate
 MediaGen
 SDXL
 SVD
+QLORA
+Agentic
+AutoGen
+DeepLearning
+Deeplearning
+Llamaindex
 KV
 KVs
 XSUM
@@ -1407,3 +1413,22 @@ numRefusal
 totalQA
 DirectoryLoader
 SitemapLoader
+nf
+quant
+DLAI
+agentic
+containts
+dlai
+Prerequirements
+tp
+QLoRA
+ntasks
+srun
+xH
+unquantized
+eom
+ipython
+CPUs
+modelUpgradeExample
+guardrailing
+

README.md

Lines changed: 21 additions & 12 deletions
@@ -1,32 +1,38 @@
 # Llama Recipes: Examples to get started using the Llama models from Meta
 <!-- markdown-link-check-disable -->
-The 'llama-recipes' repository is a companion to the [Meta Llama 3](https://github.com/meta-llama/llama3) models. The goal of this repository is to provide a scalable library for fine-tuning Meta Llama models, along with some example scripts and notebooks to quickly get started with using the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Meta Llama and other tools in the LLM ecosystem. The examples here showcase how to run Meta Llama locally, in the cloud, and on-prem. [Meta Llama 2](https://github.com/meta-llama/llama) is also supported in this repository. We highly recommend everyone to utilize [Meta Llama 3](https://github.com/meta-llama/llama3) due to its enhanced capabilities.
+The 'llama-recipes' repository is a companion to the [Meta Llama](https://github.com/meta-llama/llama-models) models. We support the latest version, [Llama 3.1](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md), in this repository. The goal is to provide a scalable library for fine-tuning Meta Llama models, along with some example scripts and notebooks to quickly get started with using the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Llama and other tools in the LLM ecosystem. The examples here showcase how to run Llama locally, in the cloud, and on-prem.
 
 <!-- markdown-link-check-enable -->
 > [!IMPORTANT]
-> Meta Llama 3 has a new prompt template and special tokens (based on the tiktoken tokenizer).
+> Meta Llama 3.1 has a new prompt template and special tokens.
 > | Token | Description |
 > |---|---|
-> `<\|begin_of_text\|>` | This is equivalent to the BOS token. |
-> `<\|end_of_text\|>` | This is equivalent to the EOS token. For multiturn-conversations it's usually unused. Instead, every message is terminated with `<\|eot_id\|>` instead.|
-> `<\|eot_id\|>` | This token signifies the end of the message in a turn i.e. the end of a single message by a system, user or assistant role as shown below.|
-> `<\|start_header_id\|>{role}<\|end_header_id\|>` | These tokens enclose the role for a particular message. The possible roles can be: system, user, assistant. |
+> `<\|begin_of_text\|>` | Specifies the start of the prompt. |
+> `<\|eot_id\|>` | This token signifies the end of a turn i.e. the end of the model's interaction either with the user or tool executor. |
+> `<\|eom_id\|>` | End of Message. A message represents a possible stopping point where the model can inform the execution environment that a tool call needs to be made. |
+> `<\|python_tag\|>` | A special tag used in the model’s response to signify a tool call. |
+> `<\|finetune_right_pad_id\|>` | Used for padding text sequences in a batch to the same length. |
+> `<\|start_header_id\|>{role}<\|end_header_id\|>` | These tokens enclose the role for a particular message. The possible roles can be: system, user, assistant and ipython. |
+> `<\|end_of_text\|>` | This is equivalent to the EOS token. For multiturn-conversations it's usually unused, this token is expected to be generated only by the base models. |
 >
-> A multiturn-conversation with Meta Llama 3 follows this prompt template:
+> A multiturn-conversation with Meta Llama 3.1 that includes tool-calling follows this structure:
 > ```
 > <|begin_of_text|><|start_header_id|>system<|end_header_id|>
 >
 > {{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
 >
 > {{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
 >
-> {{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>
+> <|python_tag|>{{ model_tool_call_1 }}<|eom_id|><|start_header_id|>ipython<|end_header_id|>
 >
-> {{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+> {{ tool_response }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+>
+> {{model_response_based_on_tool_response}}<|eot_id|>
 > ```
 > Each message gets trailed by an `<|eot_id|>` token before a new header is started, signaling a role change.
 >
-> More details on the new tokenizer and prompt template can be found [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3#special-tokens-used-with-meta-llama-3).
+> More details on the new tokenizer and prompt template can be found [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1).
+
 >
 > [!NOTE]
 > The llama-recipes repository was recently refactored to promote a better developer experience of using the examples. Some files have been moved to new locations. The `src/` folder has NOT been modified, so the functionality of this repo and package is not impacted.
@@ -139,6 +145,7 @@ Contains examples are organized in folders by topic:
 [use_cases](./recipes/use_cases)|Scripts showing common applications of Meta Llama3
 [3p_integrations](./recipes/3p_integrations)|Partner owned folder showing common applications of Meta Llama3
 [responsible_ai](./recipes/responsible_ai)|Scripts to use PurpleLlama for safeguarding model outputs
+[experimental](./recipes/experimental)|Meta Llama implementations of experimental LLM techniques
 
 ### `src/`
 
@@ -160,7 +167,9 @@ Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduc
 ## License
 <!-- markdown-link-check-disable -->
 
-See the License file for Meta Llama 3 [here](https://llama.meta.com/llama3/license/) and Acceptable Use Policy [here](https://llama.meta.com/llama3/use-policy/)
+See the License file for Meta Llama 3.1 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/USE_POLICY.md)
+
+See the License file for Meta Llama 3 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3/USE_POLICY.md)
 
-See the License file for Meta Llama 2 [here](https://llama.meta.com/llama2/license/) and Acceptable Use Policy [here](https://llama.meta.com/llama2/use-policy/)
+See the License file for Meta Llama 2 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama2/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama2/USE_POLICY.md)
 <!-- markdown-link-check-enable -->
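The README now documents the Llama 3.1 tool-calling prompt structure. The minimal sketch below shows how that layout can be assembled by concatenating the special tokens from the token table into a single prompt string. `Message`, `format_dialog` and the `tool_call` flag are hypothetical names. They are not part of llama-recipes or the Llama tokenizer. The sketch only covers the layout shown in the README: a tool call is tagged with `<|python_tag|>` and closed with `<|eom_id|>`, and every other message is closed with `<|eot_id|>`. A real pipeline would also need the tokenizer to encode these strings as special tokens rather than as plain text.

```python
from dataclasses import dataclass

# Special tokens taken from the Llama 3.1 token table in the README diff.
BEGIN_OF_TEXT = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
EOT = "<|eot_id|>"
EOM = "<|eom_id|>"
PYTHON_TAG = "<|python_tag|>"

ROLES = {"system", "user", "assistant", "ipython"}


@dataclass
class Message:
    """Hypothetical message container used only for this sketch."""
    role: str
    content: str
    tool_call: bool = False  # True for an assistant turn that calls a tool


def header(role: str) -> str:
    # Role header followed by the blank line shown in the template.
    return f"{START_HEADER}{role}{END_HEADER}\n\n"


def format_dialog(messages: list[Message], add_generation_prompt: bool = True) -> str:
    """Render messages in the layout shown in the README template."""
    parts = [BEGIN_OF_TEXT]
    for msg in messages:
        if msg.role not in ROLES:
            raise ValueError(f"unknown role: {msg.role!r}")
        if msg.tool_call:
            if msg.role != "assistant":
                raise ValueError("only assistant messages can be tool calls")
            # Tool call: tagged with <|python_tag|> and closed with <|eom_id|>.
            body = f"{PYTHON_TAG}{msg.content}{EOM}"
        else:
            body = f"{msg.content}{EOT}"
        parts.append(header(msg.role) + body)
    if add_generation_prompt:
        # Open an assistant header so the model produces the next turn.
        parts.append(header("assistant"))
    return "".join(parts)
```

Feeding it a system, user, tool-call, `ipython` and final assistant message with `add_generation_prompt=False` reproduces the README structure with the placeholders filled in.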

docs/LLM_finetuning.md

Lines changed: 1 addition & 3 deletions
@@ -1,6 +1,6 @@
 ## LLM Fine-Tuning
 
-Here we discuss fine-tuning Meta Llama 3 with a couple of different recipes. We will cover two scenarios here:
+Here we discuss fine-tuning Meta Llama with a couple of different recipes. We will cover two scenarios here:
 
 
 ## 1. **Parameter Efficient Model Fine-Tuning**
@@ -18,8 +18,6 @@ These methods will address three aspects:
 
 HF [PEFT](https://github.com/huggingface/peft) library provides an easy way of using these methods which we make use of here. Please read more [here](https://huggingface.co/blog/peft).
 
-
-
 ## 2. **Full/ Partial Parameter Fine-Tuning**
 
 Full parameter fine-tuning has its own advantages, in this method there are multiple strategies that can help:

docs/multi_gpu.md

Lines changed: 19 additions & 12 deletions
@@ -6,13 +6,12 @@ To run fine-tuning on multi-GPUs, we will make use of two packages:
 
 2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](LLM_finetuning.md/#2-full-partial-parameter-finetuning).
 
-Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Llama 3 8B model on multiple GPUs in one node or multi-node.
+Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Llama 8B model on multiple GPUs in one node.
+For big models like 405B we will need to fine-tune in a multi-node setup even if 4bit quantization is enabled.
 
 ## Requirements
 To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../recipes/quickstart/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details).
 
-**Please note that the llama_recipes package will install PyTorch 2.0.1 version, in case you want to run FSDP + PEFT, please make sure to install PyTorch nightlies.**
-
 ## How to run it
 
 Get access to a machine with multiple GPUs ( in this case we tested with 4 A100 and A10s).
@@ -24,7 +23,7 @@ This runs with the `samsum_dataset` for summarization application by default.
 
 ```bash
 
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
 
 ```
 
@@ -43,7 +42,7 @@ We use `torchrun` here to spawn multiple processes for FSDP.
 Setting `use_fast_kernels` will enable using of Flash Attention or Xformer memory-efficient kernels based on the hardware being used. This would speed up the fine-tuning job. This has been enabled in `optimum` library from HuggingFace as a one-liner API, please read more [here](https://pytorch.org/blog/out-of-the-box-acceleration/).
 
 ```bash
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model --use_fast_kernels
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model --use_fast_kernels
 ```
 
 ### Fine-tuning using FSDP Only
@@ -52,8 +51,16 @@ If interested in running full parameter finetuning without making use of PEFT me
 
 ```bash
 
-torchrun --nnodes 1 --nproc_per_node 8 examples/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --use_fast_kernels
+torchrun --nnodes 1 --nproc_per_node 8 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --fsdp_config.pure_bf16 --use_fast_kernels
+
+```
+
+### Fine-tuning using FSDP + QLORA
+
+This has been tested on 4 H100s GPUs.
 
+```bash
+FSDP_CPU_RAM_EFFICIENT_LOADING=1 ACCELERATE_USE_FSDP=1 torchrun --nnodes 1 --nproc_per_node 4 finetuning.py --enable_fsdp --quantization 4bit --model_name /path_of_model_folder/70B --mixed_precision False --low_cpu_fsdp --use_peft --peft_method lora --output_dir Path/to/save/PEFT/model
 ```
 
 ### Fine-tuning using FSDP on 70B Model
@@ -62,7 +69,7 @@ If you are interested in running full parameter fine-tuning on the 70B model, yo
 
 ```bash
 
-torchrun --nnodes 1 --nproc_per_node 8 examples/finetuning.py --enable_fsdp --low_cpu_fsdp --pure_bf16 --model_name /path_of_model_folder/70B --batch_size_training 1 --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned
+torchrun --nnodes 1 --nproc_per_node 8 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --low_cpu_fsdp --fsdp_config.pure_bf16 --model_name /path_of_model_folder/70B --batch_size_training 1 --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned
 
 ```
 
@@ -72,7 +79,7 @@ Here we use a slurm script to schedule a job with slurm over multiple nodes.
 
 ```bash
 
-sbatch examples/multi_node.slurm
+sbatch recipes/quickstart/finetuning/multi_node.slurm
 # Change the num nodes and GPU per nodes in the script before running.
 
 ```
@@ -95,16 +102,16 @@ To run with each of the datasets set the `dataset` flag in the command as shown
 
 ```bash
 # grammer_dataset
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --dataset grammar_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --dataset grammar_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --fsdp_config.pure_bf16 --output_dir Path/to/save/PEFT/model
 
 # alpaca_dataset
 
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --dataset alpaca_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --dataset alpaca_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --fsdp_config.pure_bf16 --output_dir Path/to/save/PEFT/model
 
 
 # samsum_dataset
 
-torchrun --nnodes 1 --nproc_per_node 4 examples/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --dataset samsum_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --pure_bf16 --output_dir Path/to/save/PEFT/model
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --model_name /path_of_model_folder/8B --use_peft --peft_method lora --dataset samsum_dataset --save_model --dist_checkpoint_root_folder model_checkpoints --dist_checkpoint_folder fine-tuned --fsdp_config.pure_bf16 --output_dir Path/to/save/PEFT/model
 
 ```
 
@@ -182,7 +189,7 @@ It lets us specify the training settings for everything from `model_name` to `da
 
 * `fsdp_activation_checkpointing` enables activation checkpoining for FSDP, this saves significant amount of memory with the trade off of recomputing itermediate activations during the backward pass. The saved memory can be re-invested in higher batch sizes to increase the throughput. We recommond you use this option.
 
-* `pure_bf16` it moves the model to `BFloat16` and if `optimizer` is set to `anyprecision` then optimizer states will be kept in `BFloat16` as well. You can use this option if necessary.
+* `fsdp_config.pure_bf16` it moves the model to `BFloat16` and if `optimizer` is set to `anyprecision` then optimizer states will be kept in `BFloat16` as well. You can use this option if necessary.
 
 ## FLOPS Counting and Pytorch Profiling
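Several commands in docs/multi_gpu.md now pass `--fsdp_config.pure_bf16` instead of `--pure_bf16`, so the flag names the config section it belongs to. The sketch below illustrates one way such dotted flags could be routed to a named configuration object, while a bare flag is applied to every config that defines the field. This is a hypothetical illustration, not the llama-recipes argument handling. `TrainConfig`, `FSDPConfig` and `parse_flags` are invented for the example, and only a few fields are modeled. Values are also left as strings, with no type conversion.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical subset of training options."""
    model_name: str = ""
    enable_fsdp: bool = False
    use_fast_kernels: bool = False


@dataclass
class FSDPConfig:
    """Hypothetical subset of FSDP options."""
    pure_bf16: bool = False


def parse_flags(argv: list[str], configs: dict) -> dict:
    """Apply CLI-style flags to named config objects.

    `--section.field` sets `field` only on configs[section]; a bare `--field`
    sets it on every config that defines it. A flag that is not followed by a
    value is treated as a boolean switch. Values are kept as strings.
    """
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("--"):
            raise ValueError(f"unexpected argument: {token}")
        key = token[2:]
        # A value follows unless the next token is missing or is another flag.
        if i + 1 < len(argv) and not argv[i + 1].startswith("--"):
            value, i = argv[i + 1], i + 2
        else:
            value, i = True, i + 1
        if "." in key:
            section, name = key.split(".", 1)
            candidates = [configs[section]] if section in configs else []
        else:
            name, candidates = key, list(configs.values())
        targets = [cfg for cfg in candidates if hasattr(cfg, name)]
        if not targets:
            raise KeyError(f"unknown option: --{key}")
        for cfg in targets:
            setattr(cfg, name, value)
    return configs
```

With `configs = {"train_config": TrainConfig(), "fsdp_config": FSDPConfig()}`, parsing `["--enable_fsdp", "--model_name", "/path_of_model_folder/8B", "--fsdp_config.pure_bf16"]` sets `pure_bf16` only on the FSDP config. A misrouted flag such as `--train_config.pure_bf16` is rejected instead of being silently ignored.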

docs/single_gpu.md

Lines changed: 5 additions & 4 deletions
@@ -17,10 +17,11 @@ To run the examples, make sure to install the llama-recipes package (See [README
 
 Get access to a machine with one GPU or if using a multi-GPU machine please make sure to only make one of them visible using `export CUDA_VISIBLE_DEVICES=GPU:id` and run the following. It runs by default with `samsum_dataset` for summarization application.
 
+**NOTE** To run the fine-tuning with `QLORA`, make sure to set `--peft_method lora` and `--quantization int4`.
 
 ```bash
 
-python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --use_fp16 --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
+python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization 8bit --use_fp16 --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
 
 ```
 The args used in the command above are:
@@ -51,16 +52,16 @@ to run with each of the datasets set the `dataset` flag in the command as shown
 ```bash
 # grammer_dataset
 
-python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --dataset grammar_dataset --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
+python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization 8bit --dataset grammar_dataset --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
 
 # alpaca_dataset
 
-python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --dataset alpaca_dataset --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
+python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization 8bit --dataset alpaca_dataset --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
 
 
 # samsum_dataset
 
-python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization --dataset samsum_dataset --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
+python -m llama_recipes.finetuning --use_peft --peft_method lora --quantization 8bit --dataset samsum_dataset --model_name /path_of_model_folder/8B --output_dir Path/to/save/PEFT/model
 
 ```
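In this commit `--quantization` takes a value instead of acting as a bare switch. The single-GPU commands use `8bit`, and the FSDP + QLoRA command in docs/multi_gpu.md uses `4bit`. The sketch below shows one plausible way to map such a value to model-loading options. It is an assumption, not the llama-recipes implementation. `quantization_kwargs` is a hypothetical helper, and its keys are named after parameters of Hugging Face transformers' `BitsAndBytesConfig`. The 4-bit settings (NF4 with double quantization) follow the QLoRA recipe, but the exact values the package uses may differ.

```python
from typing import Optional


def quantization_kwargs(mode: Optional[str]) -> dict:
    """Map a --quantization value to BitsAndBytesConfig-style keyword arguments.

    Hypothetical helper: the keys mirror parameters of Hugging Face
    transformers' BitsAndBytesConfig and are returned as a plain dict so the
    sketch runs without transformers installed.
    """
    if mode is None:
        return {}  # no quantization: load the weights unquantized
    if mode == "8bit":
        return {"load_in_8bit": True}
    if mode == "4bit":
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",          # 4-bit NormalFloat, as in QLoRA
            "bnb_4bit_use_double_quant": True,     # also quantize the quantization constants
            "bnb_4bit_compute_dtype": "bfloat16",  # dtype used for computation
        }
    raise ValueError(f"unsupported quantization mode: {mode!r} (expected '4bit' or '8bit')")
```

With transformers installed, the result could be passed on as `BitsAndBytesConfig(**quantization_kwargs("4bit"))`. Keeping it a plain dict makes the mapping easy to test in isolation.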

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "llama-recipes"
-version = "0.0.2"
+version = "0.0.3"
 authors = [
 { name="Hamid Shojanazeri", email="[email protected]" },
 { name="Matthias Reso", email="[email protected]" },

recipes/3p_integrations/README.md

Lines changed: 8 additions & 2 deletions
@@ -1,2 +1,8 @@
-## [Running Llama 3 On-Prem with vLLM and TGI](llama_on_prem.md)
-This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference) to build Llama 3 on-prem apps.
+## Llama-Recipes 3P Integrations
+
+This folder contains example scripts showcasing the use of Meta Llama with popular platforms and tooling in the LLM ecosystem.
+
+Each folder is maintained by the platform-owner.
+
+> [!NOTE]
+> If you'd like to add your platform here, please open a new issue with details of your examples.
