Commit b013d27

Commit message: merged from main
2 parents: 576e574 + cf83593


56 files changed: +4709 −1321 lines

.github/scripts/spellcheck_conf/wordlist.txt

Lines changed: 8 additions & 0 deletions
@@ -1458,3 +1458,11 @@ Multistep
 multistep
 algorithmically
 asymptote
+Triaging
+matplotlib
+remediations
+walkthrough
+OCRVQA
+OCRVQADataCollator
+ocrvqa
+langchain

.watchman-cookie-devgpu003.cco3.facebook.com-3137776-2746

Whitespace-only changes.

README.md

Lines changed: 25 additions & 43 deletions
@@ -1,47 +1,25 @@
 # Llama Recipes: Examples to get started using the Llama models from Meta
 <!-- markdown-link-check-disable -->
-The 'llama-recipes' repository is a companion to the [Meta Llama](https://github.com/meta-llama/llama-models) models. We support the latest version, [Llama 3.1](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md), in this repository. The goal is to provide a scalable library for fine-tuning Meta Llama models, along with some example scripts and notebooks to quickly get started with using the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Llama and other tools in the LLM ecosystem. The examples here showcase how to run Llama locally, in the cloud, and on-prem.
+The 'llama-recipes' repository is a companion to the [Meta Llama](https://github.com/meta-llama/llama-models) models. We support the latest version, [Llama 3.2 Vision](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md) and [Llama 3.2 Text](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md), in this repository. This repository contains example scripts and notebooks to get started with the models in a variety of use-cases, including fine-tuning for domain adaptation and building LLM-based applications with Llama and other tools in the LLM ecosystem. The examples here use Llama locally, in the cloud, and on-prem.
+
+> [!TIP]
+> Get started with Llama 3.2 with these new recipes:
+> * [Finetune Llama 3.2 Vision](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/finetuning/finetune_vision_model.md)
+> * [Multimodal Inference with Llama 3.2 Vision](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/inference/local_inference/README.md#multimodal-inference)
+> * [Inference on Llama Guard 1B + Multimodal inference on Llama Guard 11B-Vision](https://github.com/meta-llama/llama-recipes/blob/main/recipes/responsible_ai/llama_guard/llama_guard_text_and_vision_inference.ipynb)
+
 
 <!-- markdown-link-check-enable -->
-> [!IMPORTANT]
-> Meta Llama 3.1 has a new prompt template and special tokens.
-> | Token | Description |
-> |---|---|
-> `<\|begin_of_text\|>` | Specifies the start of the prompt. |
-> `<\|eot_id\|>` | This token signifies the end of a turn i.e. the end of the model's interaction either with the user or tool executor. |
-> `<\|eom_id\|>` | End of Message. A message represents a possible stopping point where the model can inform the execution environment that a tool call needs to be made. |
-> `<\|python_tag\|>` | A special tag used in the model's response to signify a tool call. |
-> `<\|finetune_right_pad_id\|>` | Used for padding text sequences in a batch to the same length. |
-> `<\|start_header_id\|>{role}<\|end_header_id\|>` | These tokens enclose the role for a particular message. The possible roles can be: system, user, assistant and ipython. |
-> `<\|end_of_text\|>` | This is equivalent to the EOS token. For multiturn-conversations it's usually unused, this token is expected to be generated only by the base models. |
->
-> A multiturn-conversation with Meta Llama 3.1 that includes tool-calling follows this structure:
-> ```
-> <|begin_of_text|><|start_header_id|>system<|end_header_id|>
->
-> {{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
->
-> {{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
->
-> <|python_tag|>{{ model_tool_call_1 }}<|eom_id|><|start_header_id|>ipython<|end_header_id|>
->
-> {{ tool_response }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
->
-> {{model_response_based_on_tool_response}}<|eot_id|>
-> ```
-> Each message gets trailed by an `<|eot_id|>` token before a new header is started, signaling a role change.
->
-> More details on the new tokenizer and prompt template can be found [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1).
-
->
 > [!NOTE]
-> The llama-recipes repository was recently refactored to promote a better developer experience of using the examples. Some files have been moved to new locations. The `src/` folder has NOT been modified, so the functionality of this repo and package is not impacted.
->
-> Make sure you update your local clone by running `git pull origin main`
+> Llama 3.2 follows the same prompt template as Llama 3.1, with a new special token `<|image|>` representing the input image for the multimodal models.
+>
+> More details on the prompt templates for image reasoning, tool-calling and code interpreter can be found [on the documentation website](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_2).
+
+
 
 ## Table of Contents
 
-- [Llama Recipes: Examples to get started using the Meta Llama models from Meta](#llama-recipes-examples-to-get-started-using-the-llama-models-from-meta)
+- [Llama Recipes: Examples to get started using the Llama models from Meta](#llama-recipes-examples-to-get-started-using-the-llama-models-from-meta)
 - [Table of Contents](#table-of-contents)
 - [Getting Started](#getting-started)
 - [Prerequisites](#prerequisites)
@@ -94,6 +72,10 @@ To use the sensitive topics safety checker install with:
 ```
 pip install llama-recipes[auditnlg]
 ```
+Some recipes require the presence of langchain. To install the packages follow the recipe description or install with:
+```
+pip install llama-recipes[langchain]
+```
 Optional dependencies can also be combines with [option1,option2].
 
 #### Install from source
@@ -113,23 +95,21 @@ pip install -e .[tests,auditnlg,vllm]
 ```
 
 
-### Getting the Meta Llama models
-You can find Meta Llama models on Hugging Face hub [here](https://huggingface.co/meta-llama), **where models with `hf` in the name are already converted to Hugging Face checkpoints so no further conversion is needed**. The conversion step below is only for original model weights from Meta that are hosted on Hugging Face model hub as well.
+### Getting the Llama models
+You can find Llama models on Hugging Face hub [here](https://huggingface.co/meta-llama), **where models with `hf` in the name are already converted to Hugging Face checkpoints so no further conversion is needed**. The conversion step below is only for original model weights from Meta that are hosted on Hugging Face model hub as well.
 
 #### Model conversion to Hugging Face
-The recipes and notebooks in this folder are using the Meta Llama model definition provided by Hugging Face's transformers library.
-
-Given that the original checkpoint resides under models/7B you can install all requirements and convert the checkpoint with:
+If you have the model checkpoints downloaded from the Meta website, you can convert it to the Hugging Face format with:
 
 ```bash
 ## Install Hugging Face Transformers from source
-pip freeze | grep transformers ## verify it is version 4.31.0 or higher
+pip freeze | grep transformers ## verify it is version 4.45.0 or higher
 
 git clone git@github.com:huggingface/transformers.git
 cd transformers
 pip install protobuf
 python src/transformers/models/llama/convert_llama_weights_to_hf.py \
-   --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
+   --input_dir /path/to/downloaded/llama/weights --model_size 3B --output_dir /output/path
 ```
 
 
@@ -192,6 +172,8 @@ Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduc
 ## License
 <!-- markdown-link-check-disable -->
 
+See the License file for Meta Llama 3.2 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/USE_POLICY.md)
+
 See the License file for Meta Llama 3.1 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/USE_POLICY.md)
 
 See the License file for Meta Llama 3 [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3/LICENSE) and Acceptable Use Policy [here](https://github.com/meta-llama/llama-models/blob/main/models/llama3/USE_POLICY.md)
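For reference, the new `<|image|>` token from the NOTE above slots into the same turn structure as the Llama 3.1 template that this commit removes. The block below is only an illustrative sketch assembled from those two pieces; the linked Llama 3.2 documentation page is the authoritative source for the exact multimodal format:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe this image in two sentences.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```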

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -4,13 +4,13 @@ build-backend = "hatchling.build"
 
 [project]
 name = "llama-recipes"
-version = "0.0.3"
+version = "0.0.4.post1"
 authors = [
   { name="Hamid Shojanazeri", email="[email protected]" },
   { name="Matthias Reso", email="[email protected]" },
   { name="Geeta Chauhan", email="[email protected]" },
 ]
-description = "Llama-recipes is a companion project to the Llama 2 model. It's goal is to provide examples to quickly get started with fine-tuning for domain adaptation and how to run inference for the fine-tuned models. "
+description = "Llama-recipes is a companion project to the Llama models. It's goal is to provide examples to quickly get started with fine-tuning for domain adaptation and how to run inference for the fine-tuned models."
 readme = "README.md"
 requires-python = ">=3.8"
 classifiers = [
@@ -24,6 +24,7 @@ dynamic = ["dependencies"]
 vllm = ["vllm"]
 tests = ["pytest-mock"]
 auditnlg = ["auditnlg"]
+langchain = ["langchain_openai", "langchain", "langchain_community"]
 
 [project.urls]
 "Homepage" = "https://github.com/facebookresearch/llama-recipes/"

recipes/3p_integrations/lamini/text2sql_memory_tuning/util/get_default_finetune_args.py

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 def get_default_finetune_args():
     return {
-        "learning_rate": 3e-4,
-        "max_steps": 360,
+        "learning_rate": 0.0003,
+        "max_steps": 60,
         "early_stopping": False,
         "load_best_model_at_end": False,
         "peft_args": {"r_value": 32},

recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@
     "\n",
     "def llama3(prompt):\n",
     "    data = {\n",
-    "        \"model\": \"llama3\",\n",
+    "        \"model\": \"llama3.1\",\n",
     "        \"messages\": [\n",
     "            {\n",
     "                \"role\": \"user\",\n",

recipes/quickstart/finetuning/datasets/custom_dataset.py

Lines changed: 17 additions & 6 deletions
@@ -9,19 +9,30 @@
 
 
 B_INST, E_INST = "[INST]", "[/INST]"
+EOT_ID = 128009 #<|eot_id|>
+
+def mask_target(target,seq):
+    for i in range(len(seq)-len(target)):
+        if seq[i:i+len(target)] == target:
+            seq[i:i+len(target)] = [-100] * len(target)
+    return seq
 
 def tokenize_dialog(dialog, tokenizer):
     if tokenizer.vocab_size >= 128000:
         dialog_tokens = tokenizer.apply_chat_template(dialog)
-        dialog_tokens = dialog_tokens[:-4] # Remove generation prompt <|start_header_id|>assistant<|end_header_id|>\n\n
-        eot_indices = [i for i,n in enumerate(dialog_tokens) if n == 128009]
+        eot_indices = [i for i,n in enumerate(dialog_tokens) if n == EOT_ID]
         labels = copy.copy(dialog_tokens)
-        last_idx = 0
+        #determine token for system and user
+        system_or_user = (tokenizer.encode("system")[-1], tokenizer.encode("user")[-1])
+        labels[0] = -100 # bos token
+        last_idx = 1
         for n, idx in enumerate(eot_indices):
-            if n % 2 == 1:
-                last_idx = idx
-            else:
+            role_token = labels[last_idx+1]
+            if role_token in system_or_user:
+                # Set labels to -100 for system and user tokens to ignore in loss function
                 labels[last_idx:idx+1] = [-100] * (idx-last_idx+1)
+            last_idx = idx + 1
+        mask_target(tokenizer.encode("<|start_header_id|>assistant<|end_header_id|>", add_special_tokens=False), labels)
 
         dialog_tokens = [dialog_tokens]
         labels_tokens = [labels]
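To illustrate the masking idea introduced here (a toy sketch with made-up token ids, not part of the commit): `mask_target` scans the label list for an exact sub-sequence and overwrites it in place with `-100`, the label value Hugging Face loss functions ignore, so header tokens never contribute to the loss.

```python
# Toy illustration of the -100 masking used above (hypothetical token ids,
# not real Llama vocabulary ids).
def mask_target(target, seq):
    for i in range(len(seq) - len(target)):
        if seq[i:i + len(target)] == target:
            seq[i:i + len(target)] = [-100] * len(target)
    return seq

labels = [1, 7, 8, 9, 2, 3, 7, 8, 9, 4]
mask_target([7, 8, 9], labels)  # mask every occurrence of the 3-token "header" sequence
print(labels)                   # [1, -100, -100, -100, 2, 3, -100, -100, -100, 4]
```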
recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py

Lines changed: 90 additions & 0 deletions

@@ -0,0 +1,90 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# This software may be used and distributed according to the terms of the Llama 3 Community License Agreement.
+
+
+import copy
+from datasets import load_dataset
+import itertools
+import torch
+
+# check system prompt token seq or user prompt token seq is in the current token list
+def check_header(targets,seq):
+    for i in range(len(seq)-3):
+        if seq[i:i+3] in targets:
+            return True
+    return False
+def replace_target(target,seq):
+    for i in range(len(seq)-3):
+        if seq[i:i+3] == target:
+            seq[i],seq[i+1],seq[i+2] = -100,-100,-100
+    return seq
+def tokenize_dialogs(dialogs, images, processor):
+    text_prompt = processor.apply_chat_template(dialogs)
+    batch = processor(images=images, text=text_prompt,padding = True, return_tensors="pt")
+    label_list = []
+    for i in range(len(batch["input_ids"])):
+        dialog_tokens = batch["input_ids"][i].tolist()
+        labels = copy.copy(dialog_tokens)
+        eot_indices = [i for i,n in enumerate(labels) if n == 128009]
+        last_idx = 0
+        # system prompt header "<|start_header_id|>system<|end_header_id|>" has been tokenized to [128006, 9125, 128007]
+        # user prompt header "<|start_header_id|>user<|end_header_id|>" has been tokenized to [128006, 882, 128007]
+        prompt_header_seqs = [[128006, 9125, 128007],[128006, 882, 128007]]
+        for n, idx in enumerate(eot_indices):
+            current_seq = labels[last_idx:idx+1]
+            if check_header(prompt_header_seqs,current_seq):
+                # found prompt header, indicating that this seq should be masked
+                labels[last_idx:idx+1] = [-100] * (idx-last_idx+1)
+            else:
+                last_idx = idx+1
+        # Mask all the assistant header prompt <|start_header_id|>assistant<|end_header_id|>, which has been tokenized to [128006, 78191, 128007]
+        assistant_header_seq = [128006, 78191, 128007]
+        labels = replace_target(assistant_header_seq,labels)
+        # Mask the padding token and image token 128256
+        for i in range(len(labels)):
+            if labels[i] == processor.tokenizer.pad_token_id or labels[i] == 128256: # 128256 is image token index
+                labels[i] = -100
+        label_list.append(labels)
+    batch["labels"] = torch.tensor(label_list)
+    return batch
+
+
+def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
+    # load_dataset will return DatasetDict that contains all the data in the train set
+    dataset_dict = load_dataset("HuggingFaceM4/the_cauldron", name="ocrvqa")
+    dataset = dataset_dict['train']
+    # Comment out the following line to use the full dataset, for quick testing only use 2000 samples
+    dataset = dataset.select(range(2000))
+    dataset = dataset.train_test_split(test_size=1-split_ratio, shuffle=True, seed=42)[split]
+    return dataset
+
+class OCRVQADataCollator:
+    def __init__(self, processor):
+        self.processor = processor
+        self.processor.tokenizer.padding_side = "right" # during training, one always uses padding on the right
+    def __call__(self, samples):
+        dialogs,images = [],[]
+        for sample in samples:
+            image_list,sample_list = sample["images"],sample["texts"]
+            if len(image_list) > 1:
+                raise ValueError("Only support one image per sample")
+            image = image_list[0].convert("RGB") # only use the first image
+            dialog = []
+            for sample_dict in sample_list:
+                if not dialog:
+                    # only append image to the first sentence
+                    dialog += [
+                        {"role":"user","content":[{"type": "image"},{"type": "text", "text": sample_dict["user"].strip()}]},
+                        {"role":"assistant","content":[{"type": "text", "text": sample_dict["assistant"].strip()}]}
+                    ]
+
+                else:
+                    dialog += [
+                        {"role":"user","content":[{"type": "text", "text": sample_dict["user"].strip()}]},
+                        {"role":"assistant","content":[{"type": "text", "text": sample_dict["assistant"].strip()}]}
+                    ]
+            dialogs.append(dialog)
+            images.append([image])
+        return tokenize_dialogs(dialogs,images, self.processor)
+def get_data_collator(processor):
+    return OCRVQADataCollator(processor)
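As a rough usage sketch (my illustration, not part of the commit), the two entry points above could be exercised outside the llama-recipes training loop like this, assuming transformers >= 4.45, access to the gated Llama 3.2 Vision checkpoint, and that the file above is importable as `ocrvqa_dataset`:

```python
from torch.utils.data import DataLoader
from transformers import AutoProcessor

from ocrvqa_dataset import get_custom_dataset, get_data_collator  # the new file above

# Assumes you are logged in to Hugging Face and have access to the gated model.
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

train_split = get_custom_dataset(dataset_config=None, processor=processor, split="train")
collator = get_data_collator(processor)

loader = DataLoader(train_split, batch_size=2, collate_fn=collator)
batch = next(iter(loader))
# token ids and image tensors from the processor, plus the -100-masked labels
print(batch["input_ids"].shape, batch["labels"].shape)
```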
recipes/quickstart/finetuning/finetune_vision_model.md

Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+## Llama 3.2 Vision Models Fine-Tuning Recipe
+This recipe steps you through fine-tuning a Llama 3.2 vision model on the OCR VQA task using the [OCRVQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/viewer/ocrvqa?row=0) dataset.
+
+**Disclaimer**: As our vision models already have very good OCR ability, we use the OCRVQA dataset here only to demonstrate the steps required to fine-tune our vision models with llama-recipes.
+
+### Fine-tuning steps
+
+We created an example script [ocrvqa_dataset.py](./datasets/ocrvqa_dataset.py) that loads the OCRVQA dataset with a `get_custom_dataset` function and provides an `OCRVQADataCollator` class to process the image dataset.
+
+For **full finetuning with FSDP**, we can run the following code:
+
+```bash
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py" --run_validation True --batching_strategy padding
+```
+
+For **LoRA finetuning with FSDP**, we can run the following code:
+
+```bash
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py" --run_validation True --batching_strategy padding --use_peft --peft_method lora
+```
+**Note**: `--batching_strategy padding` is needed as the vision model will not work with the `packing` method.
+
+For more details about the finetuning configurations, please read the [finetuning readme](./README.md).
+
+### How to use a custom dataset to fine-tune vision model
+
+In order to use a custom dataset, please follow the steps below (a minimal skeleton is sketched after this diff):
+
+1. Create a new dataset python file under the `recipes/quickstart/finetuning/datasets` folder.
+2. In this python file, you need to define a `get_custom_dataset(dataset_config, processor, split, split_ratio=0.9)` function that handles the data loading.
+3. In this python file, you need to define a `get_data_collator(processor)` function that returns a custom data collator that can be used by the PyTorch Data Loader.
+4. This custom data collator class must have a `__call__(self, samples)` function that converts the image and text samples into the actual inputs that the vision model expects.
+5. Run the `torchrun` command from the section above, changing `--custom_dataset.file` to the new dataset python file and adjusting the learning rate accordingly.
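To make the custom-dataset steps above concrete, here is a minimal, hypothetical skeleton of such a file (only the two entry points are required by the finetuning script; `my_vision_dataset.py` and `MyVisionDataCollator` are placeholder names of my own). Point `--custom_dataset.file` at this file in the `torchrun` command to use it.

```python
# my_vision_dataset.py -- hypothetical custom dataset module; only
# get_custom_dataset and get_data_collator are required entry points.

def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
    """Step 2: load your image+text data and return the requested split."""
    raise NotImplementedError


class MyVisionDataCollator:
    """Steps 3-4: a collator whose __call__ turns raw samples into model inputs."""

    def __init__(self, processor):
        self.processor = processor

    def __call__(self, samples):
        # Build token ids, image tensors and -100-masked labels here,
        # as ocrvqa_dataset.py does above.
        raise NotImplementedError


def get_data_collator(processor):
    return MyVisionDataCollator(processor)
```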
