
Commit c18a0d2

changed dataset to ocrvqa
1 parent bd22f40 commit c18a0d2

File tree

5 files changed: +19 -29 lines changed


recipes/quickstart/finetuning/datasets/vqa_dataset.py renamed to recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py

Lines changed: 5 additions & 4 deletions

@@ -48,18 +48,19 @@ def tokenize_dialogs(dialogs, images, processor):
                 labels[i] = -100
         label_list.append(labels)
     batch["labels"] = torch.tensor(label_list)
-    tokenizer_length = len(processor.tokenizer)
     return batch


 def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
     # load_dataset will return DatasetDict that contains all the data in the train set
-    dataset_dict = load_dataset("HuggingFaceM4/the_cauldron", name="ai2d")
+    dataset_dict = load_dataset("HuggingFaceM4/the_cauldron", name="ocrvqa")
     dataset = dataset_dict['train']
+    # Comment out the following line to use the full dataset, for quick testing only use 2000 samples
+    dataset = dataset.select(range(2000))
     dataset = dataset.train_test_split(test_size=1-split_ratio, shuffle=True, seed=42)[split]
     return dataset

-class VQADataCollator:
+class OCRVQADataCollator:
     def __init__(self, processor):
         self.processor = processor
         self.processor.tokenizer.padding_side = "right" # during training, one always uses padding on the right

@@ -88,4 +89,4 @@ def __call__(self, samples):
             images.append([image])
         return tokenize_dialogs(dialogs,images, self.processor)
 def get_data_collator(processor):
-    return VQADataCollator(processor)
+    return OCRVQADataCollator(processor)
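For context, the renamed module can be exercised on its own roughly as follows. This is an illustrative sketch, not part of the commit: the import path, the use of `AutoProcessor`, and the plain `DataLoader` loop are assumptions, and the Llama 3.2 vision checkpoint is gated and requires Hugging Face access.

```python
# Illustrative sketch only (not part of this commit): exercising the renamed
# ocrvqa_dataset.py module directly, assuming it is importable from the
# current directory and that the gated checkpoint is accessible.
from torch.utils.data import DataLoader
from transformers import AutoProcessor

from ocrvqa_dataset import get_custom_dataset, get_data_collator  # assumed import path

processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# dataset_config is not used by this loader, so None is sufficient here.
train_split = get_custom_dataset(None, processor, "train", split_ratio=0.9)
test_split = get_custom_dataset(None, processor, "test", split_ratio=0.9)

collator = get_data_collator(processor)  # returns an OCRVQADataCollator
loader = DataLoader(train_split, batch_size=2, collate_fn=collator)

batch = next(iter(loader))
print(batch["input_ids"].shape, batch["labels"].shape)
```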
Lines changed: 12 additions & 16 deletions

@@ -1,35 +1,31 @@
 ## Fine-Tuning Meta Llama Multi Modal Models recipe
-This recipe steps you through how to finetune a Llama 3.2 vision model on the VQA task using the [the_cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) dataset.
-
-### Concepts
-Model Architecture
-Our Meta Llama 3.2 11B and 90B models consist of two main components: (1) an image encoder, (2) an image adapter.
-
-[Model Architecture PICTURE]
-
-We need have a new processor class added, that will handle the image processing and text tokenization. A processor example looks like this:
-
-
+This recipe steps you through how to finetune a Llama 3.2 vision model on the VQA task using the [OCRVQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/viewer/ocrvqa?row=0) dataset.

 ### Fine-tuning steps

+We created an example script [ocrvqa_dataset.py](./datasets/ocrvqa_dataset.py) that loads the OCRVQA dataset with the `get_custom_dataset` function and provides an `OCRVQADataCollator` class to process the image dataset.

 For **full finetuning with FSDP**, we can run the following code:
+
 ```bash
-torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/vqa_dataset.py" --run_validation True --batching_strategy padding
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py" --run_validation True --batching_strategy padding
 ```

 For **LoRA finetuning with FSDP**, we can run the following code:
+
 ```bash
-torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/vqa_dataset.py" --run_validation True --batching_strategy padding --use_peft --peft_method lora
+torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py" --run_validation True --batching_strategy padding --use_peft --peft_method lora
 ```
 **Note**: `--batching_strategy padding` is needed as the vision model will not work with `packing` method.

 For more details about the finetuning configurations, please read the [finetuning readme](./README.md).

 ### How to use custom dataset to fine-tune vision model

-1. Create a new dataset python file under `recipes/quickstart/finetuning/dataset` folder
-2. In this python file, you need to define a `get_custom_dataset(dataset_config, processor, split, split_ratio=0.9)` function that handles the dataloading.
-3. In this python file, you need to define a `get_data_collator(processor)` that returns a custom data collartor that can be used by the Pytorch Data Loader.
+In order to use a custom dataset, please follow the steps below:
+
+1. Create a new dataset python file under `recipes/quickstart/finetuning/dataset` folder.
+2. In this python file, you need to define a `get_custom_dataset(dataset_config, processor, split, split_ratio=0.9)` function that handles the data loading.
+3. In this python file, you need to define a `get_data_collator(processor)` that returns a custom data collator that can be used by the Pytorch Data Loader.
 4. This custom data collator class must have a `__call__(self, samples)` function that converts the image and text samples into the actual inputs that vision model expects.
+5. Run the `torchrun` command from the section above, changing `--custom_dataset.file` to the new dataset python file and adjusting the learning rate accordingly.
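To make the custom-dataset steps above concrete, here is a minimal, hypothetical skeleton of such a file. The module name `my_dataset.py`, the `MyDataCollator` class, and the field handling are illustrative assumptions (they mirror the_cauldron's `images`/`texts` layout rather than this commit's exact code), and the label masking performed by `tokenize_dialogs` in ocrvqa_dataset.py is omitted for brevity.

```python
# Hypothetical sketch of a custom dataset file (e.g. my_dataset.py); names and
# field handling are illustrative, not part of this commit.
from datasets import load_dataset


def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
    # Load your own data here; this reuses the_cauldron's ocrvqa subset as an example.
    dataset = load_dataset("HuggingFaceM4/the_cauldron", name="ocrvqa")["train"]
    return dataset.train_test_split(test_size=1 - split_ratio, shuffle=True, seed=42)[split]


class MyDataCollator:
    def __init__(self, processor):
        self.processor = processor
        self.processor.tokenizer.padding_side = "right"  # pad on the right during training

    def __call__(self, samples):
        # Convert raw image/text samples into model inputs (step 4 above).
        texts, images = [], []
        for sample in samples:
            qa = sample["texts"][0]  # first question/answer pair of the sample
            dialog = [
                {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": qa["user"]}]},
                {"role": "assistant", "content": [{"type": "text", "text": qa["assistant"]}]},
            ]
            texts.append(self.processor.apply_chat_template(dialog))
            images.append(sample["images"])
        # NOTE: real training also needs a "labels" tensor with prompt tokens
        # masked out, as ocrvqa_dataset.py's tokenize_dialogs does.
        return self.processor(text=texts, images=images, padding=True, return_tensors="pt")


def get_data_collator(processor):
    return MyDataCollator(processor)
```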

src/llama_recipes/finetuning.py

Lines changed: 2 additions & 7 deletions

@@ -14,19 +14,13 @@
     FullyShardedDataParallel as FSDP,
     ShardingStrategy
 )
-from torch.distributed.fsdp.wrap import (
-    always_wrap_policy,
-    ModuleWrapPolicy,
-    transformer_auto_wrap_policy,
-)
 from torch.distributed.fsdp.fully_sharded_data_parallel import CPUOffload
 from torch.optim.lr_scheduler import StepLR
 from transformers import (
     AutoConfig,
     AutoTokenizer,
     BitsAndBytesConfig,
     LlamaForCausalLM,
-    LlamaConfig,
     AutoProcessor,
     MllamaForConditionalGeneration
 )

@@ -152,7 +146,8 @@ def main(**kwargs):
         raise ValueError(f"Model type {config.model_type} is not supported. Please use llama or mllama model.")
     # Load the tokenizer and add special tokens
     tokenizer = AutoTokenizer.from_pretrained(train_config.model_name if train_config.tokenizer_name is None else train_config.tokenizer_name)
-    tokenizer.pad_token_id = tokenizer.eos_token_id
+    if not tokenizer.pad_token_id:
+        tokenizer.pad_token_id = tokenizer.eos_token_id

     # If there is a mismatch between tokenizer vocab size and embedding matrix,
     # throw a warning and then expand the embedding matrix
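The tokenizer change above only falls back to the EOS token when no pad token is configured. A minimal illustration of the guarded assignment, assuming a tokenizer that may already ship with its own pad token:

```python
# Sketch of the guarded pad-token fallback shown in the diff above; the model
# name is only an example, and loading it requires access to the gated checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# Only reuse the EOS token as the pad token when none is defined, so a pad
# token already provided by the checkpoint is left untouched.
if not tokenizer.pad_token_id:
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(tokenizer.pad_token, tokenizer.pad_token_id)
```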

src/llama_recipes/utils/config_utils.py

Lines changed: 0 additions & 1 deletion

@@ -14,7 +14,6 @@
 )
 from transformers import default_data_collator
 from transformers.data import DataCollatorForSeq2Seq
-from functools import partial

 from llama_recipes.configs import datasets, lora_config, llama_adapter_config, prefix_config, train_config
 from llama_recipes.data.sampler import LengthBasedBatchSampler, DistributedLengthBasedBatchSampler

src/llama_recipes/utils/train_utils.py

Lines changed: 0 additions & 1 deletion

@@ -360,7 +360,6 @@ def evaluation(model,train_config, eval_dataloader, local_rank, tokenizer, wandb
             # Ensure no gradients are computed for this scope to save memory
             with torch.no_grad():
                 # Forward pass and compute loss
-                #outputs = model(**batch,use_cache=False)
                 outputs = model(**batch)
                 loss = outputs.loss
                 if train_config.save_metrics:
