## Fine-Tuning Meta Llama Multi Modal Models recipe

This recipe steps you through how to finetune a Llama 3.2 vision model on the VQA task using the [OCRVQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/viewer/ocrvqa?row=0) dataset.

### Concepts

Model Architecture

Our Meta Llama 3.2 11B and 90B vision models consist of two main components: (1) an image encoder and (2) an image adapter.

[Model Architecture PICTURE]

A new processor class has been added to handle both the image processing and the text tokenization. A processor example looks like this:

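Below is a minimal sketch of how this processor is typically used from the Hugging Face `transformers` library. The local image path and the question are placeholders, and you need access to the gated `meta-llama` checkpoint for the download to work.

```python
from PIL import Image
from transformers import AutoProcessor

# The processor wraps both the image processor and the text tokenizer.
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# Any local RGB image works here; the path is just a placeholder.
image = Image.open("path/to/example.jpg").convert("RGB")

# Chat-style prompt with one image slot followed by the question.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What does this image show?"},
    ]},
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

# The processor turns the (image, text) pair into model-ready tensors.
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt")
print({k: tuple(v.shape) for k, v in inputs.items()})
```

The returned batch contains the token ids plus the image tensors the vision model expects; the data collators used for finetuning below build the same kind of batch, with `labels` added.
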
### Fine-tuning steps

We created an example script [ocrvqa_dataset.py](./datasets/ocrvqa_dataset.py) that loads the OCRVQA dataset with a `get_custom_dataset` function and provides an `OCRVQADataCollator` class to process the image dataset.

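If you want to inspect what the example script produces before launching training, a sketch like the one below loads the module directly and collates two samples. It assumes you run it from the repository root and that `get_custom_dataset` in the example script can be called with `dataset_config=None` (an assumption made only for this quick check, not part of the recipe).

```python
import importlib.util

from transformers import AutoProcessor

# Load the example dataset module straight from its file path.
spec = importlib.util.spec_from_file_location(
    "ocrvqa_dataset", "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py"
)
ocrvqa_dataset = importlib.util.module_from_spec(spec)
spec.loader.exec_module(ocrvqa_dataset)

# The same processor the finetuning script would construct for this model.
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# NOTE: dataset_config=None is an assumption made only for this quick check.
train_split = ocrvqa_dataset.get_custom_dataset(None, processor, "train", split_ratio=0.9)
collator = ocrvqa_dataset.get_data_collator(processor)

# Collate two samples and look at the tensor shapes the model will receive.
batch = collator([train_split[0], train_split[1]])
print({k: tuple(v.shape) for k, v in batch.items()})
```
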
For **full finetuning with FSDP**, we can run the following code:

```bash
torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py" --run_validation True --batching_strategy padding
```

For **LoRA finetuning with FSDP**, we can run the following code:

```bash
torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --num_epochs 3 --batch_size_training 2 --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --dist_checkpoint_root_folder ./finetuned_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/ocrvqa_dataset.py" --run_validation True --batching_strategy padding --use_peft --peft_method lora
```
**Note**: `--batching_strategy padding` is needed as the vision model will not work with the `packing` method.

For more details about the finetuning configurations, please read the [finetuning readme](./README.md).

### How to use custom dataset to fine-tune vision model

In order to use a custom dataset, please follow the steps below:

1. Create a new dataset python file under the `recipes/quickstart/finetuning/dataset` folder.
2. In this python file, you need to define a `get_custom_dataset(dataset_config, processor, split, split_ratio=0.9)` function that handles the data loading.
3. In this python file, you need to define a `get_data_collator(processor)` function that returns a custom data collator that can be used by the PyTorch Data Loader.
4. This custom data collator class must have a `__call__(self, samples)` function that converts the image and text samples into the actual inputs the vision model expects; see the sketch after this list.
5. Run the `torchrun` command from the section above, changing `--custom_dataset.file` to the new dataset python file and adjusting the learning rate accordingly.
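
To make the required interface concrete, here is a hypothetical skeleton of such a file. The module name, the `MyVQADataCollator` class, and the reuse of the `ocrvqa` subset of the_cauldron are placeholders for illustration; the field names (`images`, `texts`) follow that dataset's layout and will differ for your own data, and a real collator would normally also mask prompt and image tokens out of the labels.

```python
# my_vqa_dataset.py -- hypothetical custom dataset module for illustration only.
from datasets import load_dataset


def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
    # Load your VQA-style data and return the requested split.
    dataset = load_dataset("HuggingFaceM4/the_cauldron", name="ocrvqa", split="train")
    dataset = dataset.train_test_split(test_size=1 - split_ratio, shuffle=True, seed=42)
    return dataset["train"] if split == "train" else dataset["test"]


class MyVQADataCollator:
    def __init__(self, processor):
        self.processor = processor
        self.processor.tokenizer.padding_side = "right"

    def __call__(self, samples):
        texts, images = [], []
        for sample in samples:
            # One question/answer pair and one image per sample, chat formatted.
            qa = sample["texts"][0]
            messages = [
                {"role": "user", "content": [
                    {"type": "image"},
                    {"type": "text", "text": qa["user"]},
                ]},
                {"role": "assistant", "content": [{"type": "text", "text": qa["assistant"]}]},
            ]
            texts.append(self.processor.apply_chat_template(messages))
            images.append(sample["images"][0].convert("RGB"))
        # The processor converts the (images, texts) lists into padded tensors.
        batch = self.processor(images=images, text=texts, padding=True, return_tensors="pt")
        # Simplest possible labels: copy the input ids and ignore padding positions.
        labels = batch["input_ids"].clone()
        pad_id = self.processor.tokenizer.pad_token_id
        if pad_id is not None:
            labels[labels == pad_id] = -100
        batch["labels"] = labels
        return batch


def get_data_collator(processor):
    return MyVQADataCollator(processor)
```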