## Fine-Tuning Meta Llama Multi Modal Models recipe
This recipe walks you through fine-tuning a Llama 3.2 vision model on the OCR VQA task using the [OCRVQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/viewer/ocrvqa?row=0) dataset.

**Disclaimer**: Our vision models already have very good OCR ability, so the OCRVQA dataset is used here only to demonstrate the steps required to fine-tune our vision models with llama-recipes.

### Fine-tuning steps

We created an example script, [ocrvqa_dataset.py](./datasets/ocrvqa_dataset.py), that loads the OCRVQA dataset with the `get_custom_dataset` function and provides the `OCRVQADataCollator` class to process the image dataset.

For **full finetuning with FSDP**, we can run the following code:

**Note**: `--batching_strategy padding` is required because the vision model does not work with the `packing` method.

For more details about the finetuning configurations, please read the [finetuning readme](./README.md).

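
To make the difference between the two batching strategies concrete, here is a minimal, library-free sketch. It is not the llama-recipes implementation: the helper names `pad_batch` and `pack_sequences`, the pad token id `0`, and the toy token ids are all assumptions made for illustration. Padding keeps one sample per row, whereas packing concatenates samples into fixed-size chunks, so a single row can mix tokens from several samples.

```python
from typing import Dict, List


def pad_batch(sequences: List[List[int]], pad_token_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding positions
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def pack_sequences(sequences: List[List[int]], chunk_size: int) -> List[List[int]]:
    """Concatenate all sequences and cut them into full chunks (the remainder is dropped)."""
    flat = [tok for seq in sequences for tok in seq]
    return [flat[i:i + chunk_size] for i in range(0, len(flat) - chunk_size + 1, chunk_size)]


# Toy token ids standing in for three tokenized samples (hypothetical data)
batch = [[5, 6, 7], [8, 9], [10]]
padded = pad_batch(batch)
packed = pack_sequences(batch, chunk_size=4)

print(padded["input_ids"])       # one row per sample, padded with 0
print(padded["attention_mask"])
print(packed)                    # rows may span sample boundaries
```

With padding, each row still corresponds to exactly one sample, which is what allows each text sequence to stay paired with its own image.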
### How to use a custom dataset to fine-tune vision model

To use a custom dataset, follow the steps below:

1. Create a new dataset Python file under the `recipes/quickstart/finetuning/datasets` folder.
2. In this Python file, define a `get_custom_dataset(dataset_config, processor, split, split_ratio=0.9)` function that handles the data loading.
3. In this Python file, define a `get_data_collator(processor)` function that returns a custom data collator that can be used by the PyTorch DataLoader.
4. This custom data collator class must have a `__call__(self, samples)` method that converts the image and text samples into the actual inputs that the vision model expects.
5. Run the `torchrun` command from the section above, changing `--custom_dataset.file` to point to the new dataset Python file and adjusting the learning rate as needed.
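
The steps above can be sketched as a skeleton dataset file. This is only an outline under stated assumptions, not the actual `ocrvqa_dataset.py`: the in-memory `_SAMPLES` list, the `MyDataCollator` class name, the `"image"`/`"text"` keys, and the `fake_processor` stand-in are all hypothetical. Only the two function signatures and the `__call__(self, samples)` method come from the steps listed above.

```python
# Hypothetical in-memory samples; a real dataset file would load images and
# question/answer pairs from disk or from the Hugging Face Hub instead.
_SAMPLES = [{"image": f"image_{i}.png", "text": f"question {i} answer {i}"} for i in range(10)]


def get_custom_dataset(dataset_config, processor, split, split_ratio=0.9):
    """Return the train or test portion of the data, divided by `split_ratio`."""
    cut = int(len(_SAMPLES) * split_ratio)
    return _SAMPLES[:cut] if split == "train" else _SAMPLES[cut:]


class MyDataCollator:
    """Turns a list of raw samples into one batch of model inputs."""

    def __init__(self, processor):
        self.processor = processor

    def __call__(self, samples):
        images = [s["image"] for s in samples]
        texts = [s["text"] for s in samples]
        # Delegate image preprocessing and tokenization to the processor.
        # A real collator would typically also build `labels` for the loss.
        return self.processor(images=images, text=texts)


def get_data_collator(processor):
    """Return the collator instance that the data loader will call per batch."""
    return MyDataCollator(processor)


# Stand-in for a real multimodal processor, used only to exercise the skeleton
def fake_processor(images, text):
    return {"num_images": len(images), "text": text}


train_split = get_custom_dataset(None, fake_processor, "train")
test_split = get_custom_dataset(None, fake_processor, "test")
example_batch = get_data_collator(fake_processor)(test_split)
print(len(train_split), len(test_split), example_batch)
```

Keeping the data loading in `get_custom_dataset` and the batching logic in the collator means the same file can be swapped in through `--custom_dataset.file` without touching the training script.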