Commit c38cccb: readme draft
1 parent abe44c0 commit c38cccb
File tree

2 files changed: +30, -1 lines changed

recipes/quickstart/finetuning/datasets/vqa_dataset.py

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ def tokenize_dialog(dialog, images, processor):
     # pixel_values = batch["pixel_values"],
     # image_sizes = batch["image_sizes"]
     # print("combined_tokens",combined_tokens[image_sizes])
-
+
     return combined_tokens
 def image_tokenize(sample, processor):
     processor.tokenizer.padding_side = "right"  # during training, one always uses padding on the right
Lines changed: 29 additions & 0 deletions
## Fine-Tuning Meta Llama Multimodal Models recipe

Here we discuss fine-tuning the Meta Llama 3.2 11B and 90B multimodal models.
### Concepts

Model Architecture

Our Meta Llama 3.2 11B and 90B models consist of two main components: (1) an image encoder and (2) an image adapter.

[Model Architecture PICTURE]

A new processor class is added to handle the image processing and text tokenization. A processor example looks like this:
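The actual recipe relies on a Hugging Face processor class for the Llama 3.2 vision models; as an illustration of the concept only, here is a toy, self-contained sketch of what a processor does. Everything in it (the `ToyProcessor` class, its word-level vocabulary, the flat pixel list) is hypothetical, not the real API:

```python
from dataclasses import dataclass


@dataclass
class ToyProcessor:
    """Toy stand-in for a multimodal processor: it pairs text
    tokenization with image preprocessing in a single call."""
    vocab: dict  # toy word-level vocabulary, word -> token id

    def tokenize(self, text):
        # Map each whitespace-separated word to an id; 0 means unknown.
        return [self.vocab.get(w, 0) for w in text.split()]

    def preprocess_image(self, pixels, size=4):
        # "Resize" a flat pixel list by padding/truncating to a fixed
        # length, then normalize values from [0, 255] to [0, 1].
        pixels = (pixels + [0] * size)[:size]
        return [p / 255 for p in pixels]

    def __call__(self, text, pixels):
        # A real processor returns tensors; here we return plain lists.
        return {
            "input_ids": self.tokenize(text),
            "pixel_values": self.preprocess_image(pixels),
        }


proc = ToyProcessor(vocab={"describe": 1, "this": 2, "image": 3})
batch = proc("describe this image", [255, 128, 0])
print(batch["input_ids"])  # → [1, 2, 3]
```

The key design point this mirrors: one object owns both modalities, so the dataset code can hand a dialog plus its images to a single call and get back aligned `input_ids` and `pixel_values`.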
### Fine-tuning steps

1. Download the dataset. An example sample of the dataset looks like this:
2. Set up the processor. A processor example looks like this:
3. Load the dataset.
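For step 1, a single VQA-style sample might look like the following sketch. The field names and values here are assumptions for illustration, not the dataset's actual schema:

```python
# Hypothetical shape of one VQA training sample (field names assumed).
sample = {
    "image": "images/000123.jpg",          # path to the input picture
    "question": "What color is the bus?",  # user turn
    "answer": "The bus is yellow.",        # assistant turn used as the label
}

# A tokenize_dialog(dialog, images, processor)-style function would
# consume the turns as a role-tagged dialog:
dialog = [
    {"role": "user", "content": sample["question"]},
    {"role": "assistant", "content": sample["answer"]},
]
print(dialog[0]["role"])  # → user
```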
Full fine-tune:
```bash
torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name llava-hf/llama3-llava-next-8b-hf --dist_checkpoint_root_folder /home/kaiwu/work/fb_connect/finetune_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/vqa_dataset.py" --use-wandb --run_validation True
```

LoRA:
```bash
torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --use_peft --peft_method lora --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name llava-hf/llama3-llava-next-8b-hf --dist_checkpoint_root_folder /home/kaiwu/work/fb_connect/finetune_model --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/quickstart/finetuning/datasets/vqa_dataset.py" --use-wandb --run_validation True
```
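The `--custom_dataset.file` flag in the commands above points at a Python file that the fine-tuning script imports to build the dataset; in llama-recipes this file is expected to expose a `get_custom_dataset(dataset_config, tokenizer, split)` entry point. A toy sketch of that shape, with in-memory samples standing in for the real download:

```python
# Sketch of the entry point a custom-dataset file (such as
# recipes/quickstart/finetuning/datasets/vqa_dataset.py) exposes.
# The inline samples below are hypothetical; the real function would
# load the VQA data and tokenize each dialog with the processor.
def get_custom_dataset(dataset_config, tokenizer, split):
    samples = {
        "train": [{"question": "What is shown?", "answer": "A cat."}],
        "test": [{"question": "What color?", "answer": "Blue."}],
    }
    return samples[split]


train = get_custom_dataset(None, None, "train")
print(len(train))  # → 1
```

The `--custom_dataset.test_split "test"` flag then selects which split the validation run uses.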
