
Commit c4eaa98

Merge remote-tracking branch 'origin/main'
2 parents 90a7011 + e7a20ed commit c4eaa98

File tree

8 files changed: +135, -40 lines


README.md

Lines changed: 8 additions & 18 deletions
@@ -1,5 +1,7 @@
 # LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀
 
+## Announcement: We release **[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory)**, an open-source toolkit for **pre-training**, **fine-tuning** and **deployment** of **LLMs** and **multimodal LLMs**. 🔥
+
 Official implementation of ['LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention'](https://arxiv.org/pdf/2303.16199.pdf) and ['LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model'](https://arxiv.org/pdf/2304.15010.pdf).
 
 <p align="center"> <img src="docs/logo_v4.png"/ width="100%"> <br>
@@ -11,13 +13,15 @@ This repo proposes **LLaMA-Adapter (V2)**, a lightweight adaption method for fin
 Try out the web demo 🤗 of LLaMA-Adapter: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/csuhan/LLaMA-Adapter), [LLaMA-Adapter V2](http://llama-adapter.opengvlab.com/) and [ImageBind-LLM](http://imagebind-llm.opengvlab.com/).
 
 ## News
+- **[2023.07.24]** We release **[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory)**, an open-source toolkit for **pre-training**, **fine-tuning** and **deployment** of **Large Language Models (LLMs)** and **multimodal LLMs**. Please check [Alpha-VLLM/LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory) for more details! 🔥🔥🔥
+- **[2023.07.05]** We release the pretrain/finetune code of [llama_adapter_v2_multimodal](https://github.com/OpenGVLab/LLaMA-Adapter/tree/main/llama_adapter_v2_multimodal).
 - **[2023.07.04]** We release the code for reproducing [Gorilla](https://github.com/ShishirPatil/gorilla) by both full finetune and LLaMA-Adapter; please see [gorilla/README.md](https://github.com/OpenGVLab/LLaMA-Adapter/blob/main/gorilla/README.md).
-- **[2023.06.08]** We release the [demo](http://imagebind-llm.opengvlab.com/) of ImageBind-LLM 🔥🔥🔥.
-- **[2023.06.06]** We release [Point-Bind](https://github.com/ZrrSkywalker/Point-Bind) 🔥🔥🔥 to extend ImageBind with 3D point clouds, which achieves 3D instruction-following capacity for [imagebind_LLM](imagebind_LLM).
+- **[2023.06.08]** We release the [demo](http://imagebind-llm.opengvlab.com/) of ImageBind-LLM.
+- **[2023.06.06]** We release [Point-Bind](https://github.com/ZrrSkywalker/Point-Bind) to extend ImageBind with 3D point clouds, which achieves 3D instruction-following capacity for [imagebind_LLM](imagebind_LLM).
 - **[2023.06.05]** We support the integration of LLaMA-Adapter (both V1 and V2) and [LangChain](https://python.langchain.com/en/latest/index.html). Check out the [Notebook](/docs/langchain_LLaMA_AdapterV2_demo.ipynb).
-- **[2023.05.29]** We release the code of ImageBind-LLM at [imagebind_LLM](imagebind_LLM) 🔥🔥🔥.
+- **[2023.05.29]** We release the code of ImageBind-LLM at [imagebind_LLM](imagebind_LLM).
 - **[2023.05.23]** We release the [demos](http://llama-adapter.opengvlab.com/) and [multi-modal code](llama_adapter_v2_multimodal) of LLaMA-Adapter V2!
-- **[2023.05.05]** We release the paper and code of our new work [Personalize Segment Anything](https://github.com/ZrrSkywalker/Personalize-SAM) 🔥🔥🔥, which efficiently fine-tunes Segment Anything within **10 seconds**, and improves DreamBooth for better **text-to-image generation**.
+- **[2023.05.05]** We release the paper and code of our new work [Personalize Segment Anything](https://github.com/ZrrSkywalker/Personalize-SAM), which efficiently fine-tunes Segment Anything within **10 seconds**, and improves DreamBooth for better **text-to-image generation**.
 - **[2023.04.30]** We noticed that GPT-4 evaluation has a strong positional bias in favor of the first response. We will soon update the paper to reveal the position bias. Great thanks to [Canwen Xu](https://scholar.google.com/citations?user=oopKCDMAAAAJ&hl=en).
 - **[2023.04.28]** We release **LLaMA-Adapter V2**, a multi-modal instruction model. Check out our [paper](https://arxiv.org/abs/2304.15010), [demos](#demos) and [code](llama_adapter_v2_chat65b)!
 - **[2023.03.28]** The [paper](https://arxiv.org/pdf/2303.16199.pdf) and [training code](alpaca_finetuning_v1) for **LLaMA-Adapter V1** are released. 📌
@@ -39,20 +43,6 @@ Try out the web demo 🤗 of LLaMA-Adapter: [![Hugging Face Spaces](https://img.
 + **ImageBind-dialog** will be released soon
 
 
-## <div id="demos">Demos (LLaMA-Adapter V2)</div>
-
-### -> Chatbot System
-
-<img src="docs/chat_demo.png" width="80%" />
-
-
-<!-- | <img src="docs/multi_model_example_1.png" /> | <img src="docs/multi_model_example_2.png" /> |
-|---|---|
-| <img src="docs/multi_model_example_3.png" /> | <img src="docs/multi_model_example_4.png" /> | -->
-
-
-
-
 ## Overview
 Efficiency Comparison:
 | Model | Parameters | Storage Space | Training Time

llama_adapter_v2_chat65b/README.md

Lines changed: 3 additions & 0 deletions
@@ -75,6 +75,9 @@ conda env create -f environment.yml
 
 * Use Ctrl+C to exit the demo at any time.
 
+## Demo
+<img src="../docs/chat_demo.png" width="80%" />
+
 ## Known issues
 
 * Some users may experience the error `RuntimeError: Expected is_sm80 to be true, but got false.` (mostly sm_86 GPU users, including A6000, A5000 and 3090). This is because we changed the attention module to use `torch.nn.functional.scaled_dot_product_attention` if it exists, but a [dispatch logic error](https://github.com/pytorch/pytorch/issues/94883) in PyTorch 2.0.0 causes failure on some GPU architectures. Affected users can upgrade to PyTorch >= 2.1.0 or the nightly build, in which the bug is fixed.
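For users who cannot upgrade right away, a minimal illustrative sketch of the kind of guard described above (this is not the repository's actual attention module; the function name and the `prefer_fused` flag are placeholders):

```python
import math

import torch
import torch.nn.functional as F


def attention(q, k, v, prefer_fused=True):
    """Scaled dot-product attention with an optional fused-kernel path."""
    if prefer_fused and hasattr(F, "scaled_dot_product_attention"):
        # Fused path (PyTorch >= 2.0); pass prefer_fused=False on builds
        # hit by the sm_86 dispatch bug until you can move to >= 2.1.0.
        return F.scaled_dot_product_attention(q, k, v)
    # Manual reference path, equivalent up to floating-point differences.
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    return torch.matmul(F.softmax(scores.float(), dim=-1).type_as(q), v)
```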

llama_adapter_v2_multimodal/README.md

Lines changed: 8 additions & 3 deletions
@@ -1,7 +1,7 @@
 # LLaMA-Adapter-V2 Multi-modal
 
 ## News
-
+* [July 5, 2023] Release pre-training and fine-tuning codes.
 * [May 26, 2023] Initial release.
 
 
@@ -23,7 +23,7 @@
 └── tokenizer.model
 ```
 
-## Usage
+## Inference
 
 Here is a simple inference script for LLaMA-Adapter V2. The pre-trained model will be downloaded directly from [Github Release](https://github.com/ZrrSkywalker/LLaMA-Adapter/releases/tag/v.2.0.0).
 
@@ -37,7 +37,9 @@ device = "cuda" if torch.cuda.is_available() else "cpu"
 
 llama_dir = "/path/to/LLaMA/"
 
+# choose from BIAS-7B, LORA-BIAS-7B
 model, preprocess = llama.load("BIAS-7B", llama_dir, device)
+model.eval()
 
 prompt = llama.format_prompt("Please introduce this painting.")
 img = Image.fromarray(cv2.imread("../docs/logo_v1.png"))
@@ -71,4 +73,7 @@ import llama
 print(llama.available_models())
 ```
 
-Now we provide `BIAS-7B`, which fine-tunes the `bias` and `norm` parameters of LLaMA. We will include more pretrained models in the future, such as the LoRA fine-tuning model `LoRA-7B` and partial-tuning model `PARTIAL-7B`.
+Now we provide `BIAS-7B`, which fine-tunes the `bias` and `norm` parameters of LLaMA, and `LORA-BIAS-7B`, which fine-tunes the `bias`, `norm` and `lora` parameters of LLaMA. We will include more pretrained models in the future, such as the LoRA fine-tuning model `LORA-7B` and the partial-tuning model `PARTIAL-7B`.
+
+## Pre-training & Fine-tuning
+See [train.md](docs/train.md)
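As a usage note, a minimal sketch of loading the newly added `LORA-BIAS-7B` checkpoint with the same API shown in this README (the LLaMA weights path is a placeholder):

```python
import cv2
import torch
from PIL import Image

import llama

device = "cuda" if torch.cuda.is_available() else "cpu"
llama_dir = "/path/to/LLaMA/"  # placeholder: directory with the original LLaMA weights

print(llama.available_models())  # should list BIAS-7B and LORA-BIAS-7B

# LORA-BIAS-7B additionally tunes LoRA weights on top of the bias/norm tuning
model, preprocess = llama.load("LORA-BIAS-7B", llama_dir, device)
model.eval()

prompt = llama.format_prompt("Please introduce this painting.")
img = Image.fromarray(cv2.imread("../docs/logo_v1.png"))
img = preprocess(img).unsqueeze(0).to(device)

print(model.generate(img, [prompt])[0])
```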

llama_adapter_v2_multimodal/demo.py

Lines changed: 3 additions & 2 deletions
@@ -7,13 +7,14 @@
 
 llama_dir = "/path/to/LLaMA/"
 
+# choose from BIAS-7B, LORA-BIAS-7B
 model, preprocess = llama.load("BIAS-7B", llama_dir, device)
 model.eval()
 
 prompt = llama.format_prompt('Please introduce this painting.')
-img = Image.fromarray(cv2.imread("./docs/logo_v1.png"))
+img = Image.fromarray(cv2.imread("../docs/logo_v1.png"))
 img = preprocess(img).unsqueeze(0).to(device)
 
 result = model.generate(img, [prompt])[0]
 
-print(result)
+print(result)
llama_adapter_v2_multimodal/docs/train.md

Lines changed: 40 additions & 14 deletions
@@ -1,9 +1,10 @@
-The training process of LLaMA-Adapter V2 consists of the pre-training and fine-tuning phases.
+The training process of LLaMA-Adapter V2 consists of the pre-training and fine-tuning phases.
 
 ## Pre-training
+
 ### Data
-* We use multiple datasets with **image-text pairs** for pre-training. The texts are English-only.
 
+* We use multiple datasets with **image-text pairs** for pre-training. The texts are English-only.
 * For each dataset, the meta file should be organized in the `.csv` format as following:
 
 ```
@@ -14,8 +15,8 @@ The training process of LLaMA-Adapter V2 consists of the pre-training and fine-t
 ```
 
 Alternatively, you may modify the [`PretrainDataset`](/data/dataset.py) implementation to adapt to your own meta file format.
-
 * Write a `.yaml` config file to specify the datasets for pre-training:
+
 ```
 META:
 - '/path/to/cc3m.csv'
@@ -25,29 +26,25 @@ The training process of LLaMA-Adapter V2 consists of the pre-training and fine-t
 
 ### Start pre-training
 
-We are now ready to start pre-training (please make sure that the original LLaMA / Open-Chinese-LLaMA weights are available in `/path/to/llama_model_weights`).
+We are now ready to start pre-training (please make sure that the original LLaMA weights are available in `/path/to/llama_model_weights`).
 
 ```bash
 . exps/pretrain.sh /path/to/llama_model_weights /path/to/pretrain-data-config.yaml /output/path
 ```
 
-
-
 ## Fine-tuning
 
 ### Data
 
 * We fine-tune LLaMA-Adapter V2 on text-only as well as image-text instruction following datasets.
-
 * The following lists the datasets we use for training our release weights:
 
-| Name | Link |
-| ------------------------ | ------------------------------------------------------------ |
-| alpaca_gpt4_data.json | [File Link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data.json) |
+| Name | Link |
+| ------------------------ | ------------------------------------------------------------------------------------------------------------ |
+| alpaca_gpt4_data.json | [File Link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data.json) |
 | alpaca_gpt4_data_zh.json | [File Link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data_zh.json) |
-| llava_instruct_150k.json | [File Link](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/raw/main/llava_instruct_150k.json) |
-| alpaca_data_zh_51k.json | [File Link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/data/alpaca_data_zh_51k.json) |
-
+| llava_instruct_150k.json | [File Link](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/raw/main/llava_instruct_150k.json) |
+| alpaca_data_zh_51k.json | [File Link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/data/alpaca_data_zh_51k.json) |
 * Similar to pre-training, write a `.yaml` config file to specify the datasets for fine-tuning:
 
 ```
6158

6259
```bash
6360
. exps/finetune.sh \
64-
/path/to/llama_model_weights /path/to/pre-trained/checkopint.pth \
61+
/path/to/llama_model_weights /path/to/pre-trained/checkpoint.pth \
6562
/path/to/finetune-data-config.yaml /output/path
6663
```
6764

65+
### Test and Save
66+
67+
```python
68+
import os
69+
from llama.llama_adapter import LLaMA_adapter
70+
import util.misc as misc
71+
import util.extract_adapter_from_checkpoint as extract
72+
73+
device = "cuda" if torch.cuda.is_available() else "cpu"
74+
75+
llama_dir = "path/to/llama/"
76+
llama_type = '7B'
77+
llama_ckpt_dir = os.path.join(llama_dir, llama_type)
78+
llama_tokenzier_path = os.path.join(llama_dir, 'tokenizer.model')
79+
model = LLaMA_adapter(llama_ckpt_dir, llama_tokenzier_path)
80+
81+
misc.load_model(model, 'path/to/finetune/checkpoint.pth')
82+
model.eval()
83+
model.to(device)
84+
85+
prompt = llama.format_prompt('your prompt')
86+
img = Image.fromarray(cv2.imread("your image"))
87+
img = model.clip_transform(img).unsqueeze(0).to(device)
88+
89+
result = model.generate(img, [prompt])[0]
90+
print(result)
91+
92+
extract.save(model,'path/to/adapter-7B.pth','BIAS') # Please end it with -llama_type.pth
93+
```
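Note that the `Test and Save` snippet above also relies on `torch`, `llama`, `cv2` and `PIL.Image` without importing them. A self-contained variant of the same steps (all paths are placeholders) might look like:

```python
import os

import cv2
import torch
from PIL import Image

import llama
from llama.llama_adapter import LLaMA_adapter
import util.misc as misc
import util.extract_adapter_from_checkpoint as extract

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder paths: point these at your LLaMA weights and fine-tuned checkpoint.
llama_dir = "/path/to/llama/"
llama_type = "7B"
llama_ckpt_dir = os.path.join(llama_dir, llama_type)
llama_tokenizer_path = os.path.join(llama_dir, "tokenizer.model")

model = LLaMA_adapter(llama_ckpt_dir, llama_tokenizer_path)
misc.load_model(model, "/path/to/finetune/checkpoint.pth")
model.eval()
model.to(device)

prompt = llama.format_prompt("your prompt")
img = Image.fromarray(cv2.imread("/path/to/your_image.png"))
img = model.clip_transform(img).unsqueeze(0).to(device)

print(model.generate(img, [prompt])[0])

# Save only the adapter weights; the filename should end with -<llama_type>.pth.
extract.save(model, "/path/to/adapter-7B.pth", "BIAS")
```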

llama_adapter_v2_multimodal/llama/llama.py

Lines changed: 7 additions & 0 deletions
@@ -26,6 +26,7 @@ class ModelArgs:
     w_bias: bool = False # use bias tuning
     w_lora: bool = False # use lora tuning
     lora_rank: int = 16
+    w_new_gate: bool = False # for compatibility
 
 
 class RMSNorm(torch.nn.Module):
@@ -125,6 +126,10 @@ def __init__(self, args: ModelArgs):
         self.cache_v = None
 
         self.gate = torch.nn.Parameter(torch.zeros(1, self.n_local_heads, 1, 1))
+
+        self.w_new_gate = args.w_new_gate
+        if args.w_new_gate:
+            self.new_gate = torch.nn.Parameter(torch.ones(1, 1, 1, 1))
 
 
     def train(self, mode: bool = True):
@@ -194,6 +199,8 @@ def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask
             if adapter_len > 1:
                 adapter_scores = torch.matmul(xq, adapter_k.transpose(2, 3)) / math.sqrt(self.head_dim)
                 adapter_scores = self.gate.tanh() * F.softmax(adapter_scores.float(), dim=-1).type_as(xq)
+                if self.w_new_gate:
+                    adapter_scores = self.new_gate * adapter_scores
                 output = output + torch.matmul(adapter_scores, adapter_v)
             else:
                 output = output + self.gate.tanh() * adapter_v

llama_adapter_v2_multimodal/llama/llama_adapter.py

Lines changed: 14 additions & 3 deletions
@@ -20,6 +20,9 @@ def __init__(self, llama_ckpt_dir, llama_tokenizer,
                  v_embed_dim=768, v_depth=8,
                  v_num_heads=16, v_mlp_ratio=4.0,
                  query_len=10, query_layer=31,
+                 w_bias=False,
+                 w_lora=False, lora_rank=16,
+                 w_new_gate=False,
                  phase="finetune"):
         super().__init__()
 
@@ -58,6 +61,9 @@ def __init__(self, llama_ckpt_dir, llama_tokenizer,
 
         # 5. llama
         model_args.w_bias = w_bias
+        model_args.w_lora = w_lora
+        model_args.lora_rank = lora_rank
+        model_args.w_new_gate = w_new_gate
         model_args.vocab_size = self.tokenizer.n_words
         torch.set_default_tensor_type(torch.cuda.HalfTensor)
         self.llama = Transformer(model_args)
@@ -268,8 +274,10 @@ def generate(
         return decoded
 
 
+
 _MODELS = {
     "BIAS-7B": "https://github.com/OpenGVLab/LLaMA-Adapter/releases/download/v.2.0.0/7fa55208379faf2dd862565284101b0e4a2a72114d6490a95e432cf9d9b6c813_BIAS-7B.pth",
+    "LORA-BIAS-7B": "https://github.com/OpenGVLab/LLaMA-Adapter/releases/download/v.2.0.0/1bcbffc43484332672092e0024a8699a6eb5f558161aebf98a7c6b1db67224d1_LORA-BIAS-7B.pth",
     # "LORA16-7B": "",
     # "PARTIAL-7B": ""
 }
@@ -284,10 +292,8 @@ def load(name, llama_dir, device="cuda" if torch.cuda.is_available() else "cpu",
     elif os.path.isfile(name):
         model_path = name
     else:
-        return RuntimeError(f"Model {name} not found; available models = {available_models()}")
+        return RuntimeError(f"Model {name} not found; available models = {available_models()}"), None
 
-    ckpt = torch.load(model_path, map_location='cpu')
-
     # BIAS-7B or https://xxx/sha256_BIAS-7B.pth -> 7B
     llama_type = name.split('.')[0].split('-')[-1]
     llama_ckpt_dir = os.path.join(llama_dir, llama_type)
@@ -296,6 +302,7 @@ def load(name, llama_dir, device="cuda" if torch.cuda.is_available() else "cpu",
     # load llama_adapter weights and model_cfg
     print(f'Loading LLaMA-Adapter from {model_path}')
     ckpt = torch.load(model_path, map_location='cpu')
+    model_cfg = ckpt.get('config', {})
 
     model = LLaMA_adapter(
         llama_ckpt_dir, llama_tokenzier_path,
@@ -304,6 +311,10 @@ def load(name, llama_dir, device="cuda" if torch.cuda.is_available() else "cpu",
         v_embed_dim=768, v_depth=8,
         v_num_heads=16, v_mlp_ratio=4.0,
         query_len=10, query_layer=31,
+        w_bias=model_cfg.get('w_bias', False),
+        w_lora=model_cfg.get('w_lora', False),
+        lora_rank=model_cfg.get('lora_rank', 16),
+        w_new_gate=model_cfg.get('w_lora', False), # for compatibility
         phase=phase)
 
     load_result = model.load_state_dict(ckpt['model'], strict=False)
llama_adapter_v2_multimodal/util/extract_adapter_from_checkpoint.py

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+import torch
+
+def save(full_model, path, model_type = 'BIAS'):
+    if model_type == 'BIAS':
+        keys = [
+            f'visual_blocks.{i}.{key}.{suffix}'
+            for i in range(8)
+            for key in ['norm1', 'attn.qkv', 'attn.proj', 'norm2', 'mlp.fc1', 'mlp.fc2']
+            for suffix in ['weight', 'bias']
+        ] + [
+            f'llama.layers.{i}.{key}'
+            for i in range(32)
+            for key in ['attention.gate', 'attention.wq.bias', 'attention.wo.bias', 'feed_forward.w1.bias', 'feed_forward.w2.bias', 'feed_forward.w3.bias', 'attention_norm.weight', 'ffn_norm.weight']
+        ] + [
+            f'{base_key}.{suffix}'
+            for base_key in ['clip_proj_norm', 'visual_proj_norm', 'visual_proj', 'clip_proj']
+            for suffix in ['weight', 'bias']
+        ] + ['llama.norm.weight', 'visual_query.weight', 'adapter_query.weight']
+
+
+    elif model_type == 'LORA':
+        keys = [
+            f'visual_blocks.{i}.{key}.{suffix}'
+            for i in range(8)
+            for key in [f'norm{j}' for j in range(1, 3)] + ['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2']
+            for suffix in ['weight', 'bias']
+        ] + [
+            f'llama.layers.{i}.{key}'
+            for i in range(32)
+            for key in ['attention.gate', 'attention.wq.bias', 'attention.wo.bias', 'feed_forward.w1.bias', 'feed_forward.w2.bias', 'feed_forward.w3.bias', 'attention_norm.weight', 'ffn_norm.weight']
+            + [f'attention.lora_wk_l{j}.weight' for j in range(1, 3)]
+            + [f'attention.lora_wo_l{j}.weight' for j in range(1, 3)]
+            + [f'feed_forward.lora_w{k}_l{j}.weight' for k in range(1, 4) for j in range(1, 3)]
+            + [f'attention.lora_wq_l{j}.weight' for j in range(1, 3)]
+            + [f'attention.lora_wv_l{j}.weight' for j in range(1, 3)]
+            + ['attention.new_gate']
+        ] + [
+            f'{base_key}.{suffix}'
+            for base_key in ['clip_proj_norm', 'visual_proj_norm', 'visual_proj', 'clip_proj']
+            for suffix in ['weight', 'bias']
+        ] + ['llama.norm.weight', 'visual_query.weight', 'adapter_query.weight']
+
+    ## TODO: Add other model types
+
+    full_model_state_dict = full_model.state_dict()
+    small_weights = {key: full_model_state_dict[key] for key in keys}
+    if model_type == 'BIAS':
+        wrapped_small_weights = {'model': small_weights, 'config': {'w_bias': True, 'w_lora': False, 'lora_rank': 16}}
+    elif model_type == 'LORA':
+        wrapped_small_weights = {'model': small_weights, 'config': {'w_bias': True, 'w_lora': True, 'lora_rank': 16}}
+    # Save the wrapped small weights
+    torch.save(wrapped_small_weights, path)
