
Commit 2aaa3ab

support add_answer for vlm models (#213)
1 parent 9b91523 commit 2aaa3ab

12 files changed: +99 -23 lines

README.md

Lines changed: 4 additions & 2 deletions
@@ -110,7 +110,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

 - 💥**Supported Formats**: Supports both ✨`quantization` (integer and floating-point) and ✨`sparsity`, specifically including ✅weight-activation, ✅weight-only, ✅mixed-precision quantization, as well as ✅structured and ✅unstructured sparsity.

-- 💥**Wide Model Support**: Offers support for a diverse array of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, among others, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen-vl) models (see [Supported Model List](#supported-model-list)).
+- 💥**Wide Model Support**: Offers support for a diverse array of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, among others, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen2-vl) models (see [Supported Model List](#supported-model-list)).

 - 💥**Multi-backend Compatibility**: Seamlessly integrates with various backends for enhanced deployment flexibility. Multiple quantization settings and model formats are compatible with a wide range of backends and hardware platforms, such as ✅VLLM, ✅Sglang, ✅LightLLM, ✅MLC-LLM, and ✅AutoAWQ, making it highly versatile(see Section `Backend` [here](https://llmc-en.readthedocs.io/en/latest/)).

@@ -166,7 +166,9 @@ Please refer to the 🚀`Quick Start` section in the [documentation](https://llm

 [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

-[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)
+[Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+
+[InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

 You can add your own model type referring to files under `llmc/models/*.py`.

README_ja.md

Lines changed: 4 additions & 2 deletions
@@ -108,7 +108,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

 - 💥**Supported Formats**: Supports both ✨`quantization` (integer and floating-point) and ✨`sparsity`, specifically including ✅weight-activation, ✅weight-only, ✅mixed-precision quantization, as well as ✅structured and ✅unstructured sparsity.

-- 💥**Wide Model Support**: Supports a diverse range of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, among others, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen-vl) models (see the [Supported Model List](#supported-model-list)).
+- 💥**Wide Model Support**: Supports a diverse range of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, among others, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen2-vl) models (see the [Supported Model List](#supported-model-list)).

 - 💥**Multi-backend Compatibility**: Seamlessly integrates with multiple backends for enhanced deployment flexibility. A variety of quantization settings and model formats are compatible with a wide range of backends and hardware platforms, such as ✅VLLM, ✅Sglang, ✅LightLLM, ✅MLC-LLM, and ✅AutoAWQ, offering great flexibility (see the `Backend` section [here](https://llmc-en.readthedocs.io/en/latest/)).

@@ -164,7 +164,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

 [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

-[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)
+[Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+
+[InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

 To add your own model type, refer to the files under `llmc/models/*.py`.

README_zh.md

Lines changed: 4 additions & 2 deletions
@@ -108,7 +108,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

 - 💥**Supported Formats**: Supports ✨`quantization` (integer and floating-point) and ✨`sparsification`, specifically including ✅weight-activation quantization, ✅weight-only quantization, ✅mixed-precision quantization, as well as ✅structured and ✅unstructured sparsification.

-- 💥**Wide Model Support**: Supports a wide variety of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, and more, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen-vl) models (see the [Supported Model List](#supported-model-list)).
+- 💥**Wide Model Support**: Supports a wide variety of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, and more, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen2-vl) models (see the [Supported Model List](#supported-model-list)).

 - 💥**Multi-backend Compatibility**: Seamlessly integrates with multiple backends for greater deployment flexibility. A variety of quantization settings and model formats are compatible with a wide range of backends and hardware platforms, such as ✅VLLM, ✅Sglang, ✅LightLLM, ✅MLC-LLM, and ✅AutoAWQ, making it highly flexible (see the ✨`Backend` section [here](https://llmc-zhcn.readthedocs.io/en/latest/)).

@@ -164,7 +164,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

 [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

-[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)
+[Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+
+[InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

 You can refer to the files under `llmc/models/*.py` to add your own model type.

configs/quantization/methods/Awq/awq_w_only_custom_vlm_data_padding.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ calib:
     type: img_txt
     download: False
     path: calib data path
+    add_answer: False # Default is False. If set to True, answers are appended to the calib data.
     n_samples: 3
     bs: -1
     seq_len: 512
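
The new `add_answer` flag only takes effect when the calibration set actually carries answers. Judging from the changes further down (`vlm_general` reads `img_qa.json` from the calib path, and the model classes consume `img`, `question`, and `answer` fields), a custom VLM calibration folder could be prepared roughly as in this sketch; the directory layout and Q/A content are illustrative assumptions, and the exact `img` schema (single path vs. list of paths) may differ per model.

```python
import json
import os

# Hedged sketch of a custom VLM calibration set, inferred from this commit:
# `vlm_general` loads `img_qa.json` from the calib path, and each record
# exposes 'img', 'question', and 'answer'. Paths and Q/A text are made up.
calib_dir = 'calib data path'  # placeholder, as in the config above
entries = [
    {
        'img': os.path.join(calib_dir, 'images/0001.jpg'),
        'question': '<image>\nWhat is shown in this picture?',
        'answer': 'A cat sitting on a windowsill.',
    },
    {
        'img': os.path.join(calib_dir, 'images/0002.jpg'),
        'question': '<image>\nHow many people are in the image?',
        'answer': 'Two.',
    },
]

os.makedirs(calib_dir, exist_ok=True)
with open(os.path.join(calib_dir, 'img_qa.json'), 'w') as fp:
    json.dump(entries, fp, indent=2)
```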
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+base:
+    seed: &seed 42
+model:
+    type: model_type
+    path: model path
+    tokenizer_mode: slow
+    torch_dtype: auto
+calib:
+    name: vlm_datastes
+    type: img_txt
+    download: False
+    path: calib data path
+    add_answer: False # Default is False. If set to True, answers are appended to the calib data.
+    n_samples: 3
+    bs: -1
+    seq_len: 512
+    preproc: vlm_general
+    padding: True
+    seed: *seed
+eval:
+    eval_pos: [pretrain, fake_quant]
+    type: img_txt
+    name: MME
+    download: False
+    path: MME dataset path
+    bs: 16
+    inference_per_block: False
+quant:
+    method: Awq
+    weight:
+        bit: 4
+        symmetric: False
+        granularity: per_group
+        group_size: 128
+    special:
+        trans: True
+        # The options for "trans_version" include "v1" and "v2".
+        # But their results don't differ significantly.
+        trans_version: v2
+        weight_clip: True
+        # For 2-bit quantization, setting "clip_sym: False" will yield better results.
+        clip_sym: False
+save:
+    save_trans: False
+    save_fake: False
+    save_path: /path/to/save/
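
The flag is optional in either config: the model code in this commit reads it with `self.config['calib'].get('add_answer', False)`, so omitting it keeps the old behaviour. A minimal sketch of that access pattern, assuming the config is loaded with PyYAML (llmc's own config loader may differ):

```python
import yaml

# Load a config like the ones above; the &seed/*seed anchors resolve automatically.
with open('configs/quantization/methods/Awq/awq_w_only_custom_vlm_data_padding.yml') as f:
    config = yaml.safe_load(f)

# Same fallback the model classes use: an absent flag means False (answers not added).
add_answer = config['calib'].get('add_answer', False)
print('add_answer enabled:', add_answer)
```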

llmc/data/dataset/base_dataset.py

Lines changed: 3 additions & 5 deletions
@@ -111,8 +111,6 @@ def get_calib_samples(self):
             preproc = PREPROC_REGISTRY[self.preproc]
             samples = preproc(
                 self.calib_dataset,
-                self.tokenizer,
-                self.batch_process,
                 self.n_samples
             )
         else:
@@ -222,15 +220,15 @@ def txt_group_samples_wo_mask(self, samples): # without mask
     def img_txt_group_samples_with_mask(self, samples):
         calib_samples = []
         if self.calib_bs < 0:
-            calib_samples.append(self.batch_process(samples))
+            calib_samples.append(self.batch_process(samples, calib_or_eval='calib'))
         elif self.calib_bs == 1:
-            calib_samples = [self.batch_process([sample]) for sample in samples]
+            calib_samples = [self.batch_process([sample], calib_or_eval='calib') for sample in samples]  # noqa
         elif self.calib_bs > 1:
             for i in range(0, len(samples), self.calib_bs):
                 start = i
                 end = min(i + self.calib_bs, len(samples))
                 batch = samples[start:end]
-                calib_samples.append(self.batch_process(batch))
+                calib_samples.append(self.batch_process(batch, calib_or_eval='calib'))
         return calib_samples

     def img_group_samples_wo_mask(self, samples):  # without mask
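
The `calib_or_eval` keyword defaults to `'eval'`, so evaluation callers that still invoke `batch_process(img_qas)` are untouched; only the calibration grouping above opts in. As a self-contained restatement (not the library code itself), the three `calib_bs` regimes map onto `batch_process` calls like this:

```python
def group_calib_samples(samples, calib_bs, batch_process):
    """Illustrative restatement of the dispatch above, not llmc's own code.

    calib_bs < 0  -> one batch containing every sample
    calib_bs == 1 -> one batch per sample
    calib_bs > 1  -> fixed-size chunks of `calib_bs` samples
    Each call passes calib_or_eval='calib' so models can apply
    calibration-only behaviour such as `add_answer`.
    """
    calib_samples = []
    if calib_bs < 0:
        calib_samples.append(batch_process(samples, calib_or_eval='calib'))
    elif calib_bs == 1:
        calib_samples = [batch_process([s], calib_or_eval='calib') for s in samples]
    elif calib_bs > 1:
        for start in range(0, len(samples), calib_bs):
            batch = samples[start:start + calib_bs]
            calib_samples.append(batch_process(batch, calib_or_eval='calib'))
    return calib_samples
```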

llmc/data/dataset/specified_preproc.py

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ def pileval_omni(calib_dataset, tokenizer, n_samples, seq_len):


 @PREPROC_REGISTRY
-def vlm_general(calib_dataset, tokenizer, batch_process, n_samples):
+def vlm_general(calib_dataset, n_samples):
     img_qa_json = os.path.join(calib_dataset, 'img_qa.json')
     fp = open(img_qa_json)
     img_qas = json.load(fp)
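
Only the head of the simplified `vlm_general` is visible in this hunk. A plausible end-to-end reading of the new signature, with the sub-sampling step being an assumption rather than the actual implementation:

```python
import json
import os


def vlm_general_sketch(calib_dataset, n_samples):
    # With `tokenizer` and `batch_process` removed from the signature, the
    # preproc only loads the raw image/question/answer records; batching and
    # prompt construction now happen later in the model's batch_process.
    img_qa_json = os.path.join(calib_dataset, 'img_qa.json')
    with open(img_qa_json) as fp:
        img_qas = json.load(fp)
    return img_qas[:n_samples]  # assumption: plain head-truncation to n_samples
```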

llmc/eval/eval_vlm.py

Lines changed: 0 additions & 5 deletions
@@ -35,11 +35,6 @@ def load_mme(self):
         return img_qas

     def patch_datasets(self, model_type):
-        if self.dataset == 'MME':
-            if model_type == 'InternVL2':
-                for idx in range(len(self.img_qas)):
-                    if '<image>\n' not in self.img_qas[idx]['question']:
-                        self.img_qas[idx]['question'] = '<image>\n' + self.img_qas[idx]['question']
         if model_type == 'InternVL2':
             self.output_include_input = False
         elif model_type == 'Llava':

llmc/models/internvl2.py

Lines changed: 8 additions & 1 deletion
@@ -137,8 +137,10 @@ def build_model(self):
             'Besides, you can also put the <image> into your calib dataset.'
         )

-    def batch_process(self, img_qas):
+    def batch_process(self, img_qas, calib_or_eval='eval'):
+        assert calib_or_eval == 'calib' or calib_or_eval == 'eval'
         questions = []
+        answers = []
         pixel_values_list = []
         num_patches_list = []
         for idx in range(len(img_qas)):
@@ -166,6 +168,7 @@ def batch_process(self, img_qas):
             else:
                 assert img_qas[idx]['question'].count('<image>') == len(img_path), f"{img_qas[idx]['img']} this data prompt is wrong."  # noqa
             questions.append(img_qas[idx]['question'])
+            answers.append(img_qas[idx]['answer'] + '<|im_end|>')

         pixel_values = (
             torch.cat(pixel_values_list, dim=0) if len(pixel_values_list) > 0 else None
@@ -189,6 +192,10 @@ def batch_process(self, img_qas):
             template.append_message(template.roles[0], question)
             template.append_message(template.roles[1], None)
             query = template.get_prompt()
+            if calib_or_eval == 'calib' and self.config['calib'].get('add_answer', False):
+                query += answers[idx]
+            if calib_or_eval == 'calib':
+                logger.info(f'Calib data is:\n{query}')
             for _num_patches_i in num_patches:
                 image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.vlm_model.num_image_token * _num_patches_i + IMG_END_TOKEN  # noqa
                 query = query.replace('<image>', image_tokens, 1)
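
The effect of `add_answer: True` on an InternVL2 calibration sample, shown schematically: the real prompt comes from `template.get_prompt()` above and its exact markup differs from this made-up example, but the appended suffix (ground-truth answer plus the `<|im_end|>` terminator) is what the diff adds.

```python
# Schematic only: the chat markup here is invented for illustration.
question = '<image>\nWhat is shown in this picture?'
answer = 'A cat sitting on a windowsill.'

# Without add_answer, the calibration prompt stops at the assistant turn:
prompt_without_answer = (
    f'<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n'
)
# With add_answer: True, the ground-truth answer plus '<|im_end|>' is appended,
# so calibration also covers the tokens the model would normally generate:
prompt_with_answer = prompt_without_answer + answer + '<|im_end|>'
print(prompt_with_answer)
```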

llmc/models/llava.py

Lines changed: 14 additions & 3 deletions
@@ -33,9 +33,11 @@ def build_model(self):

         self.processor = AutoProcessor.from_pretrained(self.model_path)

-    def batch_process(self, img_qas):
+    def batch_process(self, img_qas, calib_or_eval='eval'):
+        assert calib_or_eval == 'calib' or calib_or_eval == 'eval'
         messages = []
         images = []
+        answers = []
         for idx in range(len(img_qas)):
             img_path = img_qas[idx]['img']
             image = Image.open(img_path)
@@ -50,10 +52,19 @@ def batch_process(self, img_qas):
             ]
             messages.append(message)
             images.append(image)
+            answers.append(img_qas[idx]['answer'])
         texts = [
-            self.processor.apply_chat_template(msg, add_generation_prompt=True)
-            for msg in messages
+            self.processor.apply_chat_template(messages[n], add_generation_prompt=True)
+            for n in range(len(messages))
         ]
+        if calib_or_eval == 'calib' and self.config['calib'].get('add_answer', False):
+            texts = [
+                texts[n] + ' ' + answers[n]
+                for n in range(len(texts))
+            ]
+        if calib_or_eval == 'calib':
+            logger.info(f'Calib data is:\n{texts}')
+
         inputs = self.processor(
             text=texts,
             images=images,
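
Taken together, the InternVL2 and Llava changes define the contract a new VLM wrapper under `llmc/models/*.py` would need to honour after this commit. A hedged skeleton follows; the class name and the `build_prompt` helper are illustrative placeholders rather than llmc APIs, and a real wrapper would return processed tensors rather than strings.

```python
from loguru import logger


class MyVLMSketch:
    """Illustrative skeleton only; real llmc model classes also build the
    tokenizer/processor in build_model() and return model-ready inputs."""

    def __init__(self, config):
        self.config = config

    def batch_process(self, img_qas, calib_or_eval='eval'):
        # Defaulting to 'eval' keeps existing evaluation callers unchanged;
        # only the calibration path passes calib_or_eval='calib'.
        assert calib_or_eval in ('calib', 'eval')
        add_answer = (
            calib_or_eval == 'calib'
            and self.config['calib'].get('add_answer', False)
        )
        texts = []
        for item in img_qas:
            prompt = self.build_prompt(item['img'], item['question'])
            if add_answer:
                # Llava appends ' ' + answer; InternVL2 also adds '<|im_end|>'.
                prompt += ' ' + item['answer']
            texts.append(prompt)
        if calib_or_eval == 'calib':
            logger.info(f'Calib data is:\n{texts}')
        return texts

    def build_prompt(self, img, question):
        # Placeholder: apply the model's own chat/image template here.
        return question
```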
