README.md (4 additions, 2 deletions)

@@ -110,7 +110,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

- 💥**Supported Formats**: Supports both ✨`quantization` (integer and floating-point) and ✨`sparsity`, specifically including ✅weight-activation, ✅weight-only, ✅mixed-precision quantization, as well as ✅structured and ✅unstructured sparsity.

-- 💥**Wide Model Support**: Offers support for a diverse array of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, among others, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen-vl) models (see [Supported Model List](#supported-model-list)).
+- 💥**Wide Model Support**: Offers support for a diverse array of ✨`LLM models`, including ✅LLama, ✅Mistral, ✅InternLM2, ✅Qwen2, among others, as well as ✅MOE(DeepSeekv2, Deepseekv2.5) and ✅VLM(Llama3.2-vision, Qwen2-vl) models (see [Supported Model List](#supported-model-list)).

- 💥**Multi-backend Compatibility**: Seamlessly integrates with various backends for enhanced deployment flexibility. Multiple quantization settings and model formats are compatible with a wide range of backends and hardware platforms, such as ✅VLLM, ✅Sglang, ✅LightLLM, ✅MLC-LLM, and ✅AutoAWQ, making it highly versatile(see Section `Backend` [here](https://llmc-en.readthedocs.io/en/latest/)).

@@ -166,7 +166,9 @@ Please refer to the 🚀`Quick Start` section in the [documentation](https://llm

✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

-✅ [Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)
+✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+
+✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

You can add your own model type referring to files under `llmc/models/*.py`.

README_ja.md (4 additions, 2 deletions)

@@ -108,7 +108,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

- 💥**サポートされているフォーマット**: ✨`量子化`(整数および浮動小数点)と ✨`疎性` の両方をサポートし、具体的には ✅重量-活性化、✅重量のみ、✅混合精度量子化、および ✅構造化疎性 と ✅非構造化疎性 を含みます。

-- 💥**広範なモデルサポート**: 多様な ✨`LLMモデル` をサポートしており、✅LLama、✅Mistral、✅InternLM2、✅Qwen2 など、さらに ✅✅MOE(DeepSeekv2, Deepseekv2.5) モデルや ✅VLM(Llama3.2-vision, Qwen-vl) モデルもサポートしています([サポートされているモデルリスト](#supported-model-list)を参照してください)。
+- 💥**広範なモデルサポート**: 多様な ✨`LLMモデル` をサポートしており、✅LLama、✅Mistral、✅InternLM2、✅Qwen2 など、さらに ✅✅MOE(DeepSeekv2, Deepseekv2.5) モデルや ✅VLM(Llama3.2-vision, Qwen2-vl) モデルもサポートしています([サポートされているモデルリスト](#supported-model-list)を参照してください)。

- 💥**マルチバックエンドの互換性**: 複数のバックエンドとシームレスに統合し、展開の柔軟性を強化します。さまざまな量子化設定およびモデルフォーマットが、✅VLLM、✅Sglang、✅LightLLM、✅MLC-LLM、✅AutoAWQ など、幅広いバックエンドおよびハードウェアプラットフォームと互換性があり、高い柔軟性を実現しています(`Backend`セクションは[こちら](https://llmc-en.readthedocs.io/en/latest/)をご覧ください)。

@@ -164,7 +164,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

-✅ [Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)
+✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+
+✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

独自のモデルタイプを追加するには、`llmc/models/*.py` ファイルを参照してください。

README_zh.md (4 additions, 2 deletions)

@@ -108,7 +108,7 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

- 💥**支持的格式**: 支持 ✨`量化`(整型和浮点)和 ✨`稀疏化`,具体包括 ✅权重激活量化、✅权重量化、✅混合精度量化,以及 ✅结构化 和 ✅非结构化稀疏化。

-- 💥**广泛模型支持**: 支持多种 ✨`LLM模型`,包括 ✅LLama、✅Mistral、✅InternLM2、✅Qwen2 等,以及 ✅MOE(DeepSeekv2, Deepseekv2.5) 和 ✅VLM(Llama3.2-vision, Qwen-vl) 模型(参见[支持的模型列表](#supported-model-list))。
+- 💥**广泛模型支持**: 支持多种 ✨`LLM模型`,包括 ✅LLama、✅Mistral、✅InternLM2、✅Qwen2 等,以及 ✅MOE(DeepSeekv2, Deepseekv2.5) 和 ✅VLM(Llama3.2-vision, Qwen2-vl) 模型(参见[支持的模型列表](#supported-model-list))。

- 💥**多后端兼容性**: 无缝集成多个后端,增强部署灵活性。多种量化设置和模型格式兼容广泛的后端和硬件平台,例如 ✅VLLM、✅Sglang、✅LightLLM、✅MLC-LLM 和 ✅AutoAWQ,使其高度灵活(参见✨`推理后端` 章节 [此处](https://llmc-zhcn.readthedocs.io/en/latest/))。

@@ -164,7 +164,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates

✅ [Qwen MOE](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)

-✅ [Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)
+✅ [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+
+✅ [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-2B)

你可以参考 `llmc/models/*.py` 文件添加自己的模型类型。

@@ -10,6 +10,7 @@ calib:
    type: img_txt
    download: False
    path: calib data path
+    add_answer: False # Default is False. If set to True, answers will be appended to the calib data.
    n_samples: 3
    bs: -1
    seq_len: 512
@@ -0,0 +1,46 @@
+base:
+    seed: &seed 42
+model:
+    type: model_type
+    path: model path
+    tokenizer_mode: slow
+    torch_dtype: auto
+calib:
+    name: vlm_datasets
+    type: img_txt
+    download: False
+    path: calib data path
+    add_answer: False # Default is False. If set to True, answers will be appended to the calib data.
+    n_samples: 3
+    bs: -1
+    seq_len: 512
+    preproc: vlm_general
+    padding: True
+    seed: *seed
+eval:
+    eval_pos: [pretrain, fake_quant]
+    type: img_txt
+    name: MME
+    download: False
+    path: MME dataset path
+    bs: 16
+    inference_per_block: False
+quant:
+    method: Awq
+    weight:
+        bit: 4
+        symmetric: False
+        granularity: per_group
+        group_size: 128
+    special:
+        trans: True
+        # The options for "trans_version" include "v1" and "v2",
+        # but their results don't differ significantly.
+        trans_version: v2
+        weight_clip: True
+        # For 2-bit quantization, setting "clip_sym: False" will yield better results.
+        clip_sym: False
+save:
+    save_trans: False
+    save_fake: False
+    save_path: /path/to/save/
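As a usage note: the new `add_answer` switch is read from the `calib` section at runtime. Below is a minimal sketch of that lookup, assuming the config above is saved locally as `awq_w4a16_vlm.yml` (a placeholder file name, not part of this PR):

```python
import yaml  # PyYAML

# Minimal sketch, assuming the config above was saved as
# "awq_w4a16_vlm.yml" (placeholder name). The lookup mirrors the
# config['calib'].get('add_answer', False) pattern used in the model code.
with open('awq_w4a16_vlm.yml') as f:
    config = yaml.safe_load(f)

add_answer = config['calib'].get('add_answer', False)  # defaults to False
print(f'Append ground-truth answers to calib prompts: {add_answer}')
```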
llmc/data/dataset/base_dataset.py (3 additions, 5 deletions)

@@ -111,8 +111,6 @@ def get_calib_samples(self):
        preproc = PREPROC_REGISTRY[self.preproc]
        samples = preproc(
            self.calib_dataset,
-            self.tokenizer,
-            self.batch_process,
            self.n_samples
        )
    else:
@@ -222,15 +220,15 @@ def txt_group_samples_wo_mask(self, samples):  # without mask
def img_txt_group_samples_with_mask(self, samples):
    calib_samples = []
    if self.calib_bs < 0:
-        calib_samples.append(self.batch_process(samples))
+        calib_samples.append(self.batch_process(samples, calib_or_eval='calib'))
    elif self.calib_bs == 1:
-        calib_samples = [self.batch_process([sample]) for sample in samples]
+        calib_samples = [self.batch_process([sample], calib_or_eval='calib') for sample in samples]  # noqa
    elif self.calib_bs > 1:
        for i in range(0, len(samples), self.calib_bs):
            start = i
            end = min(i + self.calib_bs, len(samples))
            batch = samples[start:end]
-            calib_samples.append(self.batch_process(batch))
+            calib_samples.append(self.batch_process(batch, calib_or_eval='calib'))
    return calib_samples

def img_group_samples_wo_mask(self, samples): # without mask
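For reference, the three `calib_bs` branches above implement whole-set, per-sample, and fixed-size batching. A self-contained sketch of the same rule, with `batch_process` stubbed out and all names outside the diff hypothetical:

```python
# Minimal sketch of the calib_bs batching rule used above; batch_process
# is stubbed out, and the function/variable names here are hypothetical.
def group_samples(samples, calib_bs, batch_process):
    if calib_bs < 0:    # one batch holding every sample
        return [batch_process(samples, calib_or_eval='calib')]
    if calib_bs == 1:   # one batch per sample
        return [batch_process([s], calib_or_eval='calib') for s in samples]
    # calib_bs > 1: fixed-size chunks (the last chunk may be smaller)
    return [
        batch_process(samples[i:i + calib_bs], calib_or_eval='calib')
        for i in range(0, len(samples), calib_bs)
    ]

# Example: 5 samples with calib_bs=2 produce batches of sizes [2, 2, 1].
fake_batch = lambda batch, calib_or_eval: len(batch)
print(group_samples(list(range(5)), 2, fake_batch))  # [2, 2, 1]
```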
llmc/data/dataset/specified_preproc.py (1 addition, 1 deletion)

@@ -102,7 +102,7 @@ def pileval_omni(calib_dataset, tokenizer, n_samples, seq_len):


@PREPROC_REGISTRY
-def vlm_general(calib_dataset, tokenizer, batch_process, n_samples):
+def vlm_general(calib_dataset, n_samples):
    img_qa_json = os.path.join(calib_dataset, 'img_qa.json')
    fp = open(img_qa_json)
    img_qas = json.load(fp)
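`vlm_general` now only locates and loads `img_qa.json` under the calib data path; tokenization and batching happen later in each model's `batch_process`. Judging from the fields that `batch_process` reads, a record looks roughly like the sketch below (values invented for illustration):

```python
# Hypothetical img_qa.json contents, inferred from the fields that
# batch_process reads (img / question / answer); all values are made up.
example_img_qas = [
    {
        'img': 'images/0001.jpg',   # image path; may be None for text-only samples
        'question': '<image>\nWhat is shown in this picture?',  # InternVL2-style prompts may embed <image>
        'answer': 'A cat sitting on a sofa.',  # used when add_answer is True
    },
]
```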
llmc/eval/eval_vlm.py (0 additions, 5 deletions)

@@ -35,11 +35,6 @@ def load_mme(self):
    return img_qas

def patch_datasets(self, model_type):
-    if self.dataset == 'MME':
-        if model_type == 'InternVL2':
-            for idx in range(len(self.img_qas)):
-                if '<image>\n' not in self.img_qas[idx]['question']:
-                    self.img_qas[idx]['question'] = '<image>\n' + self.img_qas[idx]['question']
    if model_type == 'InternVL2':
        self.output_include_input = False
    elif model_type == 'Llava':
llmc/models/internvl2.py (8 additions, 1 deletion)

@@ -137,8 +137,10 @@ def build_model(self):
        'Besides, you can also put the <image> into your calib dataset.'
    )

-def batch_process(self, img_qas):
+def batch_process(self, img_qas, calib_or_eval='eval'):
+    assert calib_or_eval == 'calib' or calib_or_eval == 'eval'
    questions = []
+    answers = []
    pixel_values_list = []
    num_patches_list = []
    for idx in range(len(img_qas)):
@@ -166,6 +168,7 @@ def batch_process(self, img_qas):
        else:
            assert img_qas[idx]['question'].count('<image>') == len(img_path), f"{img_qas[idx]['img']} this data prompt is wrong."  # noqa
        questions.append(img_qas[idx]['question'])
+        answers.append(img_qas[idx]['answer'] + '<|im_end|>')

    pixel_values = (
        torch.cat(pixel_values_list, dim=0) if len(pixel_values_list) > 0 else None
@@ -189,6 +192,10 @@ def batch_process(self, img_qas):
        template.append_message(template.roles[0], question)
        template.append_message(template.roles[1], None)
        query = template.get_prompt()
+        if calib_or_eval == 'calib' and self.config['calib'].get('add_answer', False):
+            query += answers[idx]
+        if calib_or_eval == 'calib':
+            logger.info(f'Calib data is:\n{query}')
        for _num_patches_i in num_patches:
            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.vlm_model.num_image_token * _num_patches_i + IMG_END_TOKEN  # noqa
            query = query.replace('<image>', image_tokens, 1)
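The net effect for InternVL2: when `add_answer` is enabled, the calibration query is the chat-templated question with the ground-truth answer (already carrying its `<|im_end|>` terminator) appended before `<image>` expansion. A toy illustration, using a simplified stand-in for the conversation template rather than InternVL2's real one:

```python
# Toy illustration of the add_answer behavior above. The prompt format
# is a simplified stand-in, not InternVL2's actual chat template.
def build_calib_query(question, answer, add_answer):
    query = f'<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n'
    if add_answer:
        # mirrors: query += answers[idx], where the stored answer
        # already ends with the '<|im_end|>' terminator
        query += answer + '<|im_end|>'
    return query

print(build_calib_query('<image>\nWhat is shown?', 'A cat.', add_answer=True))
```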
llmc/models/llava.py (14 additions, 3 deletions)

@@ -33,9 +33,11 @@ def build_model(self):

    self.processor = AutoProcessor.from_pretrained(self.model_path)

-def batch_process(self, img_qas):
+def batch_process(self, img_qas, calib_or_eval='eval'):
+    assert calib_or_eval == 'calib' or calib_or_eval == 'eval'
    messages = []
    images = []
+    answers = []
    for idx in range(len(img_qas)):
        img_path = img_qas[idx]['img']
        image = Image.open(img_path)
@@ -50,10 +52,19 @@ def batch_process(self, img_qas):
        ]
        messages.append(message)
        images.append(image)
+        answers.append(img_qas[idx]['answer'])
    texts = [
-        self.processor.apply_chat_template(msg, add_generation_prompt=True)
-        for msg in messages
+        self.processor.apply_chat_template(messages[n], add_generation_prompt=True)
+        for n in range(len(messages))
    ]
+    if calib_or_eval == 'calib' and self.config['calib'].get('add_answer', False):
+        texts = [
+            texts[n] + ' ' + answers[n]
+            for n in range(len(texts))
+        ]
+    if calib_or_eval == 'calib':
+        logger.info(f'Calib data is:\n{texts}')
+
    inputs = self.processor(
        text=texts,
        images=images,
llmc/models/mllama.py (2 additions, 1 deletion)

@@ -38,7 +38,8 @@ def build_model(self):
    self.model = self.vlm_model.language_model
    self.model_config = self.vlm_model_config.text_config

-def batch_process(self, img_qas):
+def batch_process(self, img_qas, calib_or_eval='eval'):
+    assert calib_or_eval == 'calib' or calib_or_eval == 'eval'
    if len(img_qas) == 1:
        return self.single_process(img_qas[0])
    processor = AutoProcessor.from_pretrained(self.model_path)
llmc/models/qwen2vl.py (12 additions, 1 deletion)

@@ -60,8 +60,10 @@ def build_model(self):
        max_pixels=self.max_pixels
    )

-def batch_process(self, img_qas):
+def batch_process(self, img_qas, calib_or_eval='eval'):
+    assert calib_or_eval == 'calib' or calib_or_eval == 'eval'
    messages = []
+    answers = []
    for idx in range(len(img_qas)):
        img_path = img_qas[idx]['img']
        if img_path is not None:
@@ -87,10 +89,19 @@ def batch_process(self, img_qas):
            }
        ]
        messages.append(message)
+        answers.append(img_qas[idx]['answer'] + '<|im_end|>')
    texts = [
        self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
        for msg in messages
    ]
+    if calib_or_eval == 'calib' and self.config['calib'].get('add_answer', False):
+        texts = [
+            texts[n] + answers[n]
+            for n in range(len(texts))
+        ]
+    if calib_or_eval == 'calib':
+        logger.info(f'Calib data is:\n{texts}')
+
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = self.processor(
        text=texts,
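Taken together, a calibration call for Qwen2-VL with `add_answer` enabled might look like the sketch below; `model` stands for the llmc Qwen2VL wrapper shown above, and the record values and construction step are invented placeholders, not the verified llmc API:

```python
# Hypothetical calibration batch for Qwen2-VL with add_answer enabled.
img_qas = [
    {'img': 'images/0001.jpg',
     'question': 'What is shown in this picture?',
     'answer': 'A cat sitting on a sofa.'},
]

# model = Qwen2VL(config)   # construction elided; illustrative only
# inputs = model.batch_process(img_qas, calib_or_eval='calib')
# With config['calib']['add_answer'] = True, each chat-templated prompt
# gets its ground-truth answer plus '<|im_end|>' appended before the
# processor tokenizes the batch.
```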