
Commit 47b0fca

SNHIPOW and anxiangsir authored

Downstream (#83)

* add downstream branch
* update performance
* fix some bugs

Co-authored-by: Xiang An <anxiangsir@outlook.com>

1 parent b32f4cb · commit 47b0fca

File tree

4 files changed: +17 −5 lines changed

downstream/README.md

Lines changed: 6 additions & 0 deletions

```diff
@@ -9,6 +9,12 @@
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=multi-label-cluster-discrimination-for-visual)
 
 
+# MLCD-Seg
+[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-MLCD_SEG_Model-yellow)](https://huggingface.co/DeepGlint-AI/MLCD-Seg-7B)
+
+This repository researches the application of multimodal large models to downstream tasks through an end-to-end approach. At present, the segmentation part has achieved excellent results on the referring expression segmentation task.
+
+
 ## RefCOCO Segmentation Evaluation:
 
 | Dataset | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2| LISA |
```

downstream/eval/eval/model_vqa_refcoco.py

Lines changed: 3 additions & 1 deletion

```diff
@@ -9,7 +9,9 @@
 import numpy as np
 
 import sys
-sys.path.insert(0,'./downstream/llava')
+
+sys.path.insert(0,'.')
+
 from llava.constants import IGNORE_INDEX, IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_SEG_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN
 from llava.conversation import conv_templates, SeparatorStyle
 from llava.model.builder import load_pretrained_model
```
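The effect of the path change above can be sketched in isolation: prepending an entry to `sys.path` controls where the subsequent `from llava...` imports resolve from. A minimal sketch (no `llava` package is assumed to actually be present):

```python
import sys

# Mirror of the commit's change: prepend the current working directory so a
# "llava" package located there wins import resolution. Before the commit, a
# hard-coded "./downstream/llava" path was prepended instead, which only
# resolved correctly when the script was launched from the repository root.
sys.path.insert(0, '.')

# The prepended entry is consulted first by the import machinery.
print(sys.path[0])
```

The practical consequence is that the eval script is now expected to be launched from the directory that directly contains the `llava` package.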
Lines changed: 5 additions & 3 deletions
```diff
@@ -1,10 +1,12 @@
 json_path=./eval
 gpu_num=8
-checkpoints_name=./checkpoints/
+
 result_name=./eval/results
+train_image_path=/vlm/kunwu/data/llava_train_img/glamm_data
 
-model_name=llava-seg-DeepGlint-AI_mlcd-vit-large-patch14-336-Qwen_Qwen2.5-7B-Instruct-1.8m
+model_name=DeepGlint-AI/MLCD-Seg-7B
 echo $model_name
 
-./eval/script/eval_multiprocess.sh $checkpoints_name/$model_name $json_path/refcoco.json $result_name/$model_name/refcoco /vlm/kunwu/data/llava_train_img/glamm_data "" $gpu_num 0.2
+./eval/script/eval_multiprocess.sh $model_name $json_path/refcoco.json $result_name/$model_name/refcoco $train_image_path "" $gpu_num 0.2
 python ./eval/eval/evaluate_refcoco.py --result-dir $result_name/$model_name/refcoco
+
```
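One side effect of switching `model_name` from a local checkpoint name to a Hugging Face repo id is worth noting: the id contains a slash, so `$result_name/$model_name/refcoco` gains an extra directory level. A small sketch of the path composition (`pathlib` used purely for illustration):

```python
from pathlib import Path

# Sketch of how the result directory is composed in the updated script.
# model_name now holds a Hugging Face repo id ("org/name"), so the slash
# inside it adds one extra directory level under the results root.
result_name = "./eval/results"
model_name = "DeepGlint-AI/MLCD-Seg-7B"

result_dir = Path(result_name) / model_name / "refcoco"
print(result_dir.as_posix())  # eval/results/DeepGlint-AI/MLCD-Seg-7B/refcoco
```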

downstream/llava/model/builder.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change

```diff
@@ -54,7 +54,9 @@ def load_pretrained_model(model_path, model_base, model_name, load_8bit=False, l
         model_name = f"llava_qwen_{model_name}"
     if "llava-seg-DeepGlint" in model_name:
         model_name = f"llava_qwen_{model_name}"
-
+    if "MLCD" in model_name:
+        model_name = f"llava_qwen_{model_name}"
+
     if "llava" in model_name.lower() or is_multimodal:
         # Load LLaVA model
         if "lora" in model_name.lower() and model_base is None:
```
