Commit e5d743e

[Task] Added VSIBench debiased & pruned (#975)
* added vsibench_debiased and vsibench_pruned subset
* fixed yaml name
* fixed yaml files
1 parent 1a52f92 commit e5d743e

4 files changed, with 56 additions and 36 deletions.

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+dataset_path: nyu-visionx/VSI-Bench
+
+output_type: generate_until
+process_docs: !function utils.process_docs
+doc_to_visual: !function utils.vsibench_doc_to_visual
+doc_to_text: !function utils.vsibench_doc_to_text
+doc_to_target: "ground_truth"
+generation_kwargs:
+  max_new_tokens: 16
+  temperature: 0
+  top_p: 1.0
+  num_beams: 1
+  do_sample: false
+# The return value of process_results will be used by metrics
+process_results: !function utils.vsibench_process_results
+# Note that the metric name can be either a registed metric function (such as the case for GQA) or a key name returned by process_results
+metric_list:
+  - metric: vsibench_score
+    aggregation: !function utils.vsibench_aggregate_results
+    higher_is_better: true
+lmms_eval_specific_kwargs:
+  default:
+    pre_prompt: ""
+    mca_post_prompt: "Answer with the option's letter from the given choices directly."
+    na_post_prompt: "Please answer the question using a single word or phrase."
+  gemini_api:
+    pre_prompt: ""
+    mca_post_prompt: "Answer with the option's letter from the given choices directly."
+    na_post_prompt: "Do not response anything other than a single number!"
+  gpt4v:
+    pre_prompt: ""
+    mca_post_prompt: "Answer with the option's letter from the given choices directly."
+    na_post_prompt: "Do not response anything other than a single number!"
+metadata:
+  - version: 0.0
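The `lmms_eval_specific_kwargs` section of the template keys its prompts by model name, with a `default` entry. The sketch below shows one way such a table might be consumed. It is illustrative only: `select_kwargs` and `build_prompt` are hypothetical helpers, not the functions in `utils`, and reading `mca` as "multiple-choice answer" and `na` as "numerical answer" is an assumption based on the key names and prompt wording. The prompt strings are copied verbatim from the config.

```python
from typing import Dict

# Prompt table mirroring two entries of lmms_eval_specific_kwargs in the template.
PROMPT_KWARGS: Dict[str, Dict[str, str]] = {
    "default": {
        "pre_prompt": "",
        "mca_post_prompt": "Answer with the option's letter from the given choices directly.",
        "na_post_prompt": "Please answer the question using a single word or phrase.",
    },
    "gemini_api": {
        "pre_prompt": "",
        "mca_post_prompt": "Answer with the option's letter from the given choices directly.",
        "na_post_prompt": "Do not response anything other than a single number!",
    },
}


def select_kwargs(model_name: str) -> Dict[str, str]:
    """Return the model-specific prompts, falling back to 'default' (hypothetical helper)."""
    return PROMPT_KWARGS.get(model_name, PROMPT_KWARGS["default"])


def build_prompt(question: str, is_multiple_choice: bool, model_name: str = "default") -> str:
    """Assemble pre_prompt + question + post_prompt, skipping empty parts (hypothetical helper)."""
    kwargs = select_kwargs(model_name)
    # Assumption: 'mca' prompts apply to multiple-choice questions, 'na' prompts to numerical ones.
    post_key = "mca_post_prompt" if is_multiple_choice else "na_post_prompt"
    parts = [kwargs["pre_prompt"], question, kwargs[post_key]]
    return "\n".join(part for part in parts if part)


if __name__ == "__main__":
    print(build_prompt("How many chairs are in the room?", False, "gemini_api"))
```

A model name with no entry of its own, such as a hypothetical `"some_other_model"`, falls back to the `default` prompts in this sketch.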
Lines changed: 5 additions & 36 deletions
@@ -1,40 +1,9 @@
-dataset_path: nyu-visionx/VSI-Bench
+dataset_name: full
+test_split: test
+task: "vsibench"
 dataset_kwargs:
   token: True
   cache_dir: vsibench
   video: True
-task: vsibench
-test_split: test
-output_type: generate_until
-process_docs: !function utils.process_docs
-doc_to_visual: !function utils.vsibench_doc_to_visual
-doc_to_text: !function utils.vsibench_doc_to_text
-doc_to_target: "ground_truth"
-generation_kwargs:
-  max_new_tokens: 16
-  temperature: 0
-  top_p: 1.0
-  num_beams: 1
-  do_sample: false
-# The return value of process_results will be used by metrics
-process_results: !function utils.vsibench_process_results
-# Note that the metric name can be either a registed metric function (such as the case for GQA) or a key name returned by process_results
-metric_list:
-  - metric: vsibench_score
-    aggregation: !function utils.vsibench_aggregate_results
-    higher_is_better: true
-lmms_eval_specific_kwargs:
-  default:
-    pre_prompt: ""
-    mca_post_prompt: "Answer with the option's letter from the given choices directly."
-    na_post_prompt: "Please answer the question using a single word or phrase."
-  gemini_api:
-    pre_prompt: ""
-    mca_post_prompt: "Answer with the option's letter from the given choices directly."
-    na_post_prompt: "Do not response anything other than a single number!"
-  gpt4v:
-    pre_prompt: ""
-    mca_post_prompt: "Answer with the option's letter from the given choices directly."
-    na_post_prompt: "Do not response anything other than a single number!"
-metadata:
-  - version: 0.0
+include: _default_template_yaml
+
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+dataset_name: debiased
+test_split: test
+task: "vsibench_debiased"
+dataset_kwargs:
+  token: True
+  cache_dir: vsibench
+  video: True
+include: _default_template_yaml
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+dataset_name: full
+test_split: test
+task: "vsibench_pruned"
+dataset_kwargs:
+  token: True
+  cache_dir: vsibench
+  video: True
+include: _default_template_yaml
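
Each task file in this commit carries only its own keys plus `include: _default_template_yaml`, so the shared settings live in one place. The sketch below illustrates the general idea of resolving such an include as a shallow merge in which task-level keys override template keys. It is a minimal illustration under stated assumptions, not lmms-eval's actual loader: `resolve_include` is a hypothetical helper, the configs are plain dicts rather than parsed YAML, and only a few template keys are shown.

```python
from typing import Any, Dict

# Hypothetical stand-in for the parsed shared template (abbreviated).
DEFAULT_TEMPLATE: Dict[str, Any] = {
    "dataset_path": "nyu-visionx/VSI-Bench",
    "output_type": "generate_until",
    "doc_to_target": "ground_truth",
}

# Stand-in for the parsed vsibench_debiased task file from this commit.
DEBIASED_TASK: Dict[str, Any] = {
    "dataset_name": "debiased",
    "test_split": "test",
    "task": "vsibench_debiased",
    "dataset_kwargs": {"token": True, "cache_dir": "vsibench", "video": True},
    "include": "_default_template_yaml",
}


def resolve_include(task_cfg: Dict[str, Any], templates: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Shallow-merge a task config over the template it includes (hypothetical helper)."""
    cfg = dict(task_cfg)                      # copy so the input is not mutated
    template_name = cfg.pop("include", None)  # the include key itself is not kept
    merged = dict(templates[template_name]) if template_name else {}
    merged.update(cfg)                        # task-level keys take precedence
    return merged


resolved = resolve_include(DEBIASED_TASK, {"_default_template_yaml": DEFAULT_TEMPLATE})

if __name__ == "__main__":
    for key, value in resolved.items():
        print(f"{key}: {value}")
```

Under this reading, the three task files differ only in `task` and `dataset_name`, while generation settings, metrics, and prompts come from the template.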
