Commit 9a895c4

New tasks supported: EMMA (#790)
* init emma
* remove log files
* Update .gitignore
* add minor changes
1 parent 0aaff1d commit 9a895c4

4 files changed: +482 -1 lines changed
.gitignore

Lines changed: 1 addition & 1 deletion
@@ -49,4 +49,4 @@ outputs/
 span.log
 uv.lock
 workspace/*
-.claude/*
+.claude/*

lmms_eval/tasks/emma/emma_all.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+dataset_path: lmms-lab/EMMA
+dataset_name: All # Options available are: "All" for all data, "Chemistry" for chemistry only, "Physics" for physics only, "Coding" for code only and "Math" for math only
+dataset_kwargs:
+  token: True
+  cache_dir: EMMA
+  force_download: true
+task: "emma"
+test_split: test
+output_type: generate_until
+doc_to_visual: !function utils.emma_doc_to_visual
+doc_to_text: !function utils.emma_doc_to_text
+doc_to_target: utils.emma_doc_to_target
+doc_to_messages: !function utils.emma_doc_to_messages
+generation_kwargs:
+  max_new_tokens: 4096
+  temperature: 0.7
+# The return value of process_results will be used by metrics
+process_results: !function utils.emma_process_results
+# Note that the metric name can be either a registed metric function (such as the case for GQA) or a key name returned by process_results
+metric_list:
+  - metric: emma_score
+    aggregation: !function utils.emma_aggregate_results
+    higher_is_better: true
+metadata:
+  strategy: CoT
+  interleaved_format: True
+  use_lmms_judge: True
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+dataset_path: lmms-lab/EMMA-mini
+dataset_name: All # Options available are: "All" for all data, "Chemistry" for chemistry only, "Physics" for physics only, "Coding" for code only and "Math" for math only
+dataset_kwargs:
+  token: True
+  cache_dir: EMMA-mini
+  force_download: true
+task: "emma-mini"
+test_split: test
+output_type: generate_until
+doc_to_visual: !function utils.emma_doc_to_visual
+doc_to_text: !function utils.emma_doc_to_text
+doc_to_target: utils.emma_doc_to_target
+generation_kwargs:
+  max_new_tokens: 4096
+  temperature: 0.7
+# The return value of process_results will be used by metrics
+process_results: !function utils.emma_process_results
+# Note that the metric name can be either a registed metric function (such as the case for GQA) or a key name returned by process_results
+metric_list:
+  - metric: emma_score
+    aggregation: !function utils.emma_aggregate_results
+    higher_is_better: true
+metadata:
+  strategy: CoT
+  interleaved_format: False
+  use_lmms_judge: True
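
Note on the metric wiring: both configs rely on the contract described in their comments, where `process_results` returns a dict keyed by the metric name (`emma_score`) and the `aggregation` function reduces the collected per-sample values to one number. The body of `utils.py` is not shown in this view, so the following is only a minimal sketch of that contract, not the commit's implementation. The function names, the `answer` and `subject` document fields, and the exact-match scoring are all assumptions made for illustration; since the configs set `use_lmms_judge: True`, the real scoring presumably involves a judge model rather than string comparison.

```python
from typing import Dict, List


def normalize(answer: str) -> str:
    """Lowercase and strip whitespace and a trailing period for lenient matching."""
    return answer.strip().rstrip(".").strip().lower()


def example_process_results(doc: Dict[str, str], results: List[str]) -> Dict[str, dict]:
    """Hypothetical per-sample hook: score one generation against its reference.

    The returned key must match the `metric` name listed under `metric_list`.
    The `answer` and `subject` fields are assumed, not taken from the dataset.
    """
    prediction = results[0]
    correct = normalize(prediction) == normalize(doc["answer"])
    return {"emma_score": {"subject": doc["subject"], "correct": correct}}


def example_aggregate_results(items: List[dict]) -> float:
    """Hypothetical aggregation hook: overall accuracy over per-sample records."""
    if not items:
        return 0.0
    return sum(1 for item in items if item["correct"]) / len(items)
```

In this sketch the harness would call `example_process_results` once per document, collect the values stored under `emma_score`, and pass that list to `example_aggregate_results`; `higher_is_better: true` fits because the result is an accuracy.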