First of all, thank you for the great work! I ran into a few issues while trying to reproduce the results from the tutorial.

I first followed the tutorial to reproduce the Lynx ACC on the MSCOCO_ITM task, i.e., Table 18 in the paper. I used the following command:
```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 run_eval.py \
    --model lynx --model_name models/interfaces/lynx/configs/LYNX.yaml \
    --dataset_name MSCOCO --output_dir output/lynx/MSCOCO/test_generation/ \
    --per_gpu_eval_batch_size 4 --formulation SingleChoice \
    --infer_method generation --do_eval --half_evaluation --dataset_duplication 1 \
    --in_context_sample --option_mark upper \
    --dataset_config build/configs/ImageTextMatching_val.yaml \
    --offline_hf
```
With `generation` as the inference method, the results I got were rather strange:

```
2023-11-01 16:00:35,236 ReForm-Eval Evaluation INFO: the evalueted SingleChoice result: 0.0
2023-11-01 16:00:35,236 ReForm-Eval Evaluation INFO: the format hit rate is 0.0
```
If I use `likelihood` as the inference method instead, the results still differ from those in the paper:

```
2023-11-01 15:39:14,806 ReForm-Eval Evaluation INFO: the evalueted SingleChoice result: 0.5183333333333333
2023-11-01 15:39:14,806 ReForm-Eval Evaluation INFO: the format hit rate is 1.0
```
I'm at a loss to understand this, and I hope you can help point out where the problem may be.