We use VLMEvalKit to run the evaluation.
See the VLMEvalKit instructions for more details on installation.
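For reference, a minimal installation sketch, assuming you install VLMEvalKit from source as described in its upstream README (verify the exact steps there):

```bash
# Clone VLMEvalKit and install it in editable mode
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .
```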
Download the pretrained checkpoints from here.
We use an LLM-as-a-judge to grade the QA responses; the judging procedure is similar to the one used in RL training. Set `JUDGE_API_BASE` in `.env` so the evaluation can reach your judge server.
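For example, a `.env` entry might look like the following (the address is a placeholder, and whether the value needs a `/v1` suffix depends on how your judge endpoint is deployed):

```bash
# .env — judge server used for LLM-as-a-judge QA grading.
# Replace the address with your own OpenAI-compatible judge endpoint.
JUDGE_API_BASE=http://127.0.0.1:18001/v1
```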
After the judge server is configured, deploy DeepEyesV2 as an OpenAI-compatible server (for example with vLLM; a sketch is given below) and then modify `MODEL_CONFIGS` in `eval.sh`.
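A minimal sketch of launching the server with vLLM (the checkpoint path, port, and tensor-parallel size are placeholders; adjust them to your hardware and to the addresses you put in `MODEL_CONFIGS`):

```bash
# Serve the DeepEyesV2 checkpoint as an OpenAI-compatible API with vLLM.
# Start a second instance on another port (e.g. 18000) to parallelize evaluation.
vllm serve /path/to/DeepEyesV2 \
    --served-model-name DeepEyesV2-vllm \
    --port 28000 \
    --tensor-parallel-size 1
```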
```bash
MODEL_CONFIGS='{"DeepEyesV2-vllm": {"api_base": "http://10.39.6.41:28000,http://10.39.6.41:18000", "max_tokens": 20480}}'
```

Here, `api_base` is the IP address and port of the DeepEyesV2 server. You can launch multiple servers to speed up the evaluation; separate their addresses with commas. Then run the following command to start the evaluation.
```bash
bash eval.sh
```