Description
I set up the API key in VLMEvalKit and wanted to run inference with GPT-4o using the command below. However, the results I obtained are really bad. Did you run into this issue?
```shell
export PW_TEST_SCREENSHOT_NO_FONTS_READY=1
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python eval_requery.py \
    --rank 0 \
    --world-size 1 \
    --model_type vlmevalkit_GPT4o_20241120 \
    --save_path 'output/requery/vlmevalkit_GPT4o_20241120'
```
The results are:
```json
{
  "f1_score": {
    "total_dict": { "total_length": 300, "average": 0.003703703703703704 },
    "area_dict": {
      "news": { "length": 221, "average": 0.005027652086475616 },
      "knowledge": { "length": 79, "average": 0.0 }
    },
    "subfield_dict": {
      "auto": { "length": 10, "average": 0.0 },
      "fashion": { "length": 10, "average": 0.0 },
      "anime": { "length": 10, "average": 0.0 },
      "architecture": { "length": 14, "average": 0.0 },
      "paper": { "length": 24, "average": 0.0 },
      "art": { "length": 25, "average": 0.0 },
      "traditional sports": { "length": 16, "average": 0.0 },
      "entertainment": { "length": 41, "average": 0.0 },
      "finance": { "length": 31, "average": 0.007168458781362007 },
      "general": { "length": 50, "average": 0.0 },
      "astronomy": { "length": 10, "average": 0.0 },
      "technology": { "length": 18, "average": 0.012345679012345678 },
      "falsepremise": { "length": 13, "average": 0.0 },
      "esports": { "length": 28, "average": 0.023809523809523808 }
    }
  }
}
```
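For what it's worth, the overall average is consistent with the per-area breakdown (a quick sanity check I ran, with the numbers copied from the results above), which suggests the aggregation step is fine and the near-zero F1 comes from the model responses themselves:

```python
# Recompute the overall F1 average as the length-weighted mean of the
# per-area averages, using the values reported in the results JSON above.
results = {
    "total_dict": {"total_length": 300, "average": 0.003703703703703704},
    "area_dict": {
        "news": {"length": 221, "average": 0.005027652086475616},
        "knowledge": {"length": 79, "average": 0.0},
    },
}

weighted_sum = sum(d["length"] * d["average"] for d in results["area_dict"].values())
overall = weighted_sum / results["total_dict"]["total_length"]
print(overall)  # ~0.0037, matching the reported total average
```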