❓ General Questions
I evaluated the model using lm-evaluation-harness on MedMCQA, MedQA-USMLE and PubMedQA. It performs barely above Llama 2 7B, scoring only 38% on MedQA-USMLE, 36% on MedMCQA and 73.9% on PubMedQA.
Could you describe how you got your results?