
Parsed Model Answer logging+ MCQ Answer analysis #110

Open
mnishant2 wants to merge 1 commit into MedARC-AI:main from mnishant2:feat/answer_analysis


Contributor

@mnishant2 mnishant2 commented Jan 26, 2026

This PR does two main things, plus a small visualization script:

  • Logs the model's parsed answer and the parsing method used, as part of the info_dict. To enable this in any environment not yet covered, add one line: pass info=info in the accuracy function call (see the README).

  • Adds an answer analysis script that runs a comprehensive analysis of the model's answers from model logs (with or without the parsed answer logged): variability, semantic consistency across rollouts, positional bias, a confusion metric, and an overall performance measure. The list of benchmarks is currently hardcoded; feel free to adjust it. All analysis output files are written to the specified output directory, where you can use them to create plots and tables.

  • Also adds a lightweight visualization script that creates a few heatmaps and scatter plots of the models' variation rate, semantic consistency, win rate, etc. Feel free to expand on it.

Both scripts are in the scripts folder.
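
For reviewers who want a concrete picture of the logging pattern, here is a minimal sketch in Python. It is not the code in this PR: the `accuracy` signature, the `parsed_answer` and `parse_method` keys, the two parsing strategies, and the `variation_rate` definition are all assumptions made for illustration. The README has the actual interface.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stand-in for an environment's accuracy function.
# The real signature and info keys in the repository may differ.
def accuracy(completion: str, gold: str, info: Optional[dict] = None) -> float:
    """Score an MCQ completion and, if `info` is given, record how it was parsed."""
    parsed, method = None, "none"
    # Strategy 1 (assumed): an explicit "answer is X" / "Answer: X" pattern.
    match = re.search(r"(?i:answer\s*(?:is|:)?)\s*\(?([A-E])\b", completion)
    if match:
        parsed, method = match.group(1), "answer_pattern"
    else:
        # Strategy 2 (assumed): fall back to the last standalone option letter.
        letters = re.findall(r"\b([A-E])\b", completion)
        if letters:
            parsed, method = letters[-1], "last_letter"
    if info is not None:
        # Illustrative key names, not necessarily the ones used in the PR.
        info["parsed_answer"] = parsed
        info["parse_method"] = method
    return 1.0 if parsed == gold else 0.0


def variation_rate(parsed_answers: list) -> float:
    """Assumed definition: share of rollouts that differ from the modal answer."""
    if not parsed_answers:
        return 0.0
    modal_count = Counter(parsed_answers).most_common(1)[0][1]
    return 1.0 - modal_count / len(parsed_answers)


# Four rollouts of the same question, each logged through info=info.
rollouts = ["The answer is B.", "Answer: B", "I would pick C", "answer is (B)"]
infos = []
for text in rollouts:
    info = {}
    accuracy(text, gold="B", info=info)
    infos.append(info)

parsed = [entry["parsed_answer"] for entry in infos]
rate = variation_rate(parsed)
```

Once the parsed answer is in the log, per-question statistics like this can be computed without re-parsing the raw completions, which is presumably why the analysis script accepts logs both with and without the parsed answer.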

@mnishant2 mnishant2 changed the title from "log parsed answer + answer analysis" to "Parsed Model Answer logging+ MCQ Answer analysis" on Jan 26, 2026
