
feat: add MedReason environment and environment validation tests #119

Open
Rakshitha-Ireddi wants to merge 1 commit into MedARC-AI:main from Rakshitha-Ireddi:feat/medreason-env-and-validation-tests

Conversation

@Rakshitha-Ireddi

  • Add MedReason verifiers environment
    • Supports mixed MCQ and open-ended question evaluation
    • MCQ items graded via multiple_choice_accuracy
    • Open-ended items evaluated via LLM-as-Judge (JudgeRubric)
    • Configurable answer format (XML/boxed), shuffle, judge model
  • Add environment package validation test suite (original contribution)
    • Auto-discovers all 35 environment packages
    • Validates pyproject.toml structure, loader discoverability, load_environment presence, verifiers dependency
    • 7 pre-existing issues documented as xfail markers

Authors

  • Ireddi Rakshitha
  • Devavarapu Yaswanth

@CLAassistant

CLAassistant commented Feb 16, 2026

CLA assistant check
All committers have signed the CLA.

- Add MedReason verifiers environment (closes MedARC-AI#31)
  - Supports mixed MCQ and open-ended question evaluation
  - MCQ items graded via multiple_choice_accuracy
  - Open-ended items evaluated via LLM-as-Judge (JudgeRubric)
  - Configurable answer format (XML/boxed), shuffle, judge model

- Add environment package validation test suite (original contribution)
  - Auto-discovers all 35 environment packages
  - Validates pyproject.toml structure, loader discoverability,
    load_environment presence, verifiers dependency
  - 7 pre-existing issues documented as xfail markers
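
The mixed MCQ and open-ended grading described in the commit message can be pictured roughly as follows. This is an illustrative sketch only and does not use the verifiers API: `extract_answer`, `score_item`, the item fields (`question`, `options`, `answer`) and the `judge` callable are all assumptions standing in for `multiple_choice_accuracy` and `JudgeRubric`, whose real signatures are not shown in this PR page.

```python
import re
from typing import Callable, Optional

# One pattern per supported answer format (XML tags or LaTeX \boxed{}).
ANSWER_PATTERNS = {
    "xml": re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE),
    "boxed": re.compile(r"\\boxed\{([^{}]*)\}"),
}


def extract_answer(completion: str, answer_format: str = "xml") -> Optional[str]:
    """Return the last answer found in the requested format, or None."""
    matches = ANSWER_PATTERNS[answer_format].findall(completion)
    return matches[-1].strip() if matches else None


def score_item(
    item: dict,
    completion: str,
    judge: Callable[[str, str, str], float],
    answer_format: str = "xml",
) -> float:
    """Score one item: exact letter match for MCQ, judge callable otherwise."""
    predicted = extract_answer(completion, answer_format)
    if predicted is None:
        return 0.0  # unparseable output earns no credit

    if item.get("options"):
        # MCQ branch: compare option letters case-insensitively.
        return float(predicted.upper() == item["answer"].strip().upper())

    # Open-ended branch: defer to the (hypothetical) judge, which receives
    # the question, the reference answer, and the model's answer.
    return judge(item["question"], item["answer"], predicted)
```

Passing the judge in as a plain callable keeps the dispatch logic testable without a model call; in a real environment that callable would wrap the configured judge model.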
@Rakshitha-Ireddi force-pushed the feat/medreason-env-and-validation-tests branch from 39b71bb to 7afe2f8 on February 16, 2026 03:20
