Restructure GPQA Diamond evaluator section for JObject deserialization #4782

Merged
AbdelmohsenMS merged 3 commits into main from
copilot/update-gpqa-diamond-version
Feb 13, 2026

Copilot AI commented Feb 13, 2026

Refactors the evaluator section in gpqa_diamond/spec.yaml to support JObject deserialization in C# consumers. The evaluator configuration is now encapsulated in a testingCriteria field.

Changes

  • Evaluator structure: Reduced to two fields: id and testingCriteria
  • testingCriteria: Contains the full evaluator config as a JSON-serializable object with snake_case keys:
    • evaluatorName → evaluator_name
    • version → evaluator_version
    • overrideInitParameterSchema → initialization_parameters
    • dataMappingSchema → data_mapping
  • Version: Bumped from 1 to 2
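
The renames listed above can be sketched as a small transformation. This is only an illustration in Python, not the code from the PR: the function name `restructure_evaluator` and the shape of the pre-change section are assumptions, and only the key names come from the list above. The `type` and `name` fields shown under New Structure are not part of the rename list, so the sketch leaves them out.

```python
# Old camelCase key -> new snake_case key, as listed in the PR description.
KEY_RENAMES = {
    "evaluatorName": "evaluator_name",
    "version": "evaluator_version",
    "overrideInitParameterSchema": "initialization_parameters",
    "dataMappingSchema": "data_mapping",
}


def restructure_evaluator(old):
    """Return the two-field layout: `id` plus a nested `testingCriteria`."""
    criteria = {new: old[key] for key, new in KEY_RENAMES.items() if key in old}
    return {"id": old["id"], "testingCriteria": criteria}


# Hypothetical pre-change section: the flat layout and values are assumed
# for illustration and are not taken from the repository.
old_evaluator = {
    "id": "azureml://registries/azureml/evaluators/builtin.f1_score/versions/1",
    "evaluatorName": "builtin.f1_score",
    "version": "1",
    "overrideInitParameterSchema": {"threshold": 0.5},
    "dataMappingSchema": {
        "response": "{{sample.output_text}}",
        "ground_truth": "{{item.Correct_Answer}}",
    },
}

new_evaluator = restructure_evaluator(old_evaluator)
```

After the call, `new_evaluator` has exactly two top-level keys, and every renamed setting sits under `testingCriteria`.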

New Structure

evaluator:
  id: "azureml://registries/azureml/evaluators/builtin.f1_score/versions/1"
  testingCriteria:
    type: "azure_ai_evaluator"
    name: "GPQA_Diamond"
    evaluator_name: "builtin.f1_score"
    evaluator_version: "1"
    initialization_parameters:
      threshold: 0.5
    data_mapping:
      response: "{{sample.output_text}}"
      ground_truth: "{{item.Correct_Answer}}"
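
One way to check that the testingCriteria block above is JSON-serializable, which is presumably what a consumer reading it as a generic JSON object needs, is a round trip through a JSON encoder. The sketch below uses Python's standard `json` module and a hand-transcribed copy of the block. The helper `is_json_native` is hypothetical, and none of this reflects how the C# consumers are actually implemented.

```python
import json

# Hand-transcribed copy of the testingCriteria block from the new structure.
testing_criteria = {
    "type": "azure_ai_evaluator",
    "name": "GPQA_Diamond",
    "evaluator_name": "builtin.f1_score",
    "evaluator_version": "1",  # quoted in the YAML, so it stays a string
    "initialization_parameters": {"threshold": 0.5},
    "data_mapping": {
        "response": "{{sample.output_text}}",
        "ground_truth": "{{item.Correct_Answer}}",
    },
}


def is_json_native(value):
    """True if value contains only types with a direct JSON equivalent."""
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json_native(v) for k, v in value.items())
    if isinstance(value, list):
        return all(is_json_native(v) for v in value)
    return value is None or isinstance(value, (str, int, float, bool))


# Serialize to JSON text and parse it back; nothing should change.
payload = json.dumps(testing_criteria)
round_tripped = json.loads(payload)
```

Because every value is a string, number, or nested mapping, the parsed result compares equal to the original dictionary.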

… version to 2

Co-authored-by: AbdelmohsenMS <127798197+AbdelmohsenMS@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update version and evaluator section in spec.yaml" to "Restructure GPQA Diamond evaluator section for JObject deserialization" Feb 13, 2026
Copilot AI requested a review from AbdelmohsenMS February 13, 2026 02:26
@AbdelmohsenMS AbdelmohsenMS marked this pull request as ready for review February 13, 2026 03:06
@AbdelmohsenMS AbdelmohsenMS requested a review from a team as a code owner February 13, 2026 03:06

github-actions bot commented Feb 13, 2026

Test Results for assets-test

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit c7ee384.

♻️ This comment has been updated with latest results.

@AbdelmohsenMS AbdelmohsenMS merged commit ae409b6 into main Feb 13, 2026
36 checks passed
@AbdelmohsenMS AbdelmohsenMS deleted the copilot/update-gpqa-diamond-version branch February 13, 2026 15:50

3 participants