Skip to content

Add TruthfulQA benchmarkspec asset#4781

Merged
AbdelmohsenMS merged 8 commits intomainfrom
copilot/add-benchmarkspec-builtin-truthful-qa
Feb 13, 2026
Merged

Add TruthfulQA benchmarkspec asset#4781
AbdelmohsenMS merged 8 commits intomainfrom
copilot/add-benchmarkspec-builtin-truthful-qa

Conversation

Copy link
Contributor

Copilot AI commented Feb 13, 2026

Adds benchmarkspec for TruthfulQA dataset (817 questions measuring model truthfulness on common misconceptions across 38 categories). Restructures evaluator schema to support C# JObject deserialization.

Changes

  • New benchmarkspec: assets/benchmarkspecs/builtin/truthful_qa/

    • spec.yaml: Full benchmark definition with string_check evaluator
    • asset.yaml: Asset metadata
  • Simplified evaluator schema: Reduced to id + testingCriteria for JObject compatibility

    evaluator:
      id: "azureml://registries/azureml/evaluators/builtin.string_check/versions/2"
      testingCriteria:
        type: "string_check"
        input: "{{sample.output_text}}"
        operation: "eq"
        reference: "{{item.Best_Answer}}"
        name: "TruthfulQA"

Dataset Configuration

  • Source: domenicrosati/TruthfulQA (HuggingFace)
  • Schema: 7 string columns (Type, Category, Question, Best_Answer, Correct_Answers, Incorrect_Answers, Source)
  • Evaluator: Exact string match against Best_Answer
  • License: Apache 2.0

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 2 commits February 13, 2026 00:06
Co-authored-by: AbdelmohsenMS <127798197+AbdelmohsenMS@users.noreply.github.com>
Co-authored-by: AbdelmohsenMS <127798197+AbdelmohsenMS@users.noreply.github.com>
Copilot AI changed the title [WIP] Add benchmark specification for builtin.truthful_qa dataset Add builtin.truthful_qa benchmarkspec with string_check evaluator Feb 13, 2026
Copilot AI requested a review from AbdelmohsenMS February 13, 2026 00:09
Co-authored-by: AbdelmohsenMS <127798197+AbdelmohsenMS@users.noreply.github.com>
Copilot AI changed the title Add builtin.truthful_qa benchmarkspec with string_check evaluator Add builtin.truthful_qa benchmarkspec Feb 13, 2026
Co-authored-by: AbdelmohsenMS <127798197+AbdelmohsenMS@users.noreply.github.com>
@AbdelmohsenMS AbdelmohsenMS marked this pull request as ready for review February 13, 2026 06:32
@AbdelmohsenMS AbdelmohsenMS requested a review from a team as a code owner February 13, 2026 06:32
Co-authored-by: AbdelmohsenMS <127798197+AbdelmohsenMS@users.noreply.github.com>
Copilot AI changed the title Add builtin.truthful_qa benchmarkspec Add TruthfulQA benchmarkspec with simplified evaluator schema Feb 13, 2026
@AbdelmohsenMS AbdelmohsenMS changed the title Add TruthfulQA benchmarkspec with simplified evaluator schema Add TruthfulQA benchmarkspec asset Feb 13, 2026
@github-actions
Copy link

github-actions bot commented Feb 13, 2026

Test Results for assets-test

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 303953c.

♻️ This comment has been updated with latest results.

@vizhur vizhur added the safe to publish Pull request containing new asset has been tested properly label Feb 13, 2026
@AbdelmohsenMS AbdelmohsenMS merged commit 2a7a8ee into main Feb 13, 2026
36 checks passed
@AbdelmohsenMS AbdelmohsenMS deleted the copilot/add-benchmarkspec-builtin-truthful-qa branch February 13, 2026 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to publish Pull request containing new asset has been tested properly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants