
Add LLM-Evaluations semantic anchor #276

Merged
rdmueller merged 2 commits into main from
copilot/add-llm-evaluations-anchor
Mar 18, 2026

Conversation

Contributor

Copilot AI commented Mar 18, 2026

Adds the LLM-Evaluations semantic anchor, covering the established frameworks, benchmark suites, and metrics used to assess Large Language Model capabilities.

New Files

  • docs/anchors/llm-evaluations.adoc — English anchor with concepts: benchmark suites (MMLU, HellaSwag, HumanEval, BIG-Bench, GSM8K), evaluation metrics (perplexity, BLEU, ROUGE, pass@k), HELM, Chatbot Arena/Elo, Open LLM Leaderboard, red-teaming, and contamination detection
  • docs/anchors/llm-evaluations.de.adoc — German translation
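
As an aside on one of the metrics covered by the new anchor: pass@k (used with HumanEval) is commonly estimated with an unbiased combinatorial estimator rather than by naive sampling. A minimal sketch, assuming n generated samples of which c pass the unit tests (the function name and signature here are illustrative, not part of the anchor file):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k.

    Probability that at least one of k samples drawn without
    replacement from n generations passes, given that c of the
    n generations pass overall.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k
        # must include at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 generations of which 1 passes, pass@1 is 0.5, matching the intuitive per-sample success rate.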

Updated Files

  • website/public/data/anchors.json — Entry added (105 anchors total); regenerated via extract-metadata.js
  • skill/semantic-anchor-translator/references/catalog.md — Entry appended to the Testing & Quality section

Metadata:

  • Category: testing-quality
  • Roles: data-scientist, software-developer, qa-engineer, software-architect
  • Proponents: Percy Liang (Stanford HELM), EleutherAI (Open LLM Leaderboard), LMSYS (Chatbot Arena)
  • Related: chain-of-thought, sota, mutation-testing
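
For context on the Chatbot Arena entry listed under proponents: arena-style leaderboards rate models from pairwise human preference votes, classically with an Elo-style update (LMSYS has since moved toward Bradley-Terry fitting, but the Elo update conveys the idea). A minimal sketch; the k-factor of 32 is a conventional choice, not taken from the anchor:

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo update after a pairwise comparison.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    Returns the updated (rating_a, rating_b) pair.
    """
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    # Ratings are zero-sum: B gains what A loses.
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new
```

Starting two models at 1000 and letting A win one vote moves the pair to 1016 and 984, since each side's expected score was 0.5.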
Original prompt

This section describes the original issue you should resolve.

<issue_title>[Anchor Proposal]: LLM-Evaluations</issue_title>
<issue_description>### Proposed Term

LLM-Evaluations

Context (Optional)

No response

Pre-submission Checklist

  • I have searched existing anchors and this term is not already included
  • This term refers to a well-established concept (not something I invented)</issue_description>

Comments on the Issue (you are @copilot in this section)



Co-authored-by: rdmueller <1856308+rdmueller@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [Anchor Proposal] Add LLM-Evaluations to anchors" to "Add LLM-Evaluations semantic anchor" on Mar 18, 2026
Copilot AI requested a review from rdmueller March 18, 2026 09:47
@rdmueller rdmueller marked this pull request as ready for review March 18, 2026 12:40
@rdmueller rdmueller merged commit 8bf9002 into main Mar 18, 2026
5 of 6 checks passed


Development

Successfully merging this pull request may close these issues.

[Anchor Proposal]: LLM-Evaluations

2 participants