Add LLM-Evaluations semantic anchor by Copilot · Pull Request #276 · LLM-Coding/Semantic-Anchors

Copilot · 2026-03-18T09:39:43Z

Adds the LLM-Evaluations semantic anchor, covering the established frameworks, benchmark suites, and metrics used to assess Large Language Model capabilities.

New Files

docs/anchors/llm-evaluations.adoc — English anchor with concepts: benchmark suites (MMLU, HellaSwag, HumanEval, BIG-Bench, GSM8K), evaluation metrics (perplexity, BLEU, ROUGE, pass@k), HELM, Chatbot Arena/Elo, Open LLM Leaderboard, red-teaming, and contamination detection
docs/anchors/llm-evaluations.de.adoc — German translation
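
Of the metrics named above, pass@k is the one most easily shown in code. The sketch below is not taken from the anchor files. It assumes the commonly used per-problem estimator 1 - C(n-c, k) / C(n, k), where n is the number of generated samples and c is the number that pass the tests. The function name `pass_at_k` is hypothetical and chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (illustrative helper, not from the PR).

    n: number of samples generated for the problem
    c: number of those samples that passed the tests
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score would typically average this value over all problems in the suite.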

Updated Files

website/public/data/anchors.json — Entry added (105 anchors total); regenerated via extract-metadata.js
skill/semantic-anchor-translator/references/catalog.md — Entry appended to the Testing & Quality section

Metadata:

Category: testing-quality
Roles: data-scientist, software-developer, qa-engineer, software-architect
Proponents: Percy Liang (Stanford HELM), EleutherAI (Open LLM Leaderboard), LMSYS (Chatbot Arena)
Related: chain-of-thought, sota, mutation-testing

Original prompt

This section details the original issue you should resolve

<issue_title>[Anchor Proposal]: LLM-Evaluations</issue_title>
<issue_description>### Proposed Term

LLM-Evaluations

### Context (Optional)

No response

### Pre-submission Checklist

I have searched existing anchors and this term is not already included

This term refers to a well-established concept (not something I invented)</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes [Anchor Proposal]: LLM-Evaluations #275


Co-authored-by: rdmueller <1856308+rdmueller@users.noreply.github.com>

Initial plan

8d50e47

Copilot AI assigned Copilot and rdmueller Mar 18, 2026

Copilot started work on behalf of rdmueller March 18, 2026 09:39 View session

Add LLM-Evaluations semantic anchor with EN/DE files and catalog entry

e22839e

Co-authored-by: rdmueller <1856308+rdmueller@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] [Anchor Proposal] Add LLM-Evaluations to anchors~~ Add LLM-Evaluations semantic anchor Mar 18, 2026

Copilot AI requested a review from rdmueller March 18, 2026 09:47

Copilot finished work on behalf of rdmueller March 18, 2026 09:47

rdmueller marked this pull request as ready for review March 18, 2026 12:40

rdmueller merged commit 8bf9002 into main Mar 18, 2026
5 of 6 checks passed

rdmueller merged 2 commits into main from copilot/add-llm-evaluations-anchor


2 participants
