New Alternative to LLM-as-a-judge! 

Hello Clementine and the Evaluation Community,

We would like to introduce our **new** metric, **HumanRankEval**, an alternative to the popular 'LLM-as-a-judge' approach. Instead of using an LLM to judge machine-generated text, we use human-generated text to 'judge' the LLM! :) Please take a look and let us know what you think. Thank you very much!

NAACL '24 PAPER LINK: https://aclanthology.org/2024.naacl-long.456/
CODE: https://github.com/huawei-noah/noah-research/tree/master/NLP/HumanRankEval
DATA: https://huggingface.co/datasets/huawei-noah/human_rank_eval
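
To make the idea a bit more concrete, here is a rough toy sketch, not the implementation from the repository linked above. It assumes the score is a correlation between the log-likelihoods an LLM assigns to several human-written answers and the scores humans gave those answers. The helper `pearson`, the variable names, and all numbers are hypothetical and only for illustration; the exact correlation and normalization used in the paper may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: one question with three human-written answers.
# human_votes: made-up scores that human readers gave each answer.
# model_log_likelihoods: made-up log-likelihoods an LLM assigns to each answer.
human_votes = [12, 5, 1]
model_log_likelihoods = [-20.5, -31.0, -45.2]

# A high correlation means the model prefers the answers that humans preferred.
score = pearson(model_log_likelihoods, human_votes)
print(f"toy correlation: {score:.3f}")
```

In this toy setup no LLM is asked to grade anything: the human-written, human-scored answers act as the reference, and the model is rated by how well its preferences agree with them.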
