Hello Clementine and the Evaluation Community,
We would like to introduce you to our new metric, HumanRankEval, an alternative to the popular 'LLM-as-a-judge'. Instead of using an LLM to judge machine-generated text, we use human-generated text to 'judge' the LLM! :) Please take a look and let us know what you think, thank you very much! :)
NAACL '24 PAPER LINK: https://aclanthology.org/2024.naacl-long.456/
CODE: https://github.com/huawei-noah/noah-research/tree/master/NLP/HumanRankEval
DATA: https://huggingface.co/datasets/huawei-noah/human_rank_eval
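To make the idea above concrete, here is a minimal sketch of the core intuition: for each question, an LLM assigns a score (e.g. a log-likelihood) to every human-written answer, and we measure how well the model's ranking agrees with the human vote ranking, averaging a correlation over questions. The function names, the use of Pearson correlation, and the toy scoring function below are illustrative assumptions, not the authors' actual implementation — please see the linked paper and code for the real details.

```python
from math import sqrt

def pearson(x, y):
    # Plain Pearson correlation between two equal-length score lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def human_rank_eval(questions, score_fn):
    """Illustrative sketch (not the official API).

    questions: list of (answers, human_votes) pairs, where answers is a
        list of human-written answer strings and human_votes their scores.
    score_fn: maps an answer string to a model score (e.g. log-likelihood).
    Returns the per-question correlation averaged over all questions.
    """
    corrs = []
    for answers, votes in questions:
        model_scores = [score_fn(a) for a in answers]
        corrs.append(pearson(model_scores, votes))
    return sum(corrs) / len(corrs)

# Toy example with a stand-in scoring function (answer length, purely
# for illustration; a real run would use LLM log-likelihoods):
toy = [
    (["short", "a much longer detailed answer", "medium answer"],
     [1.0, 5.0, 3.0]),
]
print(round(human_rank_eval(toy, lambda a: float(len(a))), 3))  # → 0.982
```

The appeal of this setup is that the 'judge' signal comes from real human preferences (votes on human-written answers) rather than from another LLM, sidestepping judge-model biases.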