mh-tang/Utility-Focused-LLM-Annotation
Utility-focused-annotation

[🎉 2025-09] MS MARCO and NQ annotations are released on Hugging Face.
[🎉 2025-08] Our paper "Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation" has been accepted to the EMNLP 2025 main conference!

Overview

This repository contains the code, datasets, and models used in our paper: "Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation".

This paper explores the use of large language models (LLMs) for annotating document utility in training retrieval and retrieval-augmented generation (RAG) systems, aiming to reduce dependence on costly human annotations. We address the gap between retrieval relevance and generative utility by employing LLMs to annotate document utility. To effectively utilize multiple positive samples per query, we introduce a novel loss that maximizes their summed marginal likelihood. Using the Qwen-2.5-32B model, we annotate utility on the MS MARCO dataset and conduct retrieval experiments on MS MARCO and BEIR, as well as RAG experiments on MS MARCO QA, NQ, and HotpotQA. Our results show that LLM-generated annotations enhance out-of-domain retrieval performance and improve RAG outcomes compared to models trained solely on human annotations or downstream QA metrics. Furthermore, combining LLM annotations with just 20% of human labels achieves performance comparable to using full human annotations. Our study offers a comprehensive approach to utilizing LLM annotations for initializing QA systems on new corpora.
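The multi-positive loss described above, which maximizes the summed marginal likelihood of all positive documents for a query, can be sketched as follows. This is an illustrative re-derivation, not the repository's implementation; the function name and the temperature parameter are assumptions.

```python
import math

def multi_positive_nll(scores, positive_idx, temperature=1.0):
    """Negative log of the summed marginal likelihood of the positives.

    scores: similarity scores between one query and its candidate
        documents (positives plus hard negatives).
    positive_idx: indices of the positive documents within `scores`.

    With a single positive this reduces to the standard softmax
    cross-entropy used in contrastive retriever training.
    """
    exp_scores = [math.exp(s / temperature) for s in scores]
    denom = sum(exp_scores)                            # mass of all candidates
    pos_mass = sum(exp_scores[i] for i in positive_idx)  # summed mass of positives
    return -math.log(pos_mass / denom)
```

Compared with averaging per-positive cross-entropy terms, summing the positives' probability mass before taking the log lets the model satisfy the objective by ranking any subset of the positives highly, rather than forcing equal probability on each one.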

Download dataset

We use in-domain settings (MS MARCO v1 / TREC-DL and NQ) and an out-of-domain setting (BEIR) for both the retrieval and RAG tasks.

LLMs Annotations

We use the hard negative samples provided by the official Tevatron repository. The prompts used in our paper are listed in prompts.md.

Retrievers Training

We use RetroMAE and Contriever as our retriever backbones; pre-trained checkpoints are available from RetroMAE (pre-trained on MS MARCO Passage) and Contriever.

```shell
sh run.sh
```

Checkpoint

Currently, all the LLM-annotated positive labels and model checkpoints are available on Hugging Face.
