Trustworthy-Information-Access/UtilitySelection

Code for the paper "Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation"

This project explores relevance ranking versus utility selection in retrieval-augmented generation (RAG).
Thanks to RankGPT, RankLLM, and Utility Annotation for dense retrieval.

🎉 [News]: [2025.09] Our paper has been accepted at SIGIR-AP25.
🪧 [News]: The checkpoints and training dataset for RankQwen and UtilityQwen from our paper are released at UtilityQwen1.7B.

Quick example

Installation

  • Utility selection and relevance ranking need Anserini and Pyserini, which require Java. Please install Pyserini by following its official documentation.
  • Generation distillation needs accelerate and flash-attn.
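
Before running the scripts, it can help to confirm that the prerequisites listed above are visible from your environment. The following is a minimal sketch, not part of this repository: the helper name `check_prerequisites` is hypothetical, and the import names `pyserini`, `accelerate`, and `flash_attn` are assumed to be the module names the corresponding packages install.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("pyserini", "accelerate", "flash_attn"),
                        binaries=("java",)):
    """Return a dict mapping each prerequisite name to True if it was found.

    Hypothetical helper: the default names are assumptions based on the
    installation notes, not something this repository ships.
    """
    status = {}
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec locates a top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only reports whether each item can be located; it does not verify versions, so the official Pyserini documentation remains the reference for the required Java release.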

Datasets

The training set consists of 100K queries sampled by RankGPT. Each query is paired with its top 20 BM25-retrieved passages.
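
To illustrate what "top 20 BM25-retrieved passages" means, here is a small self-contained sketch of BM25 top-k retrieval over an in-memory list of passages. It is not the pipeline used in this repository, which relies on Pyserini. The function name `bm25_top_k`, the whitespace tokenization, and the parameter values `k1=0.9` and `b=0.4` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_top_k(query, passages, k=20, k1=0.9, b=0.4):
    """Return the indices of the k highest-scoring passages under BM25.

    Illustrative only: tokenization is lowercase whitespace splitting,
    and k1/b are assumed values, not settings read from this repository.
    """
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)

    # Sort passage indices by descending score and keep the first k.
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    passages = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "retrieval augmented generation uses retrieved passages",
        "bm25 is a ranking function",
    ]
    print(bm25_top_k("retrieval augmented generation", passages, k=2))
```

With `k=20`, each query keeps at most 20 candidate passages, which matches the shape of the training data described above.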

Start

Relevance ranking and utility selection annotation

cd llm_utility
sh utility.sh 

Generation distillation

cd rank_llm/training
sh run.sh 

Relevance ranking and utility selection test

cd RelevanceRank_UtilitySelection
sh run.sh 
