Code for paper "Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation"
This project explores relevance ranking versus utility selection for retrieval-augmented generation (RAG).
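Conceptually, the two paradigms ask different things of the selector: relevance ranking orders all retrieved passages by topical relevance, while utility selection keeps only the subset of passages judged useful for answering the question. A minimal illustrative sketch (toy passages and scores, not outputs of the paper's models):

```python
# Illustrative contrast between the two paradigms; all data is hypothetical.

def relevance_rank(passages, scores):
    """Order ALL passages from most to least relevant."""
    return [p for p, _ in sorted(zip(passages, scores),
                                 key=lambda x: x[1], reverse=True)]

def utility_select(passages, useful_flags):
    """Keep ONLY the passages judged useful for answering the question."""
    return [p for p, useful in zip(passages, useful_flags) if useful]

passages = ["p1", "p2", "p3", "p4"]
print(relevance_rank(passages, [0.2, 0.9, 0.5, 0.1]))   # ['p2', 'p3', 'p1', 'p4']
print(utility_select(passages, [False, True, True, False]))  # ['p2', 'p3']
```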
Thanks to RankGPT, RankLLM, and Utility Annotation for dense retrieval.
🎉 [News]: [2025.09] Our paper has been accepted at SIGIR-AP 2025.
🪧 [News]: The checkpoints and training datasets of RankQwen and UtilityQwen from our paper are released at UtilityQwen1.7B.
- Utility selection and relevance ranking require Anserini and Pyserini, which in turn require Java. Please install Pyserini following its official documentation.
- Generation distillation requires accelerate and flash-attn.
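The dependencies above can typically be installed as follows (a sketch; install a JDK first and check the Pyserini documentation for the required Java version, and note that flash-attn needs a CUDA toolchain matching your torch build):

```shell
# Java is required by Pyserini/Anserini; install a supported JDK first.
pip install pyserini
pip install accelerate
pip install flash-attn --no-build-isolation  # needs CUDA + a matching torch
```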
The 100K training queries are sampled following RankGPT; each query is paired with its top-20 BM25-retrieved passages.
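Each training example can be thought of as a query plus its top-20 BM25 candidates. A hypothetical record layout (field names here are assumptions for illustration, not the released dataset's schema):

```python
import json

# Hypothetical JSONL record: one query with its top-20 BM25 passages.
record = {
    "query_id": "q_000001",
    "query": "example query text",
    "passages": [
        {"docid": f"d{i}", "rank": i + 1, "text": f"passage text {i}"}
        for i in range(20)  # top-20 BM25 candidates per query
    ],
}
line = json.dumps(record)
assert len(json.loads(line)["passages"]) == 20
```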
```shell
cd llm_utility
sh utility.sh
```

```shell
cd rank_llm/training
sh run.sh
```

```shell
cd RelevanceRank_UtilitySelection
sh run.sh
```