PrefRAG: Preference-Driven Multi-Source Retrieval Augmented Generation

PrefRAG is a novel multi-source adaptive retrieval-augmented generation (ARAG) framework. It enhances RAG by enabling in-depth, controllable exploration of diverse retrieval sources through preference-driven adaptive retrieval and self-reflection.

⚙️ Environmental Setup

Install the requirements with pip: pip install -r requirements.txt. For model inference, we recommend using vLLM to significantly speed up the inference process.
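As a rough illustration of batched offline inference with vLLM, the sketch below uses vLLM's `LLM` and `SamplingParams` classes. The `build_prompt` helper, its prompt wording, and the function names are assumptions made for illustration; they are not the prompts or entry points used by this repository.

```python
from typing import List


def build_prompt(question: str, passages: List[str]) -> str:
    """Hypothetical prompt layout: numbered passages followed by the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"


def generate_batch(prompts: List[str], model_name: str, max_tokens: int = 256) -> List[str]:
    """Generate one completion per prompt in a single batched vLLM call."""
    # Imported lazily so the prompt helper works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)  # loads the model weights; reuse this object in practice
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)  # greedy decoding
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

Passing all prompts in one `generate` call lets vLLM batch them internally, which is where most of the speed-up comes from. In a longer script the `LLM` object should be created once and reused rather than rebuilt per call.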

⚙️ Data Preparation

You can download our standardized datasets (including corpus, training and test sets) by running the command below. For the BioASQ-Y/N corpus, due to its large size, please download it separately from link.

bash download/raw_data.sh

The data will be downloaded to the data/ directory.

⚙️ Retriever Setup

We implement two types of retrievers:

Sparse Retriever: based on the BM25 algorithm as implemented in Elasticsearch

Download and install the Elasticsearch server:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.10.2-linux-x86_64.tar.gz
cd elasticsearch-7.10.2/
./bin/elasticsearch # Start the server
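Before building an index, it can help to confirm that the server is reachable. The sketch below queries the Elasticsearch root endpoint with the standard library only. It assumes the default address `http://localhost:9200` and an unsecured local install; the function name `elasticsearch_info` is hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def elasticsearch_info(url: str = "http://localhost:9200", timeout: float = 3.0) -> Optional[dict]:
    """Return the server's root info document, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not ready".
        return None
```

A running 7.10.2 server should return a document whose `version` object has a `number` field, so `elasticsearch_info()` returning `None` usually means the server has not finished starting.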

Create document index:

# Take the MuSiQue dataset as an example
cd src/create_index/es
python index_musique.py
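To make the sparse retriever's ranking less of a black box, here is a minimal pure-Python sketch of the textbook Okapi BM25 formula. It uses `k1 = 1.2` and `b = 0.75`, which are Elasticsearch's default BM25 parameters. Whitespace tokenization and the function name `bm25_scores` are simplifying assumptions, and Elasticsearch's own implementation differs in analysis and constant factors.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "the eiffel tower is in paris".split(),
    "paris is the capital of france".split(),
    "berlin is the capital of germany".split(),
]
scores = bm25_scores("capital of france".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The second document ranks first because it matches all three query terms, and the rare term "france" carries the highest IDF weight.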

Dense Retriever: based on the bge-large-en-v1.5 model

Download the bge-large-en-v1.5 model
Create document embedding index:

# Take the MuSiQue dataset as an example
cd src/create_index/emb
python index.py --dataset musique
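Dense retrieval ranks documents by the similarity between a query embedding and precomputed document embeddings. The sketch below shows that ranking step with cosine similarity in pure Python. The toy 3-dimensional vectors stand in for real bge-large-en-v1.5 embeddings (which have 1024 dimensions), and `top_k` is a hypothetical helper, not the index format produced by `index.py`.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) for the k most similar documents."""
    q = l2_normalize(query_vec)
    scored = []
    for idx, vec in enumerate(doc_vecs):
        d = l2_normalize(vec)
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; a real index would hold one model embedding per document.
doc_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
hits = top_k([0.9, 0.1, 0.0], doc_vecs, k=2)
print(hits)
```

A brute-force scan like this is fine for small corpora; larger ones typically use a vector search library for the same ranking.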

🖥️ PrefRAG Training

Prepare DPO training dataset:

# Generate DPO training data with specified dataset and device
python pre_dpo_data.py --output_path ../data/dpo_data --evaluator_model glm-4-plus --device 0,1,2,3

# After data generation, use process_data.ipynb to customize the proportion of different data types in the generated training set
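The proportion adjustment described above can be pictured as sampling from per-type pools of preference pairs. The following is a minimal sketch under stated assumptions: the `prompt`/`chosen`/`rejected` field layout is the common DPO convention rather than a confirmed detail of this repository, and the type names `type_a` and `type_b` are placeholders for whatever categories the generated data contains.

```python
import random
from typing import Dict, List


def mix_by_proportion(pools: Dict[str, List[dict]], proportions: Dict[str, float],
                      total: int, seed: int = 0) -> List[dict]:
    """Sample about `total` records so each data type gets its requested share."""
    if abs(sum(proportions.values()) - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    mixed: List[dict] = []
    for name, share in proportions.items():
        pool = pools[name]
        # Never request more records than the pool actually holds.
        count = min(len(pool), round(total * share))
        mixed.extend(rng.sample(pool, count))
    rng.shuffle(mixed)
    return mixed


# Hypothetical pools of preference pairs, one list per data type.
pools = {
    "type_a": [{"prompt": f"q{i}", "chosen": "good", "rejected": "bad"} for i in range(10)],
    "type_b": [{"prompt": f"r{i}", "chosen": "good", "rejected": "bad"} for i in range(10)],
}
train_set = mix_by_proportion(pools, {"type_a": 0.7, "type_b": 0.3}, total=10)
print(len(train_set))
```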

Start training:

bash train.sh
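For readers unfamiliar with DPO, the standard per-example objective can be sketched in a few lines. This illustrates the published DPO loss, not the contents of `train.sh`; the value `beta = 0.1` is a commonly used default and an assumption here.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # Margin: how much more the policy prefers the chosen response over the
    # rejected one, relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# The policy favors the chosen answer more than the reference does, so the
# loss falls below log(2), the value at zero margin.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Each argument is the summed log-probability of a full response under the policy or the reference model.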

📊 Running PrefRAG and Evaluation

python main.py --method prefrag --retrieve_top_k 5 --dataset musique --model gpt-4o-mini-2024-07-18 --retrieve_method es

The inference process and evaluation results can be found in the output/ directory.
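When comparing several settings, it can be convenient to generate the command lines programmatically. The sketch below only reuses the flags and values shown in the command above and varies `--retrieve_top_k`. The helper `build_command` and the choice of top-k values are illustrative assumptions.

```python
import shlex
from typing import List


def build_command(dataset: str, model: str, retrieve_method: str, top_k: int) -> List[str]:
    """Assemble the main.py invocation as an argument list."""
    return [
        "python", "main.py",
        "--method", "prefrag",
        "--retrieve_top_k", str(top_k),
        "--dataset", dataset,
        "--model", model,
        "--retrieve_method", retrieve_method,
    ]


# Print one command per top-k value; pass the list to subprocess.run to execute it.
for k in (3, 5, 10):
    print(shlex.join(build_command("musique", "gpt-4o-mini-2024-07-18", "es", k)))
```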

Evaluation Results on Each Dataset

The table below reports partial experimental results on each dataset, using BM25 as the retrieval method with the top 5 documents retrieved (top-k = 5).

Methods & LLMs	HotpotQA Acc.	HotpotQA F1	HotpotQA EM	HotpotQA Avg.	2WikiMQA Acc.	2WikiMQA F1	2WikiMQA EM	2WikiMQA Avg.	MuSiQue Acc.	MuSiQue F1	MuSiQue EM	MuSiQue Avg.	BioASQ-Y/N Acc.
PrefRAG_Llama3.1-8B-Instruct	42.0	51.1	38.8	44.0	42.0	43.2	35.8	40.3	15.4	21.0	12.8	16.4	89.6
PrefRAG_GLM4-9B-chat	45.4	56.3	42.2	48.0	55.0	53.7	42.0	50.2	23.0	29.4	20.0	24.1	87.6
PrefRAG_GPT-4o-mini	58.6	66.0	50.4	56.6	76.2	72.1	59.4	69.2	28.2	34.3	21.2	27.9	92.8
PrefRAG_GLM4-Plus	59.0	68.4	55.0	60.8	79.6	76.7	65.2	73.8	32.2	39.4	27.4	33.0	94.0
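For reference, the three answer-level metrics in the table are commonly computed as in the sketch below, which follows the SQuAD-style normalization widely used for multi-hop QA. These definitions are assumptions: in particular, treating Acc. as "the gold answer appears in the prediction" may not match the repository's evaluation code exactly.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def f1_score(pred: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def accuracy(pred: str, gold: str) -> float:
    """Assumed definition: 1.0 if the normalized gold answer occurs in the prediction."""
    return float(normalize(gold) in normalize(pred))


pred, gold = "The capital is Paris.", "Paris"
print(exact_match(pred, gold), f1_score(pred, gold), accuracy(pred, gold))
```

In this example the prediction contains the answer but adds extra words, so containment-based accuracy is satisfied while exact match is not, and F1 falls in between.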
