This repo is the official implementation of the paper "PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament".
- 2025-01-31: We have released the checkpoint of our PairJudgeRM model. You can download it from here.
- 2025-01-31: We have released the training data of our PairJudgeRM model. You can download it from here.
data/: contains the datasets used in the experiments.PairJudge/: contains the source code of PairJudgeRM.PairJudge/compare_resp.py: contains the implementation of PairJudgeRM.PairJudge/knockout.py: contains the implementation of Knockout Tournament.
The checkpoint of our PairJudgeRM model is coming soon. Stay tuned!
Before that you can run the code will online llm api like gpt4o,claude-3.5-sonnet or gemini-1.5-flash
for example:
export PYTHONPATH=$PYTHONPATH:$(pwd)
# Define the input file
input_file=data/math-500/LLaMA-3.1-8B-Instruction_64.json
# Define the prompt template
prompt_template=prompts/compare_0_ex.md
# Define the base URL and API key
judge_model=gpt-4o
base_url="https://api.openai.com/v1"
api_key="YOUR_API_KEY"
# Run the Python script with the appropriate arguments
python pairwise/knockout.py \
--model $judge_model \
--input $input_file \
--prompt_template $prompt_template \
--base_url $base_url \
--api_key $api_key \
-n 64If you want to run the code on our PairJudgeRM model, you can replace the
judge_modelwithPairJudge-RMandbase_urlwithhttp://localhost:8000/v1. One vllm server is needed to run the code.
If you find our work useful, please consider citing our paper:
@article{liu2025PairJudge,
title={PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament},
author={Liu, Yantao and Yao, Zijun and Min, Rui and Cao, Yixin and Hou, Lei and Li, Juanzi},
journal={arXiv preprint arXiv:2501.13007},
year={2025},
note={in progress work},
url={https://doi.org/10.48550/arXiv.2501.13007}
}