PairJudgeRM

This repo is the official implementation of the paper "PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament".

News

2025-01-31: We have released the checkpoint of our PairJudgeRM model. You can download it from here.
2025-01-31: We have released the training data of our PairJudgeRM model. You can download it from here.

Repository Structure

data/: contains the datasets used in the experiments.
PairJudge/: contains the source code of PairJudgeRM.
PairJudge/compare_resp.py: contains the implementation of PairJudgeRM.
PairJudge/knockout.py: contains the implementation of Knockout Tournament.

The checkpoint of our PairJudgeRM model is coming soon. Stay tuned!

Before that you can run the code will online llm api like gpt4o,claude-3.5-sonnet or gemini-1.5-flash

for example:

export PYTHONPATH=$PYTHONPATH:$(pwd)

# Define the input file
input_file=data/math-500/LLaMA-3.1-8B-Instruction_64.json

# Define the prompt template
prompt_template=prompts/compare_0_ex.md

# Define the base URL and API key
judge_model=gpt-4o
base_url="https://api.openai.com/v1"
api_key="YOUR_API_KEY"

# Run the Python script with the appropriate arguments
python pairwise/knockout.py \
    --model $judge_model \
    --input $input_file \
    --prompt_template $prompt_template \
    --base_url $base_url \
    --api_key $api_key \
    -n 64

If you want to run the code on our PairJudgeRM model, you can replace the judge_model with PairJudge-RM and base_url with http://localhost:8000/v1. One vllm server is needed to run the code.

Citation

If you find our work useful, please consider citing our paper:

@article{liu2025PairJudge,
  title={PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament},
  author={Liu, Yantao and Yao, Zijun and Min, Rui and Cao, Yixin and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2501.13007},
  year={2025},
  note={in progress work},
  url={https://doi.org/10.48550/arXiv.2501.13007}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
data		data
pairwise		pairwise
prompt		prompt
tools		tools
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PairJudgeRM

News

Repository Structure

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PairJudgeRM

News

Repository Structure

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages