Commit f799c71

update bibtex and paper
1 parent c56a258 commit f799c71

1 file changed: +15 -1 lines changed

README.md

Lines changed: 15 additions & 1 deletion
@@ -16,14 +16,19 @@
 <a href="https://huggingface.co/THU-KEG/IF-Verifier-7B">
 <img src="https://img.shields.io/badge/Model-Verifier-blue" alt="Verifier">
 </a>
+<a href="https://arxiv.org/abs/2506.09942">
+<img src="https://img.shields.io/badge/paper-arxiv-pink"
+alt="Paper">
+</a>
 
 </div>
 
+
 ---
 
 ## Introduction
 
-**VerIF** is a practical and efficient method for **verification in instruction-following reinforcement learning**. Built on the idea of *Reinforcement Learning with Verifiable Rewards (RLVR)*, VerIF integrates **rule-based code checks** with **LLM-based reasoning verification** (e.g., QwQ-32B) to provide accurate and scalable reward signals.
+[**VerIF**](https://arxiv.org/abs/2506.09942) is a practical and efficient method for **verification in instruction-following reinforcement learning**. Built on the idea of *Reinforcement Learning with Verifiable Rewards (RLVR)*, VerIF integrates **rule-based code checks** with **LLM-based reasoning verification** (e.g., QwQ-32B) to provide accurate and scalable reward signals.
 
 To support this method, we construct a high-quality dataset, **VerInstruct**, with ~22,000 instruction-following instances paired with verification signals. Models trained with VerIF not only achieve **state-of-the-art performance** on several benchmarks across models at similar scale but also maintain their general capabilities.
 
@@ -98,4 +103,13 @@ We thank the [**verl**](https://github.com/volcengine/verl) team for their open-
 ## Citations
 If this repo helps, please kindly cite us:
 ```
+@misc{peng2025verif,
+title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
+author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
+year={2025},
+eprint={2506.09942},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2506.09942},
+}
 ```
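
Note on the Introduction text touched by this diff: it describes combining rule-based code checks with LLM-based verification to produce a reward signal. The sketch below illustrates that general idea only. It is not the VerIF implementation: the function names, the example checks, the stub judge (standing in for a call to a verifier model such as QwQ-32B), and the equal-weight averaging rule are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical type: a check maps a response string to pass/fail.
Check = Callable[[str], bool]


def max_words(limit: int) -> Check:
    """Rule-based check: at most `limit` whitespace-separated words."""
    return lambda response: len(response.split()) <= limit


def contains_keyword(keyword: str) -> Check:
    """Rule-based check: response mentions `keyword` (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()


def stub_judge(response: str) -> bool:
    """Placeholder for an LLM verifier call (assumption, not a real API).

    A real judge would prompt a model to assess constraints that code
    cannot check; here we only test for a polite tone.
    """
    return "please" in response.lower()


def hybrid_reward(response: str, rule_checks: Sequence[Check], llm_judge: Check) -> float:
    """Assumed combination rule: mean of rule pass rate and judge verdict."""
    if rule_checks:
        rule_score = sum(check(response) for check in rule_checks) / len(rule_checks)
    else:
        rule_score = 1.0  # no hard constraints to violate
    judge_score = 1.0 if llm_judge(response) else 0.0
    return 0.5 * (rule_score + judge_score)


checks = [max_words(5), contains_keyword("server")]
good = hybrid_reward("Please restart the server", checks, stub_judge)
bad = hybrid_reward("Restart the server now and then wait a long time", checks, stub_judge)
```

Here `good` passes both rule checks and the stub judge, while `bad` exceeds the word limit and fails the judge, so it receives a lower reward. How the actual method weights or gates the two signals is not stated in this diff.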
