Commit 87e3c70

update readme
1 parent 0fe258c commit 87e3c70

1 file changed

+12
-10
lines changed

README.md

Lines changed: 12 additions & 10 deletions
@@ -8,24 +8,26 @@
 **VerIF** is a practical and efficient method for **verification in instruction-following reinforcement learning**. Built on the idea of *Reinforcement Learning with Verifiable Rewards (RLVR)*, VerIF integrates **rule-based code checks** with **LLM-based reasoning verification** (e.g., QwQ-32B) to provide accurate and scalable reward signals.
 
 To support this method, we construct a high-quality dataset, **VerInstruct**, with ~22,000 instruction-following instances paired with verification signals. Models trained with VerIF not only achieve **state-of-the-art performance** on several benchmarks across models at similar scale but also maintain their general capabilities.
----
 
-## 🔥 Results
+### 🔥 Results
+
+<img src="./assets/results.png" alt="Result Chart" width="100%"/>
 
-![Result Chart](./assets/results.png)
 *RL with VerIF significantly improves instruction-following performance across benchmarks.*
 
----
-## Method
-![Method Figure](./assets/method.png)
+### Method
+<p align="center">
+<img src="./assets/method.png" alt="Method Figure" width="80%"/>
+</p>
+
 *VerIF integrates **rule-based code checks** with **LLM-based reasoning verification** (e.g., QwQ-32B) to provide accurate and scalable reward signals.*
 
 ---
 
 
 ## Data & Trained Models
 
-- [VerInstrcut (22k instruction-following examples with verifiable signals)](./data/)
+- [VerInstruct (22k instruction-following examples with verifiable signals)](./data/)
 - [R1-Distill-Qwen-7B-VerIF](./models/qwen2-7b-verif/), based on DeepSeek-R1-R1-Distill-Qwen-7B
 - [TULU3-VerIF](./models/tulu3-8b-verif/), based on Llama-3.1-Tulu-3-8B-SFT
 
@@ -49,7 +51,7 @@ Please refer to the original [verl documentation](https://github.com/volcengine/
 
 ### Step 1: Preprocess Data
 Download data from [here](). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.
-> Make sure to adjust paths and add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
+> Make sure to add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
 
 ### Step 2: Setup the Verifier Model
 For **soft constraint verification**, use an LLM-based verifier. You may:
@@ -67,13 +69,13 @@ Use the provided training scripts:
 
 These use DeepSeek-RL-Distilled-Qwen-7B and TULU 3 SFT as base models.
 Update paths to point to your model checkpoint if needed.
+
 ---
 
 ## Acknowledgments
 
-We thank the [**verl**](https://github.com/volcengine/verl) team for their open-source framework, and the [**Crab**](https://github.com/THU-KEG/Crab) team for their original data.
+We thank the [**verl**](https://github.com/volcengine/verl) team for their open-source framework, and the [**Crab**](https://github.com/THU-KEG/Crab) team for their open-sourced original data.
 
----
 
 ## Citations
 If this repo helps, please kindly cite us:
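
Note on the method text in this diff: the README describes VerIF as combining rule-based code checks with an LLM-based verifier for soft constraints, but the diff does not show how the two signals are merged into a reward. The sketch below is one possible reading, not the repository's implementation. The names `hybrid_reward`, `max_words`, and `stub_judge` are hypothetical, the averaging rule is an assumption, and the judge is a keyword stub standing in for a real verifier model call.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for this sketch; not part of the VerIF codebase.
HardCheck = Callable[[str], bool]       # rule-based check run as plain code
SoftJudge = Callable[[str, str], bool]  # (response, constraint) -> verdict


def hybrid_reward(
    response: str,
    hard_checks: Sequence[HardCheck],
    soft_constraints: Sequence[str],
    soft_judge: SoftJudge,
) -> float:
    """Return the fraction of satisfied constraints.

    Assumption: hard and soft verdicts are weighted equally and averaged.
    The README does not specify the aggregation rule.
    """
    verdicts = [bool(check(response)) for check in hard_checks]
    verdicts += [bool(soft_judge(response, c)) for c in soft_constraints]
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


def max_words(limit: int) -> HardCheck:
    """Build a rule-based check: the response has at most `limit` words."""
    return lambda response: len(response.split()) <= limit


def stub_judge(response: str, constraint: str) -> bool:
    """Placeholder for an LLM verifier call.

    A real judge would prompt a verifier model; this stub only looks for
    the constraint keyword in the response so the example runs offline.
    """
    return constraint.lower() in response.lower()


if __name__ == "__main__":
    reward = hybrid_reward(
        "Dear team, the release is ready.",
        hard_checks=[max_words(10), lambda r: r.endswith(".")],
        soft_constraints=["dear"],
        soft_judge=stub_judge,
    )
    print(reward)
```

Passing the judge in as a callable keeps the rule-based path testable without a running verifier server; swapping `stub_judge` for a function that queries a deployed model would not change `hybrid_reward`.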

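
Note on the Step 1 hunk: the README asks readers to add the import path for `./verl/utils/reward_score/local_server` at the top of each function, without showing how. A minimal sketch of one way to do that with the standard library follows. The helper name `ensure_import_path` is hypothetical, and the directory is copied from the README note, so it may need adjusting for your checkout.

```python
import sys
from pathlib import Path

# Directory named in the README note; assumed to be relative to the
# repository root. Adjust if your working directory differs.
LOCAL_SERVER_DIR = Path("./verl/utils/reward_score/local_server")


def ensure_import_path(directory: Path = LOCAL_SERVER_DIR) -> str:
    """Put `directory` on sys.path once and return its absolute form.

    Calling this at the top of a function makes modules in that directory
    importable from inside the function body. Repeated calls are no-ops.
    """
    resolved = str(directory.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


if __name__ == "__main__":
    added = ensure_import_path()
    print(added in sys.path)
```

The membership check avoids growing `sys.path` when the helper is called from many functions in the same process.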