README.md (+12 -10: 12 additions & 10 deletions)
@@ -8,24 +8,26 @@
**VerIF** is a practical and efficient method for **verification in instruction-following reinforcement learning**. Built on the idea of *Reinforcement Learning with Verifiable Rewards (RLVR)*, VerIF integrates **rule-based code checks** with **LLM-based reasoning verification** (e.g., QwQ-32B) to provide accurate and scalable reward signals.
To support this method, we construct a high-quality dataset, **VerInstruct**, with ~22,000 instruction-following instances paired with verification signals. Models trained with VerIF achieve **state-of-the-art performance** among models of similar scale on several benchmarks while maintaining their general capabilities.
*VerIF integrates **rule-based code checks** with **LLM-based reasoning verification** (e.g., QwQ-32B) to provide accurate and scalable reward signals.*
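As a rough illustration of how the two verification signals could be combined into one reward, here is a minimal sketch. It is not the code shipped in this repository: `hybrid_reward`, the helper checks, the `stub_judge` placeholder, and the simple averaging of the two scores are all assumptions made for the example. In practice the judge would call an LLM verifier such as QwQ-32B.

```python
import re
from typing import Callable, Sequence

# Hypothetical types for this sketch (not part of the VerIF codebase).
HardCheck = Callable[[str], bool]                    # rule-based code check on the response
SoftJudge = Callable[[str, str, Sequence[str]], float]  # stands in for an LLM verifier


def hybrid_reward(
    instruction: str,
    response: str,
    hard_checks: Sequence[HardCheck],
    soft_constraints: Sequence[str],
    soft_judge: SoftJudge,
) -> float:
    """Average the hard-check pass rate and the soft-judge score.

    The equal-weight average is an assumed aggregation, chosen for simplicity.
    """
    scores = []
    if hard_checks:
        passed = sum(1 for check in hard_checks if check(response))
        scores.append(passed / len(hard_checks))
    if soft_constraints:
        scores.append(float(soft_judge(instruction, response, soft_constraints)))
    return sum(scores) / len(scores) if scores else 0.0


def max_words(limit: int) -> HardCheck:
    """Hard constraint: the response has at most `limit` whitespace-separated words."""
    return lambda text: len(text.split()) <= limit


def must_mention(word: str) -> HardCheck:
    """Hard constraint: the response contains `word` as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None


def stub_judge(instruction: str, response: str, constraints: Sequence[str]) -> float:
    """Placeholder for an LLM call; always reports that soft constraints are met."""
    return 1.0


reward = hybrid_reward(
    instruction="Describe RLVR in under 20 words and mention rewards.",
    response="RLVR trains models with rewards that can be checked automatically.",
    hard_checks=[max_words(20), must_mention("rewards")],
    soft_constraints=["Uses a neutral tone"],
    soft_judge=stub_judge,
)
print(reward)  # 1.0
```

Keeping the rule-based checks as plain predicates means they run cheaply on every rollout, while the judge is only consulted for constraints that cannot be checked with code.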
---
## Data & Trained Models
- - [VerInstrcut (22k instruction-following examples with verifiable signals)](./data/)
+ - [VerInstruct (22k instruction-following examples with verifiable signals)](./data/)
- [R1-Distill-Qwen-7B-VerIF](./models/qwen2-7b-verif/), based on DeepSeek-R1-Distill-Qwen-7B
- [TULU3-VerIF](./models/tulu3-8b-verif/), based on Llama-3.1-Tulu-3-8B-SFT
@@ -49,7 +51,7 @@ Please refer to the original [verl documentation](https://github.com/volcengine/
### Step 1: Preprocess Data
Download data from [here](). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.
- > Make sure to adjust paths and add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
+ > Make sure to add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
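One way to follow the note above is a small helper that registers the directory on `sys.path` before anything is imported from it. This is only a sketch: `add_import_path` and `preprocess_example` are hypothetical names, not functions from `if_prompts.py`, and the relative path assumes the script is run from the repository root.

```python
import sys
from pathlib import Path

# Path taken from the note above; assumed to be relative to the repository root.
LOCAL_SERVER_DIR = "./verl/utils/reward_score/local_server"


def add_import_path(directory: str) -> str:
    """Prepend `directory` to sys.path (once) and return its absolute form."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


def preprocess_example(example: dict) -> dict:
    """Hypothetical preprocessing function showing where the call would go."""
    # Register the import path first so later imports can find local_server modules.
    add_import_path(LOCAL_SERVER_DIR)
    # The actual preprocessing logic would follow here.
    return example
```

Calling the helper repeatedly is harmless, since the directory is only added if it is not already on the path.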
### Step 2: Setup the Verifier Model
For **soft constraint verification**, use an LLM-based verifier. You may:
@@ -67,13 +69,13 @@ Use the provided training scripts:
These use DeepSeek-R1-Distill-Qwen-7B and TULU 3 SFT as base models.
Update paths to point to your model checkpoint if needed.
+
---
## Acknowledgments
- We thank the [**verl**](https://github.com/volcengine/verl) team for their open-source framework, and the [**Crab**](https://github.com/THU-KEG/Crab) team for their original data.
+ We thank the [**verl**](https://github.com/volcengine/verl) team for their open-source framework, and the [**Crab**](https://github.com/THU-KEG/Crab) team for their open-sourced original data.