## Data & Trained Models
- [VerInstruct (24k instruction-following examples with verifiable signals)](https://huggingface.co/datasets/THU-KEG/VerInstruct)
- [TULU3-VerIF](https://huggingface.co/THU-KEG/TULU3-VerIF), based on Llama-3.1-Tulu-3-8B-SFT
- [R1-Distill-Qwen-7B-VerIF](https://huggingface.co/THU-KEG/R1-Distill-Qwen-7B-VerIF), based on DeepSeek-R1-Distill-Qwen-7B
---
## Training Guide
This repo is forked from [verl](https://github.com/volcengine/verl). We sincerely thank the authors for their excellent framework. We introduce two key adjustments:
1. **Efficient Local Reward Server**:
We provide a `local_server` version of the reward function for better efficiency. We recommend running it inside a **sandboxed Docker** environment to avoid potential security issues. You may also deploy your own remote server.
Please refer to the original [verl documentation](https://github.com/volcengine/verl) for environment setup.
### Step 1: Preprocess Data
Download data from [here](https://huggingface.co/datasets/THU-KEG/VerInstruct). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.
> Make sure to add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
### Step 2: Setup the Verifier Model
For **soft constraint verification**, use an LLM-based verifier. You may:
- Use our trained [verifier](https://huggingface.co/THU-KEG/IF-Verifier-7B), based on DeepSeek-R1-Distill-Qwen-7B
- Use **QwQ-32B** as the verifier
We suggest using **SGLang** or **vLLM** for deployment.
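Once the verifier is served behind an OpenAI-compatible endpoint, soft-constraint checking reduces to building a judge prompt and parsing the verdict. The prompt wording and the YES/NO protocol below are assumptions for illustration, not the repo's actual verifier template:

```python
def build_judge_prompt(instruction, constraint, response):
    """Assemble a judge prompt for an LLM verifier (assumed format)."""
    return (
        "You are a strict verifier.\n"
        f"Instruction: {instruction}\n"
        f"Constraint: {constraint}\n"
        f"Response: {response}\n"
        "Does the response satisfy the constraint? Answer YES or NO."
    )

def parse_judgment(text):
    """Extract a boolean verdict from the verifier's reply.

    Takes the last YES/NO token, so chain-of-thought reasoning before the
    final verdict is tolerated; anything without a verdict counts as fail.
    """
    verdicts = [t.strip(".,!") for t in text.upper().split()
                if t.strip(".,!") in ("YES", "NO")]
    return bool(verdicts) and verdicts[-1] == "YES"
```

The prompt string can then be sent as a chat message to the deployed server (e.g. via the OpenAI Python client pointed at the vLLM or SGLang endpoint), with `parse_judgment` applied to the completion to produce the soft-constraint reward signal.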