Skip to content

Commit c56a258

Browse files
committed
Add links
1 parent 7f660c6 commit c56a258

File tree

1 file changed

+25
-7
lines changed

1 file changed

+25
-7
lines changed

README.md

Lines changed: 25 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,23 @@
11
# VerIF: Verification Engineering for RL in Instruction Following
2-
[**Model**](#trained-models--data) | [**Data**](#trained-models--data) | [**Paper**](#paper)
2+
3+
<!-- [![Model](https://img.shields.io/badge/Model-TULU3-blue
4+
)](https://huggingface.co/THU-KEG/TULU3-VerIF) [![Data](https://img.shields.io/badge/Data-VerInstruct-yellow
5+
)](https://huggingface.co/datasets/THU-KEG/VerInstruct) [![Verifier](https://img.shields.io/badge/Model-Verifier-blue
6+
)](https://huggingface.co/THU-KEG/IF-Verifier-7B) -->
7+
8+
<div align="center">
9+
10+
<a href="https://huggingface.co/THU-KEG/TULU3-VerIF">
11+
<img src="https://img.shields.io/badge/Model-TULU3-blue" alt="Model">
12+
</a>
13+
<a href="https://huggingface.co/datasets/THU-KEG/VerInstruct">
14+
<img src="https://img.shields.io/badge/Data-VerInstruct-yellow" alt="Data">
15+
</a>
16+
<a href="https://huggingface.co/THU-KEG/IF-Verifier-7B">
17+
<img src="https://img.shields.io/badge/Model-Verifier-blue" alt="Verifier">
18+
</a>
19+
20+
</div>
321

422
---
523

@@ -27,15 +45,15 @@ To support this method, we construct a high-quality dataset, **VerInstruct**, wi
2745

2846
## Data & Trained Models
2947

30-
- [VerInstruct (22k instruction-following examples with verifiable signals)](data)
31-
- [R1-Distill-Qwen-7B-VerIF](model), based on DeepSeek-R1-R1-Distill-Qwen-7B
32-
- [TULU3-VerIF](model), based on Llama-3.1-Tulu-3-8B-SFT
48+
- [VerInstruct (24k instruction-following examples with verifiable signals)](https://huggingface.co/datasets/THU-KEG/VerInstruct)
49+
- [TULU3-VerIF](https://huggingface.co/THU-KEG/TULU3-VerIF), based on Llama-3.1-Tulu-3-8B-SFT
50+
- [R1-Distill-Qwen-7B-VerIF](https://huggingface.co/THU-KEG/R1-Distill-Qwen-7B-VerIF), based on DeepSeek-R1-R1-Distill-Qwen-7B
3351

3452
---
3553

3654
## Training Guide
3755

38-
This repo is forked from [verl](https://github.com/volcengine/verl). We sincerely thank the authors for their excellent framework. VerIF introduces two key adjustments:
56+
This repo is forked from [verl](https://github.com/volcengine/verl). We sincerely thank the authors for their excellent framework. We introduce two key adjustments:
3957

4058
1. **Efficient Local Reward Server**:
4159
We provide a `local_server` version of the reward function for better efficiency. We recommend running it inside a **sandboxed Docker** environment to avoid potential security issues. You may also deploy your own remote server.
@@ -50,12 +68,12 @@ This repo is forked from [verl](https://github.com/volcengine/verl). We sincerel
5068
Please refer to the original [verl documentation](https://github.com/volcengine/verl) for environment setup.
5169

5270
### Step 1: Preprocess Data
53-
Download data from [here](#data). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.
71+
Download data from [here](https://huggingface.co/datasets/THU-KEG/VerInstruct). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.
5472
> Make sure to add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
5573
5674
### Step 2: Setup the Verifier Model
5775
For **soft constraint verification**, use an LLM-based verifier. You may:
58-
- Use our own trained [verifier](#verifier) based on R1-Distilled-Qwen-7B
76+
- Use our own trained [verifier](https://huggingface.co/THU-KEG/IF-Verifier-7B) based on R1-Distilled-Qwen-7B
5977
- Use **QwQ-32B** as the verifier
6078

6179
We suggest using **SGLang** or **vLLM** for deployment.

0 commit comments

Comments
 (0)