Add links

h-peng17 · h-peng17 · commit c56a258c9384 · 2025-06-11T14:43:48.000+08:00
diff --git a/README.md b/README.md
@@ -1,5 +1,23 @@
 # VerIF: Verification Engineering for RL in Instruction Following
-[**Model**](#trained-models--data) | [**Data**](#trained-models--data) | [**Paper**](#paper)
+
+<!-- [![Model](https://img.shields.io/badge/Model-TULU3-blue
+)](https://huggingface.co/THU-KEG/TULU3-VerIF)  [![Data](https://img.shields.io/badge/Data-VerInstruct-yellow
+)](https://huggingface.co/datasets/THU-KEG/VerInstruct)  [![Verifier](https://img.shields.io/badge/Model-Verifier-blue
+)](https://huggingface.co/THU-KEG/IF-Verifier-7B) -->
+
+<div align="center">
+
+  <a href="https://huggingface.co/THU-KEG/TULU3-VerIF">
+    <img src="https://img.shields.io/badge/Model-TULU3-blue" alt="Model">
+  </a>
+  <a href="https://huggingface.co/datasets/THU-KEG/VerInstruct">
+    <img src="https://img.shields.io/badge/Data-VerInstruct-yellow" alt="Data">
+  </a>
+  <a href="https://huggingface.co/THU-KEG/IF-Verifier-7B">
+    <img src="https://img.shields.io/badge/Model-Verifier-blue" alt="Verifier">
+  </a>
+
+</div>
 
 ---
 
@@ -27,15 +45,15 @@ To support this method, we construct a high-quality dataset, **VerInstruct**, wi
 
 ## Data & Trained Models
 
-- [VerInstruct (22k instruction-following examples with verifiable signals)](data)
-- [R1-Distill-Qwen-7B-VerIF](model), based on DeepSeek-R1-R1-Distill-Qwen-7B
-- [TULU3-VerIF](model), based on Llama-3.1-Tulu-3-8B-SFT
+- [VerInstruct (24k instruction-following examples with verifiable signals)](https://huggingface.co/datasets/THU-KEG/VerInstruct)
+- [TULU3-VerIF](https://huggingface.co/THU-KEG/TULU3-VerIF), based on Llama-3.1-Tulu-3-8B-SFT
+- [R1-Distill-Qwen-7B-VerIF](https://huggingface.co/THU-KEG/R1-Distill-Qwen-7B-VerIF), based on DeepSeek-R1-R1-Distill-Qwen-7B
 
 ---
 
 ## Training Guide
 
-This repo is forked from [verl](https://github.com/volcengine/verl). We sincerely thank the authors for their excellent framework. VerIF introduces two key adjustments:
+This repo is forked from [verl](https://github.com/volcengine/verl). We sincerely thank the authors for their excellent framework. We introduce two key adjustments:
 
 1. **Efficient Local Reward Server**:  
    We provide a `local_server` version of the reward function for better efficiency. We recommend running it inside a **sandboxed Docker** environment to avoid potential security issues. You may also deploy your own remote server.
@@ -50,12 +68,12 @@ This repo is forked from [verl](https://github.com/volcengine/verl). We sincerel
 Please refer to the original [verl documentation](https://github.com/volcengine/verl) for environment setup.
 
 ### Step 1: Preprocess Data  
-Download data from [here](#data). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.  
+Download data from [here](https://huggingface.co/datasets/THU-KEG/VerInstruct). Use `./examples/data_preprocess/if_prompts.py` to preprocess VerInstruct.  
 > Make sure to add the import path for `./verl/utils/reward_score/local_server` at the top of each function.
 
 ### Step 2: Setup the Verifier Model  
 For **soft constraint verification**, use an LLM-based verifier. You may:
-- Use our own trained [verifier](#verifier) based on R1-Distilled-Qwen-7B
+- Use our own trained [verifier](https://huggingface.co/THU-KEG/IF-Verifier-7B) based on R1-Distilled-Qwen-7B
 - Use **QwQ-32B** as the verifier
 
 We suggest using **SGLang** or **vLLM** for deployment.