Commit 2463ac3

Update index.html
wenhu chen authored
1 parent 43a152f commit 2463ac3

1 file changed

+24 -4 lines changed

index.html

Lines changed: 24 additions & 4 deletions
@@ -94,17 +94,37 @@ <h1 class="title is-1 publication-title">
 <span class="icon">
 🤗
 </span>
-<span>Dataset</span>
+<span>AceCode-89K</span>
 </a>
 </span>
 
+<span class="link-block">
+<a href="https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K" target="_blank"
+class="external-link button is-normal is-rounded is-dark">
+<span class="icon">
+🤗
+</span>
+<span>AceCode-Pairs</span>
+</a>
+</span>
+
+<span class="link-block">
+<a href="https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba" target="_blank"
+class="external-link button is-normal is-rounded is-dark">
+<span class="icon">
+🤗
+</span>
+<span>Reward Models</span>
+</a>
+</span>
+
 <span class="link-block">
 <a href="https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba" target="_blank"
 class="external-link button is-normal is-rounded is-dark">
 <span class="icon">
 🤗
 </span>
-<span>Models</span>
+<span>RL Models</span>
 </a>
 </span>
 </div>
@@ -156,7 +176,7 @@ <h1 class="title is-1 acecoder">
 <h2 class="title is-3">Overview</h2>
 <div class="content has-text-justified">
 <p>
-We introduce <b>AceCoder</b>, the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for the reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset <b>AcoCode-89K</b>, where we start from a seed code dataset and prompt powerful LLMs to "imagine" proper test cases for the coding question and filter the noisy ones. We sample inferences from existing coder models and compute their pass rate as the reliable and verifiable rewards for both training the reward model and conducting the reinforcement learning for coder LLM.
+We introduce <b>AceCoder</b>, the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for the reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset <b>AceCode-89K</b>, where we start from a seed code dataset and prompt powerful LLMs to "imagine" proper test cases for the coding question and filter the noisy ones. We sample inferences from existing coder models and compute their pass rate as the reliable and verifiable rewards for both training the reward model and conducting the reinforcement learning for coder LLM.
 </p>
 <div class="content has-text-centered">
 <img src="static/images/ac_overview.png" alt="algebraic reasoning" width="100%"/>
@@ -258,4 +278,4 @@ <h2 class="title">Reference</h2>
 </footer>
 
 </body>
-</html>
+</html>
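
The Overview paragraph in this diff says the pass rate of sampled programs on synthesized tests serves as the reward signal. A minimal Python sketch of that idea follows. It is not the AceCoder implementation: the function name `pass_rate`, the toy candidates, and the assertion-string test format are all assumptions made for illustration, and a real pipeline would run candidates in a sandbox with timeouts rather than calling `exec` in-process.

```python
from typing import List


def pass_rate(program: str, tests: List[str]) -> float:
    """Return the fraction of test assertions a candidate program passes.

    Hypothetical helper: `program` is Python source defining the solution,
    and each entry of `tests` is a single `assert ...` statement.
    """
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        namespace: dict = {}
        try:
            exec(program, namespace)  # define the candidate's functions
            exec(test, namespace)     # run one assertion against them
            passed += 1
        except Exception:
            # Any error or failed assertion counts as a failed test.
            pass
    return passed / len(tests)


# Two toy candidates for the same question (illustrative only).
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(-1, 1) == 0",
]

# One scalar reward per sampled program.
rewards = [pass_rate(c, tests) for c in candidates]
print(rewards)
```

Here the first candidate passes every test and the second passes only the `add(0, 0)` case, so the two rewards differ and could be used to rank the samples.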
