<meta name="description" content="AceCoder: Acing Coder RL via Automated Test-Case Synthesis">
<meta property="og:title" content="AceCoder: Acing Coder RL via Automated Test-Case Synthesis" />
<meta property="og:description" content="We synthesize large-scale (question, test-cases) pairs from existing seed code data to provide reliable reward signals for training code models with reinforcement learning." />
Recent years have witnessed strong performance of code models in code generation, code repair, and related tasks. However, most recent work has focused on supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) for training code models remains largely untapped. This is mainly due to the lack of reliable reward signals in the code domain. In this paper, we aim to empower code model training with large-scale automated test-case synthesis. Specifically, we design a pipeline that synthesizes large-scale (question, test-cases) pairs from existing seed code data.
</p>
</div>
</div>
</div>
</div>
</section>
<!-- BibTeX citation -->
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">Reference</h2>
If you use our code or results, please cite our paper:
<pre><code>
</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="columns is-centered">
<div class="column is-8">
<div class="content has-text-centered">
<p>
This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative Commons Attribution-ShareAlike 4.0 International License</a>.