Commit 64de008

add codemmlu leaderboard
1 parent f26e7b7 commit 64de008

File tree

1 file changed: +2 additions, −2 deletions


leaderboards/codemmlu/index.html

Lines changed: 2 additions & 2 deletions
@@ -192,7 +192,7 @@ <h3>📝 Notes</h3>
 <!-- <li>
 💤 indicates the models having at least a difference of 1% between the calibrated Pass@1 and the original one. What does this imply? Instruction-tuned models <u><a href="https://community.openai.com/t/why-i-think-gpt-is-now-lazy">can be lazy</a></u>, omitting essential code parts and thus failing on some tasks.
 Therefore, we add the missing parts during evaluation, and report the calibrated Pass@1 score as default, -->
-<li>
+<!-- <li>
 ✨ marks models evaluated using a chat setting, while others
 perform direct code completion. We note that some instruction-tuned models miss the chat template in their tokenizer configuration.
 </li>
@@ -206,7 +206,7 @@ <h3>📝 Notes</h3>
 open SFT data, but the base model is not data-open. What does this
 imply? 💚💙 models open-source the data such that one can
 concretely reason about contamination.
-</li>
+</li> -->
 <li>
 "Size" here is the amount of activated model weight during inference.
 </li>
