Commit 64de008

add codemmlu leaderboard
1 parent f26e7b7 commit 64de008

File tree

1 file changed: +2 additions, −2 deletions


leaderboards/codemmlu/index.html

Lines changed: 2 additions & 2 deletions
@@ -192,7 +192,7 @@ <h3>📝 Notes</h3>
 <!-- <li>
 💤 indicates the models having at least a difference of 1% between the calibrated Pass@1 and the original one. What does this imply? Instruction-tuned models <u><a href="https://community.openai.com/t/why-i-think-gpt-is-now-lazy">can be lazy</a></u>, omitting essential code parts and thus failing on some tasks.
 Therefore, we add the missing parts during evaluation, and report the calibrated Pass@1 score as default, -->
-<li>
+<!-- <li>
 ✨ marks models evaluated using a chat setting, while others
 perform direct code completion. We note that some instruction-tuned models miss the chat template in their tokenizer configuration.
 </li>
@@ -206,7 +206,7 @@ <h3>📝 Notes</h3>
 open SFT data, but the base model is not data-open. What does this
 imply? 💚💙 models open-source the data such that one can
 concretely reason about contamination.
-</li>
+</li> -->
 <li>
 "Size" here is the amount of activated model weight during inference.
 </li>
