+> **Note:** Evaluation results obtained before version `v1.6.0` cannot be compared to results obtained on or after `v1.6.0` unless you use the `--legacy` flag, because of a critical fix in how hidden tests are handled. From here on, we will probably introduce further breaking changes to improve the fairness and quality of the evaluation. Once we are happy with these improvements, they will probably lead to a new v2 leaderboard. For now, the current leaderboard is in maintenance mode: it will receive only a few new updates, plus reruns if any bugs are found that affected fairness.