Commit 0e0968f

2 parents: 164d6e4 + 32f5382

4 files changed (+3, -9 lines)

README.md

Lines changed: 1 addition & 1 deletion
@@ -347,7 +347,7 @@ We share pre-generated code samples from LLMs we have [evaluated](https://huggin
 
 ## 🐞 Known Issues
 
-- [ ] Due to [the Hugging Face tokenizer update](https://github.com/huggingface/transformers/pull/31305), some tokenizer may be broken and will degrade the performance of the evaluation. Therefore, we set up with `legacy=False` for the initialization. If you notice the unexpected change, please try `--tokenizer_legacy` during the generation.
+- [ ] Due to [the Hugging Face tokenizer update](https://github.com/huggingface/transformers/pull/31305), some tokenizer may be broken and will degrade the performance of the evaluation. Therefore, we set up with `legacy=False` for the initialization. If you notice the unexpected behaviors, please try `--tokenizer_legacy` during the generation.
 
 - [ ] Due to the flakes in the evaluation, the execution results may vary slightly (~0.2%) between runs. We are working on improving the evaluation stability.
 

analysis/bcb_subset.py

Lines changed: 0 additions & 2 deletions
@@ -75,8 +75,6 @@ def read_task_perf(tids, task="complete"):
 continue
 task_perf = {f"BigCodeBench/{task_id}": 0 for task_id in range(1140)}
 model = model.replace("/", "--")
-# if info["link"].startswith("https://huggingface.co/"):
-# model = info["link"].split("https://huggingface.co/")[-1].replace("/", "--")
 try:
 if info["prompted"] and not info["direct_complete"]:
 files = glob(f"results/{model}--bigcodebench-{task}*-0-1-sanitized-calibrated_eval_results.json")
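With the commented-out Hugging Face link parsing deleted, the result file is located from the model identifier alone: `/` becomes `--` and the name is slotted into the glob pattern in the kept lines. The sketch below reproduces that pattern for a hypothetical model id `org/model-7b` and made-up file names; it uses `fnmatch` in place of `glob` (an assumption for illustration) so it runs without a `results/` directory.

```python
from fnmatch import fnmatch


def result_pattern(model: str, task: str = "complete") -> str:
    """Build the same pattern the analysis script passes to glob()."""
    safe_model = model.replace("/", "--")  # "org/name" -> "org--name"
    return (
        f"results/{safe_model}--bigcodebench-{task}"
        "*-0-1-sanitized-calibrated_eval_results.json"
    )


# Hypothetical result files; only the "complete" one should match.
files = [
    "results/org--model-7b--bigcodebench-complete--vllm-0-1-sanitized-calibrated_eval_results.json",
    "results/org--model-7b--bigcodebench-instruct--vllm-0-1-sanitized-calibrated_eval_results.json",
]

pattern = result_pattern("org/model-7b")
matches = [f for f in files if fnmatch(f, pattern)]
print(matches)  # only the "complete" file matches
```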

analysis/get_results.py

Lines changed: 0 additions & 5 deletions
@@ -52,9 +52,6 @@ def get_results(tids):
 hf_model = ""
 files = glob(f"results/{model}--bigcodebench-*.json")
 assert files, f"No files found for results/{model}--bigcodebench-*.json"
-# if "https://huggingface.co/" in info["link"]:
-# hf_model = info["link"].split("https://huggingface.co/")[-1]
-# model = hf_model.replace("/", "--")
 for file in files:
 _, suffix = os.path.basename(file).split("--bigcodebench-")
 status = []
@@ -153,8 +150,6 @@ def read_task_perf(tids, task="complete"):
 
 task_perf = dict()
 model = model.replace("/", "--")
-# if info["link"].startswith("https://huggingface.co/"):
-# model = info["link"].split("https://huggingface.co/")[-1].replace("/", "--")
 try:
 if info["prompted"] and not info["direct_complete"]:
 files = glob(f"results/{model}--bigcodebench-{task}*-0-1-sanitized-calibrated_eval_results.json")
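Both hunks in `get_results.py` drop the same dead code, so each function relies only on the `--`-joined model name. The suffix parsing kept in `get_results()` splits a result file name on `--bigcodebench-`; a minimal sketch of that step, using a hypothetical file name and a hypothetical helper `split_result_file`, shows that the `--` inside the model part does not interfere, because the separator string is more specific.

```python
import os


def split_result_file(path: str) -> tuple[str, str]:
    """Hypothetical helper mirroring the unpacking in get_results().

    Raises ValueError if "--bigcodebench-" is missing or appears more
    than once, exactly like the two-name unpacking in the original line.
    """
    model_part, suffix = os.path.basename(path).split("--bigcodebench-")
    return model_part, suffix


# Hypothetical result path: "org/model-7b" was stored as "org--model-7b".
path = "results/org--model-7b--bigcodebench-instruct--vllm-0-1-sanitized-calibrated_eval_results.json"
model_part, suffix = split_result_file(path)
print(model_part)  # org--model-7b
print(suffix)      # instruct--vllm-0-1-sanitized-calibrated_eval_results.json
```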

bigcodebench/evaluate.py

Lines changed: 2 additions & 1 deletion
@@ -283,7 +283,8 @@ def stucking_checker():
 json.dump(results, f, indent=2)
 
 pass_at_k_path = result_path.replace("_eval_results.json", "_pass_at_k.json")
-pass_at_k["model"] = flags.samples.split("/")[-1].replace(".jsonl", "")
+pass_at_k["model"] = os.path.basename(flags.samples).split("--bigcodebench-")[0]
+pass_at_k["calibrated"] = "sanitized-calibrated" in flags.samples
 pass_at_k["subset"] = flags.subset
 
 def save_pass_at_k():
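The `evaluate.py` change narrows the `model` field of the pass@k output to the part of the samples file name before `--bigcodebench-`, and records separately whether the samples were sanitized and calibrated. Comparing the old and new expressions on a hypothetical samples path (the file name is an assumption that follows the `<model>--bigcodebench-<task>...` naming used by the analysis scripts) makes the difference concrete:

```python
import os


def model_name_before(samples: str) -> str:
    # Pre-commit expression: last path component with ".jsonl" stripped.
    return samples.split("/")[-1].replace(".jsonl", "")


def model_name_after(samples: str) -> str:
    # Post-commit expression: everything before "--bigcodebench-" in the file name.
    return os.path.basename(samples).split("--bigcodebench-")[0]


def is_calibrated(samples: str) -> bool:
    # Post-commit expression for the new "calibrated" field.
    return "sanitized-calibrated" in samples


# Hypothetical samples path; not taken from the repository.
samples = "outputs/org--model-7b--bigcodebench-complete--vllm-0-1-sanitized-calibrated.jsonl"
print(model_name_before(samples))  # org--model-7b--bigcodebench-complete--vllm-0-1-sanitized-calibrated
print(model_name_after(samples))   # org--model-7b
print(is_calibrated(samples))      # True
```

The `calibrated` check tests the whole `flags.samples` string rather than only the base name, so a directory whose name contains `sanitized-calibrated` would also set it to `True`.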
