Description
Hi Jiale,
Thanks for open-sourcing ML4CO-Bench-101 — it has been very helpful for reproduction.
I noticed a possible issue with the Hugging Face training data for MCut.
In the repository README, users are asked to download train_dataset from Hugging Face. However, under:
ML4CO-Bench-101-SL / train_dataset / mcut / mcut_ba_small
I can currently only find one file:
mcut_ba-small_64k_1.txt
This seems incomplete: the ML4CO-Bench-101 paper lists the MCut BA-SMALL training set size as 128,000 instances, but the current Hugging Face folder appears to contain only a single 64K shard, which would be half of that.
For comparison, mcut_ba_large seems to be uploaded as multiple shards (mcut_ba-large_16k_1.txt to mcut_ba-large_16k_8.txt), which looks consistent with a complete 128K training set. So I suspect that mcut_ba_small may be missing the remaining shard(s), such as a second 64K file, or another equivalent complete upload.
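To make the arithmetic explicit, here is a rough sanity check that tallies the instance counts implied by the file names. It assumes that the `64k` / `16k` token in each name is the number of instances in that shard (in thousands), which I have not verified against the file contents; the helper names are my own and not part of the repository.

```python
import re

# Training set size for MCut BA-SMALL as listed in the paper.
EXPECTED_INSTANCES = 128_000


def shard_size(filename: str) -> int:
    """Read the per-shard instance count from a name like 'mcut_ba-small_64k_1.txt'.

    Assumption: the '<N>k' token means N * 1000 instances in that shard.
    """
    match = re.search(r"_(\d+)k_\d+\.txt$", filename)
    if match is None:
        raise ValueError(f"unexpected shard name: {filename}")
    return int(match.group(1)) * 1000


def total_instances(filenames) -> int:
    """Sum the instance counts implied by a list of shard file names."""
    return sum(shard_size(name) for name in filenames)


# Files currently visible on Hugging Face, as described above.
small_shards = ["mcut_ba-small_64k_1.txt"]
large_shards = [f"mcut_ba-large_16k_{i}.txt" for i in range(1, 9)]

small_total = total_instances(small_shards)
large_total = total_instances(large_shards)

print("mcut_ba_small:", small_total, "missing:", EXPECTED_INSTANCES - small_total)
print("mcut_ba_large:", large_total, "missing:", EXPECTED_INSTANCES - large_total)
```

Under that naming assumption, mcut_ba_large adds up to the full 128,000 instances, while mcut_ba_small is short by 64,000, i.e. one more 64K shard.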
Could you please check whether the mcut_ba_small training dataset on Hugging Face is incomplete, and if so, upload the full version?
This would be very helpful for reproducing the MCut BA-SMALL experiments in the benchmark.
Thanks again for releasing the benchmark and the code.
Best,