
Possible missing files in Hugging Face train_dataset/mcut/mcut_ba_small #3

@53mins

Hi Jiale,

Thanks for open-sourcing ML4CO-Bench-101; it has been very helpful for reproducing your results.

I noticed a possible issue with the Hugging Face training data for MCut.

In the repository README, users are asked to download train_dataset from Hugging Face. However, under:

ML4CO-Bench-101-SL/train_dataset/mcut/mcut_ba_small

I can currently only find one file:

  • mcut_ba-small_64k_1.txt

This seems incomplete, because the ML4CO-Bench-101 paper lists the MCut BA-SMALL training set size as 128,000 instances. Based on the current Hugging Face folder, only a single 64K shard, half of the reported size, appears to be available.

For comparison, mcut_ba_large seems to be uploaded as multiple shards (mcut_ba-large_16k_1.txt through mcut_ba-large_16k_8.txt, i.e. 8 × 16K = 128K instances), which is consistent with a complete 128K training set. So I suspect that mcut_ba_small is missing the remaining shard(s), for example a second 64K file, or that an equivalent complete upload was intended.
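As a quick sanity check on the shard arithmetic, here is a minimal sketch that sums the instance counts implied by the file names listed above. It assumes, as the names suggest, that the `<N>k` part of each file name is the number of instances (in thousands) in that shard, and that both splits should total 128,000 instances. `SHARD_PATTERN` and `total_instances` are hypothetical helpers for illustration, not part of the ML4CO-Bench-101 code.

```python
import re

# Assumption: shard names follow the pattern seen on Hugging Face,
# e.g. "mcut_ba-large_16k_3.txt", where "<N>k" is the number of
# instances (in thousands) stored in that shard.
SHARD_PATTERN = re.compile(r"_(\d+)k_(\d+)\.txt$")

EXPECTED = 128_000  # training-set size the paper reports (per this issue)


def total_instances(filenames):
    """Sum the instance counts implied by the shard file names."""
    total = 0
    for name in filenames:
        match = SHARD_PATTERN.search(name)
        if match is None:
            raise ValueError(f"unexpected shard name: {name}")
        total += int(match.group(1)) * 1000
    return total


# File lists as currently observed in the Hugging Face folders.
small = ["mcut_ba-small_64k_1.txt"]
large = [f"mcut_ba-large_16k_{i}.txt" for i in range(1, 9)]

for split, files in [("mcut_ba_small", small), ("mcut_ba_large", large)]:
    found = total_instances(files)
    print(f"{split}: {found} / {EXPECTED} instances")
```

Under these assumptions, mcut_ba_large adds up to the full 128,000 while mcut_ba_small reaches only 64,000.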

Could you please check whether the mcut_ba_small training dataset on Hugging Face is incomplete, and if so, upload the full version?

This would be very helpful for reproducing the MCut BA-SMALL experiments in the benchmark.

Thanks again for releasing the benchmark and the code.

Best,
53mins
