
Adding converted versions of all irregular datasets on Oscar-Oliveira/OR-Datasets#182

Closed
JeroenGar wants to merge 5 commits into fontanf:master from JeroenGar:feat/extra_academic_benchmarks

Conversation

@JeroenGar
Contributor

I've converted all instances from OR-Datasets to the internal JSON format used here.

They are all in data/irregular/or-datasets. The ones that were already available (shirts, swim, trousers, shapes) are also included to be able to verify the conversion is correct.

In data/irregular_raw, there's just a link to the original files instead of copying them all.

@fontanf
Owner

fontanf commented Mar 29, 2025

Hi,

Thank you for the contribution.

I prefer to get instances from the most primary source possible, to reduce the probability of errors when successive conversions are applied. In this case, I think that source is the ESICUP website: https://www.euro-online.org/websites/esicup/data-sets/#1535972088237-bbcb74e3-b507

I also store the original instances in the repository (if not too big), because pages hosting the files often end up disappearing; and if there are errors in my conversion scripts, I can still easily check against the originals. For irregular packing, I store them in this directory: https://github.com/fontanf/packingsolver/tree/master/data/irregular_raw

If these new instances are in the same format as the ones already in this repository, the conversion function is already implemented. One just needs to add a line for each new instance here:

convert_oliveira2000(os.path.join("oliveira2000", "blaz_2007-04-23", "blaz.xml"))
convert_oliveira2000(os.path.join("oliveira2000", "shapes_2007-04-23", "shapes0.xml"))
convert_oliveira2000(os.path.join("oliveira2000", "shapes_2007-04-23", "shapes1.xml"))
convert_oliveira2000(os.path.join("oliveira2000", "shirts_2007-05-15", "shirts.xml"))
convert_oliveira2000(os.path.join("oliveira2000", "swim_2007-05-15", "swim.xml"))
convert_oliveira2000(os.path.join("oliveira2000", "trousers_2007-05-15", "trousers.xml"))

Then, running the scripts/convert_irregular.py script will generate the files in the data/irregular directory.
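The registration pattern described above can be sketched in isolation. This is a hedged, minimal sketch: the real convert_oliveira2000 lives in scripts/convert_irregular.py and performs the actual XML-to-JSON conversion, so it is stubbed out here, and "newauthor2024" / "parts.xml" are made-up names, not actual datasets.

```python
import os

# Record of "converted" instances, standing in for the JSON files the
# real script writes into data/irregular.
converted = []

def convert_oliveira2000(xml_path):
    # Stub: record the path instead of performing the real conversion.
    converted.append(xml_path)

# One call per instance, mirroring the style of scripts/convert_irregular.py.
# The second path is a hypothetical new dataset, not a real one.
convert_oliveira2000(os.path.join("oliveira2000", "shirts_2007-05-15", "shirts.xml"))
convert_oliveira2000(os.path.join("newauthor2024", "parts", "parts.xml"))
```

With this pattern, adding a new dataset is a one-line change followed by re-running the script.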

When instances come from a scientific article, I store them in a directory named firstauthorYEAR to make it easier to trace where they come from. For example: https://github.com/fontanf/packingsolver/tree/master/data/rectangle_raw

By the way, that makes me think that it would be worth adding a doi.url file containing the DOI of the source article in each of these directories.
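The doi.url idea could look something like the following. This is only a sketch of one possible convention: the directory name and the DOI value below are placeholders, not real repository entries.

```python
from pathlib import Path

def write_doi_file(dataset_dir, doi):
    # Write a doi.url file containing the DOI link of the source article
    # into the given firstauthorYEAR directory, creating it if needed.
    path = Path(dataset_dir) / "doi.url"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(doi + "\n")

# Placeholder directory and DOI, for illustration only.
write_doi_file("data/rectangle_raw/someauthor2024", "https://doi.org/10.xxxx/placeholder")
```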

@JeroenGar
Contributor Author

Oh, I was not aware you had a script in the repo to do the conversion.
I completely agree with you about getting as close to the source as possible, but the XML files on the ESICUP website are not the originals.
And they do, as we discovered earlier, contain errors.
I have no idea who did the XML conversions and with what code.
The raw txt files on the website are the originals though.

There have actually been talks to move the entire ESICUP datasets page to GitHub because the site is not really being maintained at the moment.
It was only because my PhD supervisor is the current chair of ESICUP that I was able to quickly rectify the bug in the XML on the website.

We can discuss at ESICUP, I think it would be good to have your input on this.

@JeroenGar JeroenGar closed this Mar 29, 2025
@fontanf
Owner

fontanf commented Mar 30, 2025

The .txt files are not available for all instances and don't contain the bin dimensions.

I'm adding the missing instances.

Putting the datasets on GitHub seems like a good idea. It would make it easier to submit new datasets and to discuss them.
