Commit 38f9bf0
Parquet Support (#1029)
* Store the minio_url from the description XML
* Add minio dependency
* Add call for downloading a file from a minio bucket
* Allow objects to be located in directories
* Add parquet equivalent of _get_dataset_arff
* Store parquet alongside arff, if available
* Deal with unknown buckets, fix path expectation
* Update test to reflect that the parquet file is downloaded
* Download parquet file through lazy loading
i.e., if the dataset was initially retrieved with download_data=False,
make sure to download the dataset on the first get_data call.
* Load data from parquet if available
* Update (doc) strings
* Cast to signify that url is a str
* Make cache file path generation extension agnostic
Fixes a bug where the parquet files would simply be overwritten.
Also, local file paths are now only saved as members if the files
actually exist.
* Remove return argument
* Add clear test messages, update minio urls
* Debugging on CI with print
* Add pyarrow dependency for loading parquet
* Remove print
1 parent 4ff66ed commit 38f9bf0
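The first bullet stores the minio_url found in the dataset description XML. A minimal sketch of that lookup, assuming the description uses the oml namespace http://openml.org/openml and an <oml:minio_url> element. It uses the standard library's ElementTree, which is not necessarily the parser openml-python uses, and the example URL is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace of OpenML XML responses.
OML_NS = {"oml": "http://openml.org/openml"}


def get_minio_url(description_xml: str) -> Optional[str]:
    """Return the text of <oml:minio_url>, or None if the element is missing."""
    root = ET.fromstring(description_xml)
    element = root.find("oml:minio_url", OML_NS)
    if element is None or element.text is None:
        return None
    # Cast to str so callers can rely on a plain string URL.
    return str(element.text).strip()


# Hypothetical description; the URL is made up for illustration.
example = """<oml:data_set_description xmlns:oml="http://openml.org/openml">
  <oml:id>30</oml:id>
  <oml:minio_url>http://minio.example.org/dataset30/dataset_30.pq</oml:minio_url>
</oml:data_set_description>"""

print(get_minio_url(example))
```

Returning None when the element is absent lets callers treat "no Parquet file available" as the normal case and keep using ARFF.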
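Downloading from a minio bucket, with objects allowed to sit in directories, can be sketched as below. This assumes the URL has the form scheme://endpoint/bucket/path/to/object and that the bucket allows anonymous reads. split_minio_url and download_minio_file are hypothetical names; Minio and fget_object come from the minio package, and turning S3Error into FileNotFoundError is only one possible way to deal with unknown buckets or objects:

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


def split_minio_url(url: str) -> Tuple[str, str, str]:
    """Split a minio object URL into (endpoint, bucket, object_name).

    The first path segment is the bucket; everything after it is the object
    name, so objects may sit in "directories" such as "a/b/file.pq".
    """
    parsed = urlparse(url)
    bucket, _, object_name = parsed.path.lstrip("/").partition("/")
    if not bucket or not object_name:
        raise ValueError(f"Cannot determine bucket and object from {url!r}")
    return parsed.netloc, bucket, object_name


def download_minio_file(url: str, destination: Path) -> None:
    """Download one object anonymously with the minio client (pip install minio)."""
    # Imported lazily so split_minio_url works without minio installed.
    from minio import Minio
    from minio.error import S3Error

    endpoint, bucket, object_name = split_minio_url(url)
    client = Minio(endpoint, secure=url.startswith("https://"))
    try:
        client.fget_object(bucket, object_name, str(destination))
    except S3Error as err:
        # e.g. an unknown bucket or object: report it as a missing file.
        raise FileNotFoundError(f"{url} could not be downloaded ({err.code})") from err


print(split_minio_url("http://minio.example.org:9000/bucket/a/b/file.pq"))
```

Splitting on the first slash only is what lets nested object names such as a/b/file.pq survive intact.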
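The cache-path fix can be illustrated with a hypothetical before/after pair. Assuming the cache file name was previously derived by replacing ".arff" (an assumption, not taken from the diff), a parquet path passes through unchanged, so the pickle would be written over the parquet file. Deriving it with os.path.splitext works for any extension. The ".pkl.py3" suffix is likewise assumed for illustration:

```python
import os


def cache_path_before(data_file: str) -> str:
    # Hypothetical ARFF-only logic: a ".pq" path comes back unchanged, so
    # pickling to the result would overwrite the parquet file itself.
    return data_file.replace(".arff", ".pkl.py3")


def cache_path_after(data_file: str) -> str:
    # Extension agnostic: strip whatever suffix the data file has.
    root, _ext = os.path.splitext(data_file)
    return root + ".pkl.py3"


parquet_file = "/cache/datasets/30/dataset.pq"
print(cache_path_before(parquet_file))  # same path as the parquet file
print(cache_path_after(parquet_file))
```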
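The lazy-loading bullet describes deferring the download until the first get_data call when a dataset was retrieved with download_data=False. A minimal sketch of that pattern, with LazyDataset and fake_download as hypothetical stand-ins rather than openml-python classes:

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Hypothetical dataset object created without downloading its data."""

    def __init__(self, name: str, download: Callable[[str], str]) -> None:
        self.name = name
        self._download = download  # returns the local path of the data file
        self.data_file: Optional[str] = None  # unset until first access

    def get_data(self) -> str:
        # Download on the first call only; later calls reuse the stored path.
        if self.data_file is None:
            self.data_file = self._download(self.name)
        return self.data_file


calls: List[str] = []


def fake_download(name: str) -> str:
    # Stand-in for a real download; records each call.
    calls.append(name)
    return f"/cache/{name}.pq"


dataset = LazyDataset("iris", fake_download)
first = dataset.get_data()
second = dataset.get_data()
print(first, len(calls))
```

Using None as the "not yet downloaded" marker keeps the check cheap and makes repeated get_data calls free after the first.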
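Loading data from parquet when available amounts to preferring the parquet file and falling back to ARFF otherwise. A sketch under the assumption that an absent parquet file is represented by None or a path that does not exist on disk. pick_data_file is hypothetical, and read_parquet uses pandas with the pyarrow engine, the dependency this commit adds:

```python
import os
import tempfile
from typing import Optional


def pick_data_file(parquet_file: Optional[str], arff_file: str) -> str:
    """Prefer the parquet file if it exists on disk; otherwise fall back to ARFF."""
    if parquet_file is not None and os.path.isfile(parquet_file):
        return parquet_file
    return arff_file


def read_parquet(path: str):
    # Requires pandas and pyarrow; not called in the demo below.
    import pandas as pd

    return pd.read_parquet(path, engine="pyarrow")


with tempfile.TemporaryDirectory() as tmp:
    arff = os.path.join(tmp, "dataset.arff")
    parquet = os.path.join(tmp, "dataset.pq")
    no_url = os.path.basename(pick_data_file(None, arff))
    not_downloaded = os.path.basename(pick_data_file(parquet, arff))
    open(parquet, "wb").close()  # empty placeholder standing in for a download
    downloaded = os.path.basename(pick_data_file(parquet, arff))

print(no_url, not_downloaded, downloaded)
```

Checking that the file actually exists, rather than only that a path was recorded, mirrors the commit's note about storing local file paths only when the files exist.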
File tree
6 files changed, +247 -18 lines changed
- openml
- datasets
- tests
- files/org/openml/test/datasets/30
- test_datasets
Binary file not shown.