Commit 7c1f4bd

First principles datasets (#181)
* First principles datasets

  Data comes from two symbolic regression repos:
  - Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
  - Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

  They are all datasets that have a first-principle equation derived from data and used in their respective papers to show how symbolic regression has the potential of retrieving the original equation when only observational data is available.

  While some of them have just a few samples and others are synthetically generated, they are challenging for symbolic regression methods and can be used to evaluate these algorithms.

  The idea of pushing them into PMLB is to help other users to quickly set up experiments with the data.

  I still need to write proper metadata for them.

* Re-generated broken datasets

  CI was failing to parse the contents of these specific ones.

* update dataset files

  Created by https://github.com/gAldeia/pmlb/actions/runs/11616806556
  from f23672c on 2024-10-31

* New metadata

* Updated summary

* update dataset files

  Created by https://github.com/gAldeia/pmlb/actions/runs/13434894123
  from bdc87c8 on 2025-02-20

* Fix typo

* update dataset files

  Created by https://github.com/gAldeia/pmlb/actions/runs/13465733857
  from a226e6b on 2025-02-21

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent aa128d7 commit 7c1f4bd

90 files changed (+598 -35 lines changed)

.lfs-assets-id

Lines changed: 13 additions & 0 deletions
@@ -21,6 +21,7 @@
 0baa7b708956fd05b84d47b18a86f926335db5f42d2cd7e29ca83558c139aff3
 0be6203e167cc5e7b038368dbfe0a7790d5dc423c9d7e42887907c5c03f81c27
 0c342ef5d61bbcf43180a3b71d407b9d994942ce43e8960052201daf88dd095d
+0c848103ae200b9a969cf5eb9836592b7663aa7553e8a129102d7fc387c2f490
 0d05767a4c118752a25c4632aeea3b71ffa1bfe122b6a2401f85d20541be19a4
 0d39f17afc3a1712bd6c460aa941aed7835b3feb142538adfdd31ddc2451d60d
 0d43780ab866e54a2a78d8c86ba231ad0a5d55588450a33ed6fe52bee9638341
@@ -43,6 +44,7 @@
 1ee8cf9693351db7afe68f6fc32942845caae3e1030c688efa6c5d0b24229f46
 1f5cf829d2e58032e5d9067f1e7bf3fe7644cd5fe2825c81ef7fbaa445f496a7
 1f69f25b0168c39018c214ae39f2bb8fe6da97e1df389c2cb88cf9bde2f08ace
+21586d500f0961c0d2c8296644e3ec269e2ecf783f3c39c0fdf1dc6159edee0a
 21d506c397dfeb3edbbbc253b923f59be6edb516689677ebef535296c6c62242
 22053c4cdaaaf6169d90e4ee03f8b994cb682281ab6787bf14c617301db5663b
 22baa768886091d61e13e894610dfea3435dfd201f8300fbebe46cd6cf814c0b
@@ -79,6 +81,7 @@
 32abe7e2579387f0439d1595cb72f1f0fe79f41822ec99d06f0e219a65dda362
 32bdeb725bc79d00349bcc66c41d306f463005cf6ac623cf8c15b5f3f7bbbb83
 32d94576f8622f22279a02099e5269511a8b14fee6a441e880730966763b79c9
+346b2e3bbc0c631bc00b2d001dfc5791fe729cb472795b23d593895252ce6bb8
 34fd665457403f66db49a4d012c59d7c387f99a35597c8f0f0e31d40ada255f8
 35724de77dbb2d325f81905aa01f639cae3f29a3d24ae7b24ec84acbc9e08a8f
 35aefd558529484575b142f122c5e2af2eb337025a9607e3df5ab60a57783e09
@@ -107,6 +110,7 @@
 45b06a3b07f45e5aa49f13f030860b245507ba94185219571ceb314fbdd87c2f
 45b6fb5d5c4bb09f2f21b53b069b3994e4f6fa69a5a932cf01c1ebb335bf8645
 46578097c3f1477b9f4f2eb2dc74421162fa9a14e139b0e3e791e41679e459d9
+469d734ef8b6f79d2e38bc487251940bcaa9349050cb455063209e3371dcd439
 46e26ab2e17e1e92728b1a27ebcad5fc8319b4195414746064da34d95f27280e
 47478e3af60f8a6cc09dcfba8495af699c8187edc279a0e226e4e63b410d64b7
 4757ff95609abb91a98577cf6023c804ba2b0b749a9dfcc48597ac49d4bf72a0
@@ -142,10 +146,13 @@
 56f25ef2fcadcd25cc5ccdc721663194ead75bcd90c4d5c1b806d72ea193948f
 56f5ba3f3ba78f6e522a13e5a97e03ff28cf9fa4107021b22b932f6d4064c145
 56f81f2a4cdd1968cc83d45bfdaa049c90cf7cdf3141b2db9979d31c000e3937
+5704b631ba0afc6bf761196ae757565c6ea8398019094526f16383802b8f6cda
 57b36b18d3ed1b78d6ca647f701fdef974b27979cda7dda2fe91a81eb7e329d6
 588ef519ce346285e4cb9cdd5780abbbac32cab9661625c6e517a9b70c87495d
+5910dd1ee08ef353c08e2c25b629f66d42565eb4c990656e06d1b575fe5880a7
 5940ac21f3c7e93dd1dca45266304d160c45256f0628419f001c8b54e1c98360
 596303e877cb91ec3f96cab8c0eac3205f8c50d2e56debdde6a6b4bcffe57ccd
+59df1d16cb8aa383900dc43a067ca79437912283dd3b32e4834b20185efc586e
 5adc7d62cd741fde41554ad50ba608973a19f87f795787156e7f16855a227a49
 5b1d8788d9512819fd46d1acf1e86e70cb5f4418c8e8bfe299ebd3abf2188217
 5b20048751e68b6dd76a7b66ada790fb90f9dd41bd8c82d448fe439f8d038969
@@ -233,6 +240,7 @@
 840eff365b01eacb770f1e78f97bc889611362bd3bbf835f344a2792973c985a
 849976657dc371578819f225ce81eb9a76bd084ce743b5ff6753e415567c68dd
 854aedabba36d89fb2a79246592f2858d6905e7e75a1031128f67ed0c8a446a6
+85b0d6a23ccc3a1db5e2a94928c32e35dd2388d2cb13328c9a914fc594e7bab1
 86e18c1aa7247f824f7097219d75d29f8dcba3034bc89e1ee92c7778aa31bd9c
 8712952be1bc739d221729dd94705dc67e189981a728ecc9cfe08c8df5d8b125
 87df1bbfb83b8204093ccd84ed18bb3695220c03aa4096f63a84cc5e136dbc0b
@@ -284,17 +292,21 @@
 9def90abb62c1b9872ae2dbed3e425850ae44f98d3664e8b29e2444548e9fec9
 9e1ec477e8af8356c3b731f8815f19b57bb404bd7c1629a2020bc9d90b0c028a
 9e69e5aa34b36b4f528c42711c7fbaf88abcda9f0b7ba4181e4f57525b6d1527
+9f946159fb4fa4d6351952d89148572682de6c062f14b6c0f401b9479ad277de
 a15ad0797f0d445cafcf5afb14f26df0aee2417181a2081ad26b1c10e0aaf79c
 a2b0fbfc6f24cb86e3c612be4d59f5dc48b4e3e73620b480dd9f54dccf4d90da
 a39fc5f054db506e83a4a4ec47eba1f7f9bf9bfdc983174e699312f32f42f1f5
 a3a7bf32a44deebb795cf6faf6fc88936d24fc94157468f352c8972906318984
 a4e3dfadeb34bba861ef56046d2fc99c4d50d8475882d610a0b652d39c510f6b
 a551e2941365201552a3a819f035c64f297467464f7c7c349d00711316ce3c57
 a55638ba902c8afb52d5b006f2ac438c72dfcdd6325efe7488b125f7a9662989
+a568223ad5181bf644ca9e871fd235d0e614895b995155b0cb2dfec53f8f9328
 a5a5d15adf74702c323b62b595cdefcdd157f68ebd86af5e504bc16c9890c2ff
 a5c0e0103dc8caf7a9b18ab4546a046ba2e01425259c639069fa02f0824ae0f2
 a5fff4a8241312d53818c146ca0b132dc760a6848e7b89ec2e3271fc6454e7da
+a70e4196c6288b9132c1fa2515235a18be44b16f6a54fdb04ea043b8027b6ea0
 a71c05581e59bc83a6e500c66ea94ff9e355d5e457f0b094e10a61e7f13fdecc
+a80f850226d7a536fad6f33fdae2d08bee9bf36c3a28c452ed0aaa244c88bf39
 a857a458b0621f46b60f326d5760f7d3d39d6ba1fe87db1a7f1114bdfeb99862
 a8a1931c568c9637e3671aa4919cc6a1a3acfa26358fd3f81ca89e4b01f96a72
 a96ca8012634e20d924624747b939c1b99f7b4ec36a7819c80c962d7a1fafe49
@@ -330,6 +342,7 @@ bd0e747cb0a16d9f68843ccd6fa0b0d382bb21f2c83ccbc222712426ee42274b
 bd9d5214451c3b72e8a5ba4ae75a565d373d90a804659d0a5f4617fad3ac4cfd
 be6942e13096c21f10496a24056b01ff791e24d6172f5b7a09013c3307d38f28
 beedc054b8e7d974a98326db8c834843eb188cffba0f07029a8370b193ce020e
+bef7e5fb62eb611f1d60e8418df7b546c0df0ae28a51b5c3f7501b2128fdbb17
 bf066d8b8431c89d3c8afd58b0bfe56f53e0aaedd8d4ec05c132268115af3f36
 bfc6131af9d009576a82d25e0590955980535eb61c67d1553434da993e79af92
 bfd1b9c6e5f6314f6ddfc285ace11381570ad688c6c857562c3155a4e478530d
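In every hunk of `.lfs-assets-id` shown here, the added digest falls lexicographically between its two neighbours, which suggests (the commit does not say so) that the file is kept as a sorted list of 64-character hex digests. A minimal sketch of a sanity check under that assumption follows; `check_digest_list` is a hypothetical helper, not part of PMLB.

```python
import re

# Assumption: each line is a lowercase 64-character hex digest (e.g. SHA-256).
HEX64 = re.compile(r"[0-9a-f]{64}")


def check_digest_list(lines):
    """Return a list of problems; an empty list means the lines look well-formed."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        digest = line.strip()
        if not HEX64.fullmatch(digest):
            problems.append(f"line {number}: not a 64-character lowercase hex digest")
            continue
        # Strictly increasing order also rules out duplicates.
        if previous is not None and digest <= previous:
            problems.append(f"line {number}: out of order or duplicate")
        previous = digest
    return problems


# New-file lines 23-25 of the first hunk; the middle digest is the added one.
hunk = [
    "0c342ef5d61bbcf43180a3b71d407b9d994942ce43e8960052201daf88dd095d",
    "0c848103ae200b9a969cf5eb9836592b7663aa7553e8a129102d7fc387c2f490",
    "0d05767a4c118752a25c4632aeea3b71ffa1bfe122b6a2401f85d20541be19a4",
]
print(check_digest_list(hunk))
```

Running the same function over the whole file, read with `open(path).read().splitlines()`, would flag a digest appended at the end rather than inserted in order.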
Lines changed: 6 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 dataset n_instances n_features n_binary_features n_categorical_features n_continuous_features endpoint_type n_classes imbalance task
-auto_insurance_losses 164 24 3 7 14 continuous 51 0.00844289113622844 regression
+auto_insurance_losses 164 24 3 6 15 categorical 51.0 0.008442891136228436 regression
Lines changed: 6 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 dataset n_instances n_features n_binary_features n_categorical_features n_continuous_features endpoint_type n_classes imbalance task
-auto_insurance_price 201 23 4 6 13 continuous 186 0.000343181229792947 regression
+auto_insurance_price 201 23 4 5 14 categorical 186.0 0.0003431812297929476 regression
Lines changed: 6 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 dataset n_instances n_features n_binary_features n_categorical_features n_continuous_features endpoint_type n_classes imbalance task
-auto_insurance_symboling 205 24 4 6 14 ordinal 6 0.0755788221296847 classification
+auto_insurance_symboling 205 24 4 5 15 categorical 6.0 0.07557882212968471 classification
Lines changed: 6 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 dataset n_instances n_features n_binary_features n_categorical_features n_continuous_features endpoint_type n_classes imbalance task
-breast_cancer_wisconsin_diagnostic 569 30 0 0 30 binary 2 0.064939878490615 classification
+breast_cancer_wisconsin_diagnostic 569 30 0 0 30 categorical 2.0 0.06493987849061501 classification
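The regenerated summary rows mix two kinds of change: number formatting (`51` vs `51.0`, one or two extra digits in `imbalance`) and real value changes (`endpoint_type`, the categorical/continuous feature counts). The sketch below separates the two by comparing numeric columns with a tolerance. It is a rough illustration, not PMLB tooling: `changed_fields` is a hypothetical helper and the `1e-12` tolerance is an arbitrary choice.

```python
import math

# Column names copied from the header row of the summary files in this diff.
COLUMNS = (
    "dataset n_instances n_features n_binary_features n_categorical_features "
    "n_continuous_features endpoint_type n_classes imbalance task"
).split()

# Columns compared as numbers, so "51" and "51.0" count as equal.
NUMERIC = {
    "n_instances", "n_features", "n_binary_features",
    "n_categorical_features", "n_continuous_features",
    "n_classes", "imbalance",
}


def parse_row(line):
    """Split one whitespace-separated summary row into a column -> text mapping."""
    return dict(zip(COLUMNS, line.split()))


def changed_fields(old_line, new_line, rel_tol=1e-12):
    """Return {column: (old, new)} for columns whose value changed."""
    old, new = parse_row(old_line), parse_row(new_line)
    changed = {}
    for column in COLUMNS:
        a, b = old[column], new[column]
        if column in NUMERIC:
            same = math.isclose(float(a), float(b), rel_tol=rel_tol)
        else:
            same = a == b
        if not same:
            changed[column] = (a, b)
    return changed


# Old and new rows for auto_insurance_losses, copied from the diff.
old_row = "auto_insurance_losses 164 24 3 7 14 continuous 51 0.00844289113622844 regression"
new_row = "auto_insurance_losses 164 24 3 6 15 categorical 51.0 0.008442891136228436 regression"
print(changed_fields(old_row, new_row))
```

For this row the helper reports `n_categorical_features`, `n_continuous_features` and `endpoint_type`, and ignores the `n_classes` and `imbalance` reformatting. For `breast_cancer_wisconsin_diagnostic` only `endpoint_type` differs (`binary` to `categorical`).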
Lines changed: 6 additions & 0 deletions
