Can we add a dataset to the Well?
Replies: 1 comment
Contributions to the Well are welcome, especially in physics domains not yet covered by the Well or by other publicly available datasets. That said, because of limited resources and for safety reasons, we cannot host external data. We will, however, be happy to advertise any Well-formatted dataset. To make your dataset compatible with the Well format, follow the steps described below.
Document the data
Each dataset in the Well comes with a README explaining the content of the data, how they were generated, and what ML challenges they involve. You should use the same template to describe your data (e.g. active matter description).
Formatting of the data
To give ML applications a common interface across datasets, the Well expects data to be stored on a tensor product grid. The raw data in the Well are stored in HDF5 files with a specific data layout, which is described in the documentation. Each HDF5 file corresponds to a different set of physical parameters and can contain several trajectories, one per initial condition. The first step in adding your data to the Well is therefore to format them according to this layout and store them in HDF5 files.
Computing Statistics and Metadata
Use the
Hosting the data
Because of limited resources and for safety reasons, we cannot host external data directly. We recommend that you provide a download link or rely on a third-party hosting service such as the Hugging Face dataset hub.
Benchmarking the data
You should also provide benchmark results on your data. Once your data are formatted according to the Well data layout, you can use the Well utility classes to benchmark SOTA models. Provide a YAML configuration file for your data like the ones in the
Reach out to advertise your dataset
You can reach out to advertise your dataset and to check that all the above steps are complete.
As a summary, a Well-formatted dataset should provide the following files:
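Returning to the formatting step, here is a minimal sketch of writing several trajectories on a tensor product grid into a single HDF5 file with h5py. The group names (`dimensions`, `fields`), the field name `density`, the attribute `alpha`, and the file name are hypothetical and chosen only for illustration. They are not the official Well layout, which is defined in the documentation.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sizes: 3 initial conditions, 5 time steps, an 8 x 6 grid.
n_traj, n_steps, nx, ny = 3, 5, 8, 6

# A tensor product grid is fully described by one 1D coordinate array per axis.
x = np.linspace(0.0, 1.0, nx)
y = np.linspace(0.0, 2.0, ny)
time = np.arange(n_steps) * 0.1

# Placeholder field values, one trajectory per initial condition.
rng = np.random.default_rng(0)
density = rng.standard_normal((n_traj, n_steps, nx, ny)).astype(np.float32)

# Hypothetical file name; written to a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_dataset_alpha_0.5.hdf5")

with h5py.File(path, "w") as f:
    # One file per set of physical parameters (assumed attribute name).
    f.attrs["alpha"] = 0.5
    # Illustrative group and dataset names, NOT the official Well layout.
    dims = f.create_group("dimensions")
    dims.create_dataset("x", data=x)
    dims.create_dataset("y", data=y)
    dims.create_dataset("time", data=time)
    fields = f.create_group("fields")
    fields.create_dataset("density", data=density)

# Read the file back to check what was stored.
with h5py.File(path, "r") as f:
    stored_shape = f["fields/density"].shape
    stored_alpha = float(f.attrs["alpha"])
```

Putting the trajectory index on the leading axis keeps all initial conditions for one parameter set in a single array, which matches the one-file-per-parameter-set idea described above. The actual axis order and naming should follow the layout in the Well documentation.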