Labels: data (Data loading and processing)
Description
Problem
Now that multi-GPU training is working, we are very interested in training on some larger crystal structure datasets. However, these datasets do not fit into RAM. It would be great if it were possible either to load only a partial dataset on each DDP node, or to load the features on the fly, so that large-scale training becomes possible. I assume the LMDB datasets that OCP and MatSciML use should work for that. P.S.: thank you for the PyTorch Lightning implementation!
Proposed Solution
Add an option to save and load data to/from an LMDB database.
Alternatives
Examples can be found here:
https://github.com/Open-Catalyst-Project/ocp/blob/main/tutorials/lmdb_dataset_creation.ipynb
or here https://github.com/IntelLabs/matsciml/tree/main/matsciml/datasets
Code of Conduct
- I agree to follow this project's Code of Conduct