According to Zenodo FAQ:
We currently accept up to 50GB per dataset (you can have multiple datasets); there is no size limit on communities.
We therefore do not expect many individual files to exceed 50 GB.
- Base URL: https://zenodo.org/
- REST API
- List of HTTP status codes
Zenodo requires a token to access its API with higher rate limits. See "Authentication" to get a token and "Quickstart - Upload" to test it.
Example of direct API link for a given dataset: https://zenodo.org/api/records/8183728
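As an illustration of how a token can accompany a request for a single record, here is a minimal sketch using only the Python standard library. The helper name `build_record_request` is hypothetical. The token is passed as a bearer token in the `Authorization` header; it is optional here, on the assumption that public records can be read without one, at lower rate limits.

```python
import urllib.request

BASE_URL = "https://zenodo.org"


def build_record_request(dataset_id, token=None):
    """Prepare (but do not send) a GET request for one Zenodo record."""
    url = f"{BASE_URL}/api/records/{dataset_id}"
    headers = {"Accept": "application/json"}
    if token:
        # Personal access token sent as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


request = build_record_request(8183728)
print(request.full_url)
```

The request can then be sent with `urllib.request.urlopen(request)` and the body decoded as JSON.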
The rate limit is 100 requests per minute and 5000 requests per hour.
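To stay under both limits with a constant delay, the hourly limit is the binding one: 3600 / 5000 = 0.72 s between requests, compared with 60 / 100 = 0.6 s for the per-minute limit. The sketch below shows one possible way to enforce this. The `Throttle` class and its parameters are illustrative assumptions, not part of the scraper.

```python
import time

REQUESTS_PER_MINUTE = 100
REQUESTS_PER_HOUR = 5000


def min_interval_seconds(per_minute=REQUESTS_PER_MINUTE, per_hour=REQUESTS_PER_HOUR):
    """Smallest constant delay between requests that respects both limits."""
    return max(60.0 / per_minute, 3600.0 / per_hour)


class Throttle:
    """Block until at least `interval` seconds have passed since the last call."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval_seconds())
```

Calling `throttle.wait()` before each API request spaces the requests out accordingly.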
- Endpoint: /api/records
- HTTP method: GET
- Documentation: https://developers.zenodo.org/#records
- Search guide and documentation for the query string (Elastic)
Query examples:
resource_type.type:"dataset" AND filetype:"tpr"
with keywords:
resource_type.type:"dataset" AND filetype:"mdp" AND ("molecular dynamics" OR "molecular dynamic" OR "molecular-dynamics" OR "molecular-dynamic" OR "md trajectory" OR "md trajectories" OR "md simulation" OR "md simulations" OR "gromacs" OR "gromos" OR "namd" OR "amber" OR "desmond" OR "amber96" OR "amber99" OR "amber14" OR "charmm" OR "charmm27" OR "charmm36" OR "martini")
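The query strings above follow a regular pattern, so they can be assembled programmatically. The following is a minimal sketch; the function name `build_query` is hypothetical, and the keyword list is shortened for readability.

```python
def build_query(file_type, keywords=None):
    """Build a Zenodo search query for datasets containing a given file type."""
    query = f'resource_type.type:"dataset" AND filetype:"{file_type}"'
    if keywords:
        # Each keyword is quoted so that multi-word terms match as phrases.
        joined = " OR ".join(f'"{keyword}"' for keyword in keywords)
        query += f" AND ({joined})"
    return query


print(build_query("tpr"))
print(build_query("mdp", ["molecular dynamics", "gromacs", "namd"]))
```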
We search for all file types and keywords. Results are paginated in batches of 100 datasets.
The API sends the full record of each dataset, including complete file metadata.
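Pagination can be sketched as a loop that requests successive pages until a page comes back with fewer than 100 records. The example below uses the `q`, `page` and `size` query parameters of the records endpoint and reads results from `hits.hits` in the JSON response. The function names are hypothetical, and the `fetch` argument is injectable so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://zenodo.org/api/records"
PAGE_SIZE = 100


def search_url(query, page, size=PAGE_SIZE):
    """Build the search URL for one page of results."""
    params = urllib.parse.urlencode({"q": query, "page": page, "size": size})
    return f"{SEARCH_URL}?{params}"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_records(query, fetch=fetch_json, size=PAGE_SIZE):
    """Yield every record matching the query, one page at a time."""
    page = 1
    while True:
        hits = fetch(search_url(query, page, size))["hits"]["hits"]
        yield from hits
        if len(hits) < size:
            break  # a short (or empty) page is the last one
        page += 1
```

A typical use would be `for record in iter_records('filetype:"tpr"'): ...`, ideally combined with a delay between requests to respect the rate limit.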
For debugging purposes only, since all information is already provided in the search results:
- Endpoint: /api/records/{dataset_id}
- HTTP method: GET
- Documentation: https://developers.zenodo.org/#records
Example of datasets related to molecular dynamics:
- Simulations of a beta-2 adrenergic receptor monomer on a flat membrane (view in API)
- GROMACS simulations of unfolding of ubiqutin in a strong electric field (view in API)
Many MD simulation files are archived in zip files.
Query:
resource_type.type:"dataset" AND filetype:"zip" AND ("molecular dynamics" OR "molecular dynamic" OR "molecular-dynamics" OR "molecular-dynamic" OR "md trajectory" OR "md trajectories" OR "md simulation" OR "md simulations" OR "gromacs" OR "gromos" OR "namd" OR "amber" OR "desmond" OR "amber96" OR "amber99" OR "amber14" OR "charmm" OR "charmm27" OR "charmm36" OR "martini")
Example of datasets related to molecular dynamics with zip files:
- All-atom molecular dynamics simulations of SARS-CoV-2 envelope protein E
- Structural dynamics of DNA depending on methylation pattern: Simulation dataset
- Exploring the interaction of a curcumin azobioisostere with Abeta42 dimers using replica exchange molecular dynamics simulations
- Molecular dynamics simulation data of regulatory ACT domain dimer of human phenylalanine hydroxylase (PAH) (with unbound ligand) (with multiple zip files)
Some datasets cannot be found with keywords. For instance:
Zip file content can be accessed through a preview page provided by Zenodo.
The URL for zip file content preview is: https://zenodo.org/record/{dataset_id}/preview/{zip_file_name}
For the dataset "All-atom molecular dynamics simulations of SARS-CoV-2 envelope protein E" (view in API):
- preview for NoPTM-2_Mix_CHARMM36m_0.1x3mks.zip
- preview for NoPTM-4_POPC_CHARMM36m_0.1x3mks.zip
Note that the preview is available for the first 1000 files only.
File name and file size are the only metadata available from the preview.
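The preview URL pattern above can be turned into a small helper. This is a sketch only: the function names are hypothetical, and the dataset ID in the usage line is a placeholder, not the ID of the dataset mentioned above. The second helper flags listings that reach the 1000-file limit and may therefore be incomplete.

```python
from urllib.parse import quote

PREVIEW_LIMIT = 1000  # the preview lists only the first 1000 files


def preview_url(dataset_id, zip_file_name):
    """Build the preview URL for a zip file attached to a dataset."""
    # quote() percent-encodes characters such as spaces in the file name.
    return f"https://zenodo.org/record/{dataset_id}/preview/{quote(zip_file_name)}"


def may_be_truncated(file_count):
    """Return True when a listing reached the preview limit and may be incomplete."""
    return file_count >= PREVIEW_LIMIT


# 1234567 is a placeholder dataset ID used for illustration.
print(preview_url(1234567, "NoPTM-2_Mix_CHARMM36m_0.1x3mks.zip"))
```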
Some zip files have dense content, with many folders and subfolders.
Examples:
- For the dataset "Input files and scripts for Hamiltonian replica-exchange molecular dynamics simulations of intrinsically disordered proteins using a software GROMACS patched with PLUMED": preview of the file hremd-idp.zip.
- For the dataset "2DUV Machine Learning Protocol Code": preview of the file code.zip.
These complex zip files are handled by the current implementation of the Zenodo scraper.
Sometimes, zip file contents are not accessible.
For the dataset "G-Protein Coupled Receptor-Ligand Dissociation Rates and Mechanisms from tauRAMD Simulations": preview of the file Example_b2AR-alprenolol.zip is not available, probably because the file is too large (5.4 GB).