
Zenodo documentation

File size

According to the Zenodo FAQ:

We currently accept up to 50GB per dataset (you can have multiple datasets); there is no size limit on communities.

So we don't expect many files to have an individual size above 50 GB.

API

Token

Zenodo requires a token to access its API with higher rate limits. See "Authentication" to get a token and "Quickstart - Upload" to test it.

Example of direct API link for a given dataset: https://zenodo.org/api/records/8183728
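A minimal sketch of an authenticated request to the record endpoint, using only the Python standard library. It assumes the token is sent as a Bearer token in the `Authorization` header. The function names are illustrative and are not taken from the scraper.

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api"


def record_url(record_id: int) -> str:
    """Direct API link for one dataset, e.g. https://zenodo.org/api/records/8183728."""
    return f"{ZENODO_API}/records/{record_id}"


def build_request(url: str, token: str) -> urllib.request.Request:
    """Attach the access token to the request (assumption: Bearer header)."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_record(record_id: int, token: str) -> dict:
    """Download and decode the JSON record of one dataset (network call)."""
    request = build_request(record_url(record_id), token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For instance, `fetch_record(8183728, token)["metadata"]["title"]` should return the title of the example dataset.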

Query

Search guide

Rate limiting

The rate limit is 100 requests per minute and 5000 requests per hour.
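Staying under both limits on the client side avoids rejected requests. The sketch below is a sliding-window limiter that enforces several (limit, period) windows at once. It is one possible approach, not necessarily what the scraper does; `clock` and `sleep` are injectable so the logic can be checked without actually waiting.

```python
import time
from collections import deque
from typing import Callable, Sequence, Tuple

# Limits quoted above: 100 requests per minute, 5000 requests per hour.
ZENODO_WINDOWS = ((100, 60.0), (5000, 3600.0))


class RateLimiter:
    """Client-side sliding-window limiter enforcing several (limit, period) windows."""

    def __init__(self, windows: Sequence[Tuple[int, float]] = ZENODO_WINDOWS,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.windows = windows
        self.clock = clock
        self.sleep = sleep
        self.max_period = max(period for _, period in windows)
        self.calls = deque()  # timestamps of past requests, oldest first

    def wait(self) -> None:
        """Block until one more request fits in every window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps older than the longest window.
            while self.calls and now - self.calls[0] >= self.max_period:
                self.calls.popleft()
            delay = 0.0
            for limit, period in self.windows:
                recent = [t for t in self.calls if now - t < period]
                if len(recent) >= limit:
                    # Wait until enough old requests leave this window.
                    delay = max(delay, period - (now - recent[-limit]))
            if delay <= 0:
                self.calls.append(now)
                return
            self.sleep(delay)
```

Calling `limiter.wait()` right before each API request is enough; a single limiter tracks both windows, so the minute and hour budgets cannot drift apart.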

Datasets

Search for MD-related datasets

Query examples:

resource_type.type:"dataset" AND filetype:"tpr"

with keywords:

resource_type.type:"dataset" AND filetype:"mdp"  AND ("molecular dynamics" OR "molecular dynamic" OR "molecular-dynamics" OR "molecular-dynamic" OR "md trajectory" OR "md trajectories" OR "md simulation" OR "md simulations" OR "gromacs" OR "gromos" OR "namd" OR "amber" OR "desmond" OR "amber96" OR "amber99" OR "amber14" OR "charmm" OR "charmm27" OR "charmm36" OR "martini")
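The queries above follow a fixed pattern, so they can be generated per file type. A small sketch (the helper name `build_query` is hypothetical) that reproduces them, with the keyword list copied from the query:

```python
from typing import Optional, Sequence

# Keywords used in the query above.
MD_KEYWORDS = (
    "molecular dynamics", "molecular dynamic", "molecular-dynamics",
    "molecular-dynamic", "md trajectory", "md trajectories", "md simulation",
    "md simulations", "gromacs", "gromos", "namd", "amber", "desmond",
    "amber96", "amber99", "amber14", "charmm", "charmm27", "charmm36", "martini",
)


def build_query(filetype: str, keywords: Optional[Sequence[str]] = None) -> str:
    """Build a dataset query for one file type, optionally OR-ing keywords."""
    query = f'resource_type.type:"dataset" AND filetype:"{filetype}"'
    if keywords:
        # Quote every keyword so multi-word phrases are matched as phrases.
        terms = " OR ".join(f'"{keyword}"' for keyword in keywords)
        query += f" AND ({terms})"
    return query
```

`build_query("mdp", MD_KEYWORDS)` yields the keyword query above (with single spaces around `AND`).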

Search strategy

We search for all file types and keywords. Results are paginated in batches of 100 datasets.

The API sends the full records of datasets, including complete file metadata.
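A sketch of the pagination loop, assuming the search endpoint `https://zenodo.org/api/records` takes `q`, `size` and `page` parameters and returns records under `hits.hits`. The page fetcher is passed in as a function, so the loop can be tested offline; `zenodo_fetch_page` and `iter_search_hits` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

PAGE_SIZE = 100  # batch size used for the search

FetchPage = Callable[[Dict[str, object]], dict]


def zenodo_fetch_page(params: Dict[str, object], token: str) -> dict:
    """Query the records search endpoint with the given parameters (network call)."""
    url = "https://zenodo.org/api/records?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def iter_search_hits(query: str, fetch_page: FetchPage,
                     page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every full record matching `query`, requesting one page at a time."""
    page = 1
    while True:
        results = fetch_page({"q": query, "size": page_size, "page": page})
        hits = results["hits"]["hits"]
        yield from hits
        # A page shorter than `page_size` (possibly empty) is the last one.
        if len(hits) < page_size:
            return
        page += 1
```

Against the live API, pass `functools.partial(zenodo_fetch_page, token=token)` as `fetch_page`.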

Get metadata for a given dataset

For debugging purposes only, since all the information is already provided in the search results.
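Whether a record comes from the search results or from the single-record endpoint, its file metadata can be summarized the same way. This sketch assumes each entry of the record's `files` list has `key`, `size`, `checksum` and `links.self` fields; check this against a real response such as the example API link given earlier. The helper name is hypothetical.

```python
from typing import Dict, List


def file_metadata(record: dict) -> List[Dict[str, object]]:
    """Summarize the files of one Zenodo record (name, size, checksum, URL, extension)."""
    summary = []
    for entry in record.get("files", []):
        name = entry["key"]
        summary.append({
            "name": name,
            "size": entry["size"],  # bytes
            "checksum": entry.get("checksum"),
            "url": entry.get("links", {}).get("self"),
            # Lower-cased extension, e.g. "tpr" or "zip"; empty if there is none.
            "extension": name.rsplit(".", 1)[-1].lower() if "." in name else "",
        })
    return summary
```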

Examples of datasets related to molecular dynamics:

Zip files

Many MD simulation files are archived in zip files.

Query:

resource_type.type:"dataset" AND filetype:"zip"  AND ("molecular dynamics" OR "molecular dynamic" OR "molecular-dynamics" OR "molecular-dynamic" OR "md trajectory" OR "md trajectories" OR "md simulation" OR "md simulations" OR "gromacs" OR "gromos" OR "namd" OR "amber" OR "desmond" OR "amber96" OR "amber99" OR "amber14" OR "charmm" OR "charmm27" OR "charmm36" OR "martini")

Examples of datasets related to molecular dynamics with zip files:

Some datasets cannot be found with keywords. For instance:

Accessing zip file content

Zip file content can be accessed through a preview page provided by Zenodo.

The URL for zip file content preview is: https://zenodo.org/record/{dataset_id}/preview/{zip_file_name}

For instance, for the dataset "All-atom molecular dynamics simulations of SARS-CoV-2 envelope protein E" (view in API)
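The preview URL template translates directly into code. Percent-encoding the file name is an assumption (it is what a browser does for names with spaces), and `zip_preview_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def zip_preview_url(dataset_id: int, zip_file_name: str) -> str:
    """Build the preview page URL of a zip file stored in a Zenodo dataset."""
    # Percent-encode the name so spaces or other special characters stay valid in a URL.
    return f"https://zenodo.org/record/{dataset_id}/preview/{quote(zip_file_name)}"
```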

Note that the preview is available for the first 1000 files only.

File name and file size are the only metadata available from the preview.
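Since the preview only gives sizes as displayed text, they need converting to bytes. This sketch assumes human-readable strings such as "5.4 GB" or "512 Bytes" and decimal units (1 kB = 1000 bytes); both assumptions should be checked against actual preview pages.

```python
import re
from decimal import Decimal

# Assumption: decimal units (1 kB = 1000 bytes). Switch to 1024 if the preview uses binary units.
UNIT_EXPONENT = {"bytes": 0, "b": 0, "kb": 1, "mb": 2, "gb": 3, "tb": 4}

SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(bytes|[kmgt]?b)\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '5.4 GB' to a number of bytes."""
    match = SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized file size: {text!r}")
    value, unit = match.groups()
    # Decimal avoids float rounding errors, e.g. for '5.4 GB'.
    return int(Decimal(value) * 1000 ** UNIT_EXPONENT[unit.lower()])
```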

Zip files with tree-like structure

The content of some zip files is dense, with many folders and sub-folders.

Examples:

These complex zip files are handled by the current implementation of the Zenodo scraper.
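Nested listings are usually easiest to store as flat paths. The sketch below walks a hypothetical in-memory tree (folders as dicts, files mapped to their size in bytes) recursively; it does not describe how the scraper parses the preview page itself.

```python
from typing import Dict, Iterator, Tuple, Union

# Hypothetical in-memory form of a zip listing: folders are dicts, files map to sizes in bytes.
Tree = Dict[str, Union["Tree", int]]


def flatten_tree(tree: Tree, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every file, walking folders and sub-folders recursively."""
    for name, node in tree.items():
        path = prefix + name
        if isinstance(node, dict):
            yield from flatten_tree(node, prefix=path + "/")
        else:
            yield path, node
```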

Issues with zip file content

Sometimes, zip file contents are not accessible.

For the dataset "G-Protein Coupled Receptor-Ligand Dissociation Rates and Mechanisms from tauRAMD Simulations", the preview of the file Example_b2AR-alprenolol.zip is not available, probably because the file is too large (5.4 GB).
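One way to cope with missing previews is to record the gap and continue with the other files. This is a sketch under the assumption that the hypothetical `fetch_listing` function raises an `OSError` (as urllib network errors do) or a `ValueError` when a preview is unavailable or cannot be parsed.

```python
from typing import Callable, Dict, List, Optional


def list_zip_contents(
    zip_files: List[str],
    fetch_listing: Callable[[str], List[str]],
) -> Dict[str, Optional[List[str]]]:
    """Map each zip file to its listed content, or to None when no preview is available."""
    contents: Dict[str, Optional[List[str]]] = {}
    for name in zip_files:
        try:
            contents[name] = fetch_listing(name)
        except (OSError, ValueError):
            # Network errors (urllib raises OSError subclasses) or an unparsable page:
            # record the gap and move on instead of aborting the whole dataset.
            contents[name] = None
    return contents
```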