Commit b72cf6a

Merge branch 'main' into draft
2 parents b61a5e4 + e7aca01 commit b72cf6a

336 files changed, +10251 -873 lines changed
datasets/3kricegenome.yaml

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@ Documentation: https://github.com/awslabs/open-data-docs/tree/main/docs/3kricege
 Contact: http://iric.irri.org/contact-us
 ManagedBy: '[International Rice Research Institute](https://www.irri.org/)'
 UpdateFrequency: Not updated
+Collabs:
+ASDI:
+Tags:
+- agriculture
 Tags:
 - agriculture
 - food security

datasets/africa-field-boundary-labels.yaml

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ Documentation: Information on the primary dataset can be found [here](https://gi

 ManagedBy: "[The Agricultural Impacts Research Group](https://agroimpacts.info/)"
 UpdateFrequency: "Updated versions of the dataset are added as they are developed"
+Collabs:
+ASDI:
+Tags:
+- agriculture
 Tags:
 - agriculture
 - machine learning

datasets/ag-loam.yaml

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,10 @@ Documentation: https://github.com/UCR-Robotics/AG-LOAM
 Contact: Hanzhe Teng ([email protected]), Konstantinos Karydis ([email protected])
 ManagedBy: "[Autonomous Robots and Control Systems Lab](https://sites.google.com/view/arcs-lab)"
 UpdateFrequency: NA
+Collabs:
+ASDI:
+Tags:
+- agriculture
 Tags:
 - aws-pds
 - robotics

datasets/ai3.yaml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+Name: AI3 Protein-Ligand Binding Affinity Dataset
+Description: >
+The rapid advancement of computing technologies, particularly artificial intelligence (AI), has revolutionized various domains, including drug discovery. Curated datasets are crucial for developing reliable, generalizable, and accurate models for practical applications. Generating experimental data on a large scale is an expensive and arduous process. In domains such as medical diagnostics where real-life data is hard to obtain, synthetic data has been shown to be extremely valuable. We, teams from IIIT Hyderabad, Intel, AWS, and Insilico Medicine, have performed physics-based calculations (molecular dynamics simulations) on about 20,000 protein-ligand complexes. The dataset comprises molecular dynamics snapshots, binding affinities calculated using the MM-PBSA method, and individual energy components, including electrostatic and van der Waals interactions. DatasetFileFormats essentially incorporate i. 3D coordinates of the protein-ligand complexes (pdb) in tar.gz files, and ii. CSV files containing the energy data. DatasetUsages are on i. ML scoring function for predicting binding affinities of given protein-ligand complexes, ii. Classification models for predicting correct binding poses of ligands, iii. identification of cryptic binding pockets, and iv. optimization of binding features by exploiting the individual components of the energy (experimental data has only the total binding affinity). Further, the novelty of the dataset highlights the fact that existing AI/ML training datasets lack dynamic data and are inherently biased. Further, binding affinity data existing in the literature are obtained from different experimental protocols. Therefore, this dataset has been uniquely created (from the same computational protocols) followed by free energy calculations with molecular dynamics (MD) simulations. The dynamic data-enriched protein-ligand coordinates can be used to effectively train convolutional neural network-based regression models for more accurate binding affinity prediction.
+Documentation: https://github.com/devalab/AI3
+
+ManagedBy: International Institute of Information Technology Hyderabad
+UpdateFrequency: Not updated
+Tags:
+- pharmaceutical
+- simulations
+- health
+- life sciences
+- machine learning
+- protein
+- molecular dynamics
+- aws-pds
+License: https://devalab.in/AI3.html
+Resources:
+- Description: ai3data bucket includes coordinates and the energetics of ~20,000 protein-ligand binding affinity datasets. The subfolders of ai3data bucket consist of Version 1, Version2 and Version 3. Version1 contains the total Size of 10.4 GiB (Initial structure of the protein-ligand complex and the average binding affinities along with average energy components). Version2 contains the total Size of 1.2 TiB (Five trajectories of protein-ligand complex (200 snapshots in all) and the closest two water molecules for each of the protein-ligand complex, and the time series of the binding affinities along with average energy components). Version3 contains the total Size of 10.7 TiB (Five trajectories of completely solvated protein-ligand complex (200 snapshots in all), and the time series of binding affinities along with average energy components).
+ARN: arn:aws:s3:::ai3data
+Region: us-east-1
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: "AI3: Protein-Ligand Binding Affinity Dataset"
+URL: https://github.com/devalab/AI3
+AuthorName: Deva Priyakumar Lab
+AuthorURL: https://github.com/devalab
+Publications:
+- Title: "PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications"
+URL: https://www.nature.com/articles/s41597-022-01631-9
+AuthorName: U. Deva Priyakumar
+AuthorURL: https://devalab.in/
+- Title: "PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications"
+URL: https://www.nature.com/articles/s41597-023-02872-y
+AuthorName: U. Deva Priyakumar
+AuthorURL: https://devalab.in

datasets/allen-sea-ad-atlas.yaml

Lines changed: 12 additions & 0 deletions
@@ -30,18 +30,30 @@ Resources:
 Type: S3 Bucket
 Explore:
 - '[Browse Bucket](https://sea-ad-single-cell-profiling.s3.amazonaws.com/index.html)'
+- Description: "Update notifications for s3://sea-ad-single-cell-profiling. Users can subscribe to this SNS topic with [AWS Lambda](https://aws.amazon.com/lambda/) or [AWS Simple Queue Service](https://aws.amazon.com/sqs/)."
+ARN: arn:aws:sns:us-west-2:208217671510:sea-ad-single-cell-profiling-object_created
+Region: us-west-2
+Type: SNS Topic
 - Description: Quantitative neuropathology (full resolution images, processed images, and quantifications) in a public bucket
 ARN: arn:aws:s3:::sea-ad-quantitative-neuropathology
 Region: us-west-2
 Type: S3 Bucket
 Explore:
 - '[Browse Bucket](https://sea-ad-quantitative-neuropathology.s3.amazonaws.com/index.html)'
+- Description: "Update notifications for s3://sea-ad-quantitative-neuropathology. Users can subscribe to this SNS topic with [AWS Lambda](https://aws.amazon.com/lambda/) or [AWS Simple Queue Service](https://aws.amazon.com/sqs/)."
+ARN: arn:aws:sns:us-west-2:208217671510:sea-ad-quantitative-neuropathology-object_created
+Region: us-west-2
+Type: SNS Topic
 - Description: Spatial transcriptomics data files in a public bucket
 ARN: arn:aws:s3:::sea-ad-spatial-transcriptomics
 Region: us-west-2
 Type: S3 Bucket
 Explore:
 - '[Browse Bucket](https://sea-ad-spatial-transcriptomics.s3.amazonaws.com/index.html)'
+- Description: "Update notifications for s3://sea-ad-spatial-transcriptomics. Users can subscribe to this SNS topic with [AWS Lambda](https://aws.amazon.com/lambda/) or [AWS Simple Queue Service](https://aws.amazon.com/sqs/)."
+ARN: arn:aws:sns:us-west-2:208217671510:sea-ad-spatial-transcriptomics-object_created
+Region: us-west-2
+Type: SNS Topic
 DataAtWork:
 Tools & Applications:
 - Title: Seattle Alzheimer’s Disease Brain Cell Atlas
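
The resource entries in this file pair each S3 bucket ARN with an SNS topic ARN and a "Browse Bucket" link. As a minimal sketch, assuming the usual six-field ARN layout (`arn:partition:service:region:account:resource`), the fields can be split apart as below. The names `Arn`, `parse_arn`, and `browse_url` are hypothetical helpers for illustration and are not part of the registry; `browse_url` only mirrors the URL pattern of the Explore links shown in this diff and does not check that a bucket actually serves an `index.html`.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Fields of an ARN after the leading 'arn' token (hypothetical helper type)."""
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its fields; S3 bucket ARNs have empty region and account."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def browse_url(bucket_arn: str) -> str:
    """Build a URL in the same shape as the 'Browse Bucket' links in this diff.

    Assumption: the bucket exposes an index.html at its root, as the listed
    SEA-AD buckets appear to.
    """
    arn = parse_arn(bucket_arn)
    if arn.service != "s3":
        raise ValueError(f"not an S3 ARN: {bucket_arn!r}")
    return f"https://{arn.resource}.s3.amazonaws.com/index.html"


if __name__ == "__main__":
    topic = parse_arn(
        "arn:aws:sns:us-west-2:208217671510:"
        "sea-ad-spatial-transcriptomics-object_created"
    )
    print(topic.service, topic.region, topic.resource)
    print(browse_url("arn:aws:s3:::sea-ad-spatial-transcriptomics"))
```

Splitting with a maximum of five separators keeps any colons inside the resource field intact, which matters for ARN types whose resource part itself contains colons.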

datasets/allthebacteria.yaml

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+Name: AllTheBacteria
+Description: All bacterial isolate whole-genome sequencing data from INSDC, uniformly assembled, quality-controlled, annotated, and searchable.
+Documentation: https://allthebacteria.org
+Contact: https://github.com/AllTheBacteria/AllTheBacteria/issues
+ManagedBy: "[European Bioinformatics Institute](https://www.ebi.ac.uk/)"
+UpdateFrequency: |
+The current release is for all SRA bacterial isolate data up to August 2024. The
+collection will be updated occasionally, with no fixed schedule.
+Tags:
+- assembly
+- bacteria
+- bioinformatics
+- fasta
+- genomic
+- life sciences
+- microbial genomics
+- short read sequencing
+- whole genome sequencing
+License: "[MIT License](https://opensource.org/license/mit)"
+Resources:
+- Description: Individual, compressed genome assemblies in .fasta format in a public S3 bucket.
+ARN: arn:aws:s3:::allthebacteria-assemblies
+Region: eu-west-2
+Type: S3 Bucket
+Explore:
+- Description: Phylogenetically-compressed, batched xz archives of all genome assemblies in .fasta format in a public S3 bucket.
+ARN: arn:aws:s3:::allthebacteria-phylogeneticbatches
+Region: eu-west-2
+Type: S3 Bucket
+Explore:
+- Description: Metadata for each genome assembly, including taxonomic information, in a public S3 bucket.
+ARN: arn:aws:s3:::allthebacteria-metadata
+Region: eu-west-2
+Type: S3 Bucket
+Explore:
+- Description: "A [LexicMap](https://github.com/shenwei356/LexicMap) index of all genome assemblies. This can be used for efficient sequence alignment against all genomes."
+ARN: arn:aws:s3:::allthebacteria-lexicmap
+Region: eu-west-2
+Type: S3 Bucket
+Explore:
+DataAtWork:
+Publications:
+- Title: AllTheBacteria - all bacterial genomes assembled, available and searchable
+URL: https://doi.org/10.1101/2024.03.08.584059
+AuthorName: Hunt M, Lima L, Anderson D, Hawkey J, Shen W, Lees J, Iqbal I
+AuthorURL: https://researchportal.bath.ac.uk/en/persons/zamin-iqbal
+ADXCategories:
+- Healthcare & Life Sciences Data

datasets/amazon-last-mile-challenges.yaml

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@ Contact: [email protected]
 ManagedBy: "[Amazon](https://www.amazon.com/)"
 UpdateFrequency: None

+Collabs:
+ASDI:
+Tags:
+- infrastructure
 Tags:
 - transportation
 - machine learning

datasets/aodn_animal_acoustic_tracking_delayed_qc.yaml

Lines changed: 10 additions & 5 deletions
@@ -23,10 +23,15 @@ Documentation: https://catalogue-imos.aodn.org.au/geonetwork/srv/eng/catalog.sea

 ManagedBy: AODN
 UpdateFrequency: As Needed
+Collabs:
+ASDI:
+Tags:
+- biodiversity
 Tags:
-- oceans
-- marine mammals
-- biology
+- aws-pds
+- oceans
+- marine mammals
+- biology
 License: http://creativecommons.org/licenses/by/4.0/
 Resources:
 - Description: Cloud Optimised AODN dataset of IMOS - Animal Tracking Facility - Acoustic
@@ -38,12 +43,12 @@ DataAtWork:
 Tutorials:
 - Title: Accessing IMOS - Animal Tracking Facility - Acoustic Tracking - Quality
 Controlled Detections (2007 - ongoing)
-URL: https://nbviewer.org/github/aodn/aodn_cloud_optimised/blob/main/notebooks/animal_acoustic_tracking_delayed_qc.ipynb
+URL: https://github.com/aodn/aodn_cloud_optimised/blob/main/notebooks/animal_acoustic_tracking_delayed_qc.ipynb
 NotebookURL: https://githubtocolab.com/aodn/aodn_cloud_optimised/blob/main/notebooks/animal_acoustic_tracking_delayed_qc.ipynb
 AuthorName: Laurent Besnard
 AuthorURL: https://github.com/aodn/aodn_cloud_optimised
 - Title: Accessing and search for any AODN dataset
-URL: https://nbviewer.org/github/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
+URL: https://github.com/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
 NotebookURL: https://githubtocolab.com/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
 AuthorName: Laurent Besnard
 AuthorURL: https://github.com/aodn/aodn_cloud_optimised

datasets/aodn_animal_ctd_satellite_relay_tagging_delayed_qc.yaml

Lines changed: 6 additions & 2 deletions
@@ -23,6 +23,10 @@ Documentation: https://catalogue-imos.aodn.org.au/geonetwork/srv/eng/catalog.sea

 ManagedBy: AODN
 UpdateFrequency: As Needed
+Collabs:
+ASDI:
+Tags:
+- biodiversity
 Tags:
 - oceans
 - marine mammals
@@ -40,12 +44,12 @@ DataAtWork:
 Tutorials:
 - Title: Accessing Satellite Relay Tagging Program - Southern Ocean - MEOP Quality
 Controlled CTD Profiles
-URL: https://nbviewer.org/github/aodn/aodn_cloud_optimised/blob/main/notebooks/animal_ctd_satellite_relay_tagging_delayed_qc.ipynb
+URL: https://github.com/aodn/aodn_cloud_optimised/blob/main/notebooks/animal_ctd_satellite_relay_tagging_delayed_qc.ipynb
 NotebookURL: https://githubtocolab.com/aodn/aodn_cloud_optimised/blob/main/notebooks/animal_ctd_satellite_relay_tagging_delayed_qc.ipynb
 AuthorName: Laurent Besnard
 AuthorURL: https://github.com/aodn/aodn_cloud_optimised
 - Title: Accessing and search for any AODN dataset
-URL: https://nbviewer.org/github/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
+URL: https://github.com/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
 NotebookURL: https://githubtocolab.com/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
 AuthorName: Laurent Besnard
 AuthorURL: https://github.com/aodn/aodn_cloud_optimised

datasets/aodn_model_sea_level_anomaly_gridded_realtime.yaml

Lines changed: 6 additions & 2 deletions
@@ -7,12 +7,12 @@ DataAtWork:
 AuthorURL: https://github.com/aodn/aodn_cloud_optimised
 NotebookURL: https://githubtocolab.com/aodn/aodn_cloud_optimised/blob/main/notebooks/model_sea_level_anomaly_gridded_realtime.ipynb
 Title: Accessing IMOS - OceanCurrent - Gridded sea level anomaly - Near real time
-URL: https://nbviewer.org/github/aodn/aodn_cloud_optimised/blob/main/notebooks/model_sea_level_anomaly_gridded_realtime.ipynb
+URL: https://github.com/aodn/aodn_cloud_optimised/blob/main/notebooks/model_sea_level_anomaly_gridded_realtime.ipynb
 - AuthorName: Laurent Besnard
 AuthorURL: https://github.com/aodn/aodn_cloud_optimised
 NotebookURL: https://githubtocolab.com/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
 Title: Accessing and search for any AODN dataset
-URL: https://nbviewer.org/github/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
+URL: https://github.com/aodn/aodn_cloud_optimised/blob/main/notebooks/GetAodnData.ipynb
 Description: "Gridded (adjusted) sea level anomaly (GSLA), gridded sea level (GSL)\
 \ and surface geostrophic velocity (UCUR,VCUR) for the Australasian region. GSLA\
 \ is mapped using optimal interpolation of detided, de-meaned, inverse-barometer-adjusted\
@@ -37,6 +37,10 @@ Resources:
 anomaly - Near real time
 Region: ap-southeast-2
 Type: S3 Bucket
+Collabs:
+ASDI:
+Tags:
+- oceans
 Tags:
 - oceans
 - ocean velocity
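
The AODN files in this commit all replace tutorial URLs of the form `https://nbviewer.org/github/<owner>/<repo>/...` with the matching `https://github.com/<owner>/<repo>/...` URL. A minimal sketch of that prefix rewrite follows; the function name `nbviewer_to_github` and the two constants are hypothetical and are shown only to illustrate the pattern, not to describe how the commit was actually produced.

```python
# Prefixes taken from the URL changes visible in this diff.
NBVIEWER_PREFIX = "https://nbviewer.org/github/"
GITHUB_PREFIX = "https://github.com/"


def nbviewer_to_github(url: str) -> str:
    """Rewrite an nbviewer GitHub URL to the equivalent github.com URL.

    URLs that do not start with the nbviewer prefix are returned unchanged,
    so the function is safe to apply to every URL field in a file.
    """
    if url.startswith(NBVIEWER_PREFIX):
        return GITHUB_PREFIX + url[len(NBVIEWER_PREFIX):]
    return url


if __name__ == "__main__":
    old = (
        "https://nbviewer.org/github/aodn/aodn_cloud_optimised/"
        "blob/main/notebooks/GetAodnData.ipynb"
    )
    print(nbviewer_to_github(old))
```

Matching on the full prefix rather than replacing the host name alone avoids touching nbviewer links that point at non-GitHub sources.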
