Commit b72493f

Merge branch 'main' into main
2 parents 8d1d412 + df01af6 commit b72493f

18 files changed

+300
-12
lines changed

datasets/aws-public-blockchain.yaml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ Description: >
1313
- XRP Ledger - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code><br>
1414
- Stellar(<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>) - Stellar - <code>s3://aws-public-blockchain/v1.1/stellar/</code><br>
1515
- The Open Network (TON) - TON - <code>s3://aws-public-blockchain/v1.1/ton/</code><br>
16+
- Cronos - Cronos - <code>s3://aws-public-blockchain/v1.1/cronos/</code><br>
1617
</br>
1718
1819
<h4>Become a Data Provider</h4>
@@ -24,6 +25,7 @@ Contact: [email protected]
2425
ManagedBy: "[Amazon Web Services](https://aws.amazon.com/)"
2526
UpdateFrequency: New data is delivered daily as Parquet files to the current date folders.
2627
Tags:
28+
- aws-pds
2729
- blockchain
2830
- web3
2931
License: https://github.com/aws-samples/digital-assets-examples/blob/main/LICENSE
Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
Name: Clinical Ultrasound Image Repository
2+
Description: Generic Clinical Ultrasound Data from Random Subjects acquired for Clinical Reasons, to be used for Developing Artificial Intelligence Applications. This dataset is complete with 2000 studies from 2000 subjects (one third each from abdominal, cardiac, and OB/GYN cases)
3+
Documentation: https://clinical-ultrasound-image-repository.s3.amazonaws.com/index.html
4+
5+
ManagedBy: "[MONAI Development Team](https://github.com/Project-MONAI/MONAI)"
6+
UpdateFrequency: This is a static dataset; however, tutorials and resources will be updated as they are developed.
7+
Tags:
8+
- medicine
9+
- medical imaging
10+
- machine learning
11+
- life sciences
12+
- aws-pds
13+
License: "[CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)"
14+
Resources:
15+
- Description: Clinical Ultrasound Image Repository
16+
ARN: arn:aws:s3:::clinical-ultrasound-image-repository
17+
Region: us-west-2
18+
Type: S3 Bucket
19+
Explore:
20+
- "[Browse Bucket](https://clinical-ultrasound-image-repository.s3.amazonaws.com/download.html)"

datasets/cmas-data-warehouse.yaml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,12 @@ Resources:
7373
Type: S3 Bucket
7474
Explore:
7575
- '[Browse Bucket](https://cmaq-12us4-cracmm3-modeling-platform-2023.s3.amazonaws.com/index.html)'
76+
- Description: CMAQ Model Versions 5.5 CRACMM2 Input Data (2022r1) -- 12/22/2021 - 12/31/2022 12km CONUS
77+
ARN: arn:aws:s3:::cmaq-12us1-cracmm2-modeling-platform-2022
78+
Region: us-east-1
79+
Type: S3 Bucket
80+
Explore:
81+
- '[Browse Bucket](https://cmaq-12us1-cracmm2-modeling-platform-2022.s3.amazonaws.com/index.html)'
7682
- Description: EPA 2022 Modeling Platform
7783
ARN: arn:aws:s3:::epa-2022-modeling-platform
7884
Region: us-east-1
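
The Resources entries in this hunk identify each bucket by ARN. A minimal sketch of a pre-merge check for that field follows, assuming the usual S3 form `arn:aws:s3:::bucket-name` with an optional key prefix. The helper name `bucket_from_arn` is hypothetical and is not part of the registry's own tooling.

```python
import re
from typing import Optional

# Assumption: S3 ARNs leave the region and account fields empty, so exactly
# three colons separate "s3" from the bucket name. Bucket names are 3-63
# characters of lowercase letters, digits, dots, and hyphens, and they start
# and end with a letter or digit. An optional "/prefix" may follow.
S3_ARN = re.compile(r"^arn:aws:s3:::([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])(/.*)?$")


def bucket_from_arn(arn: str) -> Optional[str]:
    """Return the bucket name for a well-formed S3 ARN, otherwise None."""
    match = S3_ARN.match(arn.strip())
    return match.group(1) if match else None
```

A value with extra colons after `s3` fails the match and returns `None`, so a reviewer could flag it before the entry is merged.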

datasets/colorado-elevation-data.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ Name: State of Colorado Elevation Data
22
Description: The State of Colorado has gathered public historical elevation data.
33
Documentation: https://docs.google.com/document/d/1HMO-d4cCrBvFa2F6-N3lhP6rkezlvBmSUFA5S8t_ekQ/edit?usp=sharing
44
5-
ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
5+
ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
66
UpdateFrequency: Periodically
77
Tags:
88
- aws-pds

datasets/colorado-imagery.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ Name: State of Colorado Imagery
22
Description: The State of Colorado has gathered public historical imagery ranging from 2005 to 2021.
33
Documentation: https://docs.google.com/document/d/1YDHignUj9lQTMw2J-SqA96MTP8KmJYtk2ZKKC2ZYuPE/edit?usp=sharing
44
5-
ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
5+
ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
66
UpdateFrequency: Periodically
77
Collabs:
88
ASDI:

datasets/deepdrug-dpeb.yaml

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
Name: DeepDrug Protein Embeddings Bank (DPEB)
2+
Description: DPEB is a multimodal database of human protein embeddings integrating four biologically complementary representations—AlphaFold2, BioEmbeddings, ESM-2, and ProtVec—designed for enhanced protein-protein interaction prediction and functional classification.
3+
Documentation: https://github.com/deepdrugai/DPEB
4+
Contact: https://github.com/deepdrugai/DPEB/issues
5+
ManagedBy: "Louisiana State University"
6+
UpdateFrequency: Initial release; maintained for at least 2 years with updates planned based on new embedding models and protein coverage.
7+
Tags:
8+
- bioinformatics
9+
- protein
10+
- structural biology
11+
- machine learning
12+
- life sciences
13+
- aws-pds
14+
License: MIT
15+
Citation: "Sajol MSI et al. DeepDrug Protein Embeddings Bank (DPEB) was accessed on [DATE] at https://registry.opendata.aws/dpeb"
16+
Resources:
17+
- Description: Multimodal human protein embeddings (AlphaFold2, BioEmbeddings, ESM-2, ProtVec) with JSONL-formatted metadata containing FASTA, UniProt IDs, and embeddings.
18+
ARN: arn:aws:s3:::deepdrug-dpeb
19+
Region: us-west-2
20+
Type: S3 Bucket
21+
DataAtWork:
22+
Tutorials:
23+
- Title: Aggregating and Clustering AlphaFold2 Embeddings from DPEB
24+
URL: https://github.com/deepdrugai/DPEB/tree/main
25+
AuthorName: Md. Saiful Islam Sajol
26+
AuthorURL: https://github.com/deepdrugai
27+
Tools & Applications:
28+
- Title: DPEB Explorer Tool
29+
URL: https://github.com/deepdrugai/DPEB
30+
AuthorName: DeepDrug Lab
31+
AuthorURL: https://github.com/deepdrugai
32+
Publications:
33+
- Title: A Multimodal Human Protein Embeddings Database - DeepDrug Protein Embeddings Bank (DPEB)
34+
URL: https://doi.org/10.XXXX/nar.dpeb2025
35+
AuthorName: Sajol MSI, Rajasekaran M, Bess A, Alvin C, Mukhopadhyay S
36+
AuthorURL: https://github.com/deepdrugai/DPEB
37+
ADXCategories:
38+
- Healthcare & Life Sciences Data

datasets/dmi-danra-05.yaml

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
Name: Danish Meteorological Institute (DMI) Reanalysis dataset v0.5
2+
Description: DANRA is a high-resolution meteorological reanalysis dataset for Denmark and Northwestern Europe covering the period September 1990 to December 2023
3+
Documentation: https://dmidk.github.io/danradocs/intro.html
4+
Contact: https://www.dmi.dk/kontakt
5+
ManagedBy: "[Danish Meteorological Institute](https://www.dmi.dk/)"
6+
UpdateFrequency: Not updated
7+
Collabs:
8+
ASDI:
9+
Tags:
10+
- climate
11+
- weather
12+
Tags:
13+
- aws-pds
14+
- air temperature
15+
- atmosphere
16+
- geospatial
17+
- global
18+
- land
19+
- meteorological
20+
- near-surface air temperature
21+
- near-surface relative humidity
22+
- near-surface specific humidity
23+
- model
24+
- water
25+
- weather
26+
- zarr
27+
License: DMI Reanalysis dataset v0.5 is distributed under the [Creative Commons License CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en)
28+
Resources:
29+
- Description: DMI Reanalysis dataset v0.5
30+
ARN: arn:aws:s3:::dmi-danra-05
31+
Region: eu-north-1
32+
Type: S3 Bucket
33+
DataAtWork:
34+
Tutorials:
35+
- Title: Looking at distributions
36+
URL: https://dmidk.github.io/danradocs/notebooks/distributions.html
37+
NotebookURL: https://dmidk.github.io/danradocs/_sources/notebooks/distributions.ipynb
38+
AuthorName: Danish Meteorological Institute
39+
AuthorURL: https://www.dmi.dk/
40+
Services:
41+
- Amazon S3
42+
- Title: DANRA figures
43+
URL: https://dmidk.github.io/danradocs/notebooks/paper-figures.html
44+
NotebookURL: https://dmidk.github.io/danradocs/_sources/notebooks/paper-figures.ipynb
45+
AuthorName: Danish Meteorological Institute
46+
AuthorURL: https://www.dmi.dk/
47+
Services:
48+
- Amazon S3
49+
ADXCategories:
50+
- Environmental Data
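
Each new entry pairs a bucket ARN with a Region. As a rough sketch, those two fields can be combined into a virtual-hosted-style S3 endpoint URL. The helper name `bucket_endpoint` is an assumption for illustration, and the sketch only builds the URL; it does not check that the bucket is reachable.

```python
def bucket_endpoint(arn: str, region: str) -> str:
    """Build the virtual-hosted-style S3 endpoint for a bucket ARN and region."""
    prefix = "arn:aws:s3:::"
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Drop any key prefix after the bucket name, e.g. "bucket/path/".
    bucket = arn[len(prefix):].split("/", 1)[0]
    return f"https://{bucket}.s3.{region}.amazonaws.com/"


# Values taken from the entry above.
print(bucket_endpoint("arn:aws:s3:::dmi-danra-05", "eu-north-1"))
```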

datasets/e11bio-prism.yaml

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
Name: E11bio PRISM
2+
Description: |
3+
This dataset was generated using E11.bio's PRISM technology (Protein Reconstruction and Identification through Multiplexing),
4+
a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction.
5+
6+
Neurons in the mouse hippocampal CA3 were transduced with a library of adeno-associated viruses (AAVs)
7+
encoding diverse “protein bits”—small epitope tags that act as combinatorial barcodes.
8+
Tissue was then processed with an expansion microscopy protocol, physically enlarging the sample ~5×
9+
to achieve an effective voxel size of ~35 × 35 × 80 nm.
10+
Across multiple cycles of staining, imaging, and antibody stripping, the same expanded tissue was repeatedly labeled,
11+
enabling iterative immunostaining for dozens of molecular targets.
12+
13+
The dataset includes:
14+
1) Light microscopy data of multiplexed brain tissue
15+
2) Segmentations of cell morphology and protein expression in the tissue
16+
3) Files for faster visualization of the data (e.g. precomputed format)
17+
4) Additional supporting files (e.g. model predictions, manual annotations etc.)
18+
Documentation: https://github.com/e11bio/e11-open-data
19+
20+
ManagedBy: "[E11.bio](https://e11.bio)"
21+
UpdateFrequency: As required
22+
Tags:
23+
- bioinformatics
24+
- biology
25+
- brain images
26+
- cell imaging
27+
- computer vision
28+
- fluorescence imaging
29+
- high-throughput imaging
30+
- image processing
31+
- imaging
32+
- ion channels
33+
- life sciences
34+
- machine learning
35+
- microscopy
36+
- morphological reconstructions
37+
- Mus musculus
38+
- neurobiology
39+
- neuroimaging
40+
- neuroscience
41+
- protein
42+
- segmentation
43+
- zarr
44+
- aws-pds
45+
License: https://e11.bio/terms-of-use
46+
Resources:
47+
- Description: Data files in a public bucket
48+
ARN: arn:aws:s3:::e11bio-prism
49+
Region: us-east-1
50+
Type: S3 Bucket
51+
DataAtWork:
52+
Tutorials:
53+
- Title: E11.Bio PRISM OpenData
54+
URL: https://github.com/e11bio/e11-open-data
55+
NotebookURL:
56+
AuthorName: Arlo Sheridan & Johan Winnubst
57+
AuthorURL: https://e11.bio/team
58+
Tools & Applications:
59+
- Title: Volara
60+
URL: https://github.com/e11bio/volara
61+
AuthorName: Arlo Sheridan & Will Patton
62+
AuthorURL: https://e11.bio/team

datasets/ecmwf-era5.yaml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
Deprecated: True
2-
DeprecatedNotice: The provider of this dataset will no longer maintain this dataset. We are open to talking with anyone else who might be willing to provide this dataset to the community. Contact <a href="mailto:[email protected]">[email protected]</a>.
2+
DeprecatedNotice: |
3+
<h3>The provider of this dataset will no longer maintain it, but has instead worked with NSF NCAR to rehost the dataset here: <a href=https://registry.opendata.aws/nsf-ncar-era5/>https://registry.opendata.aws/nsf-ncar-era5/</a> </h3>
34
Name: ECMWF ERA5 Reanalysis
45
Description: |
56
ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. It utilizes the best available observation data from satellites and in-situ stations, which are assimilated and processed using ECMWF's Integrated Forecast System (IFS) Cycle 41r2.

datasets/fvcom_gom3.yaml

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
Name: UMASSD-FVCOM-GOM3-Hindcast
2+
Description: The Finite Volume Community Ocean Model (FVCOM) was used to simulate ocean water levels, velocity, temperature and salinity over a multi-decadal period (1984-present) in the waters of the Northeast US including the Gulf of Maine. The model was configured and run by Dr. Changsheng Chen, Director of the Marine Ecosystems Dynamics Modeling Laboratory in the School for Marine Science & Technology at the University of Massachusetts Dartmouth. The triangular mesh has a varying horizontal resolution from several hundred meters inshore to several kilometers offshore, and 45 terrain-following vertical layers. The model output was saved at hourly intervals from 2009-08-21 to 2022-06-17.
3+
Documentation: https://en.wikipedia.org/wiki/Finite_Volume_Community_Ocean_Model
4+
5+
ManagedBy: Open Science Computing, LLC
6+
UpdateFrequency: None
7+
Citation: https://web.archive.org/web/20161229211546id_/http://fvcom.smast.umassd.edu/wp-content/uploads/2013/11/MITSG_12-25.pdf
8+
Tags:
9+
- aws-pds
10+
- oceans
11+
License: CC0
12+
Resources:
13+
- Description: A collection of NetCDF files, kerchunk-generated Parquet reference files, and an Intake catalog
14+
ARN: arn:aws:s3:::fvcom-gom3
15+
Region: us-east-1
16+
Type: S3 Bucket
17+
DataAtWork:
18+
Tutorials:
19+
- Title: FVCOM Explorer Notebook
20+
URL: https://github.com/opensciencecomputing/fvcom
21+
NotebookURL: https://github.com/opensciencecomputing/umassd-fvcom/blob/main/fvcom_gom3_explore.ipynb
22+
AuthorName: Rich Signell
23+
AuthorURL: https://about.me/rich.signell
24+
Services:
25+
Publications:
26+
- Title: An Unstructured Grid, Finite Volume, Three Dimensional, Primitive Equations Ocean Model with Application to Coastal Ocean and Estuaries
27+
URL: https://doi.org/10.1175/1520-0426(2003)020%3C0159:AUGFVT%3E2.0.CO;2
28+
AuthorName: Changsheng Chen, Hedong Liu, and Robert C. Beardsley
