Commit b9c39ab

Merge branch 'main' into main
Parents: 8b68dd4 + 0df426f

21 files changed, with 425 additions and 20 deletions.

datasets/askap.yaml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+Name: ASKAP Radio Telescope
+Description: |
+
+  ASKAP is the CSIRO’s newest radio telescope. It is situated at the Inyarrimanha Ilgari Bundara, the CSIRO Murchison Radio-astronomy Observatory on Wajarri Yamaji Country in the Murchison region of Western Australia, about 800 km north of Perth.
+
+  ASKAP consists of 36 dishes, each 12 m in diameter, spread as far as 6 km apart. It uses a new technology called Phased Array Feeds (PAFs), which let it observe a much larger area of sky at once. This gives ASKAP an extremely high survey speed, making it one of the best instruments in the world for mapping the sky at radio wavelengths.
+
+  Initial dataset available - The Rapid ASKAP Continuum Survey (RACS)
+
+  RACS is the first large-area survey completed with ASKAP. It is revolutionary because the entire sky was observed in a matter of weeks, a task that previously took telescopes years. RACS initially covered the whole sky at 890 MHz (RACS-Low) and has since been extended to ASKAP’s other bands (1.4 and 1.7 GHz). RACS also covers the sky over multiple epochs; a second epoch of RACS-Low and RACS-Mid has already been observed and processed.
+
+  RACS gives astronomers a unique opportunity to study the radio sky and radio source populations, in particular supermassive black holes (active galactic nuclei) and their role in galaxy evolution. The multi-epoch approach also enables studies of the transient sky and the testing and verification of calibration methods. The large area enables cosmological studies, such as searches for anisotropy in the galaxy population (the cosmic dipole).
+
+Documentation: https://www.atnf.csiro.au/facilities/askap-radio-telescope/
+
+ManagedBy: "[Australia Telescope National Facility, CSIRO](http://www.atnf.csiro.au/)"
+Citation: Please see the [ATNF acknowledgement page](https://www.atnf.csiro.au/resources/publications/atnf-publication-acknowledgement-statements/) for full citation instructions.
+UpdateFrequency: Roughly quarterly
+Tags:
+  - aws-pds
+  - astronomy
+  - archives
+License: CC-BY-4.0. Attribution required for refereed scientific papers.
+Resources:
+  - Description: The Rapid ASKAP Continuum Survey (RACS) Public Data Releases
+    ARN: arn:aws:s3:::askap/racs
+    Region: ap-southeast-2
+    Type: S3 Bucket
+    RequesterPays: False
+  - Description: Notifications for new Rapid ASKAP Continuum Survey (RACS) data
+    ARN: arn:aws:sns:ap-southeast-2:336305517014:racs-low1-object_created
+    Region: ap-southeast-2
+    Type: SNS Topic
+DataAtWork:
+  Tutorials:
+    - Title: CSIRO ASKAP Science Data Archive User Guide
+      URL: https://research.csiro.au/casda/casda-user-guide/
+      AuthorName: CSIRO, ATNF
+    - Title: Rapid ASKAP Continuum Survey (RACS) Home Page
+      URL: https://research.csiro.au/racs/
+      AuthorName: CSIRO, ATNF
+  Tools & Applications:
+  Publications:
+    - Title: ASKAP Publication List
+      URL: https://www.atnf.csiro.au/facilities/askap-radio-telescope/publications/
+      AuthorName: various, list maintained by CSIRO, ATNF
+    - Title: ASKAP System Description paper
+      URL: https://doi.org/10.1017/pasa.2021.1
+      AuthorName: Hotan, A. et al.
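The ASKAP description gives the array layout (36 dishes, up to 6 km apart) and the RACS band centres (890 MHz, 1.4 GHz and 1.7 GHz). As a rough back-of-the-envelope sketch, not taken from the ASKAP documentation, these figures give the number of distinct antenna pairs (baselines), N(N - 1)/2, and an approximate diffraction-limited resolution, θ ≈ λ/B_max, for the longest baseline. The helper names are illustrative, and the resolution of the released survey images also depends on weighting and tapering choices, so treat the results (roughly 12, 7 and 6 arcseconds) as order-of-magnitude only.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
RAD_TO_ARCSEC = math.degrees(1.0) * 3600.0  # about 206265 arcsec per radian


def n_baselines(n_antennas: int) -> int:
    """Distinct antenna pairs in an interferometer: N(N - 1) / 2."""
    return n_antennas * (n_antennas - 1) // 2


def diffraction_limit_arcsec(freq_hz: float, max_baseline_m: float) -> float:
    """Rough angular resolution theta ~ lambda / B_max, in arcseconds."""
    wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz
    return wavelength_m / max_baseline_m * RAD_TO_ARCSEC


if __name__ == "__main__":
    print(n_baselines(36))  # 630
    # Rounded band centres and longest baseline taken from the description.
    for label, freq_hz in [("890 MHz", 890e6), ("1.4 GHz", 1.4e9), ("1.7 GHz", 1.7e9)]:
        print(f"{label}: ~{diffraction_limit_arcsec(freq_hz, 6000.0):.1f} arcsec")
```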

datasets/aws-public-blockchain.yaml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ Description: >
   - XRP Ledger - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code><br>
   - Stellar(<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>) - Stellar - <code>s3://aws-public-blockchain/v1.1/stellar/</code><br>
   - The Open Network (TON) - TON - <code>s3://aws-public-blockchain/v1.1/ton/</code><br>
+  - Cronos - Cronos - <code>s3://aws-public-blockchain/v1.1/cronos/</code><br>
   </br>
 
   <h4>Become a Data Provider</h4>
@@ -24,6 +25,7 @@ Contact: [email protected]
 ManagedBy: "[Amazon Web Services](https://aws.amazon.com/)"
 UpdateFrequency: New data is delivered daily to the current date folders Parquet files.
 Tags:
+  - aws-pds
   - blockchain
   - web3
 License: https://github.com/aws-samples/digital-assets-examples/blob/main/LICENSE
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+Name: Clinical Ultrasound Image Repository
+Description: Generic clinical ultrasound data from random subjects, acquired for clinical reasons, for use in developing artificial intelligence applications. The dataset is complete and contains 2000 studies from 2000 subjects (one third each from abdominal, cardiac, and OB/GYN cases).
+Documentation: https://clinical-ultrasound-image-repository.s3.amazonaws.com/index.html
+
+ManagedBy: "[MONAI Development Team](https://github.com/Project-MONAI/MONAI)"
+UpdateFrequency: This is a static dataset; however, tutorials and resources will be updated as they are developed.
+Tags:
+  - medicine
+  - medical imaging
+  - machine learning
+  - life sciences
+  - aws-pds
+License: "[CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)"
+Resources:
+  - Description: Clinical Ultrasound Image Repository
+    ARN: arn:aws:s3:::clinical-ultrasound-image-repository
+    Region: us-west-2
+    Type: S3 Bucket
+    Explore:
+      - "[Browse Bucket](https://clinical-ultrasound-image-repository.s3.amazonaws.com/download.html)"

datasets/cmas-data-warehouse.yaml

Lines changed: 6 additions & 0 deletions
@@ -73,6 +73,12 @@ Resources:
     Type: S3 Bucket
     Explore:
       - '[Browse Bucket](https://cmaq-12us4-cracmm3-modeling-platform-2023.s3.amazonaws.com/index.html)'
+  - Description: CMAQ Model Versions 5.5 CRACMM2 Input Data (2022r1) -- 12/22/2021 - 12/31/2022 12km CONUS
+    ARN: arn:aws:s3:::cmaq-12us1-cracmm2-modeling-platform-2022
+    Region: us-east-1
+    Type: S3 Bucket
+    Explore:
+      - '[Browse Bucket](https://cmaq-12us1-cracmm2-modeling-platform-2022.s3.amazonaws.com/index.html)'
   - Description: EPA 2022 Modeling Platform
     ARN: arn:aws:s3:::epa-2022-modeling-platform
     Region: us-east-1
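The Resources entries in these files identify S3 buckets by ARN, whose general form is `arn:partition:service:region:account-id:resource`, with the region and account fields left empty for S3 buckets (as in `arn:aws:s3:::epa-2022-modeling-platform`). A small validation sketch like the following could catch a malformed ARN, such as one with extra colons after `s3`, before the metadata is merged. `parse_s3_arn` is a hypothetical helper, not part of any registry tooling, and its bucket-name check is a simplified version of the S3 naming rules (it ignores, for example, the bans on consecutive dots and IP-address-style names).

```python
import re

# Simplified S3 bucket-name rule: 3-63 characters of lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def parse_s3_arn(arn: str) -> tuple[str, str]:
    """Split an S3 bucket ARN into (bucket, key_prefix), rejecting malformed ARNs."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    _, _partition, _service, region, account, resource = parts
    if region or account:
        # S3 bucket ARNs leave the region and account-id fields empty.
        raise ValueError(f"unexpected region/account in S3 ARN: {arn!r}")
    # Everything after the first '/' is an optional key prefix, e.g. 'askap/racs'.
    bucket, _, prefix = resource.partition("/")
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"invalid bucket name {bucket!r} in {arn!r}")
    return bucket, prefix


if __name__ == "__main__":
    for arn in ["arn:aws:s3:::askap/racs", "arn:aws:s3:::epa-2022-modeling-platform"]:
        print(parse_s3_arn(arn))
```

With extra colons, the text after the fifth colon starts with `:`, so the bucket-name check rejects it.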

datasets/colorado-elevation-data.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: State of Colorado Elevation Data
 Description: The State of Colorado has gathered public historical elevation data.
 Documentation: https://docs.google.com/document/d/1HMO-d4cCrBvFa2F6-N3lhP6rkezlvBmSUFA5S8t_ekQ/edit?usp=sharing
 
-ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
 UpdateFrequency: Periodically
 Tags:
   - aws-pds

datasets/colorado-imagery.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: State of Colorado Imagery
 Description: The State of Colorado has gathered public historical imagery ranging from 2005 to 2021.
 Documentation: https://docs.google.com/document/d/1YDHignUj9lQTMw2J-SqA96MTP8KmJYtk2ZKKC2ZYuPE/edit?usp=sharing
 
-ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
 UpdateFrequency: Periodically
 Collabs:
   ASDI:

datasets/deepdrug-dpeb.yaml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+Name: DeepDrug Protein Embeddings Bank (DPEB)
+Description: DPEB is a multimodal database of human protein embeddings integrating four biologically complementary representations (AlphaFold2, BioEmbeddings, ESM-2, and ProtVec), designed for enhanced protein-protein interaction prediction and functional classification.
+Documentation: https://github.com/deepdrugai/DPEB
+Contact: https://github.com/deepdrugai/DPEB/issues
+ManagedBy: "Louisiana State University"
+UpdateFrequency: Initial release; maintained for at least 2 years with updates planned based on new embedding models and protein coverage.
+Tags:
+  - bioinformatics
+  - protein
+  - structural biology
+  - machine learning
+  - life sciences
+  - aws-pds
+License: MIT
+Citation: "Sajol MSI et al. DeepDrug Protein Embeddings Bank (DPEB) was accessed on [DATE] at https://registry.opendata.aws/dpeb"
+Resources:
+  - Description: Multimodal human protein embeddings (AlphaFold2, BioEmbeddings, ESM-2, ProtVec) with JSONL-formatted metadata containing FASTA, UniProt IDs, and embeddings.
+    ARN: arn:aws:s3:::deepdrug-dpeb
+    Region: us-west-2
+    Type: S3 Bucket
+DataAtWork:
+  Tutorials:
+    - Title: Aggregating and Clustering AlphaFold2 Embeddings from DPEB
+      URL: https://github.com/deepdrugai/DPEB/tree/main
+      AuthorName: Md. Saiful Islam Sajol
+      AuthorURL: https://github.com/deepdrugai
+  Tools & Applications:
+    - Title: DPEB Explorer Tool
+      URL: https://github.com/deepdrugai/DPEB
+      AuthorName: DeepDrug Lab
+      AuthorURL: https://github.com/deepdrugai
+  Publications:
+    - Title: A Multimodal Human Protein Embeddings Database - DeepDrug Protein Embeddings Bank (DPEB)
+      URL: https://doi.org/10.XXXX/nar.dpeb2025
+      AuthorName: Sajol MSI, Rajasekaran M, Bess A, Alvin C, Mukhopadhyay S
+      AuthorURL: https://github.com/deepdrugai/DPEB
+ADXCategories:
+  - Healthcare & Life Sciences Data
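The DPEB resource is described as JSONL-formatted metadata containing FASTA sequences, UniProt IDs, and embeddings. A minimal sketch for loading such records into a lookup table and comparing two proteins by cosine similarity, a common generic choice for embedding comparison rather than a method stated by DPEB. The field names `uniprot_id` and `embedding` are hypothetical placeholders; check the DPEB documentation for the actual schema.

```python
import json
import math
from typing import Iterable

# Hypothetical field names: check the DPEB documentation for the real JSONL schema.
ID_KEY = "uniprot_id"
VECTOR_KEY = "embedding"


def load_embeddings(lines: Iterable[str]) -> dict[str, list[float]]:
    """Map protein ID -> embedding vector, reading one JSON object per line."""
    table: dict[str, list[float]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        table[record[ID_KEY]] = [float(x) for x in record[VECTOR_KEY]]
    return table


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```

Passing an open file object for a downloaded JSONL file to `load_embeddings` builds the table. Embeddings from different models (for example ESM-2 and ProtVec) generally have different dimensions, so comparisons only make sense within one model.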

datasets/depmap-omics-ccle.yaml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+Name: The Cancer Dependency Map (DepMap) Cancer Cell Line Encyclopedia (CCLE) Dataset
+Description: This dataset consists of whole genome sequencing (WGS), whole exome sequencing (WES), and RNA sequencing files generated from ~1000 cancer cell lines described in Ghandi et al., 2019.
+Documentation: https://github.com/broadinstitute/depmap-omics-ccle
+Contact: https://forum.depmap.org
+ManagedBy: "[Cancer Data Science](https://cancerdatascience.org/), [Broad Institute](https://www.broadinstitute.org/)"
+UpdateFrequency: Occasionally (as additional sequencing data are generated for publicly releasable CCLE models)
+Tags:
+  - aws-pds
+  - bam
+  - biology
+  - bioinformatics
+  - cancer
+  - genetic
+  - genomic
+  - Homo sapiens
+  - life sciences
+  - short read sequencing
+  - transcriptomics
+  - whole exome sequencing
+  - whole genome sequencing
+License: https://grants.nih.gov/policy-and-compliance/policy-topics/sharing-policies/accessing-data/using-genomic-data
+Citation: Ghandi, Huang, Jané-Valbuena et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3
+Resources:
+  - Description: CRAM/BAM files (and their corresponding CRAI/BAI indexes) for RNA, WES, and WGS samples released by The Cancer Dependency Map (DepMap) as part of the Cancer Cell Line Encyclopedia (CCLE) project
+    ARN: arn:aws:s3:::depmap-omics-ccle
+    Region: us-east-1
+    Type: S3 Bucket
+  - Description: Notifications for new depmap-omics-ccle data
+    ARN: arn:aws:sns:us-east-1:019511184952:depmap-omics-ccle-object_created
+    Region: us-east-1
+    Type: SNS Topic
+DataAtWork:
+  Tutorials:
+    - Title: DepMap Omics CCLE data on the AWS Open Data Registry
+      URL: https://github.com/broadinstitute/depmap-omics-ccle
+      AuthorName: Devin McCabe
+  Tools & Applications:
+    - Title: The Cancer Dependency Map (DepMap)
+      URL: https://depmap.org
+      AuthorName: Arafeh, Shibue, Dempster et al.
+    - Title: Cancer Cell Line Encyclopedia (CCLE)
+      URL: https://sites.broadinstitute.org/ccle
+      AuthorName: Ghandi, Huang, Jané-Valbuena et al.
+  Publications:
+    - Title: Next-generation characterization of the Cancer Cell Line Encyclopedia
+      URL: https://www.nature.com/articles/s41586-019-1186-3
+      AuthorName: Ghandi, Huang, Jané-Valbuena et al.
+    - Title: The present and future of the Cancer Dependency Map
+      URL: https://www.nature.com/articles/s41568-024-00763-x
+      AuthorName: Arafeh, Shibue, Dempster et al.
+      AuthorURL: https://depmap.org
+    - Title: Partial gene suppression improves identification of cancer vulnerabilities when CRISPR-Cas9 knockout is pan-lethal
+      URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03020-w
+      AuthorName: Krill-Burger, Dempster, Borah et al.
+    - Title: Genetic dependencies associated with transcription factor activities in human cancer cell lines
+      URL: https://www.sciencedirect.com/science/article/pii/S2211124724005035
+      AuthorName: Thatikonda, Supper, Wachter et al.
+    - Title: Bridging the gap between cancer cell line models and tumours using gene expression data
+      URL: https://www.nature.com/articles/s41416-021-01359-0
+      AuthorName: Noorbakhsh, Vazquez & McFarland
+    - Title: Integrated cross-study datasets of genetic dependencies in cancer
+      URL: https://www.nature.com/articles/s41467-021-21898-7
+      AuthorName: Pacini, Dempster, Boyle et al.
+    - Title: Machine learning multi-omics analysis reveals cancer driver dysregulation in pan-cancer cell lines compared to primary tumors
+      URL: https://www.nature.com/articles/s42003-022-04075-4
+      AuthorName: Sanders, Chandra, Zebarjadi et al.
+    - Title: "The Network Zoo: a multilingual package for the inference and analysis of gene regulatory networks"
+      URL: https://link.springer.com/article/10.1186/s13059-023-02877-1
+      AuthorName: Ben Guebila, Wang, Lopes-Ramos et al.
+ADXCategories:
+  - Healthcare & Life Sciences Data
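The DepMap entry pairs the bucket with an SNS topic for newly created objects. Assuming the topic forwards standard Amazon S3 event notifications (the `object_created` topic name suggests this, but the diff does not state it), a subscriber such as an SQS queue receives an SNS envelope whose `Message` field is itself a JSON string containing `Records`. The sketch below unpacks that envelope and keeps only new CRAM/BAM files. `new_alignment_files` is an illustrative name, and object keys in S3 event notifications are URL-encoded, hence `unquote_plus`.

```python
import json
from urllib.parse import unquote_plus

ALIGNMENT_SUFFIXES = (".cram", ".bam")


def new_alignment_files(sns_body: str) -> list[str]:
    """Return s3:// URIs of newly created CRAM/BAM objects in one SNS delivery."""
    envelope = json.loads(sns_body)
    # SNS wraps the publisher's payload as a JSON *string* in "Message".
    s3_event = json.loads(envelope["Message"])
    uris = []
    # Test events sent when a notification is configured carry no "Records".
    for record in s3_event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # Index files (.crai/.bai) are skipped by the suffix filter.
        if key.endswith(ALIGNMENT_SUFFIXES):
            uris.append(f"s3://{bucket}/{key}")
    return uris
```

If the subscription uses SNS raw message delivery, the body is the S3 event itself, so the outer `json.loads` of the envelope would be skipped.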

datasets/dmi-danra-05.yaml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+Name: Danish Meteorological Institute (DMI) Reanalysis dataset v0.5
+Description: DANRA is a high-resolution meteorological reanalysis dataset for Denmark and Northwestern Europe covering the period September 1990 to December 2023.
+Documentation: https://dmidk.github.io/danradocs/intro.html
+Contact: https://www.dmi.dk/kontakt
+ManagedBy: "[Danish Meteorological Institute](https://www.dmi.dk/)"
+UpdateFrequency: Not updated
+Collabs:
+  ASDI:
+    Tags:
+      - climate
+      - weather
+Tags:
+  - aws-pds
+  - air temperature
+  - atmosphere
+  - geospatial
+  - global
+  - land
+  - meteorological
+  - near-surface air temperature
+  - near-surface relative humidity
+  - near-surface specific humidity
+  - model
+  - water
+  - weather
+  - zarr
+License: DMI Reanalysis dataset v0.5 is distributed under the [Creative Commons License CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en)
+Resources:
+  - Description: DMI Reanalysis dataset v0.5
+    ARN: arn:aws:s3:::dmi-danra-05
+    Region: eu-north-1
+    Type: S3 Bucket
+DataAtWork:
+  Tutorials:
+    - Title: Looking at distributions
+      URL: https://dmidk.github.io/danradocs/notebooks/distributions.html
+      NotebookURL: https://dmidk.github.io/danradocs/_sources/notebooks/distributions.ipynb
+      AuthorName: Danish Meteorological Institute
+      AuthorURL: https://www.dmi.dk/
+      Services:
+        - Amazon S3
+    - Title: DANRA figures
+      URL: https://dmidk.github.io/danradocs/notebooks/paper-figures.html
+      NotebookURL: https://dmidk.github.io/danradocs/_sources/notebooks/paper-figures.ipynb
+      AuthorName: Danish Meteorological Institute
+      AuthorURL: https://www.dmi.dk/
+      Services:
+        - Amazon S3
+ADXCategories:
+  - Environmental Data
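The DANRA entry is tagged `zarr`. In a Zarr store, each array is split into fixed-size chunks stored as separate objects, so reading a single grid point from S3 only requires fetching the one chunk that contains it. A sketch assuming the Zarr v2 layout, where a chunk's key is the array path followed by the per-dimension chunk indices joined by the `dimension_separator` (default `.`). Whether the DANRA store uses Zarr v2 is an assumption, and the array name `t2m`, its dimension order, and the chunk shape used in the example are hypothetical; in practice they come from the store's `.zarray` metadata.

```python
import json


def zarr_v2_chunk_key(array_path: str, index: tuple[int, ...],
                      chunks: tuple[int, ...], separator: str = ".") -> str:
    """Key (relative to the store root) of the Zarr v2 chunk holding `index`."""
    if len(index) != len(chunks):
        raise ValueError("index and chunk shape must have the same number of dimensions")
    if any(i < 0 for i in index):
        raise ValueError("negative indices are not supported in this sketch")
    # Each dimension's chunk number is the integer division of the index by the chunk length.
    chunk_coords = (i // c for i, c in zip(index, chunks))
    return array_path.rstrip("/") + "/" + separator.join(map(str, chunk_coords))


def chunk_key_from_zarray(array_path: str, index: tuple[int, ...], zarray_text: str) -> str:
    """Same as above, reading the chunk shape and separator from a .zarray document."""
    meta = json.loads(zarray_text)
    # "dimension_separator" is optional in Zarr v2 metadata; "." is the default.
    separator = meta.get("dimension_separator", ".")
    return zarr_v2_chunk_key(array_path, index, tuple(meta["chunks"]), separator)
```

For example, with a hypothetical `(time, y, x)` array `t2m` chunked as `(24, 100, 100)`, the element at `(100, 250, 40)` lives in the object `t2m/4.2.0`.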

datasets/e11bio-prism.yaml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+Name: E11bio PRISM
+Description: |
+  This dataset was generated using E11.bio's PRISM technology (Protein Reconstruction and Identification through Multiplexing),
+  a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction.
+
+  Neurons in mouse hippocampal area CA3 were transduced with a library of adeno-associated viruses (AAVs)
+  encoding diverse “protein bits”, small epitope tags that act as combinatorial barcodes.
+  The tissue was then processed with an expansion microscopy protocol, physically enlarging the sample ~5×
+  to achieve an effective voxel size of ~35 × 35 × 80 nm.
+  Across multiple cycles of staining, imaging, and antibody stripping, the same expanded tissue was repeatedly labeled,
+  enabling iterative immunostaining for dozens of molecular targets.
+
+  The dataset includes:
+  1) Light microscopy data of multiplexed brain tissue
+  2) Segmentations of cell morphology and protein expression in the tissue
+  3) Files for faster visualization of the data (e.g., precomputed format)
+  4) Additional supporting files (e.g., model predictions and manual annotations)
+Documentation: https://github.com/e11bio/e11-open-data
+
+ManagedBy: "[E11.bio](https://e11.bio)"
+UpdateFrequency: As required
+Tags:
+  - bioinformatics
+  - biology
+  - brain images
+  - cell imaging
+  - computer vision
+  - fluorescence imaging
+  - high-throughput imaging
+  - image processing
+  - imaging
+  - ion channels
+  - life sciences
+  - machine learning
+  - microscopy
+  - morphological reconstructions
+  - Mus musculus
+  - neurobiology
+  - neuroimaging
+  - neuroscience
+  - protein
+  - segmentation
+  - zarr
+  - aws-pds
+License: https://e11.bio/terms-of-use
+Resources:
+  - Description: Data files in a public bucket
+    ARN: arn:aws:s3:::e11bio-prism
+    Region: us-east-1
+    Type: S3 Bucket
+DataAtWork:
+  Tutorials:
+    - Title: E11.Bio PRISM OpenData
+      URL: https://github.com/e11bio/e11-open-data
+      NotebookURL:
+      AuthorName: Arlo Sheridan & Johan Winnubst
+      AuthorURL: https://e11.bio/team
+  Tools & Applications:
+    - Title: Volara
+      URL: https://github.com/e11bio/volara
+      AuthorName: Arlo Sheridan & Will Patton
+      AuthorURL: https://e11.bio/team
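The PRISM description relates a ~5× physical expansion to an effective voxel size of ~35 × 35 × 80 nm. Because expansion enlarges the tissue uniformly, a voxel of size s in the expanded sample spans roughly s divided by the expansion factor in the original tissue, so the effective size times the factor gives the corresponding voxel size in the expanded sample. The sketch below only restates that arithmetic with the description's approximate figures; the helper names are illustrative, and the real expansion factor and microscope sampling for a given sample may differ.

```python
def imaged_voxel_size_nm(effective_nm: tuple[float, ...], expansion: float) -> tuple[float, ...]:
    """Voxel size in the expanded sample that corresponds to a given effective size.

    Expansion enlarges the tissue by `expansion`, so a voxel of size s in the
    expanded sample spans s / expansion of the original tissue.
    """
    return tuple(v * expansion for v in effective_nm)


def voxels_per_cubic_micron(effective_nm: tuple[float, float, float]) -> float:
    """How many effective voxels tile 1 um^3 (1e9 nm^3) of original tissue."""
    x, y, z = effective_nm
    return 1e9 / (x * y * z)


if __name__ == "__main__":
    effective = (35, 35, 80)  # approximate values from the description, in nm
    print(imaged_voxel_size_nm(effective, 5))  # (175, 175, 400)
    print(round(voxels_per_cubic_micron(effective)))  # 10204
```

At these figures, each cubic micron of original tissue corresponds to roughly ten thousand voxels per imaging channel, which helps explain why chunked formats such as Zarr are used for the data.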
