Commit 4957527

Merge branch 'main' into main
2 parents 625cdcc + 59bbf6c commit 4957527

16 files changed: +267 -22 lines changed

datasets/aws-public-blockchain.yaml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ Description: >
  - XRP Ledger - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code><br>
  - Stellar(<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>) - Stellar - <code>s3://aws-public-blockchain/v1.1/stellar/</code><br>
  - The Open Network (TON) - TON - <code>s3://aws-public-blockchain/v1.1/ton/</code><br>
+ - Cronos - Cronos - <code>s3://aws-public-blockchain/v1.1/cronos/</code><br>
  </br>

  <h4>Become a Data Provider</h4>
@@ -24,6 +25,7 @@ Contact: [email protected]
  ManagedBy: "[Amazon Web Services](https://aws.amazon.com/)"
  UpdateFrequency: New data is delivered daily to the current date folders Parquet files.
  Tags:
+ - aws-pds
  - blockchain
  - web3
  License: https://github.com/aws-samples/digital-assets-examples/blob/main/LICENSE

datasets/colorado-elevation-data.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: State of Colorado Elevation Data
  Description: The State of Colorado has gathered public historical elevation data.
  Documentation: https://docs.google.com/document/d/1HMO-d4cCrBvFa2F6-N3lhP6rkezlvBmSUFA5S8t_ekQ/edit?usp=sharing

- ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+ ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
  UpdateFrequency: Periodically
  Tags:
  - aws-pds

datasets/colorado-imagery.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: State of Colorado Imagery
  Description: The State of Colorado has gathered public historical imagery ranging from 2005 to 2021.
  Documentation: https://docs.google.com/document/d/1YDHignUj9lQTMw2J-SqA96MTP8KmJYtk2ZKKC2ZYuPE/edit?usp=sharing

- ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+ ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
  UpdateFrequency: Periodically
  Collabs:
  ASDI:

datasets/deepdrug-dpeb.yaml

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@ License: MIT
  Citation: "Sajol MSI et al. DeepDrug Protein Embeddings Bank (DPEB) was accessed on [DATE] at https://registry.opendata.aws/dpeb"
  Resources:
  - Description: Multimodal human protein embeddings (AlphaFold2, BioEmbeddings, ESM-2, ProtVec) with JSONL-formatted metadata containing FASTA, UniProt IDs, and embeddings.
- ARN: arn:aws:s3:::deepdrug-dpeb-human-protein-embeddings
- Region: us-east-1
+ ARN: arn:aws:s3:::deepdrug-dpeb
+ Region: us-west-2
  Type: S3 Bucket
  DataAtWork:
  Tutorials:

datasets/depmap-omics-ccle.yaml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+ Name: The Cancer Dependency Map (DepMap) Cancer Cell Line Encyclopedia (CCLE) Dataset
+ Description: This dataset consists of whole genome sequencing (WGS), whole exome sequencing (WES), and RNA sequencing files generated from ~1000 cancer cell lines described in Ghandi et al., 2019.
+ Documentation: https://github.com/broadinstitute/depmap-omics-ccle
+ Contact: https://forum.depmap.org
+ ManagedBy: "[Cancer Data Science](https://cancerdatascience.org/), [Broad Institute](https://www.broadinstitute.org/)"
+ UpdateFrequency: occasionally (as additional sequencings are generated for publicly-releasible CCLE models)
+ Tags:
+ - aws-pds
+ - bam
+ - biology
+ - bioinformatics
+ - cancer
+ - genetic
+ - genomic
+ - Homo sapiens
+ - life sciences
+ - short read sequencing
+ - transcriptomics
+ - whole exome sequencing
+ - whole genome sequencing
+ License: https://grants.nih.gov/policy-and-compliance/policy-topics/sharing-policies/accessing-data/using-genomic-data
+ Citation: Ghandi, Huang, Jané-Valbuena et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3
+ Resources:
+ - Description: CRAM/BAM files (and their corresponding CRAI/BAI indexes) for RNA, WES, and WGS samples released by The Cancer Dependency Map (DepMap) as part of the Cancer Cell Line Encyclopedia (CCLE) project
+ ARN: arn:aws:s3:::depmap-omics-ccle
+ Region: us-east-1
+ Type: S3 Bucket
+ - Description: Notifications for new depmap-omics-ccle data
+ ARN: arn:aws:sns:us-east-1:019511184952:depmap-omics-ccle-object_created
+ Region: us-east-1
+ Type: SNS Topic
+ DataAtWork:
+ Tutorials:
+ - Title: DepMap Omics CCLE data on the AWS Open Data Registry
+ URL: https://github.com/broadinstitute/depmap-omics-ccle
+ AuthorName: Devin McCabe
+ Tools & Applications:
+ - Title: The Cancer Dependency Map (DepMap)
+ URL: https://depmap.org
+ AuthorName: Arafeh, Shibue, Dempster et al.
+ - Title: Cancer Cell Line Encyclopedia (CCLE)
+ URL: https://sites.broadinstitute.org/ccle
+ AuthorName: Ghandi, Huang, Jané-Valbuena et al.
+ Publications:
+ - Title: Next-generation characterization of the Cancer Cell Line Encyclopedia
+ URL: https://www.nature.com/articles/s41586-019-1186-3
+ AuthorName: Ghandi, Huang, Jané-Valbuena et al.
+ - Title: The present and future of the Cancer Dependency Map
+ URL: https://www.nature.com/articles/s41568-024-00763-x
+ AuthorName: Arafeh, Shibue, Dempster et al.
+ AuthorURL: https://depmap.org
+ - Title: Partial gene suppression improves identification of cancer vulnerabilities when CRISPR-Cas9 knockout is pan-lethal
+ URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03020-w
+ AuthorName: Krill-Burger, Dempster, Borah et al.
+ - Title: Genetic dependencies associated with transcription factor activities in human cancer cell lines
+ URL: https://www.sciencedirect.com/science/article/pii/S2211124724005035
+ AuthorName: Thatikonda, Supper, Wachter et al.
+ - Title: Bridging the gap between cancer cell line models and tumours using gene expression data
+ URL: https://www.nature.com/articles/s41416-021-01359-0
+ AuthorName: Noorbakhsh, Vazquez & McFarland
+ - Title: Integrated cross-study datasets of genetic dependencies in cancer
+ URL: https://www.nature.com/articles/s41467-021-21898-7
+ AuthorName: Pacini, Dempster, Boyle et al.
+ - Title: Machine learning multi-omics analysis reveals cancer driver dysregulation in pan-cancer cell lines compared to primary tumors
+ URL: https://www.nature.com/articles/s42003-022-04075-4
+ AuthorName: Sanders, Chandra, Zebarjadi et al.
+ - Title: "The Network Zoo: a multilingual package for the inference and analysis of gene regulatory networks"
+ URL: https://link.springer.com/article/10.1186/s13059-023-02877-1
+ AuthorName: Ben Guebila, Wang, Lopes-Ramos et al.
+ ADXCategories:
+ - Healthcare & Life Sciences Data

datasets/dmi-danra-05.yaml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+ Name: Danish Meteorological Institute (DMI) Reanalysis dataset v0.5
+ Description: DANRA is a high-resolution meteorological reanalysis dataset for Denmark and Northwestern Europe covering the period September 1990 to December 2023
+ Documentation: https://dmidk.github.io/danradocs/intro.html
+ Contact: https://www.dmi.dk/kontakt
+ ManagedBy: "[Danish Meteorological Institute](https://www.dmi.dk/)"
+ UpdateFrequency: Not updated
+ Collabs:
+ ASDI:
+ Tags:
+ - climate
+ - weather
+ Tags:
+ - aws-pds
+ - air temperature
+ - atmosphere
+ - geospatial
+ - global
+ - land
+ - meteorological
+ - near-surface air temperature
+ - near-surface relative humidity
+ - near-surface specific humidity
+ - model
+ - water
+ - weather
+ - zarr
+ License: DMI Reanalysis dataset v0.5 is distributed under the [Creative Commons License CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en)
+ Resources:
+ - Description: DMI Reanalysis dataset v0.5
+ ARN: arn:aws:s3:::dmi-danra-05
+ Region: eu-north-1
+ Type: S3 Bucket
+ DataAtWork:
+ Tutorials:
+ - Title: Looking at distributions
+ URL: https://dmidk.github.io/danradocs/notebooks/distributions.html
+ NotebookURL: https://dmidk.github.io/danradocs/_sources/notebooks/distributions.ipynb
+ AuthorName: Danish Meteorological Institute
+ AuthorURL: https://www.dmi.dk/
+ Services:
+ - Amazon S3
+ - Title: DANRA figures
+ URL: https://dmidk.github.io/danradocs/notebooks/paper-figures.html
+ NotebookURL: https://dmidk.github.io/danradocs/_sources/notebooks/paper-figures.ipynb
+ AuthorName: Danish Meteorological Institute
+ AuthorURL: https://www.dmi.dk/
+ Services:
+ - Amazon S3
+ ADXCategories:
+ - Environmental Data

datasets/e11bio-prism.yaml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+ Name: E11bio PRISM
+ Description: |
+ This dataset was generated using E11.bio's PRISM technology (Protein Reconstruction and Identification through Multiplexing),
+ a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction.
+
+ Neurons in the mouse hippocampal CA3 were transduced with a library of adeno-associated viruses (AAVs)
+ encoding diverse “protein bits”—small epitope tags that act as combinatorial barcodes.
+ Tissue was then processed with an expansion microscopy protocol, physically enlarging the sample ~5×
+ to achieve an effective voxel size of ~35 × 35 × 80 nm.
+ Across multiple cycles of staining, imaging, and antibody stripping, the same expanded tissue was repeatedly labeled,
+ enabling iterative immunostaining for dozens of molecular targets.
+
+ The dataset includes:
+ 1) Light microscopy data of multiplexed brain tissue
+ 2) Segmentations of cell morphology and protein expression in the tissue
+ 3) Files for faster visualization of the data (e.g. precomputed format)
+ 4) Additional supporting files (e.g. model predictions, manual annotations etc.)
+ Documentation: https://github.com/e11bio/e11-open-data
+
+ ManagedBy: "[E11.bio](https://e11.bio)"
+ UpdateFrequency: As required
+ Tags:
+ - bioinformatics
+ - biology
+ - brain images
+ - cell imaging
+ - computer vision
+ - fluorescence imaging
+ - high-throughput imaging
+ - image processing
+ - imaging
+ - ion channels
+ - life sciences
+ - machine learning
+ - microscopy
+ - morphological reconstructions
+ - Mus musculus
+ - neurobiology
+ - neuroimaging
+ - neuroscience
+ - protein
+ - segmentation
+ - zarr
+ - aws-pds
+ License: https://e11.bio/terms-of-use
+ Resources:
+ - Description: Data files in a public bucket
+ ARN: arn:aws:s3:::e11bio-prism
+ Region: us-east-1
+ Type: S3 Bucket
+ DataAtWork:
+ Tutorials:
+ - Title: E11.Bio PRISM OpenData
+ URL: https://github.com/e11bio/e11-open-data
+ NotebookURL:
+ AuthorName: Arlo Sheridan & Johan Winnubst
+ AuthorURL: https://e11.bio/team
+ Tools & Applications:
+ - Title: Volara
+ URL: https://github.com/e11bio/volara
+ AuthorName: Arlo Sheridan & Will Patton
+ AuthorURL: https://e11.bio/team

datasets/ecmwf-era5.yaml

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
  Deprecated: True
- DeprecatedNotice: The provider of this dataset will no longer maintain this dataset. We are open to talking with anyone else who might be willing to provide this dataset to the community. Contact <a href="mailto:[email protected]">[email protected]</a>.
+ DeprecatedNotice: |
+ <h3>The provider of this dataset will no longer maintain it, but has instead worked with NSF NCAR to rehost the dataset here: <a href=https://registry.opendata.aws/nsf-ncar-era5/>https://registry.opendata.aws/nsf-ncar-era5/</a> </h3>
  Name: ECMWF ERA5 Reanalysis
  Description: |
  ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. It utilizes the best available observation data from satellites and in-situ stations, which are assimilated and processed using ECMWF's Integrated Forecast System (IFS) Cycle 41r2.

datasets/fvcom_gom3.yaml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+ Name: UMASSD-FVCOM-GOM3-Hindcast
+ Description: The Finite Volume Community Ocean Model (FVCOM) was used to simulate ocean water levels, velocity, temperature and salinity over a multi-decadal period (1984-present) in the waters of the Northeast US including the Gulf of Maine. The model was configured and run by the Dr. Changshen Chen, Director of the Marine Ecosystems Dynamics Modeling Laboratory in the School for Marine Science & Technology at the University of Massachusetts Dartmouth. The triangular mesh has a varying horizontal resolution from several hundred meters inshore to several kilometers offshore, and 45 terrain-following vertical layers. The model output was saved at hourly intervals from 2009-08-21 to 2022-06-17.
+ Documentation: https://en.wikipedia.org/wiki/Finite_Volume_Community_Ocean_Model
+
+ ManagedBy: Open Science Computing, LLC
+ UpdateFrequency: None
+ Citation: https://web.archive.org/web/20161229211546id_/http://fvcom.smast.umassd.edu/wp-content/uploads/2013/11/MITSG_12-25.pdf
+ Tags:
+ - aws-pds
+ - oceans
+ License: CC0
+ Resources:
+ - Description: A collection of NetCDF files, kerchunk-generated Parquet reference files, and an Intake catalog
+ ARN: arn:aws:s3:::fvcom-gom3
+ Region: us-east-1
+ Type: S3 Bucket
+ DataAtWork:
+ Tutorials:
+ - Title: FVCOM Explorer Notebook
+ URL: https://github.com/opensciencecomputing/fvcom
+ NotebookURL: https://github.com/opensciencecomputing/umassd-fvcom/blob/main/fvcom_gom3_explore.ipynb
+ AuthorName: Rich Signell
+ AuthorURL: https://about.me/rich.signell
+ Services:
+ Publications:
+ - Title: An Unstructured Grid, Finite Volume, Three Dimensional, Primitive Equations Ocean Model with Application to Coastal Ocean and Estuaries
+ URL: https://doi.org/10.1175/1520-0426(2003)020%3C0159:AUGFVT%3E2.0.CO;2
+ AuthorName: Changsheng Chen, Hedong Liu, and Robert C. Beardsley

datasets/humancellatlas.yaml

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Documentation: https://data.humancellatlas.org/

  Contact: https://data.humancellatlas.org/contact

- ManagedBy: UC Santa Cruz Genomics Institute, University of California, Santa Cruz (UCSC)
+ ManagedBy: UC Santa Cruz Genomics Institute, University of California, Santa Cruz, UCSC

  UpdateFrequency: Monthly

@@ -95,4 +95,4 @@ DataAtWork:
  AuthorName: "Various authors"

  ADXCategories:
- - Healthcare & Life Sciences Data
+ - Healthcare & Life Sciences Data
