Commit 47d146c

Merge branch 'add-depmap-omics-ccle' of https://github.com/dpmccabe/open-data-registry into add-depmap-omics-ccle
2 parents 6827958 + a1038b3 commit 47d146c

12 files changed: 191 additions and 10 deletions

datasets/aws-public-blockchain.yaml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ Description: >
 - XRP Ledger - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code><br>
 - Stellar(<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>) - Stellar - <code>s3://aws-public-blockchain/v1.1/stellar/</code><br>
 - The Open Network (TON) - TON - <code>s3://aws-public-blockchain/v1.1/ton/</code><br>
+- Cronos - Cronos - <code>s3://aws-public-blockchain/v1.1/cronos/</code><br>
 </br>
 
 <h4>Become a Data Provider</h4>
@@ -24,6 +25,7 @@ Contact: [email protected]
 ManagedBy: "[Amazon Web Services](https://aws.amazon.com/)"
 UpdateFrequency: New data is delivered daily to the current date folders Parquet files.
 Tags:
+- aws-pds
 - blockchain
 - web3
 License: https://github.com/aws-samples/digital-assets-examples/blob/main/LICENSE

datasets/colorado-elevation-data.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: State of Colorado Elevation Data
 Description: The State of Colorado has gathered public historical elevation data.
 Documentation: https://docs.google.com/document/d/1HMO-d4cCrBvFa2F6-N3lhP6rkezlvBmSUFA5S8t_ekQ/edit?usp=sharing
 
-ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
 UpdateFrequency: Periodically
 Tags:
 - aws-pds

datasets/colorado-imagery.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: State of Colorado Imagery
 Description: The State of Colorado has gathered public historical imagery ranging from 2005 to 2021.
 Documentation: https://docs.google.com/document/d/1YDHignUj9lQTMw2J-SqA96MTP8KmJYtk2ZKKC2ZYuPE/edit?usp=sharing
 
-ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+ManagedBy: State of Colorado Governors Office of Information Technology OIT GIS team
 UpdateFrequency: Periodically
 Collabs:
 ASDI:

datasets/deepdrug-dpeb.yaml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+Name: DeepDrug Protein Embeddings Bank (DPEB)
+Description: DPEB is a multimodal database of human protein embeddings integrating four biologically complementary representations—AlphaFold2, BioEmbeddings, ESM-2, and ProtVec—designed for enhanced protein-protein interaction prediction and functional classification.
+Documentation: https://github.com/deepdrugai/DPEB
+Contact: https://github.com/deepdrugai/DPEB/issues
+ManagedBy: "Louisiana State University"
+UpdateFrequency: Initial release; maintained for at least 2 years with updates planned based on new embedding models and protein coverage.
+Tags:
+- bioinformatics
+- protein
+- structural biology
+- machine learning
+- life sciences
+- aws-pds
+License: MIT
+Citation: "Sajol MSI et al. DeepDrug Protein Embeddings Bank (DPEB) was accessed on [DATE] at https://registry.opendata.aws/dpeb"
+Resources:
+- Description: Multimodal human protein embeddings (AlphaFold2, BioEmbeddings, ESM-2, ProtVec) with JSONL-formatted metadata containing FASTA, UniProt IDs, and embeddings.
+ARN: arn:aws:s3:::deepdrug-dpeb
+Region: us-west-2
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: Aggregating and Clustering AlphaFold2 Embeddings from DPEB
+URL: https://github.com/deepdrugai/DPEB/tree/main
+AuthorName: Md. Saiful Islam Sajol
+AuthorURL: https://github.com/deepdrugai
+Tools & Applications:
+- Title: DPEB Explorer Tool
+URL: https://github.com/deepdrugai/DPEB
+AuthorName: DeepDrug Lab
+AuthorURL: https://github.com/deepdrugai
+Publications:
+- Title: A Multimodal Human Protein Embeddings Database - DeepDrug Protein Embeddings Bank (DPEB)
+URL: https://doi.org/10.XXXX/nar.dpeb2025
+AuthorName: Sajol MSI, Rajasekaran M, Bess A, Alvin C, Mukhopadhyay S
+AuthorURL: https://github.com/deepdrugai/DPEB
+ADXCategories:
+- Healthcare & Life Sciences Data
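
The `Resources` entry above pairs an S3 ARN with a region. As a minimal sketch (not part of the commit), the helper below derives the matching `s3://` URI and virtual-hosted HTTPS endpoint from those two fields. The function name `s3_locations` is hypothetical, and whether the bucket allows anonymous listing is an assumption based on the `aws-pds` tag, not something the diff states.

```python
def s3_locations(arn: str, region: str) -> dict:
    """Derive the S3 URI and HTTPS endpoint for a registry Resources entry.

    Hypothetical helper for illustration; only string handling, no network access.
    """
    prefix = "arn:aws:s3:::"
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # An S3 ARN may carry a key prefix after the bucket name (bucket/prefix/...).
    bucket, _, key_prefix = arn[len(prefix):].partition("/")
    return {
        "bucket": bucket,
        "s3_uri": f"s3://{bucket}/{key_prefix}",
        # Virtual-hosted-style endpoint for the bucket's region.
        "https_url": f"https://{bucket}.s3.{region}.amazonaws.com/{key_prefix}",
    }


if __name__ == "__main__":
    # Values copied from the deepdrug-dpeb.yaml entry in this commit.
    for name, value in s3_locations("arn:aws:s3:::deepdrug-dpeb", "us-west-2").items():
        print(f"{name}: {value}")
```

If the bucket does permit anonymous access, the resulting URI could be listed with `aws s3 ls --no-sign-request s3://deepdrug-dpeb/`.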

datasets/e11bio-prism.yaml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+Name: E11bio PRISM
+Description: |
+This dataset was generated using E11.bio's PRISM technology (Protein Reconstruction and Identification through Multiplexing),
+a platform that combines viral barcoding, expansion microscopy, and iterative immunolabeling for large-scale neuronal reconstruction.
+
+Neurons in the mouse hippocampal CA3 were transduced with a library of adeno-associated viruses (AAVs)
+encoding diverse “protein bits”—small epitope tags that act as combinatorial barcodes.
+Tissue was then processed with an expansion microscopy protocol, physically enlarging the sample ~5×
+to achieve an effective voxel size of ~35 × 35 × 80 nm.
+Across multiple cycles of staining, imaging, and antibody stripping, the same expanded tissue was repeatedly labeled,
+enabling iterative immunostaining for dozens of molecular targets.
+
+The dataset includes:
+1) Light microscopy data of multiplexed brain tissue
+2) Segmentations of cell morphology and protein expression in the tissue
+3) Files for faster visualization of the data (e.g. precomputed format)
+4) Additional supporting files (e.g. model predictions, manual annotations etc.)
+Documentation: https://github.com/e11bio/e11-open-data
+
+ManagedBy: "[E11.bio](https://e11.bio)"
+UpdateFrequency: As required
+Tags:
+- bioinformatics
+- biology
+- brain images
+- cell imaging
+- computer vision
+- fluorescence imaging
+- high-throughput imaging
+- image processing
+- imaging
+- ion channels
+- life sciences
+- machine learning
+- microscopy
+- morphological reconstructions
+- Mus musculus
+- neurobiology
+- neuroimaging
+- neuroscience
+- protein
+- segmentation
+- zarr
+- aws-pds
+License: https://e11.bio/terms-of-use
+Resources:
+- Description: Data files in a public bucket
+ARN: arn:aws:s3:::e11bio-prism
+Region: us-east-1
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: E11.Bio PRISM OpenData
+URL: https://github.com/e11bio/e11-open-data
+NotebookURL:
+AuthorName: Arlo Sheridan & Johan Winnubst
+AuthorURL: https://e11.bio/team
+Tools & Applications:
+- Title: Volara
+URL: https://github.com/e11bio/volara
+AuthorName: Arlo Sheridan & Will Patton
+AuthorURL: https://e11.bio/team

datasets/ecmwf-era5.yaml

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 Deprecated: True
-DeprecatedNotice: The provider of this dataset will no longer maintain this dataset. We are open to talking with anyone else who might be willing to provide this dataset to the community. Contact <a href="mailto:[email protected]">[email protected]</a>.
+DeprecatedNotice: |
+<h3>The provider of this dataset will no longer maintain it, but has instead worked with NSF NCAR to rehost the dataset here: <a href=https://registry.opendata.aws/nsf-ncar-era5/>https://registry.opendata.aws/nsf-ncar-era5/</a> </h3>
 Name: ECMWF ERA5 Reanalysis
 Description: |
 ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. It utilizes the best available observation data from satellites and in-situ stations, which are assimilated and processed using ECMWF's Integrated Forecast System (IFS) Cycle 41r2.

datasets/humancellatlas.yaml

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Documentation: https://data.humancellatlas.org/
 
 Contact: https://data.humancellatlas.org/contact
 
-ManagedBy: UC Santa Cruz Genomics Institute, University of California, Santa Cruz (UCSC)
+ManagedBy: UC Santa Cruz Genomics Institute, University of California, Santa Cruz, UCSC
 
 UpdateFrequency: Monthly
 
@@ -95,4 +95,4 @@ DataAtWork:
 AuthorName: "Various authors"
 
 ADXCategories:
-- Healthcare & Life Sciences Data
+- Healthcare & Life Sciences Data

datasets/ideam-radares.yaml

Lines changed: 3 additions & 3 deletions
@@ -29,9 +29,9 @@ DataAtWork:
 - Title: Read and plot Sigmet files available on AWS using Xradar
 URL: https://docs.openradarscience.org/projects/xradar/en/stable/notebooks/Read-plot-Sigmet-data-from-AWS.html
 AuthorName: Alfonso Ladino
-- Title: Taller de datos científicos con Python y R - AtmosCol 2023
-URL: https://projectpythia.org/AtmosCol-2023/notebooks/2.acceso-datos/2.2.Radares.html
+- Title: Ciencia de Datos Hidrometeorológicos con Python
+URL: https://projectpythia.org/AtmosCol-2023/radares
 AuthorName: Alfonso Ladino, Nicole Rivera, Max Grover
 - Title: Specific Differential Phase (KDP) retrieval methods comparison
-URL: https://projectpythia.org/radar-cookbook/notebooks/example-workflows/kdp-comparison.html
+URL: https://projectpythia.org/radar-cookbook/notebooks/example-workflows/kdp-comparison/
 AuthorName: Alfonso Ladino, Max Grover

datasets/oceanomics.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Name: OceanOmics
 Description: "Minderoo Foundation OceanOmics aims to establish environmental DNA (eDNA) as a tool to measure, understand, and protect oceans. OceanOmics mainly generates two types of data: eDNA sequencing data (metabarcoding, metagenomics), and genome assembly data (marine vertebrates)."
 Documentation: https://edna.minderoo.org
 
-ManagedBy: Minderoo Foundation OceanOmics (Dr Shannon Corrigan, Dr Philipp Bayer)
+ManagedBy: Minderoo Foundation OceanOmics, Dr Shannon Corrigan, Dr Philipp Bayer
 UpdateFrequency: Data will be continually updated as it is generated.
 Collabs:
 ASDI:

datasets/proteingym.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ Description: |
 ProteinGym is a benchmark suite for assessing the performance of protein fitness prediction and design models. It comprises a large curated collection of 200+ high-throughput experimental assays (~3M mutated sequences), as well as clinical annotations from experts about the pathogenicity of mutants in over 3k human genes.
 Documentation: https://github.com/OATML-Markslab/ProteinGym/blob/main/README.md
 
-ManagedBy: "Harvard Medical School; University of Oxford"
+ManagedBy: "Harvard Medical School, University of Oxford"
 UpdateFrequency: Quarterly
 Tags:
 - aws-pds
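
Several hunks in this commit rewrite `ManagedBy` values so that they no longer contain apostrophes, parentheses, or semicolons. The commit does not say why. The sketch below is a hypothetical check, not the registry's own validation, that reports which of those characters a value still contains; the character set is inferred only from the before/after pairs shown in this diff.

```python
import re

# Characters present in the removed ManagedBy lines and absent from the added ones.
# Treating them as disallowed is an assumption made for illustration only.
SUSPECT_CHARS = re.compile(r"[();']")


def suspect_chars(managed_by: str) -> list[str]:
    """Return the distinct suspect characters found in a ManagedBy value, sorted."""
    return sorted(set(SUSPECT_CHARS.findall(managed_by)))


if __name__ == "__main__":
    # Before/after pairs copied from the hunks in this commit.
    pairs = [
        ("Minderoo Foundation OceanOmics (Dr Shannon Corrigan, Dr Philipp Bayer)",
         "Minderoo Foundation OceanOmics, Dr Shannon Corrigan, Dr Philipp Bayer"),
        ("Harvard Medical School; University of Oxford",
         "Harvard Medical School, University of Oxford"),
    ]
    for before, after in pairs:
        print(suspect_chars(before), "->", suspect_chars(after))
```

Note that the rewrites themselves are not uniform (parentheses are dropped outright in the Colorado entries but replaced by a comma in the Human Cell Atlas and OceanOmics entries), so this check only detects the characters and does not attempt to reproduce the edits.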
