Commit 4124959

Merge branch 'main' into add-ogs-arco-ocean
2 parents 1586d15 + 44f7d04 commit 4124959

12 files changed: +319 -33 lines changed

datasets/allen-hmba-releases.yaml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+Name: Human and Mammalian Brain Atlas
+Description:
+Human and Mammalian Brain Atlas (HMBA) is a major atlas of the BRAIN Initiative Cell Atlas Network (BICAN) that proposes to establish a comprehensive,
+highly granular cell atlas in complete adult human, macaque, and marmoset brains that links brain structure, function and cellular architecture.
+Release artifacts have been made available in this OpenData bucket to enable utilization along with their paper publications by the neuroscience community.
+Documentation: https://portal.brain-map.org/explore/hmba
+
+ManagedBy: "[Allen Institute](http://www.alleninstitute.org/)"
+UpdateFrequency: Never
+Tags:
+- aws-pds
+- biology
+- gene expression
+- neurobiology
+- life sciences
+- single-cell transcriptomics
+- Mus musculus
+- Homo sapiens
+- non-human primate
+License: http://www.alleninstitute.org/legal/terms-use/
+Citation:
+Resources:
+- Description: Project data files in a public bucket
+ARN: arn:aws:s3:::allen-hmba-releases
+Region: us-west-2
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: Human-Mammalian Brain - Basal Ganglia - Data
+URL: https://alleninstitute.github.io/abc_atlas_access/descriptions/HMBA-BG_dataset.html
+AuthorName: Allen Institute for Brain Science
+AuthorURL: www.alleninstitute.org
+- Title: Human-Mammalian Brain - CCF Book
+URL: https://alleninstitute.github.io/CCF-MAP/
+AuthorName: Allen Institute for Brain Science
+AuthorURL: www.alleninstitute.org
+Tools & Applications:
+- Title: HMBA Basal Ganglia resources in Brain Knowledge Platform's Data Catalog
+URL: https://knowledge.brain-map.org/data/POZ2HCPBT60DSDJ8UA7
+AuthorName: Allen Institute for Brain Science
+AuthorURL: www.alleninstitute.org
+
+
+
+
+

datasets/askap.yaml

Lines changed: 4 additions & 4 deletions
@@ -23,13 +23,13 @@ Tags:
 License: CC-BY-4.0. Attribution required for refereed scientific papers.
 Resources:
 - Description: The Rapid ASKAP Continuum Survey (RACS) Public Data Releases
-ARN: arn:aws:s3:::askap/racs
+ARN: arn:aws:s3:::askap-odp/racs-low1/
 Region: ap-southeast-2
 Type: S3 Bucket
 RequesterPays: False
-- Description: Notifications for new Rapid ASKAP Continuum Survey (RACS) data
-ARN: arn:aws:sns:ap-southeast-2:336305517014:racs-low1-object_created
-Region: sp-southeast-2
+- Description: Notifications for new ASKAP data
+ARN: arn:aws:sns:ap-southeast-2:336305517014:askap-odp-object_created
+Region: ap-southeast-2
 Type: SNS Topic
 DataAtWork:
 Tutorials:
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+Name: "BraiDyn-BC: Cued lever-pull task dataset"
+Description: |
+The BraiDyn-BC (Brain Dynamics underlying emergence of Behavioral Change) Database offers an extensive, multimodal dataset that links
+wide-field calcium imaging of the mouse neocortex to comprehensive behavioral measurements during a behavioral task.
+As one of the contents in this database, we newly provide a dataset that includes 15 sessions spanning two weeks of motor skill learning,
+in which 25 mice were trained to pull a lever to obtain water rewards.
+Simultaneous high-speed videography captures body, facial, and eye movements, and environmental parameters are monitored.
+The dataset also features resting-state cortical activity and sensory-evoked responses, enhancing its utility for both learning-related and
+sensory-driven neural dynamics studies.
+Data are formatted in accordance with the Neurodata Without Borders (NWB) standard, ensuring compatibility with existing analysis tools and
+adherence to the FAIR principles.
+This resource enables in-depth investigations into the neural mechanisms underlying behavior and learning.
+The platform encourages collaborative research, supporting the exploration of rapid within-session learning effects, long-term behavioral adaptations, and neural circuit dynamics.
+Documentation: https://doi.org/10.1101/2025.02.03.631599
+Contact: "Ken Nakae ([email protected])"
+ManagedBy: "[BraiDyn-BC Database Project](https://boatneck-weeder-7b7.notion.site/BraiDyn-BC-Database-303cf08c89f94d81bb2eaed4c3c50345)"
+UpdateFrequency: NA
+Tags:
+- Mus musculus
+- neuroscience
+- calcium imaging
+- video
+- imaging
+- life sciences
+- aws-pds
+License: Creative Commons Attribution 4.0 International (CC-BY 4.0)
+Resources:
+- Description: BraiDyn-BC - Cued lever-pull task dataset
+ARN: arn:aws:s3:::braidyn-bc-buckets
+Region: ap-northeast-1
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: Detailed usage tutorials on Google Colab
+URL: https://drive.google.com/drive/folders/1QciTJd3tXkEGhz6782czB2dEO3fafm8M
+AuthorName: Keisuke Sehara
+AuthorURL: https://orcid.org/0000-0003-4368-8143
+Tools & Applications:
+- Title: A set of libraries used for generating the dataset
+URL: https://github.com/BraiDyn-BC/bdbc-data-pipeline
+AuthorName: Keisuke Sehara, Ryo Aoki, Shoya Sugimoto
+Publications:
+- Title: A multimodal dataset linking wide-field calcium imaging to behavior changes in mice during an operant lever-pull task
+URL: https://doi.org/10.1101/2025.02.03.631599
+AuthorName: Kondo M, Sehara K, Harukuni R, Aoki R, Sugimoto S, Tanaka YR, Matsuzaki M, Nakae K
+AuthorURL:
+ADXCategories:
+- Healthcare & Life Sciences Data
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+Name: >-
+EPA Hourly Prognostic Meteorological Data
+Description: >-
+The data are hourly outputs from the Weather Research and Forecasting (WRF) model
+generated by the EPA's Office of Air Quality Planning and Standards, Air Quality
+Assessment Division, Air Quality Modeling Group. These data were generated at a 12-km
+resolution over the Continental United States (12US), beginning for the year 2021 and
+continuing annually through 2023. These files are intended for use in a broad range of
+air quality applications, but specifically may be used in dispersion modeling applications
+that would benefit from the use of the Mesoscale Model Interface (MMIF) tool
+(https://www.epa.gov/scram/air-quality-dispersion-modeling-related-model-support-programs#mmif)
+which translates prognostic meteorological data into formats suitable for use with AERMOD,
+CALPUFF, or SCICHEM. The individual files are less than 1GB in size, which allows for
+the use of the MMIF tool in a Windows environment. These data are anticipated to be updated
+annually so the 3 most-recent years are available for use. Additionally, model-observation
+paired files are included to aid in the performance evaluation that is necessary for use
+of these data in regulatory applications per Appendix W to 40 CFR Part 51.
+Documentation: >-
+2022 WRF Modeling TSD:
+https://bit.ly/2022WRF
+
+ManagedBy: U.S. Environmental Protection Agency (https://www.epa.gov)
+UpdateFrequency: Annually
+Tags:
+- aws-pds
+- environmental
+- air quality
+- regulatory
+- weather
+- meteorological
+License: >-
+These datasets are products of the U.S. Government and are intended for public
+access and use. Unless otherwise specified, all data produced by the U.S EPA
+is, by default, in the public domain and are not subject to domestic copyright
+protection under 17 U.S.C. § 105. More details on the U.S. Public Domain
+license are available here: http://www.usa.gov/publicdomain/label/1.0/
+Citation: >-
+WRF Modeling:
+US EPA, 2024, "Meteorological Model Performance for Annual 2022 Simulation
+WRF v4.4.2"
+Resources:
+- Description: >-
+The WRF output are stored as uncompressed netcdf/hdf5 formatted files in
+directories corresponding to the specific years of interest. The model-obs
+paired files are stored as comma-delimited files in the year-specific
+directories.
+ARN: 'arn:aws:s3:::epa-hourly-prognostic-meteorology'
+Region: us-east-1
+Type: S3 Bucket
+Explore:
+- '[Browse Bucket](https://epa-hourly-prognostic-meteorology.s3.amazonaws.com/index.html)'
+- Description: Notification for the EPA Hourly Prognostic Meteorological Data bucket
+ARN: 'arn:aws:sns:us-east-1:127085394039:epa-hourly-prognostic-meteorology-object_created'
+Region: us-east-1
+Type: SNS Topic

datasets/kreppref.yaml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+Name: Reference Indexes for krepp
+Description: krepp is an alignment-free method for estimating distances and phylogenetic placement of individual reads to many thousands of reference genomes in a scalable manner using k-mers. This dataset includes k-mer-based indexes consisting of ultra-large reference genome sets that can be efficiently analyzed using krepp.
+Documentation: https://github.com/bo1929/krepp/wiki/Available-reference-indexes
+Contact: https://github.com/bo1929/krepp/issues
+ManagedBy: Mirarab Lab at UC San Diego
+UpdateFrequency: Quarterly or as new data becomes available
+Tags:
+- bioinformatics
+- metagenomics
+- microbiome
+- reference index
+- aws-pds
+- life sciences
+License: GPL-3.0 license. Use of the data should be cited in the usual way, following https://github.com/bo1929/krepp/tree/master?tab=readme-ov-file#citation.
+Resources:
+- Description: This dataset contains genomic indexes for various reference datasets in binary format. Using krepp, you can perform distance estimation and phylogenetic placement with respect to these indexes.
+ARN: arn:aws:s3:::kreppref
+Region: us-west-1
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: Tutorial for using krepp indexes for metagenomic sequence analysis.
+URL: https://github.com/bo1929/krepp/wiki/Tutorial
+AuthorName: Ali Osman Berk Sapci
+AuthorURL: https://bo1929.github.io/
+Publications:
+- Title: A k-mer-based maximum likelihood method for estimating distances of reads to genomes enables genome-wide phylogenetic placement.
+URL: https://www.biorxiv.org/content/10.1101/2025.01.20.633730v2
+AuthorName: Sapci et al. (2024)

datasets/noaa-nws-naqfc-pds.yaml

Lines changed: 4 additions & 0 deletions
@@ -38,6 +38,10 @@ Resources:
 Type: S3 Bucket
 Explore:
 - '[Browse Bucket](https://noaa-nws-naqfc-pds.s3.amazonaws.com/index.html)'
+- Description: New data notifications for NAQFC, only Lambda and SQS protocols allowed
+ARN: arn:aws:sns:us-east-1:709902155096:NewNWSAirQualityObject
+Region: us-east-1
+Type: SNS Topic
 DataAtWork:
 Tutorials:
 Tools & Applications:

datasets/nrel-pds-ncdb.yaml

Lines changed: 3 additions & 5 deletions
@@ -6,7 +6,6 @@ Description: |
 The NCDB seeks to maintain the inherent relationship between the various parameters
 that are needed to model solar, wind, hydrology and load and provide data for multiple
 important climate scenarios.
-
 Documentation: https://nsrdb.nrel.gov/
 
 ManagedBy: '[National Renewable Energy Laboratory](https://www.nrel.gov/)'
@@ -46,14 +45,13 @@ Resources:
 Explore:
 - '[Browse Dataset](https://data.openei.org/s3_viewer?bucket=nrel-pds-hsds&prefix=nrel%2Fncdb%2F)'
 DataAtWork:
-Tutorials:
 Tools & Applications:
 - Title: NCDB Website
 URL: https://climate.nrel.gov
 AuthorName: NREL NCDB Team
-- Title: HSDS Examples
-URL: https://github.com/NREL/hsds-examples
-AuthorName: Caleb Phillips, Caroline Draxl, John Readey, Jordan Perr-Sauer, Michael Rossol
+- Title: NCDB HSDS Examples
+URL: https://github.com/NREL/hsds-examples/blob/master/notebooks/10_NCDB_introduction.ipynb
+AuthorName: Reid Olson
 Publications:
 - Title: Regridding uncertainty for statistical downscaling of solar radiation
 URL: https://ascmo.copernicus.org/articles/9/103/2023/

datasets/ont_basemod_data.yaml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+Name: ONT Methylation Benchmarking Datasets
+Description: ONT Methylation Benchmarking Datasets are generated to benchmark existing methylation-calling tools on the Oxford Nanopore sequencing platform using their recent R10.4.1 flowcell chemistry. It spans a diverse range of species, including bacteria (E. coli, H. pylori J99, H. pylori 26695, A. variabilis, T. denticola), plants (Rice, Arabidopsis), and mammals (mouse, human).In addition, the dataset includes EMSeq data for E. coli, plant, and mouse samples, which can serve as ground truth for methylation studies. It also provides unmethylated whole-genome amplified (WGA) DNA for H. pylori 26695 and a dam- dcm- double mutant (DM) of E. coli that lacks canonical 5mC and 6mA methylation. These variants, together with their wild-type counterparts, offer value for both training and benchmarking DNA methylation calling models.
+Documentation: https://github.com/SowpatiLab/ont-basemod-benchmark-data/blob/main/documentation.md
+
+ManagedBy: "[CSIR-Centre for Cellular and Molecular Biology](https://www.ccmb.res.in/)"
+UpdateFrequency: Datasets will be updated periodically as additional data is generated.
+Tags:
+- aws-pds
+- life sciences
+- genomic
+- long read sequencing
+- bioinformatics
+- epigenomics
+- benchmark
+- bam
+License: "[MIT License](https://opensource.org/license/mit)"
+Citation: "Please cite Kulkarni et al. Comprehensive benchmarking of tools for nanopore-based detection of DNA methylation. bioRxiv (2024). doi: https://doi.org/10.1101/2024.11.09.622763 when referencing the ONT methylation benchmarking datasets in publications."
+Resources:
+- Description: ONT Methylation Benchmarking Datasets
+ARN: arn:aws:s3:::ont-basemod-benchmark-data
+Region: ap-south-1
+Type: S3 Bucket
+Explore:
+- '[Browse Bucket](https://ont-basemod-benchmark-data.s3.amazonaws.com/index.html)'
+- Description: Notifications for object created
+ARN: arn:aws:sns:ap-south-1:767415906609:ont-basemod-benchmark-data-object_created
+Region: ap-south-1
+Type: SNS Topic
+DataAtWork:
+Tutorials:
+- Title: Methylation calling using ONT methylation benchmarking dataset
+URL: https://github.com/SowpatiLab/ont-basemod-benchmark-data/blob/main/tutorial.md
+AuthorName: Onkar Kulkarni
+Publications:
+- Title: Comprehensive benchmarking of tools for nanopore-based detection of DNA methylation
+URL: https://www.biorxiv.org/content/10.1101/2024.11.09.622763v1
+AuthorName: Kulkarni et al.

datasets/rcm-ceos-ard.yaml

Lines changed: 5 additions & 1 deletion
@@ -42,7 +42,7 @@ Resources:
 Region: ca-central-1
 Type: S3 Bucket
 Explore:
-- '[EODMS STAC for RCM CEOS ARD](https://www.eodms-sgdot.nrcan-rncan.gc.ca/stac/collections/rcm-ard/items/)'
+- '[STAC for RCM CEOS ARD products](https://radiantearth.github.io/stac-browser/#/external/www.eodms-sgdot.nrcan-rncan.gc.ca/stac/collections/rcm-ard?.language=en)'
 DataAtWork:
 Tutorials:
 - Title: Workflows for accessing and manipulating RCM ARD SpatioTemporal Asset Catalog (STAC) in JupyterLab Python Notebooks - Flux de travail pour accéder et manipuler le catalogue d'actifs spatio-temporels (STAC) RCM ARD dans les notebooks Python JupyterLab
@@ -66,3 +66,7 @@ DataAtWork:
 URL: https://dataspace.copernicus.eu/explore-data/data-collections/copernicus-contributing-missions/collections-description/COP-DEM
 AuthorName: European Space Agency (ESA)
 AuthorURL: https://www.esa.int/
+- Title: RCM CEOS ARD Dataset on GEO.ca | Ensemble de données RCM CEOS ARD sur GEO.ca
+URL: https://app.geo.ca/en-ca/map-browser/record/eodms-rcm-ard
+AuthorName: Canada Centre for Remote Sensing | Centre canadien de télédétection
+AuthorURL: https://natural-resources.canada.ca/science-data/science-research/research-centres/canada-centre-remote-sensing

datasets/roa.yaml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+Name: Rain over Africa
+Description: The Rain over Africa (RoA) dataset consists of spaceborn estimates of precipitation of Rain over Africa using only geostationary imagery and obtained through a convolutional and quantile regression neural network. The dataset also contains some uncertainty estimates.
+Documentation: https://github.com/SEE-GEO/roa
+Contact: https://github.com/SEE-GEO/roa
+ManagedBy: "[Geoscience and Remote Sensing at Chalmers University of Technology](https://www.chalmers.se/en/departments/see/research/geo)"
+UpdateFrequency: At most, yearly
+Tags:
+- aws-pds
+- agriculture
+- analysis ready data
+- atmosphere
+- aws-pds
+- climate
+- deep learning
+- earth observation
+- geophysics
+- geoscience
+- hydrology
+- machine learning
+- precipitation
+- satellite imagery
+- weather
+- zarr
+License: "[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)"
+Citation: "Please refer to https://github.com/SEE-GEO/roa#5-how-to-cite for instructions on how to cite the RoA data."
+Resources:
+- Description: RoA expected rain rate and quantiles at levels 5%, 16%, 25%, 50%, 75%, 84%, and 95% in Zarr format
+ARN: arn:aws:s3:::rainoverafrica
+Region: us-west-2
+Type: S3 Bucket
+- Description: Notifications for new Rain over Africa data
+ARN: arn:aws:sns:us-west-2:261854712492:rainoverafrica-object_created
+Region: us-west-2
+Type: SNS Topic
+DataAtWork:
+Tutorials:
+- Title: Reading RoA data
+URL: https://github.com/SEE-GEO/roa?tab=readme-ov-file#22-reading-roa-data
+AuthorName: Adrià Amell
+Services:
+- Amazon S3
+- Title: How to use the data
+URL: https://github.com/SEE-GEO/roa?tab=readme-ov-file#3-how-to-use-the-data
+AuthorName: Adrià Amell
+Services:
+- Amazon S3
+Publications:
+- Title: Probabilistic near real-time retrievals of Rain over Africa using deep learning
+URL: https://doi.org/10.1029/2025JD044595
+AuthorName: Adrià Amell, Lilian Hee, Simon Pfreundschuh, and Patrick Eriksson
+DeprecatedNotice:
+ADXCategories:
+- Environmental Data
