Commit 5fa0c73

Merge branch 'main' into main
2 parents cd2962f + ee8a687 commit 5fa0c73

12 files changed (+266 / -27 lines)

datasets/brazil-data-cubes.yaml

Lines changed: 2 additions & 1 deletion
@@ -4,6 +4,7 @@ Documentation: http://brazildatacube.org/en/home-page-2/
 
 ManagedBy: "[INPE - Brazil Data Cube](http://brazildatacube.org/)"
 UpdateFrequency: New EO data cubes are added as soon as there are produced by the Brazil Data Cube project.
+DeprecatedNotice: This dataset is deprecated and will be removed from AWS Open Data in the near future. If you have any questions or require assistance, please contact us at [[email protected]].
 Tags:
 - earth observation
 - satellite imagery
@@ -71,4 +72,4 @@ DataAtWork:
 AuthorName: K. R. Ferreira, et al.
 - Title: Building Earth Observation Data Cubes on AWS
 URL: https://www.proquest.com/openview/070d2a753cc88d26535c98293171a5ac/1?
-AuthorName: Ferreira, K R; Queiroz, G R; Marujo, R F B; Costa, R W. 
+AuthorName: Ferreira, K R; Queiroz, G R; Marujo, R F B; Costa, R W. 

datasets/broad-references.yaml

Lines changed: 0 additions & 4 deletions
@@ -30,10 +30,6 @@ DataAtWork:
 - AWS Batch
 - Amazon FSx
 Tools & Applications:
-- Title: Genomics Workflows on AWS - Cromwell on AWS
-URL: https://docs.opendata.aws/genomics-workflows/orchestration/cromwell/cromwell-examples/#real-world-example-haplotypecaller
-AuthorName: W. Lee Pang
-AuthorURL: https://www.linkedin.com/in/lee-pang-a039a26/
 Publications:
 - Title: Advancing NGS quality control to enable measurement of actionable mutations in circulating tumor DNA
 URL: https://www.cell.com/cell-reports-methods/pdf/S2667-2375(21)00165-X.pdf

datasets/colorado-imagery.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+Name: State of Colorado Imagery
+Description: The State of Colorado has gathered public historical imagery ranging from 2005 to 2021.
+Documentation: https://docs.google.com/document/d/1YDHignUj9lQTMw2J-SqA96MTP8KmJYtk2ZKKC2ZYuPE/edit?usp=sharing
+
+ManagedBy: State of Colorado Governor's Office of Information Technology (OIT) GIS team
+UpdateFrequency: Periodically
+Tags:
+- aws-pds
+- aerial imagery
+- geospatial
+- imaging
+- mapping
+License: https://creativecommons.org/publicdomain/zero/1.0/legalcode
+Resources:
+- Description: The State of Colorado historic public aerial imagery. Currently, NAIP is available from 2005 and 2009-2021. The National Agriculture Imagery Program is a project managed by the U.S. Department of Agriculture created to collect leaf-on imagery for the United States during peak growing seasons. The files are available as GeoTIFFs. From 2005-2017 they have a one meter resolution. After that, it is a 60cm resolution.
+Region: us-east-1
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: Colorado AWS Open Imagery Guide
+URL: https://docs.google.com/document/d/15GjCSWSzst82FZMqBqdGV0rt6FKJzt03NlQYdWwsLGE/edit?usp=sharing
+AuthorName: State of Colorado OIT-GIS
+AuthorURL: https://geodata.colorado.gov/
+Tools & Applications:
+- Title: Colorado Public Imagery Dowloader
+URL: https://gis.colorado.gov/imagery/
+AuthorName: State of Colorado OIT-GIS
+AuthorURL: https://geodata.colorado.gov/
+ADXCategories:
+- Public Sector Data
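The Colorado resource description says the NAIP GeoTIFFs are 1 m resolution for 2005-2017 and 60 cm afterwards. As a rough sketch of what that switch means for pixel counts per unit area, assuming square pixels and ignoring band count, compression, and tile overlap (the helper name is illustrative, not part of the dataset):

```python
def pixels_per_km2(resolution_m: float) -> float:
    """Number of square pixels with side `resolution_m` metres in one km²."""
    if resolution_m <= 0:
        raise ValueError("resolution must be positive")
    return 1_000_000 / (resolution_m ** 2)


before = pixels_per_km2(1.0)  # 2005-2017 tiles (1 m)
after = pixels_per_km2(0.6)   # later tiles (60 cm)

print(f"1 m:   {before:,.0f} px/km²")
print(f"60 cm: {after:,.0f} px/km²")
print(f"ratio: {after / before:.2f}x")
```

Because pixel count scales with the inverse square of the resolution, going from 1 m to 60 cm multiplies the pixels per area by (1/0.6)², roughly 2.78x, before any compression effects.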
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+Name: Canopy Tree Height maps for the Amazon Forest (mean height composite 2020-2024) by CTrees.org
+Description: |
+Mean canopy Tree Height for the Amazon Forest on the period 2020-2024 at 4.78 m of spatial resolution. Created using a deep learning model on high-resolution Planet imagery from the Norway's International Climate and Forest Initiative (NICFI) Satellite Data Program.
+Documentation: "[Project overview](https://ctrees.org/products/tree-level)"
+
+ManagedBy: "[CTrees](https://ctrees.org/)"
+UpdateFrequency: TBD
+Tags:
+- aws-pds
+- cog
+- earth observation
+- land cover
+- deep learning
+- lidar
+- satellite imagery
+- image processing
+- environmental
+- conservation
+- geospatial
+License: |
+https://creativecommons.org/licenses/by/4.0/
+Citation: CTrees.org - 2025. Canopy Tree Height of the Amazon Forest. Accessed DAY MONTH YEAR.
+Resources:
+- Description: Cloud-optimized GeoTIFF files with names corresponding to the tiling system of the Norway's International Climate and Forest Initiative (NICFI) Satellite Data Program.
+ARN: arn:aws:s3:::ctrees-amazon-canopy-height
+Region: us-west-2
+Type: S3 Bucket
+Explore:
+- "[Browse CTrees Bucket](https://ctrees-amazon-canopy-height.s3.us-west-2.amazonaws.com/index.html)"
+DataAtWork:
+Tutorials:
+Tools & Applications:
+Publications:
+- Title: "Is this the largest tree in the Amazon? A Q&A with CTrees scientist Fabien Wagner"
+URL: https://ctrees.org/news/largest-tree-amazon-with-fabien-wagner-63
+AuthorName: Rachel Kovinsky
+- Title: "High Resolution Tree Height Mapping of the Amazon Forest using Planet NICFI Images and LiDAR-Informed U-Net Model"
+URL: https://doi.org/10.48550/arXiv.2501.10600
+AuthorName: Fabien H Wagner, Ricardo Dalagnol, Griffin Carter, Mayumi CM Hirye, Shivraj Gill, Le Bienfaiteur Sagang Takougoum, Samuel Favrichon, Michael Keller, Jean PHB Ometto, Lorena Alves, Cynthia Creze, Stephanie P George-Chacon, Shuang Li, Zhihua Liu, Adugna Mullissa, Yan Yang, Erone G Santos, Sarah R Worden, Martin Brandt, Philippe Ciais, Stephen C Hagen, Sassan Saatchi
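Both CTrees entries ship a Citation string with a literal `DAY MONTH YEAR` placeholder for the access date. A minimal sketch of filling it in, assuming a "day, full English month name, year" format inferred from the placeholder order (the helper name is illustrative):

```python
from datetime import date

# Literal placeholder used in the CTrees Citation fields.
PLACEHOLDER = "DAY MONTH YEAR"

# Hard-coded English month names so the output does not depend on the locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def fill_access_date(citation: str, accessed: date) -> str:
    """Replace the DAY MONTH YEAR placeholder with a concrete access date."""
    if PLACEHOLDER not in citation:
        raise ValueError("citation has no DAY MONTH YEAR placeholder")
    stamp = f"{accessed.day} {MONTHS[accessed.month - 1]} {accessed.year}"
    return citation.replace(PLACEHOLDER, stamp)


amazon = (
    "CTrees.org - 2025. Canopy Tree Height of the Amazon Forest. "
    "Accessed DAY MONTH YEAR."
)
print(fill_access_date(amazon, date(2025, 3, 14)))
```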

datasets/ctrees-california-vhr-tree-height.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
 Name: Sub-Meter Canopy Tree Height of California in 2020 by CTrees.org
 Description: |
-Canopy Tree Height maps for California in 2020. Created using a deep learning model on very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) by United States Department of Agriculture (USDA).
+Canopy Tree Height maps for California in 2020. Created using a deep learning model on very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) by United States Department of Agriculture (USDA).
 Documentation: "[Project overview](https://ctrees.org/products/tree-level)"
 
 ManagedBy: "[CTrees](https://ctrees.org/)"
@@ -18,13 +18,14 @@ Tags:
 - geospatial
 License: |
 https://creativecommons.org/licenses/by/4.0/
-Citation:
-CTrees.org - 2024. Sub-Meter Canopy Tree Height of California. Accessed DAY MONTH YEAR.
+Citation: CTrees.org - 2024. Sub-Meter Canopy Tree Height of California. Accessed DAY MONTH YEAR.
 Resources:
 - Description: Cloud-optimized GeoTIFF files with names corresponding to image of California for the year 2020 from the National Agriculture Imagery Program (NAIP) - United States Department of Agriculture (USDA) [NAIP](s3://naip-analytic/).
 ARN: arn:aws:s3:::ctrees-tree-height-ca-2020/
 Region: us-west-2
 Type: S3 Bucket
+Explore:
+- "[Browse CTrees Bucket](https://ctrees-tree-height-ca-2020.s3.us-west-2.amazonaws.com/index.html)"
 DataAtWork:
 Tutorials:
 Tools & Applications:
@@ -35,4 +36,3 @@ DataAtWork:
 - Title: Sub-Meter Tree Height Mapping of California using Aerial Images and LiDAR-Informed U-Net Model
 URL: https://doi.org/10.1016/j.rse.2024.114099
 AuthorName: Fabien H Wagner, Sophia Roberts, Alison L Ritz, Griffin Carter, Ricardo Dalagnol, Samuel Favrichon, Mayumi CM Hirye, Martin Brandt, Philippe Ciais and Sassan Saatchi
-
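The new Explore links follow the S3 virtual-hosted URL pattern `https://{bucket}.s3.{region}.amazonaws.com/{key}`, while the ARN field may carry a trailing slash (as `ctrees-tree-height-ca-2020/` does). A small sketch that derives the browse URL from the ARN and Region fields; the helper names are illustrative, and keys with special characters would additionally need URL-encoding:

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_arn(arn: str) -> str:
    """Extract the bucket name from an S3 ARN such as arn:aws:s3:::bucket/."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Anything after the first "/" is a key prefix (or just a trailing slash).
    bucket = arn[len(S3_ARN_PREFIX):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"ARN has no bucket name: {arn!r}")
    return bucket


def virtual_hosted_url(bucket: str, region: str, key: str = "") -> str:
    """Build a virtual-hosted-style HTTPS URL for a key in the bucket."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


bucket = bucket_from_arn("arn:aws:s3:::ctrees-tree-height-ca-2020/")
print(virtual_hosted_url(bucket, "us-west-2", "index.html"))
```

Some entries (for example Open CEDA) link to the region-less `{bucket}.s3.amazonaws.com` form instead; both address the same bucket.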

datasets/gatk-test-data.yaml

Lines changed: 0 additions & 4 deletions
@@ -23,8 +23,4 @@ Resources:
 DataAtWork:
 Tutorials:
 Tools & Applications:
-- Title: Genomics Workflows on AWS - Cromwell on AWS
-URL: https://docs.opendata.aws/genomics-workflows/orchestration/cromwell/cromwell-examples/#real-world-example-haplotypecaller
-AuthorName: W. Lee Pang
-AuthorURL: https://www.linkedin.com/in/lee-pang-a039a26/
 Publications:

datasets/humancellatlas.yaml

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+Name: Human Cell Atlas
+
+Description: "The Human Cell Atlas (HCA) is a collaborative community of
+international scientists. Our mission is to create comprehensive reference
+maps of all the cells in the human body as a basis for both understanding
+human health and diagnosing, monitoring, and treating disease. The HCA
+registry has more than one thousand member scientists from hundreds of
+institutions around the world. The project is steered and governed by an
+Organizing Committee, co-chaired by Aviv Regev and Sarah Teichmann."
+
+Documentation: https://data.humancellatlas.org/
+
+Contact: https://data.humancellatlas.org/contact
+
+ManagedBy: UC Santa Cruz Genomics Institute, University of California, Santa Cruz (UCSC)
+
+UpdateFrequency: Monthly
+
+Tags:
+- life sciences
+- biology
+- cell biology
+- genome
+- genomic
+- transcriptomics
+- gene expression
+- single-cell transcriptomics
+- cell imaging
+- Homo sapiens
+- Mus musculus
+
+License: https://data.humancellatlas.org/about/data-use-agreement
+
+Citation: "The URL for the HCA Data Portal, https://data.humancellatlas.org, can
+be used as the citation for data obtained from the HCA. Alternatively, you can
+cite: Regev A., et al. 2017. The Human Cell Atlas. Elife. Dec 5;6. pii:
+e27041. doi: 10.7554/eLife.27041."
+
+Resources:
+- Description: "An S3 bucket containing all publicly accessible data files in
+the Human Cell Atlas. The bucket layout and access procedures are
+documented at
+
+https://github.com/DataBiosphere/azul/blob/develop/docs/mirror.rst
+
+and metadata can be viewed at
+
+https://explore.data.humancellatlas.org/
+
+or accessed programmatically at
+
+https://service.azul.data.humancellatlas.org/"
+ARN: arn:aws:s3:::humancellatlas
+Region: us-east-1
+Type: S3 Bucket
+Explore:
+- "[Data Browser UI](https://explore.data.humancellatlas.org/)"
+- "[Azul REST Web Service](https://service.azul.data.humancellatlas.org/)"
+
+DataAtWork:
+Publications:
+- Title: "The Human Cell Atlas White Paper"
+URL: https://arxiv.org/abs/1810.05192
+AuthorName: "Aviv Regev, Sarah Teichmann, Orit Rozenblatt-Rosen, Michael
+Stubbington, Kristin Ardlie, Ido Amit, Paola Arlotta, Gary Bader,
+Christophe Benoist, Moshe Biton, Bernd Bodenmiller, Benoit Bruneau,
+Peter Campbell, Mary Carmichael, Piero Carninci, Leslie Castelo-Soccio,
+Menna Clatworthy, Hans Clevers, Christian Conrad, Roland Eils, Jeremy
+Freeman, Lars Fugger, Berthold Goettgens, Daniel Graham, Anna Greka, Nir
+Hacohen, Muzlifah Haniffa, Ingo Helbig, Robert Heuckeroth, Sekar
+Kathiresan, Seung Kim, Allon Klein, Bartha Knoppers, Arnold Kriegstein,
+Eric Lander, Jane Lee, Ed Lein, Sten Linnarsson, Evan Macosko, Sonya
+MacParland, Robert Majovski, Partha Majumder, John Marioni, Ian
+McGilvray, Miriam Merad, Musa Mhlanga, Shalin Naik, Martijn Nawijn,
+Garry Nolan, Benedict Paten, Dana Pe'er, Anthony Philippakis, Chris
+Ponting, Steve Quake, Jayaraj Rajagopal, Nikolaus Rajewsky, Wolf Reik,
+Jennifer Rood, Kourosh Saeb-Parsy, Herbert Schiller, Steve Scott, Alex
+Shalek, Ehud Shapiro, Jay Shin, Kenneth Skeldon, Michael Stratton, Jenna
+Streicher, Henk Stunnenberg, Kai Tan, Deanne Taylor, Adrian Thorogood,
+Ludovic Vallier, Alexander van Oudenaarden, Fiona Watt, Wilko Weicher,
+Jonathan Weissman, Andrew Wells, Barbara Wold, Ramnik Xavier, Xiaowei
+Zhuang, Human Cell Atlas Organizing Committee"
+- Title: "The network effect: studying COVID-19 pathology with the Human Cell Atlas"
+URL: https://www.nature.com/articles/s41580-020-0267-3
+AuthorName: "Sarah Teichmann, Aviv Regev"
+- Title: "The Human Cell Atlas from a cell census to a unified foundation model"
+URL: https://www.nature.com/articles/s41586-024-08338-4
+AuthorName: "Jennifer E. Rood, Samantha Wynne, Lucia Robson, Anna
+Hupalowska, John Randell, Sarah A. Teichmann & Aviv Regev"
+- Title: "The Human Cell Atlas: towards a first draft atlas"
+URL: https://www.nature.com/collections/jccbbdahji
+AuthorName: "Various authors"
+- Title: "The Human Cell Atlas: towards a first draft atlas"
+URL: https://www.nature.com/immersive/d42859-024-00060-5/index.html
+AuthorName: "Various authors"
+
+ADXCategories:
+- Healthcare & Life Sciences Data
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+Name: Indian High Court Judgments
+Description: This dataset contains judgements from the Indian High Courts, downloaded from ecourts website. It contains judgments of 25 high courts, along with raw metadata (in json format) and structured metadata (in parquet format). Judgments from the website are further compressed to optimize for size (care has been taken to not have any loss of data either in content or in visual appearance). Tar files are also made available in addition to the individual pdf files to make it easier for bulk download.
+Documentation: https://github.com/vanga/indian-high-court-judgments/blob/opendata/docs/opendata/dataset.md
+
+ManagedBy: "[Dattam Labs](https://dattam.in)"
+UpdateFrequency: Quarterly
+Tags:
+- legal data
+License: CC-BY-4.0
+Resources:
+- Description: S3 bucket containing the judgments
+ARN: arn:aws:s3:::indian-high-court-judgments
+Region: ap-south-1
+Type: S3 Bucket
+DataAtWork:
+Tutorials:
+- Title: Using AWS Athena to query the metadata
+URL: https://github.com/vanga/indian-high-court-judgments/blob/main/opendata/tutorials/athena.md
+AuthorName: Pradeep Vanga
+AuthorURL: https://github.com/vanga
+Services:
+- Amazon Athena
+- Title: Extracting text from judgment PDFs
+URL: https://github.com/vanga/indian-high-court-judgments/blob/main/opendata/tutorials/README.md
+AuthorName: Pradeep Vanga
+AuthorURL: https://github.com/vanga
+Services:
+- Amazon S3
+
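The Indian High Court description notes that tar files are published alongside the individual PDFs for bulk download. A minimal sketch of listing the PDF members of such an archive once it has been downloaded, using only the standard library. The archive layout and member names below are hypothetical stand-ins built in memory for the demo; the real layout is described in the dataset documentation:

```python
import io
import tarfile


def list_pdf_members(fileobj) -> list[str]:
    """Return the sorted names of regular .pdf members in a tar archive."""
    # "r:*" lets tarfile detect gzip/bz2/xz compression (or none) automatically.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return sorted(
            m.name
            for m in tar.getmembers()
            if m.isfile() and m.name.lower().endswith(".pdf")
        )


def build_demo_tar() -> io.BytesIO:
    """Build a small in-memory archive with hypothetical member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [
            ("court_a/judgment_001.pdf", b"%PDF-1.4 demo"),
            ("court_a/judgment_002.pdf", b"%PDF-1.4 demo"),
            ("court_a/metadata.json", b"{}"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


print(list_pdf_members(build_demo_tar()))
```

For a real download, pass an open file (`open(path, "rb")`) instead of the demo buffer.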

datasets/open-ceda.yaml

Lines changed: 9 additions & 2 deletions
@@ -6,9 +6,14 @@ Description: |
 At its core, CEDA connects economic exchanges to GHG emissions by quantifying the life-cycle emissions of products and services. This is achieved through the integration of input-output tables, which represent the full supply-chain network of the global economy, with GHG emissions data. As a result, CEDA provides users with a powerful tool to assess the environmental impacts embedded in corporate value chains.
 Documentation: https://openceda.org/
 
-ManagedBy: Watershed Technology
+ManagedBy: "[Watershed Technology](https://watershed.com)"
 UpdateFrequency: Annual
+Collabs:
+ASDI:
+Tags:
+- sustainability
 Tags:
+- aws-pds
 - climate
 - carbon
 - scope 3
@@ -21,6 +26,8 @@ Resources:
 ARN: arn:aws:s3:::open-ceda
 Region: us-west-2
 Type: S3 Bucket
+Explore:
+- "[Open CEDA](https://open-ceda.s3.amazonaws.com/index.html)"
 DataAtWork:
 Tutorials:
 - Title: For a tutoral please download the CEDA Methodology Documentation on the openceda.org website.
@@ -49,4 +56,4 @@ DataAtWork:
 URL: https://www.eurogypsum.org/wp-content/uploads/2015/05/N0533.pdf
 AuthorName: Arnold Tukker (TNO), Gjalt Huppes, Lauran van Oers, Reinout Heijungs (CML), 2006
 ADXCategories:
-- Environmental Data
+- Environmental Data
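The CEDA description explains that life-cycle emissions come from combining input-output tables (the supply-chain network) with GHG emissions data. The textbook form of that idea is environmentally extended input-output analysis: with technical coefficients `A` (input from sector i per dollar of sector j's output) and direct emission intensities `f`, total intensities are `m = f (I - A)^-1`. The sketch below solves the equivalent fixed point `m = f + m A` by iteration instead of inverting the matrix. It is a generic illustration with made-up two-sector numbers, not CEDA's data or its exact methodology, which is documented on openceda.org:

```python
def total_intensities(A, f, tol=1e-12, max_iter=10_000):
    """Solve m = f + m·A (i.e. m = f (I - A)^-1) by fixed-point iteration.

    A[i][j]: dollars of sector i's output used per dollar of sector j's output.
    f[j]:    direct emissions per dollar of sector j's output.
    Converges when A is "productive" (spectral radius below 1).
    """
    n = len(f)
    m = list(f)
    for _ in range(max_iter):
        new = [f[j] + sum(m[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    raise RuntimeError("did not converge; check that A is productive")


# Illustrative two-sector economy (numbers are invented for the example).
A = [[0.1, 0.2],
     [0.3, 0.1]]
f = [0.5, 0.2]  # e.g. kg CO2e per dollar of output

print(total_intensities(A, f))
```

Each total intensity exceeds the direct one because it also counts upstream emissions from every supplier in the chain, which is what makes the approach suitable for scope 3 estimates.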

datasets/slacken.yaml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+Name: Metagenomic reference libraries for Slacken
+Description: Metagenomic indexes for use with the Slacken taxonomic classification tool
+Documentation: https://github.com/JNP-Solutions/Slacken/wiki/Pre%E2%80%90built-Slacken-indexes-on-Amazon-S3
+
+ManagedBy: Johan Nyström-Persson
+UpdateFrequency: These indexes are currently what was used in our 2025 publication introducing the concept of 2-step classification and comparing Kraken2 with Slacken. We aim to update the data at least once per year, resources permitting.
+Tags:
+- genomic
+- metagenomics
+- microbiome
+- bioinformatics
+- biology
+- life sciences
+License: There are no restrictions on the use of this data.
+Resources:
+- Description: Metagenomic indexes for Slacken, a metagenomic classifier, based on NCBI RefSeq genomes.
+ARN: arn:aws:s3:::slacken
+Region: us-east-1
+Type: S3 Bucket
+Explore:
+- '[Browse Bucket](https://slacken.s3.amazonaws.com/)'
+DataAtWork:
+Tutorials:
+- Title: Classifying metagenomic samples on AWS ElasticMapReduce
+URL: https://github.com/JNP-Solutions/Slacken/wiki/Classifying-metagenomic-samples-on-AWS-Elastic-MapReduce
+Services:
+- Amazon EC2
+- Amazon EMR
+- Amazon S3
+AuthorName: Johan Nyström-Persson
+AuthorURL: https://github.com/jtnystrom
+Tools & Applications:
+- Title: Slacken
+URL: https://github.com/JNP-Solutions/Slacken
+AuthorName: Johan Nyström-Persson, Nishad Bapatdhar
+AuthorURL: https://github.com/jtnystrom
+Publications:
+- Title: "Precise and scalable metagenomic profiling with sample-tailored minimizer libraries"
+URL: https://www.biorxiv.org/content/10.1101/2024.12.22.629657
+AuthorName: Johan Nyström-Persson, Nishad Bapatdhar and Samik Ghosh
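The Slacken publication is built around minimizer libraries. For readers new to the term, a (w, k) minimizer scheme keeps, for every window of w consecutive k-mers, only the smallest k-mer under some ordering, so a sequence is represented by far fewer k-mers. The sketch below uses plain lexicographic order with leftmost tie-breaking; this is a generic illustration, not Slacken's implementation, and real classifiers typically use hashed or randomized orderings and canonical (strand-independent) k-mers:

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return the distinct (position, k-mer) minimizers of `seq`.

    For each window of `w` consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost one on ties).
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    result: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        # Compare (k-mer, position) so ties resolve to the leftmost position.
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        # Selected positions never move backwards, so checking the last
        # entry is enough to avoid duplicates.
        if not result or result[-1][0] != best:
            result.append((best, kmers[best]))
    return result


print(minimizers("ACGTACGT", k=3, w=2))
```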
