
Commit 5109d9b

Merge branch 'main' into main
2 parents ef97369 + 217ca80 commit 5109d9b


9 files changed (+134, -26 lines)


datasets/aws-public-blockchain.yaml

Lines changed: 12 additions & 18 deletions
@@ -3,25 +3,19 @@ Description: >
   <p>The AWS Public Blockchain Data initiative provides free access to blockchain datasets through collaboration with data providers. The data is optimized for analytics by being transformed into compressed Parquet files, partitioned by date for efficient querying.</p>
 
   <h4>Datasets</h4>
-  <table width="100%">
-  <thead>
-  <tr><th>Blockchain dataset</th><th>Maintained by</th><th>Path</th></tr>
-  </thead>
-  <tbody>
-  <tr><td>Bitcoin</td> <td>AWS</td> <td><code>s3://aws-public-blockchain/v1.0/btc/</code></td></tr>
-  <tr><td>Ethereum</td> <td>AWS</td> <td><code>s3://aws-public-blockchain/v1.0/eth/</code></td></tr>
-  <tr><td>Arbitrum</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/arbitrum/</code></td></tr>
-  <tr><td>Aptos</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/aptos/</code></td></tr>
-  <tr><td>Base</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/base/</code></td></tr>
-  <tr><td>Provenance</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/provenance/</code></td></tr>
-  <tr><td>XRP Ledger</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code></td></tr>
-  <tr><td>Stellar (<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>)</td> <td>Stellar</td> <td><code>s3://aws-public-blockchain/v1.1/stellar/</code></td></tr>
-  <tr><td>The Open Network (TON)</td> <td>TON</td> <td><code>s3://aws-public-blockchain/v1.1/ton/</code></td></tr>
-  </tbody>
-  </table>
-  </br>
+  <b>Blockchain dataset - Maintained by - Path:</b><br>
+  - Bitcoin - AWS - <code>s3://aws-public-blockchain/v1.0/btc/</code><br>
+  - Ethereum - AWS - <code>s3://aws-public-blockchain/v1.0/eth/</code><br>
+  - Arbitrum - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/arbitrum/</code><br>
+  - Aptos - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/aptos/</code><br>
+  - Base - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/base/</code><br>
+  - Provenance - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/provenance/</code><br>
+  - XRP Ledger - SonarX - <code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code><br>
+  - Stellar(<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>) - Stellar - <code>s3://aws-public-blockchain/v1.1/stellar/</code><br>
+  - The Open Network (TON) - TON - <code>s3://aws-public-blockchain/v1.1/ton/</code><br>
+  </br>
 
-  <h3>Become a Data Provider</h3>
+  <h4>Become a Data Provider</h4>
   <p>We welcome additional blockchain data providers to join this initiative. If you're interested in contributing datasets to the AWS Public Blockchain Data program, please contact our team at <a href="mailto:[email protected]">[email protected]</a>.</p>
 
 
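The dataset paths above are S3 prefixes holding date-partitioned Parquet files. The Python sketch below builds the prefix for one table and one day. It assumes a Hive-style `date=YYYY-MM-DD` partition key and per-table subdirectories such as `blocks`; neither is stated in this commit, so list the bucket before relying on them. Stellar is left out because its entry points to XDR files rather than Parquet.

```python
from datetime import date

# Dataset roots copied from the list in aws-public-blockchain.yaml.
# Stellar is omitted: it is published as XDR files, not date-partitioned Parquet.
DATASET_ROOTS = {
    "Bitcoin": "s3://aws-public-blockchain/v1.0/btc/",
    "Ethereum": "s3://aws-public-blockchain/v1.0/eth/",
    "Arbitrum": "s3://aws-public-blockchain/v1.1/sonarx/arbitrum/",
    "Aptos": "s3://aws-public-blockchain/v1.1/sonarx/aptos/",
    "Base": "s3://aws-public-blockchain/v1.1/sonarx/base/",
    "Provenance": "s3://aws-public-blockchain/v1.1/sonarx/provenance/",
    "XRP Ledger": "s3://aws-public-blockchain/v1.1/sonarx/xrp/",
    "The Open Network (TON)": "s3://aws-public-blockchain/v1.1/ton/",
}


def daily_partition(chain: str, table: str, day: date) -> str:
    """Return the assumed S3 prefix for one day of one table.

    Assumptions (not from the commit): tables are subdirectories of the
    dataset root, and partitions are named Hive-style as date=YYYY-MM-DD.
    The table name passed in (e.g. "blocks") is hypothetical.
    """
    root = DATASET_ROOTS[chain]  # raises KeyError for chains not listed above
    return f"{root}{table}/date={day.isoformat()}/"


print(daily_partition("Bitcoin", "blocks", date(2024, 1, 2)))
```

Listing the returned prefix (for example with the AWS CLI and `--no-sign-request`) is a quick way to confirm or correct the assumed layout.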

datasets/gaia-dr3.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+
+Name: Gaia DR3
+Description: |
+  [Gaia DR3 data](https://www.cosmos.esa.int/web/gaia/dr3) were originally released by the European Space Agency in December 2020. This [HATS](https://hats.readthedocs.io/en/stable)-formatted catalog was produced by the LSST Interdisciplinary Network for Collaboration and Computing. The GAIA HATS Datasets are specifically designed for efficient spatial cross-matching with other HATS-format catalogs, whether within the same archive or across distributed archive data centers. This enables astronomers to perform complex analyses, such as identifying correlations or overlaps between datasets from different surveys. Users can leverage [LSDB (Large-Scale Database)](https://docs.lsdb.io/en/latest/), a scalable spatial analysis library, to execute precise, high-performance operations like cone searches or cross-matching.
+Documentation: https://docs.lsdb.io/en/latest/index.html
+
+ManagedBy: "[Space Telescope Science Institute](http://www.stsci.edu/)"
+Citation: Please see [the LSDB citation page](https://docs.lsdb.io/en/latest/citation.html) if using LSDB for an academic publication. Please also [cite the Gaia team](https://gea.esac.esa.int/archive/documentation/GDR3/Miscellaneous/sec_credit_and_citation_instructions/).
+UpdateFrequency: Never
+Tags:
+  - astronomy
+License: Attribution required.
+Resources:
+  - Description: Gaia DR3 HATS-Formatted Files
+    ARN: arn:aws:s3:::stpubdata/gaia
+    Region: us-east-1
+    Type: S3 Bucket
+    RequesterPays: False
+  - Description: Notifications for new data
+    ARN: arn:aws:sns:us-east-1:879230861493:stpubdata/gaia
+    Region: us-east-1
+    Type: SNS Topic
+DataAtWork:
+  Tutorials:
+    - Title: Dark Energy Survey / Gaia DR3 Crossmatch
+      URL: https://docs.lsdb.io/en/stable/tutorials/pre_executed/des-gaia.html
+      AuthorName: LSDB Collaboration
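The Gaia entry mentions cone searches, which LSDB runs against the HATS partitions. The geometric test underneath is an angular-distance cut around a sky position. The Python sketch below shows that test with the haversine formula on a handful of hypothetical `(id, ra, dec)` rows in degrees; it is not LSDB's API and does none of LSDB's partition pruning.

```python
import math
from typing import Iterable, List, Tuple


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two sky positions (all in degrees), via haversine."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against h drifting slightly above 1 from rounding.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(rows: Iterable[Tuple[str, float, float]],
                ra: float, dec: float, radius_arcsec: float) -> List[str]:
    """Return ids of (id, ra, dec) rows lying within radius_arcsec of (ra, dec)."""
    radius_deg = radius_arcsec / 3600.0
    return [src_id for src_id, r, d in rows
            if angular_separation_deg(ra, dec, r, d) <= radius_deg]


# Hypothetical sources for illustration, not real Gaia DR3 rows.
sources = [("a", 10.0, 20.0), ("b", 10.0, 20.001), ("c", 11.0, 20.0)]
print(cone_search(sources, ra=10.0, dec=20.0, radius_arcsec=10.0))
```

Source `b` is 3.6 arcsec from the center and falls inside the 10 arcsec cone; source `c` is roughly 0.94 degrees away and is excluded.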

datasets/noaa-historicalcharts.yaml

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@ Name: NOAA Historical Maps and Charts
 Description: Historical Charts are not for Navigation. The collection primarily consists of historic charts and maps produced by NOAA's Coast Survey and its predecessors, especially the U.S. Coast and Geodetic Survey and the U.S. Lake Survey (previously under the Department of War). The collection also includes bathymetric maps, land sketches, Civil War battle maps, aeronautical charting from the 1930s to the 1950s, and other drawings and photographs.
 Documentation: https://historicalcharts.noaa.gov/about.php
 Contact: |
-  For any questions regarding data delivery not associated with this platform or any general questions regarding the NOAA Big Data Program, email noaa.bdp@noaa.gov.<br/><br/>
+  For any general questions regarding the NOAA Open Data Dissemination (NODD) Program, email the NODD Team at nodd@noaa.gov. We also seek to identify case studies on how NOAA data is being used and will be featuring those stories in joint publications and in upcoming events. If you are interested in seeing your story highlighted, please share it with the NODD team by emailing nodd@noaa.gov.<br/><br/>
   For general questions or feedback about the data, please submit inquiries through the NOAA Office of Coast Survey (OCS) ASSIST Tool at https://www.nauticalcharts.noaa.gov/customer-service/assist/.
 ManagedBy: "[NOAA](http://www.noaa.gov/)"
 UpdateFrequency: Periodic manual updates when historic charts are added to the collection.
@@ -25,3 +25,4 @@ Resources:
     Type: S3 Bucket
     Explore:
     - '[Browse Bucket](https://noaa-nos-historicalcharts-pds.s3.amazonaws.com/index.html)'
+

datasets/noaa-ncn.yaml

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ Description: |
   - [NOAA-NCN on AWS](https://noaa-cors-pds.s3.amazonaws.com/index.html)
   - [NGS server: https://geodesy.noaa.gov/corsdata/](https://geodesy.noaa.gov/corsdata/)
   - [NGS's customized data request service (UFCORS)](https://geodesy.noaa.gov/UFCORS/)
-  - [NGS Anonymous ftp://geodesy.noaa.gov/cors/ - This service is going away on August 02, 2021!](ftp://geodesy.noaa.gov/cors/)
   - #### NCN Data and Products
   - **RINEX**: The GPS/GNSS data collected at NCN stations are made available to the public by NGS in Receiver INdependent EXchange (RINEX) format. Most data are available within 1 hour (60 minutes) from when they were recorded at the remote site, and a few sites have a delay of 24 hours (1440 minutes).<br/>RINEX data can be found at: *rinex/`YYYY`/`DDD`/`ssss`/*
   - **Station logs**:
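The RINEX layout `rinex/YYYY/DDD/ssss/` combines a four-digit year, a three-digit day of year and a four-character station ID. The Python sketch below maps a date and station onto that prefix. The station ID `abcd` is a placeholder, and whether IDs are stored in lower case, and whether `rinex/` sits at the root of the `noaa-cors-pds` bucket, are assumptions to check against a bucket listing.

```python
from datetime import date


def rinex_prefix(day: date, station: str) -> str:
    """Build the rinex/YYYY/DDD/ssss/ prefix described for NOAA-NCN data.

    DDD is the zero-padded day of year; ssss is the station ID as given
    (letter case is not normalized here).
    """
    if len(station) != 4:
        raise ValueError("station IDs in this layout are four characters (ssss)")
    doy = day.timetuple().tm_yday  # 1..366
    return f"rinex/{day.year:04d}/{doy:03d}/{station}/"


# February 1 is day 032; "abcd" is a placeholder station ID.
print(rinex_prefix(date(2024, 2, 1), "abcd"))
```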

datasets/ome-zarr-open-scivis.yaml

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+Name: OME-Zarr Open SciVis Datasets
+Description: This project provides the Open SciVis Datasets in a chunked, highly-compressed, multi-scale format, encodes metadata in JSON according to the OME-Zarr specification, and hosts the datasets on AWS S3 through the AWS Open Data Program, aiming to serve as a web-based resource for the scientific visualization community to enhance reproducibility and facilitate testing and development of OME-Zarr tools.
+Documentation: https://github.com/InsightSoftwareConsortium/OMEZarrOpenSciVisDatasets
+Contact: "Matt McCormick <[email protected]>"
+ManagedBy: "NumFOCUS"
+UpdateFrequency: On a biannual basis we update the datasets and sync with OME-Zarr standards.
+Tags:
+  - biology
+  - image processing
+  - imaging
+  - neuroimaging
+  - neuroscience
+  - life sciences
+  - magnetic resonance imaging
+  - computed tomography
+  - volumetric imaging
+  - zarr
+License: CC-BY-4.0 unless otherwise specified
+Resources:
+  - Description: OME-Zarr Open SciVis Datasets
+    ARN: arn:aws:s3:::ome-zarr-scivis
+    Region: us-east-1
+    Type: S3 Bucket
+DataAtWork:
+  Tutorials:
+    - Title: Read and Visualize in Python
+      URL: https://github.com/InsightSoftwareConsortium/OMEZarrOpenSciVisDatasets?tab=readme-ov-file#usage
+      AuthorName: Matt McCormick
+      AuthorURL: https://github.com/thewtex
+  Tools & Applications:
+    - Title: A list of tools and libraries with OME-Zarr support
+      URL: https://ngff.openmicroscopy.org/tools/index.html
+      AuthorName: NGFF community
+      AuthorURL: https://github.com/ome/ngff
+  Publications:
+    - Title: "OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies"
+      URL: https://www.nature.com/articles/s41592-021-01326-w
+      AuthorName: Josh Moore, Chris Allan, Sébastien Besson, Jean-Marie Burel, Erin Diel, David Gault, Kevin Kozlowski, Dominik Lindner, Melissa Linkert, Trevor Manz, Will Moore, Constantin Pape, Christian Tischer & Jason R. Swedlow
+    - Title: "OME-Zarr: a cloud-optimized bioimaging file format with international community support"
+      URL: https://link.springer.com/article/10.1007/s00418-023-02209-1
+      AuthorName: Josh Moore, Daniela Basurto-Lozada, Sébastien Besson, John Bogovic, Jordão Bragantini, Eva M. Brown, Jean-Marie Burel, Xavier Casas Moreno, Gustavo de Medeiros, Erin E. Diel, David Gault, Satrajit S. Ghosh, Ilan Gold, Yaroslav O. Halchenko, Matthew Hartley, Dave Horsfall, Mark S. Keller, Mark Kittisopikul, Gabor Kovacs, Aybüke Küpcü Yoldaş, Koji Kyoda, Albane le Tournoulx de la Villegeorges, Tong Li, Prisca Liberali, Dominik Lindner, Melissa Linkert, Joel Lüthi, Jeremy Maitin-Shepard, Trevor Manz, Luca Marconato, Matthew McCormick, Merlin Lange, Khaled Mohamed, William Moore, Nils Norlin, Wei Ouyang, Bugra Özdemir, Giovanni Palla, Constantin Pape, Lucas Pelkmans, Tobias Pietzsch, Stephan Preibisch, Martin Prete, Norman Rzepka, Sameeul Samee, Nicholas Schaub, Hythem Sidky, Ahmet Can Solak, David R. Stirling, Jonathan Striebel, Christian Tischer, Daniel Toloudis, Isaac Virshup, Petr Walczysko, Alan M. Watson, Erin Weisbart, Frances Wong, Kevin A. Yamauchi, Omer Bayraktar, Beth A. Cimini, Nils Gehlenborg, Muzlifah Haniffa, Nathan Hotaling, Shuichi Onami, Loic A. Royer, Stephan Saalfeld, Oliver Stegle, Fabian J. Theis & Jason R. Swedlow
+    - Title: Open SciVis Datasets
+      URL: http://klacansky.com/open-scivis-datasets/
+      AuthorName: Pavol Klacansky
+DeprecatedNotice:
+ADXCategories:
+  - Healthcare & Life Sciences Data
+  - Manufacturing Data
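The OME-Zarr description says each dataset is multi-scale, with its metadata encoded as JSON according to the OME-Zarr specification. The stdlib-only Python sketch below lists the pyramid levels from a `multiscales` block shaped like OME-NGFF v0.4. The sample metadata is made up for illustration, and the hosted files may follow a different spec version (for example, v0.5 nests this block under an `ome` key).

```python
import json
from typing import List, Tuple

# Made-up metadata in the OME-NGFF v0.4 "multiscales" shape, not read from the bucket.
SAMPLE = json.loads("""
{
  "multiscales": [{
    "version": "0.4",
    "axes": [{"name": "z", "type": "space"},
             {"name": "y", "type": "space"},
             {"name": "x", "type": "space"}],
    "datasets": [
      {"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}]},
      {"path": "1", "coordinateTransformations": [{"type": "scale", "scale": [2.0, 2.0, 2.0]}]}
    ]
  }]
}
""")


def pyramid_levels(attrs: dict) -> List[Tuple[str, List[float]]]:
    """Return (array path, per-axis scale) for each level of the first multiscale image."""
    ms = attrs["multiscales"][0]
    levels = []
    for ds in ms["datasets"]:
        # Each level carries a "scale" transform; take the first one listed.
        scale = next(t["scale"] for t in ds["coordinateTransformations"]
                     if t["type"] == "scale")
        levels.append((ds["path"], scale))
    return levels


print(pyramid_levels(SAMPLE))
```

Level `0` is the full-resolution array and each later path is a downsampled copy, so a viewer can read a coarse level first and fetch finer chunks only where needed.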

datasets/proteingym.yaml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+Name: ProteinGym
+Description: |
+  ProteinGym is a benchmark suite for assessing the performance of protein fitness prediction and design models. It comprises a large curated collection of 200+ high-throughput experimental assays (~3M mutated sequences), as well as clinical annotations from experts about the pathogenicity of mutants in over 3k human genes.
+Documentation: https://github.com/OATML-Markslab/ProteinGym/blob/main/README.md
+
+ManagedBy: "Harvard Medical School; University of Oxford"
+UpdateFrequency: Quarterly
+Tags:
+  - aws-pds
+  - protein
+  - bioinformatics
+  - biology
+  - life sciences
+  - deep learning
+  - machine learning
+License: MIT License
+Resources:
+  - Description: "ProteinGym dataset including all substitution/indel mutations from Deep Mutational Scanning (DMS) experiments (DMS_substitutions.parquet / DMS_indels.parquet), and all substitution/indel mutations from clinical variant databases (clinical_substitutions.parquet / clinical_indels.parquet)."
+    ARN: arn:aws:s3:::proteingym
+    Region: us-east-2
+    Type: S3 Bucket
+DataAtWork:
+  Tutorials:
+    - Title: Scoring ProteinGym assays with TranceptEVE
+      URL: https://github.com/OATML-Markslab/ProteinGym/blob/main/notebooks/TranceptEVE_example.ipynb
+      AuthorName: Daniel Ritter
+      AuthorURL: https://danieldritter.github.io/
+  Tools & Applications:
+    - Title: ProteinGym website
+      URL: https://proteingym.org/
+      AuthorName: Pascal Notin & Daniel Ritter
+  Publications:
+    - Title: "ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design"
+      URL: https://papers.nips.cc/paper_files/paper/2023/hash/cac723e5ff29f65e3fcbb0739ae91bee-Abstract-Datasets_and_Benchmarks.html
+      AuthorName: "Pascal Notin, et al."
+      AuthorURL: https://www.pascalnotin.com/
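The ProteinGym resource description names four Parquet files: DMS and clinical data, each split into substitutions and indels. The Python sketch below maps a (source, mutation type) pair to an object URI in the `proteingym` bucket. It assumes the files sit at the bucket root under exactly those names, which the entry does not state; list the bucket to confirm before use.

```python
# File names taken from the proteingym.yaml resource description.
FILES = {
    ("DMS", "substitutions"): "DMS_substitutions.parquet",
    ("DMS", "indels"): "DMS_indels.parquet",
    ("clinical", "substitutions"): "clinical_substitutions.parquet",
    ("clinical", "indels"): "clinical_indels.parquet",
}
BUCKET = "proteingym"  # from ARN arn:aws:s3:::proteingym (Region: us-east-2)


def proteingym_uri(source: str, kind: str) -> str:
    """Return the assumed s3:// URI for one ProteinGym file.

    Assumption: objects are stored at the bucket root under the names above.
    """
    try:
        key = FILES[(source, kind)]
    except KeyError:
        raise ValueError(f"unknown combination: {source!r}, {kind!r}") from None
    return f"s3://{BUCKET}/{key}"


print(proteingym_uri("clinical", "indels"))
```

With pandas, pyarrow and s3fs installed, a URI like this can typically be passed to `pandas.read_parquet(uri, storage_options={"anon": True})` for anonymous access.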

datasets/software-heritage.yaml

Lines changed: 3 additions & 3 deletions
@@ -14,7 +14,7 @@ Description: |
   information is also included, providing timestamps about when and where all
   archived source code artifacts have been observed in the wild.
   Author and committer information is anonymized.
-Documentation: https://docs.softwareheritage.org/devel/swh-dataset/graph/athena.html
+Documentation: https://docs.softwareheritage.org/devel/swh-export/graph/athena.html
 
 ManagedBy: Software Heritage
 UpdateFrequency: Data is updated yearly
@@ -48,11 +48,11 @@ Resources:
 DataAtWork:
   Tutorials:
     - Title: Using the Software Heritage Graph Dataset
-      URL: https://docs.softwareheritage.org/devel/swh-dataset/graph/index.html
+      URL: https://docs.softwareheritage.org/devel/swh-export/graph/
       AuthorName: The Software Heritage team
   Tools & Applications:
     - Title: The SWH-Graph module
-      URL: https://docs.softwareheritage.org/devel/swh-graph/index.html
+      URL: https://docs.softwareheritage.org/devel/swh-graph/
       AuthorName: The Software Heritage team
   Publications:
     - Title: The Software Heritage Graph Dataset

datasets/tglc.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 
-Name: TESS-GAIA Light Curve (TESS)
+Name: TESS-GAIA Light Curve (TGLC)
 Description: |
   TESS-Gaia Light Curve (TGLC) is a PSF-based TESS full-frame image (FFI) light curve product. Using Gaia DR3 as priors, the team forward models the FFIs with the effective point spread function to remove contamination from nearby stars. The resulting light curves show a photometric precision closely tracking the pre-launch prediction of the noise level: TGLC's photometric precision consistently reaches ≲2% at 16th TESS magnitude even in crowded fields, demonstrating excellent decontamination and deblending power.
 Documentation: https://archive.stsci.edu/hlsp/tglc
@@ -13,12 +13,12 @@ Tags:
 License: All HLSPs hosted at MAST are subject to a [CC By 4.0 license](https://creativecommons.org/licenses/by/4.0/).
 Resources:
   - Description: TGLC Files
-    ARN: arn:aws:s3:::stpubdata/hlsp/tglc
+    ARN: arn:aws:s3:::stpubdata/mast/hlsp/tglc
     Region: us-east-1
     Type: S3 Bucket
     RequesterPays: False
   - Description: Notifications for new data
-    ARN: arn:aws:sns:us-east-1:879230861493:stpubdata/hlsp/tglc
+    ARN: arn:aws:sns:us-east-1:879230861493:stpubdata/mast/hlsp/tglc
     Region: us-east-1
     Type: SNS Topic
 DataAtWork:
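This change moves the TGLC bucket ARN from `stpubdata/hlsp/tglc` to `stpubdata/mast/hlsp/tglc`. An S3 ARN of the form `arn:aws:s3:::bucket/key-prefix` translates directly to an `s3://` URI, which is what most clients expect. The Python sketch below performs that translation and rejects non-S3 ARNs such as the SNS notification topic in the same entry.

```python
def s3_arn_to_uri(arn: str) -> str:
    """Convert 'arn:aws:s3:::bucket[/prefix]' to 's3://bucket[/prefix]'."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn}")
    return f"s3://{parts[5]}"


print(s3_arn_to_uri("arn:aws:s3:::stpubdata/mast/hlsp/tglc"))
```

Splitting with `maxsplit=5` keeps any colons inside the resource part intact, and the S3 bucket ARNs in these entries leave the region and account fields empty.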

tags.yaml

Lines changed: 3 additions & 0 deletions
@@ -202,7 +202,9 @@
 - hazard indicator
 - Hawkes Process
 - hdf5
+- hdf
 - health
+- heliophysics
 - high-throughput imaging
 - hiring
 - hispanic
@@ -418,6 +420,7 @@
 - temporal point process
 - tertiary analysis
 - text analysis
+- tiff
 - tiles
 - time series forecasting
 - trading
