Commit 1f2ef1c

Merge pull request #1 from awslabs/main
Merged updates made after the commit
2 parents 1a9bea5 + 0b2a064 commit 1f2ef1c

8 files changed: +176 -17 lines changed

datasets/aws-public-blockchain.yaml

Lines changed: 20 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,8 @@
11
Name: AWS Public Blockchain Data
22
Description: >
3-
The AWS Public Blockchain Data provides free access to blockchain datasets. Data is transformed into multiple
4-
tables as compressed Parquet files, partitioned by date, to allow efficient access for most common analytics queries.
3+
<p>The AWS Public Blockchain Data initiative provides free access to blockchain datasets through collaboration with data providers. The data is optimized for analytics by being transformed into compressed Parquet files, partitioned by date for efficient querying.</p>
54
6-
</br></br>
7-
8-
<strong>Datasets</strong></br></br>
5+
<h4>Datasets</h4>
96
<table width="100%">
107
<thead>
118
<tr><th>Blockchain dataset</th><th>Maintained by</th><th>Path</th></tr>
@@ -18,10 +15,15 @@ Description: >
1815
<tr><td>Base</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/base/</code></td></tr>
1916
<tr><td>Provenance</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/provenance/</code></td></tr>
2017
<tr><td>XRP Ledger</td> <td>SonarX</td> <td><code>s3://aws-public-blockchain/v1.1/sonarx/xrp/</code></td></tr>
18+
<tr><td>Stellar (<a href="https://developers.stellar.org/docs/learn/fundamentals/data-format/xdr" rel="noopener noreferrer">XDR files</a>)</td> <td>Stellar</td> <td><code>s3://aws-public-blockchain/v1.1/stellar/</code></td></tr>
19+
<tr><td>The Open Network (TON)</td> <td>TON</td> <td><code>s3://aws-public-blockchain/v1.1/ton/</code></td></tr>
2120
</tbody>
2221
</table>
2322
</br>
24-
For full datasets, with support and real-time updates, please visit <a href="https://sonarx.com">SonarX</a>.
23+
24+
<h3>Become a Data Provider</h3>
25+
<p>We welcome additional blockchain data providers to join this initiative. If you're interested in contributing datasets to the AWS Public Blockchain Data program, please contact our team at <a href="mailto:[email protected]">[email protected]</a>.</p>
26+
2527
2628
Documentation: https://github.com/aws-samples/digital-assets-examples/blob/main/analytics/
2729
@@ -36,11 +38,23 @@ Resources:
3638
ARN: arn:aws:s3:::aws-public-blockchain
3739
Region: us-east-2
3840
Type: S3 Bucket
41+
Explore:
42+
- '[Browse Bucket](https://aws-public-blockchain.s3.us-east-2.amazonaws.com/index.html)'
43+
3944
DataAtWork:
4045
Publications:
46+
- Title: "Exploring Arbitrum Data: Analyze L2 Activity with AWS Public Blockchain Datasets"
47+
URL: https://repost.aws/articles/ARpnBONglsT2e6D-hZZmxVvA/exploring-arbitrum-data-analyze-l2-activity-with-aws-public-blockchain-datasets
48+
AuthorName: Simon Goldberg, Everton Fraga
49+
- Title: "Unlocking XRP Ledger Data: Comprehensive Analysis with AWS Public Blockchain Datasets"
50+
URL: https://repost.aws/articles/ARg_zMIXlhTG2hSDFZDfF6hQ/unlocking-xrp-ledger-data-comprehensive-analysis-with-aws-public-blockchain-datasets
51+
AuthorName: Simon Goldberg, Everton Fraga
4152
- Title: New datasets added to the AWS Public Blockchain Datasets — available for analytics and research
4253
URL: https://repost.aws/articles/AR3gztQGeSS8CfaKNNeyYwsQ
4354
AuthorName: Everton Fraga, Simon Goldberg
55+
- Title: FEDS Notes - Primary and Secondary Markets for Stablecoins
56+
URL: https://www.federalreserve.gov/econres/notes/feds-notes/primary-and-secondary-markets-for-stablecoins-20240223.html
57+
AuthorName: Cy Watsky, Jeffrey Allen, Hamzah Daud, Jochen Demuth, Daniel Little, Megan Rodden, Amber Seira
4458
- Title: Access Bitcoin and Ethereum open datasets for cross-chain analytics
4559
URL: https://aws.amazon.com/blogs/database/access-bitcoin-and-ethereum-open-datasets-for-cross-chain-analytics/
4660
AuthorName: Oliver Steffmann, Bhaskar Ravat, Sreeji Gopal, and Stefan Dicker
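
The dataset table in this file lists each blockchain by its S3 path. As a minimal sketch, the Python below splits such a path into its parts. The function name `split_dataset_path` and the reading of the path segments as version, provider, and chain are assumptions inferred from the paths shown in the diff, not something the registry defines.

```python
from urllib.parse import urlparse


def split_dataset_path(uri: str) -> dict:
    """Split an s3:// URI from the datasets table into its parts.

    Assumption: segments are read as <version>/<provider>/<chain>,
    which matches the paths in the diff but is not documented there.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Drop the empty segments produced by leading and trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    return {
        "bucket": parsed.netloc,
        "version": segments[0] if segments else None,
        "provider": segments[1] if len(segments) > 1 else None,
        # Paths such as the Stellar and TON ones have no third segment.
        "chain": segments[2] if len(segments) > 2 else None,
    }


if __name__ == "__main__":
    print(split_dataset_path("s3://aws-public-blockchain/v1.1/sonarx/base/"))
    print(split_dataset_path("s3://aws-public-blockchain/v1.1/stellar/"))
```

For the Stellar and TON rows the `chain` value comes back as `None`, because those paths stop at the provider segment.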

datasets/carbonpdf.yaml

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
Name: CarbonPDF
2+
Description: A carbon question-answering (QA) dataset specifically designed to facilitate the extraction and analysis of data from real-world carbon reports of computing products. The dataset features annotated metadata, a variety of numerical reasoning tasks, and structured derivations to ensure accurate processing of fragmented and inconsistent information.
3+
Documentation: https://github.com/pittcps/carbonpdf-dataset
4+
5+
ManagedBy: Pittcps lab
6+
UpdateFrequency: Data for a new company is added once collected.
7+
Collabs:
8+
ASDI:
9+
Tags:
10+
- climate
11+
Tags:
12+
- aws-pds
13+
- environmental
14+
- product comparison
15+
- csv
16+
- information retrieval
17+
- industry
18+
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
19+
Resources:
20+
- Description: A component-level product carbon footprint dataset and a corresponding question-answering dataset based on it
21+
ARN: arn:aws:s3:::carbonpdf
22+
Region: us-east-1
23+
Type: S3 Bucket
24+
Explore:
25+
- '[Explore](https://github.com/pittcps/carbonpdf-dataset)'
26+
ADXCategories:
27+
- Environmental Data
28+
- Manufacturing Data

datasets/loc-sanborn-maps.yaml

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
---
2+
Name: Sanborn Maps Data Package
3+
Description: The dataset contains metadata records for 50,600 maps from the
4+
[Sanborn Fire Insurance Maps
5+
collection](https://www.loc.gov/collections/sanborn-maps/) and their
6+
corresponding 440,048 JPEG images. The Sanborn collection at Library of
7+
Congress includes over fifty thousand editions of fire insurance maps
8+
comprising almost seven hundred thousand individual sheets. The Library of
9+
Congress holdings represent the largest extant collection of maps produced by
10+
the Sanborn Map Company.
11+
Documentation: https://data.labs.loc.gov/sanborn/
12+
Contact: For curatorial questions about the content of the collection and
13+
formats, contact the Library of Congress Geography and Map Division at
14+
https://ask.loc.gov/map-geography. For technical questions about access,
15+
16+
ManagedBy: "[Library of Congress](https://www.loc.gov/)"
17+
UpdateFrequency: As new and significant changes to the underlying digital collection occur
18+
Tags:
19+
- aws-pds
20+
- archives
21+
- cities
22+
- computer vision
23+
- conservation
24+
- culture
25+
- cultural preservation
26+
- demographics
27+
- digital assets
28+
- geospatial
29+
- history
30+
- housing
31+
- land use
32+
- mapping
33+
- urban
34+
License: The content of the Library of Congress online Sanborn Maps Collection
35+
is in the public domain and is free to use and reuse. For more information,
36+
see
37+
https://www.loc.gov/collections/sanborn-maps/about-this-collection/rights-and-access/.
38+
Resources:
39+
- Description: Sanborn Maps data
40+
ARN: arn:aws:s3:::loc-sanborn-maps
41+
Region: us-west-2
42+
Type: S3 Bucket
43+
Explore:
44+
- "[Browse Bucket by
45+
State](https://loc-sanborn-maps.s3.amazonaws.com/maps-by-state/index.html)"
46+
- "[README](https://loc-sanborn-maps.s3.amazonaws.com/README.html)"
47+
DataAtWork:
48+
Tutorials:
49+
- Title: README data cover sheet
50+
URL: https://loc-sanborn-maps.s3.amazonaws.com/README.html
51+
AuthorName: Library of Congress
52+
- Title: Sanborn Map Data Python Tutorial (Jupyter notebook)
53+
URL: https://libraryofcongress.github.io/data-exploration/Data%20Packages/sanborn.html
54+
AuthorName: Library of Congress
55+
AuthorURL: https://github.com/LibraryOfCongress
56+
- Title: "Fire Insurance Maps at the Library of Congress: A Resource Guide"
57+
URL: https://guides.loc.gov/fire-insurance-maps/introduction
58+
AuthorName: Julie Stoner, Reference Librarian, Geography and Map Division,
59+
Library of Congress
60+
Tools & Applications:
61+
- Title: Sanborn Atlas Volume Finder
62+
URL: https://loc.maps.arcgis.com/apps/instant/media/index.html?appid=0cb2c04324a0413081e1b793ea18f854
63+
AuthorName: Julie Stoner and Meagan Snow, Geography and Map Division, Library of
64+
Congress
65+
AuthorURL: https://github.com/aarande
66+
Publications:
67+
- Title: Introduction to the Collection
68+
URL: https://www.loc.gov/collections/sanborn-maps/articles-and-essays/introduction-to-the-collection/
69+
AuthorName: Walter W. Ristow

datasets/noaa-nbm-parallel.yaml

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
Name: NOAA National Blend of Models (NBM) Parallel
2+
Description: |
3+
The National Blend of Models (NBM) is a nationally consistent and skillful suite of calibrated forecast guidance based on a blend of both NWS and non-NWS numerical weather prediction model data and post-processed model guidance. The goal of the NBM is to create a highly accurate, skillful and consistent starting point for the gridded forecast. This dataset contains data from the current parallel version of the NBM which is a test version, featuring many changes, that is a candidate to be implemented into operations following a careful vetting process.
4+
Documentation: |
5+
https://vlab.noaa.gov/web/mdl/nbm
6+
Contact: |
7+
For any questions regarding data delivery not associated with this platform or any general questions regarding the NOAA Open Data Dissemination (NODD) Program, email the NODD Team at [email protected].
8+
We also seek to identify case studies on how NOAA data is being used and will be featuring those stories in joint publications and in upcoming events. If you are interested in seeing your story highlighted, please share it with the NODD team by emailing [email protected]
9+
ManagedBy: "[NOAA](http://www.noaa.gov/)"
10+
UpdateFrequency: |
11+
Once per hour.
12+
Collabs:
13+
ASDI:
14+
Tags:
15+
- weather
16+
Tags:
17+
- aws-pds
18+
- agriculture
19+
- climate
20+
- disaster response
21+
- environmental
22+
- meteorological
23+
- weather
24+
License: |
25+
NOAA data disseminated through NODD are open to the public and can be used as desired.<br/> <br/>NOAA makes data openly available to ensure maximum use of our data, and to spur and encourage exploration and innovation throughout the industry. NOAA requests attribution for the use or dissemination of unaltered NOAA data. However, it is not permissible to state or imply endorsement by or affiliation with NOAA. If you modify NOAA data, you may not state or imply that it is original, unaltered NOAA data.
26+
Resources:
27+
- Description: National Blend of Models (NBM) Parallel
28+
ARN: arn:aws:s3:::noaa-nbm-para-pds
29+
Region: us-east-1
30+
Type: S3 Bucket
31+
Explore:
32+
- '[Browse Bucket](https://noaa-nbm-para-pds.s3.amazonaws.com/index.html)'
33+
- Description: New data notifications for NBM Parallel, only Lambda and SQS protocols allowed
34+
ARN: arn:aws:sns:us-east-1:123901341784:NewNBMParaObject
35+
Region: us-east-1
36+
Type: SNS Topic
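
The two resources in this file are identified by ARNs, and the SNS topic description restricts subscriptions to Lambda and SQS. The sketch below, assuming the general `arn:partition:service:region:account-id:resource` layout, splits an ARN into named fields and checks a protocol against that restriction. `Arn`, `parse_arn`, and `can_subscribe` are hypothetical helper names, not part of any AWS SDK.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


# Taken from the resource description: "only Lambda and SQS protocols allowed".
ALLOWED_PROTOCOLS = {"lambda", "sqs"}


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields after the literal 'arn' prefix."""
    # maxsplit=5 keeps any colons inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def can_subscribe(protocol: str) -> bool:
    """Return True if the protocol is one the topic description allows."""
    return protocol.lower() in ALLOWED_PROTOCOLS


if __name__ == "__main__":
    print(parse_arn("arn:aws:sns:us-east-1:123901341784:NewNBMParaObject"))
    print(parse_arn("arn:aws:s3:::noaa-nbm-para-pds"))
    print(can_subscribe("email"))
```

S3 bucket ARNs leave the region and account fields empty, so those come back as empty strings rather than raising an error.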

datasets/noaa-nws-naqfc-pds.yaml

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,23 +1,23 @@
11
Name: NOAA National Air Quality Forecast Capability (NAQFC) Regional Model Guidance
22
Description: |
3-
The National Air Quality Forecasting Capability (NAQFC) dataset contains model-generated Air-Quality (AQ) forecast guidance from three different prediction systems. The first system is a coupled weather and atmospheric chemistry numerical forecast model, known as the Air Quality Model (AQM). It is used to produce forecast guidance for ozone (O3) and particulate matter with diameter equal to or less than 2.5 micrometers (PM2.5) using meteorological forecasts based on NCEP’s operational weather forecast models such as North American Mesoscale Models (NAM) and Global Forecast System (GFS), and atmospheric chemistry based on the EPA’s Community Multiscale Air Quality (CMAQ) model. In addition, the modeling system incorporates information related to chemical emissions, including anthropogenic emissions provided by the EPA and fire emissions from NOAA/NESDIS. The NCEP NAQFC AQM output fields in this archive include 72-hr forecast products of model raw and bias-correction predictions, extending back to 1 January 2020. All of the output was generated by the contemporaneous operational AQM, beginning with AQMv5 in 2020, with upgrades to AQMv6 on 20 July 2021, and AQMv7 on 14 May 2024. The history of AQM upgrades is documented [here](https://www.emc.ncep.noaa.gov/mmb/aq/AQChangelog.html)
3+
The National Air Quality Forecasting Capability (NAQFC) dataset contains model-generated air quality (AQ) forecast guidance from three different prediction systems. The first system is a coupled weather and atmospheric chemistry numerical forecast model, known as the Air Quality Model (AQM). It is used to produce forecast guidance for ozone (O3) and particulate matter that is less than or equal to 2.5 micrometers in diameter (PM2.5). Prior to May 14, 2024, AQM predictions were derived using the EPA’s Community Multiscale Air Quality (CMAQ) model, driven by meteorological fields from NCEP’s operational weather forecast models, specifically the North American Mesoscale Model (NAM; prior to 20 July 2021) and the Global Forecast System (GFS; beginning 20 July 2021). Since May 14, 2024, AQM guidance has been produced by a unique application within the community-based Unified Forecast System (UFS). The core model components in this application are derived directly from the fully online-coupled UFS-based weather and CMAQ-based chemistry models. In addition, it incorporates information related to chemical and particle source emissions as it integrates forward in time, including anthropogenic chemical emissions provided by the EPA, fire emissions from NOAA/NESDIS, and airborne particles generated by human activities and those predicted to be generated by wind-driven erosion and biosphere at ground level. The NCEP NAQFC AQM output fields in this archive include model raw and bias-corrected predictions dating back to 1 January 2020, all generated by the contemporaneous operational AQM, beginning with AQMv5 in 2020, transitioning to AQMv6 on 20 July 2021, and to AQMv7 on 14 May 2024. The length of each forecast was 48 hours prior to the implementation of AQMv6, and has been 72 hours ever since. The history of AQM upgrades is documented [here](https://www.emc.ncep.noaa.gov/mmb/aq/AQChangelog.html)
44
<br/>
55
<br/>
6-
The second prediction is known as the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT). It is a widely used atmospheric transport and dispersion model containing an internal dust-generation module. It provides forecast guidance for atmospheric dust concentration and, prior to 28 June 2022, it also provided the NAQFC forecast guidance for smoke. Since that date, the third prediction system, a regional numerical weather prediction (NWP) model known as the Rapid Refresh (RAP) model, has subsumed HYSPLIT for operational smoke guidance, simulating the emission, transport, and deposition of smoke particles that originate from biomass burning (fires) and anthropogenic sources.
6+
The second prediction is known as the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT). It is a widely used atmospheric transport and dispersion model containing an internal dust-generation module. It provides forecast guidance for atmospheric dust concentration and, prior to 28 June 2022, it also provided the NAQFC forecast guidance for smoke. Starting on that date, the third prediction system, a regional numerical weather prediction (NWP) model known as the Rapid Refresh (RAP) model, subsumed HYSPLIT for operational smoke guidance, simulating the emission, transport, and deposition of smoke particles that originate from biomass burning (fires) and anthropogenic sources.
77
<br/>
88
<br/>
9-
The output from each of these modeling systems is generated over three separate domains, one covering CONUS, one Alaska, and the other Hawaii. Currently, for this archive, the ozone, (PM2.5), and smoke output is available over all three domains, while dust products are available only over the CONUS domain. The predicted concentrations of all species in the lowest model layer (i.e., the layer in contact with the surface) are available, as are vertically integrated values of smoke and dust. The data is gridded horizontally within each domain, with a grid spacing of approximately 5 km over CONUS, 6 km over Alaska, and 2.5 km over Hawaii. Ozone concentrations are provided in parts per billion (PPB), while the concentrations of all other species are quantified in units of micrograms per cubic meter (ug/m3), except for the column-integrated smoke values which are expressed in units of mg/m2.
9+
The output from each of these modeling systems is generated over three separate domains, one covering CONUS, another over Alaska, and the other over Hawaii. Currently, for this archive, the O3, PM2.5, and smoke output is available over all three domains, while dust products are available only over the CONUS domain. The predicted concentrations of all species in the lowest model layer (i.e., the layer in contact with the surface) are available, as are vertically integrated values of smoke and dust. The data is gridded horizontally within each domain, with a grid spacing of approximately 5 km over CONUS, 6 km over Alaska, and 2.5 km over Hawaii. O3 concentrations are provided in parts per billion (PPB), while the concentrations of all other species are quantified in units of micrograms per cubic meter (ug/m3), except for the column-integrated smoke values which are expressed in units of milligrams per square meter (mg/m2).
1010
<br/>
1111
<br/>
12-
Temporally, O3 and PM2.5 are available as maximum and/or averaged values over various time periods. Specifically, O3 is available in both 1-hour and 8-hour (backward calculated) averages, as well as preceding 1-hour and 8-hour maximum values. Similarly, PM2.5 is available in 1-hour and 24-hour average values and 24-hour maximum values. In addition, all O3 and PM2.5 fields are available with bias-corrected magnitudes, based on derived model biases relative to observations.
12+
Temporally, O3 and PM2.5 are available as maximum and/or averaged values over various time periods, selected in part for consistency with the EPA’s National Ambient Air Quality Standards. Specifically, O3 is available in both 1-hour and 8-hour (backward calculated) averages, as well as preceding 1-hour and 8-hour maximum values. Similarly, PM2.5 is available in 1-hour and 24-hour average values and 24-hour maximum values. In addition, all O3 and PM2.5 fields are available with bias-corrected magnitudes, based on derived historical model biases relative to observations.
1313
<br/>
1414
<br/>
15-
The AQM produces hourly forecast guidance for O3 and PM2.5 out to 72 hours twice per day, starting at 0600 and 1200 UTC. Smoke guidance is available out to 51 hours from once-per-day RAP forecasts initialized at 0300 UTC, while dust guidance from HYSPLIT is available out to 48 hours from initialization times of 0600 and 1200 UTC.
15+
The AQM produces hourly forecast guidance for O3 and PM2.5 up to 72 hours twice per day. Smoke guidance is available up to 51 hours from once-per-day RAP forecasts, while dust guidance from HYSPLIT is available up to 48 hours.
1616
Documentation: https://vlab.noaa.gov/web/osti-modeling/air-quality
1717
Contact: For questions regarding data content or quality, visit the NCEP AQM Products website. For any questions regarding data delivery or any general questions regarding the NOAA Open Data Dissemination (NODD) Program, email the NODD Team at [email protected].
1818
<br /> We also seek to identify case studies on how NOAA data is being used and will be featuring those stories in joint publications and in upcoming events. If you are interested in seeing your story highlighted, please share it with the NODD team by emailing [email protected]
1919
ManagedBy: "[NOAA](http://www.noaa.gov/)"
20-
UpdateFrequency: 2 times per day, 0600 and 1200 UTC for O3, PM2.5, and dust; once per day, 0300 UTC for smoke
20+
UpdateFrequency: Two times per day, 0600 and 1200 UTC for O3, PM2.5, and dust; once per day, 0300 UTC for smoke
2121
Collabs:
2222
ASDI:
2323
Tags:
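
The revised description gives the AQM version history and forecast lengths by date. A small lookup can make that timeline concrete. The sketch below encodes only the dates and lengths stated in the description; the function name `aqm_for` is hypothetical, and it assumes each transition date already belongs to the newer version.

```python
from datetime import date

# (first operational date, version, forecast length in hours), newest first.
# Values come from the dataset description; 48 h applies before AQMv6.
AQM_VERSIONS = [
    (date(2024, 5, 14), "AQMv7", 72),
    (date(2021, 7, 20), "AQMv6", 72),
    (date(2020, 1, 1), "AQMv5", 48),
]


def aqm_for(day: date) -> tuple:
    """Return (version, forecast_hours) for a forecast date in the archive."""
    for start, version, hours in AQM_VERSIONS:
        # Assumption: the transition day itself counts as the new version.
        if day >= start:
            return version, hours
    raise ValueError("the archive starts on 1 January 2020")


if __name__ == "__main__":
    print(aqm_for(date(2021, 7, 19)))  # ('AQMv5', 48)
    print(aqm_for(date(2021, 7, 20)))  # ('AQMv6', 72)
    print(aqm_for(date(2024, 5, 14)))  # ('AQMv7', 72)
```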

datasets/nrel-pds-dsgrid.yaml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,12 @@ Resources:
4242
Type: S3 Bucket
4343
Explore:
4444
- '[Browse Dataset](https://data.openei.org/s3_viewer?bucket=oedi-data-lake&prefix=dsgrid-2018-efs%2F)'
45+
- Description: '[Demand-Side Grid Model (dsgrid) Building Load Profiles](https://data.openei.org/submissions/8446)'
46+
ARN: arn:aws:s3:::nrel-pds-dsgrid/building/
47+
Region: us-west-2
48+
Type: S3 Bucket
49+
Explore:
50+
- '[Browse Dataset](https://data.openei.org/s3_viewer?bucket=nrel-pds-dsgrid&prefix=building%2F)'
4551
DataAtWork:
4652
Tutorials:
4753
- Title: dsgrid Documentation
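
The added resource pairs an S3 ARN that carries a prefix with a matching OpenEI viewer link. As a rough sketch, the Python below derives the link from the ARN. The URL pattern is inferred only from the two `Explore` links visible in this diff, and `viewer_url` is a hypothetical helper rather than an OpenEI API.

```python
from urllib.parse import urlencode

S3_ARN_PREFIX = "arn:aws:s3:::"
# Assumption: base URL and query parameters copied from the Explore links.
VIEWER_BASE = "https://data.openei.org/s3_viewer"


def viewer_url(arn: str) -> str:
    """Build a viewer link from an S3 ARN of the form bucket/prefix/."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Everything before the first slash is the bucket; the rest is the key prefix.
    bucket, _, prefix = arn[len(S3_ARN_PREFIX):].partition("/")
    # urlencode percent-encodes the slash in the prefix as %2F.
    return VIEWER_BASE + "?" + urlencode({"bucket": bucket, "prefix": prefix})


if __name__ == "__main__":
    print(viewer_url("arn:aws:s3:::nrel-pds-dsgrid/building/"))
```

For the ARN added in this hunk, the result matches the `Browse Dataset` link in the same resource entry.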
