This repository was archived by the owner on Jan 12, 2024. It is now read-only.

Commit 9876fbd

Add some basic release notes.
1 parent 21b0e0a commit 9876fbd


2 files changed (+27, -15 lines)


docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@
     "intake_parquet": ("https://intake-parquet.readthedocs.io/en/latest/", None),
     "intake_sql": ("https://intake-sql.readthedocs.io/en/latest/", None),
     "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+    "pyarrow": ("https://pyarrow.readthedocs.io/en/latest/", None),
     "pytest": ("https://docs.pytest.org/en/latest/", None),
     "python": ("https://docs.python.org/3", None),
     "setuptools": ("https://setuptools.pypa.io/en/latest/", None),

docs/release_notes.rst

Lines changed: 26 additions & 15 deletions
@@ -1,26 +1,37 @@
 =======================================================================================
-PACKAGE_NAME Release Notes
+PUDL Data Catalog Release Notes
 =======================================================================================
 
 .. _release-v0-1-0:
 
 ---------------------------------------------------------------------------------------
-0.1.0 (2022-XX-XX)
+0.1.0 (2022-04-21)
 ---------------------------------------------------------------------------------------
 
-This is a Header
-^^^^^^^^^^^^^^^^
-* Briefly describe the substantial changes to the code in here when you make a PR.
-* That way and users we have documentation as to what's going on.
-* You can refer to the relevant pull request using the ``pr`` role: :pr:`1`
-* Don't hesitate to give shoutouts to folks who contributed like :user:`cmgosnell`
-* You can link to issues that were closed like this: :issue:`2,3,4`
-
-Bug Fixes
-^^^^^^^^^
-* It's good to make a note of any known bugs that are fixed by the release, and refer
-  to the relevant issues.
+First Release
+^^^^^^^^^^^^^
+* We're excited to start providing bulk, versioned, programmatic access to the PUDL
+  data, starting with the EPA CEMS hourly emissions data. This is still experimental.
+* The data is stored as Apache Parquet files in a Google Cloud object store and is
+  accessible via an Intake data catalog.
+* We're still working out some performance and metadata issues, but the catalog is at
+  least nominally functional, and we wanted to release it early so we could get some
+  feedback.
+* Currently there are single-file and partitioned versions of the same data. We
+  recommend using the single-file version (the source named ``hourly_emissions_epacems``
+  in the catalog) since its performance is generally better, and we need to make
+  per-file local caching more efficient before it's worth using the partitioned data.
+* Thanks to :user:`martindurant` for helping us get things set up and debug some
+  issues.
 
 Known Issues
 ^^^^^^^^^^^^
-* It's also good to list any remaining known problems, and link to their issues too.
+* Local caching of the Parquet files works, but with both the monolithic and partitioned
+  versions of the data, the entire dataset will typically be cached immediately upon
+  first access. This is because the metadata describing which data is in which file is
+  only available within the Parquet files themselves, so every file has to be accessed
+  in order to filter the dataset. Since the data is several GB, the initial caching can
+  take a while. Subsequent access is fast. See :issue:`4`.
+* Accessing the year-state partitioned version of the data is much slower than
+  accessing the monolithic single-file version, and we don't yet understand why. For
+  now we recommend using the monolithic EPA CEMS data. See :issue:`8`.
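
The first Known Issue can be illustrated with a toy model. This is a hypothetical, stdlib-only sketch, not the catalog's actual code: it makes no Intake, fsspec, or pyarrow calls, and the names RemoteParquetFile, WholeFileCache, and filter_by_year, the file names, and the 100 MB sizes are all invented. It assumes, as the note describes, that reading any part of a remote file, even just its footer metadata, copies the whole file into the local cache.

```python
"""Toy model: filtering a Parquet dataset by footer metadata caches every file.

Hypothetical sketch only. No Intake, fsspec, or pyarrow calls are made; each
"remote file" holds its rows plus footer-style min/max statistics, and we
ASSUME that reading any part of a file (even its footer) caches the whole file.
"""
from dataclasses import dataclass, field
from itertools import product


@dataclass
class RemoteParquetFile:
    """Stand-in for one remote Parquet file (hypothetical)."""

    name: str
    rows: list  # e.g. [{"year": 2020, "state": "CO"}, ...]
    size_mb: int

    @property
    def footer_stats(self):
        """Per-column (min, max), like the statistics in a Parquet footer."""
        years = [row["year"] for row in self.rows]
        return {"year": (min(years), max(years))}


@dataclass
class WholeFileCache:
    """Copies a file into the local cache the first time any part is read."""

    cached: dict = field(default_factory=dict)
    downloaded_mb: int = 0

    def open(self, remote):
        if remote.name not in self.cached:
            # First touch: the entire file is downloaded, not just its footer.
            self.cached[remote.name] = remote
            self.downloaded_mb += remote.size_mb
        return self.cached[remote.name]


def filter_by_year(files, cache, year):
    """Return rows for one year. The file names say nothing about their
    contents, so every file must be opened to read its footer statistics."""
    matches = []
    for remote in files:
        local = cache.open(remote)
        lo, hi = local.footer_stats["year"]
        if lo <= year <= hi:  # only scan files whose stats could match
            matches.extend(row for row in local.rows if row["year"] == year)
    return matches


# Four hypothetical year-state files of 100 MB each, with opaque names.
files = [
    RemoteParquetFile(f"part-{i}.parquet", [{"year": y, "state": s}], 100)
    for i, (y, s) in enumerate(product((2019, 2020), ("CO", "TX")))
]
cache = WholeFileCache()
rows = filter_by_year(files, cache, 2020)
print(len(rows), len(cache.cached), cache.downloaded_mb)  # 2 4 400
filter_by_year(files, cache, 2019)
print(cache.downloaded_mb)  # 400: the second query is served from the cache
```

The point of the toy is the access pattern: the first query touches all four files even though only two of them contain matching rows, while later queries cost no further downloads.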
