Add some basic release notes.

zaneselvans · zaneselvans · commit 9876fbd3b693 · 2022-04-21T17:28:30.000-05:00
diff --git a/docs/conf.py b/docs/conf.py
@@ -73,6 +73,7 @@
     "intake_parquet": ("https://intake-parquet.readthedocs.io/en/latest/", None),
     "intake_sql": ("https://intake-sql.readthedocs.io/en/latest/", None),
     "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+    "pyarrow": ("https://pyarrow.readthedocs.io/en/latest/", None),
     "pytest": ("https://docs.pytest.org/en/latest/", None),
     "python": ("https://docs.python.org/3", None),
     "setuptools": ("https://setuptools.pypa.io/en/latest/", None),
diff --git a/docs/release_notes.rst b/docs/release_notes.rst
@@ -1,26 +1,37 @@
 =======================================================================================
-PACKAGE_NAME Release Notes
+PUDL Data Catalog Release Notes
 =======================================================================================
 
 .. _release-v0-1-0:
 
 ---------------------------------------------------------------------------------------
-0.1.0 (2022-XX-XX)
+0.1.0 (2022-04-21)
 ---------------------------------------------------------------------------------------
 
-This is a Header
-^^^^^^^^^^^^^^^^
-* Briefly describe the substantial changes to the code in here when you make a PR.
-* That way and users we have documentation as to what's going on.
-* You can refer to the relevant pull request using the ``pr`` role: :pr:`1`
-* Don't hesitate to give shoutouts to folks who contributed like :user:`cmgosnell`
-* You can link to issues that were closed like this: :issue:`2,3,4`
-
-Bug Fixes
-^^^^^^^^^
-* It's good to make a note of any known bugs that are fixed by the release, and refer
-  to the relevant issues.
+First Release
+^^^^^^^^^^^^^
+* We're excited to start providing bulk, versioned, programmatic access to the PUDL
+  data, starting with the EPA CEMS hourly emissions data. This is still experimental.
+* The data is available in a Google Cloud object store, via an Intake data catalog, and
+  is stored in Apache Parquet files.
+* We're still working out some performance and metadata issues, but it's at least
+  nominally functional, and we wanted to get it out early and see if we could get some
+  feedback.
+* Currently there's a single-file and a partitioned version of the same data. We
+  recommend using the single-file version (the source named ``hourly_emissions_epacems``
+  in the catalog) since performance is generally better and we need to work on making
+  per-file local caching more efficient before it's worth using the partitioned data.
+* Thanks to :user:`martindurant` for helping us get things set up and helping us debug
+  some issues.
 
 Known Issues
 ^^^^^^^^^^^^
-* It's also good to list any remaining known problems, and link to their issues too.
+* Local caching of the Parquet files works, but for both the monolithic and partitioned
+  versions of the data, the entire dataset is typically cached immediately upon first
+  access. This is because the metadata describing what data is in which file is only
+  available within the Parquet files themselves, so every file has to be accessed in
+  order to filter the entire dataset. Since the data is several GB, it can take a while
+  to cache initially. Subsequent access is fast. See :issue:`4`
+* Accessing the year-state partitioned version of the data is much slower than the
+  monolithic single-file version, and we don't yet understand why. For now we
+  recommend using the monolithic EPA CEMS data. See :issue:`8`