Conversation
jdangerx
left a comment
Looking good, just some comments from poking around in wasm mode mostly!
d1-dossier-plant.py
Outdated
from datetime import date

import pandas as pd
import marimo as mo
You'll need to import pyarrow here too, for the wasm notebook to work.
Separately, fastparquet is much faster than pyarrow at least in the pyodide environment. With pyarrow the two parquet loads take 33s on my computer, and fastparquet loads in 10s - consider using fastparquet instead?
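A minimal sketch of the engine swap suggested above, assuming fastparquet (plus whatever it needs to read over HTTPS) is installed in the environment and that the pandas version in use still accepts `engine="fastparquet"`. `pick_parquet_engine` and `read_table` are hypothetical helper names, not part of the notebook:

```python
from importlib.util import find_spec


def pick_parquet_engine(preferred=("fastparquet", "pyarrow")):
    """Return the first importable engine name, or None if none is installed."""
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return None


def read_table(url):
    """Read one parquet file with the preferred engine that is available."""
    import pandas as pd  # imported lazily so the helper above has no dependencies

    engine = pick_parquet_engine()
    if engine is None:
        raise ImportError("Neither fastparquet nor pyarrow is installed")
    return pd.read_parquet(url, engine=engine)
```

Falling back to pyarrow keeps the notebook working locally even if fastparquet is only added for the wasm build.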
d1-dossier-plant.py
Outdated
@app.cell
def _(pd):
    out_eia__yearly_generators = pd.read_parquet("https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_eia__yearly_generators.parquet")
Using progress bars could be a nice UI touch, at the expense of gunking up the code a bit:
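One way this could look, as a sketch rather than the exact suggestion: wrap the loads in a small helper that takes an optional progress wrapper, so the "gunk" stays in one place. `load_tables` is a hypothetical helper, and the commented usage assumes `mo.status.progress_bar` can wrap a list and takes a `title` keyword in the marimo version in use:

```python
def load_tables(names, read, progress=None):
    """Read each named table, optionally reporting progress as it goes.

    `read` is any callable that maps a table name to loaded data;
    `progress` is any callable that wraps an iterable and yields its items.
    """
    names = list(names)
    iterator = progress(names) if progress is not None else names
    return {name: read(name) for name in iterator}


# In the notebook (assumes `mo` and a `read_table` helper are in scope):
# tables = load_tables(
#     ["out_eia__yearly_generators", "out_eia__yearly_plants"],
#     read_table,
#     progress=lambda xs: mo.status.progress_bar(xs, title="Loading PUDL tables"),
# )
```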
d1-dossier-plant.py
Outdated
@app.cell
def _(mo, out_eia__yearly_plants):
    selected_state = mo.ui.dropdown.from_series(out_eia__yearly_plants.state.drop_duplicates().sort_values(), label="Select a state:", value="CO")
Having preselected values is really helpful!
d1-dossier-plant.py
Outdated
@app.cell
def _(this_plant__generators):
    this_plant__generators.set_index("generator_id").T.dropna(thresh=1)
Couldn't figure out a way to make the default page size bigger here... we can use mo.ui.dataframe(..., page_size=100) but that has an unhideable transformation UI attached. And we can use Table but that requires munging and isn't really designed for two labeled axes.
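For reference, a rough sketch of the "munging" route, assuming `mo.ui.table` accepts a list of dicts along with `page_size` and `selection=None` in the marimo version in use. `transpose_records` and `generator_table` are hypothetical names, and missing values are assumed to be `None` (NaN from pandas would need converting first):

```python
def transpose_records(records, index_key="generator_id", label="attribute"):
    """Turn one-row-per-generator records into one-row-per-attribute rows.

    Attributes whose value is None for every generator are dropped, roughly
    mirroring `.set_index("generator_id").T.dropna(thresh=1)`.
    Assumes every record has the same keys as the first one.
    """
    attributes = [k for k in records[0] if k != index_key] if records else []
    rows = []
    for attr in attributes:
        values = {str(r[index_key]): r.get(attr) for r in records}
        if any(v is not None for v in values.values()):
            rows.append({label: attr, **values})
    return rows


def generator_table(records, page_size=50):
    """Render the transposed rows without row selectors (needs marimo)."""
    import marimo as mo

    return mo.ui.table(transpose_records(records), page_size=page_size, selection=None)
```

The row labels end up as an ordinary `attribute` column, which is the part that isn't really designed for two labeled axes.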
pyproject.toml
Outdated
"marimo>=0.20.2",
"matplotlib>=3.10.8",
"pandas>=3.0.1",
"pyarrow>=23.0.1",
I think marimo is stuck on a pretty old version of pyodide: marimo-team/marimo#5995
So we should probably pin these libs to what is available in that version: https://pyodide.org/en/0.27.7/usage/packages-in-pyodide.html
mmmmmnyhhh I can't get matplotlib 3.8.* to build. I can poke up the pyproject.toml and have someone else try, see if it's me or not?
terrible horrible no good very bad failed build output
d1-dossier-plant.py
Outdated
available = available.loc[available>0].index
if available.shape[0]==1:
    only_option.add(k)
filters[k] = mo.ui.multiselect(options={str(x): x for x in available}, value=[str(available[0])] if k in only_option else None)
playing around: we could add some sort of explanation of each column with a tooltip like so:
- filters[k] = mo.ui.multiselect(options={str(x): x for x in available}, value=[str(available[0])] if k in only_option else None)
+ filters[k] = mo.ui.multiselect(label=f"<div data-tooltip='details for {k}'>?</div>", options={str(x): x for x in available}, value=[str(available[0])] if k in only_option else None)
obviously with real content instead of fake content. and we could also maybe make it look nicer:
🤷 - not urgent, just a thing to think about for the future.
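If this gets picked up later, a small helper could build the label from real column descriptions. This is only a sketch: `tooltip_label` and the `descriptions` mapping are hypothetical, and the `data-tooltip` markup is copied from the suggestion above rather than checked against marimo's docs:

```python
from html import escape


def tooltip_label(column, descriptions):
    """Build a '?' label whose tooltip holds the description for `column`.

    The description is HTML-escaped (including quotes) so that apostrophes
    or angle brackets in the text cannot break the attribute.
    """
    text = descriptions.get(column, f"No description available for {column}")
    return f"<div data-tooltip='{escape(text, quote=True)}'>?</div>"


# Hypothetical usage inside the existing loop:
# filters[k] = mo.ui.multiselect(label=tooltip_label(k, descriptions), ...)
```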
A few actual requests for clarity with labels and lots of lil non-blocking nice-to-haves. I can think of so many things to add onto this! Like adding some generator-based charts (similar to what you have here for the whole plant), fuel cost by fuel type charts over time, and capacity factor per generator over time. Or getting extra fancy and (after first making a monthly CEMS emissions table) adding emissions by unit. But all of those are nice-to-haves and would be easy to add incrementally.
This is such a fun place to start!
also just fyi i barely reviewed the code itself - i was mostly reviewing thinking about a potential user wanting to explore a given plant.
e-belfer
left a comment
Woo it's happening! I think to start I'd focus on doing fewer things better, and there's something about the time-based generator categorical/attribute data that is feeling a bit tricky to visualize right now. Otherwise it's an exciting start! Below are some notes, none of which are blocking.
- Took me approximately 30 seconds to load.
- I find filtering by report_date with a full timescale below (up until the date of that data) to be kind of confusing. I wonder if we want to restrict what is getting filtered by year and move the report year filter much closer to that point in the dashboard, since this mostly seems useful for seeing what generators were active in a particular year and more about them. For plants that have stopped generating, they'll report 0 generators in the most recent year but you'll see data in the timeseries below, which is slightly confusing.
- Tiny request to drop the selector buttons in the marimo UI table.
- I think we can probably start with a smaller number of generator selectors (e.g., operational status, prime mover, energy source code, some boolean cols), rather than all of them. Having generator retirement date as anything other than "after X date" isn't that useful since I don't really expect to be able to select >1
- "No generation data available at the generation level" is slightly confusing (I definitely went, there's data literally below!) but I don't have a great suggestion about rewording.
- Once we finalize, I presume we'll want to think about styling, any additional text/links needed, etc.
Co-authored-by: Christina Gosnell <cgosnell@catalyst.coop>
for more information, see https://pre-commit.ci
do you know how to do this? I couldn't figure out how to configure display of 50 rows per page without it forcing selector buttons on me
You can add
cmgosnell
left a comment
This all looks real good! I appreciate the updates and the next phase issue to compile next steps and wishes.
It's not blocking for me, but I agree with Ella's suggestion to limit the columns that you can select generators by. I like the concept of what you did (only non-float columns) but it is a bit overwhelming. I'd just go with these but keep displaying everything in the table like you are doing now:
filter_columns = [
    "generator_id",
    "unit_id_pudl",
    "technology_description",
    "energy_source_code_1",
    "prime_mover_code",
    "operational_status",
    "fuel_type_code_pudl",
    "associated_combined_heat_power",
    "operational_status_code",
]
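A sketch of how the shorter list could drive the filter widgets while the table keeps every column. `filter_options` and `build_filters` are hypothetical names; the multiselect call mirrors the one already in the notebook, with a lone option preselected:

```python
def filter_options(records, filter_columns):
    """Collect the distinct non-missing values for each filterable column.

    Columns with no values at all are skipped, so no empty widget is built.
    """
    options = {}
    for column in filter_columns:
        values = {r[column] for r in records if r.get(column) is not None}
        if values:
            options[column] = sorted(values, key=str)
    return options


def build_filters(options):
    """Create one multiselect per column (needs marimo)."""
    import marimo as mo

    return {
        column: mo.ui.multiselect(
            options={str(v): v for v in values},
            # Preselect the value when it is the only choice available.
            value=[str(values[0])] if len(values) == 1 else None,
        )
        for column, values in options.items()
    }
```

Here `records` would be something like the selected plant's generator rows as a list of dicts, with missing values as `None`.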
We're suddenly unable to read parquet files via URL. If you use if you use Things I've tried:
Overview
Closes #20.
What did you change in this PR?
plant-explorer.py containing marimo notebook for exploring plants
Plant summary:
Generator summary:
Generator select-a-whirl of tabular data:
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list