-
Notifications
You must be signed in to change notification settings - Fork 8
fix: missing production and consumption column #177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.x.x
Are you sure you want to change the base?
fix: missing production and consumption column #177
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the add_energy_flows() function to handle more flexible energy flow calculations by:
- Adding support for grid column input to enable consumption inference
- Transforming production columns using the
asset_production()function before aggregation - Introducing an optional
clip_non_negativeparameter to helper functions for controlled clipping behavior
Comments suppressed due to low confidence (1)
src/frequenz/lib/notebooks/reporting/utils/helpers.py:1
- The new consumption inference logic with three branches (existing column, inferred from grid, fallback to zeros) lacks test coverage. Tests should verify each branch executes correctly and produces expected results.
# License: MIT
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a1fbe82 to
26afd0b
Compare
|
This needs a rebase |
4d4bdb9 to
e559cb3
Compare
|
|
||
| def _sum_cols(df: pd.DataFrame, cols: list[str] | None) -> pd.Series: | ||
| def _sum_cols( | ||
| df: pd.DataFrame, cols: list[str] | None, *, clip_non_negative: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new argument is not used anywhere, so why introducing it?
| series = df.reindex(columns=[col], fill_value=0)[col].astype("float64") | ||
|
|
||
| if clip_non_negative: | ||
| series = series.clip(lower=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC you only use it in one place, and I also don't see the point of having this flag (which is just a boolean for lower=0) when you could simply do _get_numeric_series(...).clip(lower=0), which is more flexible and readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense. I will do that
| # Surplus vs. consumption | ||
| # Normalize production, grid and consumption columns by removing None entries | ||
| resolved_production_cols = [ | ||
| col for col in (production_cols or []) if col is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there Nones in the columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for handling the cases where users do not give any production/consumption/grid cols.
e559cb3 to
908a341
Compare
…eation Signed-off-by: Mohammad Tayyab <[email protected]>
908a341 to
0ea5ed5
Compare
| .reindex(production_excess_series.index, fill_value=0.0) | ||
| .clip(lower=0) | ||
| ) | ||
| return pd.concat([production_excess_series, battery], axis=1).min(axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to ensure that battery is properly aligned with production_excess_series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can they be not aligned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wasn’t sure if reindexing would be the best approach moving forward or not.
Consider cases where users provide input data in which production and consumption have matching indices and are aligned correctly, but the battery data contains extra rows (for any unknown reason). In such cases, the downstream functions would end up with more rows than the original production and consumption data.
Should we handle such scenarios within this Python codebase, or can we assume that the input data is already clean and pre-processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which values would the rows have? If it's NaNs I wouldn't be concerned.
| return share | ||
|
|
||
|
|
||
| def consumption( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added battery series separately and changed the function to work on pd.series rather than pd.dataframes.
| df_flows["production_total"], | ||
| df_flows["consumption_total"], | ||
| production_is_positive=production_is_positive, | ||
| production_is_positive=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial input production is negative but is handled by asset_production to invert the sign. So, we have to change the arguments to positive.
| ) | ||
|
|
||
| # Battery charging power (optional) | ||
| bat_in = _get_numeric_series(df_flows, battery_charge_col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a separate battery series containing only positive values to represent the battery charge.
| .reindex(production_excess_series.index, fill_value=0.0) | ||
| .clip(lower=0) | ||
| ) | ||
| return pd.concat([production_excess_series, battery], axis=1).min(axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can they be not aligned?
| production_cols: List of production column names (e.g., "pv", "chp", "battery" or "ev"). | ||
| Can be None or empty if no on-site generation is present. | ||
| grid_cols: List of one or more grid column names. | ||
| grid: Series of grid import values (e.g., kW or MW). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only grid import and not grid power?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should be grid power. I will update the docstring.
|
|
||
| if not grid_cols: | ||
| raise ValueError("At least one grid column must be specified in grid_cols.") | ||
| grid_s = grid.astype("float64").fillna(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not fill with zeroes. There can be various reasons that the grid measurements are missing and we cannot safely know if they were zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to fill it with 0 so as to avoid the cases of the consumption also being NaN in those cases.
For eg.
Production - 15, battery - 5, grid - NaN
Then consumption will also be NaN.
So, in this case should we have the consumption as NaN or 20?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consumption should be NaN, because you don't know what the grid power is.
| return consumption | ||
| if (result < 0).any(): | ||
| warnings.warn( | ||
| "Negative inferred consumption detected. This can occur during net export " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by net export and why can this cause negatives here? I think the most likely case to get negative values here is unmeasured energy generation sources or short-term spikes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the most likely case of negative consumption is only going to be when the production sign was positive and after subtracting it from grid we get a negative value.
Net export are the cases where there was less consumption than production on-site and hence, the consumption had a negative sign. I just added this as a warning to make sure that users know that they have such periods and to check if there was no calculation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and hence, the consumption had a negative sign.
Do you mean "grid power had a negative sign"?
And when you say net export, does it refer to the negative part of grid power?
| series = pd.Series(0.0, index=df.index, dtype="float64") | ||
| else: | ||
| raw = df.reindex(columns=[col], fill_value=0)[col] | ||
| series = pd.to_numeric(raw, errors="coerce").fillna(0.0).astype("float64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced in this PR, but also here not sure if we should by default fill with 0s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also to make sure we don't get NaNs during downstream calculations.
|
|
||
|
|
||
| # pylint: disable=too-many-arguments, too-many-locals | ||
| def _column_has_data(df: pd.DataFrame, col: str | None) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe _has_nonzero_values(df, *, column: str | None)? And why would you call it with None at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could occur in cases where the any column input (production, battery, consumption) from the user was not provided and by default it is assumed as None.
So, we need to safely pass those cases from our function
No description provided.