Skip to content

Conversation

@Mohammad-Tayyab-Frequenz
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings October 29, 2025 14:21
@Mohammad-Tayyab-Frequenz Mohammad-Tayyab-Frequenz requested a review from a team as a code owner October 29, 2025 14:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the add_energy_flows() function to handle more flexible energy flow calculations by:

  • Adding support for grid column input to enable consumption inference
  • Transforming production columns using the asset_production() function before aggregation
  • Introducing an optional clip_non_negative parameter to helper functions for controlled clipping behavior
Comments suppressed due to low confidence (1)

src/frequenz/lib/notebooks/reporting/utils/helpers.py:1

  • The new consumption inference logic with three branches (existing column, inferred from grid, fallback to zeros) lacks test coverage. Tests should verify each branch executes correctly and produces expected results.
# License: MIT

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot added part:docs Affects the documentation part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) labels Oct 29, 2025
@cwasicki
Copy link
Contributor

This needs a rebase

@Mohammad-Tayyab-Frequenz Mohammad-Tayyab-Frequenz force-pushed the fix-energy-flows-df branch 3 times, most recently from 4d4bdb9 to e559cb3 Compare October 29, 2025 14:57

def _sum_cols(df: pd.DataFrame, cols: list[str] | None) -> pd.Series:
def _sum_cols(
df: pd.DataFrame, cols: list[str] | None, *, clip_non_negative: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new argument is not used anywhere, so why introducing it?

series = df.reindex(columns=[col], fill_value=0)[col].astype("float64")

if clip_non_negative:
series = series.clip(lower=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC you only use it in one place, and I also don't see the point of having this flag (which is just a boolean for lower=0) when you could simply do _get_numeric_series(...).clip(lower=0), which is more flexible and readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense. I will do that

# Surplus vs. consumption
# Normalize production, grid and consumption columns by removing None entries
resolved_production_cols = [
col for col in (production_cols or []) if col is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there Nones in the columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for handling the cases where users do not give any production/consumption/grid cols.

.reindex(production_excess_series.index, fill_value=0.0)
.clip(lower=0)
)
return pd.concat([production_excess_series, battery], axis=1).min(axis=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to ensure that battery is properly aligned with production_excess_series.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can they be not aligned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn’t sure if reindexing would be the best approach moving forward or not.
Consider cases where users provide input data in which production and consumption have matching indices and are aligned correctly, but the battery data contains extra rows (for any unknown reason). In such cases, the downstream functions would end up with more rows than the original production and consumption data.

Should we handle such scenarios within this Python codebase, or can we assume that the input data is already clean and pre-processed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which values would the rows have? If it's NaNs I wouldn't be concerned.

return share


def consumption(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added battery series separately and changed the function to work on pd.series rather than pd.dataframes.

df_flows["production_total"],
df_flows["consumption_total"],
production_is_positive=production_is_positive,
production_is_positive=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial input production is negative but is handled by asset_production to invert the sign. So, we have to change the arguments to positive.

)

# Battery charging power (optional)
bat_in = _get_numeric_series(df_flows, battery_charge_col)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a separate battery series containing only positive values to represent the battery charge.

.reindex(production_excess_series.index, fill_value=0.0)
.clip(lower=0)
)
return pd.concat([production_excess_series, battery], axis=1).min(axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can they be not aligned?

production_cols: List of production column names (e.g., "pv", "chp", "battery" or "ev").
Can be None or empty if no on-site generation is present.
grid_cols: List of one or more grid column names.
grid: Series of grid import values (e.g., kW or MW).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only grid import and not grid power?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be grid power. I will update the docstring.


if not grid_cols:
raise ValueError("At least one grid column must be specified in grid_cols.")
grid_s = grid.astype("float64").fillna(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not fill with zeroes. There can be various reasons that the grid measurements are missing and we cannot safely know if they were zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fill it with 0 so as to avoid the cases of the consumption also being NaN in those cases.
For eg.
Production - 15, battery - 5, grid - NaN
Then consumption will also be NaN.

So, in this case should we have the consumption as NaN or 20?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consumption should be NaN, because you don't know what the grid power is.

return consumption
if (result < 0).any():
warnings.warn(
"Negative inferred consumption detected. This can occur during net export "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by net export and why can this cause negatives here? I think the most likely case to get negative values here is unmeasured energy generation sources or short-term spikes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most likely case of negative consumption is only going to be when the production sign was positive and after subtracting it from grid we get a negative value.

Net export are the cases where there was less consumption than production on-site and hence, the consumption had a negative sign. I just added this as a warning to make sure that users know that they have such periods and to check if there was no calculation error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and hence, the consumption had a negative sign.

Do you mean "grid power had a negative sign"?

And when you say net export, does it refer to the negative part of grid power?

series = pd.Series(0.0, index=df.index, dtype="float64")
else:
raw = df.reindex(columns=[col], fill_value=0)[col]
series = pd.to_numeric(raw, errors="coerce").fillna(0.0).astype("float64")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced in this PR, but also here not sure if we should by default fill with 0s.

Copy link
Contributor Author

@Mohammad-Tayyab-Frequenz Mohammad-Tayyab-Frequenz Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also to make sure we don't get NaNs during downstream calculations.



# pylint: disable=too-many-arguments, too-many-locals
def _column_has_data(df: pd.DataFrame, col: str | None) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _has_nonzero_values(df, *, column: str | None)? And why would you call it with None at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could occur in cases where the any column input (production, battery, consumption) from the user was not provided and by default it is assumed as None.
So, we need to safely pass those cases from our function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:docs Affects the documentation part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants