Skip to content

Conversation

@cwasicki
Copy link
Collaborator

@cwasicki cwasicki commented Sep 5, 2023

Fixes #601

@cwasicki cwasicki requested a review from a team as a code owner September 5, 2023 16:17
@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests part:data-pipeline Affects the data pipeline labels Sep 5, 2023
@cwasicki cwasicki added this to the v1.0.0-rc milestone Sep 5, 2023
@cwasicki cwasicki self-assigned this Sep 5, 2023
llucax
llucax previously approved these changes Sep 6, 2023
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of bike shedding naming comments, but LGTM in general!

"""
return self._gaps

def is_missing_value(self, value: QuantityT | None) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:

Suggested change
def is_missing_value(self, value: QuantityT | None) -> bool:
def is_value_missing(self, value: QuantityT | None) -> bool:

Or simply is_missing(). If you take a Sample instead of a QuantityT, I think is_value_missing() makes more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using a negative as part of a function name is usually not recommended, because you can end up with a double negation, which is harder to reason about. What about:

Suggested change
def is_missing_value(self, value: QuantityT | None) -> bool:
def has_value(self, sample: Sample) -> bool:

But I'm not 100% sure in this case, because I guess here missing is important, as you are talking about gaps. Another alternative could be is_value_present(sample). So just food for thought, I'm not convinced either way :P

Also, infinite is considered as not missing, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it taking a sample instead. Do you think it can be moved to a method of Sample at some point?

Will change it to has_value. There is already a is_missing, which is about timestamps. I guess another candidate to improve naming.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, infinite is considered as not missing, right?

So far yes, on the typical microgrid data we work with infinities will already be turned into NaNs and for other use-cases I did not want to restrict it too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it taking a sample instead. Do you think it can be moved to a method of Sample at some point?

I'm not sure, because again what's a present value could depend on the application, for example is infinite a value or not? I would be in favor of adding more common methods from math that would also check for None, like is_finite(), but that is not isnan() and not isinf().

There is already a is_missing, which is about timestamps. I guess another candidate to improve naming.

Oh! Yes :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative could be is_gap() which ties it up pretty well with the concept.

But I was thinking if maybe is_missing(timestamp) can be unified into a is_gap(sample), then update() could already pass if a timestamp is missing to self._update_gaps(). But maybe this is too much of a refactor to be on the scope of this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about is_gap(sample): is_missing checks whether a sample is already in our gap list, has_value checks whether a sample can be considered a gap.

@cwasicki
Copy link
Collaborator Author

cwasicki commented Sep 6, 2023

@llucax Updated

llucax
llucax previously approved these changes Sep 6, 2023
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All optional comments, LGTM as it is too.

Returns:
True if the sample has a value and it's not NaN.
"""
return not ((sample.value is None) or sample.value.isnan())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to save some parenthesis (which might make it easier to read):

Suggested change
return not ((sample.value is None) or sample.value.isnan())
return sample.value is not None and not sample.value.isnan()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking about that and found it even less readable. But removed one level.

# Update data
value: float = np.nan if sample.value is None else sample.value.base_value
if self.has_value(sample):
assert sample.value is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this, too bad that this is needed, because it was just checked by has_value().

You could avoid this by making has_value a function instead and using TypeGuard, but that would also need to take it back to operate on float | None:

from typing import TypeGuard

def has_value(sample: float | None) -> TypeGuard[float]:
    """Check if a sample has a value and it's not NaN.
    Args:
        sample: sample to check.
    Returns:
        True if the sample has a value and it's not NaN.
    """
    return value is not None and not value.isnan()

if has_value(sample):  # If this returns True, then the type checker narrows the type to `float`
    value = sample.value.base_value  # this is now fine without the assert
else:
    value = np.nan

I thought about alternatives and all seem awful, so I guess we'll have to live with the assert 🤷

Comment on lines -157 to +172
value: float = np.nan if sample.value is None else sample.value.base_value
if self.has_value(sample):
assert sample.value is not None
value = sample.value.base_value
else:
value = np.nan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or.... ❗❗❗ just not use the function in this context... The original code is pretty simple and understandable (and works the same, as if it is NaN we'll use np.nan anyway, or is it np.nan different from float('nan')?) without it too:

        valueL = np.nan if sample.value is None else sample.value
        self._buffer[self.datetime_to_index(timestamp)] = value
        self._update_gaps(timestamp, prev_newest, not self.has_value(sample))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to leave it as it is required in both these places. So you have one location where you define what a missing value is and what not (e.g. if we want to change it to include infs at some point.)

Actually I find it not so bad, even more readable. But yea, the assertion is annoying.

@cwasicki
Copy link
Collaborator Author

cwasicki commented Sep 6, 2023

@llucax @matthias-wende-frequenz Updated, but I am not authorized to merge.

@matthias-wende-frequenz matthias-wende-frequenz added this pull request to the merge queue Sep 6, 2023
Merged via the queue into frequenz-floss:v0.x.x with commit 01b8d9d Sep 6, 2023
@cwasicki cwasicki deleted the gapdef branch September 6, 2023 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests

Projects

Development

Successfully merging this pull request may close these issues.

Treat NaN as gap in ring buffer

3 participants