-
Notifications
You must be signed in to change notification settings - Fork 20
Treat NaN as gap in ring buffer #641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Treat NaN as gap in ring buffer #641
Conversation
Signed-off-by: cwasicki <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of bike shedding naming comments, but LGTM in general!
| """ | ||
| return self._gaps | ||
|
|
||
| def is_missing_value(self, value: QuantityT | None) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick:
| def is_missing_value(self, value: QuantityT | None) -> bool: | |
| def is_value_missing(self, value: QuantityT | None) -> bool: |
Or simply is_missing(). If you take a Sample instead of a QuantityT, I think is_value_missing() makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, using a negative as part of a function name is usually not recommended, because you can end up with a double negation, which is harder to reason about. What about:
| def is_missing_value(self, value: QuantityT | None) -> bool: | |
| def has_value(self, sample: Sample) -> bool: |
But I'm not 100% sure in this case, because I guess here missing is important, as you are talking about gaps. Another alternative could be is_value_present(sample). So just food for thought, I'm not convinced either way :P
Also, infinite is considered as not missing, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make it taking a sample instead. Do you think it can be moved to a method of Sample at some point?
Will change it to has_value. There is already a is_missing, which is about timestamps. I guess another candidate to improve naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, infinite is considered as not missing, right?
So far yes, on the typical microgrid data we work with infinities will already be turned into NaNs and for other use-cases I did not want to restrict it too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make it taking a sample instead. Do you think it can be moved to a method of Sample at some point?
I'm not sure, because again what's a present value could depend on the application, for example is infinite a value or not? I would be in favor of adding more common methods from math that would also check for None, like is_finite(), but that is not isnan() and not isinf().
There is already a
is_missing, which is about timestamps. I guess another candidate to improve naming.
Oh! Yes :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another alternative could be is_gap() which ties it up pretty well with the concept.
But I was thinking if maybe is_missing(timestamp) can be unified into a is_gap(sample), then update() could already pass if a timestamp is missing to self._update_gaps(). But maybe this is too much of a refactor to be on the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about is_gap(sample): is_missing checks whether a sample is already in our gap list, has_value checks whether a sample can be considered a gap.
aaf4294
|
@llucax Updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All optional comments, LGTM as it is too.
| Returns: | ||
| True if the sample has a value and it's not NaN. | ||
| """ | ||
| return not ((sample.value is None) or sample.value.isnan()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to save some parenthesis (which might make it easier to read):
| return not ((sample.value is None) or sample.value.isnan()) | |
| return sample.value is not None and not sample.value.isnan() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was thinking about that and found it even less readable. But removed one level.
| # Update data | ||
| value: float = np.nan if sample.value is None else sample.value.base_value | ||
| if self.has_value(sample): | ||
| assert sample.value is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed this, too bad that this is needed, because it was just checked by has_value().
You could avoid this by making has_value a function instead and using TypeGuard, but that would also need to take it back to operate on float | None:
from typing import TypeGuard
def has_value(sample: float | None) -> TypeGuard[float]:
"""Check if a sample has a value and it's not NaN.
Args:
sample: sample to check.
Returns:
True if the sample has a value and it's not NaN.
"""
return value is not None and not value.isnan()
if has_value(sample): # If this returns True, then the type checker narrows the type to `float`
value = sample.value.base_value # this is now fine without the assert
else:
value = np.nanI thought about alternatives and all seem awful, so I guess we'll have to live with the assert 🤷
| value: float = np.nan if sample.value is None else sample.value.base_value | ||
| if self.has_value(sample): | ||
| assert sample.value is not None | ||
| value = sample.value.base_value | ||
| else: | ||
| value = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or.... ❗❗❗ just not use the function in this context... The original code is pretty simple and understandable (and works the same, as if it is NaN we'll use np.nan anyway, or is it np.nan different from float('nan')?) without it too:
valueL = np.nan if sample.value is None else sample.value
self._buffer[self.datetime_to_index(timestamp)] = value
self._update_gaps(timestamp, prev_newest, not self.has_value(sample))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to leave it as it is required in both these places. So you have one location where you define what a missing value is and what not (e.g. if we want to change it to include infs at some point.)
Actually I find it not so bad, even more readable. But yea, the assertion is annoying.
Signed-off-by: cwasicki <[email protected]>
|
@llucax @matthias-wende-frequenz Updated, but I am not authorized to merge. |
Fixes #601