Skip to content

Conversation

@flora-hofmann-frequenz
Copy link
Collaborator

Addressing Issue #79.

@github-actions github-actions bot added the part:docs Affects the documentation label Oct 30, 2024
Marenz
Marenz previously approved these changes Oct 30, 2024
start_dt: datetime,
end_dt: datetime,
resolution: int | None,
resampling_period: timedelta,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be None if no resampling is desired.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a mypy error when I set it to None and the use total_seconds() in the streaming filter. Not quite sure how to work around it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_client.py:303: error: Item "None" of "timedelta | None" has no attribute "total_seconds" [union-attr]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's None you can pass that to the API, otherwise you convert it. E.g.

resolution = td.total_seconds() if sp is not None else None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this in.

Besides the microgrid_id, component_ids, and metrics, start, and end time,
you can also set the sampling period for resampling using the `resolution` parameter.
For example, to resample data every 15 minutes, use a `resolution` of 900 seconds. The default is 1 second.
you can also set the sampling period for resampling using the `resampling_period`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that the SDK uses sampling_period, but I think the term you chose is more accurate, so I would keep it.

resampling_options=PBResamplingOptions(resolution=resolution),
resampling_options=PBResamplingOptions(
resolution=(
int(resampling_period.total_seconds())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering, do we want int() which cuts off the fractional part or rather round() which .. well.. rounds ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but we had resolution : int beforehand, that's why I kept it this way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it also makes sense, cause the client/user would likely use 1min, 15min and probably nothing too complicated in between. If it needs to be as it is read out then one can just leave it defaulting to None.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but due to floating point, you could end up with an number like 4.999999 in there and instead of 5 you'll get 4 this way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if the proto defines uint32 should we then not also use int here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would still use int because the API doesn't support sub-second resolution. But int just cuts off the fractional part, so what @Marenz describes is a problem with floating points in general where you cannot always represent integers exactly. So I agree with rounding here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where you cannot always represent integers exactly.

In practice this should less of a concern in the range we use this parameter, maybe more if the value is a result of floating point operations I guess. But it's still good practice :)

)
parser.add_argument("--resolution", type=int, help="Resolution", default=None)
parser.add_argument(
"--resampling_period", type=int, help="Resampling period", default=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is anywhere explained what unit this parameter expects? And is it desired that sub-second durations (if those are seconds) are not supported (int)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but when calling this program with "--help", I won't have the proto buf in mind and how something maps exactly from that to my command line parameter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh sorry, you mean i should add something like: help="Resampling period in seconds (integer, no sub-second durations)." (obviously pending our discussion above).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter here is int, so that "sub second" part is not needed. But you convert it later to a time-delta which then converts it to seconds, but in floating point. And that last conversion is only the part where the problem can happen. Though rather theoretical maybe, depending what the source of the value is :)

start_dt: The start date and time.
end_dt: The end date and time.
resolution: The resampling resolution for the data, represented in seconds.
resampling_period: The period for resampling the data, specified as `timedelta` object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change it now, but for future, I think the extra type info isn't required in the comment, the type hints already give us that :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I have taken it out since I had to rebase on v0.x.x to get rid of the merge commit in any case.

Marenz
Marenz previously approved these changes Nov 12, 2024
Marenz
Marenz previously approved these changes Nov 12, 2024
Copy link
Contributor

@Marenz Marenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎆

start_dt: datetime,
end_dt: datetime,
resolution: int,
resampling_period: timedelta,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are passing an int (and in fact also None is possible, which is missing here). I would leave this and convert to timedelta (if set) below.

Copy link
Collaborator Author

@flora-hofmann-frequenz flora-hofmann-frequenz Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I am not entirely sure that I understand this comment. I tried the following change:

resampling_period: int | None,
states: bool,
bounds: bool,
service_address: str,
key: str,
fmt: str,

) -> None:
"""Test the ReportingApiClient.

Args:
    microgrid_id: microgrid ID
    component_id: component ID
    metric_names: list of metric names
    start_dt: start datetime
    end_dt: end datetime
    resampling_period: The period for resampling the data.
    states: include states in the output
    bounds: include bounds in the output
    service_address: service address
    key: API key
    fmt: output format

Raises:
    ValueError: if output format is invalid
"""
client = ReportingApiClient(service_address, key)

metrics = [Metric[mn] for mn in metric_names]

if resampling_period is not None:
    resampling_period = timedelta(seconds=resampling_period)

def data_iter() -> AsyncIterator[MetricSample]:
    """Iterate over single metric.

    Just a wrapper around the client method for readability.

    Returns:
        Iterator over single metric samples
    """
    return client.list_single_component_data(
        microgrid_id=microgrid_id,
        component_id=component_id,
        metrics=metrics,
        start_dt=start_dt,
        end_dt=end_dt,
        resampling_period=resampling_period,
        include_states=states,
        include_bounds=bounds,
    )

But now I am getting (expected) mypy errors?

src/frequenz/client/reporting/main.py:131: error: Incompatible types in assignment (expression has type "timedelta", variable has type "int | None") [assignment]
src/frequenz/client/reporting/main.py:147: error: Argument "resampling_period" to "list_single_component_data" of "ReportingApiClient" has incompatible type "int | None"; expected "timedelta | None" [arg-type]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in the run method, the type should be int | None, not in the ReportingApiClient Constructor if I read it correctly.
You then only need to convert it to timedeltta in line 144.

I would also recommend renaming the parameter to resampling_period_s to indicate that it is seconds in the name right away. Once it is a timedelta that extra _s won't be needed anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in the run method, the type should be int | None

Yes, and below:

resampling_period = timedelta(seconds=resampling_period_s) if resampling_period_s is not None or None
return client.list_single_component_data(
            microgrid_id=microgrid_id,
            component_id=component_id,
            metrics=metrics,
            start_dt=start_dt,
            end_dt=end_dt,
            resampling_period=resampling_period,
            include_states=states,
            include_bounds=bounds,
        )

@flora-hofmann-frequenz flora-hofmann-frequenz added this pull request to the merge queue Nov 20, 2024
Merged via the queue into frequenz-floss:v0.x.x with commit abe0641 Nov 20, 2024
14 checks passed
@flora-hofmann-frequenz flora-hofmann-frequenz deleted the name_resolution branch November 20, 2024 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:docs Affects the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants