-
Notifications
You must be signed in to change notification settings - Fork 24
chore: refactor metrics classes [DO NOT MERGE WITHOUT CHANGING BASE TO MAIN] #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: chore/rename-path-constants
Are you sure you want to change the base?
chore: refactor metrics classes [DO NOT MERGE WITHOUT CHANGING BASE TO MAIN] #613
Conversation
Test results for commit 4a25772Test coverage for 4a25772
Static code analysis report
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Some questions/remarks about semantic changes, otherwise it looks good.
def installation_start_timestamp(self) -> NonNegativeFloat: # type: ignore | ||
"""UNIX timestamp of in which the VM setup started.""" | ||
|
||
@property | ||
def installation_end_timestamp(self) -> NonNegativeFloat | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before it was
installation_start_timestamp: NonNegativeFloat | None
installed_timestamp: NonNegativeFloat
so it seems it should be
def installation_start_timestamp(self) -> NonNegativeFloat: # type: ignore | |
"""UNIX timestamp of in which the VM setup started.""" | |
@property | |
def installation_end_timestamp(self) -> NonNegativeFloat | None: | |
def installation_start_timestamp(self) -> NonNegativeFloat | None: # type: ignore | |
"""UNIX timestamp of in which the VM setup started.""" | |
@property | |
def installation_end_timestamp(self) -> NonNegativeFloat: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the previous implementation as wrong (diff here).
Installation start timestamp is referring to the OpenStack VM created_at
timestamp which the type hint suggests that it cannot be None
.
Installation end timestamp, however, is a metric that is pulled over SSH connection which could be None
due to some failures.
One question I have is whether Installation start timestamp (created_at timestamp) from OpenStack VM might be None
which I don't think it shoud?
) | ||
return None | ||
@property | ||
def installation_end_timestamp(self) -> NonNegativeFloat | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def installation_end_timestamp(self) -> NonNegativeFloat | None: | |
def installation_end_timestamp(self) -> NonNegativeFloat: |
If I am correct with my other comment above
@@ -385,12 +401,12 @@ def _issue_runner_installed( | |||
Returns: | |||
The type of the issued event. | |||
""" | |||
installation_end_timestamp = runner_metrics.installation_end_timestamp or 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
installation_end_timestamp = runner_metrics.installation_end_timestamp or 0 | |
installation_end_timestamp = runner_metrics.installation_end_timestamp |
if I am correct with my other comment above
flavor=flavor, | ||
# the installation_start_timestamp should be present | ||
duration=runner_metrics.installed_timestamp # type: ignore | ||
- runner_metrics.installation_start_timestamp, # type: ignore | ||
duration=max(installation_end_timestamp - runner_metrics.installation_start_timestamp, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the max change here, which looks like a semantic change?
@@ -466,15 +482,17 @@ def _create_runner_start( | |||
# might be higher than the pre-job timestamp. This is due to the fact that we issue the runner | |||
# installed timestamp for Openstack after waiting with delays for the runner to be ready. | |||
# We set the idle_duration to 0 in this case. | |||
if pre_job_metrics.timestamp < runner_metrics.installed_timestamp: | |||
if pre_job_metrics.timestamp < (runner_metrics.installation_end_timestamp or 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if pre_job_metrics.timestamp < (runner_metrics.installation_end_timestamp or 0): | |
if pre_job_metrics.timestamp < runner_metrics.installation_end_timestamp: |
if I am correct with my comment above that the timestamp should always be present
if timestamp := metrics_contents_map.get(RUNNER_INSTALLED_TS_FILE_PATH, None): | ||
try: | ||
runner_installed_timestamp = float(timestamp) | ||
except ValueError: | ||
logger.warning("Corrupt runner installed timestamp: %s", timestamp) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a semantic change: Before the PR , if the timestamp was not present, we would not issue any metrics at all
if self.runner_installed is None:
logger.error(
"Invalid pulled metrics. No runner_installed information for %s.", instance_id
)
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a comment after discussion:
This semantics change is proposed to convey metrics for instances that may have failed to initialize.
Conveying None
values for all metrics would suggest that some step has gone wrong and let us decide to perhaps put it in +inf
bucket.
This metric would show us how many times we were failing to either instantiate or SSH into the machine to fetch the metrics, helping us decide the next move.
], | ||
) | ||
def test_issue_events_partial_metrics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems gone, are these cases covered in the new tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, they have been better parametrized by the "state" input, rather than testing for the internal function calls.
The parametrized tests are:
- no instances
- single instance, no metrics
- single instance, partial metrics(POST_JOB_METRICS_FILE_PATH): post job only
- single instance, partial metrics(PRE_JOB_METRICS_FILE_PATH): pre job only
- single instance, partial metrics(RUNNER_INSTALLED_TS_FILE_PATH): runner installed timestamp only
- single instance, all metrics
- multi instance, all metrics
These cases more than cover the existing tests, but go beyond and extend the combinatorics.
Applicable spec:
Overview
RunnerMetrics
into an interface READONLY dataclass using Protocol + @Property_pull_runner_metrics
functionPulledMetrics
classrunner_installed
& other data attributes to be more descriptiveRationale
Juju Events Changes
Module Changes
Library Changes
Checklist
urgent
,trivial
,complex
).github-runner-manager/pyproject.toml
.No user facing changes