[A/B] Update script to use stored metrics.json #5481
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
##             main    #5481      +/-   ##
==========================================
- Coverage   82.74%   82.73%   -0.01%
==========================================
  Files         269      269
  Lines       27798    27798
==========================================
- Hits        23001    23000       -1
- Misses       4797     4798       +1
Force-pushed from c65891b to 75b4b5b
Since we store metrics for each test directly, we don't need to parse EMF logs from test-report.json. Replace the metrics-gathering logic in ab_test.py so that it reads all `metrics.json` files instead. Signed-off-by: Egor Lazarchuk <[email protected]>
This function was only used in ab_test.py. Signed-off-by: Egor Lazarchuk <[email protected]>
This function was only used in ab_test.py. Signed-off-by: Egor Lazarchuk <[email protected]>
This function was only used in ab_test.py. Signed-off-by: Egor Lazarchuk <[email protected]>
There is no reason to emit metrics when running the A/B script. This is also the last piece imported from the test framework, so remove the hack used to be able to import things from it. Signed-off-by: Egor Lazarchuk <[email protected]>
This function was not used anymore. Signed-off-by: Egor Lazarchuk <[email protected]>
Force-pushed from 75b4b5b to 2a5d9e5
metrics_logger.set_dimensions({"metric": metric, **dict(dimension_set)})
metrics_logger.put_metric("p_value", float(result.pvalue), "None")
metrics_logger.put_metric("mean_difference", float(result.statistic), unit)
metrics_logger.set_property("data_a", values_a)
metrics_logger.set_property("data_b", metrics_b[metric][0])
metrics_logger.flush()
I don't think I ever looked at these metrics, but how will we check them now? Should we print a report of the A/B run to stdout, or to a file that we can explore in Buildkite?
I can dump the output like this: https://github.com/firecracker-microvm/firecracker/pull/4923/files#diff-edc53a8d8d2432bf93a2590fdb5aac94c515586dbd8f916cb9fda4fc78166e17R295 and it will be included in the uploaded archive. But I don't see a big reason to do this, because we can just rerun the A/B script on the downloaded data locally to debug it.
Sure, you can always download everything manually and run it locally, but that takes a non-negligible amount of time. Maybe nobody will ever look at it, as with these metrics, but dumping everything to a report (maybe a JSON one) seems reasonable to me as a first step when understanding A/B results.
LGTM. While we're changing this, maybe we can add a small report of the A/B run. I found out here that we have some metrics I never looked into, but something like that could be useful for debugging the A/B. For example, right now we only get the report if the test fails; we could print everything else into a file so that we can check it when debugging a regression (maybe the change wasn't significant enough, but A/B found it nonetheless).
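For reference, such a report dump could be as small as the following sketch. The `results` mapping of (dimensions, metric) pairs to scipy test results, the helper name, and the output file name are assumptions made here for illustration, not code from this PR:

import json
from pathlib import Path

def dump_report(results, path="ab_report.json"):
    # Persist every A/B comparison (not just the failing ones) so the file
    # lands in the uploaded Buildkite artifacts and can be inspected
    # without rerunning the script locally.
    report = [
        {
            "metric": metric,
            "dimensions": dict(dimension_set),
            "p_value": float(result.pvalue),
            "mean_difference": float(result.statistic),
        }
        for (dimension_set, metric), result in results.items()
    ]
    Path(path).write_text(json.dumps(report, indent=2))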
Changes
Update the ab_test.py script to use metrics.json files instead of test-report.json to obtain the metrics emitted by tests. Also do a minor cleanup of the script, removing the hack for importing our test framework into the script.
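Conceptually, the new gathering step amounts to globbing every stored metrics.json and merging the values per dimension set and metric name. A minimal sketch of that idea, assuming a flat {"dimensions": ..., "metrics": ...} file layout (the exact file format, directory structure, and function name are assumptions, not the PR's implementation):

import json
from collections import defaultdict
from pathlib import Path

def load_metrics(test_results_dir):
    # Collect every metrics.json emitted by the tests, keyed by the
    # frozen dimension set and the metric name, mirroring the shape
    # the A/B comparison consumes.
    metrics = defaultdict(list)
    for path in Path(test_results_dir).glob("**/metrics.json"):
        data = json.loads(path.read_text())
        dimensions = frozenset(data.get("dimensions", {}).items())
        for name, values in data.get("metrics", {}).items():
            metrics[(dimensions, name)].extend(values)
    return metrics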
Reason
Simplification.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.
PR Checklist
- I have run tools/devtool checkbuild --all to verify that the PR passes build checks on all supported architectures.
- I have run tools/devtool checkstyle to verify that the PR passes the automated style checks.
- I have described what is done in these changes, why they are needed, and how they are solving the problem in a clear and encompassing way.
- I have updated any relevant documentation (both in code and in the docs) in the PR.
- I have mentioned all user-facing changes in CHANGELOG.md.
- Any API changes follow the Runbook for Firecracker API changes.
- I have tested all new and changed functionality in unit tests and/or integration tests.
- I have linked an issue to every new TODO.
- This functionality cannot be added in rust-vmm.