21 changes: 0 additions & 21 deletions tools/ab_test.py
@@ -22,20 +22,12 @@
import os
import statistics
import subprocess
import sys
from collections import defaultdict
from pathlib import Path
from typing import Callable, List, TypeVar

import scipy

# Hack to be able to use our test framework code
sys.path.append(str(Path(__file__).parent.parent / "tests"))

# pylint:disable=wrong-import-position
from framework.properties import global_props
from host_tools.metrics import get_metrics_logger

UNIT_REDUCTIONS = {
"Microseconds": "Milliseconds",
"Milliseconds": "Seconds",
@@ -264,11 +256,6 @@ def analyze_data(

results = {}

metrics_logger = get_metrics_logger()

for prop_name, prop_val in global_props.__dict__.items():
metrics_logger.set_property(prop_name, prop_val)

for dimension_set in data_a:
metrics_a = data_a[dimension_set]
metrics_b = data_b[dimension_set]
@@ -281,14 +268,6 @@ def analyze_data(
result = check_regression(
values_a, metrics_b[metric][0], n_resamples=n_resamples
)

metrics_logger.set_dimensions({"metric": metric, **dict(dimension_set)})
metrics_logger.put_metric("p_value", float(result.pvalue), "None")
metrics_logger.put_metric("mean_difference", float(result.statistic), unit)
metrics_logger.set_property("data_a", values_a)
metrics_logger.set_property("data_b", metrics_b[metric][0])
metrics_logger.flush()
Comment on lines -246 to -251
Contributor:

I don't think I ever looked at these metrics, but how will we check them now? Should we print a report of the A/B run to stdout, or to a file that we can explore in Buildkite?

Contributor Author:

I can dump the output like this: https://github.com/firecracker-microvm/firecracker/pull/4923/files#diff-edc53a8d8d2432bf93a2590fdb5aac94c515586dbd8f916cb9fda4fc78166e17R295 and it will be included in the uploaded archive. But I don't see a big reason to do this, because we can just rerun the A/B script on the downloaded data locally to debug it.

Contributor:

Sure, you can always download everything manually and run it locally, but that takes a non-negligible amount of time. Maybe nobody will ever look at it, as with these metrics, but dumping everything to a report (maybe a JSON one) seems reasonable to me as a first step when trying to understand A/B results.


results[dimension_set, metric] = (result, unit)

# We sort our A/B-Testing results keyed by metric here. The resulting lists of values
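
As a follow-up to the report idea discussed in the review thread above, here is a minimal sketch of how the results dictionary built by analyze_data could be dumped to a JSON report that lands in the uploaded artifacts. The dump_ab_report name, the report path, and the output layout are hypothetical illustrations, not part of the actual ab_test.py.

import json
from pathlib import Path


def dump_ab_report(results, report_path=Path("test_results/ab_report.json")):
    """Serialize A/B-test results keyed by (dimension_set, metric) to JSON.

    Sketch only: assumes `results` has the same shape as in analyze_data,
    i.e. results[dimension_set, metric] = (result, unit), where `result`
    is a scipy permutation-test result exposing .pvalue and .statistic.
    """
    report = []
    for (dimension_set, metric), (result, unit) in results.items():
        report.append(
            {
                "dimensions": dict(dimension_set),
                "metric": metric,
                "p_value": float(result.pvalue),
                "mean_difference": float(result.statistic),
                "unit": unit,
            }
        )
    # Write next to the other test outputs so it gets picked up by the
    # uploaded archive and can be inspected in Buildkite without rerunning
    # the A/B script locally.
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(json.dumps(report, indent=2))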