When gathering counters, check for instability and FAIL the test if it is detected.
The way we already gather numbers for this test is to run `Benchmark_O $TEST` twice with num-samples=2, once with iters=2 and once with iters=3. Under the assumption that the extra iteration is the only thing that can change the counter numbers, subtracting the iters=2 counts from the iters=3 counts gives us the counts attributable to that single iteration.
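
To make the bookkeeping concrete, here is a minimal sketch of that subtraction. It assumes the counter data from each run is available as a plain dictionary mapping counter names to counts; that representation is my assumption, not necessarily the driver's actual data structure.

```python
# Hedged sketch: counter dicts map counter name -> absolute count as
# reported by the iters=2 and iters=3 runs of Benchmark_O.
def counts_for_one_iteration(counts_iters_2, counts_iters_3):
    # Assuming the extra iteration is the only difference between the two
    # runs, the per-counter delta is the cost of exactly one iteration.
    return {name: counts_iters_3[name] - counts_iters_2[name]
            for name in counts_iters_3}
```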
In certain cases, I have found that a small subset of the benchmarks produces weird output, and I haven't had time to look into why. That being said, I do know what these weird results look like, so in this commit we do some extra validation work to decide whether we need to fail a test due to instability.
The specific validation is:
1. We perform another run with num-samples=2, iters=5 and subtract the iters=3 counts from it. Under the assumption that overall work increases linearly with iteration count in our benchmarks, this delta spans two iterations, so we check that it is actually 2x the single-iteration delta.
2. We check whether either `result[iter=3] - result[iter=2]` or `result[iter=5] - result[iter=3]` is negative. None of the counters we gather should ever decrease as the iteration count increases.