# book/src/libs/wasp/benchspy/loki_dillema.md (+8 -4)
You might be wondering whether to use the `Loki` or `Direct` query executor if a…

## Rule of Thumb

You should opt for the `Direct` query executor if all you need is a single number, such as the median latency or error rate, and you're not interested in:

- Comparing time series directly,
- Examining minimum or maximum values over time, or
- Performing advanced calculations on raw data.

## Why Choose `Direct`?

The `Direct` executor returns a single value for each standard metric using the same raw data that Loki would use. It accesses data stored in the `WASP` generator, which is later pushed to Loki.
By using `Direct`, you save resources and simplify the process when advanced ana…

> - In the **`Direct` QueryExecutor**, the p95 is calculated across all raw data points, capturing the true variability of the dataset, including any extreme values or spikes.
> - In the **`Loki` QueryExecutor**, the p95 is calculated over aggregated data (i.e. using the 10-second window). As a result, the raw values within each window are smoothed into a single representative value, potentially lowering or altering the calculated p95. For example, an outlier that would significantly affect the p95 in the `Direct` calculation might be averaged out in the `Loki` window, leading to a slightly lower percentile value.
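To see this smoothing effect in practice, here is a self-contained sketch (illustrative only, not BenchSpy or Loki code) that computes a p95 over raw latencies and over 10-sample window averages; the windowed result loses the spikes:

```go
package main

import (
	"fmt"
	"sort"
)

// p95 returns the 95th percentile of values (nearest-rank method).
func p95(values []float64) float64 {
	s := append([]float64(nil), values...)
	sort.Float64s(s)
	idx := int(float64(len(s))*0.95) - 1
	if idx < 0 {
		idx = 0
	}
	return s[idx]
}

// windowAverages splits values into fixed-size windows and averages each,
// mimicking the smoothing introduced by Loki's aggregation window.
func windowAverages(values []float64, windowSize int) []float64 {
	var out []float64
	for i := 0; i < len(values); i += windowSize {
		end := i + windowSize
		if end > len(values) {
			end = len(values)
		}
		sum := 0.0
		for _, v := range values[i:end] {
			sum += v
		}
		out = append(out, sum/float64(end-i))
	}
	return out
}

func main() {
	// 100 latencies of 100ms, with six 1000ms spikes spread across windows.
	latencies := make([]float64, 100)
	for i := range latencies {
		latencies[i] = 100
	}
	for _, i := range []int{5, 15, 25, 35, 45, 55} {
		latencies[i] = 1000
	}

	fmt.Printf("raw p95:      %.1f\n", p95(latencies))                     // 1000.0
	fmt.Printf("smoothed p95: %.1f\n", p95(windowAverages(latencies, 10))) // 190.0
}
```

The raw p95 captures the spikes; after window averaging each spike is diluted into its window, so the percentile drops sharply.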
> #### Direct caveats:
> - **Buffer limitations:** `WASP` generators use a [StringBuffer](https://github.com/smartcontractkit/chainlink-testing-framework/blob/main/wasp/buffer.go) with a fixed size to store the responses. Once full capacity is reached,
>   the oldest entries are replaced with incoming ones. The size of the buffer can be set in the generator's config. By default, it is limited to 50k entries to lower resource consumption and avoid potential OOMs.
>
> - **Sampling:** `WASP` generators support optional sampling of successful responses. It is disabled by default, but if you do enable it, the calculations will no longer be done over the full dataset.
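The overwrite-oldest behavior described above can be pictured with a minimal ring-buffer sketch (hypothetical, not the actual `StringBuffer` implementation):

```go
package main

import "fmt"

// ringBuffer keeps at most len(data) entries, overwriting the oldest one
// once full, mirroring the behavior described for WASP's response buffer.
type ringBuffer struct {
	data []string
	next int
	full bool
}

func newRingBuffer(capacity int) *ringBuffer {
	return &ringBuffer{data: make([]string, capacity)}
}

func (r *ringBuffer) add(s string) {
	r.data[r.next] = s
	r.next = (r.next + 1) % len(r.data)
	if r.next == 0 {
		r.full = true
	}
}

// entries returns the stored values from oldest to newest.
func (r *ringBuffer) entries() []string {
	if !r.full {
		return append([]string(nil), r.data[:r.next]...)
	}
	return append(append([]string(nil), r.data[r.next:]...), r.data[:r.next]...)
}

func main() {
	rb := newRingBuffer(3)
	for _, s := range []string{"a", "b", "c", "d", "e"} {
		rb.add(s)
	}
	// The oldest entries "a" and "b" were overwritten.
	fmt.Println(rb.entries()) // [c d e]
}
```

Once the buffer wraps, any metric computed from it (including p95 or max) only reflects the most recent 50k responses, which is why the caveat matters for long runs.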
> #### Key Takeaway:
> The difference arises because `Direct` prioritizes precision by using raw data, while `Loki` prioritizes efficiency and scalability by using aggregated data. When interpreting results, it’s essential to consider how the smoothing effect of `Loki` might impact the representation of variability or extremes in the dataset. This is especially important for metrics like percentiles, where such details can significantly influence the outcome.
# book/src/libs/wasp/benchspy/loki_std.md (+21 -16)
```go
require.NoError(t, storeErr, "failed to store baseline report", path)
```

## Step 3: Skip to Metrics Comparison

Since the next steps are very similar to those in the first test, we’ll skip them and go straight to metrics comparison.

By default, the `LokiQueryExecutor` returns results as the `[]string` data type. Let’s use dedicated convenience functions to cast them from `interface{}` to string slices:

Now, let’s compare metrics. Since we have `[]string`, we’ll first convert it to `[]float64`, calculate the average, and ensure the difference between the averages is less than 1%. Again, this is just an example; you should decide the best way to validate your metrics. Here we are explicitly aggregating them using an average to get a single-number representation of each metric, but for your case a median, a percentile, or some other aggregate might be more appropriate.
```go
var compareMedian = func(metricName string) {
	require.NotEmpty(t, currentAsStringSlice[metricName], "%s results were missing from current report", metricName)
	require.NotEmpty(t, previousAsStringSlice[metricName], "%s results were missing from previous report", metricName)
	// …
}
```
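The snippet above is truncated in this diff. A self-contained sketch of the conversion-and-comparison logic it describes (illustrative names, not the BenchSpy API) might look like:

```go
package main

import (
	"fmt"
	"strconv"
)

// toFloat64Slice converts Loki's []string results into numbers.
func toFloat64Slice(raw []string) ([]float64, error) {
	out := make([]float64, 0, len(raw))
	for _, s := range raw {
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", s, err)
		}
		out = append(out, v)
	}
	return out, nil
}

func average(values []float64) float64 {
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// diffPercentage returns how much current deviates from previous, in percent.
func diffPercentage(previous, current float64) float64 {
	return (current - previous) / previous * 100
}

func main() {
	previous, _ := toFloat64Slice([]string{"101", "99", "100"})
	current, _ := toFloat64Slice([]string{"100.2", "100.4"})

	diff := diffPercentage(average(previous), average(current))
	fmt.Printf("averages differ by %.2f%%\n", diff)
	if diff > 1.0 || diff < -1.0 {
		fmt.Println("FAIL: averages are more than 1% different")
	}
}
```

In a real test you would feed both reports' query results through this kind of pipeline and assert on the difference with `require`/`assert`.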
> Standard Loki metrics are all calculated using a 10-second moving window, which results in smoothing of values due to aggregation.
> To learn what that means in detail, please refer to the [To Loki or Not to Loki](./loki_dillema.md) chapter.
>
> Also, due to the HTTP API endpoint used, namely `query_range`, all query results **are always returned as a slice**. Execution of **instant queries**
> that return a single data point is currently **not supported**.
# book/src/libs/wasp/benchspy/overview.md (+6 -2)
BenchSpy (short for Benchmark Spy) is a [WASP](../overview.md)-coupled tool desi…

- **Standard/pre-defined metrics** for each data source.
- **Ease of extensibility** with custom metrics.
- **Ability to load the latest performance report** based on Git history.

BenchSpy does not include any built-in comparison logic beyond ensuring that performance reports are comparable (e.g., they measure the same metrics in the same way), offering complete freedom to the user for interpretation and analysis.

## Why might you need it?

`BenchSpy` was created with two main goals in mind:
Done, you're ready to use `BenchSpy` to make sure that the performance of your application didn't degrade below your chosen thresholds!

> [!NOTE]
> You can find a test example, where the performance has degraded significantly, [here](https://github.com/smartcontractkit/chainlink-testing-framework/tree/main/wasp/examples/benchspy/direct_query_executor/direct_query_real_case.go).
>
> This test passes because we expect the performance to be worse. This is, of course, the opposite of what you should do in the case of a real application :-)
# book/src/libs/wasp/benchspy/simplest_metrics.md (+18 -39)
For example, if your query returns a time series, you could:

- Compare each data point in the time series individually.
- Compare aggregates like averages, medians, or min/max values of the time series.

Each of these approaches has its pros and cons, and `BenchSpy` doesn't make any judgments here. In this example, we'll use a very simplified approach, which **should not be treated** as a gold standard. In our case, the `QueryExecutor` returns a single data point for each metric, eliminating the complexity. However, with `Loki` and `Prometheus`, things can get more complicated.

## Working with Built-in `QueryExecutors`

Each built-in `QueryExecutor` returns a different data type, and we use the `interface{}` type to reflect this. Since the `Direct` executor always returns `float64`, we have added a convenience function
that checks whether any of the standard metrics has **degraded** by more than the threshold. If the performance has improved, no error will be returned.
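A self-contained sketch of such a degradation-only check (illustrative names and signature, not the actual BenchSpy convenience function):

```go
package main

import "fmt"

// checkDegradation returns an error only when the current value is worse
// than the previous one by more than maxDiffPercentage. Improvements pass.
// For latency/error-rate style metrics, "worse" means "higher".
func checkDegradation(metricName string, previous, current, maxDiffPercentage float64) error {
	diff := (current - previous) / previous * 100
	if diff > maxDiffPercentage {
		return fmt.Errorf("%s degraded by %.2f%% (threshold: %.2f%%)", metricName, diff, maxDiffPercentage)
	}
	return nil
}

func main() {
	// 0.5% worse: within a 1% threshold, so no error.
	fmt.Println(checkDegradation("median_latency", 200.0, 201.0, 1.0))

	// 5% worse: over the threshold, an error is returned.
	fmt.Println(checkDegradation("median_latency", 200.0, 210.0, 1.0))
}
```

Note the asymmetry: a large improvement (a lower latency) never fails the check, which matches the behavior described above.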
> Both `Direct` and `Loki` query executors support the following standard performance metrics out of the box:
> - `median_latency`
> - `p95_latency`
> - `max_latency`
> - `error_rate`
## Wrapping Up

And that's it! You've written your first test that uses `WASP` to generate load and `BenchSpy` to ensure that the median latency, 95th percentile latency, max latency, and error rate haven't changed significantly between runs. You accomplished this without even needing a Loki instance. But what if you wanted to leverage the power of `LogQL`? We'll explore that in the [next chapter](./loki_std.md).
> [!NOTE]
> You can find the full example [here](https://github.com/smartcontractkit/chainlink-testing-framework/tree/main/wasp/examples/benchspy/direct_query_executor/direct_query_executor_test.go).