
Commit 570ebaa

add working test examples, some docs, small code changes

1 parent 305088e
23 files changed: +1880 -296 lines

book/src/SUMMARY.md

Lines changed: 14 additions & 1 deletion

```diff
@@ -69,14 +69,27 @@
   - [Profile](./libs/wasp/components/profile.md)
   - [Sampler](./libs/wasp/components/sampler.md)
   - [Schedule](./libs/wasp/components/schedule.md)
+  - [BenchSpy](./libs/wasp/benchspy/overview.md)
+  - [Getting started](./libs/wasp/benchspy/getting_started.md)
+  - [Your first test](./libs/wasp/benchspy/first_test.md)
+  - [Simplest metrics](./libs/wasp/benchspy/simplest_metrics.md)
+  - [Standard Loki metrics](./libs/wasp/benchspy/loki_std.md)
+  - [Custom Loki metrics](./libs/wasp/benchspy/loki_custom.md)
+  - [Standard Prometheus metrics](./libs/wasp/benchspy/prometheus_std.md)
+  - [Custom Prometheus metrics](./libs/wasp/benchspy/prometheus_custom.md)
+  - [Defining a new report]()
+  - [Adding new QueryExecutor]()
+  - [Adding new storage]()
+  - [Adding new standard load metric]()
+  - [Adding new standard resource metric]()
   - [How to](./libs/wasp/how-to/overview.md)
   - [Start local observability stack](./libs/wasp/how-to/start_local_observability_stack.md)
   - [Try it out quickly](./libs/wasp/how-to/run_included_tests.md)
   - [Chose between RPS and VUs](./libs/wasp/how-to/chose_rps_vu.md)
   - [Define NFRs and check alerts](./libs/wasp/how-to/define_nfr_check_alerts.md)
   - [Use labels](./libs/wasp/how-to/use_labels.md)
   - [Incorporate load tests in your workflow](./libs/wasp/how-to/incorporate_load_tests.md)
-  - [Reuse dashboard components](./libs/wasp/how-to/reuse_dashboard_components.md)
+  - [Reuse dashboard components](./libs/wasp/how-to/reuse_dashboard_components.md)
   - [Parallelize load](./libs/wasp/how-to/parallelise_load.md)
   - [Debug Loki errors](./libs/wasp/how-to/debug_loki_errors.md)
   - [Havoc](./libs/havoc.md)
```
book/src/libs/wasp/benchspy/first_test.md

Lines changed: 99 additions & 0 deletions (new file)
# BenchSpy - Your first test

Let's start with the simplest case, which doesn't require any of the observability stack, only `WASP` and the application you are testing.
`BenchSpy` comes with some built-in `QueryExecutors`, each of which additionally has predefined metrics that you can use. One of these executors is the
`GeneratorQueryExecutor`, which fetches metrics directly from `WASP` generators.

Our first test will follow this logic:
* Run a simple load test
* Generate the performance report and store it
* Run the load again
* Generate a new report and compare it to the previous one

We will use some very simplified assertions, chosen only for the sake of example, and expect the performance to remain unchanged.

Let's start by defining and running a generator that uses a mocked service:
```go
gen, err := wasp.NewGenerator(&wasp.Config{
	T:           t,
	GenName:     "vu",
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)
gen.Run(true)
```

Now that we have load data, let's generate a baseline performance report and store it in local storage:
```go
fetchCtx, cancelFn := context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

baseLineReport, err := benchspy.NewStandardReport(
	// random hash; this should be a commit or hash of the Application Under Test (AUT)
	"e7fc5826a572c09f8b93df3b9f674113372ce924",
	// use built-in queries for an executor that fetches data directly from the WASP generator
	benchspy.WithStandardQueryExecutorType(benchspy.StandardQueryExecutor_Generator),
	// WASP generators
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create original report")

fetchErr := baseLineReport.FetchData(fetchCtx)
require.NoError(t, fetchErr, "failed to fetch data for original report")

path, storeErr := baseLineReport.Store()
require.NoError(t, storeErr, "failed to store current report", path)
```

> [!NOTE]
> There's quite a lot to unpack here, and you are encouraged to read more about the built-in `QueryExecutors` and
> the standard metrics each comes with [here](./built_in_query_executors.md), and about the `StandardReport` [here](./standard_report.md).
>
> For now, it's enough to know that the standard metrics that `StandardQueryExecutor_Generator` comes with are the following:
> * median latency
> * p95 latency (95th percentile)
> * error rate

With the baseline report ready, let's run the load test again, but this time let's use a wrapper function
that will automatically load the previous report, generate a new one, and make sure the two are actually comparable.
```go
// define a new generator using the same config values
newGen, err := wasp.NewGenerator(&wasp.Config{
	T:           t,
	GenName:     "vu",
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)

// run the load
newGen.Run(true)

fetchCtx, cancelFn = context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

// currentReport is the newly generated report; previousReport is the baseLineReport loaded from storage
currentReport, previousReport, err := benchspy.FetchNewStandardReportAndLoadLatestPrevious(
	fetchCtx,
	"e7fc5826a572c09f8b93df3b9f674113372ce925",
	benchspy.WithStandardQueryExecutorType(benchspy.StandardQueryExecutor_Generator),
	benchspy.WithGenerators(newGen),
)
require.NoError(t, err, "failed to fetch current report or load the previous one")
```

> [!NOTE]
> In a real-world case, once you have the first report generated, you should only need the
> `benchspy.FetchNewStandardReportAndLoadLatestPrevious` function.

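If you're curious what such a comparison might look like, here's a minimal sketch. The `MustAllGeneratorResults` caster and the flat `metric name -> float64` result shape are assumptions made purely for illustration (the actual casting helpers are covered in the next chapter), and the 1% threshold is arbitrary:

```go
// hypothetical convenience caster, for illustration only;
// the real helper for generator results is introduced in the next chapter
currentAsFloats := benchspy.MustAllGeneratorResults(currentReport)
previousAsFloats := benchspy.MustAllGeneratorResults(previousReport)

// assume median latency shouldn't drift by more than 1% between runs (illustrative threshold)
currentMedian := currentAsFloats[string(benchspy.MedianLatency)]
previousMedian := previousAsFloats[string(benchspy.MedianLatency)]
assert.InDelta(t, previousMedian, currentMedian, previousMedian*0.01, "median latency drifted by more than 1%")
```
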
Okay, so now we have two reports. That's great, but how do we make sure that the application's performance is as expected?
You'll find out in the [next chapter](./first_test_comparison.md).
book/src/libs/wasp/benchspy/getting_started.md

Lines changed: 14 additions & 0 deletions (new file)

# BenchSpy - Getting started

All of the following examples assume that you have access to the following applications:
* Grafana
* Loki
* Prometheus

> [!NOTE]
> The easiest way to run them locally is by using CTFv2's [observability stack](../../../framework/observability/observability_stack.md).
> Just remember to install the `CTF CLI` first, as described in the [CTFv2 Getting Started](../../../framework/getting_started.md) chapter.

Since BenchSpy is tightly coupled with WASP, it's highly recommended that you [get familiar with WASP first](../overview.md), if you haven't yet.

Ready? [Let's go!](./first_test.md)
book/src/libs/wasp/benchspy/loki_custom.md

Lines changed: 38 additions & 0 deletions (new file)
# BenchSpy - Custom Loki metrics

In this chapter we will see how to use custom LogQL queries in the performance report. For this more advanced use case,
we will need to compose the performance report manually.

The load-generation part is the same as in the standard Loki metrics example and thus will be skipped.

Let's define two illustrative metrics now:
* `vu_over_time`: the rate of virtual users generated by WASP, over a 10-second window
* `responses_over_time`: the number of the AUT's responses, over a 1-second window

```go
lokiQueryExecutor := benchspy.NewLokiQueryExecutor(
	map[string]string{
		"vu_over_time":        fmt.Sprintf("max_over_time({branch=~\"%s\", commit=~\"%s\", go_test_name=~\"%s\", test_data_type=~\"stats\", gen_name=~\"%s\"} | json | unwrap current_instances [10s]) by (node_id, go_test_name, gen_name)", label, label, t.Name(), gen.Cfg.GenName),
		"responses_over_time": fmt.Sprintf("sum(count_over_time({branch=~\"%s\", commit=~\"%s\", go_test_name=~\"%s\", test_data_type=~\"responses\", gen_name=~\"%s\"} [1s])) by (node_id, go_test_name, gen_name)", label, label, t.Name(), gen.Cfg.GenName),
	},
	gen.Cfg.LokiConfig,
)
```

> [!NOTE]
> These LogQL queries use the standard labels that `WASP` attaches when sending data to Loki.

Now let's create a `StandardReport` using our custom queries:
```go
baseLineReport, err := benchspy.NewStandardReport(
	"2d1fa3532656c51991c0212afce5f80d2914e34e",
	// notice the different functional option used to pass custom executors
	benchspy.WithQueryExecutors(lokiQueryExecutor),
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create baseline report")
```

The rest of the code remains basically unchanged, apart from the names of the metrics we assert on, as sketched below. You can find the full example [here](...).
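
For illustration, asserting on one of our custom metrics could look roughly like this, reusing the convenience helpers shown in the standard Loki example (the 1% threshold is arbitrary):

```go
currentAsStringSlice := benchspy.MustAllLokiResults(currentReport)
previousAsStringSlice := benchspy.MustAllLokiResults(previousReport)

// convert the raw Loki results for our custom metric to floats
currentFloats, err := benchspy.StringSliceToFloat64Slice(currentAsStringSlice["responses_over_time"])
require.NoError(t, err, "failed to convert responses_over_time to float64 slice")
previousFloats, err := benchspy.StringSliceToFloat64Slice(previousAsStringSlice["responses_over_time"])
require.NoError(t, err, "failed to convert responses_over_time to float64 slice")

// compare medians of the two runs; 1% is an illustrative threshold
currentMedian := benchspy.CalculatePercentile(currentFloats, 0.5)
previousMedian := benchspy.CalculatePercentile(previousFloats, 0.5)
assert.InDelta(t, previousMedian, currentMedian, math.Abs(previousMedian)*0.01, "responses_over_time medians differ by more than 1%")
```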

Now it's time to look at the last of the bundled `QueryExecutors`. Proceed to the [next chapter to read about Prometheus](./prometheus_std.md).
book/src/libs/wasp/benchspy/loki_std.md

Lines changed: 104 additions & 0 deletions (new file)
# BenchSpy - Standard Loki metrics

> [!NOTE]
> This example assumes you have access to Loki and Grafana instances. If you don't,
> find out how to launch them using CTFv2's [observability stack](../../../framework/observability/observability_stack.md).

Our Loki example will differ from the previous one in just a couple of details:
* the generator will have a Loki config
* the standard query executor type will be `benchspy.StandardQueryExecutor_Loki`
* we will cast all results to `[]string`
* and calculate medians for all metrics

Ready?

Let's define the new load generation first:
```go
label := "benchspy-std"

gen, err := wasp.NewGenerator(&wasp.Config{
	T: t,
	// read Loki config from environment
	LokiConfig: wasp.NewEnvLokiConfig(),
	GenName:    "vu",
	// set unique labels
	Labels: map[string]string{
		"branch": label,
		"commit": label,
	},
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)
```

Now let's run the generator and save the baseline report:
```go
gen.Run(true)

fetchCtx, cancelFn := context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

baseLineReport, err := benchspy.NewStandardReport(
	"c2cf545d733eef8bad51d685fcb302e277d7ca14",
	// notice the different standard executor type
	benchspy.WithStandardQueryExecutorType(benchspy.StandardQueryExecutor_Loki),
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create original report")

fetchErr := baseLineReport.FetchData(fetchCtx)
require.NoError(t, fetchErr, "failed to fetch data for original report")

path, storeErr := baseLineReport.Store()
require.NoError(t, storeErr, "failed to store current report", path)
```

Since the next steps are very similar to the ones used in the first test, we will skip them and jump straight
to the metrics comparison.

By default, `LokiQueryExecutor` returns results as `[]string`, so let's use the dedicated convenience function
to cast them from `interface{}` to string slices:
```go
currentAsStringSlice := benchspy.MustAllLokiResults(currentReport)
previousAsStringSlice := benchspy.MustAllLokiResults(previousReport)
```

And finally, it's time to compare metrics. Since we have `[]string` values, we will first convert them to `[]float64`,
then calculate the median and assume it hasn't changed by more than 1%. Again, remember that this is just an illustration;
you should decide for yourself what the best way to assert the metrics is.

```go
var compareMedian = func(metricName string) {
	require.NotEmpty(t, currentAsStringSlice[metricName], "%s results were missing from current report", metricName)
	require.NotEmpty(t, previousAsStringSlice[metricName], "%s results were missing from previous report", metricName)

	currentFloatSlice, err := benchspy.StringSliceToFloat64Slice(currentAsStringSlice[metricName])
	require.NoError(t, err, "failed to convert %s results to float64 slice", metricName)
	currentMedian := benchspy.CalculatePercentile(currentFloatSlice, 0.5)

	previousFloatSlice, err := benchspy.StringSliceToFloat64Slice(previousAsStringSlice[metricName])
	require.NoError(t, err, "failed to convert %s results to float64 slice", metricName)
	previousMedian := benchspy.CalculatePercentile(previousFloatSlice, 0.5)

	var diffPercentage float64
	if previousMedian != 0 {
		diffPercentage = (currentMedian - previousMedian) / previousMedian * 100
	} else {
		diffPercentage = currentMedian * 100
	}
	assert.LessOrEqual(t, math.Abs(diffPercentage), 1.0, "%s medians are more than 1%% different (%.4f%%)", metricName, diffPercentage)
}

compareMedian(string(benchspy.MedianLatency))
compareMedian(string(benchspy.Percentile95Latency))
compareMedian(string(benchspy.ErrorRate))
```

We have used standard metrics, which are the same as in the first test. Now let's see how you can use your own custom LogQL queries.

You can find the full example [here](...).
book/src/libs/wasp/benchspy/overview.md

Lines changed: 15 additions & 0 deletions (new file)

# BenchSpy

BenchSpy (short for benchmark spy) is a WASP-coupled tool that allows for easy comparison of various performance metrics.
It supports three types of data sources:
* `Loki`
* `Prometheus`
* `WASP generators`

It can also be easily extended to support additional ones.

Since its main goal is comparing performance between various releases or versions of an application (for example, to catch performance degradation),
it is `Git`-aware and able to automatically find the latest relevant performance report.

It doesn't come with any comparison logic beyond making sure that performance reports are comparable (e.g. that they measure the same metrics in the same way),
leaving total freedom to the user.
book/src/libs/wasp/benchspy/prometheus_custom.md

Lines changed: 81 additions & 0 deletions (new file)
# BenchSpy - Custom Prometheus metrics

Similarly to what we did with Loki, we can use custom metrics with Prometheus.

Most of the code is the same as in the previous example. The differences start with the need to manually
create a `PrometheusQueryExecutor` with our custom queries:

```go
// no need to pass a name regexp pattern,
// since we provide the names directly in the custom queries
promConfig := benchspy.NewPrometheusConfig()

customPrometheus, err := benchspy.NewPrometheusQueryExecutor(
	map[string]string{
		// scalar value
		"95p_cpu_all_containers": "scalar(quantile(0.95, rate(container_cpu_usage_seconds_total{name=~\"node[^0]\"}[5m])) * 100)",
		// matrix value
		"cpu_rate_by_container": "rate(container_cpu_usage_seconds_total{name=~\"node[^0]\"}[1m])[30m:1m]",
	},
	*promConfig,
)
require.NoError(t, err, "failed to create Prometheus query executor")
```

Then we pass it to the report as a custom query executor:
```go
baseLineReport, err := benchspy.NewStandardReport(
	"91ee9e3c903d52de12f3d0c1a07ac3c2a6d141fb",
	benchspy.WithQueryExecutors(customPrometheus),
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create baseline report")
```

> [!NOTE]
> Notice that when using custom Prometheus queries we don't need to pass the `PrometheusConfig`
> to `NewStandardReport()`, because we have already set it when creating the `PrometheusQueryExecutor`.

Fetching the current and previous reports remains unchanged, and so does casting the Prometheus metrics
to their specific type:
```go
currentAsValues := benchspy.MustAllPrometheusResults(currentReport)
previousAsValues := benchspy.MustAllPrometheusResults(previousReport)

assert.Equal(t, len(currentAsValues), len(previousAsValues), "number of metrics in results should be the same")
```

But now comes another difference. All standard query results were instances of `model.Vector`, while our two custom queries
introduce two new types:
* `model.Matrix`
* `*model.Scalar`

These differences are reflected in the further casting we do before getting the final metrics:
```go
current95CPUUsage := currentAsValues["95p_cpu_all_containers"]
previous95CPUUsage := previousAsValues["95p_cpu_all_containers"]

assert.Equal(t, current95CPUUsage.Type(), previous95CPUUsage.Type(), "types of metrics should be the same")
assert.IsType(t, &model.Scalar{}, current95CPUUsage, "current metric should be a scalar")

currentCPUByContainer := currentAsValues["cpu_rate_by_container"]
previousCPUByContainer := previousAsValues["cpu_rate_by_container"]

assert.Equal(t, currentCPUByContainer.Type(), previousCPUByContainer.Type(), "types of metrics should be the same")
assert.IsType(t, model.Matrix{}, currentCPUByContainer, "current metric should be a matrix")

currentCPUByContainerAsMatrix := currentCPUByContainer.(model.Matrix)
previousCPUByContainerAsMatrix := previousCPUByContainer.(model.Matrix)

assert.Equal(t, len(currentCPUByContainerAsMatrix), len(previousCPUByContainerAsMatrix), "number of series in matrices should be the same")
```
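
To then assert on the scalar metric itself, we could, for example, cast both values to `*model.Scalar` and compare them numerically. This is just a sketch, and the 10-percentage-point threshold is purely illustrative:

```go
currentScalar := current95CPUUsage.(*model.Scalar)
previousScalar := previous95CPUUsage.(*model.Scalar)

// model.SampleValue is a float64 under the hood, so we can compare numerically;
// the threshold here is purely illustrative
diff := math.Abs(float64(currentScalar.Value) - float64(previousScalar.Value))
assert.LessOrEqual(t, diff, 10.0, "95th percentile CPU usage differs by more than 10 percentage points")
```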

> [!NOTE]
> When casting to Prometheus' concrete types, it's crucial to remember that two of them implement `model.Value` with pointer receivers
> and the other two with value receivers, so type assertions must use the pointer form for the former and the value form for the latter.
>
> Pointer receivers:
> * `*model.String`
> * `*model.Scalar`
>
> Value receivers:
> * `model.Vector`
> * `model.Matrix`
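
If you'd rather not memorize which is which, a type switch over `model.Value` handles all four variants explicitly. A minimal sketch:

```go
// describeValue returns a short, human-readable summary of any Prometheus query result
func describeValue(v model.Value) string {
	switch val := v.(type) {
	case model.Vector:
		return fmt.Sprintf("vector with %d samples", len(val))
	case model.Matrix:
		return fmt.Sprintf("matrix with %d series", len(val))
	case *model.Scalar:
		return fmt.Sprintf("scalar: %f", float64(val.Value))
	case *model.String:
		return fmt.Sprintf("string: %s", val.Value)
	default:
		return fmt.Sprintf("unknown result type: %T", v)
	}
}
```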
