
Commit d415e31

authored
[TT-1741] performance comparison tool (#1424)
benchspy
1 parent ef26f18 commit d415e31


58 files changed (+9311, -303 lines)

.github/workflows/generate-go-docs.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ jobs:
       GOPRIVATE: github.com/smartcontractkit/generate-go-function-docs
     run: |
       git config --global url."https://x-access-token:${{ steps.setup-github-token-read.outputs.access-token }}@github.com/".insteadOf "https://github.com/"
-      go install github.com/smartcontractkit/[email protected].1
+      go install github.com/smartcontractkit/[email protected].2
       go install github.com/jmank88/[email protected]
       go install golang.org/x/tools/gopls@latest
```
Lines changed: 27 additions & 0 deletions
```yaml
name: WASP's BenchSpy Go Tests
on: [push]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  test:
    defaults:
      run:
        working-directory: wasp
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: dorny/paths-filter@v3
        id: changes
        with:
          filters: |
            src:
              - 'wasp/benchspy/**'
      - uses: cachix/install-nix-action@08dcb3a5e62fa31e2da3d490afc4176ef55ecd72 # v30
        if: steps.changes.outputs.src == 'true'
        with:
          nix_path: nixpkgs=channel:nixos-unstable
      - name: Run tests
        if: steps.changes.outputs.src == 'true'
        run: |-
          nix develop -c make test_benchspy_race
```

.github/workflows/wasp-test.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@ jobs:
     defaults:
       run:
         working-directory: wasp
-    runs-on: ubuntu-latest
+    runs-on: ubuntu22.04-16cores-64GB
     steps:
       - uses: actions/checkout@v3
       - uses: dorny/paths-filter@v3
```

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -19,6 +19,7 @@ artifacts/
 
 # Output of the go coverage tool, specifically when used with LiteIDE
 *.out
+cover.html
 
 # Dependency directories (remove the comment below to include it)
 # vendor/
```

.nancy-ignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -12,4 +12,5 @@ CVE-2024-24786 # CWE-835 Loop with Unreachable Exit Condition ('Infinite Loop')
 CVE-2024-32972 # CWE-400: Uncontrolled Resource Consumption ('Resource Exhaustion') [still not fixed, not even in v1.13.8]
 CVE-2023-42319 # CWE-noinfo: lol... go-ethereum v1.13.8 again
 CVE-2024-10086 # Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')
-CVE-2024-51744 # CWE-755: Improper Handling of Exceptional Conditions
+CVE-2024-51744 # CWE-755: Improper Handling of Exceptional Conditions
+CVE-2024-45338 # CWE-770: Allocation of Resources Without Limits or Throttling
```

book/src/SUMMARY.md

Lines changed: 17 additions & 0 deletions
```diff
@@ -69,6 +69,23 @@
   - [Profile](./libs/wasp/components/profile.md)
   - [Sampler](./libs/wasp/components/sampler.md)
   - [Schedule](./libs/wasp/components/schedule.md)
+- [BenchSpy](./libs/wasp/benchspy/overview.md)
+  - [Getting started](./libs/wasp/benchspy/getting_started.md)
+  - [Your first test](./libs/wasp/benchspy/first_test.md)
+  - [Simplest metrics](./libs/wasp/benchspy/simplest_metrics.md)
+  - [Standard Loki metrics](./libs/wasp/benchspy/loki_std.md)
+  - [Custom Loki metrics](./libs/wasp/benchspy/loki_custom.md)
+  - [Standard Prometheus metrics](./libs/wasp/benchspy/prometheus_std.md)
+  - [Custom Prometheus metrics](./libs/wasp/benchspy/prometheus_custom.md)
+  - [To Loki or not to Loki?](./libs/wasp/benchspy/loki_dillema.md)
+  - [Real world example](./libs/wasp/benchspy/real_world.md)
+  - [Reports](./libs/wasp/benchspy/reports/overview.md)
+    - [Standard Report](./libs/wasp/benchspy/reports/standard_report.md)
+    - [Adding new QueryExecutor](./libs/wasp/benchspy/reports/new_executor.md)
+    - [Adding new standard load metric]()
+    - [Adding new standard resource metric]()
+    - [Defining a new report](./libs/wasp/benchspy/reports/new_report.md)
+    - [Adding new storage]()
 - [How to](./libs/wasp/how-to/overview.md)
   - [Start local observability stack](./libs/wasp/how-to/start_local_observability_stack.md)
   - [Try it out quickly](./libs/wasp/how-to/run_included_tests.md)
```
Lines changed: 114 additions & 0 deletions
# BenchSpy - Your First Test

Let's start with the simplest case, which doesn't require any part of the observability stack: only `WASP` and the application you are testing. `BenchSpy` comes with built-in `QueryExecutors`, each of which has predefined metrics you can use. One of these is the `DirectQueryExecutor`, which fetches metrics directly from `WASP` generators, which means you can run it without Loki.

> [!NOTE]
> Not sure whether to use the `Loki` or `Direct` query executor? [Read this!](./loki_dillema.md)

## Test Overview

Our first test will follow this logic:
- Run a simple load test.
- Generate a performance report and store it.
- Run the load test again.
- Generate a new report and compare it to the previous one.

We'll use very simplified assertions for this example and expect performance to remain unchanged.

### Step 1: Define and Run a Generator

Let's start by defining and running a generator that uses a mocked service:

```go
gen, err := wasp.NewGenerator(&wasp.Config{
	T:           t,
	GenName:     "vu",
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)
gen.Run(true)
```

### Step 2: Generate a Baseline Performance Report

With load data available, let's generate a baseline performance report and store it in local storage:

```go
baseLineReport, err := benchspy.NewStandardReport(
	// a placeholder version tag; in practice this should be the commit SHA or release tag of the Application Under Test (AUT)
	"v1.0.0",
	// use built-in queries for an executor that fetches data directly from the WASP generator
	benchspy.WithStandardQueries(benchspy.StandardQueryExecutor_Direct),
	// WASP generators
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create baseline report")

fetchCtx, cancelFn := context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

fetchErr := baseLineReport.FetchData(fetchCtx)
require.NoError(t, fetchErr, "failed to fetch data for baseline report")

path, storeErr := baseLineReport.Store()
require.NoError(t, storeErr, "failed to store baseline report", path)
```

> [!NOTE]
> There's a lot to unpack here; you're encouraged to read more about the built-in `QueryExecutors`, the standard metrics they provide, and the `StandardReport` [here](./reports/standard_report.md).
>
> For now, it's enough to know that the standard metrics provided by `StandardQueryExecutor_Direct` include:
> - Median latency
> - P95 latency (95th percentile)
> - Max latency
> - Error rate

### Step 3: Run the Test Again and Compare Reports

With the baseline report ready, let's run the load test again. This time, we'll use a wrapper function that automatically loads the previous report, generates a new one, and ensures the two are comparable.

```go
// define a new generator using the same config values
newGen, err := wasp.NewGenerator(&wasp.Config{
	T:           t,
	GenName:     "vu",
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)

// run the load
newGen.Run(true)

fetchCtx, cancelFn = context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

// currentReport is the report for the run we just executed;
// previousReport is the baseline (baseLineReport) loaded from storage
currentReport, previousReport, err := benchspy.FetchNewStandardReportAndLoadLatestPrevious(
	fetchCtx,
	// commit or tag of the new application version
	"v2.0.0",
	benchspy.WithStandardQueries(benchspy.StandardQueryExecutor_Direct),
	benchspy.WithGenerators(newGen),
)
require.NoError(t, err, "failed to fetch current report or load the previous one")
```

> [!NOTE]
> In a real-world case, once you've generated the first report, you should only need the `benchspy.FetchNewStandardReportAndLoadLatestPrevious` function.

### What's Next?

Now that we have two reports, how do we ensure that the application's performance meets expectations? Find out in the [next chapter](./simplest_metrics.md).
Lines changed: 14 additions & 0 deletions
# BenchSpy - Getting Started

The following examples assume you have access to these applications:
- Grafana
- Loki
- Prometheus

> [!NOTE]
> The easiest way to run them locally is with CTFv2's [observability stack](../../../framework/observability/observability_stack.md).
> Be sure to install the `CTF CLI` first, as described in the [CTFv2 Getting Started](../../../framework/getting_started.md) guide.

Since BenchSpy is tightly coupled with WASP, we highly recommend that you [get familiar with it first](../overview.md) if you haven't already.

Ready? [Let's get started!](./first_test.md)
Lines changed: 47 additions & 0 deletions
# BenchSpy - Custom Loki Metrics

In this chapter, we'll explore how to use custom `LogQL` queries in the performance report. For this more advanced use case, we'll compose the performance report manually.

The load-generation part is the same as in the standard Loki metrics example, so it is skipped here.

## Defining Custom Metrics

Let's define two illustrative metrics:
- **`vu_over_time`**: the rate of virtual users generated by WASP, using a 10-second window.
- **`responses_over_time`**: the number of the AUT's responses, using a 1-second window.

```go
lokiQueryExecutor := benchspy.NewLokiQueryExecutor(
	map[string]string{
		"vu_over_time":        fmt.Sprintf("max_over_time({branch=~\"%s\", commit=~\"%s\", go_test_name=~\"%s\", test_data_type=~\"stats\", gen_name=~\"%s\"} | json | unwrap current_instances [10s]) by (node_id, go_test_name, gen_name)", label, label, t.Name(), gen.Cfg.GenName),
		"responses_over_time": fmt.Sprintf("sum(count_over_time({branch=~\"%s\", commit=~\"%s\", go_test_name=~\"%s\", test_data_type=~\"responses\", gen_name=~\"%s\"} [1s])) by (node_id, go_test_name, gen_name)", label, label, t.Name(), gen.Cfg.GenName),
	},
	gen.Cfg.LokiConfig,
)
```

> [!NOTE]
> These `LogQL` queries use the standard labels that `WASP` applies when sending data to Loki.

## Creating a `StandardReport` with Custom Queries

Now, let's create a `StandardReport` using our custom queries:

```go
baseLineReport, err := benchspy.NewStandardReport(
	"v1.0.0",
	// notice the different functional option used to pass a Loki executor with custom queries
	benchspy.WithQueryExecutors(lokiQueryExecutor),
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create baseline report")
```

## Wrapping Up

The rest of the code remains unchanged, except for the names of the metrics being asserted.

Now it's time to look at the last of the bundled `QueryExecutors`. Proceed to the [next chapter to read about Prometheus](./prometheus_std.md).

> [!NOTE]
> You can find the full example [here](https://github.com/smartcontractkit/chainlink-testing-framework/tree/main/wasp/examples/benchspy/loki_query_executor/loki_query_executor_test.go).
Lines changed: 39 additions & 0 deletions
# BenchSpy - To Loki or Not to Loki?

You might be wondering whether to use the `Loki` or the `Direct` query executor if all you need are basic latency metrics.

## Rule of Thumb

Opt for the `Direct` query executor if all you need is a single number, such as the median latency or error rate, and you're not interested in:
- comparing time series directly,
- examining minimum or maximum values over time, or
- performing advanced calculations on the raw data.

## Why Choose `Direct`?

The `Direct` executor returns a single value for each standard metric using the same raw data that Loki would use. It accesses data stored in the `WASP` generator, which is later pushed to Loki.

This means you can:
- Run your load test without a Loki instance.
- Avoid calculating metrics like the median, 95th percentile latency, or error ratio yourself.

By using `Direct`, you save resources and simplify the process when advanced analysis isn't required.

> [!WARNING]
> Metrics calculated by the two query executors may differ slightly due to differences in their data processing and calculation methods:
> - **`Direct` QueryExecutor**: processes every individual data point in the raw dataset, so all values are taken into account for calculations like averages, percentiles, or other statistics. It provides the most granular and precise results, but may also be more sensitive to outliers and noise in the data.
> - **`Loki` QueryExecutor**: aggregates data using a default window size of 10 seconds. Within each window, multiple raw data points are combined (e.g., through averaging, summing, or other aggregation functions), which reduces the granularity of the dataset. While this approach can improve performance and reduce noise, it also smooths the data, which may obscure outliers or small-scale variability.

> #### Why This Matters for Percentiles:
> Percentiles, such as the 95th percentile (p95), are particularly sensitive to the granularity of the input data:
> - In the **`Direct` QueryExecutor**, the p95 is calculated across all raw data points, capturing the true variability of the dataset, including any extreme values or spikes.
> - In the **`Loki` QueryExecutor**, the p95 is calculated over aggregated data (i.e., using the 10-second window). As a result, the raw values within each window are smoothed into a single representative value, potentially lowering or altering the calculated p95. For example, an outlier that would significantly affect the p95 in the `Direct` calculation might be averaged out in the `Loki` window, leading to a slightly lower percentile value.
> #### Direct caveats:
> - **buffer limitations:** `WASP` generators use a fixed-size [StringBuffer](https://github.com/smartcontractkit/chainlink-testing-framework/blob/main/wasp/buffer.go) to store responses. Once full capacity is reached, the oldest entries are replaced with incoming ones. The buffer size can be set in the generator's config; by default it is limited to 50k entries to reduce resource consumption and the risk of OOMs.
>
> - **sampling:** `WASP` generators support optional sampling of successful responses. It is disabled by default, but if you enable it, calculations will no longer be performed over the full dataset.

> #### Key Takeaway:
> The difference arises because `Direct` prioritizes precision by using raw data, while `Loki` prioritizes efficiency and scalability by using aggregated data. When interpreting results, consider how the smoothing effect of `Loki` might impact the representation of variability or extremes in the dataset. This is especially important for metrics like percentiles, where such details can significantly influence the outcome.
