Commit 7c8b651

Add a new benchtool client (#164)
* feat: initial working version of benchtool
* fix(benchtool): register batch-size as default
* fix(benchtool): ensure write loop actually loops
* feat(benchtool): add logging and instrumentation
* fix(benchtool): handle case where batchsize is bigger than timeseries
* fix(benchtool): instantiate client map
* chore(benchtool): make noisy log line debug level
* chore(benchtool): remove ioutil
* chore(benchtool): use Go 1.16
* feat(benchtool): instrument remote-write client
* fix(benchtool): fix how metrics are instrumented
* feat(benchtool): use resp.StatusCode in prom label
* feat(benchtool): ensure labelset is used correctly
* feat(benchtool): add logging config options
* add series type map to workload
* add ring-checking module
* fix issue with null workload
* remove early check
* actual unique metric names
* start ring
* actual unique metric names
* single worker at a time for remote write requests
* reduce the number of concurrent writers and set default timeout equal to interval
* increase the number of workers
* remove work pool and use errgroup
* use a closure
* readd a worker channel
* refactor bench runner into separate files
* rename write_bench.go --> bench.go
* start queries
* stop point for query workload
* add random seed for query workload using adler32 checksum and id
* improve further
* fix no address panic
* fix address resolution
* fix address resolution
* fix address resolution
* add jitter to queries
* better buckets
* increase parallel queries
* move write configs into workload config file
* add support for tenant ID
* fix query tenant flag
* use instance-name flag for tenant name
* fix duplicate flag registration bug
* cleanup a bit
* increase number of write workers
* docs(benchtool): add documentation explaining the workload configuration
* remove unused workload config option
* remove extra docker tag push steps for local development
* update README
* remove from make all step to avoid Go 1.16 compat issue
* fix linting issues
* vary the number of writer workers by the number of configured replicas
* add pprof
* refactor per PR comments

Each of the commits above carries the trailer: Signed-off-by: Jacob Lisi <[email protected]>
1 parent ebeed6b commit 7c8b651

21 files changed: +1780 -1 lines changed


Makefile

Lines changed: 10 additions & 1 deletion
@@ -8,12 +8,17 @@ GIT_BRANCH := $(shell git rev-parse --abbrev-ref HEAD)
 GO_FLAGS := -mod=vendor -ldflags "-extldflags \"-static\" -s -w -X $(VPREFIX).Branch=$(GIT_BRANCH) -X $(VPREFIX).Version=$(IMAGE_TAG) -X $(VPREFIX).Revision=$(GIT_REVISION)" -tags netgo

 all: cortextool chunktool logtool
-images: cortextool-image chunktool-image logtool-image
+images: cortextool-image chunktool-image logtool-image benchtool-image
+benchtool: cmd/benchtool/benchtool
 cortextool: cmd/cortextool/cortextool
 chunktool: cmd/chunktool/chunktool
 logtool: cmd/logtool/logtool
 e2ealerting: cmd/e2ealerting/e2ealerting

+benchtool-image:
+	$(SUDO) docker build -t $(IMAGE_PREFIX)/benchtool -f cmd/benchtool/Dockerfile .
+	$(SUDO) docker tag $(IMAGE_PREFIX)/benchtool $(IMAGE_PREFIX)/benchtool:$(IMAGE_TAG)
+
 cortextool-image:
 	$(SUDO) docker build -t $(IMAGE_PREFIX)/cortextool -f cmd/cortextool/Dockerfile .
 	$(SUDO) docker tag $(IMAGE_PREFIX)/cortextool $(IMAGE_PREFIX)/cortextool:$(IMAGE_TAG)
@@ -32,6 +37,9 @@ e2ealerting-image:
 push-e2ealerting-image: e2ealerting-image
 	$(SUDO) docker push $(IMAGE_PREFIX)/e2ealerting:$(IMAGE_TAG)

+cmd/benchtool/benchtool: $(APP_GO_FILES) cmd/benchtool/main.go
+	CGO_ENABLED=0 go build $(GO_FLAGS) -o $@ ./$(@D)
+
 cmd/cortextool/cortextool: $(APP_GO_FILES) cmd/cortextool/main.go
 	CGO_ENABLED=0 go build $(GO_FLAGS) -o $@ ./$(@D)

@@ -60,6 +68,7 @@ test:
 	go test -mod=vendor -p=8 ./pkg/...

 clean:
+	rm -rf cmd/benchtool/benchtool
 	rm -rf cmd/cortextool/cortextool
 	rm -rf cmd/chunktool/chunktool
 	rm -rf cmd/logtool/logtool

README.md

Lines changed: 4 additions & 0 deletions
@@ -284,7 +284,11 @@ Timestamp TraceID Length Duration
 2020-05-26 13:52:15.771988849 -0400 EDT 2e7473ab10160630 10h33m0s 7.472855362s (200) /api/prom/api/v1/query_range
 2020-05-26 13:53:46.712563497 -0400 EDT 761f3221dcdd85de 10h33m0s 11.874296689s (200) /api/prom/api/v1/query_range
 ```
+## benchtool

+A tool for benchmarking a Prometheus remote-write backend and PromQL compatible
+API. It allows for metrics to be generated using a [workload
+file](docs/benchtool.md).

 ### License

cmd/benchtool/Dockerfile

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+FROM golang:1.16.3-stretch as build
+ARG GOARCH="amd64"
+COPY . /build_dir
+WORKDIR /build_dir
+ENV GOPROXY=https://proxy.golang.org
+RUN make clean && make benchtool
+
+FROM alpine:3.13
+RUN apk add --update --no-cache ca-certificates
+COPY --from=build /build_dir/cmd/benchtool/benchtool /usr/bin/benchtool
+EXPOSE 80
+ENTRYPOINT [ "/usr/bin/benchtool" ]

cmd/benchtool/main.go

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+package main
+
+import (
+	"context"
+	"flag"
+	"net/http"
+	_ "net/http/pprof"
+	"os"
+	"os/signal"
+
+	"github.com/cortexproject/cortex/pkg/util/flagext"
+	logutil "github.com/cortexproject/cortex/pkg/util/log"
+	"github.com/go-kit/kit/log/level"
+	"github.com/prometheus/client_golang/prometheus"
+	"github.com/prometheus/client_golang/prometheus/promhttp"
+	"github.com/weaveworks/common/logging"
+
+	"github.com/grafana/cortex-tools/pkg/bench"
+)
+
+var (
+	benchConfig     bench.Config
+	LogLevelConfig  logging.Level
+	LogFormatConfig logging.Format
+)
+
+func main() {
+	flagext.RegisterFlags(&benchConfig, &LogLevelConfig, &LogFormatConfig)
+	flag.Parse()
+
+	logger, err := logutil.NewPrometheusLogger(LogLevelConfig, LogFormatConfig)
+	if err != nil {
+		level.Error(logger).Log("msg", "error initializing logger", "err", err)
+		os.Exit(1)
+	}
+
+	benchmarkRunner, err := bench.NewBenchRunner(benchConfig, logger, prometheus.DefaultRegisterer)
+	if err != nil {
+		level.Error(logger).Log("msg", "error initializing benchmarker", "err", err)
+		os.Exit(1)
+	}
+
+	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
+	defer stop()
+
+	go func() {
+		http.Handle("/metrics", promhttp.Handler())
+		panic(http.ListenAndServe(":80", nil))
+	}()
+
+	level.Info(logger).Log("msg", "starting benchmarker")
+	err = benchmarkRunner.Run(ctx)
+	if err != nil {
+		level.Error(logger).Log("msg", "benchmarker failed", "err", err)
+		os.Exit(1)
+	}
+}

docs/benchtool.md

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
+# Benchtool
+
+The `benchtool` is a small load testing utility to generate load for Prometheus
+remote-write and query API endpoints. It uses a statically configured YAML
+workload file to describe the series and queries used to generate the load.
+
+**Warning:** the specifications outlined here are subject to change. It is
+likely the specs in this file will require some adjustment to accommodate
+expanded functionality, and they should not be considered stable.
+
+## Workload file
+
+The workload file can be configured as follows:
+
+```yaml
+replicas: "<int>"
+queries:
+  - expr_template: "<string>"
+    interval: "<duration>"
+    num_queries: "<int>"
+    series_type: "<enum series-type>"
+    regex: "<bool>"
+series:
+  - name: "<string>"
+    type: "<enum series-type>"
+    static_labels:
+      "<string>": "<string>"
+    labels:
+      - name: "<string>"
+        unique_values: "<int>"
+        value_prefix: "<string>"
+write_options:
+  batch_size: "<int>"
+  interval: "<duration>"
+```
+
+### Series
+
+The series configuration section of the workload is used to configure the
+metrics generated by the workload. It is a list of series objects that allow the
+workload author to declare the name and characteristics of each series the workload
+will generate.
+
+- **name** is the metric name of the series that will be generated. The name will
+  be used as the reserved `__name__` label for each series generated using this
+  series config.
+- **type** references the type of series that will be generated and the
+  characteristics of its underlying data. See the [series types
+  section](#series-types) for more information.
+- **static_labels** is a map of string keys to string values. These will be
+  added as labels to each series generated from this series description.
+- **labels** is a field that allows for the input of a list of label descriptions.
+  The series generated from this series description will be based on the result
+  of all possible combinations of the provided label descriptions.
+  - **name** refers to the label name of the generated label.
+  - **unique_values** refers to the number of values to generate for this
+    label.
+  - **value_prefix** refers to the prefix that will be used for each generated
+    label value. The resultant label value will be the prefix appended with
+    `-<integer>`, where the integer is a value between `0` and the configured
+    `unique_values`.
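The label expansion described above is a Cartesian product of the label descriptions. The following is a minimal sketch of that idea, not the code in `pkg/bench`: the `LabelDesc` type and the `expand` function are hypothetical names, and the sketch assumes values are numbered from `0` to `unique_values - 1`, which the documentation does not state explicitly.

```go
package main

import "fmt"

// LabelDesc mirrors one entry of the `labels` list in the workload file.
// The type and its field names are hypothetical, chosen for this sketch.
type LabelDesc struct {
	Name         string
	UniqueValues int
	ValuePrefix  string
}

// expand returns one label set per combination of the described label values.
// Assumption: values are numbered 0..UniqueValues-1.
func expand(descs []LabelDesc) []map[string]string {
	// Start with a single empty label set and extend it one label at a time.
	sets := []map[string]string{{}}
	for _, d := range descs {
		next := make([]map[string]string, 0, len(sets)*d.UniqueValues)
		for _, base := range sets {
			for i := 0; i < d.UniqueValues; i++ {
				ls := make(map[string]string, len(base)+1)
				for k, v := range base {
					ls[k] = v
				}
				ls[d.Name] = fmt.Sprintf("%s-%d", d.ValuePrefix, i)
				next = append(next, ls)
			}
		}
		sets = next
	}
	return sets
}

func main() {
	sets := expand([]LabelDesc{
		{Name: "label_01", UniqueValues: 5, ValuePrefix: "label_value_01"},
		{Name: "label_02", UniqueValues: 20, ValuePrefix: "label_value_02"},
	})
	// 5 * 20 combinations, before any replica label is applied.
	fmt.Println(len(sets))
	fmt.Println(sets[0]["label_01"], sets[0]["label_02"])
}
```

With the two label descriptions used in the example workload file, the sketch yields 100 label sets per series description.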
+
+#### Series types
+
+Series types are used in both the series and queries sections of the workload.
+They denote the character and properties of the data of the generated or queried
+series. The following series types are supported.
+
+- *gauge-zero*: correlates to a gauge series that will constantly be zero. This
+  is a commonly seen pattern in Prometheus info metrics.
+- *gauge-random*: correlates to a gauge series that constantly changes to a random
+  value. The value is chosen using `rand.Float64`.
+- *counter-one*: correlates to a counter that increases by 1 at every interval.
+- *counter-random*: correlates to a counter that increases by a random amount
+  every interval. The random amount is not constant and is currently chosen
+  using `rand.Int()`.
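The four series types can be read as rules for deriving the next sample from the previous one. The sketch below illustrates those rules under that reading; `nextValue` is a hypothetical helper and is not taken from `pkg/bench`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// nextValue returns the next sample value for a series of the given type,
// given its previous value. Hypothetical helper for illustration only.
func nextValue(seriesType string, prev float64, rng *rand.Rand) (float64, error) {
	switch seriesType {
	case "gauge-zero":
		// Always zero, regardless of the previous value.
		return 0, nil
	case "gauge-random":
		// A fresh random value in [0.0, 1.0) on every interval.
		return rng.Float64(), nil
	case "counter-one":
		// Monotonic counter that grows by exactly 1 per interval.
		return prev + 1, nil
	case "counter-random":
		// Monotonic counter that grows by a non-negative random amount.
		return prev + float64(rng.Int()), nil
	default:
		return 0, fmt.Errorf("unknown series type %q", seriesType)
	}
}

func main() {
	rng := rand.New(rand.NewSource(1))
	for _, t := range []string{"gauge-zero", "gauge-random", "counter-one", "counter-random"} {
		v := 0.0
		for i := 0; i < 3; i++ {
			var err error
			if v, err = nextValue(t, v, rng); err != nil {
				fmt.Println("error:", err)
				return
			}
			fmt.Printf("%s sample %d: %g\n", t, i, v)
		}
	}
}
```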
+
+### Global options
+
+- **replicas**: Replicas is meant to be a stand-in for the host label. For each
+  value between 0 and the configured replica value, a `bench_replica` label will
+  be added to each generated series. This label will also be used for
+  queries configured to use regular expressions.
+
+### Queries
+
+The query workload is made up of a list of configured queries. Queries have a
+variety of options that can be configured to get the desired behavior.
+
+- **series_type:** Series type to use for this query. A series name generated
+  from the `series` section of the workload file will be chosen and used for
+  each generated query.
+- **expr_template** is used to configure the query expression that will be run.
+  It uses a modified [Go text template](https://golang.org/pkg/text/template/)
+  to allow for the name of a series and a matcher to be injected into the query.
+  The name will be sourced from the series name and the matcher will be a match
+  on the `bench_replica` label.
+- **regex** is a boolean value that, if enabled, will cause the injected
+  `bench_replica` matcher on the query to be a regex match on a single replica
+  instead of an exact match.
+- **interval:** Interval at which each instantiation of this query description
+  will run. A random jitter no greater than half the configured interval is
+  applied before the first run of each query to ensure that queries are not all
+  scheduled to run simultaneously.
+- **num_queries:** Number of instances of this configured query to run. Each
+  instance of the generated query will select a random series with the same
+  series type. Each query will also apply a unique random jitter on the interval
+  before its first run.
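To make the `expr_template` mechanics concrete, here is a small sketch that renders a template with Go's `text/template` using `<<` and `>>` as delimiters, as in the example workload file. The `renderExpr` function and the `queryVars` type are hypothetical, and the format of the `bench_replica` label value (a plain integer here) is an assumption, since the documentation does not specify it.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// queryVars holds the values injected into an expr_template. The field names
// match the placeholders used in the example workload file.
type queryVars struct {
	Name     string
	Matchers string
}

// renderExpr expands an expr_template using << and >> as delimiters.
// Assumption: the bench_replica label value is the replica index as an integer.
func renderExpr(exprTemplate, seriesName string, replica int, regex bool) (string, error) {
	tmpl, err := template.New("expr").Delims("<<", ">>").Parse(exprTemplate)
	if err != nil {
		return "", err
	}
	op := "=" // exact match by default
	if regex {
		op = "=~" // regex match when `regex: true`
	}
	vars := queryVars{
		Name:     seriesName,
		Matchers: fmt.Sprintf(`bench_replica%s"%d"`, op, replica),
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	expr, err := renderExpr("sum(<<.Name>>{<<.Matchers>>})", "metric_gauge_random_01", 3, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(expr)
}
```

Custom delimiters keep the curly braces of PromQL selectors from clashing with the default `{{` and `}}` template delimiters.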
+
+### Write options
+
+- **batch_size** determines the number of samples sent in each remote-write
+  request.
+- **interval** is the period between when each batch of remote-write
+  requests is generated and queued to be sent.
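One way to picture `batch_size` is as a chunking step applied at every write interval. The sketch below assumes one sample per series per interval and splits the series into index ranges of at most `batch_size`; `batchRanges` is a hypothetical helper, and the case where the batch size exceeds the number of series simply yields a single, smaller batch.

```go
package main

import "fmt"

// batchRanges splits numSeries series indexes into [start, end) ranges that
// each hold at most batchSize entries. Hypothetical helper for illustration.
func batchRanges(numSeries, batchSize int) [][2]int {
	if numSeries <= 0 || batchSize <= 0 {
		return nil
	}
	var out [][2]int
	for start := 0; start < numSeries; start += batchSize {
		end := start + batchSize
		if end > numSeries {
			// The last batch may be smaller than batchSize.
			end = numSeries
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

func main() {
	fmt.Println(batchRanges(2500, 1000)) // three batches, the last one partial
	fmt.Println(batchRanges(250, 1000))  // batch size larger than the series count
}
```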
+
+### Example workload file
+
+```yaml
+---
+queries:
+  - expr_template: sum(<<.Name>>{<<.Matchers>>})
+    interval: 1m
+    num_queries: 3
+    series_type: gauge-random
+  - expr_template: sum(<<.Name>>{<<.Matchers>>})
+    interval: 1m
+    num_queries: 5
+    series_type: gauge-zero
+    time_range: 2h
+replicas: 5
+series:
+  - labels:
+      - name: label_01
+        unique_values: 5
+        value_prefix: label_value_01
+      - name: label_02
+        unique_values: 20
+        value_prefix: label_value_02
+    name: metric_gauge_random_01
+    static_labels:
+      static: "true"
+    type: gauge-random
+  - labels:
+      - name: label_01
+        unique_values: 5
+        value_prefix: label_value_01
+      - name: label_02
+        unique_values: 20
+        value_prefix: label_value_02
+    name: metric_gauge_zero_01
+    static_labels:
+      static: "true"
+    type: gauge-zero
+write_options:
+  batch_size: 1000
+  interval: 15s
+```
+
+## Consistency
+
+To ensure results are consistent between runs, the seed of the random number
+generator used by the workload is determined by the configured `-bench.id` flag
+value of the benchtool instance.
+
+This ensures that two `benchtool` processes run with the same id and workload
+config file behave the same way between runs.
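The commit message mentions seeding the query workload with an Adler-32 checksum of the id. The sketch below shows how such a seed could drive the initial query jitter described in the Queries section; `newSeededRand` and `initialJitter` are hypothetical names, and the exact derivation used by `benchtool` may differ.

```go
package main

import (
	"fmt"
	"hash/adler32"
	"math/rand"
	"time"
)

// newSeededRand derives a deterministic random source from a benchtool id.
// Assumption: the id is hashed with Adler-32 and the checksum is the seed.
func newSeededRand(id string) *rand.Rand {
	seed := int64(adler32.Checksum([]byte(id)))
	return rand.New(rand.NewSource(seed))
}

// initialJitter returns a delay between 0 and half the interval (inclusive),
// to be applied before the first run of a query.
func initialJitter(rng *rand.Rand, interval time.Duration) time.Duration {
	half := int64(interval / 2)
	if half <= 0 {
		return 0
	}
	return time.Duration(rng.Int63n(half + 1))
}

func main() {
	// Two generators built from the same id produce the same jitter sequence.
	a := newSeededRand("bench-1")
	b := newSeededRand("bench-1")
	for i := 0; i < 3; i++ {
		ja := initialJitter(a, time.Minute)
		jb := initialJitter(b, time.Minute)
		fmt.Println(ja, jb, ja == jb)
	}
}
```

Because both generators start from the same seed, each pair of printed jitters is identical, which is the property the Consistency section relies on.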

go.mod

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ require (
 	github.com/grafana-tools/sdk v0.0.0-20210310213032-c3f3511b3e9b
 	github.com/grafana/loki v1.6.2-0.20210310125813-306cc724380c
 	github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db
+	github.com/nouney/randomstring v0.0.0-20180330205616-1374daa59f01
+	github.com/opentracing-contrib/go-stdlib v1.0.0
 	github.com/opentracing/opentracing-go v1.2.0
 	github.com/pkg/errors v0.9.1
 	github.com/prometheus/alertmanager v0.21.1-0.20210310093010-0f9cab6991e6

go.sum

Lines changed: 2 additions & 0 deletions
@@ -1188,6 +1188,8 @@ github.com/nicolai86/scaleway-sdk v1.10.2-0.20180628010248-798f60e20bb2/go.mod h
 github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e h1:fD57ERR4JtEqsWbfPhv4DMiApHyliiK5xCTNVSPiaAs=
 github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno=
 github.com/nkovacs/streamquote v0.0.0-20170412213628-49af9bddb229/go.mod h1:0aYXnNPJ8l7uZxf45rWW1a/uME32OF0rhiYGNQ2oF2E=
+github.com/nouney/randomstring v0.0.0-20180330205616-1374daa59f01 h1:sixbuYqAzQK/jc+q2aj6buyXE0hW/VzRu+/5j+9ezeY=
+github.com/nouney/randomstring v0.0.0-20180330205616-1374daa59f01/go.mod h1:XHZEgzLAde8MdprPDSpGylskAyazKjP8UyEMxgoUaW4=
 github.com/nsqio/go-nsq v1.0.7/go.mod h1:XP5zaUs3pqf+Q71EqUJs3HYfBIqfK6G83WQMdNN+Ito=
 github.com/nxadm/tail v1.4.4 h1:DQuhQpB1tVlglWS2hLQ5OV6B5r8aGxSrPc5Qo6uTN78=
 github.com/nxadm/tail v1.4.4/go.mod h1:kenIhsEOeOJmVchQTgglprH7qJGnHDVpk1VPCcaMI8A=
