Commit a1c2d6b

Merge pull request #311 from ydb-platform/slo
slo: workload through sync ydb driver
2 parents 7893319 + fc0a701 commit a1c2d6b

File tree

13 files changed: +882 -1 lines changed


.github/workflows/slo.yml

Lines changed: 55 additions & 0 deletions

```yaml
name: SLO

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  test-slo:
    concurrency:
      group: slo-${{ github.ref }}
    if: (!contains(github.event.pull_request.labels.*.name, 'no slo'))

    runs-on: ubuntu-latest
    name: SLO test
    permissions:
      checks: write
      pull-requests: write
      contents: read
      issues: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Run SLO
        uses: ydb-platform/slo-tests@js-version
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          KUBECONFIG_B64: ${{ secrets.SLO_KUBE_CONFIG }}
          AWS_CREDENTIALS_B64: ${{ secrets.SLO_AWS_CREDENTIALS }}
          AWS_CONFIG_B64: ${{ secrets.SLO_AWS_CONFIG }}
          DOCKER_USERNAME: ${{ secrets.SLO_DOCKER_USERNAME }}
          DOCKER_PASSWORD: ${{ secrets.SLO_DOCKER_PASSWORD }}
          DOCKER_REPO: ${{ secrets.SLO_DOCKER_REPO }}
          DOCKER_FOLDER: ${{ secrets.SLO_DOCKER_FOLDER }}
          s3_endpoint: ${{ secrets.SLO_S3_ENDPOINT }}
          s3_images_folder: ${{ vars.SLO_S3_IMAGES_FOLDER }}
          grafana_domain: ${{ vars.SLO_GRAFANA_DOMAIN }}
          grafana_dashboard: ${{ vars.SLO_GRAFANA_DASHBOARD }}
          ydb_version: 'newest'
          timeBetweenPhases: 30
          shutdownTime: 30

          language_id0: sync
          language0: python-sync
          workload_path0: tests/slo
          workload_build_context0: ../..
          workload_build_options0: -f Dockerfile

      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: slo-logs
          path: logs/
```

docker-compose.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-version: "3.9"
+version: "3.3"
 services:
   ydb:
     image: cr.yandex/yc/yandex-docker-local-ydb:latest
```

tests/slo/Dockerfile

Lines changed: 7 additions & 0 deletions

```dockerfile
FROM python:3.8
COPY . /src
WORKDIR /src
RUN python -m pip install --upgrade pip && python -m pip install -e . && python -m pip install -r tests/slo/requirements.txt
WORKDIR tests/slo

ENTRYPOINT ["python", "src"]
```

tests/slo/README.md

Lines changed: 133 additions & 0 deletions

# SLO workload

SLO is a type of test in which an application based on the ydb-sdk is run against a YDB cluster whose nodes, tablets, and network links are being taken down (situations that can occur in distributed DBs with hundreds of nodes).

### Implementations:

There are two implementations:

- `sync`
- `async` (not implemented yet)

### Usage:

It has three commands:

- `create` - creates the table in the database
- `cleanup` - drops the table in the database
- `run` - runs the workload (reads from and writes to the table at the configured RPS)
### Run examples with all arguments:

create:
`python tests/slo/src/ create localhost:2136 /local -t tableName
--min-partitions-count 6 --max-partitions-count 1000 --partition-size 1 -c 1000
--write-timeout 10000`

cleanup:
`python tests/slo/src/ cleanup localhost:2136 /local -t tableName`

run:
`python tests/slo/src/ run localhost:2136 /local -t tableName
--prom-pgw http://prometheus-pushgateway:9091 --report-period 250
--read-rps 1000 --read-timeout 10000
--write-rps 100 --write-timeout 10000
--time 600 --shutdown-time 30`
## Arguments for commands:

### create
`python tests/slo/src/ create <endpoint> <db> [options]`

```
Arguments:
  endpoint                               YDB endpoint to connect to
  db                                     YDB database to connect to

Options:
  -t --table-name <string>               table name to create

  -p-min --min-partitions-count <int>    minimum number of partitions in the table
  -p-max --max-partitions-count <int>    maximum number of partitions in the table
  -p-size --partition-size <int>         partition size in MB

  -c --initial-data-count <int>          number of initially created rows

  --write-timeout <int>                  write timeout in milliseconds

  --batch-size <int>                     number of new records in each create request
  --threads <int>                        number of threads to use
```
### cleanup
`python tests/slo/src/ cleanup <endpoint> <db> [options]`

```
Arguments:
  endpoint                    YDB endpoint to connect to
  db                          YDB database to connect to

Options:
  -t --table-name <string>    table name to drop
```
### run
`python tests/slo/src/ run <endpoint> <db> [options]`

```
Arguments:
  endpoint                    YDB endpoint to connect to
  db                          YDB database to connect to

Options:
  -t --table-name <string>    table name to use

  --prom-pgw <string>         Prometheus push gateway address
  --report-period <int>       Prometheus push period in milliseconds

  --read-rps <int>            read RPS
  --read-timeout <int>        read timeout in milliseconds

  --write-rps <int>           write RPS
  --write-timeout <int>       write timeout in milliseconds

  --time <int>                run time in seconds
  --shutdown-time <int>       graceful shutdown time in seconds

  --read-threads <int>        number of threads to use for read requests
  --write-threads <int>       number of threads to use for write requests
```
## Authentication

The workload uses [auth-env](https://ydb.yandex-team.ru/docs/reference/ydb-sdk/recipes/auth-env) for authentication, so the credentials mode is selected from environment variables.
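For illustration, a minimal sketch of env-based driver initialization with the sync SDK; the endpoint and database values are placeholders, not taken from this commit:

```python
import ydb

# The credentials mode (anonymous, access token, metadata, service account key)
# is picked from environment variables such as YDB_ANONYMOUS_CREDENTIALS
# or YDB_ACCESS_TOKEN_CREDENTIALS, per the auth-env recipe.
driver = ydb.Driver(
    endpoint="grpc://localhost:2136",  # placeholder
    database="/local",                 # placeholder
    credentials=ydb.credentials_from_env_variables(),
)
driver.wait(timeout=5)  # block until the driver is connected
```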
## What's inside
When the `run` command is executed, the program creates three jobs: `readJob`, `writeJob`, `metricsJob` (a rate-limited job sketch follows the list below).

- `readJob` reads rows from the table one by one, using random identifiers generated by `writeJob`
- `writeJob` generates and inserts rows
- `metricsJob` periodically sends metrics to Prometheus
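As a rough sketch only (not the actual implementation), a job throttled with the `ratelimiter` package pinned in `requirements.txt` could look like this; `do_request` and the parameter values are hypothetical stand-ins:

```python
import time

from ratelimiter import RateLimiter  # pinned in tests/slo/requirements.txt


def run_job(do_request, rps=1000, run_time=600):
    limiter = RateLimiter(max_calls=rps, period=1)  # at most `rps` calls per second
    deadline = time.monotonic() + run_time
    while time.monotonic() < deadline:
        with limiter:        # blocks once the per-second budget is spent
            do_request()     # hypothetical callback: one read or write request
```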
The table has these fields (a table-creation sketch follows the list):

- `object_id Uint64`
- `object_hash Uint64 Digest::NumericHash(id)`
- `payload_str UTF8`
- `payload_double Double`
- `payload_timestamp Timestamp`

Primary key: `("object_hash", "object_id")`
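A sketch of how such a table can be created with the sync table client. Assumptions: an initialized `ydb.SessionPool` named `pool` and an absolute `table_path`; the actual `create` command may also set partitioning options:

```python
import ydb


def create_slo_table(pool, table_path):
    def callee(session):
        session.create_table(
            table_path,
            ydb.TableDescription()
            .with_column(ydb.Column("object_id", ydb.OptionalType(ydb.PrimitiveType.Uint64)))
            .with_column(ydb.Column("object_hash", ydb.OptionalType(ydb.PrimitiveType.Uint64)))
            .with_column(ydb.Column("payload_str", ydb.OptionalType(ydb.PrimitiveType.Utf8)))
            .with_column(ydb.Column("payload_double", ydb.OptionalType(ydb.PrimitiveType.Double)))
            .with_column(ydb.Column("payload_timestamp", ydb.OptionalType(ydb.PrimitiveType.Timestamp)))
            .with_primary_keys("object_hash", "object_id"),
        )

    pool.retry_operation_sync(callee)  # retries on retriable YDB errors
```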
## Collected metrics

- `oks` - number of OK requests
- `not_oks` - number of failed requests
- `inflight` - number of requests currently in flight
- `latency` - summary of latencies in ms
- `attempts` - summary of the number of attempts per request

> You must reset metrics to keep them at `0` in Prometheus and Grafana before the jobs begin and after they end.
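A minimal sketch of pushing such metrics with `prometheus-client` (pinned in `requirements.txt`); the gateway address and job label are placeholders:

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Summary, push_to_gateway

registry = CollectorRegistry()
oks = Counter("oks", "number of OK requests", registry=registry)
not_oks = Counter("not_oks", "number of failed requests", registry=registry)
inflight = Gauge("inflight", "number of requests in flight", registry=registry)
latency = Summary("latency", "summary of latencies in ms", registry=registry)

oks.inc()             # record one successful request
latency.observe(3.5)  # record one latency sample, in ms

# metricsJob would do this every --report-period milliseconds:
push_to_gateway("prometheus-pushgateway:9091", job="workload-sync", registry=registry)
```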
## Look at metrics in Grafana

You can get the dashboard used in this test [here](https://github.com/ydb-platform/slo-tests/blob/main/k8s/helms/grafana.yaml#L69) - you will need to import the JSON into Grafana.

tests/slo/requirements.txt

Lines changed: 4 additions & 0 deletions

```
requests==2.28.2
ratelimiter==1.2.0.post0
prometheus-client==0.17.0
quantile-estimator==0.1.2
```

tests/slo/src/__init__.py

Whitespace-only changes.

tests/slo/src/__main__.py

Lines changed: 13 additions & 0 deletions

```python
import gc
import logging

from options import parse_options
from runner import run_from_args

logging.basicConfig(level=logging.INFO)


if __name__ == "__main__":
    args = parse_options()
    gc.disable()
    run_from_args(args)
```

tests/slo/src/generator.py

Lines changed: 57 additions & 0 deletions

```python
# -*- coding: utf-8 -*-
import dataclasses
import logging
import random
import string
from datetime import datetime
from threading import Lock

logger = logging.getLogger(__name__)


MAX_UINT64 = 2**64 - 1


def generate_random_string(min_len, max_len):
    strlen = random.randint(min_len, max_len)
    return "".join(random.choices(string.ascii_lowercase, k=strlen))


@dataclasses.dataclass
class Row:
    object_id: int
    payload_str: str
    payload_double: float
    payload_timestamp: datetime


@dataclasses.dataclass
class RowGenerator:
    id_counter: int = 0
    lock = Lock()

    def get(self):
        with self.lock:
            self.id_counter += 1
            if self.id_counter >= MAX_UINT64:
                self.id_counter = 0
                logger.warning("RowGenerator: maxint reached")

            return Row(
                object_id=self.id_counter,
                payload_str=generate_random_string(20, 40),
                payload_double=random.random(),
                payload_timestamp=datetime.now(),
            )


def batch_generator(args, start_id=0):
    row_generator = RowGenerator(start_id)
    remain = args.initial_data_count

    while True:
        size = min(remain, args.batch_size)
        if size < 1:
            return
        yield [row_generator.get() for _ in range(size)]
        remain -= size
```
