# SLO workload

SLO is a type of test in which an application based on the ydb-sdk is run against a YDB cluster whose nodes, tablets, and network are deliberately being taken down (situations that can realistically occur in a distributed database with hundreds of nodes).
| 5 | + |
### Implementations:

There are two implementations:

- `sync`
- `async` (not implemented yet)
| 12 | + |
### Usage:

The workload has three commands:

- `create` - creates the table in the database
- `cleanup` - drops the table in the database
- `run` - runs the workload (reads from and writes to the table at the configured RPS)
| 20 | + |
### Run examples with all arguments:

create:
`python tests/slo/src/ create localhost:2136 /local -t tableName
--min-partitions-count 6 --max-partitions-count 1000 --partition-size 1 -c 1000
--write-timeout 10000`

cleanup:
`python tests/slo/src/ cleanup localhost:2136 /local -t tableName`

run:
`python tests/slo/src/ run localhost:2136 /local -t tableName
--prom-pgw http://prometheus-pushgateway:9091 --report-period 250
--read-rps 1000 --read-timeout 10000
--write-rps 100 --write-timeout 10000
--time 600 --shutdown-time 30`
| 37 | + |
## Arguments for commands:

### create
`python tests/slo/src/ create <endpoint> <db> [options]`

```
Arguments:
  endpoint                              YDB endpoint to connect to
  db                                    YDB database to connect to

Options:
  -t --table-name <string>              table name to create

  -p-min --min-partitions-count <int>   minimum amount of partitions in table
  -p-max --max-partitions-count <int>   maximum amount of partitions in table
  -p-size --partition-size <int>        partition size in MB

  -c --initial-data-count <int>         amount of initially created rows

  --write-timeout <int>                 write timeout in milliseconds

  --batch-size <int>                    amount of new records in each create request
  --threads <int>                       number of threads to use
```
| 63 | + |
### cleanup
`python tests/slo/src/ cleanup <endpoint> <db> [options]`

```
Arguments:
  endpoint                  YDB endpoint to connect to
  db                        YDB database to connect to

Options:
  -t --table-name <string>  table name to drop
```
| 75 | + |
### run
`python tests/slo/src/ run <endpoint> <db> [options]`

```
Arguments:
  endpoint                  YDB endpoint to connect to
  db                        YDB database to connect to

Options:
  -t --table-name <string>  table name to use

  --prom-pgw <string>       Prometheus push gateway address
  --report-period <int>     Prometheus push period in milliseconds

  --read-rps <int>          read RPS
  --read-timeout <int>      read timeout in milliseconds

  --write-rps <int>         write RPS
  --write-timeout <int>     write timeout in milliseconds

  --time <int>              run time in seconds
  --shutdown-time <int>     graceful shutdown time in seconds

  --read-threads <int>      number of threads to use for read requests
  --write-threads <int>     number of threads to use for write requests
```
| 102 | + |
## Authentication

The workload uses [auth-env](https://ydb.yandex-team.ru/docs/reference/ydb-sdk/recipes/auth-env) for authentication.
| 106 | + |
## What's inside
When the `run` command is executed, the program creates three jobs: `readJob`, `writeJob`, and `metricsJob`.

- `readJob` reads rows from the table one by one, using random identifiers generated by `writeJob`
- `writeJob` generates and inserts rows
- `metricsJob` periodically pushes metrics to Prometheus
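The split between a paced reader and writer can be sketched with the standard library alone. This is illustrative only: the real workload talks to YDB through the ydb-sdk, and `paced_loop`, `read_job`, and `write_job` are hypothetical names, with a plain dict standing in for the table.

```python
import random
import threading
import time

def paced_loop(rps, duration_s, body):
    """Call body() roughly `rps` times per second for `duration_s` seconds."""
    interval = 1.0 / rps
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        body()
        # Sleep off the remainder of this request's time slot.
        time.sleep(max(0.0, interval - (time.monotonic() - start)))

storage = {}                 # stand-in for the YDB table
lock = threading.Lock()

def write_job():
    # writeJob: insert a row under a fresh identifier
    with lock:
        storage[len(storage)] = random.random()

def read_job():
    # readJob: read a row with a random identifier produced by write_job
    with lock:
        if storage:
            _ = storage[random.choice(list(storage))]

writer = threading.Thread(target=paced_loop, args=(200, 0.2, write_job))
reader = threading.Thread(target=paced_loop, args=(200, 0.2, read_job))
writer.start(); reader.start()
writer.join(); reader.join()
```

Pacing each iteration against its own time slot, rather than sleeping a fixed interval, keeps the effective RPS close to the target even when individual requests are slow.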

The table has the following fields:
- `object_id Uint64`
- `object_hash Uint64 Digest::NumericHash(id)`
- `payload_str Utf8`
- `payload_double Double`
- `payload_timestamp Timestamp`

Primary key: `("object_hash", "object_id")`
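A generated row matching this schema might look like the sketch below. `Digest::NumericHash` is computed by YDB itself, so `numeric_hash_stub` and `make_row` are stand-ins, not the workload's actual code; they only illustrate how hashing the id spreads rows across the partitioned key space.

```python
import hashlib
import random
import string
import time

def numeric_hash_stub(object_id: int) -> int:
    # Stand-in for YDB's Digest::NumericHash; any stable 64-bit hash
    # illustrates the idea of randomizing the leading key column.
    digest = hashlib.sha256(object_id.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little")

def make_row(object_id: int) -> dict:
    return {
        "object_id": object_id,
        "object_hash": numeric_hash_stub(object_id),
        "payload_str": "".join(random.choices(string.ascii_letters, k=20)),
        "payload_double": random.random(),
        "payload_timestamp": int(time.time() * 1_000_000),  # microseconds
    }
```

Leading the primary key with the hash rather than the sequential `object_id` avoids a hot tail partition when ids are monotonically increasing.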
| 122 | + |
## Collected metrics
- `oks` - amount of OK requests
- `not_oks` - amount of failed requests
- `inflight` - amount of requests in flight
- `latency` - summary of latencies in milliseconds
- `attempts` - summary of retry attempts per request

> You must reset the metrics to keep them at `0` in Prometheus and Grafana before the jobs begin and after they end
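The counters above can be modeled with a small thread-safe class. This is a sketch, not the workload's code: the real implementation feeds Prometheus summary metrics, and the `Metrics` class and its method names here are assumptions.

```python
import threading

class Metrics:
    """Thread-safe stand-in for the workload's counters (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.oks = 0
        self.not_oks = 0
        self.inflight = 0
        self.latencies_ms = []   # a real setup would feed a Prometheus Summary
        self.attempts = []       # retry attempts per request, also a Summary

    def start_request(self):
        with self._lock:
            self.inflight += 1

    def finish_request(self, ok, latency_ms, attempts):
        with self._lock:
            self.inflight -= 1
            if ok:
                self.oks += 1
            else:
                self.not_oks += 1
            self.latencies_ms.append(latency_ms)
            self.attempts.append(attempts)

m = Metrics()
m.start_request()
m.finish_request(ok=True, latency_ms=12.5, attempts=1)
m.start_request()
m.finish_request(ok=False, latency_ms=240.0, attempts=3)
```

Tracking `inflight` separately from the outcome counters is what lets the dashboard show requests that are stuck in flight when nodes go down.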
|
## Look at metrics in Grafana
You can get the dashboard used in this test [here](https://github.com/ydb-platform/slo-tests/blob/main/k8s/helms/grafana.yaml#L69) - you will need to import the JSON into Grafana.