# SLO workload

SLO tests check how an application built on the YDB SDK behaves while YDB cluster nodes, tablets, or the network are failing (situations that are possible in a distributed database with hundreds of nodes).
| 5 | + |
### Implementations:

There are two implementations:

- `native` - `./native`
- `database/sql` - `./database/sql`
| 12 | + |
### Usage:

It has 3 commands:

- `create` - creates the table in the database
- `cleanup` - drops the table in the database
- `run` - runs the workload (reads from and writes to the table at the configured RPS)
| 20 | + |
### Run examples with all arguments:

create:
`slo-go-workload create grpcs://ydb.cool.example.com:2135 /some/folder -t tableName
-min-partitions-count 6 -max-partitions-count 1000 -partition-size 1 -c 1000
-write-timeout 10000`

cleanup:
`slo-go-workload cleanup grpcs://ydb.cool.example.com:2135 /some/folder -t tableName`

run:
`slo-go-workload run grpcs://ydb.cool.example.com:2135 /some/folder -t tableName
-prom-pgw http://prometheus-pushgateway:9091 -report-period 250
-read-rps 1000 -read-timeout 10000
-write-rps 100 -write-timeout 10000
-time 600 -shutdown-time 30`
| 37 | + |
## Arguments for commands:

### create
`slo-go-workload create <endpoint> <db> [options]`

```
Arguments:
  endpoint                        YDB endpoint to connect to
  db                              YDB database to connect to

Options:
  -t -table-name         <string> table name to create

  -min-partitions-count  <int>    minimum amount of partitions in table
  -max-partitions-count  <int>    maximum amount of partitions in table
  -partition-size        <int>    partition size in MB

  -c -initial-data-count <int>    amount of initially created rows

  -write-timeout         <int>    write timeout in milliseconds
```
| 59 | + |
### cleanup
`slo-go-workload cleanup <endpoint> <db> [options]`

```
Arguments:
  endpoint                YDB endpoint to connect to
  db                      YDB database to connect to

Options:
  -t -table-name <string> table name to drop

  -write-timeout <int>    write timeout in milliseconds
```
| 73 | + |
### run
`slo-go-workload run <endpoint> <db> [options]`

```
Arguments:
  endpoint                     YDB endpoint to connect to
  db                           YDB database to connect to

Options:
  -t -table-name      <string> table name to read from and write to

  -initial-data-count <int>    amount of initially created rows

  -prom-pgw           <string> Prometheus push gateway address
  -report-period      <int>    Prometheus push period in milliseconds

  -read-rps           <int>    read RPS
  -read-timeout       <int>    read timeout in milliseconds

  -write-rps          <int>    write RPS
  -write-timeout      <int>    write timeout in milliseconds

  -time               <int>    run time in seconds
  -shutdown-time      <int>    graceful shutdown time in seconds
```
| 99 | + |
## Authentication

The workload uses [ydb-go-sdk-auth-environ](https://github.com/ydb-platform/ydb-go-sdk-auth-environ) for authentication.
| 103 | + |
## What's inside
When running the `run` command, the program starts three jobs: `readJob`, `writeJob`, and `metricsJob`.

- `readJob` reads rows from the table one by one, using random identifiers generated by `writeJob`
- `writeJob` generates and inserts rows
- `metricsJob` periodically pushes metrics to Prometheus
| 110 | + |
The table has the following fields:
- `id Uint64`
- `hash Uint64 Digest::NumericHash(id)`
- `payload_str UTF8`
- `payload_double Double`
- `payload_timestamp Timestamp`
- `payload_hash Uint64`

Primary key: `("hash", "id")`
| 120 | + |
## Collected metrics
- `oks` - amount of successful requests
- `not_oks` - amount of failed requests
- `inflight` - amount of requests in flight
- `latency` - summary of latencies in ms
| 126 | + |
> You must reset the metrics to `0` in Prometheus and Grafana before the jobs begin and after they end.

In Go it looks like this:
```go
func (m *Metrics) Reset() error {
	m.oks.WithLabelValues(JobRead).Set(0)
	m.oks.WithLabelValues(JobWrite).Set(0)

	m.notOks.WithLabelValues(JobRead).Set(0)
	m.notOks.WithLabelValues(JobWrite).Set(0)

	m.inflight.WithLabelValues(JobRead).Set(0)
	m.inflight.WithLabelValues(JobWrite).Set(0)

	m.latencies.Reset()

	return m.Push()
}
```
| 146 | + |
## Look at metrics in grafana
You can get the dashboard used in this test [here](https://github.com/ydb-platform/slo-tests/blob/main/k8s/helms/grafana.yaml#L69) - you will need to import the JSON into Grafana.