|
2 | 2 |
|
3 | 3 | > _[Would you like to know the secret to eternal happiness?](https://youtu.be/M_FAL8nVT40?t=25)_ |
4 | 4 |
|
5 | | -Acolyte is a lightweight resource monitor for Kubernetes containers. |
| 5 | +Acolyte is a lightweight resource monitoring tool designed to collect statistics in containerized environments, |
| 6 | +particularly Kubernetes. |
6 | 7 |
|
7 | | -The planned flow is: |
| 8 | +Acolyte monitors CPU, memory, and GPU utilization and writes the data to JSON files for easy consumption by other |
| 9 | +services. It's designed to run alongside your application in the same container and built with compatibility in mind. |
8 | 10 |
|
9 | | -1. start a container |
10 | | -2. `exec` Acolyte to the container, and make sure it keeps on running after `exec` termination |
11 | | -3. continuously record stats on a shared volume, another worker reads them from there |
12 | | -4. Acolyte dies with the container |
| 11 | +Acolyte is configured through environment variables: |
| 12 | + |
| 13 | +* `RUST_LOG`: log level, e.g. debug; default: info |
| 14 | +* `ACOLYTE_STATS_DIR`: directory where stat files are written; default: /tmp/acolyte/stats |
| 15 | +* `ACOLYTE_STAT_INTERVAL_MS`: interval between stats collection in milliseconds; default: 5000 |
| 16 | +* `ACOLYTE_MAX_STATS_ENTRIES`: maximum number of stat files to keep; default: 12 |
| 17 | +* `ACOLYTE_CPU_SAMPLE_RATE_MS`: sample window for CPU usage in milliseconds; default: 100 |
| 18 | +* `SENTRY_DSN`: optional Sentry DSN for error reporting |
| 19 | +* `CLUSTER_NAME`: optional cluster identification for Sentry |
| 20 | + |
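As a rough illustration of how these variables combine, the sketch below builds an environment for launching Acolyte from another process. The defaults are copied from the list above. The helper names `acolyte_env` and `retention_seconds` are hypothetical and not part of Acolyte. The retention estimate assumes one stat file is written per interval and that the oldest files are removed once the maximum is reached.

```python
import os

# Defaults as documented in the list above (kept as strings, like real env values).
DEFAULTS = {
    "RUST_LOG": "info",
    "ACOLYTE_STATS_DIR": "/tmp/acolyte/stats",
    "ACOLYTE_STAT_INTERVAL_MS": "5000",
    "ACOLYTE_MAX_STATS_ENTRIES": "12",
    "ACOLYTE_CPU_SAMPLE_RATE_MS": "100",
}


def acolyte_env(overrides=None, base=None):
    """Hypothetical helper: merge documented defaults, a base env, and overrides."""
    env = dict(os.environ if base is None else base)
    for key, value in DEFAULTS.items():
        env.setdefault(key, value)  # values already set in the base env win
    for key, value in (overrides or {}).items():
        env[key] = str(value)  # explicit overrides win over everything
    return env


def retention_seconds(env):
    """Approximate history kept on disk, ASSUMING one file per interval
    and that the oldest files are pruned at the maximum."""
    interval_ms = int(env["ACOLYTE_STAT_INTERVAL_MS"])
    max_entries = int(env["ACOLYTE_MAX_STATS_ENTRIES"])
    return interval_ms * max_entries / 1000
```

With the documented defaults this works out to 12 files at 5000 ms each, so roughly 60 seconds of history under the assumptions above. The resulting dictionary can be passed to a launcher, for example `subprocess.Popen(["./acolyte"], env=acolyte_env({"ACOLYTE_STAT_INTERVAL_MS": 1000}))`.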
| 21 | +```shell |
| 22 | +# you probably want to run it in the background in your container |
| 23 | +./acolyte & |
| 24 | + |
| 25 | +# or attach it to an already running Kubernetes Pod |
| 26 | +kubectl cp ./target/x86_64-unknown-linux-musl/release/acolyte my-pod:/tmp/acolyte |
| 27 | +kubectl exec my-pod -- sh -c "/tmp/acolyte &" |
| 28 | +``` |
| 29 | + |
| 30 | +The JSON fields are fairly self-explanatory. An example file, `stats-1741860918020.json`: |
| 31 | + |
| 32 | +```json |
| 33 | +{ |
| 34 | + "time": 1741860918.0206466, |
| 35 | + "num_cpus": 20.0, |
| 36 | + "cpu_usage": 3.5053825547467063, |
| 37 | + "memory_usage_kb": 22802796, |
| 38 | + "memory_total_kb": 65542712, |
| 39 | + "num_gpus": 1, |
| 40 | + "gpu_usage": 0.23, |
| 41 | + "gpu_memory_usage_kb": 50176, |
| 42 | + "gpu_memory_total_kb": 8388608 |
| 43 | +} |
| 44 | +``` |
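A consuming service might read the newest stat file along these lines. This is a minimal sketch, not Acolyte's own reader: the function names are hypothetical, it assumes the number in the file name increases over time (it looks like a millisecond timestamp matching the `time` field), and it assumes a file may be unreadable while it is being written, in which case it falls back to the next newest one.

```python
import json
from pathlib import Path


def _timestamp(path):
    """Numeric part of `stats-<number>.json`, or None if the name does not match."""
    suffix = path.stem.partition("-")[2]
    return int(suffix) if suffix.isdigit() else None


def latest_stats(stats_dir="/tmp/acolyte/stats"):
    """Return the newest readable stats entry as a dict, or None if there is none."""
    root = Path(stats_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.glob("stats-*.json") if _timestamp(p) is not None]
    # Newest first, ASSUMING a larger number in the file name means a later sample.
    for path in sorted(candidates, key=_timestamp, reverse=True):
        try:
            return json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # file vanished or is incomplete; try the next newest
    return None


def memory_fraction(stats):
    """Used memory as a fraction of total, from the two memory fields."""
    return stats["memory_usage_kb"] / stats["memory_total_kb"]
```

For the example file above, `memory_fraction` would give 22802796 / 65542712, or roughly 0.35.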
13 | 45 |
|
14 | 46 | ## Development |
15 | 47 |
|
|