Gemini exposes Prometheus metrics for monitoring test runs. This guide covers the available metrics and how to use them.

Metrics are available via HTTP:

```bash
# Default endpoint
curl http://localhost:2112/metrics

# Custom bind address
./gemini --bind=0.0.0.0:9090 ...
curl http://localhost:9090/metrics
```

| Metric | Type | Description |
|---|---|---|
| `cql_requests` | Counter | Total CQL requests by system (oracle/test) and method |
| `cql_error_requests` | Counter | Failed CQL requests |
| `cql_query_timeouts` | Counter | Query timeouts by cluster and query type |
| `cql_queries` | Counter | Queries by cluster, host, and query type |
| `cql_query_errors` | Counter | Query errors with error type |
| `cql_batches` | Counter | Batch operations |
| `cql_batched_queries` | Counter | Queries within batches |

| Metric | Type | Description |
|---|---|---|
| `cql_query_time` | Histogram | Query execution time in seconds |
| `cql_connect_time` | Histogram | Connection establishment time |
| `execution_time` | Histogram | Task execution time |

| Metric | Type | Description |
|---|---|---|
| `cql_connections` | Gauge | Active connections by cluster and host |
| `cql_connections_errors` | Counter | Connection errors |

| Metric | Type | Description |
|---|---|---|
| `validated_rows` | Counter | Successfully validated rows by table |
| `execution_errors` | Counter | Execution errors by type |

| Metric | Type | Description |
|---|---|---|
| `statement_logger_enqueued_total` | Counter | Items sent to statement logger |
| `statement_logger_dequeued_total` | Counter | Items processed by statement logger |
| `statement_logger_items` | Gauge | Current items in logger |
| `statement_logger_flushes_total` | Counter | File flush operations |
| `stmt_error_last_timestamp_seconds` | Gauge | Last error timestamp per partition |

| Metric | Type | Description |
|---|---|---|
| `workers_current` | Gauge | Active workers by job type |

Standard Go metrics are included:

- `go_goroutines`: Current goroutines
- `go_memstats_*`: Memory statistics
- `go_gc_*`: Garbage collection stats
- `go_sync_mutex_wait_total_seconds`: Mutex contention
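
For a quick look at these metrics without a Prometheus server, the endpoint can be read directly. The sketch below is a deliberately small parser for the Prometheus text exposition format, not a full implementation: it skips comment lines and sums sample values per metric name, ignoring labels. The function names `fetch_metrics_text` and `sum_by_metric` are hypothetical, and the URL assumes the default endpoint shown earlier.

```python
from collections import defaultdict
from typing import Dict
from urllib.request import urlopen

# Assumption: Gemini is serving metrics on the default endpoint from this guide.
METRICS_URL = "http://localhost:2112/metrics"


def fetch_metrics_text(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw text exposition from the metrics endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sum_by_metric(text: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring labels.

    The totals are meaningful for counters and gauges; histogram
    `_bucket` series are cumulative, so their sums are not useful.
    """
    totals: Dict[str, float] = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labelled sample: name{label="value",...} value [timestamp]
            name = line.split("{", 1)[0]
            rest = line.rpartition("}")[2]
        else:
            # Unlabelled sample: name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            totals[name] += float(fields[0])
    return dict(totals)
```

With Gemini running, `sum_by_metric(fetch_metrics_text()).get("cql_requests")` would give the request total across all label combinations at the time of the scrape.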

Operations per second:

```promql
rate(cql_requests[1m])
```

Error rate:

```promql
rate(cql_error_requests[1m]) / rate(cql_requests[1m])
```

Query latency (p99):

```promql
histogram_quantile(0.99, rate(cql_query_time_bucket[5m]))
```
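
The p99 query above is an estimate, not an exact percentile: it is interpolated from cumulative bucket counts. The sketch below approximates that calculation, assuming linear interpolation inside the bucket that contains the target rank and a lower bound of 0 for the first bucket. It is a simplified illustration, not Prometheus's actual implementation, and the bucket bounds Gemini uses are not stated in this guide.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Each count includes every observation less than or equal to its bound,
    mirroring the `le` label on `*_bucket` series. The largest bound must
    be +Inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bound")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumed lower edge of the first bucket
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls past the last finite bucket, so the best
                # available answer is the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the matching bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

One practical consequence: the estimate can never exceed the highest finite bucket bound, so a reported p99 that sits exactly on that bound may mean the real latency is higher.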

Active workers:

```promql
sum(workers_current) by (job)
```

Validation throughput:

```promql
rate(validated_rows[1m])
```
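
The `rate()` expressions above turn ever-increasing counters into per-second values. The sketch below shows the core idea on raw samples: sum the increases between consecutive scrapes, treat any decrease as a counter reset, and divide by the elapsed time. It omits the window-edge extrapolation that Prometheus applies, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def per_second_rate(samples: List[Sample]) -> Optional[float]:
    """Average per-second increase of a counter over time-ordered samples."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter that drops was reset (for example after a restart),
        # so the new value is counted as an increase from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def error_ratio(errors: List[Sample], requests: List[Sample]) -> Optional[float]:
    """Fraction of requests that failed, as in the error rate query."""
    error_rate = per_second_rate(errors)
    request_rate = per_second_rate(requests)
    if error_rate is None or not request_rate:
        return None  # not enough data, or no requests in the window
    return error_rate / request_rate
```

Returning `None` when there were no requests avoids a division by zero; the equivalent PromQL division yields no usable value in that case either.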
A pre-built Grafana dashboard is available at docker/monitoring/Gemini.json. Import it to visualize:
- Request rates and errors
- Query latencies
- Connection status
- Worker activity
- Memory usage

The monitoring stack is included in the cluster setup:

```bash
make scylla-setup-cluster
```

Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000

For a separately managed Prometheus and Grafana:

- Add Gemini to Prometheus targets:

  ```yaml
  scrape_configs:
    - job_name: 'gemini'
      static_configs:
        - targets: ['localhost:2112']
  ```

- Import the Grafana dashboard from `docker/monitoring/Gemini.json`
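
After adding the scrape target, it can be worth confirming that Prometheus actually reaches Gemini. The sketch below reads Prometheus's `/api/v1/targets` HTTP API and reports the health of every target in the `gemini` job. It assumes Prometheus is reachable at the `http://localhost:9090` address listed above; the helper names are hypothetical.

```python
import json
from typing import Dict
from urllib.request import urlopen

# Assumption: Prometheus listens on the address given in this guide.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Fetch the target list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def job_health(targets_response: dict, job: str = "gemini") -> Dict[str, str]:
    """Map each scrape URL of `job` to its reported health.

    Health is "up", "down", or "unknown" (not scraped yet).
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    return {
        target.get("scrapeUrl", "?"): target.get("health", "unknown")
        for target in active
        if target.get("labels", {}).get("job") == job
    }
```

An empty result from `job_health(fetch_targets())` suggests the scrape config was not loaded, while a `"down"` entry points at a connectivity problem between Prometheus and Gemini.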

Example Prometheus alerting rules:

```yaml
groups:
  - name: gemini  # group name is arbitrary
    rules:
      # Error rate above 1% for 5 minutes
      - alert: GeminiHighErrorRate
        expr: rate(cql_error_requests[5m]) / rate(cql_requests[5m]) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gemini error rate above 1%"

      # p99 query latency above 1 second for 5 minutes
      - alert: GeminiSlowQueries
        expr: histogram_quantile(0.99, rate(cql_query_time_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gemini p99 latency above 1 second"

      # Any connection errors sustained for 1 minute
      - alert: GeminiConnectionErrors
        expr: rate(cql_connections_errors[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gemini experiencing connection errors"
```
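
The `for:` field in these rules delays firing until the condition has held continuously for the given duration. The sketch below simulates that behaviour over a series of rule evaluations, as a simplified model rather than Prometheus's own code; the function name `alert_states` is hypothetical.

```python
from typing import List, Optional, Tuple


def alert_states(evaluations: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Return the alert state after each rule evaluation.

    `evaluations` is a time-ordered list of (timestamp_seconds, condition_true).
    An alert is "pending" while its condition holds but has not yet held for
    `for_seconds`, "firing" once it has, and "inactive" otherwise.
    """
    states: List[str] = []
    active_since: Optional[float] = None
    for timestamp, condition_true in evaluations:
        if not condition_true:
            # Any evaluation where the condition is false resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states
```

This is why a short burst of errors does not trigger `GeminiHighErrorRate`: one evaluation with the ratio back under the threshold returns the alert to inactive and restarts the 5 minute timer.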