From a4c5b4b3a87919ae23af4ccba0850272e3b915c4 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 7 Aug 2024 17:40:27 -0500 Subject: [PATCH 01/25] DOC-3945 v2 DB Prometheus metrics --- .../prometheus-metrics-definitions-lists.md | 540 ++++++++++++++++++ .../prometheus-metrics-definitions.md | 144 ++--- 2 files changed, 612 insertions(+), 72 deletions(-) create mode 100644 content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md new file mode 100644 index 0000000000..f2eb6187f7 --- /dev/null +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md @@ -0,0 +1,540 @@ +--- +Title: Metrics in Prometheus +alwaysopen: false +categories: +- docs +- integrate +- rs +description: The metrics available to Prometheus. +group: observability +linkTitle: Prometheus metrics (lists) +summary: You can use Prometheus and Grafana to collect and visualize your Redis Enterprise + Software metrics. +type: integration +weight: 45 +--- +The [integration with Prometheus]({{< relref "/integrate/prometheus-with-redis-enterprise/" >}}) +lets you create dashboards that highlight the metrics that are important to you. 
+ +Here are the metrics available to Prometheus: + +## Database metrics + +**V1 metric:** `bdb_avg_latency` +- **Description:** Average latency of operations on the DB (seconds); returned only when there is traffic +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 + ``` + +**V1 metric:** `bdb_avg_latency_max` +- **Description:** Highest value of average latency of operations on the DB (seconds); returned only when there is traffic +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 + ``` + +**V1 metric:** `bdb_avg_read_latency` +- **Description:** Average latency of read operations (seconds); returned only when there is traffic +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 + ``` + +**V1 metric:** `bdb_avg_read_latency_max` +- **Description:** Highest value of average latency of read operations (seconds); returned only when there is traffic +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 + ``` + +**V1 metric:** `bdb_avg_write_latency` +- **Description:** Average latency of write operations (seconds); returned only when there is traffic +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 + ``` + +**V1 metric:** `bdb_avg_write_latency_max` +- **Description:** Highest value of average latency of write operations (seconds); returned only when there is traffic +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 + ``` + +**V1 
metric:** `bdb_bigstore_shard_count` +- **Description:** Shard count by database and by storage engine (driver - rocksdb / speedb); only for databases with Auto Tiering enabled +- **Equivalent V2 PromQL:** + ```promql + sum( + (sum( + label_replace( + label_replace( + namedprocess_namegroup_thread_count{groupname=~"redis-\\d+", threadname=~"(speedb|rocksdb).*"}, + "redis", "$1", "groupname", "redis-(\\d+)" + ), + "driver", "$1", "threadname", "(speedb|rocksdb).*" + ) + ) by (redis, driver) > bool 0) + * on (redis) group_left(bdb) + redis_server_up + ) by (bdb, driver) + ``` + +**V1 metric:** `bdb_conns` +- **Description:** Number of client connections to DB +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (endpoint_conns) + ``` + +**V1 metric:** `bdb_egress_bytes` +- **Description:** Rate of outgoing network traffic from the DB (bytes/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_egress_bytes[1m])) + ``` + +**V1 metric:** `bdb_egress_bytes_max` +- **Description:** Highest value of rate of outgoing network traffic from the DB (bytes/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_egress_bytes[1m])) + ``` + +**V1 metric:** `bdb_evicted_objects` +- **Description:** Rate of key evictions from DB (evictions/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_evicted_objects_max` +- **Description:** Highest value of rate of key evictions from DB (evictions/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_expired_objects` +- **Description:** Rate of key expirations in DB (expirations/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_expired_objects_max` +- **Description:** Highest value of rate of key expirations in DB (expirations/sec) +- 
**Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_fork_cpu_system` +- **Description:** % cores utilization in system mode for all redis shard fork child processes of this database +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m])) + ``` + +**V1 metric:** `bdb_fork_cpu_system_max` +- **Description:** Highest value of % cores utilization in system mode for all redis shard fork child processes of this database +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m])) + ``` + +**V1 metric:** `bdb_fork_cpu_user` +- **Description:** % cores utilization in user mode for all redis shard fork child processes of this database +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m])) + ``` + +**V1 metric:** `bdb_fork_cpu_user_max` +- **Description:** Highest value of % cores utilization in user mode for all redis shard fork child processes of this database +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m])) + ``` + +**V1 metric:** `bdb_ingress_bytes` +- **Description:** Rate of incoming network traffic to DB (bytes/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_ingress_bytes[1m])) + ``` + +**V1 metric:** `bdb_ingress_bytes_max` +- **Description:** Highest value of rate of incoming network traffic to DB (bytes/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_ingress_bytes[1m])) + ``` + +**V1 metric:** `bdb_instantaneous_ops_per_sec` +- **Description:** Request rate handled by all shards of DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_instantaneous_ops_per_sec) + ``` + +**V1 metric:** 
`bdb_main_thread_cpu_system` +- **Description:** % cores utilization in system mode for all redis shard main threads of this database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m])) + ``` + +**V1 metric:** `bdb_main_thread_cpu_system_max` +- **Description:** Highest value of % cores utilization in system mode for all redis shard main threads of this database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m])) + ``` + +**V1 metric:** `bdb_main_thread_cpu_user` +- **Description:** % cores utilization in user mode for all redis shard main threads of this database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m])) + ``` + +**V1 metric:** `bdb_main_thread_cpu_user_max` +- **Description:** Highest value of % cores utilization in user mode for all redis shard main threads of this database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m])) + ``` + +**V1 metric:** `bdb_mem_frag_ratio` +- **Description:** RAM fragmentation ratio (RSS / allocated RAM) +- **Equivalent V2 PromQL:** + ```promql + avg(redis_server_mem_fragmentation_ratio) + ``` + +**V1 metric:** `bdb_mem_size_lua` +- **Description:** Redis Lua scripting heap size (bytes) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_used_memory_lua) + ``` + +**V1 metric:** `bdb_memory_limit` +- **Description:** Configured RAM limit for the database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_maxmemory) + ``` + +**V1 metric:** `bdb_monitor_sessions_count` +- **Description:** Number of clients connected in monitor mode to the DB +- **Equivalent V2 PromQL:** + 
```promql + sum by(bdb) (endpoint_monitor_sessions_count) + ``` + +**V1 metric:** `bdb_no_of_keys` +- **Description:** Number of keys in DB +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (redis_server_db_keys{role="master"}) + ``` + +**V1 metric:** `bdb_other_req` +- **Description:** Rate of other (non read/write) requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_other_req[1m])) + ``` + +**V1 metric:** `bdb_other_req_max` +- **Description:** Highest value of rate of other (non read/write) requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_other_req[1m])) + ``` + +**V1 metric:** `bdb_other_res` +- **Description:** Rate of other (non read/write) responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_other_res[1m])) + ``` + +**V1 metric:** `bdb_other_res_max` +- **Description:** Highest value of rate of other (non read/write) responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_other_res[1m])) + ``` + +**V1 metric:** `bdb_pubsub_channels` +- **Description:** Count the pub/sub channels with subscribed clients +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_pubsub_channels) + ``` + +**V1 metric:** `bdb_pubsub_channels_max` +- **Description:** Highest value of count the pub/sub channels with subscribed clients +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_pubsub_channels) + ``` + +**V1 metric:** `bdb_pubsub_patterns` +- **Description:** Count the pub/sub patterns with subscribed clients +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_pubsub_patterns) + ``` + +**V1 metric:** `bdb_pubsub_patterns_max` +- **Description:** Highest value of count the pub/sub patterns with subscribed clients +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_pubsub_patterns) + ``` + +**V1 metric:** `bdb_read_hits` +- 
**Description:** Rate of read operations accessing an existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_read_hits_max` +- **Description:** Highest value of rate of read operations accessing an existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_read_misses` +- **Description:** Rate of read operations accessing a non-existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_read_misses_max` +- **Description:** Highest value of rate of read operations accessing a non-existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_read_req` +- **Description:** Rate of read requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_read_req[1m])) + ``` + +**V1 metric:** `bdb_read_req_max` +- **Description:** Highest value of rate of read requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_read_req[1m])) + ``` + +**V1 metric:** `bdb_read_res` +- **Description:** Rate of read responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_read_res[1m])) + ``` + +**V1 metric:** `bdb_read_res_max` +- **Description:** Highest value of rate of read responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_read_res[1m])) + ``` + +**V1 metric:** `bdb_shard_cpu_system` +- **Description:** % cores utilization in system mode for all redis shard processes of this database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m])) + ``` + +**V1 metric:** `bdb_shard_cpu_system_max` +- **Description:** Highest value of % cores utilization in system mode for all redis shard processes of this database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m])) + ``` + +**V1 metric:** `bdb_shard_cpu_user` +- **Description:** % cores utilization in user mode for the redis shard process +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m])) + ``` + +**V1 metric:** `bdb_shard_cpu_user_max` +- **Description:** Highest value of % cores utilization in user mode for the redis shard process +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m])) + ``` + +**V1 metric:** `bdb_shards_used` +- **Description:** Used shard count by database and by shard type (ram / flash) +- **Equivalent V2 PromQL:** + ```promql + sum( + (sum( + label_replace( + label_replace( + label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\\d+"}, + "redis", "$1", "groupname", "redis-(\\d+)"), + "shard_type", "flash", "threadname", "(bigstore).*"), + "shard_type", "ram", "shard_type", "") + ) by (redis, shard_type) > bool 0) + * on (redis) group_left(bdb) + redis_server_up + ) by (bdb, shard_type) + ``` + +**V1 metric:** `bdb_total_connections_received` +- **Description:** Rate of new client connections to DB (connections/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_total_connections_received[1m])) + ``` + +**V1 metric:** `bdb_total_connections_received_max` +- **Description:** Highest value of rate of new client connections to DB (connections/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_total_connections_received[1m])) 
+ ``` + +**V1 metric:** `bdb_total_req` +- **Description:** Rate of all requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_total_req[1m])) + ``` + +**V1 metric:** `bdb_total_req_max` +- **Description:** Highest value of rate of all requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_total_req[1m])) + ``` + +**V1 metric:** `bdb_total_res` +- **Description:** Rate of all responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_total_res[1m])) + ``` + +**V1 metric:** `bdb_total_res_max` +- **Description:** Highest value of rate of all responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_total_res[1m])) + ``` + +**V1 metric:** `bdb_up` +- **Description:** Database is up and running +- **Equivalent V2 PromQL:** + ```promql + min by(bdb) (redis_up) + ``` + +**V1 metric:** `bdb_used_memory` +- **Description:** Memory used by db (in bigredis this includes flash) (bytes) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (redis_server_used_memory) + ``` + +**V1 metric:** `bdb_write_hits` +- **Description:** Rate of write operations accessing an existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_write_hits_max` +- **Description:** Highest value of rate of write operations accessing an existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_write_misses` +- **Description:** Rate of write operations accessing a non-existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_write_misses_max` +- **Description:** Highest value of rate of write operations accessing a 
non-existing key (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m])) + ``` + +**V1 metric:** `bdb_write_req` +- **Description:** Rate of write requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_write_req[1m])) + ``` + +**V1 metric:** `bdb_write_req_max` +- **Description:** Highest value of rate of write requests on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by (bdb) (irate(endpoint_write_req[1m])) + ``` + +**V1 metric:** `bdb_write_res` +- **Description:** Rate of write responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_write_responses[1m])) + ``` + +**V1 metric:** `bdb_write_res_max` +- **Description:** Highest value of rate of write responses on DB (ops/sec) +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (irate(endpoint_write_responses[1m])) + ``` + +**V1 metric:** `no_of_expires` +- **Description:** Current number of volatile keys in the database +- **Equivalent V2 PromQL:** + ```promql + sum by(bdb) (redis_server_db_expires{role="master"}) + ``` + +## Node metrics + +TBA + +## Cluster metrics + +TBA + +## Proxy metrics + +TBA + +## Replication metrics + +TBA + +## Shard metrics + +TBA diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 0c44afb102..261b20365f 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -1,5 +1,5 @@ --- -Title: Metrics in Prometheus +Title: Metrics in Prometheus (tables) alwaysopen: false categories: - docs @@ -7,7 +7,7 @@ categories: - rs description: The metrics available to Prometheus. 
group: observability -linkTitle: Prometheus metrics +linkTitle: Prometheus metrics (tables) summary: You can use Prometheus and Grafana to collect and visualize your Redis Enterprise Software metrics. type: integration @@ -20,76 +20,76 @@ Here are the metrics available to Prometheus: ## Database metrics -| Metric | Description | -| ------ | :------ | -| bdb_avg_latency | Average latency of operations on the DB (seconds); returned only when there is traffic | -| bdb_avg_latency_max | Highest value of average latency of operations on the DB (seconds); returned only when there is traffic | -| bdb_avg_read_latency | Average latency of read operations (seconds); returned only when there is traffic | -| bdb_avg_read_latency_max | Highest value of average latency of read operations (seconds); returned only when there is traffic | -| bdb_avg_write_latency | Average latency of write operations (seconds); returned only when there is traffic | -| bdb_avg_write_latency_max | Highest value of average latency of write operations (seconds); returned only when there is traffic | -| bdb_bigstore_shard_count | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | -| bdb_conns | Number of client connections to DB | -| bdb_egress_bytes | Rate of outgoing network traffic from the DB (bytes/sec) | -| bdb_egress_bytes_max | Highest value of rate of outgoing network traffic from the DB (bytes/sec) | -| bdb_evicted_objects | Rate of key evictions from DB (evictions/sec) | -| bdb_evicted_objects_max | Highest value of rate of key evictions from DB (evictions/sec) | -| bdb_expired_objects | Rate keys expired in DB (expirations/sec) | -| bdb_expired_objects_max | Highest value of rate keys expired in DB (expirations/sec) | -| bdb_fork_cpu_system | % cores utilization in system mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_system_max | Highest value of % cores utilization in system mode for 
all redis shard fork child processes of this database | -| bdb_fork_cpu_user | % cores utilization in user mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_user_max | Highest value of % cores utilization in user mode for all redis shard fork child processes of this database | -| bdb_ingress_bytes | Rate of incoming network traffic to DB (bytes/sec) | -| bdb_ingress_bytes_max | Highest value of rate of incoming network traffic to DB (bytes/sec) | -| bdb_instantaneous_ops_per_sec | Request rate handled by all shards of DB (ops/sec) | -| bdb_main_thread_cpu_system | % cores utilization in system mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_system_max | Highest value of % cores utilization in system mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_user | % cores utilization in user mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_user_max | Highest value of % cores utilization in user mode for all redis shard main threads of this database | -| bdb_mem_frag_ratio | RAM fragmentation ratio (RSS / allocated RAM) | -| bdb_mem_size_lua | Redis lua scripting heap size (bytes) | -| bdb_memory_limit | Configured RAM limit for the database | -| bdb_monitor_sessions_count | Number of client connected in monitor mode to the DB | -| bdb_no_of_keys | Number of keys in DB | -| bdb_other_req | Rate of other (non read/write) requests on DB (ops/sec) | -| bdb_other_req_max | Highest value of rate of other (non read/write) requests on DB (ops/sec) | -| bdb_other_res | Rate of other (non read/write) responses on DB (ops/sec) | -| bdb_other_res_max | Highest value of rate of other (non read/write) responses on DB (ops/sec) | -| bdb_pubsub_channels | Count the pub/sub channels with subscribed clients | -| bdb_pubsub_channels_max | Highest value of count the pub/sub channels with subscribed clients | -| bdb_pubsub_patterns | Count the pub/sub patterns with 
subscribed clients | -| bdb_pubsub_patterns_max | Highest value of count the pub/sub patterns with subscribed clients | -| bdb_read_hits | Rate of read operations accessing an existing key (ops/sec) | -| bdb_read_hits_max | Highest value of rate of read operations accessing an existing key (ops/sec) | -| bdb_read_misses | Rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_misses_max | Highest value of rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_req | Rate of read requests on DB (ops/sec) | -| bdb_read_req_max | Highest value of rate of read requests on DB (ops/sec) | -| bdb_read_res | Rate of read responses on DB (ops/sec) | -| bdb_read_res_max | Highest value of rate of read responses on DB (ops/sec) | -| bdb_shard_cpu_system | % cores utilization in system mode for all redis shard processes of this database | -| bdb_shard_cpu_system_max | Highest value of % cores utilization in system mode for all redis shard processes of this database | -| bdb_shard_cpu_user | % cores utilization in user mode for the redis shard process | -| bdb_shard_cpu_user_max | Highest value of % cores utilization in user mode for the redis shard process | -| bdb_shards_used | Used shard count by database and by shard type (ram / flash) | -| bdb_total_connections_received | Rate of new client connections to DB (connections/sec) | -| bdb_total_connections_received_max | Highest value of rate of new client connections to DB (connections/sec) | -| bdb_total_req | Rate of all requests on DB (ops/sec) | -| bdb_total_req_max | Highest value of rate of all requests on DB (ops/sec) | -| bdb_total_res | Rate of all responses on DB (ops/sec) | -| bdb_total_res_max | Highest value of rate of all responses on DB (ops/sec) | -| bdb_up | Database is up and running | -| bdb_used_memory | Memory used by db (in bigredis this includes flash) (bytes) | -| bdb_write_hits | Rate of write operations accessing an existing key (ops/sec) | -| 
bdb_write_hits_max | Highest value of rate of write operations accessing an existing key (ops/sec) | -| bdb_write_misses | Rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_misses_max | Highest value of rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_req | Rate of write requests on DB (ops/sec) | -| bdb_write_req_max | Highest value of rate of write requests on DB (ops/sec) | -| bdb_write_res | Rate of write responses on DB (ops/sec) | -| bdb_write_res_max | Highest value of rate of write responses on DB (ops/sec) | -| no_of_expires | Current number of volatile keys in the database | +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| bdb_avg_latency | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the DB (seconds); returned only when there is traffic | +| bdb_avg_latency_max | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the DB (seconds); returned only when there is traffic | +| bdb_avg_read_latency | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | +| bdb_avg_read_latency_max | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | +| bdb_avg_write_latency | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | +| bdb_avg_write_latency_max | `sum by (bdb) 
(irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of write operations (seconds); returned only when there is traffic | +| bdb_bigstore_shard_count | `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); only for databases with Auto Tiering enabled | +| bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to DB | +| bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the DB (bytes/sec) | +| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of rate of outgoing network traffic from the DB (bytes/sec) | +| bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from DB (evictions/sec) | +| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of rate of key evictions from DB (evictions/sec) | +| bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate of key expirations in DB (expirations/sec) | +| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of rate of key expirations in DB (expirations/sec) | +| bdb_fork_cpu_system | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all redis shard fork child processes of this database | +| bdb_fork_cpu_system_max | `sum by (bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all redis shard fork child processes of this database | +| bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all redis shard fork child processes of this database | +| bdb_fork_cpu_user_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all redis shard fork child processes of this database | +| bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to DB (bytes/sec) | +| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of rate of incoming network traffic to DB (bytes/sec) | +| bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of DB (ops/sec) | +| bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all redis shard main threads of this database | +| bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all redis shard main threads of this database | +| bdb_main_thread_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all redis shard main threads of this database | +| bdb_main_thread_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all redis shard 
main threads of this database | +| bdb_mem_frag_ratio | `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | +| bdb_mem_size_lua | `sum by(bdb) (redis_server_used_memory_lua)` | Redis Lua scripting heap size (bytes) | +| bdb_memory_limit | `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | +| bdb_monitor_sessions_count | `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the DB | +| bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in DB | +| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on DB (ops/sec) | +| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of rate of other (non read/write) requests on DB (ops/sec) | +| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on DB (ops/sec) | +| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of rate of other (non read/write) responses on DB (ops/sec) | +| bdb_pubsub_channels | `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | +| bdb_pubsub_channels_max | `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | +| bdb_pubsub_patterns | `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | +| bdb_pubsub_patterns_max | `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of count the pub/sub patterns with subscribed clients | +| bdb_read_hits | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | +| bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of rate of read operations accessing an 
existing key (ops/sec) | +| bdb_read_misses | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | +| bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of rate of read operations accessing a non-existing key (ops/sec) | +| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on DB (ops/sec) | +| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of rate of read requests on DB (ops/sec) | +| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on DB (ops/sec) | +| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of rate of read responses on DB (ops/sec) | +| bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all redis shard processes of this database | +| bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all redis shard processes of this database | +| bdb_shard_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the redis shard process | +| bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the redis shard process | +| bdb_shards_used | `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > 
bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | +| bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to DB (connections/sec) | +| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of rate of new client connections to DB (connections/sec) | +| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on DB (ops/sec) | +| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of rate of all requests on DB (ops/sec) | +| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on DB (ops/sec) | +| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of rate of all responses on DB (ops/sec) | +| bdb_up | `min by(bdb) (redis_up)` | Database is up and running | +| bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by db (in bigredis this includes flash) (bytes) | +| bdb_write_hits | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | +| bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of rate of write operations accessing an existing key (ops/sec) | +| bdb_write_misses | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_misses_max | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on DB (ops/sec) | +| bdb_write_req_max | `sum by (bdb) 
(irate(endpoint_write_req[1m]))` | Highest value of rate of write requests on DB (ops/sec) | +| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on DB (ops/sec) | +| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of rate of write responses on DB (ops/sec) | +| no_of_expires | `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | ## Node metrics From 11a1521a50b717ac896ce2f9a0ac07609ccbf8ce Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Thu, 8 Aug 2024 11:02:38 -0500 Subject: [PATCH 02/25] Remove list-formatted metrics preview --- .../prometheus-metrics-definitions-lists.md | 540 ------------------ .../prometheus-metrics-definitions.md | 4 +- 2 files changed, 2 insertions(+), 542 deletions(-) delete mode 100644 content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md deleted file mode 100644 index f2eb6187f7..0000000000 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions-lists.md +++ /dev/null @@ -1,540 +0,0 @@ ---- -Title: Metrics in Prometheus -alwaysopen: false -categories: -- docs -- integrate -- rs -description: The metrics available to Prometheus. -group: observability -linkTitle: Prometheus metrics (lists) -summary: You can use Prometheus and Grafana to collect and visualize your Redis Enterprise - Software metrics. -type: integration -weight: 45 ---- -The [integration with Prometheus]({{< relref "/integrate/prometheus-with-redis-enterprise/" >}}) -lets you create dashboards that highlight the metrics that are important to you. 
- -Here are the metrics available to Prometheus: - -## Database metrics - -**V1 metric:** `bdb_avg_latency` -- **Description:** Average latency of operations on the DB (seconds); returned only when there is traffic -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 - ``` - -**V1 metric:** `bdb_avg_latency_max` -- **Description:** Highest value of average latency of operations on the DB (seconds); returned only when there is traffic -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 - ``` - -**V1 metric:** `bdb_avg_read_latency` -- **Description:** Average latency of read operations (seconds); returned only when there is traffic -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 - ``` - -**V1 metric:** `bdb_avg_read_latency_max` -- **Description:** Highest value of average latency of read operations (seconds); returned only when there is traffic -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 - ``` - -**V1 metric:** `bdb_avg_write_latency` -- **Description:** Average latency of write operations (seconds); returned only when there is traffic -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 - ``` - -**V1 metric:** `bdb_avg_write_latency_max` -- **Description:** Highest value of average latency of write operations (seconds); returned only when there is traffic -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000 - ``` - -**V1 
metric:** `bdb_bigstore_shard_count` -- **Description:** Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled -- **Equivalent V2 PromQL:** - ```promql - sum( - (sum( - label_replace( - label_replace( - namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb|rocksdb).*"}, - "redis", "$1", "groupname", "redis-(\d+)" - ), - "driver", "$1", "threadname", "(speedb|rocksdb).*" - ) - ) by (redis, driver) > bool 0) - * on (redis) group_left(bdb) - redis_server_up - ) by (bdb, driver) - ``` - -**V1 metric:** `bdb_conns` -- **Description:** Number of client connections to DB -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (endpoint_conns) - ``` - -**V1 metric:** `bdb_egress_bytes` -- **Description:** Rate of outgoing network traffic from the DB (bytes/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_egress_bytes[1m])) - ``` - -**V1 metric:** `bdb_egress_bytes_max` -- **Description:** Highest value of rate of outgoing network traffic from the DB (bytes/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_egress_bytes[1m])) - ``` - -**V1 metric:** `bdb_evicted_objects` -- **Description:** Rate of key evictions from DB (evictions/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_evicted_objects_max` -- **Description:** Highest value of rate of key evictions from DB (evictions/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_expired_objects` -- **Description:** Rate keys expired in DB (expirations/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_expired_objects_max` -- **Description:** Highest value of rate keys expired in DB (expirations/sec) -- 
**Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_fork_cpu_system` -- **Description:** % cores utilization in system mode for all redis shard fork child processes of this database -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m])) - ``` - -**V1 metric:** `bdb_fork_cpu_system_max` -- **Description:** Highest value of % cores utilization in system mode for all redis shard fork child processes of this database -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m])) - ``` - -**V1 metric:** `bdb_fork_cpu_user` -- **Description:** % cores utilization in user mode for all redis shard fork child processes of this database -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m])) - ``` - -**V1 metric:** `bdb_fork_cpu_user_max` -- **Description:** Highest value of % cores utilization in user mode for all redis shard fork child processes of this database -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m])) - ``` - -**V1 metric:** `bdb_ingress_bytes` -- **Description:** Rate of incoming network traffic to DB (bytes/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_ingress_bytes[1m])) - ``` - -**V1 metric:** `bdb_ingress_bytes_max` -- **Description:** Highest value of rate of incoming network traffic to DB (bytes/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_ingress_bytes[1m])) - ``` - -**V1 metric:** `bdb_instantaneous_ops_per_sec` -- **Description:** Request rate handled by all shards of DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_instantaneous_ops_per_sec) - ``` - -**V1 metric:** 
`bdb_main_thread_cpu_system` -- **Description:** % cores utilization in system mode for all redis shard main threads of this database -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m])) - ``` - -**V1 metric:** `bdb_main_thread_cpu_system_max` -- **Description:** Highest value of % cores utilization in system mode for all redis shard main threads of this database -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m])) - ``` - -**V1 metric:** `bdb_main_thread_cpu_user` -- **Description:** % cores utilization in user mode for all redis shard main threads of this database -- **Equivalent V2 PromQL:** - ```promql - sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m])) - ``` - -**V1 metric:** `bdb_main_thread_cpu_user_max` -- **Description:** Highest value of % cores utilization in user mode for all redis shard main threads of this database -- **Equivalent V2 PromQL:** - ```promql - sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m])) - ``` - -**V1 metric:** `bdb_mem_frag_ratio` -- **Description:** RAM fragmentation ratio (RSS / allocated RAM) -- **Equivalent V2 PromQL:** - ```promql - avg(redis_server_mem_fragmentation_ratio) - ``` - -**V1 metric:** `bdb_mem_size_lua` -- **Description:** Redis lua scripting heap size (bytes) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_used_memory_lua) - ``` - -**V1 metric:** `bdb_memory_limit` -- **Description:** Configured RAM limit for the database -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_maxmemory) - ``` - -**V1 metric:** `bdb_monitor_sessions_count` -- **Description:** Number of client connected in monitor mode to the DB -- **Equivalent V2 PromQL:** - 
```promql - sum by(bdb) (endpoint_monitor_sessions_count) - ``` - -**V1 metric:** `bdb_no_of_keys` -- **Description:** Number of keys in DB -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (redis_server_db_keys{role="master"}) - ``` - -**V1 metric:** `bdb_other_req` -- **Description:** Rate of other (non read/write) requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_other_req[1m])) - ``` - -**V1 metric:** `bdb_other_req_max` -- **Description:** Highest value of rate of other (non read/write) requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_other_req[1m])) - ``` - -**V1 metric:** `bdb_other_res` -- **Description:** Rate of other (non read/write) responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_other_res[1m])) - ``` - -**V1 metric:** `bdb_other_res_max` -- **Description:** Highest value of rate of other (non read/write) responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_other_res[1m])) - ``` - -**V1 metric:** `bdb_pubsub_channels` -- **Description:** Count the pub/sub channels with subscribed clients -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_pubsub_channels) - ``` - -**V1 metric:** `bdb_pubsub_channels_max` -- **Description:** Highest value of count the pub/sub channels with subscribed clients -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_pubsub_channels) - ``` - -**V1 metric:** `bdb_pubsub_patterns` -- **Description:** Count the pub/sub patterns with subscribed clients -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_pubsub_patterns) - ``` - -**V1 metric:** `bdb_pubsub_patterns_max` -- **Description:** Highest value of count the pub/sub patterns with subscribed clients -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_pubsub_patterns) - ``` - -**V1 metric:** `bdb_read_hits` -- 
**Description:** Rate of read operations accessing an existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_read_hits_max` -- **Description:** Highest value of rate of read operations accessing an existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_read_misses` -- **Description:** Rate of read operations accessing a non-existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_read_misses_max` -- **Description:** Highest value of rate of read operations accessing a non-existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_read_req` -- **Description:** Rate of read requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_read_req[1m])) - ``` - -**V1 metric:** `bdb_read_req_max` -- **Description:** Highest value of rate of read requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_read_req[1m])) - ``` - -**V1 metric:** `bdb_read_res` -- **Description:** Rate of read responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_read_res[1m])) - ``` - -**V1 metric:** `bdb_read_res_max` -- **Description:** Highest value of rate of read responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_read_res[1m])) - ``` - -**V1 metric:** `bdb_shard_cpu_system` -- **Description:** % cores utilization in system mode for all redis shard processes of this database -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m])) - ``` - -**V1 metric:** `bdb_shard_cpu_system_max` -- **Description:** Highest value of % cores utilization in system mode for all redis shard processes of this database -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m])) - ``` - -**V1 metric:** `bdb_shard_cpu_user` -- **Description:** % cores utilization in user mode for the redis shard process -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m])) - ``` - -**V1 metric:** `bdb_shard_cpu_user_max` -- **Description:** Highest value of % cores utilization in user mode for the redis shard process -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m])) - ``` - -**V1 metric:** `bdb_shards_used` -- **Description:** Used shard count by database and by shard type (ram / flash) -- **Equivalent V2 PromQL:** - ``` - sum( - (sum( - label_replace( - label_replace( - label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, - "redis", "$1", "groupname", "redis-(\d+)"), - "shard_type", "flash", "threadname", "(bigstore).*"), - "shard_type", "ram", "shard_type", "") - ) by (redis, shard_type) > bool 0) - * on (redis) group_left(bdb) - redis_server_up - ) by (bdb, shard_type) - ``` - -**V1 metric:** `bdb_total_connections_received` -- **Description:** Rate of new client connections to DB (connections/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_total_connections_received[1m])) - ``` - -**V1 metric:** `bdb_total_connections_received_max` -- **Description:** Highest value of rate of new client connections to DB (connections/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_total_connections_received[1m])) 
- ``` - -**V1 metric:** `bdb_total_req` -- **Description:** Rate of all requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_total_req[1m])) - ``` - -**V1 metric:** `bdb_total_req_max` -- **Description:** Highest value of rate of all requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_total_req[1m])) - ``` - -**V1 metric:** `bdb_total_res` -- **Description:** Rate of all responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_total_res[1m])) - ``` - -**V1 metric:** `bdb_total_res_max` -- **Description:** Highest value of rate of all responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_total_res[1m])) - ``` - -**V1 metric:** `bdb_up` -- **Description:** Database is up and running -- **Equivalent V2 PromQL:** - ```promql - min by(bdb) (redis_up) - ``` - -**V1 metric:** `bdb_used_memory` -- **Description:** Memory used by db (in bigredis this includes flash) (bytes) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (redis_server_used_memory) - ``` - -**V1 metric:** `bdb_write_hits` -- **Description:** Rate of write operations accessing an existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_write_hits_max` -- **Description:** Highest value of rate of write operations accessing an existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_write_misses` -- **Description:** Rate of write operations accessing a non-existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_write_misses_max` -- **Description:** Highest value of rate of write operations accessing a 
non-existing key (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m])) - ``` - -**V1 metric:** `bdb_write_req` -- **Description:** Rate of write requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_write_req[1m])) - ``` - -**V1 metric:** `bdb_write_req_max` -- **Description:** Highest value of rate of write requests on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by (bdb) (irate(endpoint_write_req[1m])) - ``` - -**V1 metric:** `bdb_write_res` -- **Description:** Rate of write responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_write_responses[1m])) - ``` - -**V1 metric:** `bdb_write_res_max` -- **Description:** Highest value of rate of write responses on DB (ops/sec) -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (irate(endpoint_write_responses[1m])) - ``` - -**V1 metric:** `no_of_expires` -- **Description:** Current number of volatile keys in the database -- **Equivalent V2 PromQL:** - ```promql - sum by(bdb) (redis_server_db_expires{role="master"}) - ``` - -## Node metrics - -TBA - -## Cluster metrics - -TBA - -## Proxy metrics - -TBA - -## Replication metrics - -TBA - -## Shard metrics - -TBA diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 261b20365f..e199072d57 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -1,5 +1,5 @@ --- -Title: Metrics in Prometheus (tables) +Title: Metrics in Prometheus alwaysopen: false categories: - docs @@ -7,7 +7,7 @@ categories: - rs description: The metrics available to Prometheus. 
group: observability -linkTitle: Prometheus metrics (tables) +linkTitle: Prometheus metrics summary: You can use Prometheus and Grafana to collect and visualize your Redis Enterprise Software metrics. type: integration From 9e40d6830cc7beecc5de9ab63867238ed1443a69 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 10:09:09 -0500 Subject: [PATCH 03/25] DOC-3946 v2 node Prometheus metrics --- .../prometheus-metrics-definitions.md | 90 +++++++++---------- 1 file changed, 45 insertions(+), 45 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index e199072d57..0596efdb65 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -93,51 +93,51 @@ Here are the metrics available to Prometheus: ## Node metrics -| Metric | Description | -| ------ | :------ | -| node_available_flash | Available flash in node (bytes) | -| node_available_flash_no_overbooking | Available flash in node (bytes), without taking into account overbooking | -| node_available_memory | Amount of free memory in node (bytes) that is available for database provisioning | -| node_available_memory_no_overbooking | Available ram in node (bytes) without taking into account overbooking | -| node_avg_latency | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | -| node_bigstore_free | Sum of free space of back-end flash (used by flash DB's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| node_bigstore_iops | Rate of i/o operations against back-end flash for all shards which are part of a flash based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_kv_ops | Rate of value 
read/write operations against back-end flash for all shards which are part of a flash based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_throughput | Throughput i/o operations against back-end flash for all shards which are part of a flash based DB (BigRedis) in cluster (bytes/sec); returned only when BigRedis is enabled | -| node_cert_expiration_seconds | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates expiration]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | -| node_conns | Number of clients connected to endpoints on node | -| node_cpu_idle | CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_max | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_median | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_min | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_system | CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_max | Highest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_median | Average value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_min | Lowest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_user | CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_max | Highest value of CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_median | Average value of CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_min | Lowest value of CPU time portion 
spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cur_aof_rewrites | Number of aof rewrites that are currently performed by shards on this node | -| node_egress_bytes | Rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_max | Highest value of rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_median | Average value of rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_min | Lowest value of rate of outgoing network traffic to node (bytes/sec) | -| node_ephemeral_storage_avail | Disk space available to RLEC processes on configured ephemeral disk (bytes) | -| node_ephemeral_storage_free | Free disk space on configured ephemeral disk (bytes) | -| node_free_memory | Free memory in node (bytes) | -| node_ingress_bytes | Rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_max | Highest value of rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_median | Average value of rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_min | Lowest value of rate of incoming network traffic to node (bytes/sec) | -| node_persistent_storage_avail | Disk space available to RLEC processes on configured persistent disk (bytes) | -| node_persistent_storage_free | Free disk space on configured persistent disk (bytes) | -| node_provisional_flash | Amount of flash available for new shards on this node, taking into account overbooking, max redis servers, reserved flash and provision and migration thresholds (bytes) | -| node_provisional_flash_no_overbooking | Amount of flash available for new shards on this node, without taking into account overbooking, max redis servers, reserved flash and provision and migration thresholds (bytes) | -| node_provisional_memory | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | -| node_provisional_memory_no_overbooking | Amount 
of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | -| node_total_req | Request rate handled by endpoints on node (ops/sec) | -| node_up | Node is part of the cluster and is connected | +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| node_available_flash | `node_available_flash_bytes` | Available flash in node (bytes) | +| node_available_flash_no_overbooking | `node_available_flash_no_overbooking_bytes` | Available flash in node (bytes), without taking into account overbooking | +| node_available_memory | `node_available_memory_bytes` | Amount of free memory in node (bytes) that is available for database provisioning | +| node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | Available RAM in node (bytes) without taking into account overbooking | +| node_avg_latency | `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | +| node_bigstore_free | `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash DB's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_throughput | 
`sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based DB (BigRedis) in cluster (bytes/sec); returned only when BigRedis is enabled | +| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node | +| node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | +| node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_max | `not supported - see footnote2` | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_median | `not supported - see footnote2` | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_min | `not supported - see footnote2` | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_system | `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_max | `not supported - see footnote2` | Highest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_median | `not supported - see footnote2` | Average value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_min | `not supported - see footnote2` | Lowest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | +| node_cpu_user | `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_max | `not supported - see footnote2` | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get 
percent) | +| node_cpu_user_median | `not supported - see footnote2` | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_min | `not supported - see footnote2` | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cur_aof_rewrites | `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | +| node_egress_bytes | `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_max | `not supported - see footnote2` | Highest value of rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_median | `not supported - see footnote2` | Average value of rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_min | `not supported - see footnote2` | Lowest value of rate of outgoing network traffic to node (bytes/sec) | +| node_ephemeral_storage_avail | `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | +| node_ephemeral_storage_free | `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | +| node_free_memory | `node_memory_MemFree_bytes` | Free memory in node (bytes) | +| node_ingress_bytes | `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_max | `not supported - see footnote2` | Highest value of rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_median | `not supported - see footnote2` | Average value of rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_min | `not supported - see footnote2` | Lowest value of rate of incoming network traffic to node (bytes/sec) | +| node_persistent_storage_avail | 
`node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | +| node_persistent_storage_free | `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | +| node_provisional_flash | `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_flash_no_overbooking | `node_provisional_flash_no_overbooking_bytes` | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_memory | `node_provisional_memory_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | +| node_provisional_memory_no_overbooking | `node_provisional_memory_no_overbooking_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | +| node_total_req | `sum by (cluster, node) (irate(endpoint_total_req[1m]))` | Request rate handled by endpoints on node (ops/sec) | +| node_up | `node_metrics_up` | Node is part of the cluster and is connected | ## Cluster metrics From b63996a943e84bc4a96f6e35b6252361cf4c43fe Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 10:12:52 -0500 Subject: [PATCH 04/25] DOC-3949 v2 cluster Prometheus metrics --- .../prometheus-metrics-definitions.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 0596efdb65..1053c6fea4 100644 --- 
a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -141,10 +141,9 @@ Here are the metrics available to Prometheus: ## Cluster metrics -| Metric | Description | -| ------ | :------ | -| cluster_shards_limit | Total shard limit by the license by shard type (ram / flash) | - +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| cluster_shards_limit | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | ## Proxy metrics From 1b89d8f52328e1922c9816f8ddadd79f96bf2ce0 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 11:13:20 -0500 Subject: [PATCH 05/25] DOC-3950 v2 proxy Prometheus metrics --- .../prometheus-metrics-definitions.md | 118 +++++++++--------- 1 file changed, 60 insertions(+), 58 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 1053c6fea4..22b973bd9e 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -147,64 +147,66 @@ Here are the metrics available to Prometheus: ## Proxy metrics -| Metric | Description | -| ------ | :------ | -| listener_acc_latency | Accumulative latency (sum of the latencies) of all types of commands on DB. For the average latency, divide this value by listener_total_res | -| listener_acc_latency_max | Highest value of accumulative latency of all types of commands on DB | -| listener_acc_other_latency | Accumulative latency (sum of the latencies) of commands that are type "other" on DB. 
For the average latency, divide this value by listener_other_res | -| listener_acc_other_latency_max | Highest value of accumulative latency of commands that are type "other" on DB | -| listener_acc_read_latency | Accumulative latency (sum of the latencies) of commands that are type "read" on DB. For the average latency, divide this value by listener_read_res | -| listener_acc_read_latency_max | Highest value of accumulative latency of commands that are type "read" on DB | -| listener_acc_write_latency | Accumulative latency (sum of the latencies) of commands that are type "write" on DB. For the average latency, divide this value by listener_write_res | -| listener_acc_write_latency_max | Highest value of accumulative latency of commands that are type "write" on DB | -| listener_auth_cmds | Number of memcached AUTH commands sent to the DB | -| listener_auth_cmds_max | Highest value of number of memcached AUTH commands sent to the DB | -| listener_auth_errors | Number of error responses to memcached AUTH commands | -| listener_auth_errors_max | Highest value of number of error responses to memcached AUTH commands | -| listener_cmd_flush | Number of memcached FLUSH_ALL commands sent to the DB | -| listener_cmd_flush_max | Highest value of number of memcached FLUSH_ALL commands sent to the DB | -| listener_cmd_get | Number of memcached GET commands sent to the DB | -| listener_cmd_get_max | Highest value of number of memcached GET commands sent to the DB | -| listener_cmd_set | Number of memcached SET commands sent to the DB | -| listener_cmd_set_max | Highest value of number of memcached SET commands sent to the DB | -| listener_cmd_touch | Number of memcached TOUCH commands sent to the DB | -| listener_cmd_touch_max | Highest value of number of memcached TOUCH commands sent to the DB | -| listener_conns | Number of clients connected to the endpoint | -| listener_egress_bytes | Rate of outgoing network traffic to the endpoint (bytes/sec) | -| 
listener_egress_bytes_max | Highest value of rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes | Rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes_max | Highest value of rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_last_req_time | Time of last command sent to the DB | -| listener_last_res_time | Time of last response sent from the DB | -| listener_max_connections_exceeded | Number of times the Number of clients connected to the db at the same time has exeeded the max limit | -| listener_max_connections_exceeded_max | Highest value of number of times the Number of clients connected to the db at the same time has exeeded the max limit | -| listener_monitor_sessions_count | Number of client connected in monitor mode to the endpoint | -| listener_other_req | Rate of other (non read/write) requests on the endpoint (ops/sec) | -| listener_other_req_max | Highest value of rate of other (non read/write) requests on the endpoint (ops/sec) | -| listener_other_res | Rate of other (non read/write) responses on the endpoint (ops/sec) | -| listener_other_res_max | Highest value of rate of other (non read/write) responses on the endpoint (ops/sec) | -| listener_other_started_res | Number of responses sent from the DB of type "other" | -| listener_other_started_res_max | Highest value of number of responses sent from the DB of type "other" | -| listener_read_req | Rate of read requests on the endpoint (ops/sec) | -| listener_read_req_max | Highest value of rate of read requests on the endpoint (ops/sec) | -| listener_read_res | Rate of read responses on the endpoint (ops/sec) | -| listener_read_res_max | Highest value of rate of read responses on the endpoint (ops/sec) | -| listener_read_started_res | Number of responses sent from the DB of type "read" | -| listener_read_started_res_max | Highest value of number of responses sent from the DB of type "read" | -| 
listener_total_connections_received | Rate of new client connections to the endpoint (connections/sec) | -| listener_total_connections_received_max | Highest value of rate of new client connections to the endpoint (connections/sec) | -| listener_total_req | Request rate handled by the endpoint (ops/sec) | -| listener_total_req_max | Highest value of rate of all requests on the endpoint (ops/sec) | -| listener_total_res | Rate of all responses on the endpoint (ops/sec) | -| listener_total_res_max | Highest value of rate of all responses on the endpoint (ops/sec) | -| listener_total_started_res | Number of responses sent from the DB of all types | -| listener_total_started_res_max | Highest value of number of responses sent from the DB of all types | -| listener_write_req | Rate of write requests on the endpoint (ops/sec) | -| listener_write_req_max | Highest value of rate of write requests on the endpoint (ops/sec) | -| listener_write_res | Rate of write responses on the endpoint (ops/sec) | -| listener_write_res_max | Highest value of rate of write responses on the endpoint (ops/sec) | -| listener_write_started_res | Number of responses sent from the DB of type "write" | -| listener_write_started_res_max | Highest value of number of responses sent from the DB of type "write" | +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on DB. For the average latency, divide this value by listener_total_res | +| listener_acc_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of all types of commands on DB | +| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "other" on DB. 
For the average latency, divide this value by listener_other_res | +| listener_acc_other_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "other" on DB | +| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "read" on DB. For the average latency, divide this value by listener_read_res | +| listener_acc_read_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "read" on DB | +| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "write" on DB. For the average latency, divide this value by listener_write_res | +| listener_acc_write_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "write" on DB | +| listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the DB | +| listener_auth_cmds_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached AUTH commands sent to the DB | +| listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | +| listener_auth_errors_max | N/A[1](#proxy-table-note-1) | Highest value of number of error responses to memcached AUTH commands | +| listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the DB | +| listener_cmd_flush_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached FLUSH_ALL commands sent to the DB | +| listener_cmd_get | N/A | Number of memcached GET commands sent to the DB | +| listener_cmd_get_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached GET commands sent to the DB | +| listener_cmd_set | N/A | Number of memcached SET commands sent to the DB | +| listener_cmd_set_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached SET commands sent to the DB | +| listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to 
the DB |
+| listener_cmd_touch_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached TOUCH commands sent to the DB |
+| listener_conns | N/A | Number of clients connected to the endpoint |
+| listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) |
+| listener_egress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of rate of outgoing network traffic to the endpoint (bytes/sec) |
+| listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) |
+| listener_ingress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of rate of incoming network traffic to the endpoint (bytes/sec) |
+| listener_last_req_time | N/A | Time of last command sent to the DB |
+| listener_last_res_time | N/A | Time of last response sent from the DB |
+| listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the DB at the same time has exceeded the max limit |
+| listener_max_connections_exceeded_max | N/A[1](#proxy-table-note-1) | Highest value of number of times the number of clients connected to the DB at the same time has exceeded the max limit |
+| listener_monitor_sessions_count | N/A | Number of clients connected in monitor mode to the endpoint |
+| listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) |
+| listener_other_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of other (non-read/write) requests on the endpoint (ops/sec) |
+| listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) |
+| listener_other_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of other (non-read/write) responses on the endpoint (ops/sec) |
+| listener_other_started_res | N/A | Number of responses sent from the DB of type "other" |
+| listener_other_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of
responses sent from the DB of type "other" | +| listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | +| listener_read_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of read requests on the endpoint (ops/sec) | +| listener_read_res | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | +| listener_read_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of read responses on the endpoint (ops/sec) | +| listener_read_started_res | N/A | Number of responses sent from the DB of type "read" | +| listener_read_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of type "read" | +| listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | +| listener_total_connections_received_max | N/A[1](#proxy-table-note-1) | Highest value of rate of new client connections to the endpoint (connections/sec) | +| listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | +| listener_total_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of all requests on the endpoint (ops/sec) | +| listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | +| listener_total_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of all responses on the endpoint (ops/sec) | +| listener_total_started_res | N/A | Number of responses sent from the DB of all types | +| listener_total_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of all types | +| listener_write_req | `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | +| listener_write_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of write requests on the endpoint (ops/sec) | +| listener_write_res | `irate(endpoint_write_responses[1m])` | 
Rate of write responses on the endpoint (ops/sec) | +| listener_write_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of write responses on the endpoint (ops/sec) | +| listener_write_started_res | N/A | Number of responses sent from the DB of type "write" | +| listener_write_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of type "write" | + +1. The `max`, `min`, and `median` v1 metrics provide the aggregated value of the corresponding metric over a 30-second period. This was intended to alleviate a limitation in the v1 system that limited the resolution of reported metrics regardless of the configured scrape interval. This limitation does not apply to v2. You should avoid the extra aggregations unless required for specific use cases. ## Replication metrics From 154f6ef3ed0598d027973075ebfe1c9f42d05ef6 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 11:18:00 -0500 Subject: [PATCH 06/25] DOC-3947 v2 replication Prometheus metrics --- .../prometheus-metrics-definitions.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 22b973bd9e..a91eb6732a 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -210,16 +210,16 @@ Here are the metrics available to Prometheus: ## Replication metrics -| Metric | Description | -| ------ | :------ | -| bdb_replicaof_syncer_ingress_bytes | Rate of compressed incoming network traffic to Replica Of DB (bytes/sec) | -| bdb_replicaof_syncer_ingress_bytes_decompressed | Rate of decompressed incoming network traffic to Replica Of DB (bytes/sec) | -| bdb_replicaof_syncer_local_ingress_lag_time | Lag time 
between the source and the destination for Replica Of traffic (ms) | -| bdb_replicaof_syncer_status | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | -| bdb_crdt_syncer_ingress_bytes | Rate of compressed incoming network traffic to CRDB (bytes/sec) | -| bdb_crdt_syncer_ingress_bytes_decompressed | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | -| bdb_crdt_syncer_local_ingress_lag_time | Lag time between the source and the destination (ms) for CRDB traffic | -| bdb_crdt_syncer_status | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| bdb_replicaof_syncer_ingress_bytes | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to Replica Of DB (bytes/sec) | +| bdb_replicaof_syncer_ingress_bytes_decompressed | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to Replica Of DB (bytes/sec) | +| bdb_replicaof_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | +| bdb_replicaof_syncer_status | `database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| bdb_crdt_syncer_ingress_bytes | `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | +| bdb_crdt_syncer_ingress_bytes_decompressed | `rate(crdt_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | +| bdb_crdt_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="crdt"}` | Lag time between the source and the destination (ms) for CRDB traffic | +| bdb_crdt_syncer_status | `database_syncer_current_status{syncer_type="crdt"}` | Syncer status for CRDB traffic; 
0 = in-sync, 1 = syncing, 2 = out of sync | ## Shard metrics From 7a4434db4a77a4b4fa40fde0cba3b2dde93db217 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 11:24:18 -0500 Subject: [PATCH 07/25] DOC-3948 v2 shard Prometheus metrics --- .../prometheus-metrics-definitions.md | 114 +++++++++--------- 1 file changed, 57 insertions(+), 57 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index a91eb6732a..118e71bb58 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -223,60 +223,60 @@ Here are the metrics available to Prometheus: ## Shard metrics -| Metric | Description | -| ------ | :------ | -| redis_active_defrag_running | Automatic memory defragmentation current aggressiveness (% cpu) | -| redis_allocator_active | Total used memory including external fragmentation | -| redis_allocator_allocated | Total allocated memory | -| redis_allocator_resident | Total resident memory (RSS) | -| redis_aof_last_cow_size | Last AOFR, CopyOnWrite memory | -| redis_aof_rewrite_in_progress | The number of simultaneous AOF rewrites that are in progress | -| redis_aof_rewrites | Number of AOF rewrites this process executed | -| redis_aof_delayed_fsync | Number of times an AOF fsync caused delays in the redis main thread (inducing latency); This can indicate that the disk is slow or overloaded | -| redis_blocked_clients | Count the clients waiting on a blocking call | -| redis_connected_clients | Number of client connections to the specific shard | -| redis_connected_slaves | Number of connected slaves | -| redis_db0_avg_ttl | Average TTL of all volatile keys | -| redis_db0_expires | Total count of volatile keys | -| redis_db0_keys | Total key count | -| redis_evicted_keys | Keys 
evicted so far (since restart) | -| redis_expire_cycle_cpu_milliseconds | The cumulative amount of time spent on active expiry cycles | -| redis_expired_keys | Keys expired so far (since restart) | -| redis_forwarding_state | Shard forwarding state (on or off) | -| redis_keys_trimmed | The number of keys that were trimmed in the current or last resharding process | -| redis_keyspace_read_hits | Number of read operations accessing an existing keyspace | -| redis_keyspace_read_misses | Number of read operations accessing an non-existing keyspace | -| redis_keyspace_write_hits | Number of write operations accessing an existing keyspace | -| redis_keyspace_write_misses | Number of write operations accessing an non-existing keyspace | -| redis_master_link_status | Indicates if the replica is connected to its master | -| redis_master_repl_offset | Number of bytes sent to replicas by the shard; Calculate the throughput for a time period by comparing the value at different times | -| redis_master_sync_in_progress | The master shard is synchronizing (1 true | 0 false) | -| redis_max_process_mem | Current memory limit configured by redis_mgr according to node free memory | -| redis_maxmemory | Current memory limit configured by redis_mgr according to db memory limits | -| redis_mem_aof_buffer | Current size of AOF buffer | -| redis_mem_clients_normal | Current memory used for input and output buffers of non-replica clients | -| redis_mem_clients_slaves | Current memory used for input and output buffers of replica clients | -| redis_mem_fragmentation_ratio | Memory fragmentation ratio (1.3 means 30% overhead) | -| redis_mem_not_counted_for_evict | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | -| redis_mem_replication_backlog | Size of replication backlog | -| redis_module_fork_in_progress | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | -| redis_process_cpu_system_seconds_total | Shard Process 
system CPU time spent in seconds | -| redis_process_cpu_usage_percent | Shard Process cpu usage precentage | -| redis_process_cpu_user_seconds_total | Shard user CPU time spent in seconds | -| redis_process_main_thread_cpu_system_seconds_total | Shard main thread system CPU time spent in seconds | -| redis_process_main_thread_cpu_user_seconds_total | Shard main thread user CPU time spent in seconds | -| redis_process_max_fds | Shard Maximum number of open file descriptors | -| redis_process_open_fds | Shard Number of open file descriptors | -| redis_process_resident_memory_bytes | Shard Resident memory size in bytes | -| redis_process_start_time_seconds | Shard Start time of the process since unix epoch in seconds | -| redis_process_virtual_memory_bytes | Shard virtual memory in bytes | -| redis_rdb_bgsave_in_progress | Indication if bgsave is currently in progress | -| redis_rdb_last_cow_size | Last bgsave (or SYNC fork) used CopyOnWrite memory | -| redis_rdb_saves | Total count of bgsaves since process was restarted (including replica fullsync and persistence) | -| redis_repl_touch_bytes | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; Calculate the throughput for a time period by comparing the value at different times | -| redis_total_commands_processed | Number of commands processed by the shard; Calculate the number of commands for a time period by comparing the value at different times | -| redis_total_connections_received | Number of connections received by the shard; Calculate the number of connections for a time period by comparing the value at different times | -| redis_total_net_input_bytes | Number of bytes received by the shard; Calculate the throughput for a time period by comparing the value at different times | -| redis_total_net_output_bytes | Number of bytes sent by the shard; Calculate the throughput for a time period by comparing the value at different times | -| redis_up | Shard 
is up and running |
-| redis_used_memory | Memory used by shard (in bigredis this includes flash) (bytes) |
+| V1 metric | Equivalent V2 PromQL | Description |
+| --------- | :------------------- | :---------- |
+| redis_active_defrag_running | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) |
+| redis_allocator_active | `redis_server_allocator_active` | Total used memory including external fragmentation |
+| redis_allocator_allocated | `redis_server_allocator_allocated` | Total allocated memory |
+| redis_allocator_resident | `redis_server_allocator_resident` | Total resident memory (RSS) |
+| redis_aof_last_cow_size | `redis_server_aof_last_cow_size` | Last AOFR, CopyOnWrite memory |
+| redis_aof_rewrite_in_progress | `redis_server_aof_rewrite_in_progress` | The number of simultaneous AOF rewrites that are in progress |
+| redis_aof_rewrites | `redis_server_aof_rewrites` | Number of AOF rewrites this process executed |
+| redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the Redis main thread (inducing latency); This can indicate that the disk is slow or overloaded |
+| redis_blocked_clients | `redis_server_blocked_clients` | Count the clients waiting on a blocking call |
+| redis_connected_clients | `redis_server_connected_clients` | Number of client connections to the specific shard |
+| redis_connected_slaves | `redis_server_connected_slaves` | Number of connected slaves |
+| redis_db0_avg_ttl | `redis_server_db0_avg_ttl` | Average TTL of all volatile keys |
+| redis_db0_expires | `redis_server_db0_expires` | Total count of volatile keys |
+| redis_db0_keys | `redis_server_db0_keys` | Total key count |
+| redis_evicted_keys | `redis_server_evicted_keys` | Keys evicted so far (since restart) |
+| redis_expire_cycle_cpu_milliseconds | `redis_server_expire_cycle_cpu_milliseconds` | The cumulative amount of time spent on active expiry cycles |
+| redis_expired_keys | `redis_server_expired_keys` | Keys expired so far (since restart) |
+| redis_forwarding_state | `redis_server_forwarding_state` | Shard forwarding state (on or off) |
+| redis_keys_trimmed | `redis_server_keys_trimmed` | The number of keys that were trimmed in the current or last resharding process |
+| redis_keyspace_read_hits | `redis_server_keyspace_read_hits` | Number of read operations accessing an existing keyspace |
+| redis_keyspace_read_misses | `redis_server_keyspace_read_misses` | Number of read operations accessing a non-existing keyspace |
+| redis_keyspace_write_hits | `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace |
+| redis_keyspace_write_misses | `redis_server_keyspace_write_misses` | Number of write operations accessing a non-existing keyspace |
+| redis_master_link_status | `redis_server_master_link_status` | Indicates if the replica is connected to its master |
+| redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; Calculate the throughput for a time period by comparing the value at different times |
+| redis_master_sync_in_progress | `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 = true, 0 = false) |
+| redis_max_process_mem | `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory |
+| redis_maxmemory | `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to DB memory limits |
+| redis_mem_aof_buffer | `redis_server_mem_aof_buffer` | Current size of AOF buffer |
+| redis_mem_clients_normal | `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients |
+| redis_mem_clients_slaves | `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients |
+| redis_mem_fragmentation_ratio | `redis_server_mem_fragmentation_ratio` | Memory fragmentation ratio (1.3 means 30% overhead) |
+| redis_mem_not_counted_for_evict | `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted for eviction and OOM error |
+| redis_mem_replication_backlog | `redis_server_mem_replication_backlog` | Size of replication backlog |
+| redis_module_fork_in_progress | `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) |
+| redis_process_cpu_system_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard Process system CPU time spent in seconds |
+| redis_process_cpu_usage_percent | `namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard Process CPU usage percentage |
+| redis_process_cpu_user_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds |
+| redis_process_main_thread_cpu_system_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds |
+| redis_process_main_thread_cpu_user_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds |
+| redis_process_max_fds | `max(namedprocess_namegroup_open_filedesc)` | Shard Maximum number of open file descriptors |
+| redis_process_open_fds | `namedprocess_namegroup_open_filedesc` | Shard Number of open file descriptors |
+| redis_process_resident_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard Resident memory size in bytes |
+| redis_process_start_time_seconds | `namedprocess_namegroup_oldest_start_time_seconds` | Shard Start time of the process since unix epoch in seconds |
+| redis_process_virtual_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in bytes |
+|
redis_rdb_bgsave_in_progress | `redis_server_rdb_bgsave_in_progress` | Indication if bgsave is currently in progress | +| redis_rdb_last_cow_size | `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC fork) used CopyOnWrite memory | +| redis_rdb_saves | `redis_server_rdb_saves` | Total count of bgsaves since process was restarted (including replica fullsync and persistence) | +| redis_repl_touch_bytes | `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; Calculate the throughput for a time period by comparing the value at different times | +| redis_total_commands_processed | `redis_server_total_commands_processed` | Number of commands processed by the shard; Calculate the number of commands for a time period by comparing the value at different times | +| redis_total_connections_received | `redis_server_total_connections_received` | Number of connections received by the shard; Calculate the number of connections for a time period by comparing the value at different times | +| redis_total_net_input_bytes | `redis_server_total_net_input_bytes` | Number of bytes received by the shard; Calculate the throughput for a time period by comparing the value at different times | +| redis_total_net_output_bytes | `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; Calculate the throughput for a time period by comparing the value at different times | +| redis_up | `redis_server_up` | Shard is up and running | +| redis_used_memory | `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) | From abf696563a845dc9d5a6ddb2df08e6e7d02d7af2 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 16:26:24 -0500 Subject: [PATCH 08/25] Small copy edits and fixes --- .../prometheus-metrics-definitions.md | 172 +++++++++--------- 1 file changed, 86 insertions(+), 86 deletions(-) diff --git 
a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 118e71bb58..34091e17fb 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -22,40 +22,40 @@ Here are the metrics available to Prometheus: | V1 metric | Equivalent V2 PromQL | Description | | --------- | :------------------- | :---------- | -| bdb_avg_latency | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the DB (seconds); returned only when there is traffic | -| bdb_avg_latency_max | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the DB (seconds); returned only when there is traffic | +| bdb_avg_latency | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | +| bdb_avg_latency_max | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | | bdb_avg_read_latency | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | | bdb_avg_read_latency_max | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | 
| bdb_avg_write_latency | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | | bdb_avg_write_latency_max | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of write operations (seconds); returned only when there is traffic | | bdb_bigstore_shard_count | `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | -| bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to DB | -| bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the DB (bytes/sec) | -| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of rate of outgoing network traffic from the DB (bytes/sec) | -| bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from DB (evictions/sec) | -| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of rate of key evictions from DB (evictions/sec) | -| bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate keys expired in DB (expirations/sec) | -| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of rate keys expired in DB (expirations/sec) | -| bdb_fork_cpu_system | `sum by (bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_system_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_user_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all redis shard fork child processes of this database | -| bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to DB (bytes/sec) | -| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of rate of incoming network traffic to DB (bytes/sec) | -| bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of DB (ops/sec) | -| bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_user | `sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all redis shard main threads of this database | -| 
bdb_main_thread_cpu_user_max | `sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all redis shard main threads of this database | +| bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to database | +| bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | +| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of rate of outgoing network traffic from the database (bytes/sec) | +| bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | +| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of rate of key evictions from database (evictions/sec) | +| bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate keys expired in database (expirations/sec) | +| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of rate keys expired in database (expirations/sec) | +| bdb_fork_cpu_system | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_system_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_user_max | `sum by (bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | +| bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | +| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of rate of incoming network traffic to database (bytes/sec) | +| bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | +| bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard main threads of this database | | bdb_mem_frag_ratio | `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | | bdb_mem_size_lua | `sum by(bdb) (redis_server_used_memory_lua)` | Redis lua scripting heap size (bytes) | | bdb_memory_limit | `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | -| bdb_monitor_sessions_count | `sum by(bdb)
(endpoint_monitor_sessions_count)` | Number of client connected in monitor mode to the DB | -| bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in DB | -| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on DB (ops/sec) | -| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of rate of other (non read/write) requests on DB (ops/sec) | -| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on DB (ops/sec) | -| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of rate of other (non read/write) responses on DB (ops/sec) | +| bdb_monitor_sessions_count | `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | +| bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | +| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on database (ops/sec) | +| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of rate of other (non read/write) requests on database (ops/sec) | +| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on database (ops/sec) | +| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of rate of other (non read/write) responses on database (ops/sec) | | bdb_pubsub_channels | `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | | bdb_pubsub_channels_max | `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | | bdb_pubsub_patterns | `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | @@ -64,31 +64,31 @@ Here are the metrics available to
Prometheus: | bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of rate of read operations accessing an existing key (ops/sec) | | bdb_read_misses | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | | bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on DB (ops/sec) | -| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of rate of read requests on DB (ops/sec) | -| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on DB (ops/sec) | -| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of rate of read responses on DB (ops/sec) | -| bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all redis shard processes of this database | -| bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all redis shard processes of this database | -| bdb_shard_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the redis shard process | -| bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the redis shard process | +| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on database (ops/sec) | +| bdb_read_req_max | 
`sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of rate of read requests on database (ops/sec) | +| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on database (ops/sec) | +| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of rate of read responses on database (ops/sec) | +| bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | +| bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | +| bdb_shard_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | +| bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the Redis shard process | | bdb_shards_used | `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | -| bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to DB (connections/sec) | -| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of rate of new client connections to DB (connections/sec) | -| 
bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on DB (ops/sec) | -| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of rate of all requests on DB (ops/sec) | -| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on DB (ops/sec) | -| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of rate of all responses on DB (ops/sec) | +| bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database (connections/sec) | +| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of rate of new client connections to database (connections/sec) | +| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on database (ops/sec) | +| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of rate of all requests on database (ops/sec) | +| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on database (ops/sec) | +| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of rate of all responses on database (ops/sec) | | bdb_up | `min by(bdb) (redis_up)` | Database is up and running | -| bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by db (in bigredis this includes flash) (bytes) | +| bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | | bdb_write_hits | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | | bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of rate of write operations accessing an existing key (ops/sec) | | bdb_write_misses | 
`sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | | bdb_write_misses_max | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on DB (ops/sec) | -| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of rate of write requests on DB (ops/sec) | -| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on DB (ops/sec) | -| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of rate of write responses on DB (ops/sec) | +| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on database (ops/sec) | +| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of rate of write requests on database (ops/sec) | +| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on database (ops/sec) | +| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of rate of write responses on database (ops/sec) | | no_of_expires | `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | ## Node metrics @@ -100,11 +100,11 @@ Here are the metrics available to Prometheus: | node_available_memory | `node_available_memory_bytes` | Amount of free memory in node (bytes) that is available for database provisioning | | node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | Available RAM in node (bytes) without taking into account overbooking | | node_avg_latency | `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of 
requests handled by endpoints on the node in milliseconds; returned only when there is traffic | -| node_bigstore_free | `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash DB's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based DB (BigRedis) in cluster (bytes/sec); returned only when BigRedis is enabled | -| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node | +| node_bigstore_free | `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards 
which are part of a flash-based database (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in cluster (bytes/sec); returned only when BigRedis is enabled | +| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates expiration]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | | node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | | node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | | node_cpu_idle_max | `not supported - see footnote2` | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | @@ -149,62 +149,62 @@ Here are the metrics available to Prometheus: | V1 metric | Equivalent V2 PromQL | Description | | --------- | :------------------- | :---------- | -| listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on DB. For the average latency, divide this value by listener_total_res | -| listener_acc_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of all types of commands on DB | -| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "other" on DB. 
For the average latency, divide this value by listener_other_res | -| listener_acc_other_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "other" on DB | -| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "read" on DB. For the average latency, divide this value by listener_read_res | -| listener_acc_read_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "read" on DB | -| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "write" on DB. For the average latency, divide this value by listener_write_res | -| listener_acc_write_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "write" on DB | -| listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the DB | -| listener_auth_cmds_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached AUTH commands sent to the DB | +| listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on the database. For the average latency, divide this value by listener_total_res | +| listener_acc_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of all types of commands on the database | +| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "other" on the database. For the average latency, divide this value by listener_other_res | +| listener_acc_other_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "other" on the database | +| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "read" on the database. 
For the average latency, divide this value by listener_read_res | +| listener_acc_read_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "read" on the database | +| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "write" on the database. For the average latency, divide this value by listener_write_res | +| listener_acc_write_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "write" on the database | +| listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the database | +| listener_auth_cmds_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached AUTH commands sent to the database | | listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | | listener_auth_errors_max | N/A[1](#proxy-table-note-1) | Highest value of number of error responses to memcached AUTH commands | -| listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the DB | -| listener_cmd_flush_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached FLUSH_ALL commands sent to the DB | -| listener_cmd_get | N/A | Number of memcached GET commands sent to the DB | -| listener_cmd_get_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached GET commands sent to the DB | -| listener_cmd_set | N/A | Number of memcached SET commands sent to the DB | -| listener_cmd_set_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached SET commands sent to the DB | -| listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the DB | -| listener_cmd_touch_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached TOUCH commands sent to the DB | +| listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_flush_max | N/A[1](#proxy-table-note-1) | Highest 
value of number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_get | N/A | Number of memcached GET commands sent to the database | +| listener_cmd_get_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached GET commands sent to the database | +| listener_cmd_set | N/A | Number of memcached SET commands sent to the database | +| listener_cmd_set_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached SET commands sent to the database | +| listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the database | +| listener_cmd_touch_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached TOUCH commands sent to the database | | listener_conns | N/A | Number of clients connected to the endpoint | | listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) | | listener_egress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of rate of outgoing network traffic to the endpoint (bytes/sec) | | listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) | | listener_ingress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_last_req_time | N/A | Time of last command sent to the DB | -| listener_last_res_time | N/A | Time of last response sent from the DB | -| listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the DB at the same time has exceeded the max limit | -| listener_max_connections_exceeded_max | N/A[1](#proxy-table-note-1) | Highest value of number of times the number of clients connected to the DB at the same time has exceeded the max limit | +| listener_last_req_time | N/A | Time of last command sent to the database | +| listener_last_res_time | N/A | Time of last response sent from the database | +| listener_max_connections_exceeded | 
`irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | +| listener_max_connections_exceeded_max | N/A[1](#proxy-table-note-1) | Highest value of number of times the number of clients connected to the database at the same time has exceeded the max limit | | listener_monitor_sessions_count | N/A | Number of client connected in monitor mode to the endpoint | | listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) | | listener_other_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of other (non-read/write) requests on the endpoint (ops/sec) | | listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) | | listener_other_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of other (non-read/write) responses on the endpoint (ops/sec) | -| listener_other_started_res | N/A | Number of responses sent from the DB of type "other" | -| listener_other_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of type "other" | +| listener_other_started_res | N/A | Number of responses sent from the database of type "other" | +| listener_other_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of type "other" | | listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | | listener_read_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of read requests on the endpoint (ops/sec) | | listener_read_res | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | | listener_read_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of read responses on the endpoint (ops/sec) | -| listener_read_started_res | N/A | Number of responses sent from the DB of type "read" | -| 
listener_read_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of type "read" | +| listener_read_started_res | N/A | Number of responses sent from the database of type "read" | +| listener_read_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of type "read" | | listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | | listener_total_connections_received_max | N/A[1](#proxy-table-note-1) | Highest value of rate of new client connections to the endpoint (connections/sec) | | listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | | listener_total_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of all requests on the endpoint (ops/sec) | | listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | | listener_total_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of all responses on the endpoint (ops/sec) | -| listener_total_started_res | N/A | Number of responses sent from the DB of all types | -| listener_total_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of all types | +| listener_total_started_res | N/A | Number of responses sent from the database of all types | +| listener_total_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of all types | | listener_write_req | `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | | listener_write_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of write requests on the endpoint (ops/sec) | | listener_write_res | `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | | listener_write_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of write 
responses on the endpoint (ops/sec) | -| listener_write_started_res | N/A | Number of responses sent from the DB of type "write" | -| listener_write_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the DB of type "write" | +| listener_write_started_res | N/A | Number of responses sent from the database of type "write" | +| listener_write_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of type "write" | 1. The `max`, `min`, and `median` v1 metrics provide the aggregated value of the corresponding metric over a 30-second period. This was intended to alleviate a limitation in the v1 system that limited the resolution of reported metrics regardless of the configured scrape interval. This limitation does not apply to v2. You should avoid the extra aggregations unless required for specific use cases. @@ -212,8 +212,8 @@ Here are the metrics available to Prometheus: | V1 metric | Equivalent V2 PromQL | Description | | --------- | :------------------- | :---------- | -| bdb_replicaof_syncer_ingress_bytes | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to Replica Of DB (bytes/sec) | -| bdb_replicaof_syncer_ingress_bytes_decompressed | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to Replica Of DB (bytes/sec) | +| bdb_replicaof_syncer_ingress_bytes | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | +| bdb_replicaof_syncer_ingress_bytes_decompressed | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | | bdb_replicaof_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | | bdb_replicaof_syncer_status | 
`database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | | bdb_crdt_syncer_ingress_bytes | `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | @@ -232,7 +232,7 @@ Here are the metrics available to Prometheus: | redis_aof_last_cow_size | `redis_server_aof_last_cow_size` | Last AOFR, CopyOnWrite memory | | redis_aof_rewrite_in_progress | `redis_server_aof_rewrite_in_progress` | The number of simultaneous AOF rewrites that are in progress | | redis_aof_rewrites | `redis_server_aof_rewrites` | Number of AOF rewrites this process executed | -| redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the Redis main thread (inducing latency); This can indicate that the disk is slow or overloaded | +| redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | | redis_blocked_clients | `redis_server_blocked_clients` | Count the clients waiting on a blocking call | | redis_connected_clients | `redis_server_connected_clients` | Number of client connections to the specific shard | | redis_connected_slaves | `redis_server_connected_slaves` | Number of connected slaves | @@ -252,7 +252,7 @@ Here are the metrics available to Prometheus: | redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; Calculate the throughput for a time period by comparing the value at different times | | redis_master_sync_in_progress | `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 true | 0 false) | | redis_max_process_mem | `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory | -| redis_maxmemory | `redis_server_maxmemory` | Current memory 
limit configured by redis_mgr according to DB memory limits | +| redis_maxmemory | `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | | redis_mem_aof_buffer | `redis_server_mem_aof_buffer` | Current size of AOF buffer | | redis_mem_clients_normal | `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients | | redis_mem_clients_slaves | `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients | @@ -260,8 +260,8 @@ Here are the metrics available to Prometheus: | redis_mem_not_counted_for_evict | `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | | redis_mem_replication_backlog | `redis_server_mem_replication_backlog` | Size of replication backlog | | redis_module_fork_in_progress | `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | -| redis_process_cpu_system_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard Process system CPU time spent in seconds | -| redis_process_cpu_usage_percent | `namedprocess_namegroup_cpu_seconds_total{mode=~"system|user"}` | Shard Process CPU usage percentage | +| redis_process_cpu_system_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard process system CPU time spent in seconds | +| redis_process_cpu_usage_percent | `namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard process CPU usage percentage | | redis_process_cpu_user_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds | | redis_process_main_thread_cpu_system_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | | 
redis_process_main_thread_cpu_user_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | From ab858d49c6f541aa7180234d40d1ec121494197b Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 16:51:58 -0500 Subject: [PATCH 09/25] More small copy edits and fixes --- .../prometheus-metrics-definitions.md | 174 +++++++++--------- 1 file changed, 87 insertions(+), 87 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 34091e17fb..b2b6256c8c 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -31,17 +31,17 @@ Here are the metrics available to Prometheus: | bdb_bigstore_shard_count | `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | | bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to database | | bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | -| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of rate of outgoing network traffic from the database (bytes/sec) | +| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of the rate of outgoing network traffic from the database (bytes/sec) | | 
bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | -| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of rate of key evictions from database (evictions/sec) | +| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of the rate of key evictions from database (evictions/sec) | | bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate keys expired in database (expirations/sec) | -| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of rate keys expired in database (expirations/sec) | +| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of the rate keys expired in database (expirations/sec) | | bdb_fork_cpu_system | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | | bdb_fork_cpu_system_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child processes of this database | | bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | | bdb_fork_cpu_user_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | | bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | -| 
bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of rate of incoming network traffic to database (bytes/sec) | +| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of the rate of incoming network traffic to database (bytes/sec) | | bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | | bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | | bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | @@ -50,86 +50,86 @@ Here are the metrics available to Prometheus: | bdb_mem_frag_ratio | `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | | bdb_mem_size_lua | `sum by(bdb) (redis_server_used_memory_lua)` | Redis lua scripting heap size (bytes) | | bdb_memory_limit | `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | -| bdb_monitor_sessions_count | `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of client connected in monitor mode to the database | +| bdb_monitor_sessions_count | `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | | bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | -| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on database (ops/sec) | -| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of rate of other (non read/write) 
requests on database (ops/sec) | -| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on database (ops/sec) | -| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of rate of other (non read/write) responses on database (ops/sec) | +| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on the database (ops/sec) | +| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of the rate of other (non read/write) requests on the database (ops/sec) | +| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on the database (ops/sec) | +| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of the rate of other (non read/write) responses on the database (ops/sec) | | bdb_pubsub_channels | `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | | bdb_pubsub_channels_max | `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | | bdb_pubsub_patterns | `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | | bdb_pubsub_patterns_max | `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of count the pub/sub patterns with subscribed clients | | bdb_read_hits | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | -| bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of rate of read operations accessing an existing key (ops/sec) | +| bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of the rate of read operations accessing an existing key (ops/sec) | | bdb_read_misses | `sum by 
(bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on database (ops/sec) | -| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of rate of read requests on database (ops/sec) | -| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on database (ops/sec) | -| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of rate of read responses on database (ops/sec) | +| bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of the rate of read operations accessing a non-existing key (ops/sec) | +| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on the database (ops/sec) | +| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of the rate of read requests on the database (ops/sec) | +| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on the database (ops/sec) | +| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of the rate of read responses on the database (ops/sec) | | bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | | bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | | bdb_shard_cpu_user | `sum 
by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | | bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the Redis shard process | | bdb_shards_used | `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | | bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database (connections/sec) | -| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of rate of new client connections to database (connections/sec) | -| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on database (ops/sec) | -| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of rate of all requests on database (ops/sec) | -| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on database (ops/sec) | -| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of rate of all responses on database (ops/sec) | +| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of the rate of new client connections to database (connections/sec) | +| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on the database (ops/sec) | +| bdb_total_req_max | `sum by (bdb) 
(irate(endpoint_total_req[1m]))` | Highest value of the rate of all requests on the database (ops/sec) | +| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on the database (ops/sec) | +| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of the rate of all responses on the database (ops/sec) | | bdb_up | `min by(bdb) (redis_up)` | Database is up and running | | bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | | bdb_write_hits | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | -| bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of rate of write operations accessing an existing key (ops/sec) | +| bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of the rate of write operations accessing an existing key (ops/sec) | | bdb_write_misses | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_misses_max | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on database (ops/sec) | -| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of rate of write requests on database (ops/sec) | -| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on database (ops/sec) | -| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of rate of write responses on database (ops/sec) | +| bdb_write_misses_max | `sum by 
(bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of the rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on the database (ops/sec) | +| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of the rate of write requests on the database (ops/sec) | +| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on the database (ops/sec) | +| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of the rate of write responses on the database (ops/sec) | | no_of_expires | `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | ## Node metrics | V1 metric | Equivalent V2 PromQL | Description | | --------- | :------------------- | :---------- | -| node_available_flash | `node_available_flash_bytes` | Available flash in node (bytes) | -| node_available_flash_no_overbooking | `node_available_flash_no_overbooking_bytes` | Available flash in node (bytes), without taking into account overbooking | -| node_available_memory | `node_available_memory_bytes` | Amount of free memory in node (bytes) that is available for database provisioning | -| node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | Available RAM in node (bytes) without taking into account overbooking | +| node_available_flash | `node_available_flash_bytes` | Available flash in the node (bytes) | +| node_available_flash_no_overbooking | `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | +| node_available_memory | `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | +| node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | 
Available RAM in the node (bytes) without taking into account overbooking | | node_avg_latency | `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | | node_bigstore_free | `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in cluster (bytes/sec); returned only when BigRedis is enabled | -| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates expiration]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); 
returned only when BigRedis is enabled | +| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (bytes/sec); returned only when BigRedis is enabled | +| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | | node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | | node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | | node_cpu_idle_max | `not supported - see footnote2` | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | | node_cpu_idle_median | `not supported - see footnote2` | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | | node_cpu_idle_min | `not supported - see footnote2` | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_system | `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_max | `not supported - see footnote2` | Highest value of CPU time portion spent in kernel (0-1, multiply by 100 to get 
percent) | -| node_cpu_system_median | `not supported - see footnote2` | Average value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_min | `not supported - see footnote2` | Lowest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system | `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_max | `not supported - see footnote2` | Highest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_median | `not supported - see footnote2` | Average value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_min | `not supported - see footnote2` | Lowest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | | node_cpu_user | `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | | node_cpu_user_max | `not supported - see footnote2` | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | | node_cpu_user_median | `not supported - see footnote2` | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | | node_cpu_user_min | `not supported - see footnote2` | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | | node_cur_aof_rewrites | `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | | node_egress_bytes | `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_max | `not supported - see footnote2` | Highest value of rate of outgoing 
network traffic to node (bytes/sec) | -| node_egress_bytes_median | `not supported - see footnote2` | Average value of rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_min | `not supported - see footnote2` | Lowest value of rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_max | `not supported - see footnote2` | Highest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_median | `not supported - see footnote2` | Average value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_min | `not supported - see footnote2` | Lowest value of the rate of outgoing network traffic to node (bytes/sec) | | node_ephemeral_storage_avail | `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | | node_ephemeral_storage_free | `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | -| node_free_memory | `node_memory_MemFree_bytes` | Free memory in node (bytes) | +| node_free_memory | `node_memory_MemFree_bytes` | Free memory in the node (bytes) | | node_ingress_bytes | `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_max | `not supported - see footnote2` | Highest value of rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_median | `not supported - see footnote2` | Average value of rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_min | `not supported - see footnote2` | Lowest value of rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_max | `not supported - see footnote2` | Highest value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_median | `not supported - see footnote2` | Average value of the rate of incoming network traffic to node (bytes/sec) | +| 
node_ingress_bytes_min | `not supported - see footnote2` | Lowest value of the rate of incoming network traffic to node (bytes/sec) | | node_persistent_storage_avail | `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | | node_persistent_storage_free | `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | | node_provisional_flash | `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | @@ -151,60 +151,60 @@ Here are the metrics available to Prometheus: | --------- | :------------------- | :---------- | | listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on the database. For the average latency, divide this value by listener_total_res | | listener_acc_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of all types of commands on the database | -| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "other" on the database. For the average latency, divide this value by listener_other_res | -| listener_acc_other_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "other" on the database | -| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "read" on the database. For the average latency, divide this value by listener_read_res | -| listener_acc_read_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "read" on the database | -| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are type "write" on the database. 
For the average latency, divide this value by listener_write_res | -| listener_acc_write_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are type "write" on the database | +| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "other" on the database. For the average latency, divide this value by listener_other_res | +| listener_acc_other_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are a type "other" on the database | +| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "read" on the database. For the average latency, divide this value by listener_read_res | +| listener_acc_read_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are a type "read" on the database | +| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "write" on the database. 
For the average latency, divide this value by listener_write_res | +| listener_acc_write_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are a type "write" on the database | | listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the database | -| listener_auth_cmds_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached AUTH commands sent to the database | +| listener_auth_cmds_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached AUTH commands sent to the database | | listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | -| listener_auth_errors_max | N/A[1](#proxy-table-note-1) | Highest value of number of error responses to memcached AUTH commands | +| listener_auth_errors_max | N/A[1](#proxy-table-note-1) | Highest value of the number of error responses to memcached AUTH commands | | listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the database | -| listener_cmd_flush_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_flush_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached FLUSH_ALL commands sent to the database | | listener_cmd_get | N/A | Number of memcached GET commands sent to the database | -| listener_cmd_get_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached GET commands sent to the database | +| listener_cmd_get_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached GET commands sent to the database | | listener_cmd_set | N/A | Number of memcached SET commands sent to the database | -| listener_cmd_set_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached SET commands sent to the database | +| listener_cmd_set_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached SET commands sent to the database | | 
listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the database | -| listener_cmd_touch_max | N/A[1](#proxy-table-note-1) | Highest value of number of memcached TOUCH commands sent to the database | +| listener_cmd_touch_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached TOUCH commands sent to the database | | listener_conns | N/A | Number of clients connected to the endpoint | | listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_egress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of rate of outgoing network traffic to the endpoint (bytes/sec) | +| listener_egress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of outgoing network traffic to the endpoint (bytes/sec) | | listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of rate of incoming network traffic to the endpoint (bytes/sec) | +| listener_ingress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of incoming network traffic to the endpoint (bytes/sec) | | listener_last_req_time | N/A | Time of last command sent to the database | | listener_last_res_time | N/A | Time of last response sent from the database | | listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | -| listener_max_connections_exceeded_max | N/A[1](#proxy-table-note-1) | Highest value of number of times the number of clients connected to the database at the same time has exceeded the max limit | -| listener_monitor_sessions_count | N/A | Number of client connected in monitor mode to the endpoint | +| listener_max_connections_exceeded_max | N/A[1](#proxy-table-note-1) | Highest value of the number of times the number of clients connected to 
the database at the same time has exceeded the max limit | +| listener_monitor_sessions_count | N/A | Number of clients connected in monitor mode to the endpoint | | listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) | -| listener_other_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of other (non-read/write) requests on the endpoint (ops/sec) | +| listener_other_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of other (non-read/write) requests on the endpoint (ops/sec) | | listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) | -| listener_other_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of other (non-read/write) responses on the endpoint (ops/sec) | +| listener_other_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of other (non-read/write) responses on the endpoint (ops/sec) | | listener_other_started_res | N/A | Number of responses sent from the database of type "other" | -| listener_other_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of type "other" | +| listener_other_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of type "other" | | listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | -| listener_read_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of read requests on the endpoint (ops/sec) | +| listener_read_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of read requests on the endpoint (ops/sec) | | listener_read_res | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | -| listener_read_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of read responses on the endpoint (ops/sec) | +| listener_read_res_max | N/A[1](#proxy-table-note-1) | Highest 
value of the rate of read responses on the endpoint (ops/sec) | | listener_read_started_res | N/A | Number of responses sent from the database of type "read" | -| listener_read_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of type "read" | +| listener_read_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of type "read" | | listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | -| listener_total_connections_received_max | N/A[1](#proxy-table-note-1) | Highest value of rate of new client connections to the endpoint (connections/sec) | +| listener_total_connections_received_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of new client connections to the endpoint (connections/sec) | | listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | -| listener_total_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of all requests on the endpoint (ops/sec) | +| listener_total_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of all requests on the endpoint (ops/sec) | | listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | -| listener_total_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of all responses on the endpoint (ops/sec) | +| listener_total_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of all responses on the endpoint (ops/sec) | | listener_total_started_res | N/A | Number of responses sent from the database of all types | -| listener_total_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of all types | +| listener_total_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of all types | | listener_write_req | 
`irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | -| listener_write_req_max | N/A[1](#proxy-table-note-1) | Highest value of rate of write requests on the endpoint (ops/sec) | +| listener_write_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of write requests on the endpoint (ops/sec) | | listener_write_res | `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | -| listener_write_res_max | N/A[1](#proxy-table-note-1) | Highest value of rate of write responses on the endpoint (ops/sec) | +| listener_write_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of write responses on the endpoint (ops/sec) | | listener_write_started_res | N/A | Number of responses sent from the database of type "write" | -| listener_write_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of number of responses sent from the database of type "write" | +| listener_write_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of type "write" | 1. The `max`, `min`, and `median` v1 metrics provide the aggregated value of the corresponding metric over a 30-second period. This was intended to alleviate a limitation in the v1 system that limited the resolution of reported metrics regardless of the configured scrape interval. This limitation does not apply to v2. You should avoid the extra aggregations unless required for specific use cases. 
@@ -226,7 +226,7 @@ Here are the metrics available to Prometheus: | V1 metric | Equivalent V2 PromQL | Description | | --------- | :------------------- | :---------- | | redis_active_defrag_running | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | -| redis_allocator_active | `redis_server_allocator_active` | Total used memory including external fragmentation | +| redis_allocator_active | `redis_server_allocator_active` | Total used memory, including external fragmentation | | redis_allocator_allocated | `redis_server_allocator_allocated` | Total allocated memory | | redis_allocator_resident | `redis_server_allocator_resident` | Total resident memory (RSS) | | redis_aof_last_cow_size | `redis_server_aof_last_cow_size` | Last AOFR, CopyOnWrite memory | @@ -235,7 +235,7 @@ Here are the metrics available to Prometheus: | redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | | redis_blocked_clients | `redis_server_blocked_clients` | Count the clients waiting on a blocking call | | redis_connected_clients | `redis_server_connected_clients` | Number of client connections to the specific shard | -| redis_connected_slaves | `redis_server_connected_slaves` | Number of connected slaves | +| redis_connected_slaves | `redis_server_connected_slaves` | Number of connected replicas | | redis_db0_avg_ttl | `redis_server_db0_avg_ttl` | Average TTL of all volatile keys | | redis_db0_expires | `redis_server_expired_keys` | Total count of volatile keys | | redis_db0_keys | `redis_server_db0_keys` | Total key count | @@ -249,7 +249,7 @@ Here are the metrics available to Prometheus: | redis_keyspace_write_hits | `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace | | redis_keyspace_write_misses | `redis_server_keyspace_write_misses` | Number 
of write operations accessing a non-existing keyspace | | redis_master_link_status | `redis_server_master_link_status` | Indicates if the replica is connected to its master | -| redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; Calculate the throughput for a time period by comparing the value at different times | +| redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | | redis_master_sync_in_progress | `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 true | 0 false) | | redis_max_process_mem | `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory | | redis_maxmemory | `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | @@ -265,18 +265,18 @@ Here are the metrics available to Prometheus: | redis_process_cpu_user_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds | | redis_process_main_thread_cpu_system_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | | redis_process_main_thread_cpu_user_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | -| redis_process_max_fds | `max(namedprocess_namegroup_open_filedesc)` | Shard Maximum number of open file descriptors | -| redis_process_open_fds | `namedprocess_namegroup_open_filedesc` | Shard Number of open file descriptors | -| redis_process_resident_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard Resident memory size in bytes | -| redis_process_start_time_seconds | 
`namedprocess_namegroup_oldest_start_time_seconds` | Shard Start time of the process since unix epoch in seconds | +| redis_process_max_fds | `max(namedprocess_namegroup_open_filedesc)` | Shard maximum number of open file descriptors | +| redis_process_open_fds | `namedprocess_namegroup_open_filedesc` | Shard number of open file descriptors | +| redis_process_resident_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard resident memory size in bytes | +| redis_process_start_time_seconds | `namedprocess_namegroup_oldest_start_time_seconds` | Shard start time of the process since unix epoch in seconds | | redis_process_virtual_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in bytes | | redis_rdb_bgsave_in_progress | `redis_server_rdb_bgsave_in_progress` | Indication if bgsave is currently in progress | | redis_rdb_last_cow_size | `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC fork) used CopyOnWrite memory | -| redis_rdb_saves | `redis_server_rdb_saves` | Total count of bgsaves since process was restarted (including replica fullsync and persistence) | -| redis_repl_touch_bytes | `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; Calculate the throughput for a time period by comparing the value at different times | -| redis_total_commands_processed | `redis_server_total_commands_processed` | Number of commands processed by the shard; Calculate the number of commands for a time period by comparing the value at different times | -| redis_total_connections_received | `redis_server_total_connections_received` | Number of connections received by the shard; Calculate the number of connections for a time period by comparing the value at different times | -| redis_total_net_input_bytes | `redis_server_total_net_input_bytes` | Number of bytes received by the shard; Calculate the throughput for a 
time period by comparing the value at different times | -| redis_total_net_output_bytes | `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; Calculate the throughput for a time period by comparing the value at different times | +| redis_rdb_saves | `redis_server_rdb_saves` | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | +| redis_repl_touch_bytes | `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | +| redis_total_commands_processed | `redis_server_total_commands_processed` | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | +| redis_total_connections_received | `redis_server_total_connections_received` | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | +| redis_total_net_input_bytes | `redis_server_total_net_input_bytes` | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_total_net_output_bytes | `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | | redis_up | `redis_server_up` | Shard is up and running | | redis_used_memory | `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) | From cb86993f67e680a89a318fac41ea5b1364043595 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 9 Aug 2024 17:04:36 -0500 Subject: [PATCH 10/25] Change v2 not supported messages to N/A in tables --- .../prometheus-metrics-definitions.md | 84 +++++++++---------- 1 file changed, 41 insertions(+), 43 deletions(-) diff 
--git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index b2b6256c8c..300be2e6a4 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -107,29 +107,29 @@ Here are the metrics available to Prometheus: | node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | | node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | | node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_max | `not supported - see footnote2` | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_median | `not supported - see footnote2` | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_min | `not supported - see footnote2` | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_max | N/A | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_median | N/A | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_min | N/A | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | | node_cpu_system | `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_max | `not supported - see footnote2` 
| Highest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_median | `not supported - see footnote2` | Average value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_min | `not supported - see footnote2` | Lowest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_max | N/A | Highest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_median | N/A | Average value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_min | N/A | Lowest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | | node_cpu_user | `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_max | `not supported - see footnote2` | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_median | `not supported - see footnote2` | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_min | `not supported - see footnote2` | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_max | N/A | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_median | N/A | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_min | N/A | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | | node_cur_aof_rewrites | `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on 
this node | | node_egress_bytes | `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_max | `not supported - see footnote2` | Highest value of the rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_median | `not supported - see footnote2` | Average value of the rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_min | `not supported - see footnote2` | Lowest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_median | N/A | Average value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_min | N/A | Lowest value of the rate of outgoing network traffic to node (bytes/sec) | | node_ephemeral_storage_avail | `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | | node_ephemeral_storage_free | `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | | node_free_memory | `node_memory_MemFree_bytes` | Free memory in the node (bytes) | | node_ingress_bytes | `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_max | `not supported - see footnote2` | Highest value of the rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_median | `not supported - see footnote2` | Average value of the rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_min | `not supported - see footnote2` | Lowest value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_median | N/A | Average value of the rate of incoming network 
traffic to node (bytes/sec) | +| node_ingress_bytes_min | N/A | Lowest value of the rate of incoming network traffic to node (bytes/sec) | | node_persistent_storage_avail | `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | | node_persistent_storage_free | `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | | node_provisional_flash | `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | @@ -150,63 +150,61 @@ Here are the metrics available to Prometheus: | V1 metric | Equivalent V2 PromQL | Description | | --------- | :------------------- | :---------- | | listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on the database. For the average latency, divide this value by listener_total_res | -| listener_acc_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of all types of commands on the database | +| listener_acc_latency_max | N/A | Highest value of accumulative latency of all types of commands on the database | | listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "other" on the database. For the average latency, divide this value by listener_other_res | -| listener_acc_other_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are a type "other" on the database | +| listener_acc_other_latency_max | N/A | Highest value of accumulative latency of commands that are a type "other" on the database | | listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "read" on the database. 
For the average latency, divide this value by listener_read_res | -| listener_acc_read_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are a type "read" on the database | +| listener_acc_read_latency_max | N/A | Highest value of accumulative latency of commands that are a type "read" on the database | | listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "write" on the database. For the average latency, divide this value by listener_write_res | -| listener_acc_write_latency_max | N/A[1](#proxy-table-note-1) | Highest value of accumulative latency of commands that are a type "write" on the database | +| listener_acc_write_latency_max | N/A | Highest value of accumulative latency of commands that are a type "write" on the database | | listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the database | -| listener_auth_cmds_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached AUTH commands sent to the database | +| listener_auth_cmds_max | N/A | Highest value of the number of memcached AUTH commands sent to the database | | listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | -| listener_auth_errors_max | N/A[1](#proxy-table-note-1) | Highest value of the number of error responses to memcached AUTH commands | +| listener_auth_errors_max | N/A | Highest value of the number of error responses to memcached AUTH commands | | listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the database | -| listener_cmd_flush_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_flush_max | N/A | Highest value of the number of memcached FLUSH_ALL commands sent to the database | | listener_cmd_get | N/A | Number of memcached GET commands sent to the database | -| listener_cmd_get_max | 
N/A[1](#proxy-table-note-1) | Highest value of the number of memcached GET commands sent to the database | +| listener_cmd_get_max | N/A | Highest value of the number of memcached GET commands sent to the database | | listener_cmd_set | N/A | Number of memcached SET commands sent to the database | -| listener_cmd_set_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached SET commands sent to the database | +| listener_cmd_set_max | N/A | Highest value of the number of memcached SET commands sent to the database | | listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the database | -| listener_cmd_touch_max | N/A[1](#proxy-table-note-1) | Highest value of the number of memcached TOUCH commands sent to the database | +| listener_cmd_touch_max | N/A | Highest value of the number of memcached TOUCH commands sent to the database | | listener_conns | N/A | Number of clients connected to the endpoint | | listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_egress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of outgoing network traffic to the endpoint (bytes/sec) | +| listener_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to the endpoint (bytes/sec) | | listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of incoming network traffic to the endpoint (bytes/sec) | +| listener_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to the endpoint (bytes/sec) | | listener_last_req_time | N/A | Time of last command sent to the database | | listener_last_res_time | N/A | Time of last response sent from the database | | listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same 
time has exceeded the max limit | -| listener_max_connections_exceeded_max | N/A[1](#proxy-table-note-1) | Highest value of the number of times the number of clients connected to the database at the same time has exceeded the max limit | +| listener_max_connections_exceeded_max | N/A | Highest value of the number of times the number of clients connected to the database at the same time has exceeded the max limit | | listener_monitor_sessions_count | N/A | Number of clients connected in monitor mode to the endpoint | | listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) | -| listener_other_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of other (non-read/write) requests on the endpoint (ops/sec) | +| listener_other_req_max | N/A | Highest value of the rate of other (non-read/write) requests on the endpoint (ops/sec) | | listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) | -| listener_other_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of other (non-read/write) responses on the endpoint (ops/sec) | +| listener_other_res_max | N/A | Highest value of the rate of other (non-read/write) responses on the endpoint (ops/sec) | | listener_other_started_res | N/A | Number of responses sent from the database of type "other" | -| listener_other_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of type "other" | +| listener_other_started_res_max | N/A | Highest value of the number of responses sent from the database of type "other" | | listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | -| listener_read_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of read requests on the endpoint (ops/sec) | +| listener_read_req_max | N/A | Highest value of the rate of read requests on the endpoint (ops/sec) | | listener_read_res | 
`irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | -| listener_read_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of read responses on the endpoint (ops/sec) | +| listener_read_res_max | N/A | Highest value of the rate of read responses on the endpoint (ops/sec) | | listener_read_started_res | N/A | Number of responses sent from the database of type "read" | -| listener_read_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of type "read" | +| listener_read_started_res_max | N/A | Highest value of the number of responses sent from the database of type "read" | | listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | -| listener_total_connections_received_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of new client connections to the endpoint (connections/sec) | +| listener_total_connections_received_max | N/A | Highest value of the rate of new client connections to the endpoint (connections/sec) | | listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | -| listener_total_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of all requests on the endpoint (ops/sec) | +| listener_total_req_max | N/A | Highest value of the rate of all requests on the endpoint (ops/sec) | | listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | -| listener_total_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of all responses on the endpoint (ops/sec) | +| listener_total_res_max | N/A | Highest value of the rate of all responses on the endpoint (ops/sec) | | listener_total_started_res | N/A | Number of responses sent from the database of all types | -| listener_total_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of 
all types | +| listener_total_started_res_max | N/A | Highest value of the number of responses sent from the database of all types | | listener_write_req | `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | -| listener_write_req_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of write requests on the endpoint (ops/sec) | +| listener_write_req_max | N/A | Highest value of the rate of write requests on the endpoint (ops/sec) | | listener_write_res | `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | -| listener_write_res_max | N/A[1](#proxy-table-note-1) | Highest value of the rate of write responses on the endpoint (ops/sec) | +| listener_write_res_max | N/A | Highest value of the rate of write responses on the endpoint (ops/sec) | | listener_write_started_res | N/A | Number of responses sent from the database of type "write" | -| listener_write_started_res_max | N/A[1](#proxy-table-note-1) | Highest value of the number of responses sent from the database of type "write" | - -1. The `max`, `min`, and `median` v1 metrics provide the aggregated value of the corresponding metric over a 30-second period. This was intended to alleviate a limitation in the v1 system that limited the resolution of reported metrics regardless of the configured scrape interval. This limitation does not apply to v2. You should avoid the extra aggregations unless required for specific use cases. 
+| listener_write_started_res_max | N/A | Highest value of the number of responses sent from the database of type "write" | ## Replication metrics From 96908589f2def275a20c23537ac8ab98a1acf470 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 27 Sep 2024 16:29:48 -0500 Subject: [PATCH 11/25] DOC-4071 Separate v2 Prometheus metrics and transition tables --- .../prometheus-metrics-definitions.md | 436 ++++++++---------- .../prometheus-metrics-v1-to-v2.md | 279 +++++++++++ 2 files changed, 464 insertions(+), 251 deletions(-) create mode 100644 content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 300be2e6a4..a7c776c8e5 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -1,280 +1,214 @@ --- -Title: Metrics in Prometheus +Title: Prometheus metrics v2 alwaysopen: false categories: - docs - integrate - rs -description: The metrics available to Prometheus. +description: PromQL metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. group: observability -linkTitle: Prometheus metrics -summary: You can use Prometheus and Grafana to collect and visualize your Redis Enterprise - Software metrics. +linkTitle: Prometheus metrics v2 +summary: PromQL metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. type: integration weight: 45 --- -The [integration with Prometheus]({{< relref "/integrate/prometheus-with-redis-enterprise/" >}}) -lets you create dashboards that highlight the metrics that are important to you. 
-Here are the metrics available to Prometheus: +You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. + +The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/) metrics in the following tables are available as of Redis Enterprise Software version 7.8.0. For help transitioning from v1 metrics to PromQL, see [Prometheus v1 metrics and equivalent v2 PromQL]({{}}). ## Database metrics -| V1 metric | Equivalent V2 PromQL | Description | -| --------- | :------------------- | :---------- | -| bdb_avg_latency | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | -| bdb_avg_latency_max | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | -| bdb_avg_read_latency | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | -| bdb_avg_read_latency_max | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | -| bdb_avg_write_latency | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | -| bdb_avg_write_latency_max | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of 
write operations (seconds); returned only when there is traffic | -| bdb_bigstore_shard_count | `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | -| bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to database | -| bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | -| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of the rate of outgoing network traffic from the database (bytes/sec) | -| bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | -| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of the rate of key evictions from database (evictions/sec) | -| bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate keys expired in database (expirations/sec) | -| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of the rate keys expired in database (expirations/sec) | -| bdb_fork_cpu_system | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | -| bdb_fork_cpu_system_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child 
processes of this database | -| bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | -| bdb_fork_cpu_user_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | -| bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | -| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of the rate of incoming network traffic to database (bytes/sec) | -| bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | -| bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | -| bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | -| bdb_main_thread_cpu_user | `sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all Redis shard main threads of this database | -| bdb_main_thread_cpu_user_max | `sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard main threads of this database | -| bdb_mem_frag_ratio | `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / 
allocated RAM) | -| bdb_mem_size_lua | `sum by(bdb) (redis_server_used_memory_lua)` | Redis lua scripting heap size (bytes) | -| bdb_memory_limit | `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | -| bdb_monitor_sessions_count | `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | -| bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | -| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on the database (ops/sec) | -| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of the rate of other (non read/write) requests on the database (ops/sec) | -| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on the database (ops/sec) | -| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of the rate of other (non read/write) responses on the database (ops/sec) | -| bdb_pubsub_channels | `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | -| bdb_pubsub_channels_max | `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | -| bdb_pubsub_patterns | `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | -| bdb_pubsub_patterns_max | `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of count the pub/sub patterns with subscribed clients | -| bdb_read_hits | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | -| bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of the rate of read operations accessing an existing key (ops/sec) | -| bdb_read_misses | `sum by (bdb) 
(irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of the rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on the database (ops/sec) | -| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of the rate of read requests on the database (ops/sec) | -| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on the database (ops/sec) | -| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of the rate of read responses on the database (ops/sec) | -| bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | -| bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | -| bdb_shard_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | -| bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the Redis shard process | -| bdb_shards_used | `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) 
* on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | -| bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database (connections/sec) | -| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of the rate of new client connections to database (connections/sec) | -| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on the database (ops/sec) | -| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of the rate of all requests on the database (ops/sec) | -| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on the database (ops/sec) | -| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of the rate of all responses on the database (ops/sec) | -| bdb_up | `min by(bdb) (redis_up)` | Database is up and running | -| bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | -| bdb_write_hits | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | -| bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of the rate of write operations accessing an existing key (ops/sec) | -| bdb_write_misses | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_misses_max | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of the rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write 
requests on the database (ops/sec) | -| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of the rate of write requests on the database (ops/sec) | -| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on the database (ops/sec) | -| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of the rate of write responses on the database (ops/sec) | -| no_of_expires | `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | +| PromQL | Description | +| :----- | :---------- | +| `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | +| `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | +| `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | +| `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | +| `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | +| `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of write operations (seconds); returned only when there is traffic | +| 
`sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | +| `sum by(bdb) (endpoint_conns)` | Number of client connections to database | +| `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | +| `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of the rate of outgoing network traffic from the database (bytes/sec) | +| `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | +| `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of the rate of key evictions from database (evictions/sec) | +| `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate of keys expired in database (expirations/sec) | +| `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of the rate of keys expired in database (expirations/sec) | +| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | +| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child processes of this database | +| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | +| `sum by (bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | +| `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | +| `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of the rate of incoming network traffic to database (bytes/sec) | +| `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all Redis shard main threads of this database | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard main threads of this database | +| `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | +| `sum by(bdb) (redis_server_used_memory_lua)` | Redis Lua scripting heap size (bytes) | +| `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | +| `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | +| `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | +| `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non-read/write) requests on the database
(ops/sec) | +| `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of the rate of other (non-read/write) requests on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non-read/write) responses on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of the rate of other (non-read/write) responses on the database (ops/sec) | +| `sum by(bdb) (redis_server_pubsub_channels)` | Number of pub/sub channels with subscribed clients | +| `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of the number of pub/sub channels with subscribed clients | +| `sum by(bdb) (redis_server_pubsub_patterns)` | Number of pub/sub patterns with subscribed clients | +| `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of the number of pub/sub patterns with subscribed clients | +| `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | +| `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of the rate of read operations accessing an existing key (ops/sec) | +| `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | +| `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of the rate of read operations accessing a non-existing key (ops/sec) | +| `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on the database (ops/sec) | +| `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of the rate of read requests on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of the rate of read responses on the database (ops/sec) | +| `sum by(bdb)
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | +| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the Redis shard process | +| `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | +| `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database (connections/sec) | +| `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of the rate of new client connections to database (connections/sec) | +| `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on the database (ops/sec) | +| `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of the rate of all requests on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of the rate of all responses on the database (ops/sec) | +| `min by(bdb) (redis_up)` | Database is up and running | +| `sum by (bdb) 
(redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | +| `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | +| `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of the rate of write operations accessing an existing key (ops/sec) | +| `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | +| `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of the rate of write operations accessing a non-existing key (ops/sec) | +| `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on the database (ops/sec) | +| `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of the rate of write requests on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on the database (ops/sec) | +| `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of the rate of write responses on the database (ops/sec) | +| `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | ## Node metrics -| V1 metric | Equivalent V2 PromQL | Description | -| --------- | :------------------- | :---------- | -| node_available_flash | `node_available_flash_bytes` | Available flash in the node (bytes) | -| node_available_flash_no_overbooking | `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | -| node_available_memory | `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | -| node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | Available RAM in the node (bytes) without taking into account overbooking | -| 
node_avg_latency | `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | -| node_bigstore_free | `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (bytes/sec); returned only when BigRedis is enabled | -| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | -| node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | -| node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_max | N/A | Highest value of 
CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_median | N/A | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_min | N/A | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_system | `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_max | N/A | Highest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_median | N/A | Average value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_min | N/A | Lowest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| node_cpu_user | `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_max | N/A | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_median | N/A | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_min | N/A | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| node_cur_aof_rewrites | `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | -| node_egress_bytes | `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_median | N/A | Average value of the rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_min | N/A | Lowest value of the rate of outgoing 
network traffic to node (bytes/sec) | -| node_ephemeral_storage_avail | `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | -| node_ephemeral_storage_free | `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | -| node_free_memory | `node_memory_MemFree_bytes` | Free memory in the node (bytes) | -| node_ingress_bytes | `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_median | N/A | Average value of the rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_min | N/A | Lowest value of the rate of incoming network traffic to node (bytes/sec) | -| node_persistent_storage_avail | `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | -| node_persistent_storage_free | `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | -| node_provisional_flash | `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | -| node_provisional_flash_no_overbooking | `node_provisional_flash_no_overbooking_bytes` | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | -| node_provisional_memory | `node_provisional_memory_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | -| node_provisional_memory_no_overbooking | `node_provisional_memory_no_overbooking_bytes` | Amount of RAM that is available for provisioning to databases out of the 
total RAM allocated for databases, without taking into account overbooking | -| node_total_req | `sum by (cluster, node) (irate(endpoint_total_req[1m]))` | Request rate handled by endpoints on node (ops/sec) | -| node_up | `node_metrics_up` | Node is part of the cluster and is connected | +| PromQL | Description | +| :----- | :---------- | +| `node_available_flash_bytes` | Available flash in the node (bytes) | +| `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | +| `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | +| `node_available_memory_no_overbooking_bytes` | Available RAM in the node (bytes) without taking into account overbooking | +| `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | +| `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | +| `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | +| `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the 
cluster (bytes/sec); returned only when BigRedis is enabled | +| `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | +| `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | +| `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | +| `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | +| `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | +| `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | +| `node_memory_MemFree_bytes` | Free memory in the node (bytes) | +| `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | +| `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | +| `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | +| `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| 
`node_provisional_flash_no_overbooking_bytes` | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| `node_provisional_memory_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | +| `node_provisional_memory_no_overbooking_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | +| `sum by (cluster, node) (irate(endpoint_total_req[1m]))` | Request rate handled by endpoints on node (ops/sec) | +| `node_metrics_up` | Node is part of the cluster and is connected | ## Cluster metrics -| V1 metric | Equivalent V2 PromQL | Description | -| --------- | :------------------- | :---------- | -| cluster_shards_limit | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | +| PromQL | Description | +| :----- | :---------- | +| `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | ## Proxy metrics -| V1 metric | Equivalent V2 PromQL | Description | -| --------- | :------------------- | :---------- | -| listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on the database. For the average latency, divide this value by listener_total_res | -| listener_acc_latency_max | N/A | Highest value of accumulative latency of all types of commands on the database | -| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "other" on the database. 
For the average latency, divide this value by listener_other_res | -| listener_acc_other_latency_max | N/A | Highest value of accumulative latency of commands that are a type "other" on the database | -| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "read" on the database. For the average latency, divide this value by listener_read_res | -| listener_acc_read_latency_max | N/A | Highest value of accumulative latency of commands that are a type "read" on the database | -| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "write" on the database. For the average latency, divide this value by listener_write_res | -| listener_acc_write_latency_max | N/A | Highest value of accumulative latency of commands that are a type "write" on the database | -| listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the database | -| listener_auth_cmds_max | N/A | Highest value of the number of memcached AUTH commands sent to the database | -| listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | -| listener_auth_errors_max | N/A | Highest value of the number of error responses to memcached AUTH commands | -| listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the database | -| listener_cmd_flush_max | N/A | Highest value of the number of memcached FLUSH_ALL commands sent to the database | -| listener_cmd_get | N/A | Number of memcached GET commands sent to the database | -| listener_cmd_get_max | N/A | Highest value of the number of memcached GET commands sent to the database | -| listener_cmd_set | N/A | Number of memcached SET commands sent to the database | -| listener_cmd_set_max | N/A | Highest value of the number of memcached SET commands sent to the database | -| listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the database | -| listener_cmd_touch_max | N/A | Highest value of 
the number of memcached TOUCH commands sent to the database | -| listener_conns | N/A | Number of clients connected to the endpoint | -| listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_last_req_time | N/A | Time of last command sent to the database | -| listener_last_res_time | N/A | Time of last response sent from the database | -| listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | -| listener_max_connections_exceeded_max | N/A | Highest value of the number of times the number of clients connected to the database at the same time has exceeded the max limit | -| listener_monitor_sessions_count | N/A | Number of clients connected in monitor mode to the endpoint | -| listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) | -| listener_other_req_max | N/A | Highest value of the rate of other (non-read/write) requests on the endpoint (ops/sec) | -| listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) | -| listener_other_res_max | N/A | Highest value of the rate of other (non-read/write) responses on the endpoint (ops/sec) | -| listener_other_started_res | N/A | Number of responses sent from the database of type "other" | -| listener_other_started_res_max | N/A | Highest value of the number of responses sent from the database of type "other" | -| listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | -| 
listener_read_req_max | N/A | Highest value of the rate of read requests on the endpoint (ops/sec) | -| listener_read_res | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | -| listener_read_res_max | N/A | Highest value of the rate of read responses on the endpoint (ops/sec) | -| listener_read_started_res | N/A | Number of responses sent from the database of type "read" | -| listener_read_started_res_max | N/A | Highest value of the number of responses sent from the database of type "read" | -| listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | -| listener_total_connections_received_max | N/A | Highest value of the rate of new client connections to the endpoint (connections/sec) | -| listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | -| listener_total_req_max | N/A | Highest value of the rate of all requests on the endpoint (ops/sec) | -| listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | -| listener_total_res_max | N/A | Highest value of the rate of all responses on the endpoint (ops/sec) | -| listener_total_started_res | N/A | Number of responses sent from the database of all types | -| listener_total_started_res_max | N/A | Highest value of the number of responses sent from the database of all types | -| listener_write_req | `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | -| listener_write_req_max | N/A | Highest value of the rate of write requests on the endpoint (ops/sec) | -| listener_write_res | `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | -| listener_write_res_max | N/A | Highest value of the rate of write responses on the endpoint (ops/sec) | -| listener_write_started_res | N/A | Number of responses sent from the database of type "write" | -| listener_write_started_res_max | 
N/A | Highest value of the number of responses sent from the database of type "write" | +| PromQL | Description | +| :----- | :---------- | +| `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | +| `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | +| `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | +| `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | +| `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | +| `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | ## Replication metrics -| V1 metric | Equivalent V2 PromQL | Description | -| --------- | :------------------- | :---------- | -| bdb_replicaof_syncer_ingress_bytes | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | -| bdb_replicaof_syncer_ingress_bytes_decompressed | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | -| bdb_replicaof_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | -| bdb_replicaof_syncer_status | `database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | -| bdb_crdt_syncer_ingress_bytes | `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | -| bdb_crdt_syncer_ingress_bytes_decompressed | `rate(crdt_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | -| bdb_crdt_syncer_local_ingress_lag_time | 
`database_syncer_lag_ms{syncer_type="crdt"}` | Lag time between the source and the destination (ms) for CRDB traffic | -| bdb_crdt_syncer_status | `database_syncer_current_status{syncer_type="crdt"}` | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| PromQL | Description | +| :----- | :---------- | +| `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | +| `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | +| `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | +| `database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | +| `rate(crdt_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | +| `database_syncer_lag_ms{syncer_type="crdt"}` | Lag time between the source and the destination (ms) for CRDB traffic | +| `database_syncer_current_status{syncer_type="crdt"}` | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | ## Shard metrics -| V1 metric | Equivalent V2 PromQL | Description | -| --------- | :------------------- | :---------- | -| redis_active_defrag_running | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | -| redis_allocator_active | `redis_server_allocator_active` | Total used memory, including external fragmentation | -| redis_allocator_allocated | `redis_server_allocator_allocated` | Total allocated memory | -| redis_allocator_resident | `redis_server_allocator_resident` | Total resident memory (RSS) | -| redis_aof_last_cow_size | `redis_server_aof_last_cow_size` | Last AOFR, 
CopyOnWrite memory | -| redis_aof_rewrite_in_progress | `redis_server_aof_rewrite_in_progress` | The number of simultaneous AOF rewrites that are in progress | -| redis_aof_rewrites | `redis_server_aof_rewrites` | Number of AOF rewrites this process executed | -| redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | -| redis_blocked_clients | `redis_server_blocked_clients` | Count the clients waiting on a blocking call | -| redis_connected_clients | `redis_server_connected_clients` | Number of client connections to the specific shard | -| redis_connected_slaves | `redis_server_connected_slaves` | Number of connected replicas | -| redis_db0_avg_ttl | `redis_server_db0_avg_ttl` | Average TTL of all volatile keys | -| redis_db0_expires | `redis_server_expired_keys` | Total count of volatile keys | -| redis_db0_keys | `redis_server_db0_keys` | Total key count | -| redis_evicted_keys | `redis_server_evicted_keys` | Keys evicted so far (since restart) | -| redis_expire_cycle_cpu_milliseconds | `redis_server_expire_cycle_cpu_milliseconds` | The cumulative amount of time spent on active expiry cycles | -| redis_expired_keys | `redis_server_expired_keys` | Keys expired so far (since restart) | -| redis_forwarding_state | `redis_server_forwarding_state` | Shard forwarding state (on or off) | -| redis_keys_trimmed | `redis_server_keys_trimmed` | The number of keys that were trimmed in the current or last resharding process | -| redis_keyspace_read_hits | `redis_server_keyspace_read_hits` | Number of read operations accessing an existing keyspace | -| redis_keyspace_read_misses | `redis_server_keyspace_read_misses` | Number of read operations accessing a non-existing keyspace | -| redis_keyspace_write_hits | `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace | -| 
redis_keyspace_write_misses | `redis_server_keyspace_write_misses` | Number of write operations accessing a non-existing keyspace | -| redis_master_link_status | `redis_server_master_link_status` | Indicates if the replica is connected to its master | -| redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | -| redis_master_sync_in_progress | `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 true | 0 false) | -| redis_max_process_mem | `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory | -| redis_maxmemory | `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | -| redis_mem_aof_buffer | `redis_server_mem_aof_buffer` | Current size of AOF buffer | -| redis_mem_clients_normal | `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients | -| redis_mem_clients_slaves | `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients | -| redis_mem_fragmentation_ratio | `redis_server_mem_fragmentation_ratio` | Memory fragmentation ratio (1.3 means 30% overhead) | -| redis_mem_not_counted_for_evict | `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | -| redis_mem_replication_backlog | `redis_server_mem_replication_backlog` | Size of replication backlog | -| redis_module_fork_in_progress | `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | -| redis_process_cpu_system_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard process system CPU time spent in seconds | -| redis_process_cpu_usage_percent | 
`namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard process CPU usage percentage | -| redis_process_cpu_user_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds | -| redis_process_main_thread_cpu_system_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | -| redis_process_main_thread_cpu_user_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | -| redis_process_max_fds | `max(namedprocess_namegroup_open_filedesc)` | Shard maximum number of open file descriptors | -| redis_process_open_fds | `namedprocess_namegroup_open_filedesc` | Shard number of open file descriptors | -| redis_process_resident_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard resident memory size in bytes | -| redis_process_start_time_seconds | `namedprocess_namegroup_oldest_start_time_seconds` | Shard start time of the process since unix epoch in seconds | -| redis_process_virtual_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in bytes | -| redis_rdb_bgsave_in_progress | `redis_server_rdb_bgsave_in_progress` | Indication if bgsave is currently in progress | -| redis_rdb_last_cow_size | `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC fork) used CopyOnWrite memory | -| redis_rdb_saves | `redis_server_rdb_saves` | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | -| redis_repl_touch_bytes | `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | -| redis_total_commands_processed | 
`redis_server_total_commands_processed` | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | -| redis_total_connections_received | `redis_server_total_connections_received` | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | -| redis_total_net_input_bytes | `redis_server_total_net_input_bytes` | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | -| redis_total_net_output_bytes | `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | -| redis_up | `redis_server_up` | Shard is up and running | -| redis_used_memory | `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) | +| PromQL | Description | +| :----- | :---------- | +| `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | +| `redis_server_allocator_active` | Total used memory, including external fragmentation | +| `redis_server_allocator_allocated` | Total allocated memory | +| `redis_server_allocator_resident` | Total resident memory (RSS) | +| `redis_server_aof_last_cow_size` | Last AOFR, CopyOnWrite memory | +| `redis_server_aof_rewrite_in_progress` | The number of simultaneous AOF rewrites that are in progress | +| `redis_server_aof_rewrites` | Number of AOF rewrites this process executed | +| `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | +| `redis_server_blocked_clients` | Count the clients waiting on a blocking call | +| `redis_server_connected_clients` | Number of client connections to the specific shard | +| 
`redis_server_connected_slaves` | Number of connected replicas | +| `redis_server_db0_avg_ttl` | Average TTL of all volatile keys | +| `redis_server_expired_keys` | Total count of volatile keys | +| `redis_server_db0_keys` | Total key count | +| `redis_server_evicted_keys` | Keys evicted so far (since restart) | +| `redis_server_expire_cycle_cpu_milliseconds` | The cumulative amount of time spent on active expiry cycles | +| `redis_server_expired_keys` | Keys expired so far (since restart) | +| `redis_server_forwarding_state` | Shard forwarding state (on or off) | +| `redis_server_keys_trimmed` | The number of keys that were trimmed in the current or last resharding process | +| `redis_server_keyspace_read_hits` | Number of read operations accessing an existing keyspace | +| `redis_server_keyspace_read_misses` | Number of read operations accessing a non-existing keyspace | +| `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace | +| `redis_server_keyspace_write_misses` | Number of write operations accessing a non-existing keyspace | +| `redis_server_master_link_status` | Indicates if the replica is connected to its master | +| `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | +| `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 = true, 0 = false) | +| `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory | +| `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | +| `redis_server_mem_aof_buffer` | Current size of AOF buffer | +| `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients | +| `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients | +| 
`redis_server_mem_fragmentation_ratio` | Memory fragmentation ratio (1.3 means 30% overhead) | +| `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | +| `redis_server_mem_replication_backlog` | Size of replication backlog | +| `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | +| `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard process system CPU time spent in seconds | +| `namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard process CPU usage percentage | +| `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds | +| `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | +| `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | +| `max(namedprocess_namegroup_open_filedesc)` | Shard maximum number of open file descriptors | +| `namedprocess_namegroup_open_filedesc` | Shard number of open file descriptors | +| `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard resident memory size in bytes | +| `namedprocess_namegroup_oldest_start_time_seconds` | Shard start time of the process since unix epoch in seconds | +| `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in bytes | +| `redis_server_rdb_bgsave_in_progress` | Indication if bgsave is currently in progress | +| `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC fork) used CopyOnWrite memory | +| `redis_server_rdb_saves` | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | +| `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was 
processed; calculate the throughput for a time period by comparing the value at different times | +| `redis_server_total_commands_processed` | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | +| `redis_server_total_connections_received` | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | +| `redis_server_total_net_input_bytes` | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | +| `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | +| `redis_server_up` | Shard is up and running | +| `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) | diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md new file mode 100644 index 0000000000..c9666bc0f6 --- /dev/null +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md @@ -0,0 +1,279 @@ +--- +Title: Prometheus v1 metrics and equivalent v2 PromQL +alwaysopen: false +categories: +- docs +- integrate +- rs +description: Transition from v1 metrics to v2 PromQL equivalents. +group: observability +linkTitle: Prometheus v1 metrics & v2 equivalents +summary: Transition from v1 metrics to v2 PromQL equivalents. +type: integration +weight: 45 +--- + +You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. + +As of Redis Enterprise Software version 7.8.0, [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/) metrics are available, and v1 metrics are deprecated. 
You can use the following tables to transition from v1 metrics to equivalent v2 PromQL. For a list of all available v2 PromQL metrics, see [Prometheus metrics v2]({{}}). + +## Database metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| bdb_avg_latency | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | +| bdb_avg_latency_max | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | +| bdb_avg_read_latency | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | +| bdb_avg_read_latency_max | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | +| bdb_avg_write_latency | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | +| bdb_avg_write_latency_max | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of write operations (seconds); returned only when there is traffic | +| bdb_bigstore_shard_count | `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", 
"(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | +| bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to database | +| bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | +| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of the rate of outgoing network traffic from the database (bytes/sec) | +| bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | +| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of the rate of key evictions from database (evictions/sec) | +| bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate keys expired in database (expirations/sec) | +| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of the rate keys expired in database (expirations/sec) | +| bdb_fork_cpu_system | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_system_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_user_max | `sum by (bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | +| bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | +| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of the rate of incoming network traffic to database (bytes/sec) | +| bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | +| bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard main threads of this database | +| bdb_mem_frag_ratio | `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | +| bdb_mem_size_lua | `sum by(bdb) (redis_server_used_memory_lua)` | Redis Lua scripting heap size (bytes) | +| bdb_memory_limit | `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | +| bdb_monitor_sessions_count | `sum by(bdb) 
(endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | +| bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | +| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on the database (ops/sec) | +| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of the rate of other (non read/write) requests on the database (ops/sec) | +| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on the database (ops/sec) | +| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of the rate of other (non read/write) responses on the database (ops/sec) | +| bdb_pubsub_channels | `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | +| bdb_pubsub_channels_max | `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | +| bdb_pubsub_patterns | `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | +| bdb_pubsub_patterns_max | `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of count the pub/sub patterns with subscribed clients | +| bdb_read_hits | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | +| bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of the rate of read operations accessing an existing key (ops/sec) | +| bdb_read_misses | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | +| bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of the rate of read operations 
accessing a non-existing key (ops/sec) | +| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on the database (ops/sec) | +| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of the rate of read requests on the database (ops/sec) | +| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on the database (ops/sec) | +| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of the rate of read responses on the database (ops/sec) | +| bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | +| bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | +| bdb_shard_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | +| bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the Redis shard process | +| bdb_shards_used | `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | +| bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database 
(connections/sec) | +| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of the rate of new client connections to database (connections/sec) | +| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on the database (ops/sec) | +| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of the rate of all requests on the database (ops/sec) | +| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on the database (ops/sec) | +| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of the rate of all responses on the database (ops/sec) | +| bdb_up | `min by(bdb) (redis_up)` | Database is up and running | +| bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | +| bdb_write_hits | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | +| bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of the rate of write operations accessing an existing key (ops/sec) | +| bdb_write_misses | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_misses_max | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of the rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on the database (ops/sec) | +| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of the rate of write requests on the database (ops/sec) | +| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on 
the database (ops/sec) | +| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of the rate of write responses on the database (ops/sec) | +| no_of_expires | `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | + +## Node metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| node_available_flash | `node_available_flash_bytes` | Available flash in the node (bytes) | +| node_available_flash_no_overbooking | `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | +| node_available_memory | `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | +| node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | Available RAM in the node (bytes) without taking into account overbooking | +| node_avg_latency | `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | +| node_bigstore_free | `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the 
cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (bytes/sec); returned only when BigRedis is enabled | +| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | +| node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_max | N/A | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_median | N/A | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_min | N/A | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_system | `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_max | N/A | Highest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_median | N/A | Average value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_min | N/A | Lowest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_user | `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) 
| +| node_cpu_user_max | N/A | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_median | N/A | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_min | N/A | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cur_aof_rewrites | `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | +| node_egress_bytes | `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_median | N/A | Average value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_min | N/A | Lowest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_ephemeral_storage_avail | `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | +| node_ephemeral_storage_free | `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | +| node_free_memory | `node_memory_MemFree_bytes` | Free memory in the node (bytes) | +| node_ingress_bytes | `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_median | N/A | Average value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_min | N/A | Lowest value of the rate of incoming network traffic to node (bytes/sec) | +| node_persistent_storage_avail | `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on 
configured persistent disk (bytes) | +| node_persistent_storage_free | `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | +| node_provisional_flash | `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_flash_no_overbooking | `node_provisional_flash_no_overbooking_bytes` | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_memory | `node_provisional_memory_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | +| node_provisional_memory_no_overbooking | `node_provisional_memory_no_overbooking_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | +| node_total_req | `sum by (cluster, node) (irate(endpoint_total_req[1m]))` | Request rate handled by endpoints on node (ops/sec) | +| node_up | `node_metrics_up` | Node is part of the cluster and is connected | + +## Cluster metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| cluster_shards_limit | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | + +## Proxy metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on the database. 
For the average latency, divide this value by listener_total_res | +| listener_acc_latency_max | N/A | Highest value of accumulative latency of all types of commands on the database | +| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "other" on the database. For the average latency, divide this value by listener_other_res | +| listener_acc_other_latency_max | N/A | Highest value of accumulative latency of commands that are a type "other" on the database | +| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "read" on the database. For the average latency, divide this value by listener_read_res | +| listener_acc_read_latency_max | N/A | Highest value of accumulative latency of commands that are a type "read" on the database | +| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "write" on the database. For the average latency, divide this value by listener_write_res | +| listener_acc_write_latency_max | N/A | Highest value of accumulative latency of commands that are a type "write" on the database | +| listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the database | +| listener_auth_cmds_max | N/A | Highest value of the number of memcached AUTH commands sent to the database | +| listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | +| listener_auth_errors_max | N/A | Highest value of the number of error responses to memcached AUTH commands | +| listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_flush_max | N/A | Highest value of the number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_get | N/A | Number of memcached GET commands sent to the database | +| listener_cmd_get_max | N/A | Highest value of the number of memcached GET commands sent to the database | +| 
listener_cmd_set | N/A | Number of memcached SET commands sent to the database | +| listener_cmd_set_max | N/A | Highest value of the number of memcached SET commands sent to the database | +| listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the database | +| listener_cmd_touch_max | N/A | Highest value of the number of memcached TOUCH commands sent to the database | +| listener_conns | N/A | Number of clients connected to the endpoint | +| listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) | +| listener_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to the endpoint (bytes/sec) | +| listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) | +| listener_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to the endpoint (bytes/sec) | +| listener_last_req_time | N/A | Time of last command sent to the database | +| listener_last_res_time | N/A | Time of last response sent from the database | +| listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | +| listener_max_connections_exceeded_max | N/A | Highest value of the number of times the number of clients connected to the database at the same time has exceeded the max limit | +| listener_monitor_sessions_count | N/A | Number of clients connected in monitor mode to the endpoint | +| listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) | +| listener_other_req_max | N/A | Highest value of the rate of other (non-read/write) requests on the endpoint (ops/sec) | +| listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) | +| listener_other_res_max | N/A | Highest value of the rate of other (non-read/write) responses on the endpoint (ops/sec) | +| 
listener_other_started_res | N/A | Number of responses sent from the database of type "other" | +| listener_other_started_res_max | N/A | Highest value of the number of responses sent from the database of type "other" | +| listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | +| listener_read_req_max | N/A | Highest value of the rate of read requests on the endpoint (ops/sec) | +| listener_read_res | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | +| listener_read_res_max | N/A | Highest value of the rate of read responses on the endpoint (ops/sec) | +| listener_read_started_res | N/A | Number of responses sent from the database of type "read" | +| listener_read_started_res_max | N/A | Highest value of the number of responses sent from the database of type "read" | +| listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | +| listener_total_connections_received_max | N/A | Highest value of the rate of new client connections to the endpoint (connections/sec) | +| listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | +| listener_total_req_max | N/A | Highest value of the rate of all requests on the endpoint (ops/sec) | +| listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | +| listener_total_res_max | N/A | Highest value of the rate of all responses on the endpoint (ops/sec) | +| listener_total_started_res | N/A | Number of responses sent from the database of all types | +| listener_total_started_res_max | N/A | Highest value of the number of responses sent from the database of all types | +| listener_write_req | `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | +| listener_write_req_max | N/A | Highest value of the rate of write requests on the endpoint (ops/sec) | +| listener_write_res | 
`irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | +| listener_write_res_max | N/A | Highest value of the rate of write responses on the endpoint (ops/sec) | +| listener_write_started_res | N/A | Number of responses sent from the database of type "write" | +| listener_write_started_res_max | N/A | Highest value of the number of responses sent from the database of type "write" | + +## Replication metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| bdb_replicaof_syncer_ingress_bytes | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | +| bdb_replicaof_syncer_ingress_bytes_decompressed | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | +| bdb_replicaof_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | +| bdb_replicaof_syncer_status | `database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| bdb_crdt_syncer_ingress_bytes | `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | +| bdb_crdt_syncer_ingress_bytes_decompressed | `rate(crdt_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | +| bdb_crdt_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="crdt"}` | Lag time between the source and the destination (ms) for CRDB traffic | +| bdb_crdt_syncer_status | `database_syncer_current_status{syncer_type="crdt"}` | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | + +## Shard metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | 
:---------- | +| redis_active_defrag_running | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | +| redis_allocator_active | `redis_server_allocator_active` | Total used memory, including external fragmentation | +| redis_allocator_allocated | `redis_server_allocator_allocated` | Total allocated memory | +| redis_allocator_resident | `redis_server_allocator_resident` | Total resident memory (RSS) | +| redis_aof_last_cow_size | `redis_server_aof_last_cow_size` | Last AOFR, CopyOnWrite memory | +| redis_aof_rewrite_in_progress | `redis_server_aof_rewrite_in_progress` | The number of simultaneous AOF rewrites that are in progress | +| redis_aof_rewrites | `redis_server_aof_rewrites` | Number of AOF rewrites this process executed | +| redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | +| redis_blocked_clients | `redis_server_blocked_clients` | Count the clients waiting on a blocking call | +| redis_connected_clients | `redis_server_connected_clients` | Number of client connections to the specific shard | +| redis_connected_slaves | `redis_server_connected_slaves` | Number of connected replicas | +| redis_db0_avg_ttl | `redis_server_db0_avg_ttl` | Average TTL of all volatile keys | +| redis_db0_expires | `redis_server_expired_keys` | Total count of volatile keys | +| redis_db0_keys | `redis_server_db0_keys` | Total key count | +| redis_evicted_keys | `redis_server_evicted_keys` | Keys evicted so far (since restart) | +| redis_expire_cycle_cpu_milliseconds | `redis_server_expire_cycle_cpu_milliseconds` | The cumulative amount of time spent on active expiry cycles | +| redis_expired_keys | `redis_server_expired_keys` | Keys expired so far (since restart) | +| redis_forwarding_state | `redis_server_forwarding_state` | Shard forwarding state (on or off) | +| 
redis_keys_trimmed | `redis_server_keys_trimmed` | The number of keys that were trimmed in the current or last resharding process | +| redis_keyspace_read_hits | `redis_server_keyspace_read_hits` | Number of read operations accessing an existing keyspace | +| redis_keyspace_read_misses | `redis_server_keyspace_read_misses` | Number of read operations accessing a non-existing keyspace | +| redis_keyspace_write_hits | `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace | +| redis_keyspace_write_misses | `redis_server_keyspace_write_misses` | Number of write operations accessing a non-existing keyspace | +| redis_master_link_status | `redis_server_master_link_status` | Indicates if the replica is connected to its master | +| redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_master_sync_in_progress | `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 = true, 0 = false) | +| redis_max_process_mem | `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory | +| redis_maxmemory | `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | +| redis_mem_aof_buffer | `redis_server_mem_aof_buffer` | Current size of AOF buffer | +| redis_mem_clients_normal | `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients | +| redis_mem_clients_slaves | `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients | +| redis_mem_fragmentation_ratio | `redis_server_mem_fragmentation_ratio` | Memory fragmentation ratio (1.3 means 30% overhead) | +| redis_mem_not_counted_for_evict | `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted 
for eviction and OOM error | +| redis_mem_replication_backlog | `redis_server_mem_replication_backlog` | Size of replication backlog | +| redis_module_fork_in_progress | `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | +| redis_process_cpu_system_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard process system CPU time spent in seconds | +| redis_process_cpu_usage_percent | `namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard process CPU usage percentage | +| redis_process_cpu_user_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds | +| redis_process_main_thread_cpu_system_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | +| redis_process_main_thread_cpu_user_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | +| redis_process_max_fds | `max(namedprocess_namegroup_open_filedesc)` | Shard maximum number of open file descriptors | +| redis_process_open_fds | `namedprocess_namegroup_open_filedesc` | Shard number of open file descriptors | +| redis_process_resident_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard resident memory size in bytes | +| redis_process_start_time_seconds | `namedprocess_namegroup_oldest_start_time_seconds` | Shard start time of the process since unix epoch in seconds | +| redis_process_virtual_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in bytes | +| redis_rdb_bgsave_in_progress | `redis_server_rdb_bgsave_in_progress` | Indication if bgsave is currently in progress | +| redis_rdb_last_cow_size | `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC 
fork) used CopyOnWrite memory | +| redis_rdb_saves | `redis_server_rdb_saves` | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | +| redis_repl_touch_bytes | `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | +| redis_total_commands_processed | `redis_server_total_commands_processed` | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | +| redis_total_connections_received | `redis_server_total_connections_received` | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | +| redis_total_net_input_bytes | `redis_server_total_net_input_bytes` | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_total_net_output_bytes | `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_up | `redis_server_up` | Shard is up and running | +| redis_used_memory | `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) | From 560223140992a557003693ea7adee33f763aaadd Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 27 Sep 2024 16:39:25 -0500 Subject: [PATCH 12/25] DOC-3552 Cluster watchdog Prometheus metrics --- .../prometheus-metrics-definitions.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index a7c776c8e5..ff4bf8feb5 100644 --- 
a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -129,6 +129,21 @@ The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/l | :----- | :---------- | | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | +## Cluster watchdog metrics + +| PromQL | Type | Description | +| :----- | :--- | :---------- | +| `azure_token_ttl{cluster_wd=}` | gauge| How many seconds left or the timestamp when the token is invalid.| +| `generation{cluster_wd=}` | gauge| Generation number of the specific cluster_wd| +| `has_qourum{cluster_wd=, has_witness_disk=BOOL}` | gauge| Has_qourum = 1
No quorum = 0 | +| `is_primary{cluster_wd=}` | gauge| primary = 1
secondary = 0 | +| `total_live_nodes_count{cluster_wd=}` | gauge| Number of live nodes| +| `total_node_count{cluster_wd=}` | gauge| Number of nodes | +| `total_primary_selection_ended{cluster_wd=}` | counter | Monotonic counter for each selection process that ended | +| `total_primary_selections{cluster_wd=}` | counter | Monotonic counter for each selection process that started| +| `witness_disk_reads{status=” success/failure”, cluster_wd=}` | counter | How many times read from the witness disk | +| `witness_disk_writes{status=”success/failure”, cluster_wd=}` | counter | How many times wrote to the witness disk | + ## Proxy metrics | PromQL | Description | From 5ff7cdb1101766c02115358d85f117ce98fd2e8e Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 9 Oct 2024 16:00:52 -0500 Subject: [PATCH 13/25] DOC-3944 Placeholder - plan to provide v2 metric names when available instead of PromQL on dedicated v2 metrics page --- .../prometheus-metrics-definitions.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index ff4bf8feb5..f3519cf01f 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -5,21 +5,21 @@ categories: - docs - integrate - rs -description: PromQL metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. +description: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. group: observability linkTitle: Prometheus metrics v2 -summary: PromQL metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. +summary: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. 
type: integration weight: 45 --- You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. -The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/) metrics in the following tables are available as of Redis Enterprise Software version 7.8.0. For help transitioning from v1 metrics to PromQL, see [Prometheus v1 metrics and equivalent v2 PromQL]({{}}). +The v2 metrics in the following tables are available as of Redis Enterprise Software version 7.8.0. For help transitioning from v1 metrics to v2 PromQL, see [Prometheus v1 metrics and equivalent v2 PromQL]({{}}). ## Database metrics -| PromQL | Description | +| V2 metric | Description | | :----- | :---------- | | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | @@ -92,7 +92,7 @@ The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/l ## Node metrics -| PromQL | Description | +| V2 metric | Description | | :----- | :---------- | | `node_available_flash_bytes` | Available flash in the node (bytes) | | `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | @@ -125,13 +125,13 @@ The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/l ## Cluster metrics -| PromQL | Description | +| V2 metric | Description | | :----- | :---------- | | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | ## Cluster watchdog metrics -| PromQL | Type | Description | +| V2 metric | Type | 
Description | | :----- | :--- | :---------- | | `azure_token_ttl{cluster_wd=}` | gauge| How many seconds left or the timestamp when the token is invalid.| | `generation{cluster_wd=}` | gauge| Generation number of the specific cluster_wd| @@ -146,7 +146,7 @@ The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/l ## Proxy metrics -| PromQL | Description | +| V2 metric | Description | | :----- | :---------- | | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | @@ -157,7 +157,7 @@ The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/l ## Replication metrics -| PromQL | Description | +| V2 metric | Description | | :----- | :---------- | | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | @@ -170,7 +170,7 @@ The [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/l ## Shard metrics -| PromQL | Description | +| V2 metric | Description | | :----- | :---------- | | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | | `redis_server_allocator_active` | Total used memory, including external fragmentation | From 76217070b06a799c1f6511904d10c65ef5f77358 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 9 Oct 2024 16:05:05 -0500 Subject: [PATCH 14/25] Table formatting --- .../prometheus-metrics-definitions.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md 
b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index f3519cf01f..2203754b54 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -20,7 +20,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Database metrics | V2 metric | Description | -| :----- | :---------- | +| :-------- | :---------- | | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | @@ -93,7 +93,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Node metrics | V2 metric | Description | -| :----- | :---------- | +| :-------- | :---------- | | `node_available_flash_bytes` | Available flash in the node (bytes) | | `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | | `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | @@ -126,13 +126,13 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Cluster metrics | V2 metric | Description | -| :----- | :---------- | +| :-------- | :---------- | | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | ## Cluster watchdog metrics | 
V2 metric | Type | Description | -| :----- | :--- | :---------- | +| :-------- | :--- | :---------- | | `azure_token_ttl{cluster_wd=}` | gauge| How many seconds left or the timestamp when the token is invalid.| | `generation{cluster_wd=}` | gauge| Generation number of the specific cluster_wd| | `has_qourum{cluster_wd=, has_witness_disk=BOOL}` | gauge| Has_qourum = 1
No quorum = 0 | @@ -147,7 +147,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Proxy metrics | V2 metric | Description | -| :----- | :---------- | +| :-------- | :---------- | | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | @@ -158,7 +158,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Replication metrics | V2 metric | Description | -| :----- | :---------- | +| :-------- | :---------- | | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | | `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | @@ -171,7 +171,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Shard metrics | V2 metric | Description | -| :----- | :---------- | +| :-------- | :---------- | | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | | `redis_server_allocator_active` | Total used memory, including external fragmentation | | `redis_server_allocator_allocated` | Total allocated memory | From 415a98edfb67f51216a69ff1a1a294e8cb306b03 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 9 Oct 2024 16:31:54 -0500 Subject: [PATCH 15/25] DOC-4294 Add latency histogram metrics to v2 Prometheus metrics --- .../prometheus-metrics-definitions.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git 
a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 2203754b54..a75e9b89c6 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -144,6 +144,14 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft | `witness_disk_reads{status=” success/failure”, cluster_wd=}` | counter | How many times read from the witness disk | | `witness_disk_writes{status=”success/failure”, cluster_wd=}` | counter | How many times wrote to the witness disk | +## Latency histogram metrics + +| V2 metric | Description | +| :-------- | :---------- | +| `endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", bdb="$bdb"}` | Latency histograms for commands other than read or write commands | +| `endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", bdb="$bdb"}` | Latency histograms for read commands | +| `endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", bdb="$bdb"}` | Latency histograms for write commands | + ## Proxy metrics | V2 metric | Description | From 44a12ebe046f575c3b2d9b7e31390233d0649f5d Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Thu, 10 Oct 2024 16:09:16 -0500 Subject: [PATCH 16/25] Remove PromQL from dedicated v2 metrics page --- .../prometheus-metrics-definitions.md | 267 ++++++------------ 1 file changed, 84 insertions(+), 183 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index a75e9b89c6..741f4b6040 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ 
-19,219 +19,120 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Database metrics -| V2 metric | Description | -| :-------- | :---------- | -| `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | -| `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | -| `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | -| `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | -| `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | -| `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of write operations (seconds); returned only when there is traffic | -| `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | -| `sum by(bdb) (endpoint_conns)` | Number of client 
connections to database | -| `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | -| `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of the rate of outgoing network traffic from the database (bytes/sec) | -| `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | -| `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of the rate of key evictions from database (evictions/sec) | -| `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate of keys expired in database (expirations/sec) | -| `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of the rate of keys expired in database (expirations/sec) | -| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | -| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child processes of this database | -| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | -| `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | -| `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | -| `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of the rate of incoming network traffic to database (bytes/sec) | -| `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | -| 
`sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | -| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | -| `sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all Redis shard main threads of this database | -| `sum by(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard main threads of this database | -| `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | -| `sum by(bdb) (redis_server_used_memory_lua)` | Redis lua scripting heap size (bytes) | -| `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | -| `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | -| `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | -| `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non-read/write) requests on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of the rate of other (non-read/write) requests on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non-read/write) responses on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of the rate of other (non-read/write) responses on the database (ops/sec) | -| `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | -| `sum by(bdb) 
(redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | -| `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | -| `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of count the pub/sub patterns with subscribed clients | -| `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | -| `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of the rate of read operations accessing an existing key (ops/sec) | -| `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | -| `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of the rate of read operations accessing a non-existing key (ops/sec) | -| `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on the database (ops/sec) | -| `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of the rate of read requests on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of the rate of read responses on the database (ops/sec) | -| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | -| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | -| `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | -| `sum by(bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores utilization in user mode for the Redis shard process | -| `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | -| `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database (connections/sec) | -| `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of the rate of new client connections to database (connections/sec) | -| `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on the database (ops/sec) | -| `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of the rate of all requests on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of the rate of all responses on the database (ops/sec) | -| `min by(bdb) (redis_up)` | Database is up and running | -| `sum by (bdb) (redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | -| `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | -| `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of the rate of write operations accessing an existing key (ops/sec) | -| `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | -| `sum by (bdb) 
(irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of the rate of write operations accessing a non-existing key (ops/sec) | -| `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on the database (ops/sec) | -| `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of the rate of write requests on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on the database (ops/sec) | -| `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of the rate of write responses on the database (ops/sec) | -| `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | +TBA ## Node metrics | V2 metric | Description | | :-------- | :---------- | -| `node_available_flash_bytes` | Available flash in the node (bytes) | -| `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | -| `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | -| `node_available_memory_no_overbooking_bytes` | Available RAM in the node (bytes) without taking into account overbooking | -| `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | -| `node_bigstore_free_bytes` | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | -| `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + 
irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | -| `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (bytes/sec); returned only when BigRedis is enabled | -| `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | -| `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | -| `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | -| `avg by (node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | -| `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | -| `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | -| `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | -| `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | -| `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk (bytes) | -| `node_memory_MemFree_bytes` | Free memory in the node (bytes) | -| `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to 
node (bytes/sec) | -| `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | -| `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | -| `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | -| `node_provisional_flash_no_overbooking_bytes` | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | -| `node_provisional_memory_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | -| `node_provisional_memory_no_overbooking_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | -| `sum by (cluster, node) (irate(endpoint_total_req[1m]))` | Request rate handled by endpoints on node (ops/sec) | -| `node_metrics_up` | Node is part of the cluster and is connected | +| node_available_flash_bytes | Available flash in the node (bytes) | +| node_available_flash_no_overbooking_bytes | Available flash in the node (bytes), without taking into account overbooking | +| node_available_memory_bytes | Amount of free memory in the node (bytes) that is available for database provisioning | +| node_available_memory_no_overbooking_bytes | Available RAM in the node (bytes) without taking into account overbooking | +| node_bigstore_free_bytes | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| x509_cert_expires_in_seconds | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref 
"/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| node_ephemeral_storage_avail_bytes | Disk space available to RLEC processes on configured ephemeral disk (bytes) | +| node_ephemeral_storage_free_bytes | Free disk space on configured ephemeral disk (bytes) | +| node_memory_MemFree_bytes | Free memory in the node (bytes) | +| node_persistent_storage_avail_bytes | Disk space available to RLEC processes on configured persistent disk (bytes) | +| node_persistent_storage_free_bytes | Free disk space on configured persistent disk (bytes) | +| node_provisional_flash_bytes | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_flash_no_overbooking_bytes | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_memory_bytes | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | +| node_provisional_memory_no_overbooking_bytes | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | +| node_metrics_up | Node is part of the cluster and is connected | ## Cluster metrics | V2 metric | Description | | :-------- | :---------- | -| `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | +| license_shards_limit | Total shard limit by the license by shard type (ram / flash) | ## Cluster watchdog metrics | V2 metric | Type | Description | | :-------- | :--- | :---------- | -| `azure_token_ttl{cluster_wd=}` | gauge| How many seconds left or the timestamp when the token is invalid.| -| `generation{cluster_wd=}` | gauge| Generation 
number of the specific cluster_wd| -| `has_qourum{cluster_wd=, has_witness_disk=BOOL}` | gauge| Has_qourum = 1
No quorum = 0 | -| `is_primary{cluster_wd=}` | gauge| primary = 1
secondary = 0 | -| `total_live_nodes_count{cluster_wd=}` | gauge| Number of live nodes| -| `total_node_count{cluster_wd=}` | gauge| Number of nodes | -| `total_primary_selection_ended{cluster_wd=}` | counter | Monotonic counter for each selection process that ended | -| `total_primary_selections{cluster_wd=}` | counter | Monotonic counter for each selection process that started| -| `witness_disk_reads{status=” success/failure”, cluster_wd=}` | counter | How many times read from the witness disk | -| `witness_disk_writes{status=”success/failure”, cluster_wd=}` | counter | How many times wrote to the witness disk | +| azure_token_ttl{cluster_wd=} | gauge | Seconds remaining until the token expires, or the timestamp at which the token becomes invalid | +| generation{cluster_wd=} | gauge | Generation number of the specific cluster_wd | +| has_qourum{cluster_wd=, has_witness_disk=BOOL} | gauge| Has_qourum = 1
No quorum = 0 | +| is_primary{cluster_wd=} | gauge| primary = 1
secondary = 0 | +| total_live_nodes_count{cluster_wd=} | gauge| Number of live nodes| +| total_node_count{cluster_wd=} | gauge| Number of nodes | +| total_primary_selection_ended{cluster_wd=} | counter | Monotonic counter for each selection process that ended | +| total_primary_selections{cluster_wd=} | counter | Monotonic counter for each selection process that started| +| witness_disk_reads{status=” success/failure”, cluster_wd=} | counter | How many times read from the witness disk | +| witness_disk_writes{status=”success/failure”, cluster_wd=} | counter | How many times wrote to the witness disk | ## Latency histogram metrics | V2 metric | Description | | :-------- | :---------- | -| `endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", bdb="$bdb"}` | Latency histograms for commands other than read or write commands | -| `endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", bdb="$bdb"}` | Latency histograms for read commands | -| `endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", bdb="$bdb"}` | Latency histograms for write commands | +| endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands | +| endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands | +| endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands | ## Proxy metrics -| V2 metric | Description | -| :-------- | :---------- | -| `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | -| `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | -| `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | -| `irate(endpoint_total_connections_received[1m])` | Rate of new client connections to the endpoint (connections/sec) | -| `irate(endpoint_write_requests[1m])` | 
Rate of write requests on the endpoint (ops/sec) | -| `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | +TBA ## Replication metrics | V2 metric | Description | | :-------- | :---------- | -| `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | -| `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | -| `database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | -| `database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | -| `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | -| `rate(crdt_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | -| `database_syncer_lag_ms{syncer_type="crdt"}` | Lag time between the source and the destination (ms) for CRDB traffic | -| `database_syncer_current_status{syncer_type="crdt"}` | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| database_syncer_lag_ms | Lag time between the source and the destination for traffic (ms) | +| database_syncer_current_status | Syncer status for traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | ## Shard metrics | V2 metric | Description | | :-------- | :---------- | -| `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% cpu) | -| `redis_server_allocator_active` | Total used memory, including external fragmentation | -| `redis_server_allocator_allocated` | Total allocated memory | -| `redis_server_allocator_resident` | Total resident memory (RSS) | -| `redis_server_aof_last_cow_size` | Last AOFR, CopyOnWrite memory | -| `redis_server_aof_rewrite_in_progress` | The number of 
simultaneous AOF rewrites that are in progress | -| `redis_server_aof_rewrites` | Number of AOF rewrites this process executed | -| `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | -| `redis_server_blocked_clients` | Count the clients waiting on a blocking call | -| `redis_server_connected_clients` | Number of client connections to the specific shard | -| `redis_server_connected_slaves` | Number of connected replicas | -| `redis_server_db0_avg_ttl` | Average TTL of all volatile keys | -| `redis_server_expired_keys` | Total count of volatile keys | -| `redis_server_db0_keys` | Total key count | -| `redis_server_evicted_keys` | Keys evicted so far (since restart) | -| `redis_server_expire_cycle_cpu_milliseconds` | The cumulative amount of time spent on active expiry cycles | -| `redis_server_expired_keys` | Keys expired so far (since restart) | -| `redis_server_forwarding_state` | Shard forwarding state (on or off) | -| `redis_server_keys_trimmed` | The number of keys that were trimmed in the current or last resharding process | -| `redis_server_keyspace_read_hits` | Number of read operations accessing an existing keyspace | -| `redis_server_keyspace_read_misses` | Number of read operations accessing a non-existing keyspace | -| `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace | -| `redis_server_keyspace_write_misses` | Number of write operations accessing a non-existing keyspace | -| `redis_server_master_link_status` | Indicates if the replica is connected to its master | -| `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | -| `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 true | 0 false) | -| `redis_server_max_process_mem` | Current memory 
limit configured by redis_mgr according to node free memory | -| `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | -| `redis_server_mem_aof_buffer` | Current size of AOF buffer | -| `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients | -| `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients | -| `redis_server_mem_fragmentation_ratio` | Memory fragmentation ratio (1.3 means 30% overhead) | -| `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | -| `redis_server_mem_replication_backlog` | Size of replication backlog | -| `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | -| `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard process system CPU time spent in seconds | -| `namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard process CPU usage percentage | -| `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard user CPU time spent in seconds | -| `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | -| `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | -| `max(namedprocess_namegroup_open_filedesc)` | Shard maximum number of open file descriptors | -| `namedprocess_namegroup_open_filedesc` | Shard number of open file descriptors | -| `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard resident memory size in bytes | -| `namedprocess_namegroup_oldest_start_time_seconds` | Shard start time of the process since unix epoch in seconds | -| `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in 
bytes | -| `redis_server_rdb_bgsave_in_progress` | Indication if bgsave is currently in progress | -| `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC fork) used CopyOnWrite memory | -| `redis_server_rdb_saves` | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | -| `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | -| `redis_server_total_commands_processed` | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | -| `redis_server_total_connections_received` | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | -| `redis_server_total_net_input_bytes` | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | -| `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | -| `redis_server_up` | Shard is up and running | -| `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) | +| redis_server_active_defrag_running | Automatic memory defragmentation current aggressiveness (% cpu) | +| redis_server_allocator_active | Total used memory, including external fragmentation | +| redis_server_allocator_allocated | Total allocated memory | +| redis_server_allocator_resident | Total resident memory (RSS) | +| redis_server_aof_last_cow_size | Last AOFR, CopyOnWrite memory | +| redis_server_aof_rewrite_in_progress | The number of simultaneous AOF rewrites that are in progress | +| redis_server_aof_rewrites | Number of AOF rewrites this process executed | +| 
redis_server_aof_delayed_fsync | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | +| redis_server_blocked_clients | Number of clients waiting on a blocking call | +| redis_server_connected_clients | Number of client connections to the specific shard | +| redis_server_connected_slaves | Number of connected replicas | +| redis_server_db0_avg_ttl | Average TTL of all volatile keys | +| redis_server_expired_keys | Total count of volatile keys | +| redis_server_db0_keys | Total key count | +| redis_server_evicted_keys | Keys evicted so far (since restart) | +| redis_server_expire_cycle_cpu_milliseconds | The cumulative amount of time spent on active expiry cycles | +| redis_server_expired_keys | Keys expired so far (since restart) | +| redis_server_forwarding_state | Shard forwarding state (on or off) | +| redis_server_keys_trimmed | The number of keys that were trimmed in the current or last resharding process | +| redis_server_keyspace_read_hits | Number of read operations accessing an existing keyspace | +| redis_server_keyspace_read_misses | Number of read operations accessing a non-existing keyspace | +| redis_server_keyspace_write_hits | Number of write operations accessing an existing keyspace | +| redis_server_keyspace_write_misses | Number of write operations accessing a non-existing keyspace | +| redis_server_master_link_status | Indicates if the replica is connected to its master | +| redis_server_master_repl_offset | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_server_master_sync_in_progress | The master shard is synchronizing (1 = true, 0 = false) | +| redis_server_max_process_mem | Current memory limit configured by redis_mgr according to node free memory | +| redis_server_maxmemory | Current memory limit configured by redis_mgr according to database memory limits | +|
redis_server_mem_aof_buffer | Current size of AOF buffer | +| redis_server_mem_clients_normal | Current memory used for input and output buffers of non-replica clients | +| redis_server_mem_clients_slaves | Current memory used for input and output buffers of replica clients | +| redis_server_mem_fragmentation_ratio | Memory fragmentation ratio (1.3 means 30% overhead) | +| redis_server_mem_not_counted_for_evict | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | +| redis_server_mem_replication_backlog | Size of replication backlog | +| redis_server_module_fork_in_progress | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | +| namedprocess_namegroup_cpu_seconds_total | Shard process CPU usage percentage | +| namedprocess_namegroup_thread_cpu_seconds_total | Shard main thread CPU time spent in seconds | +| namedprocess_namegroup_open_filedesc | Shard number of open file descriptors | +| namedprocess_namegroup_memory_bytes | Shard memory size in bytes | +| namedprocess_namegroup_oldest_start_time_seconds | Shard start time of the process since unix epoch in seconds | +| redis_server_rdb_bgsave_in_progress | Indication if bgsave is currently in progress | +| redis_server_rdb_last_cow_size | Last bgsave (or SYNC fork) used CopyOnWrite memory | +| redis_server_rdb_saves | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | +| redis_server_repl_touch_bytes | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | +| redis_server_total_commands_processed | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | +| redis_server_total_connections_received | Number of connections received by the shard; calculate the number of 
connections for a time period by comparing the value at different times | +| redis_server_total_net_input_bytes | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_server_total_net_output_bytes | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_server_up | Shard is up and running | +| redis_server_used_memory | Memory used by shard (in BigRedis this includes flash) (bytes) | From c37e0ec425a9441a7a09a78d9aa5a933024f8602 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Thu, 10 Oct 2024 16:20:55 -0500 Subject: [PATCH 17/25] DOC-4294 Moved example PromQL for latency histogram metrics to description column --- .../prometheus-metrics-definitions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 741f4b6040..8ad3de5f73 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -67,9 +67,9 @@ TBA | V2 metric | Description | | :-------- | :---------- | -| endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands | -| endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands | -| endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands | +| endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | ## Proxy metrics From 1eeeb5a507e2a84e2d10f738a419f0074cdec531 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 30 Oct 2024 15:20:46 -0500 Subject: [PATCH 18/25] DOC-4417 RS: Update v2 cert metric with new name --- .../prometheus-metrics-definitions.md | 2 +- .../prometheus-metrics-v1-to-v2.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 8ad3de5f73..c48320eab1 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -30,7 +30,7 @@ TBA | node_available_memory_bytes | Amount of free memory in the node (bytes) that is available for database provisioning | | node_available_memory_no_overbooking_bytes | Available RAM in the node (bytes) without taking into account overbooking | | node_bigstore_free_bytes | Sum of free space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| x509_cert_expires_in_seconds | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| node_cert_expires_in_seconds | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | | node_ephemeral_storage_avail_bytes | Disk space 
available to RLEC processes on configured ephemeral disk (bytes) | | node_ephemeral_storage_free_bytes | Free disk space on configured ephemeral disk (bytes) | | node_memory_MemFree_bytes | Free memory in the node (bytes) | diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md index c9666bc0f6..fbf4029f01 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md @@ -103,7 +103,7 @@ As of Redis Enterprise Software version 7.8.0, [PromQL (Prometheus Query Languag | node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | | node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | | node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (bytes/sec); returned only when BigRedis is enabled | -| node_cert_expiration_seconds | `x509_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| 
node_cert_expiration_seconds | `node_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | | node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | | node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | | node_cpu_idle_max | N/A | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | From 775a99cdf06334a7b16e49ea32964982935a5fc8 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Tue, 5 Nov 2024 13:55:51 -0600 Subject: [PATCH 19/25] DOC-3944 Feedback update to remove a few v2 metrics --- .../prometheus-metrics-definitions.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index c48320eab1..f8a6127ac6 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -17,10 +17,6 @@ You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}). -## Database metrics - -TBA - ## Node metrics | V2 metric | Description | @@ -52,7 +48,6 @@ TBA | V2 metric | Type | Description | | :-------- | :--- | :---------- | -| azure_token_ttl{cluster_wd=} | gauge| How many seconds left or the timestamp when the token is invalid.| | generation{cluster_wd=} | gauge| Generation number of the specific cluster_wd| | has_qourum{cluster_wd=, has_witness_disk=BOOL} | gauge| Has_qourum = 1
No quorum = 0 | | is_primary{cluster_wd=} | gauge| primary = 1
secondary = 0 | @@ -60,8 +55,6 @@ TBA | total_node_count{cluster_wd=} | gauge| Number of nodes | | total_primary_selection_ended{cluster_wd=} | counter | Monotonic counter for each selection process that ended | | total_primary_selections{cluster_wd=} | counter | Monotonic counter for each selection process that started| -| witness_disk_reads{status=” success/failure”, cluster_wd=} | counter | How many times read from the witness disk | -| witness_disk_writes{status=”success/failure”, cluster_wd=} | counter | How many times wrote to the witness disk | ## Latency histogram metrics @@ -71,10 +64,6 @@ TBA | endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | | endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | -## Proxy metrics - -TBA - ## Replication metrics | V2 metric | Description | From e0f0b697bc3f96e875434d70630245d7ab74faa5 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Tue, 5 Nov 2024 13:56:38 -0600 Subject: [PATCH 20/25] Update RS version to 7.8.2 --- .../prometheus-metrics-definitions.md | 4 ++-- .../prometheus-metrics-v1-to-v2.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index f8a6127ac6..925996e5ce 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -5,10 +5,10 @@ categories: - docs - integrate - rs -description: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. +description: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.2. group: observability linkTitle: Prometheus metrics v2 -summary: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.0. +summary: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.2. 
type: integration weight: 45 --- diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md index fbf4029f01..e56bbe83c5 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md @@ -15,7 +15,7 @@ weight: 45 You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. -As of Redis Enterprise Software version 7.8.0, [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/) metrics are available, and v1 metrics are deprecated. You can use the following tables to transition from v1 metrics to equivalent v2 PromQL. For a list of all available v2 PromQL metrics, see [Prometheus metrics v2]({{}}). +As of Redis Enterprise Software version 7.8.2, [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/) metrics are available, and v1 metrics are deprecated. You can use the following tables to transition from v1 metrics to equivalent v2 PromQL. For a list of all available v2 PromQL metrics, see [Prometheus metrics v2]({{}}). 
## Database metrics From 6d5fea5f4ff928a3e7d10161cbbe547cade73ce8 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 6 Nov 2024 13:49:09 -0600 Subject: [PATCH 21/25] DOC-3944 Feedback updates for v2 metrics --- .../prometheus-metrics-definitions.md | 29 +++++++++---------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 925996e5ce..9eb193967d 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -1,5 +1,5 @@ --- -Title: Prometheus metrics v2 +Title: Prometheus metrics v2 preview alwaysopen: false categories: - docs @@ -13,10 +13,22 @@ type: integration weight: 45 --- +{{}} +While the metrics stream engine is in preview, this document provides only a partial list of v2 metrics. More metrics will be added. +{{}} + You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. The v2 metrics in the following tables are available as of Redis Enterprise Software version 7.8.0. For help transitioning from v1 metrics to v2 PromQL, see [Prometheus v1 metrics and equivalent v2 PromQL]({{}}). +## Database metrics + +| V2 metric | Description | +| :-------- | :---------- | +| endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | + ## Node metrics | V2 metric | Description | @@ -40,30 +52,17 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Cluster metrics -| V2 metric | Description | -| :-------- | :---------- | -| license_shards_limit | Total shard limit by the license by shard type (ram / flash) | - -## Cluster watchdog metrics - | V2 metric | Type | Description | | :-------- | :--- | :---------- | | generation{cluster_wd=} | gauge| Generation number of the specific cluster_wd| | has_qourum{cluster_wd=, has_witness_disk=BOOL} | gauge| Has_qourum = 1
No quorum = 0 | | is_primary{cluster_wd=} | gauge| primary = 1
secondary = 0 | +| license_shards_limit | | Total shard limit by the license by shard type (ram / flash) | | total_live_nodes_count{cluster_wd=} | gauge| Number of live nodes| | total_node_count{cluster_wd=} | gauge| Number of nodes | | total_primary_selection_ended{cluster_wd=} | counter | Monotonic counter for each selection process that ended | | total_primary_selections{cluster_wd=} | counter | Monotonic counter for each selection process that started| -## Latency histogram metrics - -| V2 metric | Description | -| :-------- | :---------- | -| endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | -| endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | -| endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | - ## Replication metrics | V2 metric | Description | From b825f25ec76a13e9bccf374c71290a1de0a55b1f Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 6 Nov 2024 14:14:37 -0600 Subject: [PATCH 22/25] DOC-3944 Feedback update for title of v1 to v2 metrics transition doc --- .../prometheus-metrics-v1-to-v2.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md index e56bbe83c5..4453785fae 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md @@ -1,5 +1,5 @@ --- -Title: Prometheus v1 metrics and equivalent v2 PromQL +Title: Transition from Prometheus v1 to Prometheus v2 alwaysopen: false categories: - docs @@ -7,7 +7,7 @@ categories: - rs description: Transition from v1 metrics to v2 PromQL equivalents. group: observability -linkTitle: Prometheus v1 metrics & v2 equivalents +linkTitle: Transition from Prometheus v1 to v2 summary: Transition from v1 metrics to v2 PromQL equivalents. 
type: integration weight: 45 From 26b902490a3217aced2d6f32cb7e534c0d949a00 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 6 Nov 2024 15:31:17 -0600 Subject: [PATCH 23/25] DOC-3944 Add v2 replication metrics --- .../prometheus-metrics-definitions.md | 20 ++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 9eb193967d..4055bd0273 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -67,8 +67,26 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft | V2 metric | Description | | :-------- | :---------- | -| database_syncer_lag_ms | Lag time between the source and the destination for traffic (ms) | +| database_syncer_config | Used as a placeholder for configuration labels | | database_syncer_current_status | Syncer status for traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| database_syncer_dst_connectivity_state | Destination connectivity state | +| database_syncer_dst_connectivity_state_ms | Destination connectivity state duration | +| database_syncer_dst_lag | Lag in milliseconds between the syncer and the destination | +| database_syncer_dst_repl_offset | Offset of the last command acknowledged | +| database_syncer_flush_counter | Number of destination flushes | +| database_syncer_ingress_bytes | Number of bytes read from source shard | +| database_syncer_ingress_bytes_decompressed | Number of bytes read from source shard | +| database_syncer_internal_state | Internal state of the syncer | +| database_syncer_lag_ms | Lag time between the source and the destination for traffic in milliseconds | +| database_syncer_rdb_size | The source's RDB size in bytes to be transferred during the 
syncing phase | +| database_syncer_rdb_transferred | Number of bytes transferred from the source's RDB during the syncing phase | +| database_syncer_src_connectivity_state | Source connectivity state | +| database_syncer_src_connectivity_state_ms | Source connectivity state duration | +| database_syncer_src_repl_offset | Last known source offset | +| database_syncer_state | Internal state of the shard syncer | +| database_syncer_syncer_repl_offset | Offset of the last command handled by the syncer | +| database_syncer_total_requests | Number of destination writes | +| database_syncer_total_responses | Number of destination writes acknowledged | ## Shard metrics From 4449ea248a16abcd5b848eb77428930c1c483d38 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Wed, 6 Nov 2024 16:12:18 -0600 Subject: [PATCH 24/25] DOC-3944 Add v2 DB metrics --- .../prometheus-metrics-definitions.md | 32 +++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 4055bd0273..ec0a55112a 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -25,9 +25,41 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft | V2 metric | Description | | :-------- | :---------- | +| endpoint_client_connections | Number of client connection establishment events | +| endpoint_client_disconnections | Number of client disconnections initiated by the client | +| endpoint_client_connection_expired | Total number of client connections with expired TTL (Time To Live) | +| endpoint_client_establishment_failures | Number of client connections that failed to establish properly | +| endpoint_client_expiration_refresh | Number of expiration time changes of clients | +| 
endpoint_client_tracking_off_requests | Total number of `CLIENT TRACKING OFF` requests | +| endpoint_client_tracking_on_requests | Total number of `CLIENT TRACKING ON` requests | +| endpoint_disconnected_cba_client | Number of certificate-based clients disconnected | +| endpoint_disconnected_ldap_client | Number of LDAP clients disconnected | +| endpoint_disconnected_user_password_client | Number of username-and-password clients disconnected | +| endpoint_disposed_commands_after_client_caching | Total number of client caching commands that were disposed due to misuse | +| endpoint_egress | Number of egress bytes | +| endpoint_egress_pending | Number of send-pending bytes | +| endpoint_egress_pending_discarded | Number of send-pending bytes that were discarded due to disconnection | +| endpoint_failed_cba_authentication | Number of clients that failed certificate-based authentication | +| endpoint_failed_ldap_authentication | Number of clients that failed LDAP authentication | +| endpoint_failed_user_password_authentication | Number of clients that failed username and password authentication | +| endpoint_ingress | Number of ingress bytes | +| endpoint_longest_pipeline_histogram | Client connections with the longest pipeline lengths | +| endpoint_other_requests | Number of other requests | +| endpoint_other_requests_latency_histogram | Latency (in µs) histogram of other commands | | endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_other_responses | Number of other responses | +| endpoint_proxy_disconnections | Number of client disconnections initiated by the proxy | +| endpoint_read_requests | Number of read requests | +| endpoint_read_requests_latency_histogram | Latency (in µs) histogram of read commands | | endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_read_responses | Number of read responses | +| endpoint_successful_cba_authentication | Number of clients that successfully authenticated with certificate-based authentication | +| endpoint_successful_ldap_authentication | Number of clients that successfully authenticated with LDAP | +| endpoint_successful_user_password_authentication | Number of clients that successfully authenticated with a username and password | +| endpoint_write_requests | Number of write requests | +| endpoint_write_requests_latency_histogram | Latency (in µs) histogram of write commands | | endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_write_responses | Number of write responses | ## Node metrics From ac688fac5e868270567f67ee9e24dc2bff81870d Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Thu, 7 Nov 2024 09:52:32 -0600 Subject: [PATCH 25/25] DOC-3944 Change v2 metric column name --- .../prometheus-metrics-definitions.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index ec0a55112a..c264ece9ad 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -23,7 +23,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Database metrics -| V2 metric | Description | +| Metric | Description | | :-------- | :---------- | | endpoint_client_connections | Number of client connection establishment events | | endpoint_client_disconnections | Number of client disconnections initiated by the client | @@ -63,7 +63,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Node metrics -| V2 metric | Description | +| Metric | Description | | :-------- | :---------- | | node_available_flash_bytes | Available flash in the node (bytes) | | node_available_flash_no_overbooking_bytes | Available flash in the node (bytes), without taking into account overbooking | @@ -84,7 +84,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Cluster metrics -| V2 metric | Type | Description | +| Metric | Type | Description | | :-------- | :--- | :---------- | | generation{cluster_wd=} | gauge| Generation number of the specific cluster_wd| | 
has_qourum{cluster_wd=, has_witness_disk=BOOL} | gauge| Has_qourum = 1
No quorum = 0 | @@ -97,7 +97,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Replication metrics -| V2 metric | Description | +| Metric | Description | | :-------- | :---------- | | database_syncer_config | Used as a placeholder for configuration labels | | database_syncer_current_status | Syncer status for traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | @@ -122,7 +122,7 @@ The v2 metrics in the following tables are available as of Redis Enterprise Soft ## Shard metrics -| V2 metric | Description | +| Metric | Description | | :-------- | :---------- | | redis_server_active_defrag_running | Automatic memory defragmentation current aggressiveness (% cpu) | | redis_server_allocator_active | Total used memory, including external fragmentation |