diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md index 0c44afb102..c264ece9ad 100644 --- a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-definitions.md @@ -1,281 +1,176 @@ --- -Title: Metrics in Prometheus +Title: Prometheus metrics v2 preview alwaysopen: false categories: - docs - integrate - rs -description: The metrics available to Prometheus. +description: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.2. group: observability -linkTitle: Prometheus metrics -summary: You can use Prometheus and Grafana to collect and visualize your Redis Enterprise - Software metrics. +linkTitle: Prometheus metrics v2 +summary: V2 metrics available to Prometheus as of Redis Enterprise Software version 7.8.2. type: integration weight: 45 --- -The [integration with Prometheus]({{< relref "/integrate/prometheus-with-redis-enterprise/" >}}) -lets you create dashboards that highlight the metrics that are important to you. -Here are the metrics available to Prometheus: +{{}} +While the metrics stream engine is in preview, this document provides only a partial list of v2 metrics. More metrics will be added. +{{}} + +You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. + +The v2 metrics in the following tables are available as of Redis Enterprise Software version 7.8.0. For help transitioning from v1 metrics to v2 PromQL, see [Prometheus v1 metrics and equivalent v2 PromQL]({{}}). 
## Database metrics | Metric | Description | -| ------ | :------ | -| bdb_avg_latency | Average latency of operations on the DB (seconds); returned only when there is traffic | -| bdb_avg_latency_max | Highest value of average latency of operations on the DB (seconds); returned only when there is traffic | -| bdb_avg_read_latency | Average latency of read operations (seconds); returned only when there is traffic | -| bdb_avg_read_latency_max | Highest value of average latency of read operations (seconds); returned only when there is traffic | -| bdb_avg_write_latency | Average latency of write operations (seconds); returned only when there is traffic | -| bdb_avg_write_latency_max | Highest value of average latency of write operations (seconds); returned only when there is traffic | -| bdb_bigstore_shard_count | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | -| bdb_conns | Number of client connections to DB | -| bdb_egress_bytes | Rate of outgoing network traffic from the DB (bytes/sec) | -| bdb_egress_bytes_max | Highest value of rate of outgoing network traffic from the DB (bytes/sec) | -| bdb_evicted_objects | Rate of key evictions from DB (evictions/sec) | -| bdb_evicted_objects_max | Highest value of rate of key evictions from DB (evictions/sec) | -| bdb_expired_objects | Rate keys expired in DB (expirations/sec) | -| bdb_expired_objects_max | Highest value of rate keys expired in DB (expirations/sec) | -| bdb_fork_cpu_system | % cores utilization in system mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_system_max | Highest value of % cores utilization in system mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_user | % cores utilization in user mode for all redis shard fork child processes of this database | -| bdb_fork_cpu_user_max | Highest value of % cores utilization in user mode for all redis shard fork child 
processes of this database | -| bdb_ingress_bytes | Rate of incoming network traffic to DB (bytes/sec) | -| bdb_ingress_bytes_max | Highest value of rate of incoming network traffic to DB (bytes/sec) | -| bdb_instantaneous_ops_per_sec | Request rate handled by all shards of DB (ops/sec) | -| bdb_main_thread_cpu_system | % cores utilization in system mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_system_max | Highest value of % cores utilization in system mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_user | % cores utilization in user mode for all redis shard main threads of this database | -| bdb_main_thread_cpu_user_max | Highest value of % cores utilization in user mode for all redis shard main threads of this database | -| bdb_mem_frag_ratio | RAM fragmentation ratio (RSS / allocated RAM) | -| bdb_mem_size_lua | Redis lua scripting heap size (bytes) | -| bdb_memory_limit | Configured RAM limit for the database | -| bdb_monitor_sessions_count | Number of client connected in monitor mode to the DB | -| bdb_no_of_keys | Number of keys in DB | -| bdb_other_req | Rate of other (non read/write) requests on DB (ops/sec) | -| bdb_other_req_max | Highest value of rate of other (non read/write) requests on DB (ops/sec) | -| bdb_other_res | Rate of other (non read/write) responses on DB (ops/sec) | -| bdb_other_res_max | Highest value of rate of other (non read/write) responses on DB (ops/sec) | -| bdb_pubsub_channels | Count the pub/sub channels with subscribed clients | -| bdb_pubsub_channels_max | Highest value of count the pub/sub channels with subscribed clients | -| bdb_pubsub_patterns | Count the pub/sub patterns with subscribed clients | -| bdb_pubsub_patterns_max | Highest value of count the pub/sub patterns with subscribed clients | -| bdb_read_hits | Rate of read operations accessing an existing key (ops/sec) | -| bdb_read_hits_max | Highest value of rate of read operations accessing an existing 
key (ops/sec) | -| bdb_read_misses | Rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_misses_max | Highest value of rate of read operations accessing a non-existing key (ops/sec) | -| bdb_read_req | Rate of read requests on DB (ops/sec) | -| bdb_read_req_max | Highest value of rate of read requests on DB (ops/sec) | -| bdb_read_res | Rate of read responses on DB (ops/sec) | -| bdb_read_res_max | Highest value of rate of read responses on DB (ops/sec) | -| bdb_shard_cpu_system | % cores utilization in system mode for all redis shard processes of this database | -| bdb_shard_cpu_system_max | Highest value of % cores utilization in system mode for all redis shard processes of this database | -| bdb_shard_cpu_user | % cores utilization in user mode for the redis shard process | -| bdb_shard_cpu_user_max | Highest value of % cores utilization in user mode for the redis shard process | -| bdb_shards_used | Used shard count by database and by shard type (ram / flash) | -| bdb_total_connections_received | Rate of new client connections to DB (connections/sec) | -| bdb_total_connections_received_max | Highest value of rate of new client connections to DB (connections/sec) | -| bdb_total_req | Rate of all requests on DB (ops/sec) | -| bdb_total_req_max | Highest value of rate of all requests on DB (ops/sec) | -| bdb_total_res | Rate of all responses on DB (ops/sec) | -| bdb_total_res_max | Highest value of rate of all responses on DB (ops/sec) | -| bdb_up | Database is up and running | -| bdb_used_memory | Memory used by db (in bigredis this includes flash) (bytes) | -| bdb_write_hits | Rate of write operations accessing an existing key (ops/sec) | -| bdb_write_hits_max | Highest value of rate of write operations accessing an existing key (ops/sec) | -| bdb_write_misses | Rate of write operations accessing a non-existing key (ops/sec) | -| bdb_write_misses_max | Highest value of rate of write operations accessing a non-existing key (ops/sec) | -| 
bdb_write_req | Rate of write requests on DB (ops/sec) | -| bdb_write_req_max | Highest value of rate of write requests on DB (ops/sec) | -| bdb_write_res | Rate of write responses on DB (ops/sec) | -| bdb_write_res_max | Highest value of rate of write responses on DB (ops/sec) | -| no_of_expires | Current number of volatile keys in the database | +| :-------- | :---------- | +| endpoint_client_connections | Number of client connection establishment events | +| endpoint_client_disconnections | Number of client disconnections initiated by the client | +| endpoint_client_connection_expired | Total number of client connections with expired TTL (Time To Live) | +| endpoint_client_establishment_failures | Number of client connections that failed to establish properly | +| endpoint_client_expiration_refresh | Number of expiration time changes of clients | +| endpoint_client_tracking_off_requests | Total number of `CLIENT TRACKING OFF` requests | +| endpoint_client_tracking_on_requests | Total number of `CLIENT TRACKING ON` requests | +| endpoint_disconnected_cba_client | Number of certificate-based clients disconnected | +| endpoint_disconnected_ldap_client | Number of LDAP clients disconnected | +| endpoint_disconnected_user_password_client | Number of user&password clients disconnected | +| endpoint_disposed_commands_after_client_caching | Total number of client caching commands that were disposed due to misuse | +| endpoint_egress | Number of egress bytes | +| endpoint_egress_pending | Number of send-pending bytes | +| endpoint_egress_pending_discarded | Number of send-pending bytes that were discarded due to disconnection | +| endpoint_failed_cba_authentication | Number of clients that failed certificate-based authentication | +| endpoint_failed_ldap_authentication | Number of clients that failed LDAP authentication | +| endpoint_failed_user_password_authentication | Number of clients that failed user password authentication | +| endpoint_ingress | Number of ingress 
bytes | +| endpoint_longest_pipeline_histogram | Client connections with the longest pipeline lengths | +| endpoint_other_requests | Number of other requests | +| endpoint_other_requests_latency_histogram | Latency (in µs) histogram of other commands | +| endpoint_other_requests_latency_histogram_bucket | Latency histograms for commands other than read or write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_other_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_other_responses | Number of other responses | +| endpoint_proxy_disconnections | Number of client disconnections initiated by the proxy | +| endpoint_read_requests | Number of read requests | +| endpoint_read_requests_latency_histogram | Latency (in µs) histogram of read commands | +| endpoint_read_requests_latency_histogram_bucket | Latency histograms for read commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_read_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_read_responses | Number of read responses | +| endpoint_successful_cba_authentication | Number of clients that successfully authenticated with certificate-based authentication | +| endpoint_successful_ldap_authentication | Number of clients that successfully authenticated with LDAP | +| endpoint_successful_user_password_authentication | Number of clients that successfully authenticated with user&password | +| endpoint_write_requests | Number of write requests | +| endpoint_write_requests_latency_histogram | Latency (in µs) histogram of write commands | +| endpoint_write_requests_latency_histogram_bucket | Latency histograms for write commands. Can be used to represent different latency percentiles.
p99.9 example:
`histogram_quantile(0.999, sum(rate(endpoint_write_requests_latency_histogram_bucket{cluster="$cluster", db="$db"}[$__rate_interval]) ) by (le, db))` | +| endpoint_write_responses | Number of write responses | ## Node metrics | Metric | Description | -| ------ | :------ | -| node_available_flash | Available flash in node (bytes) | -| node_available_flash_no_overbooking | Available flash in node (bytes), without taking into account overbooking | -| node_available_memory | Amount of free memory in node (bytes) that is available for database provisioning | -| node_available_memory_no_overbooking | Available ram in node (bytes) without taking into account overbooking | -| node_avg_latency | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | -| node_bigstore_free | Sum of free space of back-end flash (used by flash DB's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | -| node_bigstore_iops | Rate of i/o operations against back-end flash for all shards which are part of a flash based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_kv_ops | Rate of value read/write operations against back-end flash for all shards which are part of a flash based DB (BigRedis) in cluster (ops/sec); returned only when BigRedis is enabled | -| node_bigstore_throughput | Throughput i/o operations against back-end flash for all shards which are part of a flash based DB (BigRedis) in cluster (bytes/sec); returned only when BigRedis is enabled | -| node_cert_expiration_seconds | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates expiration]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | -| node_conns | Number of clients connected to endpoints on node | -| node_cpu_idle | CPU idle time portion 
(0-1, multiply by 100 to get percent) | -| node_cpu_idle_max | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_median | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_idle_min | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | -| node_cpu_system | CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_max | Highest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_median | Average value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_system_min | Lowest value of CPU time portion spent in kernel (0-1, multiply by 100 to get percent) | -| node_cpu_user | CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_max | Highest value of CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_median | Average value of CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cpu_user_min | Lowest value of CPU time portion spent by users-pace processes (0-1, multiply by 100 to get percent) | -| node_cur_aof_rewrites | Number of aof rewrites that are currently performed by shards on this node | -| node_egress_bytes | Rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_max | Highest value of rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_median | Average value of rate of outgoing network traffic to node (bytes/sec) | -| node_egress_bytes_min | Lowest value of rate of outgoing network traffic to node (bytes/sec) | -| node_ephemeral_storage_avail | Disk space available to RLEC processes on configured ephemeral disk (bytes) | -| node_ephemeral_storage_free | Free disk space on configured ephemeral disk (bytes) | -| node_free_memory | Free memory in 
node (bytes) | -| node_ingress_bytes | Rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_max | Highest value of rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_median | Average value of rate of incoming network traffic to node (bytes/sec) | -| node_ingress_bytes_min | Lowest value of rate of incoming network traffic to node (bytes/sec) | -| node_persistent_storage_avail | Disk space available to RLEC processes on configured persistent disk (bytes) | -| node_persistent_storage_free | Free disk space on configured persistent disk (bytes) | -| node_provisional_flash | Amount of flash available for new shards on this node, taking into account overbooking, max redis servers, reserved flash and provision and migration thresholds (bytes) | -| node_provisional_flash_no_overbooking | Amount of flash available for new shards on this node, without taking into account overbooking, max redis servers, reserved flash and provision and migration thresholds (bytes) | -| node_provisional_memory | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | -| node_provisional_memory_no_overbooking | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | -| node_total_req | Request rate handled by endpoints on node (ops/sec) | -| node_up | Node is part of the cluster and is connected | +| :-------- | :---------- | +| node_available_flash_bytes | Available flash in the node (bytes) | +| node_available_flash_no_overbooking_bytes | Available flash in the node (bytes), without taking into account overbooking | +| node_available_memory_bytes | Amount of free memory in the node (bytes) that is available for database provisioning | +| node_available_memory_no_overbooking_bytes | Available RAM in the node (bytes) without taking into account overbooking | +| node_bigstore_free_bytes | Sum of free 
space of back-end flash (used by flash database's [BigRedis]) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| node_cert_expires_in_seconds | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| node_ephemeral_storage_avail_bytes | Disk space available to RLEC processes on configured ephemeral disk (bytes) | +| node_ephemeral_storage_free_bytes | Free disk space on configured ephemeral disk (bytes) | +| node_memory_MemFree_bytes | Free memory in the node (bytes) | +| node_persistent_storage_avail_bytes | Disk space available to RLEC processes on configured persistent disk (bytes) | +| node_persistent_storage_free_bytes | Free disk space on configured persistent disk (bytes) | +| node_provisional_flash_bytes | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_flash_no_overbooking_bytes | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_memory_bytes | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | +| node_provisional_memory_no_overbooking_bytes | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | +| node_metrics_up | Node is part of the cluster and is connected | ## Cluster metrics -| Metric | Description | -| ------ | :------ | -| cluster_shards_limit | Total shard limit by the license by shard type (ram / flash) | - - -## Proxy metrics - -| Metric | Description | -| ------ | :------ | -| 
listener_acc_latency | Accumulative latency (sum of the latencies) of all types of commands on DB. For the average latency, divide this value by listener_total_res | -| listener_acc_latency_max | Highest value of accumulative latency of all types of commands on DB | -| listener_acc_other_latency | Accumulative latency (sum of the latencies) of commands that are type "other" on DB. For the average latency, divide this value by listener_other_res | -| listener_acc_other_latency_max | Highest value of accumulative latency of commands that are type "other" on DB | -| listener_acc_read_latency | Accumulative latency (sum of the latencies) of commands that are type "read" on DB. For the average latency, divide this value by listener_read_res | -| listener_acc_read_latency_max | Highest value of accumulative latency of commands that are type "read" on DB | -| listener_acc_write_latency | Accumulative latency (sum of the latencies) of commands that are type "write" on DB. For the average latency, divide this value by listener_write_res | -| listener_acc_write_latency_max | Highest value of accumulative latency of commands that are type "write" on DB | -| listener_auth_cmds | Number of memcached AUTH commands sent to the DB | -| listener_auth_cmds_max | Highest value of number of memcached AUTH commands sent to the DB | -| listener_auth_errors | Number of error responses to memcached AUTH commands | -| listener_auth_errors_max | Highest value of number of error responses to memcached AUTH commands | -| listener_cmd_flush | Number of memcached FLUSH_ALL commands sent to the DB | -| listener_cmd_flush_max | Highest value of number of memcached FLUSH_ALL commands sent to the DB | -| listener_cmd_get | Number of memcached GET commands sent to the DB | -| listener_cmd_get_max | Highest value of number of memcached GET commands sent to the DB | -| listener_cmd_set | Number of memcached SET commands sent to the DB | -| listener_cmd_set_max | Highest value of number of memcached 
SET commands sent to the DB | -| listener_cmd_touch | Number of memcached TOUCH commands sent to the DB | -| listener_cmd_touch_max | Highest value of number of memcached TOUCH commands sent to the DB | -| listener_conns | Number of clients connected to the endpoint | -| listener_egress_bytes | Rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_egress_bytes_max | Highest value of rate of outgoing network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes | Rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_ingress_bytes_max | Highest value of rate of incoming network traffic to the endpoint (bytes/sec) | -| listener_last_req_time | Time of last command sent to the DB | -| listener_last_res_time | Time of last response sent from the DB | -| listener_max_connections_exceeded | Number of times the Number of clients connected to the db at the same time has exeeded the max limit | -| listener_max_connections_exceeded_max | Highest value of number of times the Number of clients connected to the db at the same time has exeeded the max limit | -| listener_monitor_sessions_count | Number of client connected in monitor mode to the endpoint | -| listener_other_req | Rate of other (non read/write) requests on the endpoint (ops/sec) | -| listener_other_req_max | Highest value of rate of other (non read/write) requests on the endpoint (ops/sec) | -| listener_other_res | Rate of other (non read/write) responses on the endpoint (ops/sec) | -| listener_other_res_max | Highest value of rate of other (non read/write) responses on the endpoint (ops/sec) | -| listener_other_started_res | Number of responses sent from the DB of type "other" | -| listener_other_started_res_max | Highest value of number of responses sent from the DB of type "other" | -| listener_read_req | Rate of read requests on the endpoint (ops/sec) | -| listener_read_req_max | Highest value of rate of read requests on the endpoint (ops/sec) | -| 
listener_read_res | Rate of read responses on the endpoint (ops/sec) | -| listener_read_res_max | Highest value of rate of read responses on the endpoint (ops/sec) | -| listener_read_started_res | Number of responses sent from the DB of type "read" | -| listener_read_started_res_max | Highest value of number of responses sent from the DB of type "read" | -| listener_total_connections_received | Rate of new client connections to the endpoint (connections/sec) | -| listener_total_connections_received_max | Highest value of rate of new client connections to the endpoint (connections/sec) | -| listener_total_req | Request rate handled by the endpoint (ops/sec) | -| listener_total_req_max | Highest value of rate of all requests on the endpoint (ops/sec) | -| listener_total_res | Rate of all responses on the endpoint (ops/sec) | -| listener_total_res_max | Highest value of rate of all responses on the endpoint (ops/sec) | -| listener_total_started_res | Number of responses sent from the DB of all types | -| listener_total_started_res_max | Highest value of number of responses sent from the DB of all types | -| listener_write_req | Rate of write requests on the endpoint (ops/sec) | -| listener_write_req_max | Highest value of rate of write requests on the endpoint (ops/sec) | -| listener_write_res | Rate of write responses on the endpoint (ops/sec) | -| listener_write_res_max | Highest value of rate of write responses on the endpoint (ops/sec) | -| listener_write_started_res | Number of responses sent from the DB of type "write" | -| listener_write_started_res_max | Highest value of number of responses sent from the DB of type "write" | +| Metric | Type | Description | +| :-------- | :--- | :---------- | +| generation{cluster_wd=} | gauge| Generation number of the specific cluster_wd| +| has_qourum{cluster_wd=, has_witness_disk=BOOL} | gauge| Has_qourum = 1
No quorum = 0 | +| is_primary{cluster_wd=} | gauge| primary = 1
secondary = 0 | +| license_shards_limit | | Total shard limit set by the license, by shard type (ram / flash) | +| total_live_nodes_count{cluster_wd=} | gauge | Number of live nodes | +| total_node_count{cluster_wd=} | gauge | Number of nodes | +| total_primary_selection_ended{cluster_wd=} | counter | Monotonic counter for each selection process that ended | +| total_primary_selections{cluster_wd=} | counter | Monotonic counter for each selection process that started | ## Replication metrics | Metric | Description | -| ------ | :------ | -| bdb_replicaof_syncer_ingress_bytes | Rate of compressed incoming network traffic to Replica Of DB (bytes/sec) | -| bdb_replicaof_syncer_ingress_bytes_decompressed | Rate of decompressed incoming network traffic to Replica Of DB (bytes/sec) | -| bdb_replicaof_syncer_local_ingress_lag_time | Lag time between the source and the destination for Replica Of traffic (ms) | -| bdb_replicaof_syncer_status | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | -| bdb_crdt_syncer_ingress_bytes | Rate of compressed incoming network traffic to CRDB (bytes/sec) | -| bdb_crdt_syncer_ingress_bytes_decompressed | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | -| bdb_crdt_syncer_local_ingress_lag_time | Lag time between the source and the destination (ms) for CRDB traffic | -| bdb_crdt_syncer_status | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| :-------- | :---------- | +| database_syncer_config | Used as a placeholder for configuration labels | +| database_syncer_current_status | Syncer status for traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| database_syncer_dst_connectivity_state | Destination connectivity state | +| database_syncer_dst_connectivity_state_ms | Destination connectivity state duration | +| database_syncer_dst_lag | Lag in milliseconds between the syncer and the destination | +| database_syncer_dst_repl_offset | Offset of the last command
acknowledged | +| database_syncer_flush_counter | Number of destination flushes | +| database_syncer_ingress_bytes | Number of bytes read from source shard | +| database_syncer_ingress_bytes_decompressed | Number of decompressed bytes read from source shard | +| database_syncer_internal_state | Internal state of the syncer | +| database_syncer_lag_ms | Lag time between the source and the destination, in milliseconds | +| database_syncer_rdb_size | The source's RDB size in bytes to be transferred during the syncing phase | +| database_syncer_rdb_transferred | Number of bytes transferred from the source's RDB during the syncing phase | +| database_syncer_src_connectivity_state | Source connectivity state | +| database_syncer_src_connectivity_state_ms | Source connectivity state duration | +| database_syncer_src_repl_offset | Last known source offset | +| database_syncer_state | Internal state of the shard syncer | +| database_syncer_syncer_repl_offset | Offset of the last command handled by the syncer | +| database_syncer_total_requests | Number of destination writes | +| database_syncer_total_responses | Number of destination writes acknowledged | ## Shard metrics | Metric | Description | -| ------ | :------ | -| redis_active_defrag_running | Automatic memory defragmentation current aggressiveness (% cpu) | -| redis_allocator_active | Total used memory including external fragmentation | -| redis_allocator_allocated | Total allocated memory | -| redis_allocator_resident | Total resident memory (RSS) | -| redis_aof_last_cow_size | Last AOFR, CopyOnWrite memory | -| redis_aof_rewrite_in_progress | The number of simultaneous AOF rewrites that are in progress | -| redis_aof_rewrites | Number of AOF rewrites this process executed | -| redis_aof_delayed_fsync | Number of times an AOF fsync caused delays in the redis main thread (inducing latency); This can indicate that the disk is slow or overloaded | -| redis_blocked_clients | Count the clients waiting on a blocking call | 
-| redis_connected_clients | Number of client connections to the specific shard | -| redis_connected_slaves | Number of connected slaves | -| redis_db0_avg_ttl | Average TTL of all volatile keys | -| redis_db0_expires | Total count of volatile keys | -| redis_db0_keys | Total key count | -| redis_evicted_keys | Keys evicted so far (since restart) | -| redis_expire_cycle_cpu_milliseconds | The cumulative amount of time spent on active expiry cycles | -| redis_expired_keys | Keys expired so far (since restart) | -| redis_forwarding_state | Shard forwarding state (on or off) | -| redis_keys_trimmed | The number of keys that were trimmed in the current or last resharding process | -| redis_keyspace_read_hits | Number of read operations accessing an existing keyspace | -| redis_keyspace_read_misses | Number of read operations accessing an non-existing keyspace | -| redis_keyspace_write_hits | Number of write operations accessing an existing keyspace | -| redis_keyspace_write_misses | Number of write operations accessing an non-existing keyspace | -| redis_master_link_status | Indicates if the replica is connected to its master | -| redis_master_repl_offset | Number of bytes sent to replicas by the shard; Calculate the throughput for a time period by comparing the value at different times | -| redis_master_sync_in_progress | The master shard is synchronizing (1 true | 0 false) | -| redis_max_process_mem | Current memory limit configured by redis_mgr according to node free memory | -| redis_maxmemory | Current memory limit configured by redis_mgr according to db memory limits | -| redis_mem_aof_buffer | Current size of AOF buffer | -| redis_mem_clients_normal | Current memory used for input and output buffers of non-replica clients | -| redis_mem_clients_slaves | Current memory used for input and output buffers of replica clients | -| redis_mem_fragmentation_ratio | Memory fragmentation ratio (1.3 means 30% overhead) | -| redis_mem_not_counted_for_evict | Portion of 
used_memory (in bytes) that's not counted for eviction and OOM error | -| redis_mem_replication_backlog | Size of replication backlog | -| redis_module_fork_in_progress | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | -| redis_process_cpu_system_seconds_total | Shard Process system CPU time spent in seconds | -| redis_process_cpu_usage_percent | Shard Process cpu usage precentage | -| redis_process_cpu_user_seconds_total | Shard user CPU time spent in seconds | -| redis_process_main_thread_cpu_system_seconds_total | Shard main thread system CPU time spent in seconds | -| redis_process_main_thread_cpu_user_seconds_total | Shard main thread user CPU time spent in seconds | -| redis_process_max_fds | Shard Maximum number of open file descriptors | -| redis_process_open_fds | Shard Number of open file descriptors | -| redis_process_resident_memory_bytes | Shard Resident memory size in bytes | -| redis_process_start_time_seconds | Shard Start time of the process since unix epoch in seconds | -| redis_process_virtual_memory_bytes | Shard virtual memory in bytes | -| redis_rdb_bgsave_in_progress | Indication if bgsave is currently in progress | -| redis_rdb_last_cow_size | Last bgsave (or SYNC fork) used CopyOnWrite memory | -| redis_rdb_saves | Total count of bgsaves since process was restarted (including replica fullsync and persistence) | -| redis_repl_touch_bytes | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; Calculate the throughput for a time period by comparing the value at different times | -| redis_total_commands_processed | Number of commands processed by the shard; Calculate the number of commands for a time period by comparing the value at different times | -| redis_total_connections_received | Number of connections received by the shard; Calculate the number of connections for a time period by comparing the value at different times | -| 
redis_total_net_input_bytes | Number of bytes received by the shard; Calculate the throughput for a time period by comparing the value at different times | -| redis_total_net_output_bytes | Number of bytes sent by the shard; Calculate the throughput for a time period by comparing the value at different times | -| redis_up | Shard is up and running | -| redis_used_memory | Memory used by shard (in bigredis this includes flash) (bytes) | +| :-------- | :---------- | +| redis_server_active_defrag_running | Automatic memory defragmentation current aggressiveness (% cpu) | +| redis_server_allocator_active | Total used memory, including external fragmentation | +| redis_server_allocator_allocated | Total allocated memory | +| redis_server_allocator_resident | Total resident memory (RSS) | +| redis_server_aof_last_cow_size | Last AOFR, CopyOnWrite memory | +| redis_server_aof_rewrite_in_progress | The number of simultaneous AOF rewrites that are in progress | +| redis_server_aof_rewrites | Number of AOF rewrites this process executed | +| redis_server_aof_delayed_fsync | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can indicate that the disk is slow or overloaded | +| redis_server_blocked_clients | Count the clients waiting on a blocking call | +| redis_server_connected_clients | Number of client connections to the specific shard | +| redis_server_connected_slaves | Number of connected replicas | +| redis_server_db0_avg_ttl | Average TTL of all volatile keys | +| redis_server_expired_keys | Total count of volatile keys | +| redis_server_db0_keys | Total key count | +| redis_server_evicted_keys | Keys evicted so far (since restart) | +| redis_server_expire_cycle_cpu_milliseconds | The cumulative amount of time spent on active expiry cycles | +| redis_server_expired_keys | Keys expired so far (since restart) | +| redis_server_forwarding_state | Shard forwarding state (on or off) | +| redis_server_keys_trimmed | The number of 
keys that were trimmed in the current or last resharding process | +| redis_server_keyspace_read_hits | Number of read operations accessing an existing keyspace | +| redis_server_keyspace_read_misses | Number of read operations accessing a non-existing keyspace | +| redis_server_keyspace_write_hits | Number of write operations accessing an existing keyspace | +| redis_server_keyspace_write_misses | Number of write operations accessing a non-existing keyspace | +| redis_server_master_link_status | Indicates if the replica is connected to its master | +| redis_server_master_repl_offset | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_server_master_sync_in_progress | The master shard is synchronizing (1 = true, 0 = false) | +| redis_server_max_process_mem | Current memory limit configured by redis_mgr according to node free memory | +| redis_server_maxmemory | Current memory limit configured by redis_mgr according to database memory limits | +| redis_server_mem_aof_buffer | Current size of AOF buffer | +| redis_server_mem_clients_normal | Current memory used for input and output buffers of non-replica clients | +| redis_server_mem_clients_slaves | Current memory used for input and output buffers of replica clients | +| redis_server_mem_fragmentation_ratio | Memory fragmentation ratio (1.3 means 30% overhead) | +| redis_server_mem_not_counted_for_evict | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | +| redis_server_mem_replication_backlog | Size of replication backlog | +| redis_server_module_fork_in_progress | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | +| namedprocess_namegroup_cpu_seconds_total | Shard process CPU usage percentage | +| namedprocess_namegroup_thread_cpu_seconds_total | Shard main thread CPU time spent in seconds | +| namedprocess_namegroup_open_filedesc | Shard number of
open file descriptors | +| namedprocess_namegroup_memory_bytes | Shard memory size in bytes | +| namedprocess_namegroup_oldest_start_time_seconds | Shard start time of the process since unix epoch in seconds | +| redis_server_rdb_bgsave_in_progress | Indication if bgsave is currently in progress | +| redis_server_rdb_last_cow_size | Last bgsave (or SYNC fork) used CopyOnWrite memory | +| redis_server_rdb_saves | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | +| redis_server_repl_touch_bytes | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | +| redis_server_total_commands_processed | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | +| redis_server_total_connections_received | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | +| redis_server_total_net_input_bytes | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_server_total_net_output_bytes | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_server_up | Shard is up and running | +| redis_server_used_memory | Memory used by shard (in BigRedis this includes flash) (bytes) | diff --git a/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md new file mode 100644 index 0000000000..4453785fae --- /dev/null +++ b/content/integrate/prometheus-with-redis-enterprise/prometheus-metrics-v1-to-v2.md @@ -0,0 +1,279 @@ +--- +Title: Transition from Prometheus v1 to Prometheus 
v2 +alwaysopen: false +categories: +- docs +- integrate +- rs +description: Transition from v1 metrics to v2 PromQL equivalents. +group: observability +linkTitle: Transition from Prometheus v1 to v2 +summary: Transition from v1 metrics to v2 PromQL equivalents. +type: integration +weight: 45 +--- + +You can [integrate Redis Enterprise Software with Prometheus and Grafana]({{}}) to create dashboards for important metrics. + +As of Redis Enterprise Software version 7.8.2, [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/) metrics are available, and v1 metrics are deprecated. You can use the following tables to transition from v1 metrics to equivalent v2 PromQL. For a list of all available v2 PromQL metrics, see [Prometheus metrics v2]({{}}). + +## Database metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| bdb_avg_latency | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of operations on the database (seconds); returned only when there is traffic | +| bdb_avg_latency_max | `sum by (bdb) (irate(endpoint_acc_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of operations on the database (seconds); returned only when there is traffic | +| bdb_avg_read_latency | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of read operations (seconds); returned only when there is traffic | +| bdb_avg_read_latency_max | `sum by (bdb) (irate(endpoint_acc_read_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of read operations (seconds); returned only when there is traffic | +| bdb_avg_write_latency | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) 
(irate(endpoint_total_started_res[1m])) / 1000000` | Average latency of write operations (seconds); returned only when there is traffic | +| bdb_avg_write_latency_max | `sum by (bdb) (irate(endpoint_acc_write_latency[1m])) / sum by (bdb) (irate(endpoint_total_started_res[1m])) / 1000000` | Highest value of average latency of write operations (seconds); returned only when there is traffic | +| bdb_bigstore_shard_count | `sum((sum(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+", threadname=~"(speedb\|rocksdb).*"}, "redis", "$1", "groupname", "redis-(\d+)"), "driver", "$1", "threadname", "(speedb\|rocksdb).*")) by (redis, driver) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, driver)` | Shard count by database and by storage engine (driver - rocksdb / speedb); Only for databases with Auto Tiering enabled | +| bdb_conns | `sum by(bdb) (endpoint_conns)` | Number of client connections to database | +| bdb_egress_bytes | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Rate of outgoing network traffic from the database (bytes/sec) | +| bdb_egress_bytes_max | `sum by(bdb) (irate(endpoint_egress_bytes[1m]))` | Highest value of the rate of outgoing network traffic from the database (bytes/sec) | +| bdb_evicted_objects | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Rate of key evictions from database (evictions/sec) | +| bdb_evicted_objects_max | `sum by (bdb) (irate(redis_server_evicted_keys{role="master"}[1m]))` | Highest value of the rate of key evictions from database (evictions/sec) | +| bdb_expired_objects | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Rate keys expired in database (expirations/sec) | +| bdb_expired_objects_max | `sum by (bdb) (irate(redis_server_expired_keys{role="master"}[1m]))` | Highest value of the rate keys expired in database (expirations/sec) | +| bdb_fork_cpu_system | `sum by (bdb) 
(irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | % cores utilization in system mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_system_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_user | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | % cores utilization in user mode for all Redis shard fork child processes of this database | +| bdb_fork_cpu_user_max | `sum by (bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard fork child processes of this database | +| bdb_ingress_bytes | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Rate of incoming network traffic to database (bytes/sec) | +| bdb_ingress_bytes_max | `sum by(bdb) (irate(endpoint_ingress_bytes[1m]))` | Highest value of the rate of incoming network traffic to database (bytes/sec) | +| bdb_instantaneous_ops_per_sec | `sum by(bdb) (redis_server_instantaneous_ops_per_sec)` | Request rate handled by all shards of database (ops/sec) | +| bdb_main_thread_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | % cores utilization in system mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard main threads of this database | +| bdb_main_thread_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | % cores utilization in user mode for all Redis shard main threads
of this database | +| bdb_main_thread_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", threadname=~"redis-server.*"}[1m]))` | Highest value of % cores utilization in user mode for all Redis shard main threads of this database | +| bdb_mem_frag_ratio | `avg(redis_server_mem_fragmentation_ratio)` | RAM fragmentation ratio (RSS / allocated RAM) | +| bdb_mem_size_lua | `sum by(bdb) (redis_server_used_memory_lua)` | Redis Lua scripting heap size (bytes) | +| bdb_memory_limit | `sum by(bdb) (redis_server_maxmemory)` | Configured RAM limit for the database | +| bdb_monitor_sessions_count | `sum by(bdb) (endpoint_monitor_sessions_count)` | Number of clients connected in monitor mode to the database | +| bdb_no_of_keys | `sum by (bdb) (redis_server_db_keys{role="master"})` | Number of keys in database | +| bdb_other_req | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Rate of other (non read/write) requests on the database (ops/sec) | +| bdb_other_req_max | `sum by(bdb) (irate(endpoint_other_req[1m]))` | Highest value of the rate of other (non read/write) requests on the database (ops/sec) | +| bdb_other_res | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Rate of other (non read/write) responses on the database (ops/sec) | +| bdb_other_res_max | `sum by(bdb) (irate(endpoint_other_res[1m]))` | Highest value of the rate of other (non read/write) responses on the database (ops/sec) | +| bdb_pubsub_channels | `sum by(bdb) (redis_server_pubsub_channels)` | Count the pub/sub channels with subscribed clients | +| bdb_pubsub_channels_max | `sum by(bdb) (redis_server_pubsub_channels)` | Highest value of count the pub/sub channels with subscribed clients | +| bdb_pubsub_patterns | `sum by(bdb) (redis_server_pubsub_patterns)` | Count the pub/sub patterns with subscribed clients | +| bdb_pubsub_patterns_max | `sum by(bdb) (redis_server_pubsub_patterns)` | Highest value of count the pub/sub patterns with subscribed clients | +| bdb_read_hits |
`sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Rate of read operations accessing an existing key (ops/sec) | +| bdb_read_hits_max | `sum by (bdb) (irate(redis_server_keyspace_read_hits{role="master"}[1m]))` | Highest value of the rate of read operations accessing an existing key (ops/sec) | +| bdb_read_misses | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Rate of read operations accessing a non-existing key (ops/sec) | +| bdb_read_misses_max | `sum by (bdb) (irate(redis_server_keyspace_read_misses{role="master"}[1m]))` | Highest value of the rate of read operations accessing a non-existing key (ops/sec) | +| bdb_read_req | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Rate of read requests on the database (ops/sec) | +| bdb_read_req_max | `sum by (bdb) (irate(endpoint_read_req[1m]))` | Highest value of the rate of read requests on the database (ops/sec) | +| bdb_read_res | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Rate of read responses on the database (ops/sec) | +| bdb_read_res_max | `sum by(bdb) (irate(endpoint_read_res[1m]))` | Highest value of the rate of read responses on the database (ops/sec) | +| bdb_shard_cpu_system | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | % cores utilization in system mode for all Redis shard processes of this database | +| bdb_shard_cpu_system_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="system", role="master"}[1m]))` | Highest value of % cores utilization in system mode for all Redis shard processes of this database | +| bdb_shard_cpu_user | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | % cores utilization in user mode for the Redis shard process | +| bdb_shard_cpu_user_max | `sum by(bdb) (irate(namedprocess_namegroup_thread_cpu_seconds_total{mode="user", role="master"}[1m]))` | Highest value of % cores 
utilization in user mode for the Redis shard process | +| bdb_shards_used | `sum((sum(label_replace(label_replace(label_replace(namedprocess_namegroup_thread_count{groupname=~"redis-\d+"}, "redis", "$1", "groupname", "redis-(\d+)"), "shard_type", "flash", "threadname", "(bigstore).*"), "shard_type", "ram", "shard_type", "")) by (redis, shard_type) > bool 0) * on (redis) group_left(bdb) redis_server_up) by (bdb, shard_type)` | Used shard count by database and by shard type (ram / flash) | +| bdb_total_connections_received | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Rate of new client connections to database (connections/sec) | +| bdb_total_connections_received_max | `sum by(bdb) (irate(endpoint_total_connections_received[1m]))` | Highest value of the rate of new client connections to database (connections/sec) | +| bdb_total_req | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Rate of all requests on the database (ops/sec) | +| bdb_total_req_max | `sum by (bdb) (irate(endpoint_total_req[1m]))` | Highest value of the rate of all requests on the database (ops/sec) | +| bdb_total_res | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Rate of all responses on the database (ops/sec) | +| bdb_total_res_max | `sum by(bdb) (irate(endpoint_total_res[1m]))` | Highest value of the rate of all responses on the database (ops/sec) | +| bdb_up | `min by(bdb) (redis_server_up)` | Database is up and running | +| bdb_used_memory | `sum by (bdb) (redis_server_used_memory)` | Memory used by database (in BigRedis this includes flash) (bytes) | +| bdb_write_hits | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Rate of write operations accessing an existing key (ops/sec) | +| bdb_write_hits_max | `sum by (bdb) (irate(redis_server_keyspace_write_hits{role="master"}[1m]))` | Highest value of the rate of write operations accessing an existing key (ops/sec) | +| bdb_write_misses | `sum by (bdb)
(irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_misses_max | `sum by (bdb) (irate(redis_server_keyspace_write_misses{role="master"}[1m]))` | Highest value of the rate of write operations accessing a non-existing key (ops/sec) | +| bdb_write_req | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Rate of write requests on the database (ops/sec) | +| bdb_write_req_max | `sum by (bdb) (irate(endpoint_write_req[1m]))` | Highest value of the rate of write requests on the database (ops/sec) | +| bdb_write_res | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Rate of write responses on the database (ops/sec) | +| bdb_write_res_max | `sum by(bdb) (irate(endpoint_write_responses[1m]))` | Highest value of the rate of write responses on the database (ops/sec) | +| no_of_expires | `sum by(bdb) (redis_server_db_expires{role="master"})` | Current number of volatile keys in the database | + +## Node metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| node_available_flash | `node_available_flash_bytes` | Available flash in the node (bytes) | +| node_available_flash_no_overbooking | `node_available_flash_no_overbooking_bytes` | Available flash in the node (bytes), without taking into account overbooking | +| node_available_memory | `node_available_memory_bytes` | Amount of free memory in the node (bytes) that is available for database provisioning | +| node_available_memory_no_overbooking | `node_available_memory_no_overbooking_bytes` | Available RAM in the node (bytes) without taking into account overbooking | +| node_avg_latency | `sum by (proxy) (irate(endpoint_acc_latency[1m])) / sum by (proxy) (irate(endpoint_total_started_res[1m]))` | Average latency of requests handled by endpoints on the node in milliseconds; returned only when there is traffic | +| node_bigstore_free | `node_bigstore_free_bytes` | Sum of free 
space of back-end flash (used by BigRedis flash databases) on all cluster nodes (bytes); returned only when BigRedis is enabled | +| node_bigstore_iops | `node_flash_reads_total + node_flash_writes_total` | Rate of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_kv_ops | `sum by (node) (irate(redis_server_big_io_dels[1m]) + irate(redis_server_big_io_reads[1m]) + irate(redis_server_big_io_writes[1m]))` | Rate of value read/write operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (ops/sec); returned only when BigRedis is enabled | +| node_bigstore_throughput | `sum by (node) (irate(redis_server_big_io_read_bytes[1m]) + irate(redis_server_big_io_write_bytes[1m]))` | Throughput of I/O operations against back-end flash for all shards which are part of a flash-based database (BigRedis) in the cluster (bytes/sec); returned only when BigRedis is enabled | +| node_cert_expiration_seconds | `node_cert_expires_in_seconds` | Certificate expiration (in seconds) per given node; read more about [certificates in Redis Enterprise]({{< relref "/operate/rs/security/certificates" >}}) and [monitoring certificates]({{< relref "/operate/rs/security/certificates/monitor-certificates" >}}) | +| node_conns | `sum by (node) (endpoint_conns)` | Number of clients connected to endpoints on node | +| node_cpu_idle | `avg by (node) (irate(node_cpu_seconds_total{mode="idle"}[1m]))` | CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_max | N/A | Highest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_median | N/A | Average value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_idle_min | N/A | Lowest value of CPU idle time portion (0-1, multiply by 100 to get percent) | +| node_cpu_system | `avg by
(node) (irate(node_cpu_seconds_total{mode="system"}[1m]))` | CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_max | N/A | Highest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_median | N/A | Average value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_system_min | N/A | Lowest value of CPU time portion spent in the kernel (0-1, multiply by 100 to get percent) | +| node_cpu_user | `avg by (node) (irate(node_cpu_seconds_total{mode="user"}[1m]))` | CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_max | N/A | Highest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_median | N/A | Average value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cpu_user_min | N/A | Lowest value of CPU time portion spent by user-space processes (0-1, multiply by 100 to get percent) | +| node_cur_aof_rewrites | `sum by (cluster, node) (redis_server_aof_rewrite_in_progress)` | Number of AOF rewrites that are currently performed by shards on this node | +| node_egress_bytes | `irate(node_network_transmit_bytes_total{device=""}[1m])` | Rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_median | N/A | Average value of the rate of outgoing network traffic to node (bytes/sec) | +| node_egress_bytes_min | N/A | Lowest value of the rate of outgoing network traffic to node (bytes/sec) | +| node_ephemeral_storage_avail | `node_ephemeral_storage_avail_bytes` | Disk space available to RLEC processes on configured ephemeral disk (bytes) | +| node_ephemeral_storage_free | `node_ephemeral_storage_free_bytes` | Free disk space on configured ephemeral disk 
(bytes) | +| node_free_memory | `node_memory_MemFree_bytes` | Free memory in the node (bytes) | +| node_ingress_bytes | `irate(node_network_receive_bytes_total{device=""}[1m])` | Rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_median | N/A | Average value of the rate of incoming network traffic to node (bytes/sec) | +| node_ingress_bytes_min | N/A | Lowest value of the rate of incoming network traffic to node (bytes/sec) | +| node_persistent_storage_avail | `node_persistent_storage_avail_bytes` | Disk space available to RLEC processes on configured persistent disk (bytes) | +| node_persistent_storage_free | `node_persistent_storage_free_bytes` | Free disk space on configured persistent disk (bytes) | +| node_provisional_flash | `node_provisional_flash_bytes` | Amount of flash available for new shards on this node, taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_flash_no_overbooking | `node_provisional_flash_no_overbooking_bytes` | Amount of flash available for new shards on this node, without taking into account overbooking, max Redis servers, reserved flash, and provision and migration thresholds (bytes) | +| node_provisional_memory | `node_provisional_memory_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases | +| node_provisional_memory_no_overbooking | `node_provisional_memory_no_overbooking_bytes` | Amount of RAM that is available for provisioning to databases out of the total RAM allocated for databases, without taking into account overbooking | +| node_total_req | `sum by (cluster, node) (irate(endpoint_total_req[1m]))` | Request rate handled by endpoints on node (ops/sec) | +| node_up | `node_metrics_up` | Node is part of the cluster and is connected | + +## Cluster 
metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| cluster_shards_limit | `license_shards_limit` | Total shard limit by the license by shard type (ram / flash) | + +## Proxy metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| listener_acc_latency | N/A | Accumulative latency (sum of the latencies) of all types of commands on the database. For the average latency, divide this value by listener_total_res | +| listener_acc_latency_max | N/A | Highest value of accumulative latency of all types of commands on the database | +| listener_acc_other_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "other" on the database. For the average latency, divide this value by listener_other_res | +| listener_acc_other_latency_max | N/A | Highest value of accumulative latency of commands that are a type "other" on the database | +| listener_acc_read_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "read" on the database. For the average latency, divide this value by listener_read_res | +| listener_acc_read_latency_max | N/A | Highest value of accumulative latency of commands that are a type "read" on the database | +| listener_acc_write_latency | N/A | Accumulative latency (sum of the latencies) of commands that are a type "write" on the database. 
For the average latency, divide this value by listener_write_res | +| listener_acc_write_latency_max | N/A | Highest value of accumulative latency of commands that are a type "write" on the database | +| listener_auth_cmds | N/A | Number of memcached AUTH commands sent to the database | +| listener_auth_cmds_max | N/A | Highest value of the number of memcached AUTH commands sent to the database | +| listener_auth_errors | N/A | Number of error responses to memcached AUTH commands | +| listener_auth_errors_max | N/A | Highest value of the number of error responses to memcached AUTH commands | +| listener_cmd_flush | N/A | Number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_flush_max | N/A | Highest value of the number of memcached FLUSH_ALL commands sent to the database | +| listener_cmd_get | N/A | Number of memcached GET commands sent to the database | +| listener_cmd_get_max | N/A | Highest value of the number of memcached GET commands sent to the database | +| listener_cmd_set | N/A | Number of memcached SET commands sent to the database | +| listener_cmd_set_max | N/A | Highest value of the number of memcached SET commands sent to the database | +| listener_cmd_touch | N/A | Number of memcached TOUCH commands sent to the database | +| listener_cmd_touch_max | N/A | Highest value of the number of memcached TOUCH commands sent to the database | +| listener_conns | N/A | Number of clients connected to the endpoint | +| listener_egress_bytes | N/A | Rate of outgoing network traffic to the endpoint (bytes/sec) | +| listener_egress_bytes_max | N/A | Highest value of the rate of outgoing network traffic to the endpoint (bytes/sec) | +| listener_ingress_bytes | N/A | Rate of incoming network traffic to the endpoint (bytes/sec) | +| listener_ingress_bytes_max | N/A | Highest value of the rate of incoming network traffic to the endpoint (bytes/sec) | +| listener_last_req_time | N/A | Time of last command sent to the database | +| 
listener_last_res_time | N/A | Time of last response sent from the database | +| listener_max_connections_exceeded | `irate(endpoint_maximal_connections_exceeded[1m])` | Number of times the number of clients connected to the database at the same time has exceeded the max limit | +| listener_max_connections_exceeded_max | N/A | Highest value of the number of times the number of clients connected to the database at the same time has exceeded the max limit | +| listener_monitor_sessions_count | N/A | Number of clients connected in monitor mode to the endpoint | +| listener_other_req | N/A | Rate of other (non-read/write) requests on the endpoint (ops/sec) | +| listener_other_req_max | N/A | Highest value of the rate of other (non-read/write) requests on the endpoint (ops/sec) | +| listener_other_res | N/A | Rate of other (non-read/write) responses on the endpoint (ops/sec) | +| listener_other_res_max | N/A | Highest value of the rate of other (non-read/write) responses on the endpoint (ops/sec) | +| listener_other_started_res | N/A | Number of responses sent from the database of type "other" | +| listener_other_started_res_max | N/A | Highest value of the number of responses sent from the database of type "other" | +| listener_read_req | `irate(endpoint_read_requests[1m])` | Rate of read requests on the endpoint (ops/sec) | +| listener_read_req_max | N/A | Highest value of the rate of read requests on the endpoint (ops/sec) | +| listener_read_res | `irate(endpoint_read_responses[1m])` | Rate of read responses on the endpoint (ops/sec) | +| listener_read_res_max | N/A | Highest value of the rate of read responses on the endpoint (ops/sec) | +| listener_read_started_res | N/A | Number of responses sent from the database of type "read" | +| listener_read_started_res_max | N/A | Highest value of the number of responses sent from the database of type "read" | +| listener_total_connections_received | `irate(endpoint_total_connections_received[1m])` | Rate of new client 
connections to the endpoint (connections/sec) | +| listener_total_connections_received_max | N/A | Highest value of the rate of new client connections to the endpoint (connections/sec) | +| listener_total_req | N/A | Request rate handled by the endpoint (ops/sec) | +| listener_total_req_max | N/A | Highest value of the rate of all requests on the endpoint (ops/sec) | +| listener_total_res | N/A | Rate of all responses on the endpoint (ops/sec) | +| listener_total_res_max | N/A | Highest value of the rate of all responses on the endpoint (ops/sec) | +| listener_total_started_res | N/A | Number of responses sent from the database of all types | +| listener_total_started_res_max | N/A | Highest value of the number of responses sent from the database of all types | +| listener_write_req | `irate(endpoint_write_requests[1m])` | Rate of write requests on the endpoint (ops/sec) | +| listener_write_req_max | N/A | Highest value of the rate of write requests on the endpoint (ops/sec) | +| listener_write_res | `irate(endpoint_write_responses[1m])` | Rate of write responses on the endpoint (ops/sec) | +| listener_write_res_max | N/A | Highest value of the rate of write responses on the endpoint (ops/sec) | +| listener_write_started_res | N/A | Number of responses sent from the database of type "write" | +| listener_write_started_res_max | N/A | Highest value of the number of responses sent from the database of type "write" | + +## Replication metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| bdb_replicaof_syncer_ingress_bytes | `rate(replica_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to a Replica Of database (bytes/sec) | +| bdb_replicaof_syncer_ingress_bytes_decompressed | `rate(replica_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to a Replica Of database (bytes/sec) | +| bdb_replicaof_syncer_local_ingress_lag_time | 
`database_syncer_lag_ms{syncer_type="replicaof"}` | Lag time between the source and the destination for Replica Of traffic (ms) | +| bdb_replicaof_syncer_status | `database_syncer_current_status{syncer_type="replicaof"}` | Syncer status for Replica Of traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | +| bdb_crdt_syncer_ingress_bytes | `rate(crdt_src_ingress_bytes[1m])` | Rate of compressed incoming network traffic to CRDB (bytes/sec) | +| bdb_crdt_syncer_ingress_bytes_decompressed | `rate(crdt_src_ingress_bytes_decompressed[1m])` | Rate of decompressed incoming network traffic to CRDB (bytes/sec) | +| bdb_crdt_syncer_local_ingress_lag_time | `database_syncer_lag_ms{syncer_type="crdt"}` | Lag time between the source and the destination for CRDB traffic (ms) | +| bdb_crdt_syncer_status | `database_syncer_current_status{syncer_type="crdt"}` | Syncer status for CRDB traffic; 0 = in-sync, 1 = syncing, 2 = out of sync | + +## Shard metrics + +| V1 metric | Equivalent V2 PromQL | Description | +| --------- | :------------------- | :---------- | +| redis_active_defrag_running | `redis_server_active_defrag_running` | Automatic memory defragmentation current aggressiveness (% CPU) | +| redis_allocator_active | `redis_server_allocator_active` | Total used memory, including external fragmentation | +| redis_allocator_allocated | `redis_server_allocator_allocated` | Total allocated memory | +| redis_allocator_resident | `redis_server_allocator_resident` | Total resident memory (RSS) | +| redis_aof_last_cow_size | `redis_server_aof_last_cow_size` | CopyOnWrite memory used by the last AOF rewrite | +| redis_aof_rewrite_in_progress | `redis_server_aof_rewrite_in_progress` | The number of simultaneous AOF rewrites that are in progress | +| redis_aof_rewrites | `redis_server_aof_rewrites` | Number of AOF rewrites this process executed | +| redis_aof_delayed_fsync | `redis_server_aof_delayed_fsync` | Number of times an AOF fsync caused delays in the main Redis thread (inducing latency); this can 
indicate that the disk is slow or overloaded | +| redis_blocked_clients | `redis_server_blocked_clients` | Number of clients waiting on a blocking call | +| redis_connected_clients | `redis_server_connected_clients` | Number of client connections to the specific shard | +| redis_connected_slaves | `redis_server_connected_slaves` | Number of connected replicas | +| redis_db0_avg_ttl | `redis_server_db0_avg_ttl` | Average TTL of all volatile keys | +| redis_db0_expires | `redis_server_db0_expires` | Total count of volatile keys | +| redis_db0_keys | `redis_server_db0_keys` | Total key count | +| redis_evicted_keys | `redis_server_evicted_keys` | Keys evicted so far (since restart) | +| redis_expire_cycle_cpu_milliseconds | `redis_server_expire_cycle_cpu_milliseconds` | The cumulative amount of time spent on active expiry cycles | +| redis_expired_keys | `redis_server_expired_keys` | Keys expired so far (since restart) | +| redis_forwarding_state | `redis_server_forwarding_state` | Shard forwarding state (on or off) | +| redis_keys_trimmed | `redis_server_keys_trimmed` | The number of keys that were trimmed in the current or last resharding process | +| redis_keyspace_read_hits | `redis_server_keyspace_read_hits` | Number of read operations accessing an existing keyspace | +| redis_keyspace_read_misses | `redis_server_keyspace_read_misses` | Number of read operations accessing a non-existing keyspace | +| redis_keyspace_write_hits | `redis_server_keyspace_write_hits` | Number of write operations accessing an existing keyspace | +| redis_keyspace_write_misses | `redis_server_keyspace_write_misses` | Number of write operations accessing a non-existing keyspace | +| redis_master_link_status | `redis_server_master_link_status` | Indicates if the replica is connected to its master | +| redis_master_repl_offset | `redis_server_master_repl_offset` | Number of bytes sent to replicas by the shard; calculate the throughput for a time period by comparing the value at different 
times | +| redis_master_sync_in_progress | `redis_server_master_sync_in_progress` | The master shard is synchronizing (1 = true, 0 = false) | +| redis_max_process_mem | `redis_server_max_process_mem` | Current memory limit configured by redis_mgr according to node free memory | +| redis_maxmemory | `redis_server_maxmemory` | Current memory limit configured by redis_mgr according to database memory limits | +| redis_mem_aof_buffer | `redis_server_mem_aof_buffer` | Current size of AOF buffer | +| redis_mem_clients_normal | `redis_server_mem_clients_normal` | Current memory used for input and output buffers of non-replica clients | +| redis_mem_clients_slaves | `redis_server_mem_clients_slaves` | Current memory used for input and output buffers of replica clients | +| redis_mem_fragmentation_ratio | `redis_server_mem_fragmentation_ratio` | Memory fragmentation ratio (1.3 means 30% overhead) | +| redis_mem_not_counted_for_evict | `redis_server_mem_not_counted_for_evict` | Portion of used_memory (in bytes) that's not counted for eviction and OOM error | +| redis_mem_replication_backlog | `redis_server_mem_replication_backlog` | Size of replication backlog | +| redis_module_fork_in_progress | `redis_server_module_fork_in_progress` | A binary value that indicates if there is an active fork spawned by a module (1) or not (0) | +| redis_process_cpu_system_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="system"}` | Shard process system CPU time spent in seconds | +| redis_process_cpu_usage_percent | `namedprocess_namegroup_cpu_seconds_total{mode=~"system\|user"}` | Shard process CPU usage percentage | +| redis_process_cpu_user_seconds_total | `namedprocess_namegroup_cpu_seconds_total{mode="user"}` | Shard process user CPU time spent in seconds | +| redis_process_main_thread_cpu_system_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="system",threadname="redis-server"}` | Shard main thread system CPU time spent in seconds | +| 
redis_process_main_thread_cpu_user_seconds_total | `namedprocess_namegroup_thread_cpu_seconds_total{mode="user",threadname="redis-server"}` | Shard main thread user CPU time spent in seconds | +| redis_process_max_fds | `max(namedprocess_namegroup_open_filedesc)` | Shard maximum number of open file descriptors | +| redis_process_open_fds | `namedprocess_namegroup_open_filedesc` | Shard number of open file descriptors | +| redis_process_resident_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="resident"}` | Shard resident memory size in bytes | +| redis_process_start_time_seconds | `namedprocess_namegroup_oldest_start_time_seconds` | Shard process start time in seconds since the Unix epoch | +| redis_process_virtual_memory_bytes | `namedprocess_namegroup_memory_bytes{memtype="virtual"}` | Shard virtual memory in bytes | +| redis_rdb_bgsave_in_progress | `redis_server_rdb_bgsave_in_progress` | Indicates whether a bgsave is currently in progress | +| redis_rdb_last_cow_size | `redis_server_rdb_last_cow_size` | Last bgsave (or SYNC fork) used CopyOnWrite memory | +| redis_rdb_saves | `redis_server_rdb_saves` | Total count of bgsaves since the process was restarted (including replica fullsync and persistence) | +| redis_repl_touch_bytes | `redis_server_repl_touch_bytes` | Number of bytes sent to replicas as TOUCH commands by the shard as a result of a READ command that was processed; calculate the throughput for a time period by comparing the value at different times | +| redis_total_commands_processed | `redis_server_total_commands_processed` | Number of commands processed by the shard; calculate the number of commands for a time period by comparing the value at different times | +| redis_total_connections_received | `redis_server_total_connections_received` | Number of connections received by the shard; calculate the number of connections for a time period by comparing the value at different times | +| redis_total_net_input_bytes | 
`redis_server_total_net_input_bytes` | Number of bytes received by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_total_net_output_bytes | `redis_server_total_net_output_bytes` | Number of bytes sent by the shard; calculate the throughput for a time period by comparing the value at different times | +| redis_up | `redis_server_up` | Shard is up and running | +| redis_used_memory | `redis_server_used_memory` | Memory used by shard (in BigRedis this includes flash) (bytes) |
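
Several shard metrics above, such as `redis_server_total_net_input_bytes`, are described as cumulative values whose throughput is found "by comparing the value at different times". The arithmetic can be sketched as follows. This is a minimal illustration, not part of Redis Enterprise or Prometheus: the function name `counter_rate` and the `(timestamp_seconds, value)` sample format are hypothetical, and the reset handling assumes the counter restarts from zero when the shard process restarts.

```python
def counter_rate(earlier, later):
    """Per-second rate between two (timestamp_seconds, value) counter samples.

    Hypothetical helper for illustration; the sample format is an assumption.
    """
    t0, v0 = earlier
    t1, v1 = later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = v1 - v0
    if delta < 0:
        # The counter went backwards: assume the process restarted and the
        # counter began again from zero, so only the new value is counted.
        delta = v1
    return delta / elapsed


# Two samples of a cumulative byte counter taken 60 seconds apart.
throughput = counter_rate((0, 1_000), (60, 7_000))
print(throughput)  # bytes per second over the interval
```

In PromQL, an expression such as `rate(redis_server_total_net_input_bytes[1m])` performs a comparable calculation over the samples in the window, following the same `rate(...[1m])` pattern used in the replication table above.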