From 981c727cb8ac0d4b3e4b0dd32c2271ba2d6e67e9 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 19 Jul 2024 15:54:05 -0500 Subject: [PATCH 1/6] DOC-3269 Change DB latency threshold alert to milliseconds --- .../logging/rsyslog-logging/bdb-events.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md index 8782c4a3b6..88b778ff4f 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md @@ -18,17 +18,17 @@ Logged alerts that appear in the UI | Alert code name | Alert as shown in the UI | Severity | Notes | |-----------------|--------------------------|----------|-------| -backup_delayed | Periodic backup has been delayed for longer than minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry.
-high_latency | Latency is higher than msec | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-high_syncer_lag | Replica of - sync lag is higher than seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-high_throughput | Throughput is higher than RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-low_throughput | Throughput is lower than RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-ram_dataset_overhead | RAM Dataset overhead in a shard has reached % of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-ram_values | Percent of values in a shard’s RAM is lower than % of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-shard_num_ram_values | Number of values in a shard’s RAM is lower than values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-size | Dataset size has reached % of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-syncer_connection_error | Replica of - database unable to sync with source | error |
-syncer_general_error | Replica of - database unable to sync with source | error |
+| backup_delayed | Periodic backup has been delayed for longer than `` minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry.
+| high_latency | Latency is higher than `` milliseconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| high_syncer_lag | Replica of - sync lag is higher than `` seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| high_throughput | Throughput is higher than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| low_throughput | Throughput is lower than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| ram_dataset_overhead | RAM Dataset overhead in a shard has reached ``% of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| ram_values | Percent of values in a shard’s RAM is lower than ``% of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| shard_num_ram_values | Number of values in a shard’s RAM is lower than `` values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
+| size | Dataset size has reached ``% of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +| syncer_connection_error | Replica of - database unable to sync with source | error | +| syncer_general_error | Replica of - database unable to sync with source | error | ## Non-UI events From ca839cea99b30d6262ab454fabba82d42ed088b8 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 19 Jul 2024 16:20:59 -0500 Subject: [PATCH 2/6] DOC-3280 RS: Move incorrectly classified non-UI events to UI events tables --- .../logging/rsyslog-logging/bdb-events.md | 64 +++++++++---------- .../logging/rsyslog-logging/cluster-events.md | 19 ++---- .../logging/rsyslog-logging/user-events.md | 14 ++-- 3 files changed, 45 insertions(+), 52 deletions(-) diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md index 88b778ff4f..45e858dfc2 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md @@ -14,41 +14,41 @@ The following database (BDB) alerts and events can appear in `syslog`. ## UI alerts -Logged alerts that appear in the UI +Logged alerts that appear in the UI: | Alert code name | Alert as shown in the UI | Severity | Notes | |-----------------|--------------------------|----------|-------| -| backup_delayed | Periodic backup has been delayed for longer than `` minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry.
-| high_latency | Latency is higher than `` milliseconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| high_syncer_lag | Replica of - sync lag is higher than `` seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| high_throughput | Throughput is higher than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| low_throughput | Throughput is lower than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| ram_dataset_overhead | RAM Dataset overhead in a shard has reached ``% of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| ram_values | Percent of values in a shard’s RAM is lower than ``% of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| shard_num_ram_values | Number of values in a shard’s RAM is lower than `` values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| size | Dataset size has reached ``% of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry.
-| syncer_connection_error | Replica of - database unable to sync with source | error |
-| syncer_general_error | Replica of - database unable to sync with source | error |
+| backup_delayed | Periodic backup has been delayed for longer than `` minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry. |
+| high_latency | Latency is higher than `` milliseconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| high_syncer_lag | Replica of - sync lag is higher than `` seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| high_throughput | Throughput is higher than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| low_throughput | Throughput is lower than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| ram_dataset_overhead | RAM Dataset overhead in a shard has reached ``% of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| ram_values | Percent of values in a shard’s RAM is lower than ``% of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| shard_num_ram_values | Number of values in a shard’s RAM is lower than `` values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. |
+| size | Dataset size has reached ``% of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | +| syncer_connection_error | Replica of - database unable to sync with source | error | | +| syncer_general_error | Replica of - database unable to sync with source | error | | -## Non-UI events +## UI events -Logged events that do not appear in the UI +Logged events that appear in the UI: -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| authentication_err | error | Replica of - Error authenticating with the source database | -| backup_failed | error | | -| backup_started | info | | -| backup_succeeded | info | | -| bdb_created | info | | -| bdb_deleted | info | | -| bdb_updated | info | Indicates that a BDB configuration has been updated | -| compression_unsup_err | error | Replica of - Compression not supported by sync destination | -| crossslot_err | error | Replica of - Sharded destination does not support operation executed on source | -| export_failed | error | | -| export_started | info | | -| export_succeeded | info | | -| import_failed | error | | -| import_started | info | | -| import_succeeded | info | | -| oom_err | error | Replica of - Replication source/target out of memory | \ No newline at end of file +| Event code name | Event as shown in the UI | Severity | Notes | +|-----------------|--------------------------|----------|-------| +| authentication_err | | error | Replica of - Error authenticating with the source database | +| backup_failed | | error | | +| backup_started | | info | | +| backup_succeeded | | info | | +| bdb_created | | info | | +| bdb_deleted | | info | | +| bdb_updated | | info | Indicates that a BDB configuration has been updated | +| compression_unsup_err | | error | Replica of - Compression not supported by sync destination | +| crossslot_err | | error | Replica of - Sharded destination does not support operation executed on source | +| export_failed | | error | | +| export_started | | info | | +| 
export_succeeded | | info | | +| import_failed | | error | | +| import_started | | info | | +| import_succeeded | | info | | +| oom_err | | error | Replica of - Replication source/target out of memory | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md index 45d38efbdf..b868cfd05f 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md @@ -14,7 +14,7 @@ The following cluster alerts and events can appear in `syslog`. ## UI alerts -Logged alerts that appear in the UI +Logged alerts that appear in the UI: | Alert code name | Alert as shown in the UI | Severity | Notes | |-----------------|--------------------------|----------|-------| @@ -28,10 +28,14 @@ too_few_nodes_for_replication | Database replication requires at least two nodes ## UI events -Logged events that appear in the UI +Logged events that appear in the UI: | Event code name | Event as shown in the UI | Severity | Notes | |-----------------|--------------------------|----------|-------| +| cluster_updated | | info | Indicates that cluster settings have been updated | | +| license_added | | info | | +| license_deleted | | info | | +| license_updated | | info | | | node_joined | Node joined | info | | | node_remove_abort_completed | Node removed | info | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | | node_remove_abort_failed | Node removed | error | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | @@ -41,14 +45,3 @@ Logged events that appear in the UI | rebalance_abort_failed | Nodes rebalanced | error | The nodes rebalance is a process that can fail and can also be cancelled. 
If cancelled, the cancellation process can succeed or fail. | | rebalance_completed | Nodes rebalanced | info | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | | rebalance_failed | Nodes rebalanced | error | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | - -## Non-UI events - -Logged events that do not appear in the UI - -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| cluster_updated | info | Indicates that cluster settings have been updated | -| license_added | info | | -| license_deleted | info | | -| license_updated | info | | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md index 4a46a9a04d..52e1d57515 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md @@ -12,12 +12,12 @@ weight: 50 The following user events can appear in `syslog`. 
-## Non-UI events +## UI events -Logged events that do not appear in the UI +Logged events that appear in the UI: -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| user_created | info | | -| user_deleted | info | | -| user_updated | info | Indicates that a user configuration has been updated | +| Event code name | Event as shown in the UI | Severity | Notes | +|-----------------|--------------------------|----------|-------| +| user_created | | info | | +| user_deleted | | info | | +| user_updated | | info | Indicates that a user configuration has been updated | From 46c71d7e68616d091461639a6b2c5dd043d689e8 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 26 Jul 2024 16:32:53 -0500 Subject: [PATCH 3/6] DOC-744 DOC-3280 Combine and add missing logged alerts/events --- .../rs/clusters/logging/alerts-events.md | 76 +++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 content/operate/rs/clusters/logging/alerts-events.md diff --git a/content/operate/rs/clusters/logging/alerts-events.md b/content/operate/rs/clusters/logging/alerts-events.md new file mode 100644 index 0000000000..78378023a1 --- /dev/null +++ b/content/operate/rs/clusters/logging/alerts-events.md @@ -0,0 +1,76 @@ +--- +Title: Alerts and events +alwaysopen: false +categories: +- docs +- operate +- rs +description: Logged alerts and events +linkTitle: Alerts and events +weight: 50 +--- + +The following alerts and events can appear in `syslog` and the Cluster Manager UI logs. 
+ +| Alert/Event | UI message | Severity | Notes | +|-----------------------------------|----------------------------------------------------------------|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------| +| aof_slow_disk_io | Redis performance is degraded as a result of disk I/O limits | True: error, False: info | node alert | +| authentication_err | | error | bdb event; Replica of - error authenticating with the source database | +| backup_delayed | Periodic backup has been delayed for longer than minutes | True: warning, False: info | bdb alert; Has threshold parameter in the data: section of the log entry. | +| backup_failed | | error | bdb event | +| backup_started | | info | bdb event | +| backup_succeeded | | info | bdb event | +| bdb_created | | info | bdb event | +| bdb_deleted | | info | bdb event | +| bdb_updated | | info | bdb event; Indicates that a bdb configuration has been updated | +| checks_error | | error | node event; Indicates that one or more node checks have failed | +| cluster_updated | | info | cluster event; Indicates that cluster settings have been updated | +| compression_unsup_err | | error | bdb event; Replica of - Compression not supported by sync destination | +| crossslot_err | | error | bdb event; Replica of - sharded destination does not support operation executed on source | +| cpu_utilization | CPU utilization has reached % | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| even_node_count | True high availability requires an odd number of nodes | True: warning, False: info | cluster alert | +| ephemeral_storage | Ephemeral storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. 
| +| export_failed | | error | bdb event | +| export_started | | info | bdb event | +| export_succeeded | | info | bdb event | +| failed | Node failed | critical | node alert | +| free_flash | Flash storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| high_latency | Latency is higher than msec | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| high_syncer_lag | Replica of - sync lag is higher than seconds | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| high_throughput | Throughput is higher than RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| import_failed | | error | bdb event | +| import_started | | info | bdb event | +| import_succeeded | | info | bdb event | +| inconsistent_redis_sw | Not all databases are running the same open source version | True: warning, False: info | cluster alert | +| inconsistent_rl_sw | Not all nodes in the cluster are running the same Redis Labs Enterprise Cluster version | True: warning, False: info | cluster alert | +| insufficient_disk_aofrw | Node has insufficient disk space for AOF rewrite | True: error, False: info | node alert | +| internal_bdb | Issues with internal cluster databases | True: warning, False: info | cluster alert | +| license_added | | info | cluster event | +| license_deleted | | info | cluster event | +| license_updated | | info | cluster event | +| low_throughput | Throughput is lower than RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. 
| +| memory | Node memory has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| multiple_nodes_down | Multiple cluster nodes are down - this might cause data loss | True: warning, False: info | cluster alert | +| net_throughput | Network throughput has reached MB/s | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| node_abort_remove_request | | info | node event | +| node_joined | Node joined | info | cluster event | +| node_operation_failed | Node operation failed | error | cluster event | +| node_remove_abort_completed | Node removed | info | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_abort_failed | Node removed | error | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_completed | Node removed | info | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_failed | Node removed | error | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_request | | info | node event | +| ocsp_query_failed | Failed querying OCSP server | True: error, False: info | cluster alert | +| ocsp_status_revoked | OCSP status revoked | True: error, False: info | cluster alert | +| persistent_storage | Persistent storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. 
| +| ram_dataset_overhead | RAM Dataset overhead in a shard has reached % of its RAM limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| ram_overcommit | Cluster capacity is less than total memory allocated to its databases | True: error, False: info | cluster alert | +| ram_values | Percent of values in a shard's RAM is lower than % of its key count | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| shard_num_ram_values | Number of values in a shard's RAM is lower than values | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| size | Dataset size has reached % of the memory limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| syncer_connection_error | | error | bdb alert | +| syncer_general_error | | error | bdb alert | +| too_few_nodes_for_replication | Database replication requires at least two nodes in cluster | True: warning, False: info | cluster alert | +| user_created | | info | user event | +| user_deleted | | info | user event | +| user_updated | | info | user event; Indicates that a user configuration has been updated | From 4d1e7f174360ac02afac603c11d48bb76eafff46 Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 26 Jul 2024 16:50:19 -0500 Subject: [PATCH 4/6] DOC-744 DOC-3280 Add missing logged alerts/events, remove old separate reference pages, add aliases --- .../rs/clusters/logging/alerts-events.md | 6 +++ .../logging/rsyslog-logging/_index.md | 18 +++---- .../logging/rsyslog-logging/bdb-events.md | 54 ------------------- .../logging/rsyslog-logging/cluster-events.md | 47 ---------------- .../logging/rsyslog-logging/node-events.md | 39 -------------- .../logging/rsyslog-logging/user-events.md | 23 -------- 6 files changed, 15 insertions(+), 172 deletions(-) delete mode 100644 
content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md delete mode 100644 content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md delete mode 100644 content/operate/rs/clusters/logging/rsyslog-logging/node-events.md delete mode 100644 content/operate/rs/clusters/logging/rsyslog-logging/user-events.md diff --git a/content/operate/rs/clusters/logging/alerts-events.md b/content/operate/rs/clusters/logging/alerts-events.md index 78378023a1..423d622aeb 100644 --- a/content/operate/rs/clusters/logging/alerts-events.md +++ b/content/operate/rs/clusters/logging/alerts-events.md @@ -8,6 +8,11 @@ categories: description: Logged alerts and events linkTitle: Alerts and events weight: 50 +aliases: + - /operate/rs/clusters/logging/rsyslog-logging/cluster-events/ + - /operate/rs/clusters/logging/rsyslog-logging/bdb-events/ + - /operate/rs/clusters/logging/rsyslog-logging/node-events/ + - /operate/rs/clusters/logging/rsyslog-logging/user-events/ --- The following alerts and events can appear in `syslog` and the Cluster Manager UI logs. @@ -62,6 +67,7 @@ The following alerts and events can appear in `syslog` and the Cluster Manager U | node_remove_request | | info | node event | | ocsp_query_failed | Failed querying OCSP server | True: error, False: info | cluster alert | | ocsp_status_revoked | OCSP status revoked | True: error, False: info | cluster alert | +| oom_err | | error | bdb event; Replica of - Replication source/target out of memory | | persistent_storage | Persistent storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | | ram_dataset_overhead | RAM Dataset overhead in a shard has reached % of its RAM limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. 
| | ram_overcommit | Cluster capacity is less than total memory allocated to its databases | True: error, False: info | cluster alert | diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/_index.md b/content/operate/rs/clusters/logging/rsyslog-logging/_index.md index b70d8d1899..19bc977df6 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/_index.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/_index.md @@ -29,7 +29,7 @@ All log entries displayed in the Cluster Manager UI are also written to `syslog` Log entries are categorized into events and alerts. Both types of entries appear in the logs, but alert log entries also include a boolean `"state"` parameter that indicates whether the alert is enabled or disabled. -Log entries include information about the specific event that occurred. See the log entry tables for [clusters]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}), [databases]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/bdb-events" >}}), [nodes]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}), and [users]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/user-events" >}}) for more details. +Log entries include information about the specific event that occurred. See the log entry tables for [alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. ### Severity @@ -66,13 +66,13 @@ The log entries have the following basic structure: - **process id­**: The ID of the logging process - **list of key-value pairs in any order**:­ A list of key-value pairs that describe the specific event. They can appear in any order. Some key­-value pairs are always shown, and some appear depending on the specific event. - **Key-­value pairs that always appear:** - - `"type"`: A unique code­ name for the logged event. 
For the list of codenames, see the logged events and alerts tables for [clusters]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}), [databases]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/bdb-events" >}}), [nodes]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}), and [users]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/user-events" >}}). + - `"type"`: A unique code­ name for the logged event. For the list of codenames, see the [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) tables. - `"object"`: Defines the object type and ID (if relevant) of the object this event relates to, such as cluster, node with ID, BDB with ID, etc. Has the format of `[:]`. - `"time"`: Unix epoch time but can be ignored in this context. - **Key-­value pairs that might appear depending on the specific entry:** - `"state"`: A boolean where `true` means the alert is enabled, and `false` means the alert is disabled. This is only relevant for alert log entries. - `"global_threshold"`: The value of a threshold for alerts related to cluster or node objects. - - `"threshold"`: The value of a threshold for [alerts related to a BDB object]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/bdb-events" >}}). + - `"threshold"`: The value of a threshold for alerts related to a BDB object ## Log entry samples @@ -108,7 +108,7 @@ In this example, the storage utilization on node 1 reached the value of ~90%, wh - `"object":"node:1"`­ - The object related to this alert - `"state":true­` - Current state of the alert - `"time":1434282560­` - Can be ignored -- `"type":"ephemeral_storage"` - The code name of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"ephemeral_storage"` - The code name of this specific event. 
See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. #### "Alert off" log entry sample @@ -138,7 +138,7 @@ This log entry is an example of when the alert for the node with ID 1 "Ephemeral - `"object":"node:1"` -­ The object related to this alert - `"state":false­` - Current state of the alert - `"time":1434283480­` - Can be ignored -- `"type":"ephemeral_storage"` -­ The code name identifier of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"ephemeral_storage"` -­ The code name identifier of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. ### Odd number of nodes with a minimum of three nodes alert @@ -168,7 +168,7 @@ This log entry is an example of when the alert for "True high availability requi - `"state":true` -­ Current state of the alert - `"time":1434284700­` - Can be ignored - `"node_count":1­` - The number of nodes in the cluster -- `"type":"even_node_count"­` - The code name identifier of this specific event. See [logged cluster alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}) for more details. +- `"type":"even_node_count"­` - The code name identifier of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. #### "Alert off" log entry sample @@ -196,7 +196,7 @@ This log entry is an example of when the alert for "True high availability requi - `"state":false­` - Current state of the alert - `"time":1434285200­` - Can be ignored - `"node_count":3­` - The number of nodes in the cluster -- `"type":"even_node_count"` -­ The code name of this specific event. See [logged cluster alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}) for more details. 
+- `"type":"even_node_count"` -­ The code name of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. ### Node has insufficient disk space for AOF rewrite @@ -235,7 +235,7 @@ This log entry is an example of when the alert for "Node has insufficient disk s - `"state":true­` - Current state of the alert - `"time":1434365483` -­ Can be ignored - `"disk":705667072­` - The total size in bytes of the persistent storage -- `"type":"insufficient_disk_aofrw"­` - The code name of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"insufficient_disk_aofrw"­` - The code name of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. #### "Alert off" log entry sample @@ -268,4 +268,4 @@ daemon.info: Jun 15 13:51:11 node1 event_log[34252]: - `"state":false­` - Current state of the alert - `"time":1434365471­` - Can be ignored - `"disk":705667072­` - The total size in bytes of the persistent storage -- `"type":"insufficient_disk_aofrw"`­ - The code name of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"insufficient_disk_aofrw"`­ - The code name of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. 
diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md deleted file mode 100644 index 45e858dfc2..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -Title: Database alert and event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged database alerts and events -linkTitle: Database alerts/events -weight: 50 ---- - -The following database (BDB) alerts and events can appear in `syslog`. - -## UI alerts - -Logged alerts that appear in the UI: - -| Alert code name | Alert as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -| backup_delayed | Periodic backup has been delayed for longer than `` minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry. | -| high_latency | Latency is higher than `` milliseconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| high_syncer_lag | Replica of - sync lag is higher than `` seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| high_throughput | Throughput is higher than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| low_throughput | Throughput is lower than `` RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| ram_dataset_overhead | RAM Dataset overhead in a shard has reached ``% of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| ram_values | Percent of values in a shard’s RAM is lower than ``% of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| shard_num_ram_values | Number of values in a shard’s RAM is lower than `` values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| size | Dataset size has reached ``% of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| syncer_connection_error | Replica of - database unable to sync with source | error | | -| syncer_general_error | Replica of - database unable to sync with source | error | | - -## UI events - -Logged events that appear in the UI: - -| Event code name | Event as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -| authentication_err | | error | Replica of - Error authenticating with the source database | -| backup_failed | | error | | -| backup_started | | info | | -| backup_succeeded | | info | | -| bdb_created | | info | | -| bdb_deleted | | info | | -| bdb_updated | | info | Indicates that a BDB configuration has been updated | -| compression_unsup_err | | error | Replica of - Compression not supported by sync destination | -| crossslot_err | | error | Replica of - Sharded destination does not support operation executed on source | -| export_failed | | error | | -| export_started | | info | | -| export_succeeded | | info | | -| import_failed | | error | | -| import_started | | info | | -| import_succeeded | | info | | -| oom_err | | error | Replica of - Replication source/target out of memory | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md deleted file mode 100644 index b868cfd05f..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -Title: Cluster alert and event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged cluster alerts and events -linkTitle: Cluster alerts/events -weight: 50 ---- - -The following cluster alerts and events can appear in `syslog`. 
- -## UI alerts - -Logged alerts that appear in the UI: - -| Alert code name | Alert as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -even_node_count | True high availability requires an odd number of nodes with a minimum of three nodes | true: warning
false: info | -inconsistent_redis_sw | Not all databases are running the same source available version | true: warning
false: info | -inconsistent_rl_sw | Not all nodes in the cluster are running the same Redis Enterprise Cluster version | true: warning
false: info | -internal_bdb | Issues with internal cluster databases | true: warning
false: info | -multiple_nodes_down | Multiple cluster nodes are down - this might cause data loss | true: warning
false: info | -ram_overcommit | Cluster capacity is less than total memory allocated to its databases | true: error
false: info | -too_few_nodes_for_replication | Database replication requires at least two nodes in cluster | true: warning
false: info | - -## UI events - -Logged events that appear in the UI: - -| Event code name | Event as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -| cluster_updated | | info | Indicates that cluster settings have been updated | | -| license_added | | info | | -| license_deleted | | info | | -| license_updated | | info | | -| node_joined | Node joined | info | | -| node_remove_abort_completed | Node removed | info | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| node_remove_abort_failed | Node removed | error | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| node_remove_completed | Node removed | info | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| node_remove_failed | Node removed | error | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_abort_completed | Nodes rebalanced | info | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_abort_failed | Nodes rebalanced | error | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_completed | Nodes rebalanced | info | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_failed | Nodes rebalanced | error | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. 
| diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/node-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/node-events.md deleted file mode 100644 index ab5fd5186f..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/node-events.md +++ /dev/null @@ -1,39 +0,0 @@ ---- -Title: Node alert and event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged node alerts and events -linkTitle: Node alerts/events -weight: 50 ---- - -The following node alerts and events can appear in `syslog`. - -## UI alerts - -Logged alerts that appear in the UI - -| Alert code name | Alert as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -aof_slow_disk_io | Redis performance is degraded as result of disk I/O limits | true: error
false: info | -cpu_utilization | CPU utilization has reached % | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. -ephemeral_storage | Ephemeral storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -failed | Node failed | critical | -free_flash | Flash storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. -insufficient_disk_aofrw | Node has insufficient disk space for AOF rewrite | true: error
false: info | -memory | Node memory has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -net_throughput | Network throughput has reached MB/s | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -persistent_storage | Persistent storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | - -## Non-UI events - -Logged events that do not appear in the UI - -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| checks_error | error | Indicates that one or more node checks have failed | -| node_abort_remove_request | info | | -| node_remove_request | info | | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md deleted file mode 100644 index 52e1d57515..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -Title: User event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged user events -linkTitle: User events -weight: 50 ---- - -The following user events can appear in `syslog`. - -## UI events - -Logged events that appear in the UI: - -| Event code name | Event as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -| user_created | | info | | -| user_deleted | | info | | -| user_updated | | info | Indicates that a user configuration has been updated | From dcef49a2582afd3760ecc0ed1c6ac069b17bb8ad Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 26 Jul 2024 16:57:45 -0500 Subject: [PATCH 5/6] DOC-3269 Change DB latency threshold alert to milliseconds and fixed formatting --- .../rs/clusters/logging/alerts-events.md | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/content/operate/rs/clusters/logging/alerts-events.md b/content/operate/rs/clusters/logging/alerts-events.md index 423d622aeb..58a856307b 100644 --- a/content/operate/rs/clusters/logging/alerts-events.md +++ b/content/operate/rs/clusters/logging/alerts-events.md @@ -21,7 +21,7 @@ The following alerts and events can appear in `syslog` and the Cluster Manager U 
|-----------------------------------|----------------------------------------------------------------|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------| | aof_slow_disk_io | Redis performance is degraded as a result of disk I/O limits | True: error, False: info | node alert | | authentication_err | | error | bdb event; Replica of - error authenticating with the source database | -| backup_delayed | Periodic backup has been delayed for longer than minutes | True: warning, False: info | bdb alert; Has threshold parameter in the data: section of the log entry. | +| backup_delayed | Periodic backup has been delayed for longer than `` minutes | True: warning, False: info | bdb alert; Has threshold parameter in the data: section of the log entry. | | backup_failed | | error | bdb event | | backup_started | | info | bdb event | | backup_succeeded | | info | bdb event | @@ -32,17 +32,17 @@ The following alerts and events can appear in `syslog` and the Cluster Manager U | cluster_updated | | info | cluster event; Indicates that cluster settings have been updated | | compression_unsup_err | | error | bdb event; Replica of - Compression not supported by sync destination | | crossslot_err | | error | bdb event; Replica of - sharded destination does not support operation executed on source | -| cpu_utilization | CPU utilization has reached % | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| cpu_utilization | CPU utilization has reached ``% | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. 
| | even_node_count | True high availability requires an odd number of nodes | True: warning, False: info | cluster alert | -| ephemeral_storage | Ephemeral storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| ephemeral_storage | Ephemeral storage has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | | export_failed | | error | bdb event | | export_started | | info | bdb event | | export_succeeded | | info | bdb event | | failed | Node failed | critical | node alert | -| free_flash | Flash storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | -| high_latency | Latency is higher than msec | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | -| high_syncer_lag | Replica of - sync lag is higher than seconds | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | -| high_throughput | Throughput is higher than RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| free_flash | Flash storage has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| high_latency | Latency is higher than `` milliseconds | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| high_syncer_lag | Replica of - sync lag is higher than `` seconds | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. 
| +| high_throughput | Throughput is higher than `` RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | | import_failed | | error | bdb event | | import_started | | info | bdb event | | import_succeeded | | info | bdb event | @@ -53,10 +53,10 @@ The following alerts and events can appear in `syslog` and the Cluster Manager U | license_added | | info | cluster event | | license_deleted | | info | cluster event | | license_updated | | info | cluster event | -| low_throughput | Throughput is lower than RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | -| memory | Node memory has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| low_throughput | Throughput is lower than `` RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| memory | Node memory has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | | multiple_nodes_down | Multiple cluster nodes are down - this might cause data loss | True: warning, False: info | cluster alert | -| net_throughput | Network throughput has reached MB/s | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| net_throughput | Network throughput has reached ``MB/s | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. 
| | node_abort_remove_request | | info | node event | | node_joined | Node joined | info | cluster event | | node_operation_failed | Node operation failed | error | cluster event | @@ -68,12 +68,12 @@ The following alerts and events can appear in `syslog` and the Cluster Manager U | ocsp_query_failed | Failed querying OCSP server | True: error, False: info | cluster alert | | ocsp_status_revoked | OCSP status revoked | True: error, False: info | cluster alert | | oom_err | | error | bdb event; Replica of - Replication source/target out of memory | -| persistent_storage | Persistent storage has reached % of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | -| ram_dataset_overhead | RAM Dataset overhead in a shard has reached % of its RAM limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| persistent_storage | Persistent storage has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| ram_dataset_overhead | RAM Dataset overhead in a shard has reached ``% of its RAM limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | | ram_overcommit | Cluster capacity is less than total memory allocated to its databases | True: error, False: info | cluster alert | -| ram_values | Percent of values in a shard's RAM is lower than % of its key count | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | -| shard_num_ram_values | Number of values in a shard's RAM is lower than values | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. 
| -| size | Dataset size has reached % of the memory limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| ram_values | Percent of values in a shard's RAM is lower than ``% of its key count | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| shard_num_ram_values | Number of values in a shard's RAM is lower than `` values | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| size | Dataset size has reached ``% of the memory limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | | syncer_connection_error | | error | bdb alert | | syncer_general_error | | error | bdb alert | | too_few_nodes_for_replication | Database replication requires at least two nodes in cluster | True: warning, False: info | cluster alert | From fe36d4eff981e22502ba977667ddf09979e6eb7e Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 13 Sep 2024 16:08:19 -0500 Subject: [PATCH 6/6] Feedback update - remove duplicated page description --- .../operate/rs/clusters/logging/rsyslog-logging/_index.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/_index.md b/content/operate/rs/clusters/logging/rsyslog-logging/_index.md index 19bc977df6..42bbc9d109 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/_index.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/_index.md @@ -11,11 +11,6 @@ hideListLinks: true linktitle: rsyslog weight: $weight --- -This document explains the structure of Redis Enterprise Software log entries in `rsyslog` and how to use these log entries to identify events. - -{{}} -You can also [secure your logs]({{< relref "/operate/rs/clusters/logging/log-security.md" >}}) with a remote logging server and log rotation. 
-{{}} ## Log concepts @@ -25,6 +20,8 @@ In some cases, a single action, such as removing a node from the cluster, may ac All log entries displayed in the Cluster Manager UI are also written to `syslog`. You can configure `rsyslog` to monitor `syslog`. Enabled alerts are logged to `syslog` and appear with other log entries. +You can also [manage your logs]({{< relref "/operate/rs/clusters/logging/log-security" >}}) with a remote logging server and log rotation. + ### Types of log entries Log entries are categorized into events and alerts. Both types of entries appear in the logs, but alert log entries also include a boolean `"state"` parameter that indicates whether the alert is enabled or disabled.