diff --git a/content/operate/rs/clusters/logging/alerts-events.md b/content/operate/rs/clusters/logging/alerts-events.md new file mode 100644 index 0000000000..58a856307b --- /dev/null +++ b/content/operate/rs/clusters/logging/alerts-events.md @@ -0,0 +1,82 @@ +--- +Title: Alerts and events +alwaysopen: false +categories: +- docs +- operate +- rs +description: Logged alerts and events +linkTitle: Alerts and events +weight: 50 +aliases: + - /operate/rs/clusters/logging/rsyslog-logging/cluster-events/ + - /operate/rs/clusters/logging/rsyslog-logging/bdb-events/ + - /operate/rs/clusters/logging/rsyslog-logging/node-events/ + - /operate/rs/clusters/logging/rsyslog-logging/user-events/ +--- + +The following alerts and events can appear in `syslog` and the Cluster Manager UI logs. + +| Alert/Event | UI message | Severity | Notes | +|-----------------------------------|----------------------------------------------------------------|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------| +| aof_slow_disk_io | Redis performance is degraded as a result of disk I/O limits | True: error, False: info | node alert | +| authentication_err | | error | bdb event; Replica of - error authenticating with the source database | +| backup_delayed | Periodic backup has been delayed for longer than `` minutes | True: warning, False: info | bdb alert; Has threshold parameter in the data: section of the log entry. 
| +| backup_failed | | error | bdb event | +| backup_started | | info | bdb event | +| backup_succeeded | | info | bdb event | +| bdb_created | | info | bdb event | +| bdb_deleted | | info | bdb event | +| bdb_updated | | info | bdb event; Indicates that a bdb configuration has been updated | +| checks_error | | error | node event; Indicates that one or more node checks have failed | +| cluster_updated | | info | cluster event; Indicates that cluster settings have been updated | +| compression_unsup_err | | error | bdb event; Replica of - Compression not supported by sync destination | +| crossslot_err | | error | bdb event; Replica of - sharded destination does not support operation executed on source | +| cpu_utilization | CPU utilization has reached ``% | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| even_node_count | True high availability requires an odd number of nodes | True: warning, False: info | cluster alert | +| ephemeral_storage | Ephemeral storage has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| export_failed | | error | bdb event | +| export_started | | info | bdb event | +| export_succeeded | | info | bdb event | +| failed | Node failed | critical | node alert | +| free_flash | Flash storage has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| high_latency | Latency is higher than `` milliseconds | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| high_syncer_lag | Replica of - sync lag is higher than `` seconds | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. 
| +| high_throughput | Throughput is higher than `` RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| import_failed | | error | bdb event | +| import_started | | info | bdb event | +| import_succeeded | | info | bdb event | +| inconsistent_redis_sw | Not all databases are running the same open source version | True: warning, False: info | cluster alert | +| inconsistent_rl_sw | Not all nodes in the cluster are running the same Redis Labs Enterprise Cluster version | True: warning, False: info | cluster alert | +| insufficient_disk_aofrw | Node has insufficient disk space for AOF rewrite | True: error, False: info | node alert | +| internal_bdb | Issues with internal cluster databases | True: warning, False: info | cluster alert | +| license_added | | info | cluster event | +| license_deleted | | info | cluster event | +| license_updated | | info | cluster event | +| low_throughput | Throughput is lower than `` RPS (requests per second) | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| memory | Node memory has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| multiple_nodes_down | Multiple cluster nodes are down - this might cause data loss | True: warning, False: info | cluster alert | +| net_throughput | Network throughput has reached ``MB/s | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| node_abort_remove_request | | info | node event | +| node_joined | Node joined | info | cluster event | +| node_operation_failed | Node operation failed | error | cluster event | +| node_remove_abort_completed | Node removed | info | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. 
| +| node_remove_abort_failed | Node removed | error | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_completed | Node removed | info | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_failed | Node removed | error | cluster event; The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | +| node_remove_request | | info | node event | +| ocsp_query_failed | Failed querying OCSP server | True: error, False: info | cluster alert | +| ocsp_status_revoked | OCSP status revoked | True: error, False: info | cluster alert | +| oom_err | | error | bdb event; Replica of - Replication source/target out of memory | +| persistent_storage | Persistent storage has reached ``% of its capacity | True: warning, False: info | node alert; Has global_threshold parameter in the key/value section of the log entry. | +| ram_dataset_overhead | RAM Dataset overhead in a shard has reached ``% of its RAM limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| ram_overcommit | Cluster capacity is less than total memory allocated to its databases | True: error, False: info | cluster alert | +| ram_values | Percent of values in a shard's RAM is lower than ``% of its key count | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| shard_num_ram_values | Number of values in a shard's RAM is lower than `` values | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. | +| size | Dataset size has reached ``% of the memory limit | True: warning, False: info | bdb alert; Has threshold parameter in the key/value section of the log entry. 
| +| syncer_connection_error | | error | bdb alert | +| syncer_general_error | | error | bdb alert | +| too_few_nodes_for_replication | Database replication requires at least two nodes in cluster | True: warning, False: info | cluster alert | +| user_created | | info | user event | +| user_deleted | | info | user event | +| user_updated | | info | user event; Indicates that a user configuration has been updated | diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/_index.md b/content/operate/rs/clusters/logging/rsyslog-logging/_index.md index b70d8d1899..42bbc9d109 100644 --- a/content/operate/rs/clusters/logging/rsyslog-logging/_index.md +++ b/content/operate/rs/clusters/logging/rsyslog-logging/_index.md @@ -11,11 +11,6 @@ hideListLinks: true linktitle: rsyslog weight: $weight --- -This document explains the structure of Redis Enterprise Software log entries in `rsyslog` and how to use these log entries to identify events. - -{{}} -You can also [secure your logs]({{< relref "/operate/rs/clusters/logging/log-security.md" >}}) with a remote logging server and log rotation. -{{}} ## Log concepts @@ -25,11 +20,13 @@ In some cases, a single action, such as removing a node from the cluster, may ac All log entries displayed in the Cluster Manager UI are also written to `syslog`. You can configure `rsyslog` to monitor `syslog`. Enabled alerts are logged to `syslog` and appear with other log entries. +You can also [manage your logs]({{< relref "/operate/rs/clusters/logging/log-security" >}}) with a remote logging server and log rotation. + ### Types of log entries Log entries are categorized into events and alerts. Both types of entries appear in the logs, but alert log entries also include a boolean `"state"` parameter that indicates whether the alert is enabled or disabled. -Log entries include information about the specific event that occurred. 
See the log entry tables for [clusters]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}), [databases]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/bdb-events" >}}), [nodes]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}), and [users]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/user-events" >}}) for more details. +Log entries include information about the specific event that occurred. See the log entry tables for [alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. ### Severity @@ -66,13 +63,13 @@ The log entries have the following basic structure: - **process id­**: The ID of the logging process - **list of key-value pairs in any order**:­ A list of key-value pairs that describe the specific event. They can appear in any order. Some key­-value pairs are always shown, and some appear depending on the specific event. - **Key-­value pairs that always appear:** - - `"type"`: A unique code­ name for the logged event. For the list of codenames, see the logged events and alerts tables for [clusters]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}), [databases]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/bdb-events" >}}), [nodes]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}), and [users]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/user-events" >}}). + - `"type"`: A unique code­ name for the logged event. For the list of codenames, see the [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) tables. - `"object"`: Defines the object type and ID (if relevant) of the object this event relates to, such as cluster, node with ID, BDB with ID, etc. Has the format of `[:]`. - `"time"`: Unix epoch time but can be ignored in this context. 
- **Key-­value pairs that might appear depending on the specific entry:** - `"state"`: A boolean where `true` means the alert is enabled, and `false` means the alert is disabled. This is only relevant for alert log entries. - `"global_threshold"`: The value of a threshold for alerts related to cluster or node objects. - - `"threshold"`: The value of a threshold for [alerts related to a BDB object]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/bdb-events" >}}). + - `"threshold"`: The value of a threshold for alerts related to a BDB object ## Log entry samples @@ -108,7 +105,7 @@ In this example, the storage utilization on node 1 reached the value of ~90%, wh - `"object":"node:1"`­ - The object related to this alert - `"state":true­` - Current state of the alert - `"time":1434282560­` - Can be ignored -- `"type":"ephemeral_storage"` - The code name of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"ephemeral_storage"` - The code name of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. #### "Alert off" log entry sample @@ -138,7 +135,7 @@ This log entry is an example of when the alert for the node with ID 1 "Ephemeral - `"object":"node:1"` -­ The object related to this alert - `"state":false­` - Current state of the alert - `"time":1434283480­` - Can be ignored -- `"type":"ephemeral_storage"` -­ The code name identifier of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"ephemeral_storage"` -­ The code name identifier of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. 
### Odd number of nodes with a minimum of three nodes alert @@ -168,7 +165,7 @@ This log entry is an example of when the alert for "True high availability requi - `"state":true` -­ Current state of the alert - `"time":1434284700­` - Can be ignored - `"node_count":1­` - The number of nodes in the cluster -- `"type":"even_node_count"­` - The code name identifier of this specific event. See [logged cluster alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}) for more details. +- `"type":"even_node_count"­` - The code name identifier of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. #### "Alert off" log entry sample @@ -196,7 +193,7 @@ This log entry is an example of when the alert for "True high availability requi - `"state":false­` - Current state of the alert - `"time":1434285200­` - Can be ignored - `"node_count":3­` - The number of nodes in the cluster -- `"type":"even_node_count"` -­ The code name of this specific event. See [logged cluster alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/cluster-events" >}}) for more details. +- `"type":"even_node_count"` -­ The code name of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. ### Node has insufficient disk space for AOF rewrite @@ -235,7 +232,7 @@ This log entry is an example of when the alert for "Node has insufficient disk s - `"state":true­` - Current state of the alert - `"time":1434365483` -­ Can be ignored - `"disk":705667072­` - The total size in bytes of the persistent storage -- `"type":"insufficient_disk_aofrw"­` - The code name of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"insufficient_disk_aofrw"­` - The code name of this specific event. 
See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. #### "Alert off" log entry sample @@ -268,4 +265,4 @@ daemon.info: Jun 15 13:51:11 node1 event_log[34252]: - `"state":false­` - Current state of the alert - `"time":1434365471­` - Can be ignored - `"disk":705667072­` - The total size in bytes of the persistent storage -- `"type":"insufficient_disk_aofrw"`­ - The code name of this specific event. See [logged node alerts and events]({{< relref "/operate/rs/clusters/logging/rsyslog-logging/node-events" >}}) for more details. +- `"type":"insufficient_disk_aofrw"`­ - The code name of this specific event. See [logged alerts and events]({{< relref "/operate/rs/clusters/logging/alerts-events" >}}) for more details. diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md deleted file mode 100644 index 8782c4a3b6..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/bdb-events.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -Title: Database alert and event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged database alerts and events -linkTitle: Database alerts/events -weight: 50 ---- - -The following database (BDB) alerts and events can appear in `syslog`. - -## UI alerts - -Logged alerts that appear in the UI - -| Alert code name | Alert as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -backup_delayed | Periodic backup has been delayed for longer than minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry. -high_latency | Latency is higher than msec | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -high_syncer_lag | Replica of - sync lag is higher than seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -high_throughput | Throughput is higher than RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -low_throughput | Throughput is lower than RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -ram_dataset_overhead | RAM Dataset overhead in a shard has reached % of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -ram_values | Percent of values in a shard’s RAM is lower than % of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -shard_num_ram_values | Number of values in a shard’s RAM is lower than values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -size | Dataset size has reached % of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. -syncer_connection_error | Replica of - database unable to sync with source | error | -syncer_general_error | Replica of - database unable to sync with source | error | - -## Non-UI events - -Logged events that do not appear in the UI - -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| authentication_err | error | Replica of - Error authenticating with the source database | -| backup_failed | error | | -| backup_started | info | | -| backup_succeeded | info | | -| bdb_created | info | | -| bdb_deleted | info | | -| bdb_updated | info | Indicates that a BDB configuration has been updated | -| compression_unsup_err | error | Replica of - Compression not supported by sync destination | -| crossslot_err | error | Replica of - Sharded destination does not support operation executed on source | -| export_failed | error | | -| export_started | info | | -| export_succeeded | info | | -| import_failed | error | | -| import_started | info | | -| import_succeeded | info | | -| oom_err | error | Replica of - Replication source/target out of memory | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md deleted file mode 100644 index 45d38efbdf..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/cluster-events.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -Title: Cluster alert and event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged cluster alerts and events -linkTitle: Cluster alerts/events -weight: 50 ---- - -The following cluster alerts and events can appear in `syslog`. 
- -## UI alerts - -Logged alerts that appear in the UI - -| Alert code name | Alert as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -even_node_count | True high availability requires an odd number of nodes with a minimum of three nodes | true: warning
false: info | -inconsistent_redis_sw | Not all databases are running the same source available version | true: warning
false: info | -inconsistent_rl_sw | Not all nodes in the cluster are running the same Redis Enterprise Cluster version | true: warning
false: info | -internal_bdb | Issues with internal cluster databases | true: warning
false: info | -multiple_nodes_down | Multiple cluster nodes are down - this might cause data loss | true: warning
false: info | -ram_overcommit | Cluster capacity is less than total memory allocated to its databases | true: error
false: info | -too_few_nodes_for_replication | Database replication requires at least two nodes in cluster | true: warning
false: info | - -## UI events - -Logged events that appear in the UI - -| Event code name | Event as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -| node_joined | Node joined | info | | -| node_remove_abort_completed | Node removed | info | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| node_remove_abort_failed | Node removed | error | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| node_remove_completed | Node removed | info | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| node_remove_failed | Node removed | error | The remove node is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_abort_completed | Nodes rebalanced | info | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_abort_failed | Nodes rebalanced | error | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_completed | Nodes rebalanced | info | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. | -| rebalance_failed | Nodes rebalanced | error | The nodes rebalance is a process that can fail and can also be cancelled. If cancelled, the cancellation process can succeed or fail. 
| - -## Non-UI events - -Logged events that do not appear in the UI - -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| cluster_updated | info | Indicates that cluster settings have been updated | -| license_added | info | | -| license_deleted | info | | -| license_updated | info | | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/node-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/node-events.md deleted file mode 100644 index ab5fd5186f..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/node-events.md +++ /dev/null @@ -1,39 +0,0 @@ ---- -Title: Node alert and event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged node alerts and events -linkTitle: Node alerts/events -weight: 50 ---- - -The following node alerts and events can appear in `syslog`. - -## UI alerts - -Logged alerts that appear in the UI - -| Alert code name | Alert as shown in the UI | Severity | Notes | -|-----------------|--------------------------|----------|-------| -aof_slow_disk_io | Redis performance is degraded as result of disk I/O limits | true: error
false: info | -cpu_utilization | CPU utilization has reached % | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. -ephemeral_storage | Ephemeral storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -failed | Node failed | critical | -free_flash | Flash storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. -insufficient_disk_aofrw | Node has insufficient disk space for AOF rewrite | true: error
false: info | -memory | Node memory has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -net_throughput | Network throughput has reached MB/s | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -persistent_storage | Persistent storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | - -## Non-UI events - -Logged events that do not appear in the UI - -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| checks_error | error | Indicates that one or more node checks have failed | -| node_abort_remove_request | info | | -| node_remove_request | info | | \ No newline at end of file diff --git a/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md b/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md deleted file mode 100644 index 4a46a9a04d..0000000000 --- a/content/operate/rs/clusters/logging/rsyslog-logging/user-events.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -Title: User event logs -alwaysopen: false -categories: -- docs -- operate -- rs -description: Logged user events -linkTitle: User events -weight: 50 ---- - -The following user events can appear in `syslog`. - -## Non-UI events - -Logged events that do not appear in the UI - -| Event code name | Severity | Notes | -|-----------------|----------|-------| -| user_created | info | | -| user_deleted | info | | -| user_updated | info | Indicates that a user configuration has been updated |
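
The log entry structure described in `rsyslog-logging/_index.md` (a syslog prefix ending in `event_log[<process id>]:`, followed by key-value pairs such as `"type"`, `"object"`, `"time"`, and, for alerts, `"state"`) can be checked against the consolidated alerts and events table with a small parser. The following is a minimal sketch, not part of this change: it assumes the key-value section is a JSON object on the same entry, and the `parse_entry` helper, the regular expression, and the sample line are all hypothetical, assembled only from the fields and the prefix quoted in this diff.

```python
import json
import re
from typing import Any, Dict, Optional

# Assumed layout (hypothetical, inferred from the samples quoted in the diff):
#   <facility>.<severity>: <timestamp> <host> event_log[<pid>]: <JSON object>
ENTRY_RE = re.compile(
    r"^(?P<facility>\w+)\.(?P<severity>\w+):\s+"
    r"(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+event_log\[(?P<pid>\d+)\]:\s*"
    r"(?P<payload>\{.*\})\s*$",
    re.DOTALL,
)


def parse_entry(entry: str) -> Optional[Dict[str, Any]]:
    """Split one log entry into its prefix fields and key-value pairs.

    Returns None when the entry does not match the assumed layout.
    """
    match = ENTRY_RE.match(entry)
    if match is None:
        return None
    try:
        payload = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None

    # "object" has the form "<type>" or "<type>:<id>", for example "node:1".
    object_type, _, object_id = str(payload.get("object", "")).partition(":")

    return {
        "severity": match.group("severity"),
        "host": match.group("host"),
        "pid": int(match.group("pid")),
        "type": payload.get("type"),
        "object_type": object_type,
        "object_id": object_id or None,
        # Only alert entries carry the boolean "state" key.
        "is_alert": "state" in payload,
        "state": payload.get("state"),
        # Node and cluster alerts use "global_threshold"; BDB alerts use "threshold".
        "threshold": payload.get("global_threshold", payload.get("threshold")),
    }


# Hypothetical sample: prefix and fields are taken from the diff, but the
# "object" value and the single-line JSON payload are assumptions.
sample = (
    "daemon.info: Jun 15 13:51:11 node1 event_log[34252]: "
    '{"disk":705667072,"object":"node:1","state":false,'
    '"time":1434365471,"type":"insufficient_disk_aofrw"}'
)

if __name__ == "__main__":
    print(parse_entry(sample))
```

The `type` value returned by the sketch is the code name to look up in the Alert/Event column of the new table, and `is_alert` distinguishes alert entries from plain events by the presence of `"state"`.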