Commit 45dcc18

Merge pull request #67059 from kalexand-rh/4.14-fixes
fixing incorrect anchor formats
2 parents 9e882ad + db6da73 commit 45dcc18

8 files changed: +16 -16 lines changed

microshift_troubleshooting/microshift-troubleshoot-backup-restore.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 :_content-type: ASSEMBLY
-[id="microshift-troubleshoot-backup-restore.adoc"]
+[id="microshift-troubleshoot-backup-restore"]
 = Troubleshooting data backup and restore
 include::_attributes/attributes-microshift.adoc[]
 :context: microshift-troubleshoot-data-backup-and-restore
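
Every change in this commit follows the same pattern: the module's anchor ID repeated the file name, including the `.adoc` extension, and the fix drops the extension so the ID is a plain anchor. A minimal sketch of why the extension does not belong in the ID, assuming the usual AsciiDoc cross-reference pattern used in these docs (the relative path and link text below are illustrative, not taken from the commit):

----
// In the module or assembly itself, the ID is a bare anchor with no file extension.
[id="microshift-troubleshoot-backup-restore"]
= Troubleshooting data backup and restore

// Elsewhere in the docs, a cross-reference pairs the target file name with that
// anchor. With the old ID, the rendered anchor would have been
// "#microshift-troubleshoot-backup-restore.adoc", so links written against the
// conventional extension-free anchor would not resolve.
xref:../microshift_troubleshooting/microshift-troubleshoot-backup-restore.adoc#microshift-troubleshoot-backup-restore[Troubleshooting data backup and restore]
----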

modules/nodes-dashboard-using-identify-critical-cpu-kubelet.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-cpu-kubelet.adoc"]
+[id="nodes-dashboard-using-identify-critical-cpu-kubelet"]
 = Nodes with Kubelet system reserved CPU utilization > 50%
 
 The *Nodes with Kubelet system reserved CPU utilization > 50%* query calculates the percentage of the CPU that the Kubelet system is currently using from system reserved.
@@ -13,6 +13,6 @@ The *Nodes with Kubelet system reserved CPU utilization > 50%* query calculates
 sum by (node) (rate(container_cpu_usage_seconds_total{id="/system.slice/kubelet.service"}[5m]) * 100) / sum by (node) (kube_node_status_capacity{resource="cpu"} - kube_node_status_allocatable{resource="cpu"}) >= 50
 ----
 
-The Kubelet uses the system reserved CPU for its own operations and for running critical system services. For the node's health, it is important to ensure that system reserved CPU usage does not exceed the 50% threshold. Exceeding this limit could indicate heavy utilization or load on the Kubelet, which affects node stability and potentially the performance of the entire Kubernetes cluster.
+The Kubelet uses the system reserved CPU for its own operations and for running critical system services. For the node's health, it is important to ensure that system reserved CPU usage does not exceed the 50% threshold. Exceeding this limit could indicate heavy utilization or load on the Kubelet, which affects node stability and potentially the performance of the entire Kubernetes cluster.
 
 If any node is displayed in this metric, the Kubelet and the system overall are under heavy load. You can reduce overload on a particular node by balancing the load across other nodes in the cluster. Check other query metrics under the *Outliers*, *Average durations*, and *Number of operations* categories to gain further insights and take necessary corrective action.

modules/nodes-dashboard-using-identify-critical-cpu.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-cpu.adoc"]
+[id="nodes-dashboard-using-identify-critical-cpu"]
 = Nodes with System Reserved CPU Utilization > 80%
 
 The *Nodes with system reserved CPU utilization > 80%* query identifies nodes where the system-reserved CPU utilization is more than 80%. The query focuses on the system-reserved capacity to calculate the rate of CPU usage in the last 5 minutes and compares that to the CPU resources available on the nodes. If the ratio exceeds 80%, the node's result is displayed in the metric.
@@ -13,6 +13,6 @@ The *Nodes with system reserved CPU utilization > 80%* query identifies nodes wh
 sum by (node) (rate(container_cpu_usage_seconds_total{id="/system.slice"}[5m]) * 100) / sum by (node) (kube_node_status_capacity{resource="cpu"} - kube_node_status_allocatable{resource="cpu"}) >= 80
 ----
 
-This query indicates a critical level of system-reserved CPU usage, which can lead to resource exhaustion. High system-reserved CPU usage can result in the inability of the system processes (including the Kubelet and CRI-O) to adequately manage resources on the node. This query can indicate excessive system processes or misconfigured CPU allocation.
+This query indicates a critical level of system-reserved CPU usage, which can lead to resource exhaustion. High system-reserved CPU usage can result in the inability of the system processes (including the Kubelet and CRI-O) to adequately manage resources on the node. This query can indicate excessive system processes or misconfigured CPU allocation.
 
 Potential corrective measures include rebalancing workloads to other nodes or increasing the CPU resources allocated to the nodes. Investigate the cause of the high system CPU utilization and review the corresponding metrics in the *Outliers*, *Average durations*, and *Number of operations* categories for additional insights into the node's behavior.

modules/nodes-dashboard-using-identify-critical-memory-crio.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-memory-crio.adoc"]
+[id="nodes-dashboard-using-identify-critical-memory-crio"]
 = Nodes with CRI-O system reserved memory utilization > 50%
 
 The *Nodes with CRI-O system reserved memory utilization > 50%* query calculates all nodes where the percentage of used memory reserved for the CRI-O system is greater than or equal to 50%. In this case, memory usage is defined by the resident set size (RSS), which is the portion of the CRI-O system's memory held in RAM.
@@ -13,6 +13,6 @@ The *Nodes with CRI-O system reserved memory utilization > 50%* query calculates
 sum by (node) (container_memory_rss{id="/system.slice/crio.service"}) / sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"}) * 100 >= 50
 ----
 
-This query helps you monitor the status of memory reserved for the CRI-O system on each node. High utilization could indicate a lack of available resources and potential performance issues. If the memory reserved for the CRI-O system exceeds the advised limit of 50%, it indicates that half of the system reserved memory is being used by CRI-O on a node.
+This query helps you monitor the status of memory reserved for the CRI-O system on each node. High utilization could indicate a lack of available resources and potential performance issues. If the memory reserved for the CRI-O system exceeds the advised limit of 50%, it indicates that half of the system reserved memory is being used by CRI-O on a node.
 
 Check memory allocation and usage and assess whether memory resources need to be shifted or increased to prevent possible node instability. You can also examine the metrics under the *Outliers*, *Average durations*, and *Number of operations* categories to gain further insights.

modules/nodes-dashboard-using-identify-critical-memory-kubelet.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-memory-kubelet.adoc"]
+[id="nodes-dashboard-using-identify-critical-memory-kubelet"]
 = Nodes with Kubelet system reserved memory utilization > 50%
 
 The *Nodes with Kubelet system reserved memory utilization > 50%* query indicates nodes where the Kubelet's system reserved memory utilization exceeds 50%. The query examines the memory that the Kubelet process itself is consuming on a node.
@@ -13,6 +13,6 @@ The *Nodes with Kubelet system reserved memory utilization > 50%* query indicate
 sum by (node) (container_memory_rss{id="/system.slice/kubelet.service"}) / sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"}) * 100 >= 50
 ----
 
-This query helps you identify any possible memory pressure situations in your nodes that could affect the stability and efficiency of node operations. Kubelet memory utilization that consistently exceeds 50% of the system reserved memory indicates that the system reserved settings are not configured properly and that there is a high risk of the node becoming unstable.
+This query helps you identify any possible memory pressure situations in your nodes that could affect the stability and efficiency of node operations. Kubelet memory utilization that consistently exceeds 50% of the system reserved memory indicates that the system reserved settings are not configured properly and that there is a high risk of the node becoming unstable.
 
 If this metric is highlighted, review your configuration policy and consider adjusting the system reserved settings or the resource limits settings for the Kubelet. Additionally, if your Kubelet memory utilization consistently exceeds half of your total reserved system memory, examine metrics under the *Outliers*, *Average durations*, and *Number of operations* categories to gain further insights for more precise diagnostics.

modules/nodes-dashboard-using-identify-critical-memory.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-memory.adoc"]
+[id="nodes-dashboard-using-identify-critical-memory"]
 = Nodes with system reserved memory utilization > 80%
 
 The *Nodes with system reserved memory utilization > 80%* query calculates the percentage of system reserved memory that is utilized for each node. The calculation divides the total resident set size (RSS) by the node's total memory capacity minus its allocatable memory. RSS is the portion of the system's memory occupied by a process that is held in main memory (RAM). Nodes are flagged if their resulting value equals or exceeds an 80% threshold.
@@ -13,6 +13,6 @@ The *Nodes with system reserved memory utilization > 80%* query calculates the p
 sum by (node) (container_memory_rss{id="/system.slice"}) / sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"}) * 100 >= 80
 ----
 
-System reserved memory is crucial for a Kubernetes node as it is utilized to run system daemons and Kubernetes system daemons. System reserved memory utilization that exceeds 80% indicates that the system and Kubernetes daemons are consuming too much memory and can suggest node instability that could affect the performance of running pods. Excessive memory consumption can trigger the Out-of-Memory (OOM) killer, which can terminate critical system processes to free up memory.
+System reserved memory is crucial for a Kubernetes node as it is utilized to run system daemons and Kubernetes system daemons. System reserved memory utilization that exceeds 80% indicates that the system and Kubernetes daemons are consuming too much memory and can suggest node instability that could affect the performance of running pods. Excessive memory consumption can trigger the Out-of-Memory (OOM) killer, which can terminate critical system processes to free up memory.
 
 If a node is flagged by this metric, identify which system or Kubernetes processes are consuming excessive memory and take appropriate actions to mitigate the situation. These actions may include scaling back non-critical processes, optimizing program configurations to reduce memory usage, or upgrading node systems to hardware with greater memory capacity. You can also review the metrics under the *Outliers*, *Average durations*, and *Number of operations* categories to gain further insights into node performance.

modules/nodes-dashboard-using-identify-critical-pulls.adoc

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-pulls.adoc"]
+[id="nodes-dashboard-using-identify-critical-pulls"]
 = Failure rate for image pulls in the last hour
 
 The *Failure rate for image pulls in the last hour* query divides the total number of failed image pulls by the sum of successful and failed image pulls to provide a ratio of failures.
@@ -13,6 +13,6 @@ The *Failure rate for image pulls in the last hour* query divides the total numb
 rate(container_runtime_crio_image_pulls_failure_total[1h]) / (rate(container_runtime_crio_image_pulls_success_total[1h]) + rate(container_runtime_crio_image_pulls_failure_total[1h]))
 ----
 
-Understanding the failure rate of image pulls is crucial for maintaining the health of the node. A high failure rate might indicate networking issues, storage problems, misconfigurations, or other issues that could disrupt pod density and the deployment of new containers.
+Understanding the failure rate of image pulls is crucial for maintaining the health of the node. A high failure rate might indicate networking issues, storage problems, misconfigurations, or other issues that could disrupt pod density and the deployment of new containers.
 
 If the outcome of this query is high, investigate possible causes such as network connections, the availability of remote repositories, node storage, and the accuracy of image references. You can also review the metrics under the *Outliers*, *Average durations*, and *Number of operations* categories to gain further insights.

modules/nodes-dashboard-using-identify-critical-top3.adoc

Lines changed: 3 additions & 3 deletions
@@ -3,16 +3,16 @@
 // * nodes/nodes-dashboard-using.adoc
 
 :_content-type: CONCEPT
-[id="nodes-dashboard-using-identify-critical-top3.adoc"]
+[id="nodes-dashboard-using-identify-critical-top3"]
 = Top 3 containers with the most OOM kills in the last day
 
-The *Top 3 containers with the most OOM kills in the last day* query fetches details regarding the top three containers that have experienced the most Out-Of-Memory (OOM) kills in the previous day.
+The *Top 3 containers with the most OOM kills in the last day* query fetches details regarding the top three containers that have experienced the most Out-Of-Memory (OOM) kills in the previous day.
 
 .Example default query
 ----
 topk(3, sum(increase(container_runtime_crio_containers_oom_count_total[1d])) by (name))
 ----
 
-OOM kills force the system to terminate some processes due to low memory. Frequent OOM kills can hinder the functionality of the node and even the entire Kubernetes ecosystem. Containers experiencing frequent OOM kills might be consuming more memory than they should, which causes system instability.
+OOM kills force the system to terminate some processes due to low memory. Frequent OOM kills can hinder the functionality of the node and even the entire Kubernetes ecosystem. Containers experiencing frequent OOM kills might be consuming more memory than they should, which causes system instability.
 
 Use this metric to identify containers that are experiencing frequent OOM kills and investigate why these containers are consuming an excessive amount of memory. Adjust the resource allocation if necessary and consider resizing the containers based on their memory usage. You can also review the metrics under the *Outliers*, *Average durations*, and *Number of operations* categories to gain further insights into the health and stability of your nodes.
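
All seven dashboard modules above are single-sourced into the assembly named in their header comment, `nodes/nodes-dashboard-using.adoc`. A minimal sketch of how the corrected IDs are consumed from that assembly, assuming the usual include and cross-reference pattern for modular docs (the `leveloffset` value and link text are illustrative, not taken from the commit):

----
// nodes/nodes-dashboard-using.adoc (sketch)
// Pull the module into the assembly; its heading becomes a subsection.
include::modules/nodes-dashboard-using-identify-critical-top3.adoc[leveloffset=+1]

// A same-document reference now resolves against the bare, extension-free ID.
xref:nodes-dashboard-using-identify-critical-top3[Top 3 containers with the most OOM kills in the last day]
----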
