From b113d5b72abfcbdb096b81e49cc8cfe2ae46c46c Mon Sep 17 00:00:00 2001
From: 00aixxia00 <28895375+00aixxia00@users.noreply.github.com>
Date: Thu, 22 Feb 2024 11:09:41 +0100
Subject: [PATCH 1/4] Create NodeMemoryMajorPagesFaults.md

---
 .../node/NodeMemoryMajorPagesFaults.md | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 content/runbooks/node/NodeMemoryMajorPagesFaults.md

diff --git a/content/runbooks/node/NodeMemoryMajorPagesFaults.md b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
new file mode 100644
index 0000000..96dfa73
--- /dev/null
+++ b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
@@ -0,0 +1,27 @@
+---
+title: NodeMemoryMajorPagesFaults
+---
+
+## Meaning
+Memory major pages are occurring at very high rate at {{ $labels.instance }}, 500 major page faults per second for the last 15 minutes, is currently at {{ printf "%.2f" $value }}.
+Please check that there is enough memory available at this instance.
+
+## Impact
+
+The high rate of memory major pages faults indicates potential issues with memory management on the instance, which could lead to degraded performance or even service disruptions.
+
+## Diagnosis
+
+1. **Check Memory Usage**: Review the memory usage statistics on the instance to determine if memory is being exhausted.
+2. **Identify Resource-Intensive Processes**: Identify any processes or applications that are consuming large amounts of memory.
+3. **Review System Logs**: Check system logs for any error messages related to memory allocation or paging.
+4. **Analyze Historical Data**: Review historical metrics data to identify any recent changes or trends in memory usage.
+5. **Check for Memory Leaks**: Investigate for any memory leaks in applications running on the instance.
+
+## Mitigation
+
+1. **Increase Memory**: Consider increasing the memory allocation for the instance to provide more resources for applications and processes.
+2. **Optimize Applications**: Optimize memory usage within applications to reduce the likelihood of memory exhaustion.
+3. **Restart Services**: If possible, restart any services or applications that are consuming excessive memory to free up resources.
+4. **Monitor and Tune**: Continuously monitor memory usage and tune system parameters as needed to ensure optimal performance.
+5. **Alerting**: Set up alerts to notify administrators when memory usage exceeds certain thresholds to proactively address potential issues.
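The **Check Memory Usage** step of the runbook introduced above can be scripted against the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable at `http://prometheus:9090` (a placeholder URL) and that node-exporter's `node_vmstat_pgmajfault` and `node_memory_MemAvailable_bytes` series are scraped for the affected instance; the instance label is likewise a placeholder.

```python
import requests

PROMETHEUS_URL = "http://prometheus:9090"  # assumption: replace with your Prometheus endpoint
INSTANCE = "node-1:9100"                   # assumption: the instance label from the firing alert


def instant_query(promql: str) -> float:
    """Run an instant query against the Prometheus HTTP API and return the first sample value."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": promql},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


# Major page faults per second over the last 5 minutes; the alert text refers to a 500/s threshold.
faults_per_second = instant_query(
    f'rate(node_vmstat_pgmajfault{{instance="{INSTANCE}"}}[5m])'
)

# Memory still available on the same instance, converted to GiB.
available_gib = instant_query(
    f'node_memory_MemAvailable_bytes{{instance="{INSTANCE}"}}'
) / 2**30

print(f"major page faults/s: {faults_per_second:.2f}, available: {available_gib:.2f} GiB")
```

A sustained fault rate combined with low available memory points at memory exhaustion; a high rate with plenty of free memory is more likely heavy file-backed I/O (for example, executables or mmapped files being re-read from disk).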
From 56f7236de3702482c30cc6eb872bf88df8bf6e6a Mon Sep 17 00:00:00 2001
From: 00aixxia00 <28895375+00aixxia00@users.noreply.github.com>
Date: Thu, 22 Feb 2024 11:14:58 +0100
Subject: [PATCH 2/4] Update NodeMemoryMajorPagesFaults.md

---
 content/runbooks/node/NodeMemoryMajorPagesFaults.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/content/runbooks/node/NodeMemoryMajorPagesFaults.md b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
index 96dfa73..866b803 100644
--- a/content/runbooks/node/NodeMemoryMajorPagesFaults.md
+++ b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
@@ -1,5 +1,6 @@
 ---
 title: NodeMemoryMajorPagesFaults
+weight: 20
 ---
 
 ## Meaning

From be53dc9d5509e04e903053c4fad8780c2a0e33a7 Mon Sep 17 00:00:00 2001
From: 00aixxia00 <28895375+00aixxia00@users.noreply.github.com>
Date: Thu, 22 Feb 2024 11:21:00 +0100
Subject: [PATCH 3/4] Update NodeMemoryMajorPagesFaults.md

---
 content/runbooks/node/NodeMemoryMajorPagesFaults.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/content/runbooks/node/NodeMemoryMajorPagesFaults.md b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
index 866b803..1e5644c 100644
--- a/content/runbooks/node/NodeMemoryMajorPagesFaults.md
+++ b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
@@ -3,7 +3,10 @@ title: NodeMemoryMajorPagesFaults
 weight: 20
 ---
 
+# NodeMemoryMajorPagesFaults
+
 ## Meaning
+
 Memory major pages are occurring at very high rate at {{ $labels.instance }}, 500 major page faults per second for the last 15 minutes, is currently at {{ printf "%.2f" $value }}.
 Please check that there is enough memory available at this instance.
 

From 5c575bf4bdb7a2e64008100d75bf32dfe1670463 Mon Sep 17 00:00:00 2001
From: 00aixxia00 <28895375+00aixxia00@users.noreply.github.com>
Date: Mon, 26 Feb 2024 16:37:34 +0100
Subject: [PATCH 4/4] Update NodeMemoryMajorPagesFaults.md

---
 .../node/NodeMemoryMajorPagesFaults.md | 31 +++++++++++--------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/content/runbooks/node/NodeMemoryMajorPagesFaults.md b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
index 1e5644c..269a7ed 100644
--- a/content/runbooks/node/NodeMemoryMajorPagesFaults.md
+++ b/content/runbooks/node/NodeMemoryMajorPagesFaults.md
@@ -7,25 +7,30 @@ weight: 20
 
 ## Meaning
 
-Memory major pages are occurring at very high rate at {{ $labels.instance }}, 500 major page faults per second for the last 15 minutes, is currently at {{ printf "%.2f" $value }}.
-Please check that there is enough memory available at this instance.
+The `NodeMemoryMajorPagesFaults` alert is triggered when a Kubernetes node experiences a significant number of major page faults, indicating issues with memory access. This could be due to excessive swapping of memory pages to the swap area or general memory problems.
+
+As shown here:
+[Kubernetes-Mixin](https://monitoring.mixins.dev/node-exporter/)
+> Memory major pages are occurring at very high rate at {{ $labels.instance }}, 500 major page faults per second for the last 15 minutes, is currently at {{ printf "%.2f" $value }}.
+>
+> Please check that there is enough memory available at this instance.
 
 ## Impact
 
-The high rate of memory major pages faults indicates potential issues with memory management on the instance, which could lead to degraded performance or even service disruptions.
+- Possible performance degradation for applications running on the affected Kubernetes node.
+- Increased latency for memory accesses.
+- Risk of application crashes or errors due to memory overload.
 
 ## Diagnosis
 
-1. **Check Memory Usage**: Review the memory usage statistics on the instance to determine if memory is being exhausted.
-2. **Identify Resource-Intensive Processes**: Identify any processes or applications that are consuming large amounts of memory.
-3. **Review System Logs**: Check system logs for any error messages related to memory allocation or paging.
-4. **Analyze Historical Data**: Review historical metrics data to identify any recent changes or trends in memory usage.
-5. **Check for Memory Leaks**: Investigate for any memory leaks in applications running on the instance.
+1. Check the utilization of physical memory (RAM) and swap space on the affected Kubernetes node.
+2. Examine the memory profiles of running applications to determine which processes are consuming memory.
+3. Monitor memory usage over time to identify trends and peak loads.
+
 
 ## Mitigation
 
-1. **Increase Memory**: Consider increasing the memory allocation for the instance to provide more resources for applications and processes.
-2. **Optimize Applications**: Optimize memory usage within applications to reduce the likelihood of memory exhaustion.
-3. **Restart Services**: If possible, restart any services or applications that are consuming excessive memory to free up resources.
-4. **Monitor and Tune**: Continuously monitor memory usage and tune system parameters as needed to ensure optimal performance.
-5. **Alerting**: Set up alerts to notify administrators when memory usage exceeds certain thresholds to proactively address potential issues.
+1. Optimize the resource utilization of running applications by stopping unnecessary processes or adjusting their resource requirements.
+2. Review Kubernetes resource requests and limits configuration to ensure they match the actual requirements of the applications.
+3. Scale the resources of the Kubernetes node as needed by adding additional memory or increasing node capacity.
+4. Optimize swap configuration to ensure efficient utilization while minimizing the impact of swapping on performance.
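To support mitigation step 2 of the final runbook, the sketch below uses the official Kubernetes Python client to list the pods scheduled on the affected node and flag containers running without memory requests or limits. The node name is a placeholder, and the script assumes a working kubeconfig.

```python
from kubernetes import client, config

NODE_NAME = "worker-1"  # assumption: the node behind the alert's instance label

# Load credentials from ~/.kube/config; inside a cluster use config.load_incluster_config() instead.
config.load_kube_config()
v1 = client.CoreV1Api()

# Only running pods scheduled on the affected node.
pods = v1.list_pod_for_all_namespaces(
    field_selector=f"spec.nodeName={NODE_NAME},status.phase=Running"
)

for pod in pods.items:
    for container in pod.spec.containers:
        resources = container.resources or client.V1ResourceRequirements()
        mem_request = (resources.requests or {}).get("memory")
        mem_limit = (resources.limits or {}).get("memory")
        if mem_request is None or mem_limit is None:
            print(
                f"{pod.metadata.namespace}/{pod.metadata.name} "
                f"container={container.name}: "
                f"memory request={mem_request}, limit={mem_limit}"
            )
```

Containers reported by this check are the first candidates for adding or tightening memory requests and limits, so the scheduler can keep the node from being overcommitted.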