Commit a842bc9

[DOCS] Enhance troubleshooting high cpu page. Opster migration (#909)
1 parent 0ec6758 commit a842bc9

1 file changed: +52 -10 lines changed

troubleshoot/elasticsearch/high-cpu-usage.md

Lines changed: 52 additions & 10 deletions
@@ -19,8 +19,6 @@ products:

If a thread pool is depleted, {{es}} will [reject requests](rejected-requests.md) related to the thread pool. For example, if the `search` thread pool is depleted, {{es}} will reject search requests until more threads are available.

-You might experience high CPU usage if a [data tier](../../manage-data/lifecycle/data-tiers.md), and therefore the nodes assigned to that tier, is experiencing more traffic than other tiers. This imbalance in resource utilization is also known as [hot spotting](hotspotting.md).
-
::::{tip}
If you're using {{ech}}, you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. For more information, refer to [](/deploy-manage/monitor/autoops.md).
::::
@@ -29,7 +27,7 @@ If you're using {{ech}}, you can use AutoOps to monitor your cluster. AutoOps si

## Diagnose high CPU usage [diagnose-high-cpu-usage]

-**Check CPU usage**
+### Check CPU usage [check-cpu-usage]

You can check the CPU usage per node using the [cat nodes API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-nodes):

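For reference, a minimal cat nodes request that sorts by CPU so the busiest nodes appear first (the `v` flag adds column headers; the exact example on the page may differ):

```console
GET _cat/nodes?v=true&s=cpu:desc
```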
@@ -64,7 +62,7 @@ To track CPU usage over time, we recommend enabling monitoring:
::::::

:::::::
-**Check hot threads**
+### Check hot threads [check-hot-threads]

If a node has high CPU usage, use the [nodes hot threads API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-nodes-hot-threads) to check for resource-intensive threads running on the node.

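For example, a minimal request that samples hot threads on every node (append a node ID, as in `_nodes/<node_id>/hot_threads`, to target a single node):

```console
GET _nodes/hot_threads
```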
@@ -79,17 +77,61 @@ This API returns a breakdown of any hot threads in plain text. High CPU usage fr

The following tips outline the most common causes of high CPU usage and their solutions.

-**Scale your cluster**
+### Check JVM garbage collection [check-jvm-garbage-collection]
+
+High CPU usage is often caused by excessive JVM garbage collection (GC) activity, which typically arises from configuration problems or from inefficient queries that drive up heap memory usage.
+
+For optimal JVM performance, garbage collection should meet these criteria:
+
+| GC type  | Completion time | Frequency            |
+|----------|-----------------|----------------------|
+| Young GC | <50ms           | ~once per 10 seconds |
+| Old GC   | <1s             | ≤once per 10 minutes |
+
+Excessive JVM garbage collection usually indicates high heap memory usage; one way to check GC activity is sketched after this list. Common reasons for increased heap memory usage include:
+
+* Oversharding of indices
+* Very large aggregation queries
+* Excessively large bulk indexing requests
+* Inefficient or incorrect mapping definitions
+* Improper heap size configuration
+* Misconfiguration of the JVM new generation ratio (`-XX:NewRatio`)
+
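To compare GC behavior against these thresholds, the nodes stats API reports per-collector counts and cumulative collection time; a minimal sketch:

```console
GET _nodes/stats/jvm
```

In the response, `jvm.gc.collectors.young` and `jvm.gc.collectors.old` expose `collection_count` and `collection_time_in_millis`, from which you can derive average GC duration and frequency per node.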
+### Hot spotting [high-cpu-usage-hot-spotting]
+
+You might experience high CPU usage on specific data nodes or an entire [data tier](/manage-data/lifecycle/data-tiers.md) if traffic isn’t evenly distributed. This is known as [hot spotting](hotspotting.md). Hot spotting commonly occurs when read or write applications don’t evenly distribute requests across nodes, or when indices receiving heavy write activity, such as indices in the hot tier, have their shards concentrated on just one or a few nodes.
+
+For details on diagnosing and resolving these issues, refer to [](hotspotting.md).
+
+### Oversharding [high-cpu-usage-oversharding]
+
+Oversharding occurs when a cluster has too many shards, often because the shards are smaller than optimal. While {{es}} doesn’t have a strict minimum shard size, an excessive number of small shards can negatively impact performance. Each shard consumes cluster resources because {{es}} must maintain metadata and manage shard states across all nodes.
+
+If you have too many small shards (a query for spotting them is sketched after this list), you can address this by doing the following:
+
+* Removing empty or unused indices.
+* Deleting or closing indices containing outdated or unnecessary data.
+* Reindexing smaller shards into fewer, larger shards to optimize cluster performance.
+
+If your shards are sized correctly but you are still experiencing oversharding, creating a more aggressive [index lifecycle management strategy](/manage-data/lifecycle/index-lifecycle-management.md) or deleting old indices can help reduce the number of shards.
+
+For more information, refer to [](/deploy-manage/production-guidance/optimize-performance/size-shards.md).
+
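To find the smallest shards in a cluster, you can sort cat shards output by store size, smallest first; a minimal sketch (the `h` column list is optional):

```console
GET _cat/shards?v=true&s=store:asc&h=index,shard,prirep,store
```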
+### Additional recommendations
+
+To further reduce CPU load or mitigate temporary spikes in resource usage, consider these steps:
+
+#### Scale your cluster [scale-your-cluster]

-Heavy indexing and search loads can deplete smaller thread pools. To better handle heavy workloads, add more nodes to your cluster or upgrade your existing nodes to increase capacity.
+Heavy indexing and search loads can deplete smaller thread pools. Add nodes or upgrade existing ones to handle increased indexing and search loads more effectively.

-**Spread out bulk requests**
+#### Spread out bulk requests [spread-out-bulk-requests]

-While more efficient than individual requests, large [bulk indexing](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) or [multi-search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-msearch) requests still require CPU resources. If possible, submit smaller requests and allow more time between them.
+Submit smaller [bulk indexing](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk-1) or multi-search requests, and space them out to avoid overwhelming thread pools.

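As an illustration of a small, self-contained batch (the index name and documents here are hypothetical), each bulk body pairs an action line with a document line:

```console
POST _bulk
{ "index": { "_index": "my-index" } }
{ "message": "document 1" }
{ "index": { "_index": "my-index" } }
{ "message": "document 2" }
```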
-**Cancel long-running searches**
+#### Cancel long-running searches [cancel-long-running-searches]

-Long-running searches can block threads in the `search` thread pool. To check for these searches, use the [task management API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-tasks).
+Regularly use the [task management API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-tasks-list) to identify and cancel searches that consume excessive CPU time.

```console
GET _tasks?actions=*search&detailed
```
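A task from the response can then be cancelled by its ID; a minimal sketch with a placeholder task ID (the format is `<node_id>:<task_number>`):

```console
POST _tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
```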
