@@ -60,6 +60,13 @@ $$$ech-logging-and-monitoring-production$$$

$$$ech-logging-and-monitoring-retention$$$

% Please leave the AutoOps banner in the final content of this page

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::


**This page is a work in progress.** The documentation team is working to combine content pulled from the following pages:

* [/raw-migrated-files/cloud/cloud-heroku/ech-monitoring.md](/raw-migrated-files/cloud/cloud-heroku/ech-monitoring.md)
4 changes: 4 additions & 0 deletions raw-migrated-files/cloud/cloud/ec-saas-metrics-accessing.md
@@ -4,6 +4,10 @@ Cluster performance metrics are available directly in the [{{ecloud}} Console](h

For advanced views or production monitoring, [enable logging and monitoring](../../../deploy-manage/monitor/stack-monitoring/elastic-cloud-stack-monitoring.md). The monitoring application provides more advanced views for Elasticsearch and JVM metrics, and includes a configurable retention period.

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::

To access cluster performance metrics:

1. Log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
6 changes: 5 additions & 1 deletion troubleshoot/monitoring/cluster-response-time.md
@@ -19,4 +19,8 @@ Memory pressure is not the culprit. The **Memory Pressure per Node** metric is a

So what caused the sudden increase in response times? The key to the puzzle lies in the **Number of Requests** metric, which indicates the number of requests that a cluster receives per second. Beginning shortly before 13:32, there was a substantial increase in the number of user requests per second. The number of requests per second continued to rise until the requests began to plateau as your cluster reached its maximum throughput, which in turn caused response times to rise. The number of requests remained at a high level for approximately five minutes, until they started to drop off again around 13:40. Overall, the sustained increase of user requests lasted a bit over 10 minutes, consistent with the slowdown you observed.

This cluster was sized to handle a certain number of user requests. As the user requests exceeded the maximum throughput that a cluster of this size could sustain, response times increased. To avoid such a slowdown, you either need to control the volume of user requests that reaches the {{es}} cluster or you need to size your cluster to be able to accommodate a sudden increase in user requests.
This cluster was sized to handle a certain number of user requests. As the user requests exceeded the maximum throughput that a cluster of this size could sustain, response times increased. To avoid such a slowdown, you either need to control the volume of user requests that reaches the {{es}} cluster or you need to size your cluster to be able to accommodate a sudden increase in user requests.

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::
4 changes: 4 additions & 0 deletions troubleshoot/monitoring/deployment-health-warnings.md
@@ -27,3 +27,7 @@ If multiple health warnings appear for one of your deployments, or if your deplo
**Warning about system changes**

If the warning refers to a system change, check the deployment’s [Activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) page.

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::
4 changes: 4 additions & 0 deletions troubleshoot/monitoring/high-availability.md
@@ -22,3 +22,7 @@ Cluster performance metrics are shown per node and are color-coded to indicate w
This CPU usage graph indicates that your cluster is load-balancing between the nodes in the different availability zones as designed, but the workload is too high to be able to handle the loss of an availability zone. For a cluster to be able to handle the failure of a node, it should be considered at capacity when it uses 50% of its resources. In this case, two of the nodes are already maxed out and the third one is around 50%. If any one of the three nodes were to fail, the volume of user requests would overwhelm the remaining nodes. On smaller clusters up to and including 8 GB of RAM, CPU boosting can temporarily relieve some of the pressure, but you should not rely on this feature for high availability. On larger clusters, CPU boosting is not available.

Even if your cluster is performing well, you still need to make sure that there is sufficient spare capacity to deal with the outage of an entire availability zone. For this cluster to remain highly available at all times, you either need to increase its size or reduce its workload.
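
The 50% guideline can be checked with a little arithmetic. The following is a minimal sketch, not an Elastic tool: it assumes equally sized nodes and that the load of a failed node is spread evenly across the survivors, and the function names `utilization_after_node_loss` and `survives_any_single_loss` are hypothetical.

```python
def utilization_after_node_loss(utilizations, failed):
    """Return per-node CPU utilization (%) after one node fails.

    Assumption: nodes are equally sized and the failed node's load is
    redistributed evenly across the surviving nodes.
    """
    survivors = [u for i, u in enumerate(utilizations) if i != failed]
    if not survivors:
        raise ValueError("no surviving nodes")
    extra = utilizations[failed] / len(survivors)
    return [u + extra for u in survivors]


def survives_any_single_loss(utilizations, limit=100.0):
    """True if no surviving node exceeds `limit` after any single node failure."""
    return all(
        max(utilization_after_node_loss(utilizations, i)) <= limit
        for i in range(len(utilizations))
    )


# The scenario described above: two nodes maxed out, the third around 50%.
print(utilization_after_node_loss([100.0, 100.0, 50.0], failed=2))
print(survives_any_single_loss([100.0, 100.0, 50.0]))
# Two nodes at 50% each can absorb the loss of either one.
print(survives_any_single_loss([50.0, 50.0]))
```

Under these assumptions, losing even the least loaded node in the example pushes the two remaining nodes past 100%, which is why the cluster needs more capacity or less workload to stay highly available.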

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::
3 changes: 3 additions & 0 deletions troubleshoot/monitoring/high-memory-pressure.md
@@ -29,6 +29,9 @@ In our example, the **Index Response Times** metric shows that high memory press

If the performance impact from high memory pressure is not acceptable, you need to increase the cluster size or reduce the workload.

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::

## Increase the deployment size [ec_increase_the_deployment_size]

3 changes: 3 additions & 0 deletions troubleshoot/monitoring/node-bootlooping.md
@@ -38,6 +38,9 @@ Following are some frequent causes of a failed configuration change:
If you’re unable to remediate the failing plan’s root cause, you can attempt to reset the deployment to the latest successful {{es}} configuration by performing a [no-op plan](/troubleshoot/monitoring/deployment-health-warnings.md). For an example, see this [video walkthrough](https://www.youtube.com/watch?v=8MnXZ9egBbQ).
:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::
## Secure settings [ec-config-change-errors-secure-settings]
4 changes: 4 additions & 0 deletions troubleshoot/monitoring/performance.md
@@ -18,3 +18,7 @@ When you look in the **Cluster Performance Metrics** section of the [{{ecloud}}
Between just after 00:10 and 00:20, excessively high CPU usage consumes all CPU credits until no more credits are available. CPU credits enable boosting the assigned CPU resources temporarily to improve performance on smaller clusters up to and including 8 GB of RAM when it is needed most, but CPU credits are by their nature limited. You accumulate CPU credits when you use less than your assigned share of CPU resources, and you consume credits when you use more CPU resources than assigned. As you max out your CPU resources, CPU credits permit your cluster to consume more than 100% of the assigned resources temporarily, which explains why CPU usage exceeds 100%, with usage peaks that reach well over 400% for one node. As CPU credits are depleted, CPU usage gradually drops until it returns to 100% at 00:30 when no more CPU credits are available. You can also notice that after 00:30 credits gradually begin to accumulate again.

If you need your cluster to be able to sustain a certain level of performance, you cannot rely on CPU boosting to handle the workload except temporarily. To ensure that performance can be sustained, consider increasing the size of your cluster.
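
The credit behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual {{ecloud}} accounting: credits are tracked in percentage points of the assigned share per interval, the cap `max_credits` is made up, and the function name `simulate_cpu_credits` is hypothetical.

```python
def simulate_cpu_credits(demand, start_credits=0.0, max_credits=1000.0):
    """Toy CPU credit model.

    `demand` is the CPU wanted in each interval, as a percentage of the
    assigned share. Returns a list of (usage, credits) tuples, one per interval.
    Assumption: unused share is banked one-for-one as credits, and boosting
    above 100% spends credits one-for-one until none are left.
    """
    credits = start_credits
    history = []
    for wanted in demand:
        if wanted <= 100.0:
            # Below the assigned share: run at demand and bank the unused part.
            usage = wanted
            credits = min(max_credits, credits + (100.0 - wanted))
        else:
            # Above the assigned share: boost only as far as credits allow.
            boost = min(wanted - 100.0, credits)
            usage = 100.0 + boost
            credits -= boost
        history.append((usage, credits))
    return history


# Two quiet intervals bank credits; a sustained spike then spends them
# and usage falls back to 100% once the credits are gone.
for usage, credits in simulate_cpu_credits([50.0, 50.0, 400.0, 400.0]):
    print(f"usage={usage:.0f}% credits={credits:.0f}")
```

In this model the spike is served above 100% only while credits last, which mirrors why boosting cannot sustain a workload that permanently exceeds the assigned CPU.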

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::
5 changes: 4 additions & 1 deletion troubleshoot/monitoring/unavailable-nodes.md
@@ -36,9 +36,12 @@ This section provides a list of common symptoms and possible actions that you ca
Some actions described here, such as stopping indexing or Machine Learning jobs, are temporary remediations intended to get your cluster into a state where you can make configuration changes to resolve the issue.
::::


For production deployments, we recommend setting up a dedicated monitoring cluster to collect metrics and logs, troubleshooting views, and cluster alerts.

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::

If your issue is not addressed here, then [contact Elastic support for help](/troubleshoot/index.md).

## Full disk on single-node deployment [ec-single-node-deployment-disk-used]
6 changes: 5 additions & 1 deletion troubleshoot/monitoring/unavailable-shards.md
@@ -1,7 +1,7 @@
---
navigation_title: "Unavailable shards"
mapped_urls:
- https://www.elastic.co/guide/en/cloud/current/ec-scenario_why_are_shards_unavailable.html
-
- https://www.elastic.co/guide/en/cloud-heroku/current/echscenario_why_are_shards_unavailable.html
- https://www.elastic.co/guide/en/cloud-heroku/current/ech-analyze_shards_with-api.html
- https://www.elastic.co/guide/en/cloud-heroku/current/ech-analyze_shards_with-kibana.html
@@ -32,6 +32,10 @@ If a cluster has unassigned shards, you might see an error message such as this
:alt: Unhealthy deployment error message
:::

:::{important}
If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
:::

If your issue is not addressed here, then [contact Elastic support for help](/troubleshoot/index.md).

## Analyze unassigned shards using the {{es}} API [ec-analyze_shards_with-api]