`deploy-manage/tools/snapshot-and-restore/create-snapshots.md`
products:
  - id: elasticsearch
---
# Create, monitor and delete snapshots [snapshots-take-snapshot]

This guide shows you how to create, monitor and delete snapshots of a running cluster. You can later [restore a snapshot](restore-snapshot.md) to recover or transfer its data.

In this guide, you’ll learn how to:
We recommend you include retention rules in your {{slm-init}} policy to delete snapshots.

A snapshot repository can safely scale to thousands of snapshots. However, to manage its metadata, a large repository requires more memory on the master node. Retention rules ensure a repository’s metadata doesn’t grow to a size that could destabilize the master node.
### Update an existing {{slm-init}} policy [update-slm-policy]

You can update an existing {{slm-init}} policy after it's created. To manage {{slm-init}} in {{kib}}, go to the main menu and click **Stack Management** > **Snapshot and Restore** > **Policies**, click **Edit**, and make the desired change.
For example, you can change the schedule or snapshot retention settings.
You can also update an {{slm-init}} policy using the [{{slm-init}} APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-slm), as described in [Create an {{slm-init}} policy](#create-slm-policy).
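As a sketch, the following request updates the `nightly-snapshots` policy used elsewhere in this guide by resubmitting its full definition with the changed fields. The schedule, repository name, and retention values shown here are illustrative:

```console
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 2 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
```

Note that the update replaces the entire policy definition, so include every field you want to keep, not only the ones you change.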
## Manually create a snapshot [manually-create-snapshot]
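For example, assuming a registered repository named `my_repository`, you can take a snapshot with the create snapshot API. The snapshot name here is illustrative, and the `wait_for_completion` query parameter makes the request block until the snapshot finishes:

```console
PUT _snapshot/my_repository/my_snapshot_1?wait_for_completion=true
```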
## Delete or cancel a snapshot [delete-snapshot]
To delete a snapshot in {{kib}}, go to the **Snapshots** page and click the trash icon under the **Actions** column. To delete multiple snapshots at once, select the snapshots from the list and then click **Delete snapshots**. You can also use the [delete snapshot API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-snapshot-delete).
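For example, a single snapshot can be deleted with the delete snapshot API. The repository and snapshot names are illustrative:

```console
DELETE _snapshot/my_repository/my_snapshot_1
```

Deleting a snapshot that is still in progress cancels it, which is also how you stop a snapshot you no longer want to complete.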
# Understanding node moves and system maintenance [node-moves-system-maintenance]

To ensure that your deployment nodes are located on healthy hosts, Elastic vacates nodes to perform essential system maintenance or to remove a host with hardware issues from service. These tasks cannot be skipped or delayed.
You can subscribe to the [status page](https://status.elastic.co/) to be notified about planned maintenance or actions that have been taken to respond to incidents.
If events on your deployment don’t correlate to any items listed on the status page, the events are due to minor essential maintenance performed on only a subset of {{ech}} deployments.
When {{ech}} undergoes system maintenance, the following message appears on the [activity page](../../deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md):
```sh
Move nodes off of allocator [allocator_id] due to essential system maintenance.
```
This page explains the causes and impact of node moves and system maintenance, and how to mitigate any possible risks to your deployment.
::::{note}
You can also [configure email notifications](#configure-email-notification) to be alerted when this situation occurs.
::::
## Possible causes [possible-cause]
Potential causes of system maintenance include, but are not limited to, the following:
* A host where the Cloud Service Provider (CSP), like AWS, GCP, or Azure, has reported upcoming hardware deprecation or identified issues requiring remediation.
* Abrupt host termination by the CSP due to underlying infrastructure problems.
* Mandatory host operating system (OS) patching or upgrades for security or compliance reasons.
* Other scheduled maintenance announced on the [Elastic status page](https://status.elastic.co/).
## Behavior difference [behavior-difference]

Depending on the cause, the maintenance behavior may differ.
* During planned operations, such as hardware upgrades or host patches, the system attempts to gracefully move the node to another host before shutting down the original one. This process allows shard relocation to complete ahead of time, minimizing any potential disruption.
* If a node’s host experiences an unexpected outage, the system automatically vacates the node and displays a related `Don't attempt to gracefully move shards` message on the [activity page](../../deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md), skipping the check that ensures the node’s shards have been moved before shutdown.
## Impact and mitigation [impact-mitigation]

The following sections describe how your deployment behaves during maintenance, and how to reduce risks of data loss.
### Service availability

The system will automatically try to recover the vacated node’s data from replicas or snapshots. If your cluster has [high availability (HA)](/deploy-manage/deploy/elastic-cloud/elastic-cloud-hosted-planning.md#ec-ha) configured, all search and indexing requests should continue to work within the reduced capacity until the node is replaced.

Overall, having replicas and multiple availability zones helps minimize service interruption.
### Data resiliency

The system maintenance process always attempts to recover the vacated node's data from replicas or snapshots. However, if the deployment is not configured with high availability, the maintenance process might not be able to recover the data from the vacated node.

Configuring multiple availability zones helps your deployment remain available for indexing and search requests if one zone becomes unavailable. However, this alone does not guarantee data availability. If an index has no replica shards and its primary shard is located on a node that must be vacated, data loss might occur if the system is unable to move the node gracefully during the maintenance activity.
To minimize this risk and keep your data accessible, ensure that your deployment follows [high availability best practices](/deploy-manage/deploy/elastic-cloud/elastic-cloud-hosted-planning.md#ec-ha):

- Use at least two availability zones for production systems, and three for mission-critical systems.
- Configure one or more [replica shards](/deploy-manage/distributed-architecture/clusters-nodes-shards.md) for each index, except for searchable snapshot indices.
As long as these recommendations are followed, system maintenance processes should not impact the availability of the data in the deployment.
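As a sketch of the replica recommendation above, you can set the replica count for an existing index with the update index settings API. The index name here is illustrative:

```console
PUT my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```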
### Performance stability

The performance impact of system maintenance depends on how well the deployment is sized. Well-provisioned deployments with sufficient buffer capacity typically remain unaffected, while deployments already operating near their limits might experience slowdowns, or even intermittent request failures, during node vacating.

High availability assumes not just redundancy in data and zones, but also the ability to absorb the loss or restart of a node without service disruption. To learn more, refer to [Plan for production](/deploy-manage/deploy/elastic-cloud/elastic-cloud-hosted-planning.md#ec-ha).

At a minimum, you should size your deployment to tolerate the temporary loss of one node in order to avoid single points of failure and ensure proper HA. For critical systems, ensure that the deployment can continue operating even in the event of losing an entire availability zone.
::::{admonition} Availability zones and sizing recommendations
Increasing the number of zones should not be used to add more resources. The concept of zones is meant for high availability (two zones) and fault tolerance (three zones), but neither will work if the cluster relies on the resources from those zones to be operational.

You should scale up the resources within a single zone until the cluster can take the full load, adding some buffer to be prepared for a peak of requests. You should then scale out by adding additional zones depending on your requirements: two zones for high availability, three zones for fault tolerance.
::::
## Configure email notifications [configure-email-notification]

You can configure email alerts for system maintenance by following these steps:
1. Enable [Stack monitoring](/deploy-manage/monitor/stack-monitoring/ece-ech-stack-monitoring.md#enable-logging-and-monitoring-steps) (logs and metrics) on your deployment. Only metrics collection is required for these notifications to work.
2. In the deployment used as the destination of Stack monitoring:
    * (Optional) Configure an email [connector](kibana://reference/connectors-kibana/email-action-type.md). If you prefer, use the preconfigured `Elastic-Cloud-SMTP` email connector.
    * Edit the rule **Cluster alerting** > **{{es}} nodes changed** and select the email connector.

::::{note}
If you have only one master node in your cluster, no notification will be sent during the master node vacate. {{kib}} needs to communicate with the master node in order to send a notification. You can avoid this by shipping your deployment metrics to a dedicated monitoring cluster when you enable logging and monitoring.
::::