
Commit 68526ca

Fix applies_to in 2 troubleshooting documents: snapshot failures and restore from snapshot (#4487)
updated applies_to tags and refined content of two troubleshoot docs

---------

Co-authored-by: Vlada Chirmicci <vlada.chirmicci@elastic.co>
1 parent 5e20b22 commit 68526ca

4 files changed: +32 -313 lines changed

troubleshoot/elasticsearch/repeated-snapshot-failures.md

Lines changed: 24 additions & 91 deletions
@@ -4,11 +4,6 @@ mapped_pages:
 - https://www.elastic.co/guide/en/elasticsearch/reference/current/repeated-snapshot-failures.html
 applies_to:
 stack:
-deployment:
-eck:
-ess:
-ece:
-self:
 products:
 - id: elasticsearch
 ---
@@ -17,105 +12,44 @@ products:
 
 Repeated snapshot failures are usually an indicator of a problem with your deployment. Continuous failures of automated snapshots can leave a deployment without recovery options in cases of data loss or outages.
 
-Elasticsearch keeps track of the number of repeated failures when executing automated snapshots. If an automated snapshot fails too many times without a successful execution, the health API will report a warning. The number of repeated failures before reporting a warning is controlled by the [`slm.health.failed_snapshot_warn_threshold`](elasticsearch://reference/elasticsearch/configuration-reference/snapshot-restore-settings.md#slm-health-failed-snapshot-warn-threshold) setting.
+:::{include} /deploy-manage/_snippets/autoops-callout-with-ech.md
+:::
 
-In the event that an automated {{slm}} policy execution is experiencing repeated failures, follow these steps to get more information about the problem:
+{{es}} keeps track of the number of repeated failures when executing automated snapshots with [{{slm}} ({{slm-init}})](/deploy-manage/tools/snapshot-and-restore/create-snapshots.md#automate-snapshots-slm) policies. If an automated snapshot fails too many times without a successful execution, the health API reports a warning. The number of repeated failures before reporting a warning is controlled by the [`slm.health.failed_snapshot_warn_threshold`](elasticsearch://reference/elasticsearch/configuration-reference/snapshot-restore-settings.md#slm-health-failed-snapshot-warn-threshold) setting.
 
-:::::::{tab-set}
+## Review snapshot policy failures
 
-::::::{tab-item} {{ech}}
-In order to check the status of failing {{slm}} policies we need to go to Kibana and retrieve the [Snapshot Lifecycle Policy information](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle).
+If an automated {{slm-init}} policy execution is experiencing repeated failures, follow these steps to get more information about the problem:
 
-**Use {{kib}}**
-
-1. Log in to the [{{ecloud}} console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Hosted deployments** panel, click the name of your deployment.
+:::::::{tab-set}
 
-::::{note}
-If the name of your deployment is disabled your {{kib}} instances might be unhealthy, in which case contact [Elastic Support](https://support.elastic.co). If your deployment doesn’t include {{kib}}, all you need to do is [enable it first](../../deploy-manage/deploy/elastic-cloud/access-kibana.md).
-::::
+::::::{tab-item} Using {{kib}}
+In {{kib}}, you can view all configured {{slm-init}} policies and review their status and execution history. If the UI does not provide sufficient details about the failure, use the Console to retrieve the [snapshot policy information](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) with the {{es}} API.
 
-3. Open your deployment’s side navigation menu (placed under the Elastic logo in the upper left corner) and go to **Dev Tools > Console**.
+1. Go to **Snapshot and Restore > Policies** to see the list of configured policies. You can find the **Snapshot and Restore** management page using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md).
 
-:::{image} /troubleshoot/images/elasticsearch-reference-kibana-console.png
+:::{image} /troubleshoot/images/elasticsearch-reference-slm-policies.png
 :alt: {{kib}} Console
 :screenshot:
 :::
 
-4. [Retrieve](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) the {{slm}} policy:
-
-```console
-GET _slm/policy/<affected-policy-name>
-```
-
-The response will look like this:
-
-```console-result
-{
-"affected-policy-name": { <1>
-"version": 1,
-"modified_date": "2099-05-06T01:30:00.000Z",
-"modified_date_millis": 4081757400000,
-"policy" : {
-"schedule": "0 30 1 * * ?",
-"name": "<daily-snap-{now/d}>",
-"repository": "my_repository",
-"config": {
-"indices": ["data-*", "important"],
-"ignore_unavailable": false,
-"include_global_state": false
-},
-"retention": {
-"expire_after": "30d",
-"min_count": 5,
-"max_count": 50
-}
-},
-"last_success" : {
-"snapshot_name" : "daily-snap-2099.05.30-tme_ivjqswgkpryvnao2lg",
-"start_time" : 4083782400000,
-"time" : 4083782400000
-},
-"last_failure" : { <2>
-"snapshot_name" : "daily-snap-2099.06.16-ywe-kgh5rfqfrpnchvsujq",
-"time" : 4085251200000, <3>
-"details" : """{"type":"snapshot_exception","reason":"[daily-snap-2099.06.16-ywe-kgh5rfqfrpnchvsujq] failed to create snapshot successfully, 5 out of 149 total shards failed"}""" <4>
-},
-"stats": {
-"policy": "daily-snapshots",
-"snapshots_taken": 0,
-"snapshots_failed": 0,
-"snapshots_deleted": 0,
-"snapshot_deletion_failures": 0
-},
-"next_execution": "2099-06-17T01:30:00.000Z",
-"next_execution_millis": 4085343000000
-}
-}
-```
-
-1. The affected snapshot lifecycle policy.
-2. The information about the last failure for the policy.
-3. The time when the failure occurred in millis. Use the `human=true` request parameter to see a formatted timestamp.
-4. Error details containing the reason for the snapshot failure.
-
-
-Snapshots can fail for a variety reasons. If the failures are due to configuration errors, consult the documentation for the repository that the automated snapshots are using. Refer to the [guide on managing repositories in ECE](/deploy-manage/tools/snapshot-and-restore/cloud-enterprise.md) if you are using such a deployment.
-
+2. The policies table lists all configured policies. Click on any of the policies to review the details and execution history.
 
-One common failure scenario is repository corruption. This occurs most often when multiple instances of {{es}} write to the same repository location. There is a [separate troubleshooting guide](diagnosing-corrupted-repositories.md) to fix this problem.
+3. To get more detailed information about the failure, open {{kib}} **Dev Tools > Console**. You can find the **Console** using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md).
 
-In the event that snapshots are failing for other reasons check the logs on the elected master node during the snapshot execution period for more information.
+Once the Console is open, execute the steps described in the **Using the {{es}} API** tab to retrieve the affected {{slm-init}} policy information.
 ::::::
 
-::::::{tab-item} Self-managed
-[Retrieve](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) the {{slm}} policy:
+::::::{tab-item} Using the {{es}} API
+The following step can be run using either [{{kib}} console](/explore-analyze/query-filter/tools/console.md) or direct [{{es}} API](elasticsearch://reference/elasticsearch/rest-apis/index.md) calls.
+
+[Retrieve the affected {{slm-init}} policy](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle):
 
 ```console
 GET _slm/policy/<affected-policy-name>
 ```
 
-The response will look like this:
+The response looks like this:
 
 ```console-result
 {
@@ -166,15 +100,14 @@ The response will look like this:
 3. The time when the failure occurred in millis. Use the `human=true` request parameter to see a formatted timestamp.
 4. Error details containing the reason for the snapshot failure.
 
+::::::
 
-Snapshots can fail for a variety reasons. If the failures are due to configuration errors, consult the documentation for the repository that the automated snapshots are using.
+:::::::
 
-One common failure scenario is repository corruption. This occurs most often when multiple instances of {{es}} write to the same repository location. There is a [separate troubleshooting guide](diagnosing-corrupted-repositories.md) to fix this problem.
+## Possible causes
 
-In the event that snapshots are failing for other reasons check the logs on the elected master node during the snapshot execution period for more information.
-::::::
+Snapshots can fail for a variety of reasons. If the failures are due to configuration errors, consult the documentation for the repository type that the snapshot policy is using. Refer to the [guide on managing repositories in ECE](/deploy-manage/tools/snapshot-and-restore/cloud-enterprise.md) if you are using an Elastic Cloud Enterprise deployment.
 
-:::::::
+One common failure scenario is repository corruption. This occurs most often when multiple instances of {{es}} write to the same repository location. There is a [separate troubleshooting guide](diagnosing-corrupted-repositories.md) to fix this problem.
 
-:::{include} /deploy-manage/_snippets/autoops-callout-with-ech.md
-:::
+If snapshots are failing for other reasons check the logs on the elected master node during the snapshot execution period for more information.
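
As a rough illustration of how the `last_failure` fields in the sample response could be read outside of {{kib}}, here is a minimal Python sketch. It assumes the `GET _slm/policy/<affected-policy-name>` response has already been fetched and decoded into a dict, and that `details` is a JSON-encoded string as in the sample. The function name `summarize_last_failure` and the shape of the returned summary are hypothetical and are not part of any Elastic client library.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

# Epoch origin used to convert the millisecond timestamps in the response.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_last_failure(response: dict, policy_name: str) -> Optional[dict]:
    """Summarize a policy's last failure from a decoded SLM policy response.

    Hypothetical helper: returns None when the policy is missing or has no
    `last_failure` object.
    """
    failure = response.get(policy_name, {}).get("last_failure")
    if failure is None:
        return None

    # `time` is epoch milliseconds (callout 3 in the sample response).
    failed_at = EPOCH + timedelta(milliseconds=failure["time"])

    # `details` is assumed to be a JSON string; fall back to the raw value
    # if it cannot be parsed or has no `reason` field.
    details = failure.get("details", "")
    reason = details
    if isinstance(details, str):
        try:
            parsed = json.loads(details)
            if isinstance(parsed, dict):
                reason = parsed.get("reason", details)
        except json.JSONDecodeError:
            pass

    return {
        "snapshot_name": failure.get("snapshot_name"),
        "failed_at": failed_at.isoformat(),
        "reason": reason,
    }
```

Passing the decoded response together with the policy name returns the failed snapshot name, an ISO 8601 timestamp, and the failure reason, which is roughly what the `human=true` request parameter and the `details` field provide in the Console.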
