diff --git a/troubleshoot/elasticsearch/repeated-snapshot-failures.md b/troubleshoot/elasticsearch/repeated-snapshot-failures.md index 7033bb23d7..5dbc4dda99 100644 --- a/troubleshoot/elasticsearch/repeated-snapshot-failures.md +++ b/troubleshoot/elasticsearch/repeated-snapshot-failures.md @@ -4,11 +4,6 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/repeated-snapshot-failures.html applies_to: stack: - deployment: - eck: - ess: - ece: - self: products: - id: elasticsearch --- @@ -17,105 +12,44 @@ products: Repeated snapshot failures are usually an indicator of a problem with your deployment. Continuous failures of automated snapshots can leave a deployment without recovery options in cases of data loss or outages. -Elasticsearch keeps track of the number of repeated failures when executing automated snapshots. If an automated snapshot fails too many times without a successful execution, the health API will report a warning. The number of repeated failures before reporting a warning is controlled by the [`slm.health.failed_snapshot_warn_threshold`](elasticsearch://reference/elasticsearch/configuration-reference/snapshot-restore-settings.md#slm-health-failed-snapshot-warn-threshold) setting. +:::{include} /deploy-manage/_snippets/autoops-callout-with-ech.md +::: -In the event that an automated {{slm}} policy execution is experiencing repeated failures, follow these steps to get more information about the problem: +{{es}} keeps track of the number of repeated failures when executing automated snapshots with [{{slm}} ({{slm-init}})](/deploy-manage/tools/snapshot-and-restore/create-snapshots.md#automate-snapshots-slm) policies. If an automated snapshot fails too many times without a successful execution, the health API reports a warning. The number of repeated failures before reporting a warning is controlled by the [`slm.health.failed_snapshot_warn_threshold`](elasticsearch://reference/elasticsearch/configuration-reference/snapshot-restore-settings.md#slm-health-failed-snapshot-warn-threshold) setting. -:::::::{tab-set} +## Review snapshot policy failures -::::::{tab-item} {{ech}} -In order to check the status of failing {{slm}} policies we need to go to Kibana and retrieve the [Snapshot Lifecycle Policy information](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle). +If an automated {{slm-init}} policy execution is experiencing repeated failures, follow these steps to get more information about the problem: -**Use {{kib}}** - -1. Log in to the [{{ecloud}} console](https://cloud.elastic.co?page=docs&placement=docs-body). -2. On the **Hosted deployments** panel, click the name of your deployment. +:::::::{tab-set} - ::::{note} - If the name of your deployment is disabled your {{kib}} instances might be unhealthy, in which case contact [Elastic Support](https://support.elastic.co). If your deployment doesn’t include {{kib}}, all you need to do is [enable it first](../../deploy-manage/deploy/elastic-cloud/access-kibana.md). - :::: +::::::{tab-item} Using {{kib}} +In {{kib}}, you can view all configured {{slm-init}} policies and review their status and execution history. If the UI does not provide sufficient details about the failure, use the Console to retrieve the [snapshot policy information](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) with the {{es}} API. -3. 
Open your deployment’s side navigation menu (placed under the Elastic logo in the upper left corner) and go to **Dev Tools > Console**. +1. Go to **Snapshot and Restore > Policies** to see the list of configured policies. You can find the **Snapshot and Restore** management page using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). - :::{image} /troubleshoot/images/elasticsearch-reference-kibana-console.png + :::{image} /troubleshoot/images/elasticsearch-reference-slm-policies.png :alt: {{kib}} Console :screenshot: ::: -4. [Retrieve](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) the {{slm}} policy: - - ```console - GET _slm/policy/ - ``` - - The response will look like this: - - ```console-result - { - "affected-policy-name": { <1> - "version": 1, - "modified_date": "2099-05-06T01:30:00.000Z", - "modified_date_millis": 4081757400000, - "policy" : { - "schedule": "0 30 1 * * ?", - "name": "", - "repository": "my_repository", - "config": { - "indices": ["data-*", "important"], - "ignore_unavailable": false, - "include_global_state": false - }, - "retention": { - "expire_after": "30d", - "min_count": 5, - "max_count": 50 - } - }, - "last_success" : { - "snapshot_name" : "daily-snap-2099.05.30-tme_ivjqswgkpryvnao2lg", - "start_time" : 4083782400000, - "time" : 4083782400000 - }, - "last_failure" : { <2> - "snapshot_name" : "daily-snap-2099.06.16-ywe-kgh5rfqfrpnchvsujq", - "time" : 4085251200000, <3> - "details" : """{"type":"snapshot_exception","reason":"[daily-snap-2099.06.16-ywe-kgh5rfqfrpnchvsujq] failed to create snapshot successfully, 5 out of 149 total shards failed"}""" <4> - }, - "stats": { - "policy": "daily-snapshots", - "snapshots_taken": 0, - "snapshots_failed": 0, - "snapshots_deleted": 0, - "snapshot_deletion_failures": 0 - }, - "next_execution": "2099-06-17T01:30:00.000Z", - "next_execution_millis": 4085343000000 - } - } - ``` - - 1. The affected snapshot lifecycle policy. - 2. The information about the last failure for the policy. - 3. The time when the failure occurred in millis. Use the `human=true` request parameter to see a formatted timestamp. - 4. Error details containing the reason for the snapshot failure. - - - Snapshots can fail for a variety reasons. If the failures are due to configuration errors, consult the documentation for the repository that the automated snapshots are using. Refer to the [guide on managing repositories in ECE](/deploy-manage/tools/snapshot-and-restore/cloud-enterprise.md) if you are using such a deployment. - +2. The policies table lists all configured policies. Click on any of the policies to review the details and execution history. -One common failure scenario is repository corruption. This occurs most often when multiple instances of {{es}} write to the same repository location. There is a [separate troubleshooting guide](diagnosing-corrupted-repositories.md) to fix this problem. +3. To get more detailed information about the failure, open {{kib}} **Dev Tools > Console**. You can find the **Console** using the navigation menu or the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). -In the event that snapshots are failing for other reasons check the logs on the elected master node during the snapshot execution period for more information. + Once the Console is open, execute the steps described in the **Using the {{es}} API** tab to retrieve the affected {{slm-init}} policy information. 
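While you are in the Console, you can also check what the health API itself is reporting for {{slm-init}}. The following is a minimal sketch, assuming your {{es}} version exposes the `slm` health indicator; a degraded status with a repeated-failures message confirms that the `slm.health.failed_snapshot_warn_threshold` threshold described above has been crossed:

```console
GET _health_report/slm
```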
:::::: -::::::{tab-item} Self-managed -[Retrieve](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle) the {{slm}} policy: +::::::{tab-item} Using the {{es}} API +The following step can be run using either [{{kib}} console](/explore-analyze/query-filter/tools/console.md) or direct [{{es}} API](elasticsearch://reference/elasticsearch/rest-apis/index.md) calls. + +[Retrieve the affected {{slm-init}} policy](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-slm-get-lifecycle): ```console GET _slm/policy/ ``` -The response will look like this: +The response looks like this: ```console-result { @@ -166,15 +100,14 @@ The response will look like this: 3. The time when the failure occurred in millis. Use the `human=true` request parameter to see a formatted timestamp. 4. Error details containing the reason for the snapshot failure. +:::::: -Snapshots can fail for a variety reasons. If the failures are due to configuration errors, consult the documentation for the repository that the automated snapshots are using. +::::::: -One common failure scenario is repository corruption. This occurs most often when multiple instances of {{es}} write to the same repository location. There is a [separate troubleshooting guide](diagnosing-corrupted-repositories.md) to fix this problem. +## Possible causes -In the event that snapshots are failing for other reasons check the logs on the elected master node during the snapshot execution period for more information. -:::::: +Snapshots can fail for a variety of reasons. If the failures are due to configuration errors, consult the documentation for the repository type that the snapshot policy is using. Refer to the [guide on managing repositories in ECE](/deploy-manage/tools/snapshot-and-restore/cloud-enterprise.md) if you are using an Elastic Cloud Enterprise deployment. -::::::: +One common failure scenario is repository corruption. This occurs most often when multiple instances of {{es}} write to the same repository location. There is a [separate troubleshooting guide](diagnosing-corrupted-repositories.md) to fix this problem. -:::{include} /deploy-manage/_snippets/autoops-callout-with-ech.md -::: +If snapshots are failing for other reasons check the logs on the elected master node during the snapshot execution period for more information. diff --git a/troubleshoot/elasticsearch/restore-from-snapshot.md b/troubleshoot/elasticsearch/restore-from-snapshot.md index 4f0e6f32f0..d14b89e4b9 100644 --- a/troubleshoot/elasticsearch/restore-from-snapshot.md +++ b/troubleshoot/elasticsearch/restore-from-snapshot.md @@ -3,234 +3,24 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/restore-from-snapshot.html applies_to: stack: - deployment: - eck: - ess: - ece: - self: products: - id: elasticsearch --- # Restore from snapshot [restore-from-snapshot] -Elasticsearch is using snapshots to store a copy of your data outside a cluster. You can restore a snapshot to recover indices and data streams for which there are no copies of the shards in the cluster. This can happen if the data (indices or data streams) was deleted or if the cluster membership changed and the current nodes in the system do not contain a copy of the data anymore. +This guide applies when one or more indices or data streams in your {{es}} cluster are missing or contain incomplete data, and you want to recover that data from an existing snapshot. 
[Snapshots](/deploy-manage/tools/snapshot-and-restore.md) store copies of your data outside the cluster. + +Missing data can occur for several reasons, such as accidental deletion of indices or data streams, or node or disk failures when no replicas are configured. Depending on the cause, there are multiple ways to recover the data. Restoring from a snapshot is appropriate when a recent snapshot contains the affected indices or data streams. + ::::{important} Restoring the missing data requires you to have a backup of the affected indices and data streams that is up-to-date enough for your use case. Don't proceed without confirming this. :::: +To restore the indices and data streams with missing data, run the following steps using either [{{kib}} console](/explore-analyze/query-filter/tools/console.md) or direct [{{es}} API](elasticsearch://reference/elasticsearch/rest-apis/index.md) calls. -:::::::{tab-set} - -::::::{tab-item} {{ech}} -In order to restore the indices and data streams that are missing data: - -**Use {{kib}}** - -1. Log in to the [{{ecloud}} console](https://cloud.elastic.co?page=docs&placement=docs-body). -2. On the **Hosted deployments** panel, click the name of your deployment. - - ::::{note} - If the name of your deployment is disabled your {{kib}} instances might be unhealthy, in which case contact [Elastic Support](https://support.elastic.co). If your deployment doesn’t include {{kib}}, all you need to do is [enable it first](../../deploy-manage/deploy/elastic-cloud/access-kibana.md). - :::: - -3. Open your deployment’s side navigation menu (placed under the Elastic logo in the upper left corner) and go to **Dev Tools > Console**. - - :::{image} /troubleshoot/images/elasticsearch-reference-kibana-console.png - :alt: {{kib}} Console - :screenshot: - ::: - -4. To view the affected indices using the [cat indices API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices). - - ```console - GET _cat/indices?v&health=red&h=index,status,health - ``` - - The response will look like this: - - ```console-result - index status health - .ds-my-data-stream-2022.06.17-000001 open red - kibana_sample_data_flights open red - ``` - - The `red` health of the indices above indicates that these indices are missing primary shards, meaning they are missing data. - -5. In order to restore the data we need to find a snapshot that contains these two indices. To find such a snapshot use the [get snapshot API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-snapshot-get). - - ```console - GET _snapshot/my_repository/*?verbose=false - ``` - - The response will look like this: - - ```console-result - { - "snapshots" : [ - { - "snapshot" : "snapshot-20200617", <1> - "uuid" : "dZyPs1HyTwS-cnKdH08EPg", - "repository" : "my_repository", <2> - "indices" : [ <3> - ".apm-agent-configuration", - ".apm-custom-link", - ".ds-ilm-history-5-2022.06.17-000001", - ".ds-my-data-stream-2022.06.17-000001", - ".geoip_databases", - ".kibana-event-log-8.2.2-000001", - ".kibana_8.2.2_001", - ".kibana_task_manager_8.2.2_001", - "kibana_sample_data_ecommerce", - "kibana_sample_data_flights", - "kibana_sample_data_logs" - ], - "data_streams" : [ ], - "state" : "SUCCESS" <4> - } - ], - "total" : 1, - "remaining" : 0 - } - ``` - - 1. The name of the snapshot. - 2. The repository of the snapshot. - 3. The indices backed up in the snapshot. - 4. If the snapshot was successful. - -6. The snapshot `snapshot-20200617` contains the two indices we want to restore. 
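    If the repository holds a large number of snapshots, the unsorted listing above can be hard to scan. As a sketch (reusing the `my_repository` name from the example, and assuming your version of the get snapshot API supports sorting), you can return only the most recent snapshots; note that the sort and size parameters require the default `verbose=true` mode:

    ```console
    GET _snapshot/my_repository/*?sort=start_time&order=desc&size=10
    ```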
You might have multiple snapshots from which you could restore the target indices. Choose the latest snapshot. -7. Now that we found a snapshot, we will proceed with the data stream preparation for restoring the lost data. We will check the index metadata to see if any index is part of a data stream: - - ```console - GET kibana_sample_data_flights,.ds-my-data-stream-2022.06.17-000001?features=settings&flat_settings - ``` - - The response will look like this: - - ```console-result - { - ".ds-my-data-stream-2022.06.17-000001" : { <1> - "aliases" : { }, - "mappings" : { }, - "settings" : { <2> - "index.creation_date" : "1658406121699", - "index.hidden" : "true", - "index.lifecycle.name" : "my-lifecycle-policy", - "index.number_of_replicas" : "1", - "index.number_of_shards" : "1", - "index.provided_name" : ".ds-my-data-stream-2022.06.17-000001", - "index.routing.allocation.include._tier_preference" : "data_hot", - "index.uuid" : "HmlFXp6VSu2XbQ-O3hVrwQ", - "index.version.created" : "8020299" - }, - "data_stream" : "my-data-stream" <3> - }, - "kibana_sample_data_flights" : { <4> - "aliases" : { }, - "mappings" : { }, - "settings" : { - "index.creation_date" : "1655121541454", - "index.number_of_replicas" : "0", - "index.number_of_shards" : "1", - "index.provided_name" : "kibana_sample_data_flights", - "index.routing.allocation.include._tier_preference" : "data_content", - "index.uuid" : "jMOlwKPPSzSraeeBWyuoDA", - "index.version.created" : "8020299" - } - } - } - ``` - - 1. The name of an index. - 2. The settings of this index that contains the metadata we are looking for. - 3. This indicates that this index is part of a data stream and displays the data stream name. - 4. The name of the other index we requested. - - - The response above shows that `kibana_sample_data_flights` is not part of a data stream because it doesn’t have a field called `data_stream` in the settings. - - On the contrary, `.ds-my-data-stream-2022.06.17-000001` is part of the data stream called `my-data-stream`. When you find an index like this, which belongs to a data stream, you need to check if data are still being indexed. You can see that by checking the `settings`, if you can find this property: `"index.lifecycle.indexing_complete" : "true"`, it means that indexing is completed in this index and you can continue to the next step. - - If `index.lifecycle.indexing_complete` is not there or is configured to `false` you need to rollover the data stream so you can restore the missing data without blocking the ingestion of new data. The following command will achieve that. - - ```console - POST my-data-stream/_rollover - ``` - -8. Now that the data stream preparation is done, we will close the target indices by using the [close indices API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-close). - - ```console - POST kibana_sample_data_flights,.ds-my-data-stream-2022.06.17-000001/_close - ``` - - You can confirm that they are closed with the [cat indices API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices). - - ```console - GET _cat/indices?v&health=red&h=index,status,health - ``` - - The response will look like this: - - ```console-result - index status health - .ds-my-data-stream-2022.06.17-000001 close red - kibana_sample_data_flights close red - ``` - -9. 
The indices are closed, now we can restore them from snapshots without causing any complications using the [restore snapshot API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-snapshot-restore): - - ```console - POST _snapshot/my_repository/snapshot-20200617/_restore - { - "indices": "kibana_sample_data_flights,.ds-my-data-stream-2022.06.17-000001", <1> - "include_aliases": true <2> - } - ``` - - 1. The indices to restore. - 2. We also want to restore the aliases. - - - ::::{note} - If any [feature states](../../deploy-manage/tools/snapshot-and-restore.md#feature-state) need to be restored we’ll need to specify them using the `feature_states` field and the indices that belong to the feature states we restore must not be specified under `indices`. The [Health API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-health-report) returns both the `indices` and `feature_states` that need to be restored for the restore from snapshot diagnosis. e.g.: - :::: - - - ```console - POST _snapshot/my_repository/snapshot-20200617/_restore - { - "feature_states": [ "geoip" ], - "indices": "kibana_sample_data_flights,.ds-my-data-stream-2022.06.17-000001", - "include_aliases": true - } - ``` - -10. Finally we can verify that the indices health is now `green` via the [cat indices API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices). - - ```console - GET _cat/indices?v&index=.ds-my-data-stream-2022.06.17-000001,kibana_sample_data_flightsh=index,status,health - ``` - - The response will look like this: - - ```console-result - index status health - .ds-my-data-stream-2022.06.17-000001 open green - kibana_sample_data_flights open green - ``` - - As we can see above the indices are `green` and open. The issue is resolved. - - -For more guidance on creating and restoring snapshots see [this guide](../../deploy-manage/tools/snapshot-and-restore.md). -:::::: - -::::::{tab-item} Self-managed -In order to restore the indices that are missing shards: - -1. View the affected indices using the [cat indices API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices). +1. Review the affected indices using the [cat indices API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices). ```console GET _cat/indices?v&health=red&h=index,status,health @@ -246,7 +36,7 @@ In order to restore the indices that are missing shards: The `red` health of the indices above indicates that these indices are missing primary shards, meaning they are missing data. -2. In order to restore the data we need to find a snapshot that contains these two indices. To find such a snapshot use the [get snapshot API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-snapshot-get). +2. To restore the data we need to find a snapshot that contains these two indices. To find such a snapshot use the [get snapshot API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-snapshot-get). ```console GET _snapshot/my_repository/*?verbose=false @@ -411,8 +201,4 @@ In order to restore the indices that are missing shards: As we can see above the indices are `green` and open. The issue is resolved. - -For more guidance on creating and restoring snapshots see [this guide](../../deploy-manage/tools/snapshot-and-restore.md). 
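If the restored indices do not turn `green` right away, the restore may simply still be in progress: the restore request returns before shard recovery completes unless `wait_for_completion=true` is set. The following is a sketch, reusing the example index names from this guide, that follows the in-flight recoveries:

```console
GET _cat/recovery/kibana_sample_data_flights,.ds-my-data-stream-2022.06.17-000001?v&active_only=true
```

Recoveries of type `snapshot` correspond to shards being restored from the repository.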
-:::::: - -::::::: +For additional information about creating and restoring snapshots, refer to the [Snapshot and restore](../../deploy-manage/tools/snapshot-and-restore.md) documentation. diff --git a/troubleshoot/images/elasticsearch-reference-kibana-console.png b/troubleshoot/images/elasticsearch-reference-kibana-console.png deleted file mode 100644 index ac0c39049a..0000000000 Binary files a/troubleshoot/images/elasticsearch-reference-kibana-console.png and /dev/null differ diff --git a/troubleshoot/images/elasticsearch-reference-slm-policies.png b/troubleshoot/images/elasticsearch-reference-slm-policies.png new file mode 100644 index 0000000000..25a68dc25d Binary files /dev/null and b/troubleshoot/images/elasticsearch-reference-slm-policies.png differ
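One more option for the restore scenario above: if you prefer not to close and overwrite the existing indices, the restore snapshot API can place the restored copies under new names instead. The request below is a sketch using the same example repository, snapshot, and index name as the guide, plus a hypothetical `restored_` prefix:

```console
POST _snapshot/my_repository/snapshot-20200617/_restore
{
  "indices": "kibana_sample_data_flights",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "include_aliases": false
}
```

After verifying the restored copy, you can reindex the data or swap aliases as needed.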