Commit 72de0d1

jbaiera and shainaraskas committed
Updates from docs team
Co-authored-by: shainaraskas <[email protected]>
1 parent d521fd7 commit 72de0d1

File tree

2 files changed: +37 −9 lines changed

manage-data/data-store/data-streams/failure-store-recipes.md

Lines changed: 21 additions & 7 deletions
```diff
@@ -1,8 +1,22 @@
-# Failure store recipes and use cases [failure-store-recipes]
+---
+mapped_pages:
+  - (8.19 docs)
+
+applies_to:
+  stack: ga 9.1
+  serverless: ga
+
+products:
+  - id: elasticsearch
+  - id: elastic-stack
+  - id: cloud-serverless
+---
+
+# Using failure stores to address ingestion issues [failure-store-examples]
 
 When something goes wrong during ingestion it is often not an isolated event. Included for your convenience are some examples of how you can use the failure store to quickly respond to ingestion failures and get your indexing back on track.
 
-## Troubleshooting nested ingest pipelines [failure-store-recipes-nested-ingest-troubleshoot]
+## Troubleshooting nested ingest pipelines [failure-store-examples-nested-ingest-troubleshoot]
 
 When a document fails in an ingest pipeline it can be difficult to figure out exactly what went wrong and where. When these failures are captured by the failure store during this part of the ingestion process, they will contain additional debugging information. Failed documents will note the type of processor and which pipeline was executing when the failure occurred. Failed documents will also contain a pipeline trace which keeps track of any nested pipeline calls that the document was in at time of failure.
```
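The context above notes that failure documents record the processor type, the pipeline, and a pipeline trace. A minimal sketch of pulling just those debugging fields from a failure store follows; the data stream name is a placeholder, and the `error.*` field names are assumed to follow the failure store document format described in failure-store.md:

```console
GET my-datastream::failures/_search
{
  "_source": [
    "error.message",
    "error.pipeline",
    "error.pipeline_trace",
    "error.processor_type"
  ]
}
```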
```diff
@@ -152,7 +166,7 @@ GET _ingest/pipeline/ingest-step-1
 
 We find a remove processor in the first pipeline that is the root cause of the problem! The pipeline should be updated to not remove important data, or the downstream pipeline should be changed to not expect the important data to be always present.
 
-## Troubleshooting complicated ingest pipelines [failure-store-recipes-complicated-ingest-troubleshoot]
+## Troubleshooting complicated ingest pipelines [failure-store-examples-complicated-ingest-troubleshoot]
 
 Ingest processors can be labeled with tags. These tags are user-provided information that names or describes the processor's purpose in the pipeline. When documents are redirected to the failure store due to a processor issue, they capture the tag from the processor in which the failure occurred, if it exists. Because of this behavior, it is a good practice to tag the processors in your pipeline so that the location of a failure can be identified quickly.
```
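To illustrate the tagging practice described above, here is a sketch of a pipeline whose processors carry `tag` values; the pipeline and field names are invented for the example:

```console
PUT _ingest/pipeline/my-datastream-pipeline
{
  "processors": [
    {
      "set": {
        "tag": "set-event-category",
        "field": "event.category",
        "value": "web"
      }
    },
    {
      "remove": {
        "tag": "drop-raw-message",
        "field": "message.raw"
      }
    }
  ]
}
```

If the remove processor fails, the resulting failure document would carry the `drop-raw-message` tag, pointing straight at the offending step.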
```diff
@@ -292,7 +306,7 @@ GET my-datastream-ingest::failures/_search
 
 Without tags in place it would not be as clear where in the pipeline the indexing problem occurred. Tags provide a unique identifier for a processor that can be quickly referenced in case of an ingest failure.
 
-## Alerting on failed ingestion [failure-store-recipes-alerting]
+## Alerting on failed ingestion [failure-store-examples-alerting]
 
 Since failure stores can be searched just like a normal data stream, we can use them as inputs to [alerting rules](../../../explore-analyze/alerts-cases/alerts.md) in
 {{kib}}. Here is a simple alerting example that is triggered when more than ten indexing failures have occurred in the last five minutes for a data stream:
```
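A rule like the one described above needs a query over the failure store as its input. One way to express the "more than ten failures in five minutes" condition is a count over a recent time window; the data stream name is a placeholder:

```console
GET my-datastream::failures/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-5m"
      }
    }
  }
}
```

An alerting rule in {{kib}} built on this query would fire when the returned `count` exceeds 10.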
```diff
@@ -353,7 +367,7 @@ Configure schedule, actions, and details of the alert before saving the rule.
 
 :::::
 
-## Data remediation [failure-store-recipes-remediation]
+## Data remediation [failure-store-examples-remediation]
 
 If you've encountered a long span of ingestion failures you may find that a sizeable gap of events has appeared in your data stream. If the failure store is enabled, the documents that should fill those gaps would be tucked away in the data stream's failure store. Because failure stores are made up of regular indices and the failure documents contain the document source that failed, the failure documents can often times be replayed into your production data streams.
```
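To scope a remediation like the one described above, a first step might be listing the failures that fall inside the gap. The timestamps here are hypothetical, and the `document.source`/`error.type` fields are assumed from the failure store document format:

```console
GET my-datastream::failures/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2025-05-01T10:00:00Z",
        "lte": "2025-05-01T12:00:00Z"
      }
    }
  },
  "_source": ["document.source", "error.type"]
}
```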
```diff
@@ -371,7 +385,7 @@ We recommend a few best practices for remediating failure data.
 
 **Simulate first to avoid repeat failures.** If you must run a pipeline as part of your remediation process, it is best to simulate the pipeline against the failure first. This will catch any unforeseen issues that may fail the document a second time. Remember, ingest pipeline failures will capture the document before an ingest pipeline is applied to it, which can further complicate remediation when a failure document becomes nested inside a new failure. The easiest way to simulate these changes is via the [pipeline simulate API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-simulate) or the [simulate ingest API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-simulate-ingest).
 
-### Remediating ingest node failures [failure-store-recipes-remediation-ingest]
+### Remediating ingest node failures [failure-store-examples-remediation-ingest]
 
 Failures that occurred during ingest processing will be stored as they were before any pipelines were run. To replay the document into the data stream we will need to re-run any applicable pipelines for the document.
```
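The simulate-first advice above can be followed with the pipeline simulate API, feeding it the document source captured in a failure document. The pipeline name and source body below are illustrative:

```console
POST _ingest/pipeline/my-datastream-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "source recovered from a failure document"
      }
    }
  ]
}
```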
```diff
@@ -829,7 +843,7 @@ Once any failures have been remediated, you may wish to purge the failures from
 
 :::::
 
-### Remediating mapping and shard failures [failure-store-recipes-remediation-mapping]
+### Remediating mapping and shard failures [failure-store-examples-remediation-mapping]
 
 As described in the previous [failure document source](./failure-store.md#use-failure-store-document-source) section, failures that occur due to a mapping or indexing issue will be stored as they were after any pipelines had executed. This means that to replay the document into the data stream we will need to make sure to skip any pipelines that have already run.
```
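Since these documents already reflect pipeline output, one way to replay them is to index the captured source while skipping the default pipeline, for instance with the `pipeline=_none` request parameter. This is a sketch under that assumption; the data stream name and field values are placeholders:

```console
POST my-datastream/_doc?pipeline=_none
{
  "@timestamp": "2025-05-01T10:15:00Z",
  "message": "source recovered from a failure document"
}
```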

manage-data/data-store/data-streams/failure-store.md

Lines changed: 16 additions & 2 deletions
```diff
@@ -1,7 +1,15 @@
----
+---
+mapped_pages:
+  - (8.19 docs)
+
 applies_to:
-  stack: ga 8.19.0, ga 9.1.0
+  stack: ga 9.1
   serverless: ga
+
+products:
+  - id: elasticsearch
+  - id: elastic-stack
+  - id: cloud-serverless
 ---
 
 # Failure store [failure-store]
```
```diff
@@ -14,6 +22,10 @@ When a data stream's failure store is enabled, these failures are instead captured
 Failure stores do not capture failures caused by backpressure or document version conflicts. These failures are always returned as-is since they warrant specific action by the client.
 :::
 
+On this page, you'll learn how to set up, use, and manage a failure store, as well as the structure of failure store documents.
+
+For examples of how to use failure stores to identify and fix errors in ingest pipelines and your data, refer to [](/manage-data/data-store/data-streams/failure-store-recipes.md).
+
 ## Set up a data stream failure store [set-up-failure-store]
 
 Each data stream has its own failure store that can be enabled to accept failed documents. By default, this failure store is disabled and any ingestion problems are raised in the response to write operations.
```
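The setup described in this context is done per data stream. Based on the `_options` endpoint referenced later in this diff, enabling the failure store might look like the following; the data stream name is a placeholder:

```console
PUT _data_stream/my-datastream/_options
{
  "failure_store": {
    "enabled": true
  }
}
```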
```diff
@@ -119,6 +131,8 @@ PUT _data_stream/my-datastream-1/_options
 
 The failure store is meant to ease the burden of detecting and handling failures when ingesting data to {{es}}. Clients are less likely to encounter unrecoverable failures when writing documents, and developers are more easily able to troubleshoot faulty pipelines and mappings.
 
+For examples of how to use failure stores to identify and fix errors in ingest pipelines and your data, refer to [](/manage-data/data-store/data-streams/failure-store-recipes.md).
+
 ### Failure redirection [use-failure-store-redirect]
 
 Once a failure store is enabled for a data stream it will begin redirecting documents that fail due to common ingestion problems instead of returning errors in write operations. Clients are notified in a non-intrusive way when a document is redirected to the failure store.
```
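Once redirection is active, the backing failure indices can be inspected alongside the data stream itself. A sketch, assuming the get data stream API reports a failure store section when the feature is enabled:

```console
GET _data_stream/my-datastream
```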
