manage-data/data-store/data-streams/failure-store-recipes.md (+21 −7)
@@ -1,8 +1,22 @@
-# Failure store recipes and use cases [failure-store-recipes]
+---
+mapped_pages:
+  - (8.19 docs)
+
+applies_to:
+  stack: ga 9.1
+  serverless: ga
+
+products:
+  - id: elasticsearch
+  - id: elastic-stack
+  - id: cloud-serverless
+---
+
+# Using failure stores to address ingestion issues [failure-store-examples]

When something goes wrong during ingestion it is often not an isolated event. The following examples show how you can use the failure store to quickly respond to ingestion failures and get your indexing back on track.

When a document fails in an ingest pipeline it can be difficult to figure out exactly what went wrong and where. When these failures are captured by the failure store during this part of the ingestion process, they contain additional debugging information. Failed documents note the type of processor and the pipeline that was executing when the failure occurred. They also contain a pipeline trace that keeps track of any nested pipeline calls the document was in at the time of failure.
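
To make that concrete, here is a rough sketch of the error metadata a pipeline failure might carry. The values are illustrative and the exact layout is described in the failure document section of the failure store page:

```json
{
  "@timestamp": "2025-06-30T12:01:33.000Z",
  "document": {
    "index": "my-datastream-ingest",
    "source": { "message": "original event body" }
  },
  "error": {
    "type": "illegal_argument_exception",
    "message": "field [message] not present as part of path [message]",
    "pipeline": "ingest-step-2",
    "pipeline_trace": ["ingest-step-1", "ingest-step-2"],
    "processor_type": "uppercase",
    "processor_tag": "uppercase-message"
  }
}
```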
@@ -152,7 +166,7 @@ GET _ingest/pipeline/ingest-step-1

We find a remove processor in the first pipeline that is the root cause of the problem! The pipeline should be updated so that it does not remove important data, or the downstream pipeline should be changed so that it does not expect that data to always be present.

Ingest processors can be labeled with tags. Tags are user-provided labels that name or describe a processor's purpose in the pipeline. When documents are redirected to the failure store due to a processor issue, they capture the tag of the processor in which the failure occurred, if one exists. Because of this behavior, it is good practice to tag the processors in your pipeline so that the location of a failure can be identified quickly.
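
For instance, a processor can be tagged when the pipeline is created or updated. This sketch assumes a pipeline named `ingest-step-1` and a hypothetical `debug_info` field; the `tag` value is what later shows up on the failure document:

```console
PUT _ingest/pipeline/ingest-step-1
{
  "processors": [
    {
      "remove": {
        "tag": "remove-debug-fields",
        "field": "debug_info",
        "ignore_missing": true
      }
    }
  ]
}
```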
@@ -292,7 +306,7 @@ GET my-datastream-ingest::failures/_search

Without tags in place it would not be as clear where in the pipeline the indexing problem occurred. Tags provide a unique identifier for a processor that can be quickly referenced in case of an ingest failure.

-## Alerting on failed ingestion [failure-store-recipes-alerting]
+## Alerting on failed ingestion [failure-store-examples-alerting]

Since failure stores can be searched just like a normal data stream, we can use them as inputs to [alerting rules](../../../explore-analyze/alerts-cases/alerts.md) in {{kib}}. Here is a simple alerting example that is triggered when more than ten indexing failures have occurred in the last five minutes for a data stream:
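
As a rough sketch of that condition expressed as a direct query against the failure store (the rule itself is configured in the {{kib}} UI, and `my-datastream` is a placeholder for your data stream name):

```console
GET my-datastream::failures/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-5m"
      }
    }
  }
}
```

If the returned count is greater than ten, the condition from the example above would fire.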
@@ -353,7 +367,7 @@ Configure schedule, actions, and details of the alert before saving the rule.

:::::

-## Data remediation [failure-store-recipes-remediation]
+## Data remediation [failure-store-examples-remediation]

If you've encountered a long span of ingestion failures you may find that a sizeable gap of events has appeared in your data stream. If the failure store is enabled, the documents that should fill those gaps will be tucked away in the data stream's failure store. Because failure stores are made up of regular indices and the failure documents contain the document source that failed, the failure documents can often be replayed into your production data streams.
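
As a minimal sketch, assuming the gap covers a known time window, you can pull the affected failure documents and their original bodies from the failure store before deciding how to replay them (the data stream name and window are placeholders):

```console
GET my-datastream::failures/_search
{
  "_source": ["error.type", "document.source"],
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2025-06-30T00:00:00Z",
        "lt": "2025-06-30T06:00:00Z"
      }
    }
  }
}
```

Each hit's `document.source` field holds the document body as it was captured at failure time, which is the payload you would replay.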
@@ -371,7 +385,7 @@ We recommend a few best practices for remediating failure data.

**Simulate first to avoid repeat failures.** If you must run a pipeline as part of your remediation process, it is best to simulate the pipeline against the failure first. This will catch any unforeseen issues that may fail the document a second time. Remember, ingest pipeline failures capture the document as it was before any ingest pipeline was applied, which can further complicate remediation when a failure document becomes nested inside a new failure. The easiest way to simulate these changes is via the [pipeline simulate API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-simulate) or the [simulate ingest API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-simulate-ingest).

Failures that occurred during ingest processing are stored as they were before any pipelines were run. To replay such a document into the data stream we need to re-run any applicable pipelines for the document.
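
For example, with the pipeline simulate API you can run the captured source of a failure document through a pipeline before replaying it for real. The pipeline name and document body here are placeholders:

```console
POST _ingest/pipeline/ingest-step-1/_simulate
{
  "docs": [
    {
      "_index": "my-datastream-ingest",
      "_source": {
        "message": "original event body"
      }
    }
  ]
}
```

If the simulated run reports another processor failure, fix the pipeline (or the document) before replaying, so the document does not land back in the failure store.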
@@ -829,7 +843,7 @@ Once any failures have been remediated, you may wish to purge the failures from

:::::

-### Remediating mapping and shard failures [failure-store-recipes-remediation-mapping]
+### Remediating mapping and shard failures [failure-store-examples-remediation-mapping]

As described in the previous [failure document source](./failure-store.md#use-failure-store-document-source) section, failures that occur due to a mapping or indexing issue will be stored as they were after any pipelines had executed. This means that to replay the document into the data stream we need to skip any pipelines that have already run.
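
Once the mapping problem itself has been fixed, a minimal sketch of such a replay is to index the captured source while disabling the default pipeline, assuming your data stream only applies a default ingest pipeline (a final pipeline, if configured, still runs). The body below stands in for the failure document's `document.source`:

```console
POST my-datastream/_doc?pipeline=_none
{
  "@timestamp": "2025-06-30T03:15:00Z",
  "message": "ORIGINAL EVENT BODY"
}
```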

manage-data/data-store/data-streams/failure-store.md (+16 −2)
@@ -1,7 +1,15 @@
---
+mapped_pages:
+  - (8.19 docs)
+
applies_to:
-  stack: ga 8.19.0, ga 9.1.0
+  stack: ga 9.1
  serverless: ga
+
+products:
+  - id: elasticsearch
+  - id: elastic-stack
+  - id: cloud-serverless
---

# Failure store [failure-store]
@@ -14,6 +22,10 @@ When a data stream's failure store is enabled, these failures are instead captur

Failure stores do not capture failures caused by backpressure or document version conflicts. These failures are always returned as-is since they warrant specific action by the client.
:::

+On this page, you'll learn how to set up, use, and manage a failure store, as well as the structure of failure store documents.
+
+For examples of how to use failure stores to identify and fix errors in ingest pipelines and your data, refer to [](/manage-data/data-store/data-streams/failure-store-recipes.md).

## Set up a data stream failure store [set-up-failure-store]

Each data stream has its own failure store that can be enabled to accept failed documents. By default, this failure store is disabled and any ingestion problems are raised in the response to write operations.
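
For instance, one way to enable it for an existing data stream is through the data stream options endpoint referenced later on this page; the data stream name is a placeholder, and the setup steps that follow cover the full set of options:

```console
PUT _data_stream/my-datastream-1/_options
{
  "failure_store": {
    "enabled": true
  }
}
```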
@@ -119,6 +131,8 @@ PUT _data_stream/my-datastream-1/_options

The failure store is meant to ease the burden of detecting and handling failures when ingesting data to {{es}}. Clients are less likely to encounter unrecoverable failures when writing documents, and developers are more easily able to troubleshoot faulty pipelines and mappings.

+For examples of how to use failure stores to identify and fix errors in ingest pipelines and your data, refer to [](/manage-data/data-store/data-streams/failure-store-recipes.md).
+
Once a failure store is enabled for a data stream it will begin redirecting documents that fail due to common ingestion problems instead of returning errors in write operations. Clients are notified in a non-intrusive way when a document is redirected to the failure store.
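
As an illustration of that non-intrusive notification, a bulk item that was redirected might come back successful with an extra marker, roughly like the sketch below; the exact field names and failure index naming may differ by version, so treat this as an approximation rather than the definitive response format:

```json
{
  "errors": false,
  "items": [
    {
      "create": {
        "_index": ".fs-my-datastream-2025.06.30-000001",
        "status": 201,
        "failure_store": "used"
      }
    }
  ]
}
```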