From 439171bc28ed1742421dfa763da73ed2e2456fce Mon Sep 17 00:00:00 2001 From: natasha-moore-elastic Date: Tue, 11 Nov 2025 12:59:36 +0000 Subject: [PATCH 1/3] [OnWeek] Fix Vale rule warnings in manage-data/data-store --- manage-data/data-store/data-streams.md | 2 +- .../data-store/data-streams/failure-store-recipes.md | 12 ++++++------ manage-data/data-store/data-streams/failure-store.md | 12 ++++++------ .../data-store/data-streams/run-downsampling.md | 2 +- manage-data/data-store/mapping.md | 2 +- .../define-runtime-fields-in-search-request.md | 2 +- manage-data/data-store/mapping/dynamic-templates.md | 6 +++--- .../mapping/explore-data-with-runtime-fields.md | 4 ++-- .../data-store/mapping/index-runtime-field.md | 2 +- manage-data/data-store/mapping/map-runtime-field.md | 4 ++-- .../data-store/mapping/retrieve-runtime-field.md | 2 +- manage-data/data-store/mapping/runtime-fields.md | 4 ++-- manage-data/data-store/near-real-time-search.md | 2 +- manage-data/data-store/perform-index-operations.md | 2 +- manage-data/data-store/text-analysis.md | 2 +- .../text-analysis/anatomy-of-an-analyzer.md | 2 +- .../text-analysis/index-search-analysis.md | 6 +++--- 17 files changed, 34 insertions(+), 34 deletions(-) diff --git a/manage-data/data-store/data-streams.md b/manage-data/data-store/data-streams.md index 2f9e3b23c2..58b90997f9 100644 --- a/manage-data/data-store/data-streams.md +++ b/manage-data/data-store/data-streams.md @@ -106,7 +106,7 @@ When a backing index is created, the index is named using the following conventi Some operations, such as a [shrink](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-shrink) or [restore](../../deploy-manage/tools/snapshot-and-restore/restore-snapshot.md), can change a backing index’s name. These name changes do not remove a backing index from its data stream. -The generation of the data stream can change without a new index being added to the data stream (e.g. when an existing backing index is shrunk). 
This means the backing indices for some generations will never exist. You should not derive any intelligence from the backing indices names. +The generation of the data stream can change without a new index being added to the data stream (for example, when an existing backing index is shrunk). This means the backing indices for some generations will never exist. You should not derive any intelligence from the backing indices names. ## Append-only (mostly) [data-streams-append-only] diff --git a/manage-data/data-store/data-streams/failure-store-recipes.md b/manage-data/data-store/data-streams/failure-store-recipes.md index 04b5bffbff..ceec58f016 100644 --- a/manage-data/data-store/data-streams/failure-store-recipes.md +++ b/manage-data/data-store/data-streams/failure-store-recipes.md @@ -307,7 +307,7 @@ Without tags in place it would not be as clear where in the pipeline the indexin ## Alerting on failed ingestion [failure-store-examples-alerting] -Since failure stores can be searched just like a normal data stream, we can use them as inputs to [alerting rules](../../../explore-analyze/alerts-cases/alerts.md) in +Since failure stores can be searched like a normal data stream, we can use them as inputs to [alerting rules](../../../explore-analyze/alerts-cases/alerts.md) in {{kib}}. Here is a simple alerting example that is triggered when more than ten indexing failures have occurred in the last five minutes for a data stream: :::::{stepper} @@ -382,7 +382,7 @@ We recommend a few best practices for remediating failure data. **Use an ingest pipeline to convert failure documents back into their original document.** Failure documents store failure information along with the document that failed ingestion. The first step for remediating documents should be to use an ingest pipeline to extract the original source from the failure document and then discard any other information about the failure. 
-**Simulate first to avoid repeat failures.** If you must run a pipeline as part of your remediation process, it is best to simulate the pipeline against the failure first. This will catch any unforeseen issues that may fail the document a second time. Remember, ingest pipeline failures will capture the document before an ingest pipeline is applied to it, which can further complicate remediation when a failure document becomes nested inside a new failure. The easiest way to simulate these changes is via the [pipeline simulate API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-simulate) or the [simulate ingest API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-simulate-ingest). +**Simulate first to avoid repeat failures.** If you must run a pipeline as part of your remediation process, it is best to simulate the pipeline against the failure first. This will catch any unforeseen issues that may fail the document a second time. Remember, ingest pipeline failures will capture the document before an ingest pipeline is applied to it, which can further complicate remediation when a failure document becomes nested inside a new failure. The easiest way to simulate these changes is using the [pipeline simulate API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-simulate) or the [simulate ingest API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-simulate-ingest). ### Remediating ingest node failures [failure-store-examples-remediation-ingest] @@ -511,7 +511,7 @@ Because ingest pipeline failures need to be reprocessed by their original pipeli ``` 1. The `data.id` field is expected to be present. If it isn't present this pipeline will fail. -Fixing a failure's root cause is a often a bespoke process. In this example, instead of discarding the data, we will make this identifier field optional. +Fixing a failure's root cause is often a bespoke process. 
In this example, instead of discarding the data, we will make this identifier field optional. ```console PUT _ingest/pipeline/my-datastream-default-pipeline @@ -658,7 +658,7 @@ POST _ingest/pipeline/_simulate ] } ``` -1. The index has been updated via the reroute processor. +1. The index has been updated through the reroute processor. 2. The document ID has stayed the same. 3. The source should cleanly match the contents of the original document. @@ -995,7 +995,7 @@ PUT _ingest/pipeline/my-datastream-remediation-pipeline 2. Capture the source of the original document. 3. Discard the `error` field since it wont be needed for the remediation. 4. Also discard the `document` field. -5. We extract all the fields from the original document's source back to the root of the document. The `@timestamp` field is not overwritten and thus will be present in the final document. +5. We extract all the fields from the original document's source back to the root of the document. The `@timestamp` field is not overwritten and will be present in the final document. :::{important} Remember that a document that has failed during indexing has already been processed by the ingest processor! It shouldn't need to be processed again unless you made changes to your pipeline to fix the original problem. Make sure that any fixes applied to the ingest pipeline are reflected in the pipeline logic here. @@ -1088,7 +1088,7 @@ Caused by: j.l.IllegalArgumentException: data stream timestamp field [@timestamp ] } ``` -1. The index has been updated via the script processor. +1. The index has been updated through the script processor. 2. The source should reflect any fixes and match the expected document shape for the final index. 3. In this example case, we find that the failure timestamp has stayed in the source. 
diff --git a/manage-data/data-store/data-streams/failure-store.md b/manage-data/data-store/data-streams/failure-store.md
index 190622fc52..3f27fb5895 100644
--- a/manage-data/data-store/data-streams/failure-store.md
+++ b/manage-data/data-store/data-streams/failure-store.md
@@ -62,7 +62,7 @@ After a matching data stream is created, its failure store will be enabled.

### Set up for existing data streams [set-up-failure-store-existing]

-Enabling the failure store via [index templates](../templates.md) can only affect data streams that are newly created. Existing data streams that use a template are not affected by changes to the template's `data_stream_options` field.
+Enabling the failure store using [index templates](../templates.md) affects only newly created data streams. Existing data streams that use a template are not affected by changes to the template's `data_stream_options` field.

To modify an existing data stream's options, use the [put data stream options](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options) API:

```console
PUT _data_stream/my-datastream-existing/_options
@@ -96,7 +96,7 @@ PUT _data_stream/my-datastream-existing/_options
You can also enable the data stream failure store in {{kib}}. Locate the data stream on the **Streams** page, where a stream maps directly to a data stream. Select a stream to view its details and go to the **Retention** tab where you can find the **Enable failure store** option.
:::

-### Enable failure store via cluster setting [set-up-failure-store-cluster-setting]
+### Enable failure store using a cluster setting [set-up-failure-store-cluster-setting]

If you have a large number of existing data streams you may want to enable their failure stores in one place. Instead of updating each of their options individually, set `data_streams.failure_store.enabled` to a list of index patterns in the [cluster settings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-put-settings).
Any data streams that match one of these patterns will operate with their failure store enabled. @@ -257,7 +257,7 @@ If the document could have been redirected to a data stream's failure store but 3. The response status is `400 Bad Request` due to the mapping problem. -If the document was redirected to a data stream's failure store but that failed document could not be stored (e.g. due to shard unavailability or a similar problem), then the `failure_store` field on the response will be `failed`, and the response will display the error for the original failure, as well as a suppressed error detailing why the failure could not be stored: +If the document was redirected to a data stream's failure store but that failed document could not be stored (for example, due to shard unavailability or a similar problem), then the `failure_store` field on the response will be `failed`, and the response will display the error for the original failure, as well as a suppressed error detailing why the failure could not be stored: ```console-result { @@ -306,7 +306,7 @@ Once you have accumulated some failures, the failure store can be searched much :::{warning} Documents redirected to the failure store in the event of a failed ingest pipeline will be stored in their original, unprocessed form. If an ingest pipeline normally redacts sensitive information from a document, then failed documents in their original, unprocessed form may contain sensitive information. -Furthermore, failed documents are likely to be structured differently than normal data in a data stream, and thus special care should be taken when making use of [document level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#document-level-security) or [field level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#field-level-security). 
Any security policies that expect to utilize these features for both regular documents and failure documents should account for any differences in document structure between the two document types.
+Furthermore, failed documents are likely to be structured differently than normal data in a data stream, and special care should be taken when using [document level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#document-level-security) or [field level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#field-level-security). Any security policies that expect to use these features for both regular documents and failure documents should account for any differences in document structure between the two document types.

To limit visibility on potentially sensitive data, users require the [`read_failure_store`](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-indices) index privilege for a data stream in order to search that data stream's failure store data.
:::

@@ -324,7 +324,7 @@ POST _query?format=txt
  "query": """FROM my-datastream::failures | DROP error.stack_trace | LIMIT 1""" <1>
}
```
-1. We drop the `error.stack_trace` field here just to keep the example free of newlines.
+1. We drop the `error.stack_trace` field here to keep the example free of newlines.

An example of a search result with the failed document present:

@@ -820,7 +820,7 @@ PUT _cluster/settings
}
```

-You can also specify the failure store retention period for a data stream on its data stream options. These can be specified via the index template for new data streams, or via the [put data stream options](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options) API for existing data streams.
+You can also specify the failure store retention period for a data stream on its data stream options. These can be specified using the index template for new data streams, or using the [put data stream options](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options) API for existing data streams. ```console PUT _data_stream/my-datastream/_options diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md index 0617faf0f7..d4c67a3e55 100644 --- a/manage-data/data-store/data-streams/run-downsampling.md +++ b/manage-data/data-store/data-streams/run-downsampling.md @@ -33,7 +33,7 @@ stack: ga serverless: ga ``` -To downsample a time series via a [data stream lifecycle](/manage-data/lifecycle/data-stream.md), add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams). +To downsample a time series using a [data stream lifecycle](/manage-data/lifecycle/data-stream.md), add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams). * Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. * Set `after` to the minimum time to wait after an index rollover, before running downsampling. 
diff --git a/manage-data/data-store/mapping.md b/manage-data/data-store/mapping.md index 06e23759c4..517f67ef54 100644 --- a/manage-data/data-store/mapping.md +++ b/manage-data/data-store/mapping.md @@ -23,7 +23,7 @@ products: % - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-mapper.md % Notes: redirect only -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): +% Internal links rely on the following IDs being on this page (for example, as a heading ID, paragraph ID, and so on): $$$mapping-limit-settings$$$ diff --git a/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md b/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md index 2f8b64d69e..d78404c42c 100644 --- a/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md +++ b/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md @@ -12,7 +12,7 @@ products: You can specify a `runtime_mappings` section in a search request to create runtime fields that exist only as part of the query. You specify a script as part of the `runtime_mappings` section, just as you would if [adding a runtime field to the mappings](map-runtime-field.md). -Defining a runtime field in a search request uses the same format as defining a runtime field in the index mapping. Just copy the field definition from the `runtime` in the index mapping to the `runtime_mappings` section of the search request. +Defining a runtime field in a search request uses the same format as defining a runtime field in the index mapping. Copy the field definition from the `runtime` in the index mapping to the `runtime_mappings` section of the search request. The following search request adds a `day_of_week` field to the `runtime_mappings` section. 
The field values will be calculated dynamically, and only within the context of this search request:

diff --git a/manage-data/data-store/mapping/dynamic-templates.md b/manage-data/data-store/mapping/dynamic-templates.md
index 7fd8d034ce..875fa2a8cc 100644
--- a/manage-data/data-store/mapping/dynamic-templates.md
+++ b/manage-data/data-store/mapping/dynamic-templates.md
@@ -193,7 +193,7 @@ The `match_pattern` parameter adjusts the behavior of the `match` parameter to s
"match": "^profit_\d+$"
```

-The following example matches all `string` fields whose name starts with `long_` (except for those which end with `_text`) and maps them as `long` fields:
+The following example matches all `string` fields whose name starts with `long_` (except for those that end with `_text`) and maps them as `long` fields:

```console
PUT my-index-000001
@@ -265,7 +265,7 @@ PUT my-index-000001/_doc/1
}
```

## `path_match` and `path_unmatch` [path-match-unmatch]

-The `path_match` and `path_unmatch` parameters work in the same way as `match` and `unmatch`, but operate on the full dotted path to the field, not just the final name, e.g. `some_object.*.some_field`.
+The `path_match` and `path_unmatch` parameters work in the same way as `match` and `unmatch`, but operate on the full dotted path to the field, not only the final name, for example, `some_object.*.some_field`.

This example copies the values of any fields in the `name` object to the top-level `full_name` field, except for the `middle` field:

@@ -342,7 +342,7 @@ PUT my-index-000001/_doc/2
}
```

-Note that the `path_match` and `path_unmatch` parameters match on object paths in addition to leaf fields. As an example, indexing the following document will result in an error because the `path_match` setting also matches the object field `name.title`, which can’t be mapped as text:
+The `path_match` and `path_unmatch` parameters match on object paths in addition to leaf fields.
As an example, indexing the following document will result in an error because the `path_match` setting also matches the object field `name.title`, which can't be mapped as text: ```console PUT my-index-000001/_doc/2 diff --git a/manage-data/data-store/mapping/explore-data-with-runtime-fields.md b/manage-data/data-store/mapping/explore-data-with-runtime-fields.md index 88033b3e75..ec95a34989 100644 --- a/manage-data/data-store/mapping/explore-data-with-runtime-fields.md +++ b/manage-data/data-store/mapping/explore-data-with-runtime-fields.md @@ -96,7 +96,7 @@ The mapping contains two fields: `@timestamp` and `message`. If you want to retrieve results that include `clientip`, you can add that field as a runtime field in the mapping. The following runtime script defines a [grok pattern](../../../explore-analyze/scripting/grok.md) that extracts structured fields out of a single text field within a document. A grok pattern is like a regular expression that supports aliased expressions that you can reuse. -The script matches on the `%{{COMMONAPACHELOG}}` log pattern, which understands the structure of Apache logs. If the pattern matches (`clientip != null`), the script emits the value of the matching IP address. If the pattern doesn’t match, the script just returns the field value without crashing. +The script matches on the `%{{COMMONAPACHELOG}}` log pattern, which understands the structure of Apache logs. If the pattern matches (`clientip != null`), the script emits the value of the matching IP address. If the pattern doesn't match, the script returns the field value without crashing. ```console PUT my-index-000001/_mappings @@ -116,7 +116,7 @@ PUT my-index-000001/_mappings 1. This condition ensures that the script doesn’t crash even if the pattern of the message doesn’t match. -Alternatively, you can define the same runtime field but in the context of a search request. 
The runtime definition and the script are exactly the same as the one defined previously in the index mapping. Just copy that definition into the search request under the `runtime_mappings` section and include a query that matches on the runtime field. This query returns the same results as if you defined a search query for the `http.clientip` runtime field in your index mappings, but only in the context of this specific search: +Alternatively, you can define the same runtime field but in the context of a search request. The runtime definition and the script are exactly the same as the one defined previously in the index mapping. Copy that definition into the search request under the `runtime_mappings` section and include a query that matches on the runtime field. This query returns the same results as if you defined a search query for the `http.clientip` runtime field in your index mappings, but only in the context of this specific search: ```console GET my-index-000001/_search diff --git a/manage-data/data-store/mapping/index-runtime-field.md b/manage-data/data-store/mapping/index-runtime-field.md index df647da067..9e05edbe51 100644 --- a/manage-data/data-store/mapping/index-runtime-field.md +++ b/manage-data/data-store/mapping/index-runtime-field.md @@ -10,7 +10,7 @@ products: # Index a runtime field [runtime-indexed] -Runtime fields are defined by the context where they run. For example, you can define runtime fields in the [context of a search query](define-runtime-fields-in-search-request.md) or within the [`runtime` section](map-runtime-field.md) of an index mapping. If you decide to index a runtime field for greater performance, just move the full runtime field definition (including the script) to the context of an index mapping. {{es}} automatically uses these indexed fields to drive queries, resulting in a fast response time. This capability means you can write a script only once, and apply it to any context that supports runtime fields. 
+Runtime fields are defined by the context where they run. For example, you can define runtime fields in the [context of a search query](define-runtime-fields-in-search-request.md) or within the [`runtime` section](map-runtime-field.md) of an index mapping. If you decide to index a runtime field for greater performance, move the full runtime field definition (including the script) to the context of an index mapping. {{es}} automatically uses these indexed fields to drive queries, resulting in a fast response time. This capability means you can write a script only once, and apply it to any context that supports runtime fields. ::::{note} Indexing a `composite` runtime field is currently not supported. diff --git a/manage-data/data-store/mapping/map-runtime-field.md b/manage-data/data-store/mapping/map-runtime-field.md index b222a2487d..85926ddf92 100644 --- a/manage-data/data-store/mapping/map-runtime-field.md +++ b/manage-data/data-store/mapping/map-runtime-field.md @@ -10,7 +10,7 @@ products: # Map a runtime field [runtime-mapping-fields] -You map runtime fields by adding a `runtime` section under the mapping definition and defining [a Painless script](../../../explore-analyze/scripting/modules-scripting-using.md). This script has access to the entire context of a document, including the original `_source` via `params._source` and any mapped fields plus their values. At query time, the script runs and generates values for each scripted field that is required for the query. +You map runtime fields by adding a `runtime` section under the mapping definition and defining [a Painless script](../../../explore-analyze/scripting/modules-scripting-using.md). This script has access to the entire context of a document, including the original `_source` through `params._source` and any mapped fields plus their values. At query time, the script runs and generates values for each scripted field that is required for the query. 
::::{admonition} Emitting runtime field values
When defining a Painless script to use with runtime fields, you must include the [`emit` method](elasticsearch://reference/scripting-languages/painless/painless-runtime-fields-context.md) to emit calculated values.
@@ -102,7 +102,7 @@ You can alternatively prefix the field you want to retrieve values for with `par

## Ignoring script errors on runtime fields [runtime-errorhandling]

-Scripts can throw errors at runtime, e.g. on accessing missing or invalid values in documents or because of performing invalid operations. The `on_script_error` parameter can be used to control error behavior when this happens. Setting this parameter to `continue` will have the effect of silently ignoring all errors on this runtime field. The default `fail` value will cause a shard failure which gets reported in the search response.
+Scripts can throw errors at runtime, for example, when accessing missing or invalid values in documents or when performing invalid operations. The `on_script_error` parameter can be used to control error behavior when this happens. Setting this parameter to `continue` silently ignores all errors on this runtime field. The default `fail` value causes a shard failure that is reported in the search response.

## Updating and removing runtime fields [runtime-updating-scripts]

diff --git a/manage-data/data-store/mapping/retrieve-runtime-field.md b/manage-data/data-store/mapping/retrieve-runtime-field.md
index 2772667eb2..03ccbeca95 100644
--- a/manage-data/data-store/mapping/retrieve-runtime-field.md
+++ b/manage-data/data-store/mapping/retrieve-runtime-field.md
@@ -157,7 +157,7 @@ This time, the response includes only two hits.
The value for `day_of_week` (`Su ## Retrieve fields from related indices [lookup-runtime-fields] -The [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md) parameter on the `_search` API can also be used to retrieve fields from the related indices via runtime fields with a type of `lookup`. +The [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md) parameter on the `_search` API can also be used to retrieve fields from the related indices using runtime fields with a type of `lookup`. ::::{note} Fields that are retrieved by runtime fields of type `lookup` can be used to enrich the hits in a search response. It’s not possible to query or aggregate on these fields. diff --git a/manage-data/data-store/mapping/runtime-fields.md b/manage-data/data-store/mapping/runtime-fields.md index ac35bfbea6..d823b48a8a 100644 --- a/manage-data/data-store/mapping/runtime-fields.md +++ b/manage-data/data-store/mapping/runtime-fields.md @@ -35,11 +35,11 @@ At its core, the most important benefit of runtime fields is the ability to add ## Incentives [runtime-incentives] -Runtime fields can replace many of the ways you can use scripting with the `_search` API. How you use a runtime field is impacted by the number of documents that the included script runs against. For example, if you’re using the `fields` parameter on the `_search` API to [retrieve the values of a runtime field](retrieve-runtime-field.md), the script runs only against the top hits just like script fields do. +Runtime fields can replace many of the ways you can use scripting with the `_search` API. How you use a runtime field is impacted by the number of documents that the included script runs against. For example, if you're using the `fields` parameter on the `_search` API to [retrieve the values of a runtime field](retrieve-runtime-field.md), the script runs only against the top hits, similar to script fields. 
You can use [script fields](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md#script-fields) to access values in `_source` and return calculated values based on a script valuation. Runtime fields have the same capabilities, but provide greater flexibility because you can query and aggregate on runtime fields in a search request. Script fields can only fetch values. -Similarly, you could write a [script query](elasticsearch://reference/query-languages/query-dsl/query-dsl-script-query.md) that filters documents in a search request based on a script. Runtime fields provide a very similar feature that is more flexible. You write a script to create field values and they are available everywhere, such as [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md), [all queries](../../../explore-analyze/query-filter/languages/querydsl.md), and [aggregations](../../../explore-analyze/query-filter/aggregations.md). +Similarly, you could write a [script query](elasticsearch://reference/query-languages/query-dsl/query-dsl-script-query.md) that filters documents in a search request based on a script. Runtime fields provide a similar feature that is more flexible. You write a script to create field values and they are available everywhere, such as [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md), [all queries](../../../explore-analyze/query-filter/languages/querydsl.md), and [aggregations](../../../explore-analyze/query-filter/aggregations.md). You can also use scripts to [sort search results](elasticsearch://reference/elasticsearch/rest-apis/sort-search-results.md#script-based-sorting), but that same script works exactly the same in a runtime field. 
diff --git a/manage-data/data-store/near-real-time-search.md b/manage-data/data-store/near-real-time-search.md index df43cc51f5..f0080b41e3 100644 --- a/manage-data/data-store/near-real-time-search.md +++ b/manage-data/data-store/near-real-time-search.md @@ -13,7 +13,7 @@ When a document is stored in {{es}}, it is indexed and fully searchable in *near Lucene, the Java libraries on which {{es}} is based, introduced the concept of per-segment search. A *segment* is similar to an inverted index, but the word *index* in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared. -Sitting between {{es}} and the disk is the filesystem cache. Documents in the in-memory indexing buffer ([Figure 1](#img-pre-refresh)) are written to a new segment ([Figure 2](#img-post-refresh)). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file. +Sitting between {{es}} and the disk is the filesystem cache. Documents in the in-memory indexing buffer ([Figure 1](#img-pre-refresh)) are written to a new segment ([Figure 2](#img-post-refresh)). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read like any other file. 
:::{image} /manage-data/images/elasticsearch-reference-lucene-in-memory-buffer.png :alt: A Lucene index with new documents in the in-memory buffer diff --git a/manage-data/data-store/perform-index-operations.md b/manage-data/data-store/perform-index-operations.md index ed69b0eec0..41744b990a 100644 --- a/manage-data/data-store/perform-index-operations.md +++ b/manage-data/data-store/perform-index-operations.md @@ -20,7 +20,7 @@ To perform index actions: ## Available index operations -Several index operations are available from the **Manage index** menu. Note that some of the operations listed are unavailable in {{serverless-full}} since in that environment many data management tasks are handled automatically. +Several index operations are available from the **Manage index** menu. Some of the operations listed are unavailable in {{serverless-full}} since in that environment many data management tasks are handled automatically. **Show index overview** {applies_to}`stack: ga` {applies_to}`serverless: ga` : View an overview of the index, including its storage size, status, and aliases, as well as a sample API request to add new documents. diff --git a/manage-data/data-store/text-analysis.md b/manage-data/data-store/text-analysis.md index 14ec069fac..d7464ad394 100644 --- a/manage-data/data-store/text-analysis.md +++ b/manage-data/data-store/text-analysis.md @@ -14,7 +14,7 @@ products: _Text analysis_ is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s [optimized for search](/solutions/search/full-text.md). -Text analysis enables {{es}} to perform full-text search, where the search returns all *relevant* results rather than just exact matches. For example, if you search for `Quick fox jumps`, you probably want the document that contains `A quick brown fox jumps over the lazy dog`, and you might also want documents that contain related words like `fast fox` or `foxes leap`. 
+Text analysis enables {{es}} to perform full-text search, where the search returns all *relevant* results rather than only exact matches. For example, if you search for `Quick fox jumps`, you probably want the document that contains `A quick brown fox jumps over the lazy dog`, and you might also want documents that contain related words like `fast fox` or `foxes leap`. {{es}} performs text analysis when indexing or searching [`text`](elasticsearch://reference/elasticsearch/mapping-reference/text.md) fields. If your index does _not_ contain `text` fields, no further setup is needed; you can skip the pages in this section. If you _do_ use `text` fields or your text searches aren’t returning results as expected, configuring text analysis can often help. You should also look into analysis configuration if you’re using {{es}} to: diff --git a/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md b/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md index 561910a1ea..4e916ea88e 100644 --- a/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md +++ b/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md @@ -10,7 +10,7 @@ products: # Anatomy of an analyzer [analyzer-anatomy] -An *analyzer* — whether built-in or custom — is just a package which contains three lower-level building blocks: *character filters*, *tokenizers*, and *token filters*. +An *analyzer* — whether built-in or custom — is a package which contains three lower-level building blocks: *character filters*, *tokenizers*, and *token filters*. The built-in [analyzers](elasticsearch://reference/text-analysis/analyzer-reference.md) pre-package these building blocks into analyzers suitable for different languages and types of text. Elasticsearch also exposes the individual building blocks so that they can be combined to define new [`custom`](create-custom-analyzer.md) analyzers. 
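The way these building blocks turn raw text into tokens can be inspected with the analyze API; a minimal sketch using the built-in `standard` analyzer:

```console
GET _analyze
{
  "analyzer": "standard",
  "text": "The QUICK brown foxes jumped over the lazy dog!"
}
```

The `standard` analyzer splits on word boundaries and lowercases, returning tokens such as `quick` and `foxes`; reducing `foxes` to `fox` additionally requires a stemmer token filter, as in the built-in `english` analyzer.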
diff --git a/manage-data/data-store/text-analysis/index-search-analysis.md b/manage-data/data-store/text-analysis/index-search-analysis.md index 3861908626..809d0e5d6f 100644 --- a/manage-data/data-store/text-analysis/index-search-analysis.md +++ b/manage-data/data-store/text-analysis/index-search-analysis.md @@ -49,10 +49,10 @@ Later, a user searches the same `text` field for: The user expects this search to match the sentence indexed earlier, `The QUICK brown foxes jumped over the dog!`. -However, the query string does not contain the exact words used in the document’s original text: +However, the query string does not contain the exact words used in the document's original text: -* `Quick` vs `QUICK` -* `fox` vs `foxes` +* `Quick` versus `QUICK` +* `fox` versus `foxes` To account for this, the query string is analyzed using the same analyzer. This analyzer produces the following tokens: From 607cb94396e2d311d2b24d8a8c54b23b87af52e7 Mon Sep 17 00:00:00 2001 From: natasha-moore-elastic Date: Tue, 11 Nov 2025 14:28:33 +0000 Subject: [PATCH 2/3] [OnWeek] Fix Vale rule warnings in manage-data/ingest --- manage-data/ingest/ingest-reference-architectures.md | 2 +- .../ingest/ingest-reference-architectures/agent-kafka-es.md | 2 +- .../ingest/ingest-reference-architectures/ls-for-input.md | 2 +- .../ingest/ingest-reference-architectures/ls-multi.md | 4 ++-- manage-data/ingest/ingest-reference-architectures/lspq.md | 2 +- ...a-from-relational-database-into-elasticsearch-service.md | 6 +++--- ...ngest-logs-from-nodejs-web-application-using-filebeat.md | 2 +- .../ingest-logs-from-python-application-using-filebeat.md | 6 +++--- manage-data/ingest/tools.md | 2 +- manage-data/ingest/transform-enrich.md | 4 ++-- manage-data/ingest/transform-enrich/error-handling.md | 4 ++-- manage-data/ingest/transform-enrich/ingest-pipelines.md | 2 +- .../readable-maintainable-ingest-pipelines.md | 2 +- .../ingest/transform-enrich/set-up-an-enrich-processor.md | 2 +- 14 files changed, 21 
insertions(+), 21 deletions(-) diff --git a/manage-data/ingest/ingest-reference-architectures.md b/manage-data/ingest/ingest-reference-architectures.md index ae0917cebf..47e0364c6c 100644 --- a/manage-data/ingest/ingest-reference-architectures.md +++ b/manage-data/ingest/ingest-reference-architectures.md @@ -24,7 +24,7 @@ You can host {{es}} on your own hardware or send your data to {{es}} on {{ecloud | --- | --- | | [*{{agent}} to Elasticsearch*](./ingest-reference-architectures/agent-to-es.md)

![Image showing {{agent}} collecting data and sending to {{es}}](/manage-data/images/ingest-ea-es.png "") | An [{{agent}} integration](https://docs.elastic.co/en/integrations) is available for your data source:

* Software components with [{{agent}} installed](./ingest-reference-architectures/agent-installed.md)
* Software components using [APIs for data collection](./ingest-reference-architectures/agent-apis.md)
| | [*{{agent}} to {{ls}} to Elasticsearch*](./ingest-reference-architectures/agent-ls.md)

![Image showing {{agent}} to {{ls}} to {{es}}](/manage-data/images/ingest-ea-ls-es.png "") | You need additional capabilities offered by {{ls}}:

* [**enrichment**](./ingest-reference-architectures/ls-enrich.md) between {{agent}} and {{es}}
* [**persistent queue (PQ) buffering**](./ingest-reference-architectures/lspq.md) to accommodate network issues and downstream unavailability
* [**proxying**](./ingest-reference-architectures/ls-networkbridge.md) in cases where {{agent}}s have network restrictions for connecting outside of the {{agent}} network
* data needs to be [**routed to multiple**](./ingest-reference-architectures/ls-multi.md) {{es}} clusters and other destinations depending on the content
| -| [*{{agent}} to proxy to Elasticsearch*](./ingest-reference-architectures/agent-proxy.md)

![Image showing connections between {{agent}} and {{es}} using a proxy](/manage-data/images/ingest-ea-proxy-es.png "") | Agents have [network restrictions](./ingest-reference-architectures/agent-proxy.md) that prevent connecting outside of the {{agent}} network Note that [{{ls}} as proxy](./ingest-reference-architectures/ls-networkbridge.md) is one option.
| +| [*{{agent}} to proxy to Elasticsearch*](./ingest-reference-architectures/agent-proxy.md)

![Image showing connections between {{agent}} and {{es}} using a proxy](/manage-data/images/ingest-ea-proxy-es.png "") | Agents have [network restrictions](./ingest-reference-architectures/agent-proxy.md) that prevent connecting outside of the {{agent}} network. [{{ls}} as proxy](./ingest-reference-architectures/ls-networkbridge.md) is one option.
| | [*{{agent}} to {{es}} with Kafka as middleware message queue*](./ingest-reference-architectures/agent-kafka-es.md)

![Image showing {{agent}} collecting data and using Kafka as a message queue enroute to {{es}}](/manage-data/images/ingest-ea-kafka.png "") | Kafka is your [middleware message queue](./ingest-reference-architectures/agent-kafka-es.md):

* [Kafka ES sink connector](./ingest-reference-architectures/agent-kafka-essink.md) to write from Kafka to {{es}}
* [{{ls}} to read from Kafka and route to {{es}}](./ingest-reference-architectures/agent-kafka-ls.md)
| | [*{{ls}} to Elasticsearch*](./ingest-reference-architectures/ls-for-input.md)

![Image showing {{ls}} collecting data and sending to {{es}}](/manage-data/images/ingest-ls-es.png "") | You need to collect data from a source that {{agent}} can’t read (such as databases, AWS Kinesis). Check out the [{{ls}} input plugins](logstash-docs-md://lsr/input-plugins.md).
| | [*Elastic air-gapped architectures*](./ingest-reference-architectures/airgapped-env.md)

![Image showing {{stack}} in an air-gapped environment](/manage-data/images/ingest-ea-airgapped.png "") | You want to deploy {{agent}} and {{stack}} in an air-gapped environment (no access to outside networks)
| diff --git a/manage-data/ingest/ingest-reference-architectures/agent-kafka-es.md b/manage-data/ingest/ingest-reference-architectures/agent-kafka-es.md index b83a824915..9c95d1fbea 100644 --- a/manage-data/ingest/ingest-reference-architectures/agent-kafka-es.md +++ b/manage-data/ingest/ingest-reference-architectures/agent-kafka-es.md @@ -12,7 +12,7 @@ products: ::: Ingest models -: [{{agent}} to {{ls}} to Kafka to {{ls}} to {{es}}: Kafka as middleware message queue](agent-kafka-ls.md).
{{ls}} reads data from Kafka and routes it to {{es}} clusters (and/or other destinations). +: [{{agent}} to {{ls}} to Kafka to {{ls}} to {{es}}: Kafka as middleware message queue](agent-kafka-ls.md).
{{ls}} reads data from Kafka and routes it to {{es}} clusters and other destinations. [{{agent}} to {{ls}} to Kafka to Kafka ES Sink to {{es}}: Kafka as middleware message queue](agent-kafka-essink.md).
Kafka ES sink connector reads from Kafka and writes to {{es}}. diff --git a/manage-data/ingest/ingest-reference-architectures/ls-for-input.md b/manage-data/ingest/ingest-reference-architectures/ls-for-input.md index 93f5e944ba..a9e4777fd6 100644 --- a/manage-data/ingest/ingest-reference-architectures/ls-for-input.md +++ b/manage-data/ingest/ingest-reference-architectures/ls-for-input.md @@ -12,7 +12,7 @@ products: ::: Ingest model -: {{ls}} to collect data from sources not currently supported by {{agent}} and sending the data to {{es}}. Note that the data transformation still happens within the {{es}} ingest pipeline. +: {{ls}} to collect data from sources not currently supported by {{agent}} and send the data to {{es}}. The data transformation still happens within the {{es}} ingest pipeline. Use when : {{agent}} doesn’t currently support your data source. diff --git a/manage-data/ingest/ingest-reference-architectures/ls-multi.md b/manage-data/ingest/ingest-reference-architectures/ls-multi.md index ca7272b117..a0277d28a5 100644 --- a/manage-data/ingest/ingest-reference-architectures/ls-multi.md +++ b/manage-data/ingest/ingest-reference-architectures/ls-multi.md @@ -13,7 +13,7 @@ products: ::: Ingest model -: {{agent}} to {{ls}} to {{es}} clusters and/or additional destinations +: {{agent}} to {{ls}} to {{es}} clusters and additional destinations Use when : Data collected by {{agent}} needs to be routed to different {{es}} clusters or non-{{es}} destinations depending on the content @@ -21,7 +21,7 @@ Use when Example : Let’s take an example of a Windows workstation, for which we are collecting different types of logs using the System and Windows integrations. These logs need to be sent to different {{es}} clusters and to S3 for backup and a mechanism to send it to other destinations such as different SIEM solutions. In addition, the {{es}} destination is derived based on the type of datastream and an organization identifier.
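A content-based routing setup like the Windows workstation example is typically expressed with conditionals in the {{ls}} output stage; a hypothetical sketch (the cluster IDs, credentials, bucket, and dataset condition are placeholders, and the S3 output would also need credentials configured):

```txt
output {
  if [data_stream][dataset] == "windows.forwarded" {
    elasticsearch {
      cloud_id    => "security-cluster:hash"
      cloud_auth  => "elastic:password"
      data_stream => "true"
    }
  } else {
    elasticsearch {
      cloud_id    => "ops-cluster:hash"
      cloud_auth  => "elastic:password"
      data_stream => "true"
    }
  }
  s3 {
    bucket => "log-backup"
    region => "us-east-1"
  }
}
```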
- In such use cases, agents send the data to {{ls}} as a routing mechanism to different destinations. Note that the System and Windows integrations must be installed on all {{es}} clusters to which the data is routed. + In such use cases, agents send the data to {{ls}} as a routing mechanism to different destinations. The System and Windows integrations must be installed on all {{es}} clusters to which the data is routed. Sample config diff --git a/manage-data/ingest/ingest-reference-architectures/lspq.md b/manage-data/ingest/ingest-reference-architectures/lspq.md index 6ce4e9c735..b3b21dbe61 100644 --- a/manage-data/ingest/ingest-reference-architectures/lspq.md +++ b/manage-data/ingest/ingest-reference-architectures/lspq.md @@ -16,7 +16,7 @@ Ingest model : {{agent}} to {{ls}} persistent queue to {{es}} Use when -: Your data flow may encounter network issues, bursts of events, and/or downstream unavailability and you need the ability to buffer the data before ingestion. +: Your data flow may encounter network issues, bursts of events, or downstream unavailability, and you need the ability to buffer the data before ingestion. ## Resources [lspq-resources] diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md b/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md index afd5d47716..67ddb6bf57 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md @@ -109,7 +109,7 @@ For this example, let’s create a new database *es_db* with table *es_table*, a There are two possible ways to address this: - * You can use "soft deletes" in your source database. Essentially, a record is first marked for deletion through a boolean flag. 
Other programs that are currently using your source database would have to filter out "soft deletes" in their queries. The "soft deletes" are sent over to Elasticsearch, where they can be processed. After that, your source database and Elasticsearch must both remove these "soft deletes." + * You can use "soft deletes" in your source database. Essentially, a record is first marked for deletion through a boolean flag. Other programs that are currently using your source database would have to filter out "soft deletes" in their queries. The "soft deletes" are sent over to Elasticsearch, where they can be processed. After that, your source database and Elasticsearch must both remove these "soft deletes". * You can periodically clear the Elasticsearch indices that are based off of the database, and then refresh Elasticsearch with a fresh ingest of the contents of the database. 3. Log in to your MySQL server and add three records to your new database: @@ -122,7 +122,7 @@ For this example, let’s create a new database *es_db* with table *es_table*, a (3,"Stark"); ``` -4. Verify your data with a SQL statement: +4. Verify your data with an SQL statement: ```txt select * from es_table; @@ -364,7 +364,7 @@ In this section, we configure Logstash to send the MySQL data to Elasticsearch. } ``` -4. At this point, if you simply restart Logstash as is with your new output, then no MySQL data is sent to our Elasticsearch index. +4. If you simply restart Logstash as is with your new output, then no MySQL data is sent to our Elasticsearch index. Why? Logstash retains the previous `sql_last_value` timestamp and sees that no new changes have occurred in the MySQL database since that time. Therefore, based on the SQL query that we configured, there’s no new data to send to Logstash. 
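The `sql_last_value` bookkeeping works through the JDBC input's tracking-column settings; a hedged sketch of the relevant input block (connection details and schedule are placeholders):

```txt
input {
  jdbc {
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/es_db"
    jdbc_user => "logstash"
    jdbc_password => "password"
    statement => "SELECT * FROM es_table WHERE modification_time > :sql_last_value"
    use_column_value => true
    tracking_column => "modification_time"
    tracking_column_type => "timestamp"
    schedule => "*/5 * * * * *"
  }
}
```

The last seen value is persisted between runs (by default in a `.logstash_jdbc_last_run` file in the home directory), which is why restarting {{ls}} alone does not resend rows that were already ingested.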
diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md index ec90f837b6..c6ae8ad956 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md @@ -42,7 +42,7 @@ For the three following packages, you can create a working directory to install npm install @elastic/ecs-winston-format ``` -* [Got](https://www.npmjs.com/package/got): Got is a "Human-friendly and powerful HTTP request library for Node.js." - this plugin can be used to query the sample web server used in the tutorial. To install the Got package, run the following command in your working directory: +* [Got](https://www.npmjs.com/package/got): Got is a "Human-friendly and powerful HTTP request library for Node.js" - this plugin can be used to query the sample web server used in the tutorial. To install the Got package, run the following command in your working directory: ```sh npm install got diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md index a64260965a..7a242c509c 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md @@ -100,7 +100,7 @@ In this step, you’ll create a Python script that generates logs in JSON format This Python script randomly generates one of twelve log messages, continuously, at a random interval of between 1 and 10 seconds. 
The log messages are written to an `elvis.json` file, each with a timestamp, a log level of _info_, _warning_, _error_, or _critical_, and other data. To add some variance to the log data, the _info_ message _Elvis has left the building_ is set to be the most probable log event. - For simplicity, there is just one log file (`elvis.json`), and it is written to the local directory where `elvis.py` is located. In a production environment, you may have multiple log files associated with different modules and loggers and likely stored in `/var/log` or similar. To learn more about configuring logging in Python, check [Logging facility for Python](https://docs.python.org/3/library/logging.html). + For simplicity, there is only one log file (`elvis.json`), and it is written to the local directory where `elvis.py` is located. In a production environment, you may have multiple log files associated with different modules and loggers and likely stored in `/var/log` or similar. To learn more about configuring logging in Python, check [Logging facility for Python](https://docs.python.org/3/library/logging.html). Having your logs written in a JSON format with ECS fields allows for easy parsing and analysis, and for standardization with other applications. A standard, easily parsable format becomes increasingly important as the volume and type of data captured in your logs expands over time. @@ -127,7 +127,7 @@ To connect to your {{ech}} deployment, stream data, and issue queries, you have ### Cloud ID -To find the [Cloud ID](/deploy-manage/deploy/elastic-cloud/find-cloud-id.md) of your deployment, go to the {{kib}} main menu, then select **Management** → **Integrations** → **Connection details**. Note that the Cloud ID value is in the format `deployment-name:hash`. Save this value to use it later. 
+To find the [Cloud ID](/deploy-manage/deploy/elastic-cloud/find-cloud-id.md) of your deployment, go to the {{kib}} main menu, then select **Management** → **Integrations** → **Connection details**. The Cloud ID value is in the format `deployment-name:hash`. Save this value to use it later. ### Basic authentication @@ -203,7 +203,7 @@ cloud.id: deployment-name:hash <1> cloud.auth: username:password <2> ``` -1. Uncomment the `cloud.id` line, and add the deployment’s Cloud ID as the key's value. Note that the `cloud.id` value is in the format `deployment-name:hash`. Find your Cloud ID by going to the {{kib}} main menu, and selecting **Management** → **Integrations** → **Connection details**. +1. Uncomment the `cloud.id` line, and add the deployment's Cloud ID as the key's value. The `cloud.id` value is in the format `deployment-name:hash`. Find your Cloud ID by going to the {{kib}} main menu, and selecting **Management** → **Integrations** → **Connection details**. 2. Uncomment the `cloud.auth` line, and add the username and password for your deployment in the format `username:password`. For example, `cloud.auth: elastic:57ugj782kvkwmSKg8uVe`. ::::{note} diff --git a/manage-data/ingest/tools.md b/manage-data/ingest/tools.md index e8098ecc1f..d41e6dcd93 100644 --- a/manage-data/ingest/tools.md +++ b/manage-data/ingest/tools.md @@ -30,7 +30,7 @@ products: % - [x] https://www.elastic.co/customer-success/data-ingestion % - [x] https://github.com/elastic/ingest-docs/pull/1373 -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): +% Internal links rely on the following IDs being on this page (for example, as a heading ID, paragraph ID, and so on): % These IDs are from content that I'm not including on this current page. I've resolved them by changing the internal links to anchor links where needed. 
- Wajiha $$$supported-outputs-beats-and-agent$$$ diff --git a/manage-data/ingest/transform-enrich.md b/manage-data/ingest/transform-enrich.md index fbbc087562..418ca6ed13 100644 --- a/manage-data/ingest/transform-enrich.md +++ b/manage-data/ingest/transform-enrich.md @@ -19,7 +19,7 @@ According to your use case, you may want to control the structure of your ingest Finally, to help ensure optimal query results, you may want to customize how text is analyzed and how text fields are defined inside {{es}}. -Note that you can also perform transforms on existing {{es}} indices to pivot data into a summarized format, for example to break down web requests by geography or browser type. To learn more, refer to [Transforming data](../../explore-analyze/transforms.md). +You can also perform transforms on existing {{es}} indices to pivot data into a summarized format, for example to break down web requests by geography or browser type. To learn more, refer to [Transforming data](../../explore-analyze/transforms.md). {{agent}} processors : You can use [{{agent}} processors](/reference/fleet/agent-processors.md) to sanitize or enrich raw data at the source. Use {{agent}} processors if you need to control what data is sent across the wire, or if you need to enrich the raw data with information available on the host. @@ -49,7 +49,7 @@ Index mapping : Ingested data can be mapped dynamically, where {{es}} adds all fields automatically based on the detected data types, or explicitly, where {{es}} maps the incoming data to fields based on your custom rules. -: You can use {{es}} [runtime fields](../data-store/mapping/runtime-fields.md) to define or alter the schema at query time. You can start working with your data without needing to understand how it is structured, add fields to existing documents without reindexing your data, override the value returned from an indexed field, and/or define fields for a specific use without modifying the underlying schema. 
+: You can use {{es}} [runtime fields](../data-store/mapping/runtime-fields.md) to define or alter the schema at query time. You can start working with your data without needing to understand how it is structured, add fields to existing documents without reindexing your data, override the value returned from an indexed field, and define fields for a specific use without modifying the underlying schema. : Refer to the [Index mapping](../data-store/mapping.md) pages to learn about the dynamic mapping rules that {{es}} runs by default, which ones you can customize, and how to configure your own explicit data to field mappings. diff --git a/manage-data/ingest/transform-enrich/error-handling.md b/manage-data/ingest/transform-enrich/error-handling.md index 31a6d660a7..46c9f30373 100644 --- a/manage-data/ingest/transform-enrich/error-handling.md +++ b/manage-data/ingest/transform-enrich/error-handling.md @@ -9,7 +9,7 @@ applies_to: Ingest pipelines in Elasticsearch are powerful tools for transforming and enriching data before indexing. However, errors can occur during processing. This guide outlines strategies for handling such errors effectively. :::{important} -Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle the errors occurring while processing the document (i.e. transforming the json object) but not the errors triggered while indexing like mapping conflict. For this is the Elasticsearch Failure Store. +Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle errors that occur while processing the document (that is, transforming the JSON object), but not errors triggered while indexing, such as mapping conflicts. For those failures, use the Elasticsearch failure store.
::: Errors in ingest pipelines typically fall into the following categories: @@ -23,7 +23,7 @@ Create an `error-handling-pipeline` that sets `event.kind` to `pipeline_error` a The `on_failure` parameter can be defined either for individual processors or at the pipeline level to catch exceptions that may occur during document processing. The `ignore_failure` option allows a specific processor to silently skip errors without affecting the rest of the pipeline. -## Global vs. processor-specific +## Global versus processor-specific The following example demonstrates how to use the `on_failure` handler at the pipeline level rather than within individual processors. While this approach ensures the pipeline exits gracefully on failure, it also means that processing stops at the point of error. diff --git a/manage-data/ingest/transform-enrich/ingest-pipelines.md b/manage-data/ingest/transform-enrich/ingest-pipelines.md index c64fc7b0b5..553c588f06 100644 --- a/manage-data/ingest/transform-enrich/ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/ingest-pipelines.md @@ -463,7 +463,7 @@ PUT _ingest/pipeline/my-pipeline ### Classic field access pattern [access-source-pattern-classic] -The `classic` access pattern is the default access pattern that has been around since ingest node first released. Field paths given to processors (e.g. `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields. +The `classic` access pattern is the default access pattern that has been around since ingest node was first released. Field paths given to processors (for example, `event.tags.ingest.processed_by`) are split on the dot character (`.`).
The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor will create nested objects for the missing fields. ```console POST /_ingest/pipeline/_simulate diff --git a/manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md b/manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md index 23c9204a17..7b7e2bbf50 100644 --- a/manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md +++ b/manage-data/ingest/transform-enrich/readable-maintainable-ingest-pipelines.md @@ -366,7 +366,7 @@ POST _ingest/pipeline/_simulate ``` :::{tip} -After storing values as bytes, you can use Kibana's field formatting to display them in a human-friendly format (KB, MB, GB, etc.) without changing the underlying data. +After storing values as bytes, you can use Kibana's field formatting to display them in a human-friendly format (KB, MB, GB, and so on) without changing the underlying data. ::: ## Rename fields diff --git a/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md b/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md index 33edcbf115..7704aeba36 100644 --- a/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md +++ b/manage-data/ingest/transform-enrich/set-up-an-enrich-processor.md @@ -140,7 +140,7 @@ The `enrich` processor has node settings for enrich coordinator and enrich polic The enrich coordinator supports the following node settings: `enrich.cache_size` -: Maximum size of the cache that caches searches for enriching documents. The size can be specified in three units: the raw number of cached searches (e.g. `1000`), an absolute size in bytes (e.g. `100Mb`), or a percentage of the max heap space of the node (e.g. `1%`). 
Both for the absolute byte size and the percentage of heap space, {{es}} does not guarantee that the enrich cache size will adhere exactly to that maximum, as {{es}} uses the byte size of the serialized search response which is is a good representation of the used space on the heap, but not an exact match. Defaults to `1%`. There is a single cache for all enrich processors in the cluster. +: Maximum size of the cache that caches searches for enriching documents. The size can be specified in three units: the raw number of cached searches (for example, `1000`), an absolute size in bytes (for example, `100Mb`), or a percentage of the max heap space of the node (for example, `1%`). Both for the absolute byte size and the percentage of heap space, {{es}} does not guarantee that the enrich cache size will adhere exactly to that maximum, as {{es}} uses the byte size of the serialized search response, which is a good representation of the used space on the heap, but not an exact match. Defaults to `1%`. There is a single cache for all enrich processors in the cluster. `enrich.coordinator_proxy.max_concurrent_requests` : Maximum number of concurrent [multi-search requests](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-msearch) to run when enriching documents. Defaults to `8`. From 92e595f117bf136cdbc05bb78086dbfafec93ac0 Mon Sep 17 00:00:00 2001 From: natasha-moore-elastic Date: Tue, 11 Nov 2025 14:33:57 +0000 Subject: [PATCH 3/3] Revert "[OnWeek] Fix Vale rule warnings in manage-data/data-store" This reverts commit 439171bc28ed1742421dfa763da73ed2e2456fce. 
+The generation of the data stream can change without a new index being added to the data stream (for example, when an existing backing index is shrunk). This means the backing indices for some generations will never exist. You should not derive any intelligence from the backing indices names.
 
 ## Append-only (mostly) [data-streams-append-only]
 
diff --git a/manage-data/data-store/data-streams/failure-store-recipes.md b/manage-data/data-store/data-streams/failure-store-recipes.md
index 04b5bffbff..ceec58f016 100644
--- a/manage-data/data-store/data-streams/failure-store-recipes.md
+++ b/manage-data/data-store/data-streams/failure-store-recipes.md
@@ -307,7 +307,7 @@ Without tags in place it would not be as clear where in the pipeline the indexin
 
 ## Alerting on failed ingestion [failure-store-examples-alerting]
 
-Since failure stores can be searched just like a normal data stream, we can use them as inputs to [alerting rules](../../../explore-analyze/alerts-cases/alerts.md) in
+Since failure stores can be searched like a normal data stream, we can use them as inputs to [alerting rules](../../../explore-analyze/alerts-cases/alerts.md) in
 {{kib}}. Here is a simple alerting example that is triggered when more than ten indexing failures have occurred in the last five minutes for a data stream:
 
 :::::{stepper}
@@ -382,7 +382,7 @@ We recommend a few best practices for remediating failure data.
 
 **Use an ingest pipeline to convert failure documents back into their original document.** Failure documents store failure information along with the document that failed ingestion. The first step for remediating documents should be to use an ingest pipeline to extract the original source from the failure document and then discard any other information about the failure.
 
-**Simulate first to avoid repeat failures.** If you must run a pipeline as part of your remediation process, it is best to simulate the pipeline against the failure first. This will catch any unforeseen issues that may fail the document a second time. Remember, ingest pipeline failures will capture the document before an ingest pipeline is applied to it, which can further complicate remediation when a failure document becomes nested inside a new failure. The easiest way to simulate these changes is via the [pipeline simulate API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-simulate) or the [simulate ingest API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-simulate-ingest).
+**Simulate first to avoid repeat failures.** If you must run a pipeline as part of your remediation process, it is best to simulate the pipeline against the failure first. This will catch any unforeseen issues that may fail the document a second time. Remember, ingest pipeline failures will capture the document before an ingest pipeline is applied to it, which can further complicate remediation when a failure document becomes nested inside a new failure. The easiest way to simulate these changes is using the [pipeline simulate API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ingest-simulate) or the [simulate ingest API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-simulate-ingest).
 
 ### Remediating ingest node failures [failure-store-examples-remediation-ingest]
@@ -511,7 +511,7 @@ Because ingest pipeline failures need to be reprocessed by their original pipeli
 ```
 1. The `data.id` field is expected to be present. If it isn't present this pipeline will fail.
 
-Fixing a failure's root cause is a often a bespoke process. In this example, instead of discarding the data, we will make this identifier field optional.
+Fixing a failure's root cause is often a bespoke process. In this example, instead of discarding the data, we will make this identifier field optional.
 
 ```console
 PUT _ingest/pipeline/my-datastream-default-pipeline
@@ -658,7 +658,7 @@ POST _ingest/pipeline/_simulate
 ]
 }
 ```
-1. The index has been updated via the reroute processor.
+1. The index has been updated through the reroute processor.
 2. The document ID has stayed the same.
 3. The source should cleanly match the contents of the original document.
@@ -995,7 +995,7 @@ PUT _ingest/pipeline/my-datastream-remediation-pipeline
 2. Capture the source of the original document.
 3. Discard the `error` field since it wont be needed for the remediation.
 4. Also discard the `document` field.
-5. We extract all the fields from the original document's source back to the root of the document. The `@timestamp` field is not overwritten and thus will be present in the final document.
+5. We extract all the fields from the original document's source back to the root of the document. The `@timestamp` field is not overwritten and will be present in the final document.
 
 :::{important}
 Remember that a document that has failed during indexing has already been processed by the ingest processor! It shouldn't need to be processed again unless you made changes to your pipeline to fix the original problem. Make sure that any fixes applied to the ingest pipeline are reflected in the pipeline logic here.
@@ -1088,7 +1088,7 @@ Caused by: j.l.IllegalArgumentException: data stream timestamp field [@timestamp
 ]
 }
 ```
-1. The index has been updated via the script processor.
+1. The index has been updated through the script processor.
 2. The source should reflect any fixes and match the expected document shape for the final index.
 3. In this example case, we find that the failure timestamp has stayed in the source.
diff --git a/manage-data/data-store/data-streams/failure-store.md b/manage-data/data-store/data-streams/failure-store.md
index 190622fc52..3f27fb5895 100644
--- a/manage-data/data-store/data-streams/failure-store.md
+++ b/manage-data/data-store/data-streams/failure-store.md
@@ -62,7 +62,7 @@ After a matching data stream is created, its failure store will be enabled.
 
 ### Set up for existing data streams [set-up-failure-store-existing]
 
-Enabling the failure store via [index templates](../templates.md) can only affect data streams that are newly created. Existing data streams that use a template are not affected by changes to the template's `data_stream_options` field.
+Enabling the failure store using [index templates](../templates.md) can only affect data streams that are newly created. Existing data streams that use a template are not affected by changes to the template's `data_stream_options` field.
 
 To modify an existing data stream's options, use the [put data stream options](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options) API:
 
 ```console
@@ -96,7 +96,7 @@ PUT _data_stream/my-datastream-existing/_options
 You can also enable the data stream failure store in {{kib}}. Locate the data stream on the **Streams** page, where a stream maps directly to a data stream. Select a stream to view its details and go to the **Retention** tab where you can find the **Enable failure store** option.
 :::
 
-### Enable failure store via cluster setting [set-up-failure-store-cluster-setting]
+### Enable failure store using cluster setting [set-up-failure-store-cluster-setting]
 
 If you have a large number of existing data streams you may want to enable their failure stores in one place. Instead of updating each of their options individually, set `data_streams.failure_store.enabled` to a list of index patterns in the [cluster settings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-put-settings).
 Any data streams that match one of these patterns will operate with their failure store enabled.
@@ -257,7 +257,7 @@ If the document could have been redirected to a data stream's failure store but
 3. The response status is `400 Bad Request` due to the mapping problem.
 
-If the document was redirected to a data stream's failure store but that failed document could not be stored (e.g. due to shard unavailability or a similar problem), then the `failure_store` field on the response will be `failed`, and the response will display the error for the original failure, as well as a suppressed error detailing why the failure could not be stored:
+If the document was redirected to a data stream's failure store but that failed document could not be stored (for example, due to shard unavailability or a similar problem), then the `failure_store` field on the response will be `failed`, and the response will display the error for the original failure, as well as a suppressed error detailing why the failure could not be stored:
 
 ```console-result
 {
@@ -306,7 +306,7 @@ Once you have accumulated some failures, the failure store can be searched much
 :::{warning}
 Documents redirected to the failure store in the event of a failed ingest pipeline will be stored in their original, unprocessed form. If an ingest pipeline normally redacts sensitive information from a document, then failed documents in their original, unprocessed form may contain sensitive information.
 
-Furthermore, failed documents are likely to be structured differently than normal data in a data stream, and thus special care should be taken when making use of [document level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#document-level-security) or [field level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#field-level-security). Any security policies that expect to utilize these features for both regular documents and failure documents should account for any differences in document structure between the two document types.
+Furthermore, failed documents are likely to be structured differently than normal data in a data stream, and special care should be taken when making use of [document level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#document-level-security) or [field level security](../../../deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level.md#field-level-security). Any security policies that expect to utilize these features for both regular documents and failure documents should account for any differences in document structure between the two document types.
 
 To limit visibility on potentially sensitive data, users require the [`read_failure_store`](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-indices) index privilege for a data stream in order to search that data stream's failure store data.
 :::
@@ -324,7 +324,7 @@ POST _query?format=txt
 "query": """FROM my-datastream::failures | DROP error.stack_trace | LIMIT 1""" <1>
 }
 ```
-1. We drop the `error.stack_trace` field here just to keep the example free of newlines.
+1. We drop the `error.stack_trace` field here to keep the example free of newlines.
 
 An example of a search result with the failed document present:
@@ -820,7 +820,7 @@ PUT _cluster/settings
 }
 ```
 
-You can also specify the failure store retention period for a data stream on its data stream options. These can be specified via the index template for new data streams, or via the [put data stream options](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options) API for existing data streams.
+You can also specify the failure store retention period for a data stream on its data stream options. These can be specified using the index template for new data streams, or using the [put data stream options](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options) API for existing data streams.
 
 ```console
 PUT _data_stream/my-datastream/_options
 
diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md
index 0617faf0f7..d4c67a3e55 100644
--- a/manage-data/data-store/data-streams/run-downsampling.md
+++ b/manage-data/data-store/data-streams/run-downsampling.md
@@ -33,7 +33,7 @@ stack: ga
 serverless: ga
 ```
 
-To downsample a time series via a [data stream lifecycle](/manage-data/lifecycle/data-stream.md), add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams).
+To downsample a time series using a [data stream lifecycle](/manage-data/lifecycle/data-stream.md), add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams).
 
 * Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval.
 * Set `after` to the minimum time to wait after an index rollover, before running downsampling.
diff --git a/manage-data/data-store/mapping.md b/manage-data/data-store/mapping.md
index 06e23759c4..517f67ef54 100644
--- a/manage-data/data-store/mapping.md
+++ b/manage-data/data-store/mapping.md
@@ -23,7 +23,7 @@ products:
 % - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-mapper.md
 % Notes: redirect only
 
-% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+% Internal links rely on the following IDs being on this page (for example, as a heading ID, paragraph ID, and so on):
 
 $$$mapping-limit-settings$$$
 
diff --git a/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md b/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md
index 2f8b64d69e..d78404c42c 100644
--- a/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md
+++ b/manage-data/data-store/mapping/define-runtime-fields-in-search-request.md
@@ -12,7 +12,7 @@ products:
 
 You can specify a `runtime_mappings` section in a search request to create runtime fields that exist only as part of the query. You specify a script as part of the `runtime_mappings` section, just as you would if [adding a runtime field to the mappings](map-runtime-field.md).
 
-Defining a runtime field in a search request uses the same format as defining a runtime field in the index mapping. Just copy the field definition from the `runtime` in the index mapping to the `runtime_mappings` section of the search request.
+Defining a runtime field in a search request uses the same format as defining a runtime field in the index mapping. Copy the field definition from the `runtime` in the index mapping to the `runtime_mappings` section of the search request.
 
 The following search request adds a `day_of_week` field to the `runtime_mappings` section. The field values will be calculated dynamically, and only within the context of this search request:
 
diff --git a/manage-data/data-store/mapping/dynamic-templates.md b/manage-data/data-store/mapping/dynamic-templates.md
index 7fd8d034ce..875fa2a8cc 100644
--- a/manage-data/data-store/mapping/dynamic-templates.md
+++ b/manage-data/data-store/mapping/dynamic-templates.md
@@ -193,7 +193,7 @@ The `match_pattern` parameter adjusts the behavior of the `match` parameter to s
 "match": "^profit_\d+$"
 ```
 
-The following example matches all `string` fields whose name starts with `long_` (except for those which end with `_text`) and maps them as `long` fields:
+The following example matches all `string` fields whose name starts with `long_` (except for those that end with `_text`) and maps them as `long` fields:
 
 ```console
 PUT my-index-000001
@@ -265,7 +265,7 @@ PUT my-index-000001/_doc/1
 
 ## `path_match` and `path_unmatch` [path-match-unmatch]
 
-The `path_match` and `path_unmatch` parameters work in the same way as `match` and `unmatch`, but operate on the full dotted path to the field, not just the final name, e.g. `some_object.*.some_field`.
+The `path_match` and `path_unmatch` parameters work in the same way as `match` and `unmatch`, but operate on the full dotted path to the field, not just the final name, for example, `some_object.*.some_field`.
 
 This example copies the values of any fields in the `name` object to the top-level `full_name` field, except for the `middle` field:
 
@@ -342,7 +342,7 @@ PUT my-index-000001/_doc/2
 }
 ```
 
-Note that the `path_match` and `path_unmatch` parameters match on object paths in addition to leaf fields. As an example, indexing the following document will result in an error because the `path_match` setting also matches the object field `name.title`, which can’t be mapped as text:
+The `path_match` and `path_unmatch` parameters match on object paths in addition to leaf fields. As an example, indexing the following document will result in an error because the `path_match` setting also matches the object field `name.title`, which can't be mapped as text:
 
 ```console
 PUT my-index-000001/_doc/2
 
diff --git a/manage-data/data-store/mapping/explore-data-with-runtime-fields.md b/manage-data/data-store/mapping/explore-data-with-runtime-fields.md
index 88033b3e75..ec95a34989 100644
--- a/manage-data/data-store/mapping/explore-data-with-runtime-fields.md
+++ b/manage-data/data-store/mapping/explore-data-with-runtime-fields.md
@@ -96,7 +96,7 @@ The mapping contains two fields: `@timestamp` and `message`.
 
 If you want to retrieve results that include `clientip`, you can add that field as a runtime field in the mapping. The following runtime script defines a [grok pattern](../../../explore-analyze/scripting/grok.md) that extracts structured fields out of a single text field within a document. A grok pattern is like a regular expression that supports aliased expressions that you can reuse.
 
-The script matches on the `%{{COMMONAPACHELOG}}` log pattern, which understands the structure of Apache logs. If the pattern matches (`clientip != null`), the script emits the value of the matching IP address. If the pattern doesn’t match, the script just returns the field value without crashing.
+The script matches on the `%{{COMMONAPACHELOG}}` log pattern, which understands the structure of Apache logs. If the pattern matches (`clientip != null`), the script emits the value of the matching IP address. If the pattern doesn't match, the script returns the field value without crashing.
 
 ```console
 PUT my-index-000001/_mappings
@@ -116,7 +116,7 @@ PUT my-index-000001/_mappings
 
 1. This condition ensures that the script doesn’t crash even if the pattern of the message doesn’t match.
 
-Alternatively, you can define the same runtime field but in the context of a search request. The runtime definition and the script are exactly the same as the one defined previously in the index mapping. Just copy that definition into the search request under the `runtime_mappings` section and include a query that matches on the runtime field. This query returns the same results as if you defined a search query for the `http.clientip` runtime field in your index mappings, but only in the context of this specific search:
+Alternatively, you can define the same runtime field but in the context of a search request. The runtime definition and the script are exactly the same as the one defined previously in the index mapping. Copy that definition into the search request under the `runtime_mappings` section and include a query that matches on the runtime field. This query returns the same results as if you defined a search query for the `http.clientip` runtime field in your index mappings, but only in the context of this specific search:
 
 ```console
 GET my-index-000001/_search
 
diff --git a/manage-data/data-store/mapping/index-runtime-field.md b/manage-data/data-store/mapping/index-runtime-field.md
index df647da067..9e05edbe51 100644
--- a/manage-data/data-store/mapping/index-runtime-field.md
+++ b/manage-data/data-store/mapping/index-runtime-field.md
@@ -10,7 +10,7 @@ products:
 
 # Index a runtime field [runtime-indexed]
 
-Runtime fields are defined by the context where they run. For example, you can define runtime fields in the [context of a search query](define-runtime-fields-in-search-request.md) or within the [`runtime` section](map-runtime-field.md) of an index mapping. If you decide to index a runtime field for greater performance, just move the full runtime field definition (including the script) to the context of an index mapping. {{es}} automatically uses these indexed fields to drive queries, resulting in a fast response time. This capability means you can write a script only once, and apply it to any context that supports runtime fields.
+Runtime fields are defined by the context where they run. For example, you can define runtime fields in the [context of a search query](define-runtime-fields-in-search-request.md) or within the [`runtime` section](map-runtime-field.md) of an index mapping. If you decide to index a runtime field for greater performance, move the full runtime field definition (including the script) to the context of an index mapping. {{es}} automatically uses these indexed fields to drive queries, resulting in a fast response time. This capability means you can write a script only once, and apply it to any context that supports runtime fields.
 
 ::::{note}
 Indexing a `composite` runtime field is currently not supported.
 
diff --git a/manage-data/data-store/mapping/map-runtime-field.md b/manage-data/data-store/mapping/map-runtime-field.md
index b222a2487d..85926ddf92 100644
--- a/manage-data/data-store/mapping/map-runtime-field.md
+++ b/manage-data/data-store/mapping/map-runtime-field.md
@@ -10,7 +10,7 @@ products:
 
 # Map a runtime field [runtime-mapping-fields]
 
-You map runtime fields by adding a `runtime` section under the mapping definition and defining [a Painless script](../../../explore-analyze/scripting/modules-scripting-using.md). This script has access to the entire context of a document, including the original `_source` via `params._source` and any mapped fields plus their values. At query time, the script runs and generates values for each scripted field that is required for the query.
+You map runtime fields by adding a `runtime` section under the mapping definition and defining [a Painless script](../../../explore-analyze/scripting/modules-scripting-using.md). This script has access to the entire context of a document, including the original `_source` through `params._source` and any mapped fields plus their values. At query time, the script runs and generates values for each scripted field that is required for the query.
 
 ::::{admonition} Emitting runtime field values
 When defining a Painless script to use with runtime fields, you must include the [`emit` method](elasticsearch://reference/scripting-languages/painless/painless-runtime-fields-context.md) to emit calculated values.
@@ -102,7 +102,7 @@ You can alternatively prefix the field you want to retrieve values for with `par
 
 ## Ignoring script errors on runtime fields [runtime-errorhandling]
 
-Scripts can throw errors at runtime, e.g. on accessing missing or invalid values in documents or because of performing invalid operations. The `on_script_error` parameter can be used to control error behavior when this happens. Setting this parameter to `continue` will have the effect of silently ignoring all errors on this runtime field. The default `fail` value will cause a shard failure which gets reported in the search response.
+Scripts can throw errors at runtime, for example, on accessing missing or invalid values in documents or because of performing invalid operations. The `on_script_error` parameter can be used to control error behavior when this happens. Setting this parameter to `continue` will have the effect of silently ignoring all errors on this runtime field. The default `fail` value will cause a shard failure which gets reported in the search response.
 
 ## Updating and removing runtime fields [runtime-updating-scripts]
 
diff --git a/manage-data/data-store/mapping/retrieve-runtime-field.md b/manage-data/data-store/mapping/retrieve-runtime-field.md
index 2772667eb2..03ccbeca95 100644
--- a/manage-data/data-store/mapping/retrieve-runtime-field.md
+++ b/manage-data/data-store/mapping/retrieve-runtime-field.md
@@ -157,7 +157,7 @@ This time, the response includes only two hits. The value for `day_of_week` (`Su
 
 ## Retrieve fields from related indices [lookup-runtime-fields]
 
-The [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md) parameter on the `_search` API can also be used to retrieve fields from the related indices via runtime fields with a type of `lookup`.
+The [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md) parameter on the `_search` API can also be used to retrieve fields from the related indices using runtime fields with a type of `lookup`.
 
 ::::{note}
 Fields that are retrieved by runtime fields of type `lookup` can be used to enrich the hits in a search response. It’s not possible to query or aggregate on these fields.
 
diff --git a/manage-data/data-store/mapping/runtime-fields.md b/manage-data/data-store/mapping/runtime-fields.md
index ac35bfbea6..d823b48a8a 100644
--- a/manage-data/data-store/mapping/runtime-fields.md
+++ b/manage-data/data-store/mapping/runtime-fields.md
@@ -35,11 +35,11 @@ At its core, the most important benefit of runtime fields is the ability to add
 
 ## Incentives [runtime-incentives]
 
-Runtime fields can replace many of the ways you can use scripting with the `_search` API. How you use a runtime field is impacted by the number of documents that the included script runs against. For example, if you’re using the `fields` parameter on the `_search` API to [retrieve the values of a runtime field](retrieve-runtime-field.md), the script runs only against the top hits just like script fields do.
+Runtime fields can replace many of the ways you can use scripting with the `_search` API. How you use a runtime field is impacted by the number of documents that the included script runs against. For example, if you're using the `fields` parameter on the `_search` API to [retrieve the values of a runtime field](retrieve-runtime-field.md), the script runs only against the top hits, similar to script fields.
 
 You can use [script fields](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md#script-fields) to access values in `_source` and return calculated values based on a script valuation. Runtime fields have the same capabilities, but provide greater flexibility because you can query and aggregate on runtime fields in a search request. Script fields can only fetch values.
 
-Similarly, you could write a [script query](elasticsearch://reference/query-languages/query-dsl/query-dsl-script-query.md) that filters documents in a search request based on a script. Runtime fields provide a very similar feature that is more flexible. You write a script to create field values and they are available everywhere, such as [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md), [all queries](../../../explore-analyze/query-filter/languages/querydsl.md), and [aggregations](../../../explore-analyze/query-filter/aggregations.md).
+Similarly, you could write a [script query](elasticsearch://reference/query-languages/query-dsl/query-dsl-script-query.md) that filters documents in a search request based on a script. Runtime fields provide a similar feature that is more flexible. You write a script to create field values and they are available everywhere, such as [`fields`](elasticsearch://reference/elasticsearch/rest-apis/retrieve-selected-fields.md), [all queries](../../../explore-analyze/query-filter/languages/querydsl.md), and [aggregations](../../../explore-analyze/query-filter/aggregations.md).
 
 You can also use scripts to [sort search results](elasticsearch://reference/elasticsearch/rest-apis/sort-search-results.md#script-based-sorting), but that same script works exactly the same in a runtime field.
diff --git a/manage-data/data-store/near-real-time-search.md b/manage-data/data-store/near-real-time-search.md
index df43cc51f5..f0080b41e3 100644
--- a/manage-data/data-store/near-real-time-search.md
+++ b/manage-data/data-store/near-real-time-search.md
@@ -13,7 +13,7 @@ When a document is stored in {{es}}, it is indexed and fully searchable in *near
 
 Lucene, the Java libraries on which {{es}} is based, introduced the concept of per-segment search. A *segment* is similar to an inverted index, but the word *index* in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared.
 
-Sitting between {{es}} and the disk is the filesystem cache. Documents in the in-memory indexing buffer ([Figure 1](#img-pre-refresh)) are written to a new segment ([Figure 2](#img-post-refresh)). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file.
+Sitting between {{es}} and the disk is the filesystem cache. Documents in the in-memory indexing buffer ([Figure 1](#img-pre-refresh)) are written to a new segment ([Figure 2](#img-post-refresh)). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read like any other file.
 
 :::{image} /manage-data/images/elasticsearch-reference-lucene-in-memory-buffer.png
 :alt: A Lucene index with new documents in the in-memory buffer
 
diff --git a/manage-data/data-store/perform-index-operations.md b/manage-data/data-store/perform-index-operations.md
index ed69b0eec0..41744b990a 100644
--- a/manage-data/data-store/perform-index-operations.md
+++ b/manage-data/data-store/perform-index-operations.md
@@ -20,7 +20,7 @@ To perform index actions:
 
 ## Available index operations
 
-Several index operations are available from the **Manage index** menu. Note that some of the operations listed are unavailable in {{serverless-full}} since in that environment many data management tasks are handled automatically.
+Several index operations are available from the **Manage index** menu. Some of the operations listed are unavailable in {{serverless-full}} since in that environment many data management tasks are handled automatically.
 
 **Show index overview** {applies_to}`stack: ga` {applies_to}`serverless: ga`
 : View an overview of the index, including its storage size, status, and aliases, as well as a sample API request to add new documents.
 
diff --git a/manage-data/data-store/text-analysis.md b/manage-data/data-store/text-analysis.md
index 14ec069fac..d7464ad394 100644
--- a/manage-data/data-store/text-analysis.md
+++ b/manage-data/data-store/text-analysis.md
@@ -14,7 +14,7 @@ products:
 
 _Text analysis_ is the process of converting unstructured text, like the body of an email or a product description, into a structured format that’s [optimized for search](/solutions/search/full-text.md).
 
-Text analysis enables {{es}} to perform full-text search, where the search returns all *relevant* results rather than just exact matches. For example, if you search for `Quick fox jumps`, you probably want the document that contains `A quick brown fox jumps over the lazy dog`, and you might also want documents that contain related words like `fast fox` or `foxes leap`.
+Text analysis enables {{es}} to perform full-text search, where the search returns all *relevant* results rather than only exact matches. For example, if you search for `Quick fox jumps`, you probably want the document that contains `A quick brown fox jumps over the lazy dog`, and you might also want documents that contain related words like `fast fox` or `foxes leap`.
 
 {{es}} performs text analysis when indexing or searching [`text`](elasticsearch://reference/elasticsearch/mapping-reference/text.md) fields. If your index does _not_ contain `text` fields, no further setup is needed; you can skip the pages in this section. If you _do_ use `text` fields or your text searches aren’t returning results as expected, configuring text analysis can often help. You should also look into analysis configuration if you’re using {{es}} to:
 
diff --git a/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md b/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md
index 561910a1ea..4e916ea88e 100644
--- a/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md
+++ b/manage-data/data-store/text-analysis/anatomy-of-an-analyzer.md
@@ -10,7 +10,7 @@ products:
 
 # Anatomy of an analyzer [analyzer-anatomy]
 
-An *analyzer* — whether built-in or custom — is just a package which contains three lower-level building blocks: *character filters*, *tokenizers*, and *token filters*.
+An *analyzer* — whether built-in or custom — is a package which contains three lower-level building blocks: *character filters*, *tokenizers*, and *token filters*.
 
 The built-in [analyzers](elasticsearch://reference/text-analysis/analyzer-reference.md) pre-package these building blocks into analyzers suitable for different languages and types of text. Elasticsearch also exposes the individual building blocks so that they can be combined to define new [`custom`](create-custom-analyzer.md) analyzers.
diff --git a/manage-data/data-store/text-analysis/index-search-analysis.md b/manage-data/data-store/text-analysis/index-search-analysis.md
index 3861908626..809d0e5d6f 100644
--- a/manage-data/data-store/text-analysis/index-search-analysis.md
+++ b/manage-data/data-store/text-analysis/index-search-analysis.md
@@ -49,10 +49,10 @@ Later, a user searches the same `text` field for:
 
 The user expects this search to match the sentence indexed earlier, `The QUICK brown foxes jumped over the dog!`.
 
-However, the query string does not contain the exact words used in the document’s original text:
+However, the query string does not contain the exact words used in the document's original text:
 
-* `Quick` vs `QUICK`
-* `fox` vs `foxes`
+* `Quick` versus `QUICK`
+* `fox` versus `foxes`
 
 To account for this, the query string is analyzed using the same analyzer. This analyzer produces the following tokens: