diff --git a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-adjacency-matrix-aggregation.md b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-adjacency-matrix-aggregation.md index eb1bc60441714..34b7120d18ad5 100644 --- a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-adjacency-matrix-aggregation.md +++ b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-adjacency-matrix-aggregation.md @@ -1,7 +1,5 @@ --- navigation_title: "Adjacency matrix" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-adjacency-matrix-aggregation.html --- # Adjacency matrix aggregation [search-aggregations-bucket-adjacency-matrix-aggregation] @@ -88,6 +86,12 @@ The response contains buckets with document counts for each filter and combinati } ``` +% TESTRESPONSE[s/"took": 9/"took": $body.took/] + +% TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/] + +% TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/] + ## Parameters [adjacency-matrix-agg-params] @@ -96,9 +100,9 @@ The response contains buckets with document counts for each filter and combinati ::::{dropdown} Properties of `filters` `` - : (Required, [Query DSL object](/reference/query-languages/querydsl.md)) Query used to filter documents. The key is the filter name. + : (Required, [Query DSL object](query-dsl.md)) Query used to filter documents. The key is the filter name. - At least one filter is required. The total number of filters cannot exceed the [`indices.query.bool.max_clause_count`](/reference/elasticsearch/configuration-reference/search-settings.md#indices-query-bool-max-clause-count) setting. See [Filter limits](#adjacency-matrix-agg-filter-limits). + At least one filter is required. The total number of filters cannot exceed the [`indices.query.bool.max_clause_count`](search-settings.md#indices-query-bool-max-clause-count) setting. 
See [Filter limits](search-aggregations-bucket-adjacency-matrix-aggregation.md#adjacency-matrix-agg-filter-limits). :::: diff --git a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-autodatehistogram-aggregation.md b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-autodatehistogram-aggregation.md index 67ffd7b0d6291..17a9cd921c348 100644 --- a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-autodatehistogram-aggregation.md +++ b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-autodatehistogram-aggregation.md @@ -1,13 +1,11 @@ --- navigation_title: "Auto-interval date histogram" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-autodatehistogram-aggregation.html --- # Auto-interval date histogram aggregation [search-aggregations-bucket-autodatehistogram-aggregation] -A multi-bucket aggregation similar to the [Date histogram](/reference/data-analysis/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number. +A multi-bucket aggregation similar to the [Date histogram](search-aggregations-bucket-datehistogram-aggregation.md) except instead of providing an interval to use as the width of each bucket, a target number of buckets is provided indicating the number of buckets needed and the interval of the buckets is automatically chosen to best achieve that target. The number of buckets returned will always be less than or equal to this target number. The buckets field is optional, and will default to 10 buckets if not specified. 
@@ -29,12 +27,14 @@ POST /sales/_search?size=0 } ``` +% TEST[setup:sales] + ## Keys [_keys] Internally, a date is represented as a 64 bit number representing a timestamp in milliseconds-since-the-epoch. These timestamps are returned as the bucket `key`s. The `key_as_string` is the same timestamp converted to a formatted date string using the format specified with the `format` parameter: -::::{tip} -If no `format` is specified, then it will use the first date [format](/reference/elasticsearch/mapping-reference/mapping-date-format.md) specified in the field mapping. +::::{tip} +If no `format` is specified, then it will use the first date [format](mapping-date-format.md) specified in the field mapping. :::: @@ -55,7 +55,9 @@ POST /sales/_search?size=0 } ``` -1. Supports expressive date [format pattern](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md#date-format-pattern) +% TEST[setup:sales] + +1. Supports expressive date [format pattern](search-aggregations-bucket-daterange-aggregation.md#date-format-pattern) Response: @@ -88,6 +90,8 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + ## Intervals [_intervals] @@ -183,6 +187,8 @@ UTC is used if no time zone is specified, three 1-hour buckets are returned star } ``` +% TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + If a `time_zone` of `-01:00` is specified, then midnight starts at one hour before midnight UTC: ```console @@ -200,6 +206,8 @@ GET my-index-000001/_search?size=0 } ``` +% TEST[continued] + Now three 1-hour buckets are still returned but the first bucket starts at 11:00pm on 30 September 2015 since that is the local time for the bucket in the specified time zone. 
```console-result @@ -230,10 +238,12 @@ Now three 1-hour buckets are still returned but the first bucket starts at 11:00 } ``` +% TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + 1. The `key_as_string` value represents midnight on each day in the specified time zone. -::::{warning} +::::{warning} When using time zones that follow DST (daylight savings time) changes, buckets close to the moment when those changes happen can have slightly different sizes than neighbouring buckets. For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am, clocks were turned forward 1 hour to 3am local time. If the result of the aggregation was daily buckets, the bucket covering that day will only hold data for 23 hours instead of the usual 24 hours for other buckets. The same is true for shorter intervals like e.g. 12h. Here, we will have only a 11h bucket on the morning of 27 March when the DST shift happens. :::: @@ -269,6 +279,8 @@ POST /sales/_search?size=0 } ``` +% TEST[setup:sales] + ## Missing value [_missing_value] @@ -291,6 +303,8 @@ POST /sales/_search?size=0 } ``` +% TEST[setup:sales] + 1. Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`. 
diff --git a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-iprange-aggregation.md b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-iprange-aggregation.md index b8bbd060c117a..e3ec32625b454 100644 --- a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-iprange-aggregation.md +++ b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-iprange-aggregation.md @@ -1,13 +1,11 @@ --- navigation_title: "IP range" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-iprange-aggregation.html --- # IP range aggregation [search-aggregations-bucket-iprange-aggregation] -Just like the dedicated [date](/reference/data-analysis/aggregations/search-aggregations-bucket-daterange-aggregation.md) range aggregation, there is also a dedicated range aggregation for IP typed fields: +Just like the dedicated [date](search-aggregations-bucket-daterange-aggregation.md) range aggregation, there is also a dedicated range aggregation for IP typed fields: Example: @@ -31,6 +29,8 @@ GET /ip_addresses/_search } ``` +% TEST[setup:iprange] + Response: ```console-result @@ -56,6 +56,8 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + IP ranges can also be defined as CIDR masks: $$$ip-range-cidr-example$$$ @@ -78,6 +80,8 @@ GET /ip_addresses/_search } ``` +% TEST[setup:iprange] + Response: ```console-result @@ -105,6 +109,8 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + ## Keyed Response [_keyed_response_3] Setting the `keyed` flag to `true` will associate a unique string key with each bucket and return the ranges as a hash rather than an array: @@ -130,6 +136,8 @@ GET /ip_addresses/_search } ``` +% TEST[setup:iprange] + Response: ```console-result @@ -153,6 +161,8 @@ Response: } ``` +% 
TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + It is also possible to customize the key for each range: $$$ip-range-keyed-customized-keys-example$$$ @@ -176,6 +186,8 @@ GET /ip_addresses/_search } ``` +% TEST[setup:iprange] + Response: ```console-result @@ -199,4 +211,6 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/] + diff --git a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md index 3b0e825186a66..61bdf2683252d 100644 --- a/docs/reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md +++ b/docs/reference/data-analysis/aggregations/search-aggregations-bucket-multi-terms-aggregation.md @@ -1,15 +1,52 @@ --- navigation_title: "Multi Terms" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-multi-terms-aggregation.html --- # Multi Terms aggregation [search-aggregations-bucket-multi-terms-aggregation] -A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for this fields as a separate field and use the terms aggregation on this field. - -The multi_term aggregations are the most useful when you need to sort by a number of document or a metric aggregation on a composite key and get top N results. 
If sorting is not required and all values are expected to be retrieved using nested terms aggregation or [`composite aggregations`](/reference/data-analysis/aggregations/search-aggregations-bucket-composite-aggregation.md) will be a faster and more memory efficient solution. +A multi-bucket value source based aggregation where buckets are dynamically built - one per unique set of values. The multi terms aggregation is very similar to the [`terms aggregation`](search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), however in most cases it will be slower than the terms aggregation and will consume more memory. Therefore, if the same set of fields is constantly used, it would be more efficient to index a combined key for this fields as a separate field and use the terms aggregation on this field. + +The multi_term aggregations are the most useful when you need to sort by a number of document or a metric aggregation on a composite key and get top N results. If sorting is not required and all values are expected to be retrieved using nested terms aggregation or [`composite aggregations`](search-aggregations-bucket-composite-aggregation.md) will be a faster and more memory efficient solution. 
+ +% +% [source,js] +% -------------------------------------------------- +% PUT /products +% { +% "mappings": { +% "properties": { +% "genre": { +% "type": "keyword" +% }, +% "product": { +% "type": "keyword" +% }, +% "quantity": { +% "type": "integer" +% } +% } +% } +% } +% +% POST /products/_bulk?refresh +% {"index":{"_id":0}} +% {"genre": "rock", "product": "Product A", "quantity": 4} +% {"index":{"_id":1}} +% {"genre": "rock", "product": "Product A", "quantity": 5} +% {"index":{"_id":2}} +% {"genre": "rock", "product": "Product B", "quantity": 1} +% {"index":{"_id":3}} +% {"genre": "jazz", "product": "Product B", "quantity": 10} +% {"index":{"_id":4}} +% {"genre": "electronic", "product": "Product B", "quantity": 3} +% {"index":{"_id":5}} +% {"genre": "electronic"} +% +% ------------------------------------------------- +% // NOTCONSOLE +% // TESTSETUP +% Example: @@ -32,7 +69,9 @@ GET /products/_search } ``` -1. `multi_terms` aggregation can work with the same field types as a [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and supports most of the terms aggregation parameters. +% TEST[s/_search/_search\?filter_path=aggregations/] + +1. `multi_terms` aggregation can work with the same field types as a [`terms aggregation`](search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) and supports most of the terms aggregation parameters. Response: @@ -83,6 +122,8 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\.//] + 1. an upper bound of the error on the document counts for each term, see < 2. when there are lots of unique terms, Elasticsearch only returns the top terms; this number is the sum of the document counts for all buckets that are not part of the response 3. the list of the top buckets. 
@@ -93,7 +134,7 @@ By default, the `multi_terms` aggregation will return the buckets for the top te ## Aggregation Parameters [search-aggregations-bucket-multi-terms-aggregation-parameters] -The following parameters are supported. See [`terms aggregation`](/reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) for more detailed explanation of these parameters. +The following parameters are supported. See [`terms aggregation`](search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order) for more detailed explanation of these parameters. size : Optional. Defines how many term buckets should be returned out of the overall terms list. Defaults to 10. @@ -149,6 +190,8 @@ GET /products/_search } ``` +% TEST[s/_search/_search\?filter_path=aggregations/] + Response: ```console-result @@ -189,6 +232,8 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\.//] + ## Missing value [_missing_value_3] @@ -217,6 +262,8 @@ GET /products/_search } ``` +% TEST[s/_search/_search\?filter_path=aggregations/] + Response: ```console-result @@ -273,13 +320,15 @@ Response: } ``` +% TESTRESPONSE[s/\.\.\.//] + 1. Documents without a value in the `product` field will fall into the same bucket as documents that have the value `Product Z`. ## Mixing field types [_mixing_field_types] -::::{warning} +::::{warning} When aggregating on multiple indices the type of the aggregated field may not be the same in all indices. Some types are compatible with each other (`integer` and `long` or `float` and `double`) but when the types are a mix of decimal and non-decimal number the terms aggregation will promote the non-decimal numbers to decimal numbers. This can result in a loss of precision in the bucket values. :::: @@ -321,6 +370,8 @@ GET /products/_search } ``` +% TEST[s/_search/_search\?filter_path=aggregations/] + ```console-result { ... 
@@ -379,4 +430,6 @@ GET /products/_search } ``` +% TESTRESPONSE[s/\.\.\.//] + diff --git a/docs/reference/elasticsearch/command-line-tools/setup-passwords.md b/docs/reference/elasticsearch/command-line-tools/setup-passwords.md index b14798c651267..6ad7c1b11fa70 100644 --- a/docs/reference/elasticsearch/command-line-tools/setup-passwords.md +++ b/docs/reference/elasticsearch/command-line-tools/setup-passwords.md @@ -1,21 +1,16 @@ ---- -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-passwords.html ---- - # elasticsearch-setup-passwords [setup-passwords] ::::{admonition} Deprecated in 8.0. :class: warning -The `elasticsearch-setup-passwords` tool is deprecated and will be removed in a future release. To manually reset the password for the built-in users (including the `elastic` user), use the [`elasticsearch-reset-password`](/reference/elasticsearch/command-line-tools/reset-password.md) tool, the {{es}} change password API, or the User Management features in {{kib}}. +The `elasticsearch-setup-passwords` tool is deprecated and will be removed in a future release. To manually reset the password for the built-in users (including the `elastic` user), use the [`elasticsearch-reset-password`](reset-password.md) tool, the {{es}} change password API, or the User Management features in {{kib}}. :::: -The `elasticsearch-setup-passwords` command sets the passwords for the [built-in users](docs-content://deploy-manage/users-roles/cluster-or-deployment-auth/built-in-users.md). +The `elasticsearch-setup-passwords` command sets the passwords for the [built-in users](built-in-users.md). 
-## Synopsis [_synopsis_10] +## Synopsis [_synopsis_10] ```shell bin/elasticsearch-setup-passwords auto|interactive @@ -24,14 +19,14 @@ bin/elasticsearch-setup-passwords auto|interactive ``` -## Description [_description_17] +## Description [_description_17] -This command is intended for use only during the initial configuration of the {{es}} {{security-features}}. It uses the [`elastic` bootstrap password](docs-content://deploy-manage/users-roles/cluster-or-deployment-auth/built-in-users.md#bootstrap-elastic-passwords) to run user management API requests. If your {{es}} keystore is password protected, before you can set the passwords for the built-in users, you must enter the keystore password. After you set a password for the `elastic` user, the bootstrap password is no longer active and you cannot use this command. Instead, you can change passwords by using the **Management > Users** UI in {{kib}} or the [Change Password API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-security-change-password). +This command is intended for use only during the initial configuration of the {{es}} {security-features}. It uses the [`elastic` bootstrap password](built-in-users.md#bootstrap-elastic-passwords) to run user management API requests. If your {{es}} keystore is password protected, before you can set the passwords for the built-in users, you must enter the keystore password. After you set a password for the `elastic` user, the bootstrap password is no longer active and you cannot use this command. Instead, you can change passwords by using the **Management > Users** UI in {{kib}} or the [Change Password API](security-api-change-password.md). -This command uses an HTTP connection to connect to the cluster and run the user management requests. If your cluster uses TLS/SSL on the HTTP layer, the command automatically attempts to establish the connection by using the HTTPS protocol. 
It configures the connection by using the `xpack.security.http.ssl` settings in the `elasticsearch.yml` file. If you do not use the default config directory location, ensure that the **ES_PATH_CONF** environment variable returns the correct path before you run the `elasticsearch-setup-passwords` command. You can override settings in your `elasticsearch.yml` file by using the `-E` command option. For more information about debugging connection failures, see [Setup-passwords command fails due to connection failure](docs-content://troubleshoot/elasticsearch/security/trb-security-setup.md). +This command uses an HTTP connection to connect to the cluster and run the user management requests. If your cluster uses TLS/SSL on the HTTP layer, the command automatically attempts to establish the connection by using the HTTPS protocol. It configures the connection by using the `xpack.security.http.ssl` settings in the `elasticsearch.yml` file. If you do not use the default config directory location, ensure that the **ES_PATH_CONF** environment variable returns the correct path before you run the `elasticsearch-setup-passwords` command. You can override settings in your `elasticsearch.yml` file by using the `-E` command option. For more information about debugging connection failures, see [Setup-passwords command fails due to connection failure](trb-security-setup.md). -## Parameters [setup-passwords-parameters] +## Parameters [setup-passwords-parameters] `auto` : Outputs randomly-generated passwords to the console. @@ -58,7 +53,7 @@ This command uses an HTTP connection to connect to the cluster and run the user : Shows verbose output. 
-## Examples [_examples_22] +## Examples [_examples_22] The following example uses the `-u` parameter to tell the tool where to submit its user management API requests: diff --git a/docs/reference/elasticsearch/index-lifecycle-actions/ilm-allocate.md b/docs/reference/elasticsearch/index-lifecycle-actions/ilm-allocate.md index 69157e0cb7b00..259f2954ac357 100644 --- a/docs/reference/elasticsearch/index-lifecycle-actions/ilm-allocate.md +++ b/docs/reference/elasticsearch/index-lifecycle-actions/ilm-allocate.md @@ -1,29 +1,24 @@ ---- -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-allocate.html ---- - # Allocate [ilm-allocate] Phases allowed: warm, cold. Updates the index settings to change which nodes are allowed to host the index shards and change the number of replicas. -The allocate action is not allowed in the hot phase. The initial allocation for the index must be done manually or via [index templates](docs-content://manage-data/data-store/templates.md). +The allocate action is not allowed in the hot phase. The initial allocation for the index must be done manually or via [index templates](index-templates.md). -You can configure this action to modify both the allocation rules and number of replicas, only the allocation rules, or only the number of replicas. For more information about how {{es}} uses replicas for scaling, see [Get ready for production](docs-content://deploy-manage/production-guidance/getting-ready-for-production-elasticsearch.md). See [Index-level shard allocation filtering](/reference/elasticsearch/index-settings/shard-allocation.md) for more information about controlling where {{es}} allocates shards of a particular index. +You can configure this action to modify both the allocation rules and number of replicas, only the allocation rules, or only the number of replicas. For more information about how {{es}} uses replicas for scaling, see [Get ready for production](scalability.md). 
See [Index-level shard allocation filtering](shard-allocation-filtering.md) for more information about controlling where {{es}} allocates shards of a particular index. ## Options [ilm-allocate-options] You must specify the number of replicas or at least one `include`, `exclude`, or `require` option. An empty allocate action is invalid. -For more information about using custom attributes for shard allocation, refer to [](/reference/elasticsearch/index-settings/shard-allocation.md). +For more information about using custom attributes for shard allocation, see [Index-level shard allocation filtering](shard-allocation-filtering.md). `number_of_replicas` : (Optional, integer) Number of replicas to assign to the index. `total_shards_per_node` -: (Optional, integer) The maximum number of shards for the index on a single {{es}} node. A value of `-1` is interpreted as unlimited. See [total shards](/reference/elasticsearch/index-settings/total-shards-per-node.md). +: (Optional, integer) The maximum number of shards for the index on a single {{es}} node. A value of `-1` is interpreted as unlimited. See [total shards](allocation-total-shards.md). `include` : (Optional, object) Assigns an index to nodes that have at least *one* of the specified custom attributes. @@ -61,7 +56,7 @@ PUT _ilm/policy/my_policy The allocate action in the following policy assigns the index to nodes that have a `box_type` of *hot* or *warm*. -To designate a node’s `box_type`, you set a custom attribute in the node configuration. For example, set `node.attr.box_type: hot` in `elasticsearch.yml`. For more information, refer to [](/reference/elasticsearch/index-settings/shard-allocation.md#index-allocation-filters). +To designate a node’s `box_type`, you set a custom attribute in the node configuration. For example, set `node.attr.box_type: hot` in `elasticsearch.yml`. For more information, see [Enabling index-level shard allocation filtering](shard-allocation-filtering.md#index-allocation-filters). 
```console PUT _ilm/policy/my_policy @@ -112,7 +107,7 @@ PUT _ilm/policy/my_policy The allocate action in the following policy updates the index to have one replica per shard and be allocated to nodes that have a `box_type` of *cold*. -To designate a node’s `box_type`, you set a custom attribute in the node configuration. For example, set `node.attr.box_type: cold` in `elasticsearch.yml`. For more information, refer to [](/reference/elasticsearch/index-settings/shard-allocation.md#index-allocation-filters). +To designate a node’s `box_type`, you set a custom attribute in the node configuration. For example, set `node.attr.box_type: cold` in `elasticsearch.yml`. For more information, see [Enabling index-level shard allocation filtering](shard-allocation-filtering.md#index-allocation-filters). ```console PUT _ilm/policy/my_policy diff --git a/docs/reference/elasticsearch/mapping-reference/position-increment-gap.md b/docs/reference/elasticsearch/mapping-reference/position-increment-gap.md index b76494d0289b8..ed0a1398d2fa9 100644 --- a/docs/reference/elasticsearch/mapping-reference/position-increment-gap.md +++ b/docs/reference/elasticsearch/mapping-reference/position-increment-gap.md @@ -1,11 +1,6 @@ ---- -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/position-increment-gap.html ---- +# `position_increment_gap` [position-increment-gap] -# position_increment_gap [position-increment-gap] - -[Analyzed](/reference/elasticsearch/mapping-reference/mapping-index.md) text fields take term [positions](/reference/elasticsearch/mapping-reference/index-options.md) into account, in order to be able to support [proximity or phrase queries](/reference/query-languages/query-dsl-match-query-phrase.md). When indexing text fields with multiple values a "fake" gap is added between the values to prevent most phrase queries from matching across the values. The size of this gap is configured using `position_increment_gap` and defaults to `100`. 
+[Analyzed](mapping-index.md) text fields take term [positions](index-options.md) into account, in order to be able to support [proximity or phrase queries](query-dsl-match-query-phrase.md). When indexing text fields with multiple values a "fake" gap is added between the values to prevent most phrase queries from matching across the values. The size of this gap is configured using `position_increment_gap` and defaults to `100`. For example: diff --git a/docs/reference/elasticsearch/mapping-reference/text.md b/docs/reference/elasticsearch/mapping-reference/text.md index d396c6c26af44..423180b036782 100644 --- a/docs/reference/elasticsearch/mapping-reference/text.md +++ b/docs/reference/elasticsearch/mapping-reference/text.md @@ -1,7 +1,5 @@ --- navigation_title: "Text" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html --- # Text type family [text] @@ -9,17 +7,17 @@ mapped_pages: The text family includes the following field types: -* [`text`](#text-field-type), the traditional field type for full-text content such as the body of an email or the description of a product. -* [`match_only_text`](#match-only-text-field-type), a space-optimized variant of `text` that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages. +* [`text`](text.md#text-field-type), the traditional field type for full-text content such as the body of an email or the description of a product. +* [`match_only_text`](text.md#match-only-text-field-type), a space-optimized variant of `text` that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages. -## Text field type [text-field-type] +## Text field type [text-field-type] -A field to index full-text values, such as the body of an email or the description of a product. 
These fields are `analyzed`, that is they are passed through an [analyzer](docs-content://manage-data/data-store/text-analysis.md) to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words *within* each full text field. Text fields are not used for sorting and seldom used for aggregations (although the [significant text aggregation](/reference/data-analysis/aggregations/search-aggregations-bucket-significanttext-aggregation.md) is a notable exception). +A field to index full-text values, such as the body of an email or the description of a product. These fields are `analyzed`, that is they are passed through an [analyzer](analysis.md) to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words *within* each full text field. Text fields are not used for sorting and seldom used for aggregations (although the [significant text aggregation](search-aggregations-bucket-significanttext-aggregation.md) is a notable exception). -`text` fields are best suited for unstructured but human-readable content. If you need to index unstructured machine-generated content, see [Mapping unstructured content](/reference/elasticsearch/mapping-reference/keyword.md#mapping-unstructured-content). +`text` fields are best suited for unstructured but human-readable content. If you need to index unstructured machine-generated content, see [Mapping unstructured content](keyword.md#mapping-unstructured-content). -If you need to index structured content such as email addresses, hostnames, status codes, or tags, it is likely that you should rather use a [`keyword`](/reference/elasticsearch/mapping-reference/keyword.md) field. +If you need to index structured content such as email addresses, hostnames, status codes, or tags, it is likely that you should rather use a [`keyword`](keyword.md) field. 
Below is an example of a mapping for a text field: @@ -38,73 +36,73 @@ PUT my-index-000001 ## Use a field as both text and keyword [text-multi-fields] -Sometimes it is useful to have both a full text (`text`) and a keyword (`keyword`) version of the same field: one for full text search and the other for aggregations and sorting. This can be achieved with [multi-fields](/reference/elasticsearch/mapping-reference/multi-fields.md). +Sometimes it is useful to have both a full text (`text`) and a keyword (`keyword`) version of the same field: one for full text search and the other for aggregations and sorting. This can be achieved with [multi-fields](multi-fields.md). ## Parameters for text fields [text-params] The following parameters are accepted by `text` fields: -[`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md) -: The [analyzer](docs-content://manage-data/data-store/text-analysis.md) which should be used for the `text` field, both at index-time and at search-time (unless overridden by the [`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md)). Defaults to the default index analyzer, or the [`standard` analyzer](/reference/data-analysis/text-analysis/analysis-standard-analyzer.md). +[`analyzer`](analyzer.md) +: The [analyzer](analysis.md) which should be used for the `text` field, both at index-time and at search-time (unless overridden by the [`search_analyzer`](search-analyzer.md)). Defaults to the default index analyzer, or the [`standard` analyzer](analysis-standard-analyzer.md). -[`eager_global_ordinals`](/reference/elasticsearch/mapping-reference/eager-global-ordinals.md) +[`eager_global_ordinals`](eager-global-ordinals.md) : Should global ordinals be loaded eagerly on refresh? Accepts `true` or `false` (default). Enabling this is a good idea on fields that are frequently used for (significant) terms aggregations. 
-[`fielddata`](#fielddata-mapping-param) +[`fielddata`](text.md#fielddata-mapping-param) : Can the field use in-memory fielddata for sorting, aggregations, or scripting? Accepts `true` or `false` (default). -[`fielddata_frequency_filter`](#field-data-filtering) +[`fielddata_frequency_filter`](text.md#field-data-filtering) : Expert settings which allow to decide which values to load in memory when `fielddata` is enabled. By default all values are loaded. -[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md) +[`fields`](multi-fields.md) : Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers. -[`index`](/reference/elasticsearch/mapping-reference/mapping-index.md) +[`index`](mapping-index.md) : Should the field be searchable? Accepts `true` (default) or `false`. -[`index_options`](/reference/elasticsearch/mapping-reference/index-options.md) +[`index_options`](index-options.md) : What information should be stored in the index, for search and highlighting purposes. Defaults to `positions`. -[`index_prefixes`](/reference/elasticsearch/mapping-reference/index-prefixes.md) +[`index_prefixes`](index-prefixes.md) : If enabled, term prefixes of between 2 and 5 characters are indexed into a separate field. This allows prefix searches to run more efficiently, at the expense of a larger index. -[`index_phrases`](/reference/elasticsearch/mapping-reference/index-phrases.md) +[`index_phrases`](index-phrases.md) : If enabled, two-term word combinations (*shingles*) are indexed into a separate field. This allows exact phrase queries (no slop) to run more efficiently, at the expense of a larger index. Note that this works best when stopwords are not removed, as phrases containing stopwords will not use the subsidiary field and will fall back to a standard phrase query. 
Accepts `true` or `false` (default). -[`norms`](/reference/elasticsearch/mapping-reference/norms.md) +[`norms`](norms.md) : Whether field-length should be taken into account when scoring queries. Accepts `true` (default) or `false`. -[`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) +[`position_increment_gap`](position-increment-gap.md) : The number of fake term position which should be inserted between each element of an array of strings. Defaults to the `position_increment_gap` configured on the analyzer which defaults to `100`. `100` was chosen because it prevents phrase queries with reasonably large slops (less than 100) from matching terms across field values. -[`store`](/reference/elasticsearch/mapping-reference/mapping-store.md) -: Whether the field value should be stored and retrievable separately from the [`_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md) field. Accepts `true` or `false` (default). +[`store`](mapping-store.md) +: Whether the field value should be stored and retrievable separately from the [`_source`](mapping-source-field.md) field. Accepts `true` or `false` (default). -[`search_analyzer`](/reference/elasticsearch/mapping-reference/search-analyzer.md) -: The [`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md) that should be used at search time on the `text` field. Defaults to the `analyzer` setting. +[`search_analyzer`](search-analyzer.md) +: The [`analyzer`](analyzer.md) that should be used at search time on the `text` field. Defaults to the `analyzer` setting. -[`search_quote_analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md#search-quote-analyzer) -: The [`analyzer`](/reference/elasticsearch/mapping-reference/analyzer.md) that should be used at search time when a phrase is encountered. Defaults to the `search_analyzer` setting. 
+[`search_quote_analyzer`](analyzer.md#search-quote-analyzer) +: The [`analyzer`](analyzer.md) that should be used at search time when a phrase is encountered. Defaults to the `search_analyzer` setting. -[`similarity`](/reference/elasticsearch/mapping-reference/similarity.md) +[`similarity`](similarity.md) : Which scoring algorithm or *similarity* should be used. Defaults to `BM25`. -[`term_vector`](/reference/elasticsearch/mapping-reference/term-vector.md) +[`term_vector`](term-vector.md) : Whether term vectors should be stored for the field. Defaults to `no`. -[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) +[`meta`](mapping-field-meta.md) : Metadata about the field. ## Synthetic `_source` [text-synthetic-source] -::::{important} +::::{important} Synthetic `_source` is Generally Available only for TSDB indices (indices that have `index.mode` set to `time_series`). For other indices synthetic `_source` is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. :::: -`text` fields support [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source) if they have a [`keyword`](/reference/elasticsearch/mapping-reference/keyword.md#keyword-synthetic-source) sub-field that supports synthetic `_source` or if the `text` field sets `store` to `true`. Either way, it may not have [`copy_to`](/reference/elasticsearch/mapping-reference/copy-to.md). +`text` fields support [synthetic `_source`](mapping-source-field.md#synthetic-source) if they have a [`keyword`](keyword.md#keyword-synthetic-source) sub-field that supports synthetic `_source` or if the `text` field sets `store` to `true`. Either way, it may not have [`copy_to`](copy-to.md). 
If using a sub-`keyword` field, then the values are sorted in the same way as a `keyword` field’s values are sorted. By default, that means sorted with duplicates removed. So: @@ -145,6 +143,8 @@ PUT idx/_doc/1 } ``` +% TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/] + Will become: ```console-result @@ -156,8 +156,10 @@ Will become: } ``` -::::{note} -Reordering text fields can have an effect on [phrase](/reference/query-languages/query-dsl-match-query-phrase.md) and [span](/reference/query-languages/span-queries.md) queries. See the discussion about [`position_increment_gap`](/reference/elasticsearch/mapping-reference/position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default. +% TEST[s/^/{"_source":/ s/\n$/}/] + +::::{note} +Reordering text fields can have an effect on [phrase](query-dsl-match-query-phrase.md) and [span](span-queries.md) queries. See the discussion about [`position_increment_gap`](position-increment-gap.md) for more detail. You can avoid this by making sure the `slop` parameter on the phrase queries is lower than the `position_increment_gap`. This is the default. :::: @@ -193,6 +195,8 @@ PUT idx/_doc/1 } ``` +% TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/] + Will become: ```console-result @@ -205,12 +209,14 @@ Will become: } ``` +% TEST[s/^/{"_source":/ s/\n$/}/] + ## `fielddata` mapping parameter [fielddata-mapping-param] `text` fields are searchable by default, but by default are not available for aggregations, sorting, or scripting. If you try to sort, aggregate, or access values from a `text` field using a script, you’ll see an exception indicating that field data is disabled by default on text fields. To load field data in memory, set `fielddata=true` on your field. -::::{note} +::::{note} Loading field data in memory can consume significant memory. 
:::: @@ -220,9 +226,9 @@ Field data is the only way to access the analyzed tokens from a full text field ## Before enabling fielddata [before-enabling-fielddata] -It usually doesn’t make sense to enable fielddata on text fields. Field data is stored in the heap with the [field data cache](/reference/elasticsearch/configuration-reference/field-data-cache-settings.md) because it is expensive to calculate. Calculating the field data can cause latency spikes, and increasing heap usage is a cause of cluster performance issues. +It usually doesn’t make sense to enable fielddata on text fields. Field data is stored in the heap with the [field data cache](modules-fielddata.md) because it is expensive to calculate. Calculating the field data can cause latency spikes, and increasing heap usage is a cause of cluster performance issues. -Most users who want to do more with text fields use [multi-field mappings](/reference/elasticsearch/mapping-reference/multi-fields.md) by having both a `text` field for full text searches, and an unanalyzed [`keyword`](/reference/elasticsearch/mapping-reference/keyword.md) field for aggregations, as follows: +Most users who want to do more with text fields use [multi-field mappings](multi-fields.md) by having both a `text` field for full text searches, and an unanalyzed [`keyword`](keyword.md) field for aggregations, as follows: ```console PUT my-index-000001 @@ -249,7 +255,7 @@ PUT my-index-000001 ## Enabling fielddata on `text` fields [enable-fielddata-text-fields] -You can enable fielddata on an existing `text` field using the [update mapping API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) as follows: +You can enable fielddata on an existing `text` field using the [update mapping API](indices-put-mapping.md) as follows: ```console PUT my-index-000001/_mapping @@ -263,6 +269,8 @@ PUT my-index-000001/_mapping } ``` +% TEST[continued] + 1. 
The mapping that you specify for `my_field` should consist of the existing mapping for that field, plus the `fielddata` parameter. @@ -295,13 +303,13 @@ PUT my-index-000001 ``` -## Match-only text field type [match-only-text-field-type] +## Match-only text field type [match-only-text-field-type] -A variant of [`text`](#text-field-type) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries perform as fast if not faster as on `text` fields, however queries that need positions such as the [`match_phrase` query](/reference/query-languages/query-dsl-match-query-phrase.md) perform slower as they need to look at the `_source` document to verify whether a phrase matches. All queries return constant scores that are equal to 1.0. +A variant of [`text`](text.md#text-field-type) that trades scoring and efficiency of positional queries for space efficiency. This field effectively stores data the same way as a `text` field that only indexes documents (`index_options: docs`) and disables norms (`norms: false`). Term queries perform as fast if not faster as on `text` fields, however queries that need positions such as the [`match_phrase` query](query-dsl-match-query-phrase.md) perform slower as they need to look at the `_source` document to verify whether a phrase matches. All queries return constant scores that are equal to 1.0. -Analysis is not configurable: text is always analyzed with the [default analyzer](docs-content://manage-data/data-store/text-analysis/specify-an-analyzer.md#specify-index-time-default-analyzer) ([`standard`](/reference/data-analysis/text-analysis/analysis-standard-analyzer.md) by default). 
+Analysis is not configurable: text is always analyzed with the [default analyzer](specify-analyzer.md#specify-index-time-default-analyzer) ([`standard`](analysis-standard-analyzer.md) by default). -[span queries](/reference/query-languages/span-queries.md) are not supported with this field, use [interval queries](/reference/query-languages/query-dsl-intervals-query.md) instead, or the [`text`](#text-field-type) field type if you absolutely need span queries. +[span queries](span-queries.md) are not supported with this field, use [interval queries](query-dsl-intervals-query.md) instead, or the [`text`](text.md#text-field-type) field type if you absolutely need span queries. Other than that, `match_only_text` supports the same queries as `text`. And like `text`, it does not support sorting and has only limited support for aggregations. @@ -322,14 +330,14 @@ PUT logs ``` -### Parameters for match-only text fields [match-only-text-params] +### Parameters for match-only text fields [match-only-text-params] The following mapping parameters are accepted: -[`fields`](/reference/elasticsearch/mapping-reference/multi-fields.md) +[`fields`](multi-fields.md) : Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations, or the same string value analyzed by different analyzers. -[`meta`](/reference/elasticsearch/mapping-reference/mapping-field-meta.md) +[`meta`](mapping-field-meta.md) : Metadata about the field. 
diff --git a/docs/reference/ingestion-tools/enrich-processor/pipeline-processor.md b/docs/reference/ingestion-tools/enrich-processor/pipeline-processor.md index c4da79bfce30b..c0f573502c36d 100644 --- a/docs/reference/ingestion-tools/enrich-processor/pipeline-processor.md +++ b/docs/reference/ingestion-tools/enrich-processor/pipeline-processor.md @@ -1,7 +1,5 @@ --- navigation_title: "Pipeline" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline-processor.html --- # Pipeline processor [pipeline-processor] @@ -13,12 +11,12 @@ $$$pipeline-options$$$ | Name | Required | Default | Description | | --- | --- | --- | --- | -| `name` | yes | - | The name of the pipeline to execute. Supports [template snippets](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#template-snippets). | +| `name` | yes | - | The name of the pipeline to execute. Supports [template snippets](ingest.md#template-snippets). | | `ignore_missing_pipeline` | no | false | Whether to ignore missing pipelines instead of failing. | | `description` | no | - | Description of the processor. Useful for describing the purpose of the processor or its configuration. | -| `if` | no | - | Conditionally execute the processor. See [Conditionally run a processor](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#conditionally-run-processor). | -| `ignore_failure` | no | `false` | Ignore failures for the processor. See [Handling pipeline failures](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures). | -| `on_failure` | no | - | Handle failures for the processor. See [Handling pipeline failures](docs-content://manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures). | +| `if` | no | - | Conditionally execute the processor. See [Conditionally run a processor](ingest.md#conditionally-run-processor). 
| +| `ignore_failure` | no | `false` | Ignore failures for the processor. See [Handling pipeline failures](ingest.md#handling-pipeline-failures). | +| `on_failure` | no | - | Handle failures for the processor. See [Handling pipeline failures](ingest.md#handling-pipeline-failures). | | `tag` | no | - | Identifier for the processor. Useful for debugging and metrics. | ```js @@ -29,6 +27,8 @@ $$$pipeline-options$$$ } ``` +% NOTCONSOLE + The name of the current pipeline can be accessed from the `_ingest.pipeline` ingest metadata key. An example of using this processor for nesting pipelines would be: @@ -72,6 +72,8 @@ PUT _ingest/pipeline/pipelineB } ``` +% TEST[continued] + Now indexing a document while applying the outer pipeline will see the inner pipeline executed from the outer pipeline: ```console @@ -81,6 +83,8 @@ PUT /my-index-000001/_doc/1?pipeline=pipelineB } ``` +% TEST[continued] + Response from the index request: ```console-result @@ -99,6 +103,8 @@ Response from the index request: } ``` +% TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] + Indexed document: ```js @@ -109,3 +115,5 @@ Indexed document: } ``` +% NOTCONSOLE + diff --git a/docs/reference/ingestion-tools/search-connectors/es-connectors-onedrive.md b/docs/reference/ingestion-tools/search-connectors/es-connectors-onedrive.md index 726bb40c65900..2816a824e9170 100644 --- a/docs/reference/ingestion-tools/search-connectors/es-connectors-onedrive.md +++ b/docs/reference/ingestion-tools/search-connectors/es-connectors-onedrive.md @@ -1,45 +1,430 @@ --- navigation_title: "OneDrive" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors-onedrive.html --- # Elastic OneDrive connector reference [es-connectors-onedrive] -The *Elastic OneDrive connector* is a [connector](/reference/ingestion-tools/search-connectors/index.md) for OneDrive. 
This connector is written in Python using the [Elastic connector framework](https://github.com/elastic/connectors/tree/main). +% Attributes used in this file + +The *Elastic OneDrive connector* is a [connector](es-connectors.md) for OneDrive. This connector is written in Python using the [Elastic connector framework](https://github.com/elastic/connectors/tree/main). View the [**source code** for this connector](https://github.com/elastic/connectors/tree/main/connectors/sources/onedrive.py) (branch *main*, compatible with Elastic *9.0*). -::::{important} -As of Elastic 9.0, managed connectors on Elastic Cloud Hosted are no longer available. All connectors must be [self-managed](/reference/ingestion-tools/search-connectors/self-managed-connectors.md). +::::{admonition} Choose your connector reference +Are you using a managed connector on Elastic Cloud or a self-managed connector? Expand the documentation based on your deployment method. + +:::: + + +% //////// //// //// //// //// //// //// //////// + +% //////// NATIVE CONNECTOR REFERENCE /////// + +% //////// //// //// //// //// //// //// //////// + + +## **Elastic managed connector reference** [es-connectors-onedrive-native-connector-reference] + +::::::{dropdown} View **Elastic managed connector** reference + +### Availability and prerequisites [es-connectors-onedrive-availability-prerequisites] + +This connector is available as a **managed connector** as of Elastic version **8.11.0**. + +To use this connector natively in Elastic Cloud, satisfy all [managed connector requirements](es-native-connectors.md#es-native-connectors-prerequisites). + + +### Create a OneDrive connector [es-connectors-onedrive-create-native-connector] + + +## Use the UI [es-connectors-onedrive-create-use-the-ui] + +To create a new OneDrive connector: + +1. 
In the Kibana UI, navigate to the **Search → Content → Connectors** page from the main menu, or use the [global search field](https://www.elastic.co/guide/en/kibana/current/kibana-concepts-analysts.html#_finding_your_apps_and_objects). +2. Follow the instructions to create a new native **OneDrive** connector. + +For additional operations, see [*Connectors UI in {{kib}}*](es-connectors-usage.md). + + +## Use the API [es-connectors-onedrive-create-use-the-api] + +You can use the {{es}} [Create connector API](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) to create a new native OneDrive connector. + +For example: + +```console +PUT _connector/my-onedrive-connector +{ + "index_name": "my-elasticsearch-index", + "name": "Content synced from OneDrive", + "service_type": "onedrive", + "is_native": true +} +``` + +% TEST[skip:can’t test in isolation] + +:::::{dropdown} You’ll also need to **create an API key** for the connector to use. +::::{note} +The user needs the cluster privileges `manage_api_key`, `manage_connector` and `write_connector_secrets` to generate API keys programmatically. + +:::: + + +To create an API key for the connector: + +1. Run the following command, replacing values where indicated. Note the `id` and `encoded` return values from the response: + + ```console + POST /_security/api_key + { + "name": "my-connector-api-key", + "role_descriptors": { + "my-connector-connector-role": { + "cluster": [ + "monitor", + "manage_connector" + ], + "indices": [ + { + "names": [ + "my-index_name", + ".search-acl-filter-my-index_name", + ".elastic-connectors*" + ], + "privileges": [ + "all" + ], + "allow_restricted_indices": false + } + ] + } + } + } + ``` + +2. Use the `encoded` value to store a connector secret, and note the `id` return value from this response: + + ```console + POST _connector/_secret + { + "value": "encoded_api_key" + } + ``` + + +% TEST[skip:need to retrieve ids from the response] + ++ . 
Use the API key `id` and the connector secret `id` to update the connector: + ++ + +```console +PUT /_connector/my_connector_id>/_api_key_id +{ + "api_key_id": "API key_id", + "api_key_secret_id": "secret_id" +} +``` + +% TEST[skip:need to retrieve ids from the response] + +::::: + + +Refer to the [{{es}} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) for details of all available Connector APIs. + + +### Usage [es-connectors-onedrive-usage] + +To use this connector natively in Elastic Cloud, see [*Elastic managed connectors*](es-native-connectors.md). + +For additional operations, see [*Connectors UI in {{kib}}*](es-connectors-usage.md). + + +#### Connecting to OneDrive [es-connectors-onedrive-usage-connection] + +To connect to OneDrive you need to [create an Azure Active Directory application and service principal](https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal) that can access resources. + +Follow these steps: + +1. Go to the [Azure portal](https://portal.azure.com) and sign in with your Azure account. +2. Navigate to the **Azure Active Directory** service. +3. Select **App registrations** from the left-hand menu. +4. Click on the **New registration** button to register a new application. +5. Provide a name for your app, and optionally select the supported account types (e.g., single tenant, multi-tenant). +6. Click on the **Register** button to create the app registration. +7. After the registration is complete, you will be redirected to the app’s overview page. Take note of the **Application (client) ID** value, as you’ll need it later. +8. Scroll down to the **API permissions** section and click on the **Add a permission** button. +9. In the **Request API permissions** pane, select **Microsoft Graph** as the API. +10. 
Choose the application permissions and select the following permissions under the **Application** tab: `User.Read.All`, `File.Read.All` +11. Click on the **Add permissions** button to add the selected permissions to your app. Finally, click on the **Grant admin consent** button to grant the required permissions to the app. This step requires administrative privileges. ***NOTE***: If you are not an admin, you need to request the Admin to grant consent via their Azure Portal. +12. Click on **Certificates & Secrets** tab. Go to Client Secrets. Generate a new client secret and keep a note of the string present under `Value` column. + + +### Configuration [es-connectors-onedrive-usage-configuration] + +The following configuration fields are **required**: + +Azure application Client ID +: Unique identifier for your Azure Application, found on the app’s overview page. Example: + + * `ab123453-12a2-100a-1123-93fd09d67394` + + +Azure application Client Secret +: String value that the application uses to prove its identity when requesting a token, available under the `Certificates & Secrets` tab of your Azure application menu. Example: + + * `eyav1~12aBadIg6SL-STDfg102eBfCGkbKBq_Ddyu` + + +Azure application Tenant ID +: Unique identifier of your Azure Active Directory instance. Example: + + * `123a1b23-12a3-45b6-7c8d-fc931cfb448d` + + +Enable document level security +: Toggle to enable [document level security](es-dls.md). When enabled: + + * Full syncs will fetch access control lists for each document and store them in the `_allow_access_control` field. + * Access control syncs will fetch users' access control lists and store them in a separate index. + + +::::{warning} +Enabling DLS for your connector will cause a significant performance degradation, as the API calls to the data source required for this functionality are rate limited. This impacts the speed at which your content can be retrieved. 
+ +:::: + + + +### Content Extraction [es-connectors-onedrive-usage-content-extraction] + +Refer to [Content extraction](es-connectors-content-extraction.md) for more details. + + +### Documents and syncs [es-connectors-onedrive-documents-syncs] + +The connector syncs the following objects and entities: + +* **Files** + + * Includes metadata such as file name, path, size, content, etc. + +* **Folders** + +::::{note} +* Content from files bigger than 10 MB won’t be extracted. (Self-managed connectors can use the [self-managed local extraction service](es-connectors-content-extraction.md#es-connectors-content-extraction-local) to handle larger binary files.) +* Permissions are not synced by default. You must first enable [DLS](es-connectors-onedrive.md#es-connectors-onedrive-client-dls). Otherwise, **all documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic Deployment. + +:::: + + + +#### Sync types [es-connectors-onedrive-connectors-onedrive-sync-types] + +[Full syncs](es-connectors-sync-types.md#es-connectors-sync-types-full) are supported by default for all connectors. + +This connector also supports [incremental syncs](es-connectors-sync-types.md#es-connectors-sync-types-incremental). + + +### Document level security [es-connectors-onedrive-dls] + +Document level security (DLS) enables you to restrict access to documents based on a user’s permissions. This feature is available by default for the OneDrive connector. See [Configuration](es-connectors-onedrive.md#es-connectors-onedrive-usage-configuration) for how to enable DLS for this connector. + +Refer to [document level security](es-dls.md) for more details about this feature. + +::::{note} +Refer to [DLS in Search Applications](es-dls-e2e-guide.md) to learn how to ingest data with DLS enabled, when building a search application. 
+ +:::: + + + +### Sync rules [es-connectors-onedrive-documents-sync-rules] + +*Basic* sync rules are identical for all connectors and are available by default. For more information read [Types of sync rule](es-sync-rules.md#es-sync-rules-types). + + +#### Advanced sync rules [es-connectors-onedrive-sync-rules-advanced] + +This connector supports [advanced sync rules](es-sync-rules.md#es-sync-rules-advanced) for remote filtering. These rules cover complex query-and-filter scenarios that cannot be expressed with basic sync rules. Advanced sync rules are defined through a source-specific DSL JSON snippet. + +::::{note} +A [full sync](es-connectors-sync-types.md#es-connectors-sync-types-full) is required for advanced sync rules to take effect. + :::: -## **Self-managed connector** [es-connectors-onedrive-connector-client-reference] -### Availability and prerequisites [es-connectors-onedrive-client-availability-prerequisites] +Here are a few examples of advanced sync rules for this connector. + +$$$es-connectors-onedrive-sync-rules-advanced-examples-1$$$ +**Example 1** + +This rule skips indexing for files with `.xlsx` and `.docx` extensions. All other files and folders will be indexed. + +```js +[ + { + "skipFilesWithExtensions": [".xlsx" , ".docx"] + } +] +``` + +% NOTCONSOLE + +$$$es-connectors-onedrive-sync-rules-advanced-examples-2$$$ +**Example 2** + +This rule focuses on indexing files and folders owned by `user1-domain@onmicrosoft.com` and `user2-domain@onmicrosoft.com` but excludes files with `.py` extension. + +```js +[ + { + "owners": ["user1-domain@onmicrosoft.com", "user2-domain@onmicrosoft.com"], + "skipFilesWithExtensions": [".py"] + } +] +``` + +% NOTCONSOLE + +$$$es-connectors-onedrive-sync-rules-advanced-examples-3$$$ +**Example 3** + +This rule indexes only the files and folders directly inside the root folder, excluding any `.md` files. 
+ +```js +[ + { + "skipFilesWithExtensions": [".md"], + "parentPathPattern": "/drive/root:" + } +] +``` + +% NOTCONSOLE + +$$$es-connectors-onedrive-sync-rules-advanced-examples-4$$$ +**Example 4** + +This rule indexes files and folders owned by `user1-domain@onmicrosoft.com` and `user3-domain@onmicrosoft.com` that are directly inside the `abc` folder, which is a subfolder of any folder under the `hello` directory in the root. Files with extensions `.pdf` and `.py` are excluded. + +```js +[ + { + "owners": ["user1-domain@onmicrosoft.com", "user3-domain@onmicrosoft.com"], + "skipFilesWithExtensions": [".pdf", ".py"], + "parentPathPattern": "/drive/root:/hello/**/abc" + } +] +``` + +% NOTCONSOLE + +$$$es-connectors-onedrive-sync-rules-advanced-examples-5$$$ +**Example 5** + +This example contains two rules. The first rule indexes all files and folders owned by `user1-domain@onmicrosoft.com` and `user2-domain@onmicrosoft.com`. The second rule indexes files for all other users, but skips files with a `.py` extension. + +```js +[ + { + "owners": ["user1-domain@onmicrosoft.com", "user2-domain@onmicrosoft.com"] + }, + { + "skipFilesWithExtensions": [".py"] + } +] +``` + +% NOTCONSOLE + +$$$es-connectors-onedrive-sync-rules-advanced-examples-6$$$ +**Example 6** + +This example contains two rules. The first rule indexes all files owned by `user1-domain@onmicrosoft.com` and `user2-domain@onmicrosoft.com`, excluding `.md` files. The second rule indexes files and folders recursively inside the `abc` folder. + +```js +[ + { + "owners": ["user1-domain@onmicrosoft.com", "user2-domain@onmicrosoft.com"], + "skipFilesWithExtensions": [".md"] + }, + { + "parentPathPattern": "/drive/root:/abc/**" + } +] +``` + +% NOTCONSOLE + + +### Content Extraction [es-connectors-onedrive-content-extraction] -This connector is available as a self-managed connector. +See [Content extraction](es-connectors-content-extraction.md). 
+ + +### Known issues [es-connectors-onedrive-known-issues] + +* **Enabling document-level security impacts performance.** + + Enabling DLS for your connector will cause a significant performance degradation, as the API calls to the data source required for this functionality are rate limited. This impacts the speed at which your content can be retrieved. + + +Refer to [Known issues](es-connectors-known-issues.md) for a list of known issues for all connectors. + + +### Troubleshooting [es-connectors-onedrive-troubleshooting] + +See [Troubleshooting](es-connectors-troubleshooting.md). + + +### Security [es-connectors-onedrive-security] + +See [Security](es-connectors-security.md). + +% Closing the collapsible section + +:::::: + + +% //////// //// //// //// //// //// //// //////// + +% //////// CONNECTOR CLIENT REFERENCE /////// + +% //////// //// //// //// //// //// //// //////// + + +## **Self-managed connector** [es-connectors-onedrive-connector-client-reference] + +::::::{dropdown} View **self-managed connector** reference + +### Availability and prerequisites [es-connectors-onedrive-client-availability-prerequisites] + +This connector is available as a self-managed **self-managed connector**. This self-managed connector is compatible with Elastic versions **8.10.0+**. -To use this connector, satisfy all [self-managed connector requirements](/reference/ingestion-tools/search-connectors/self-managed-connectors.md). +To use this connector, satisfy all [self-managed connector requirements](es-build-connector.md). -### Create a OneDrive connector [es-connectors-onedrive-create-connector-client] +### Create a OneDrive connector [es-connectors-onedrive-create-connector-client] -#### Use the UI [es-connectors-onedrive-client-create-use-the-ui] +## Use the UI [es-connectors-onedrive-client-create-use-the-ui] To create a new OneDrive connector: -1. 
In the Kibana UI, navigate to the **Search → Content → Connectors** page from the main menu, or use the [global search field](docs-content://explore-analyze/query-filter/filtering.md#_finding_your_apps_and_objects). +1. In the Kibana UI, navigate to the **Search → Content → Connectors** page from the main menu, or use the [global search field](https://www.elastic.co/guide/en/kibana/current/kibana-concepts-analysts.html#_finding_your_apps_and_objects). 2. Follow the instructions to create a new **OneDrive** self-managed connector. -#### Use the API [es-connectors-onedrive-client-create-use-the-api] +## Use the API [es-connectors-onedrive-client-create-use-the-api] -You can use the {{es}} [Create connector API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-connector) to create a new self-managed OneDrive self-managed connector. +You can use the {{es}} [Create connector API](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) to create a new self-managed OneDrive self-managed connector. For example: @@ -52,8 +437,10 @@ PUT _connector/my-onedrive-connector } ``` -:::::{dropdown} You’ll also need to create an API key for the connector to use. -::::{note} +% TEST[skip:can’t test in isolation] + +:::::{dropdown} You’ll also need to **create an API key** for the connector to use. +::::{note} The user needs the cluster privileges `manage_api_key`, `manage_connector` and `write_connector_secrets` to generate API keys programmatically. :::: @@ -96,15 +483,15 @@ To create an API key for the connector: ::::: -Refer to the [{{es}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-connector) for details of all available Connector APIs. +Refer to the [{{es}} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) for details of all available Connector APIs. 
-### Usage [es-connectors-onedrive-client-usage] +### Usage [es-connectors-onedrive-client-usage] -For additional operations, see [*Connectors UI in {{kib}}*](/reference/ingestion-tools/search-connectors/connectors-ui-in-kibana.md). +For additional operations, see [*Connectors UI in {{kib}}*](es-connectors-usage.md). -#### Connecting to OneDrive [es-connectors-onedrive-client-usage-connection] +#### Connecting to OneDrive [es-connectors-onedrive-client-usage-connection] To connect to OneDrive you need to [create an Azure Active Directory application and service principal](https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal) that can access resources. @@ -120,29 +507,31 @@ Follow these steps: 8. Scroll down to the **API permissions** section and click on the **Add a permission** button. 9. In the **Request API permissions** pane, select **Microsoft Graph** as the API. 10. Choose the application permissions and select the following permissions under the **Application** tab: `User.Read.All`, `File.Read.All` -11. Click on the **Add permissions** button to add the selected permissions to your app. Finally, click on the **Grant admin consent** button to grant the required permissions to the app. This step requires administrative privileges. **NOTE**: If you are not an admin, you need to request the Admin to grant consent via their Azure Portal. +11. Click on the **Add permissions** button to add the selected permissions to your app. Finally, click on the **Grant admin consent** button to grant the required permissions to the app. This step requires administrative privileges. ***NOTE***: If you are not an admin, you need to request the Admin to grant consent via their Azure Portal. 12. Click on **Certificates & Secrets** tab. Go to Client Secrets. Generate a new client secret and keep a note of the string present under `Value` column. 
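
Once the app registration steps above are complete, one way to sanity-check the credentials is to build a client-credentials token request for Microsoft Graph by hand. The sketch below is illustrative only: `build_token_request` and the placeholder tenant ID, client ID, and secret are hypothetical names, and it is an assumption that the connector authenticates through this same flow. The code only constructs the request and does not send it.

```python
from urllib.parse import urlencode

# Default scope for application permissions granted on Microsoft Graph.
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """Return the token endpoint URL and the form-encoded body for an
    OAuth 2.0 client-credentials request (hypothetical helper)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            # The string noted from the `Value` column in step 12.
            "client_secret": client_secret,
            "scope": GRAPH_SCOPE,
        }
    )
    return url, body


if __name__ == "__main__":
    # Placeholder values; replace them with those from your app registration.
    url, body = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
    print(url)
    print(body)
```

Sending the body as a `POST` with the `application/x-www-form-urlencoded` content type should return an access token if admin consent was granted for the selected permissions.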
-### Deployment using Docker [es-connectors-onedrive-client-docker] +### Deployment using Docker [es-connectors-onedrive-client-docker] Self-managed connectors are run on your own infrastructure. You can deploy the OneDrive connector as a self-managed connector using Docker. Follow these instructions. -::::{dropdown} Step 1: Download sample configuration file +::::{dropdown} **Step 1: Download sample configuration file** Download the sample configuration file. You can either download it manually or run the following command: ```sh curl https://raw.githubusercontent.com/elastic/connectors/main/config.yml.example --output ~/connectors-config/config.yml ``` +% NOTCONSOLE + Remember to update the `--output` argument value if your directory name is different, or you want to use a different config file name. :::: -::::{dropdown} Step 2: Update the configuration file for your self-managed connector +::::{dropdown} **Step 2: Update the configuration file for your self-managed connector** Update the configuration file with the following settings to match your environment: * `elasticsearch.host` @@ -170,7 +559,7 @@ Note: You can change other default configurations by simply uncommenting specifi :::: -::::{dropdown} Step 3: Run the Docker image +::::{dropdown} **Step 3: Run the Docker image** Run the Docker image with the Connector Service using the following command: ```sh @@ -179,7 +568,7 @@ docker run \ --network "elastic" \ --tty \ --rm \ -docker.elastic.co/integrations/elastic-connectors:9.0.0 \ +docker.elastic.co/integrations/elastic-connectors:9.0.0-beta1.0 \ /app/bin/elastic-ingest \ -c /config/config.yml ``` @@ -191,14 +580,14 @@ Refer to [`DOCKER.md`](https://github.com/elastic/connectors/tree/main/docs/DOCK Find all available Docker images in the [official registry](https://www.docker.elastic.co/r/integrations/elastic-connectors). 
-::::{tip} +::::{tip} We also have a quickstart self-managed option using Docker Compose, so you can spin up all required services at once: Elasticsearch, Kibana, and the connectors service. Refer to this [README](https://github.com/elastic/connectors/tree/main/scripts/stack#readme) in the `elastic/connectors` repo for more information. :::: -### Configuration [es-connectors-onedrive-client-usage-configuration] +### Configuration [es-connectors-onedrive-client-usage-configuration] The following configuration fields are **required**: @@ -224,27 +613,27 @@ The following configuration fields are **required**: : The number of retry attempts after failed request to OneDrive. Default value is `3`. `use_document_level_security` -: Toggle to enable [document level security](/reference/ingestion-tools/search-connectors/document-level-security.md). When enabled: +: Toggle to enable [document level security](es-dls.md). When enabled: * Full syncs will fetch access control lists for each document and store them in the `_allow_access_control` field. * Access control syncs will fetch users' access control lists and store them in a separate index. - ::::{warning} + ::::{warning} Enabling DLS for your connector will cause a significant performance degradation, as the API calls to the data source required for this functionality are rate limited. This impacts the speed at which your content can be retrieved. :::: `use_text_extraction_service` -: Requires a separate deployment of the [Elastic Text Extraction Service](/reference/ingestion-tools/search-connectors/es-connectors-content-extraction.md#es-connectors-content-extraction-local). Requires that ingest pipeline settings disable text extraction. Default value is `False`. +: Requires a separate deployment of the [Elastic Text Extraction Service](es-connectors-content-extraction.md#es-connectors-content-extraction-local). Requires that ingest pipeline settings disable text extraction. Default value is `False`. 
-### Content Extraction [es-connectors-onedrive-client-usage-content-extraction] +### Content Extraction [es-connectors-onedrive-client-usage-content-extraction] -Refer to [Content extraction](/reference/ingestion-tools/search-connectors/es-connectors-content-extraction.md) for more details. +Refer to [Content extraction](es-connectors-content-extraction.md) for more details. -### Documents and syncs [es-connectors-onedrive-client-documents-syncs] +### Documents and syncs [es-connectors-onedrive-client-documents-syncs] The connector syncs the following objects and entities: @@ -254,45 +643,45 @@ The connector syncs the following objects and entities: * **Folders** -::::{note} -* Content from files bigger than 10 MB won’t be extracted by default. You can use the [self-managed local extraction service](/reference/ingestion-tools/search-connectors/es-connectors-content-extraction.md#es-connectors-content-extraction-local) to handle larger binary files. -* Permissions are not synced by default. You must first enable [DLS](#es-connectors-onedrive-client-dls). Otherwise, **all documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic Deployment. +::::{note} +* Content from files bigger than 10 MB won’t be extracted by default. You can use the [self-managed local extraction service](es-connectors-content-extraction.md#es-connectors-content-extraction-local) to handle larger binary files. +* Permissions are not synced by default. You must first enable [DLS](es-connectors-onedrive.md#es-connectors-onedrive-client-dls). Otherwise, **all documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic Deployment. :::: -#### Sync types [es-connectors-onedrive-client-sync-types] +#### Sync types [es-connectors-onedrive-client-sync-types] -[Full syncs](/reference/ingestion-tools/search-connectors/content-syncs.md#es-connectors-sync-types-full) are supported by default for all connectors. 
+[Full syncs](es-connectors-sync-types.md#es-connectors-sync-types-full) are supported by default for all connectors. -This connector also supports [incremental syncs](/reference/ingestion-tools/search-connectors/content-syncs.md#es-connectors-sync-types-incremental). +This connector also supports [incremental syncs](es-connectors-sync-types.md#es-connectors-sync-types-incremental). -### Document level security [es-connectors-onedrive-client-dls] +### Document level security [es-connectors-onedrive-client-dls] -Document level security (DLS) enables you to restrict access to documents based on a user’s permissions. This feature is available by default for the OneDrive connector. See [Configuration](#es-connectors-onedrive-client-usage-configuration) for how to enable DLS for this connector. +Document level security (DLS) enables you to restrict access to documents based on a user’s permissions. This feature is available by default for the OneDrive connector. See [Configuration](es-connectors-onedrive.md#es-connectors-onedrive-client-usage-configuration) for how to enable DLS for this connector. -Refer to [document level security](/reference/ingestion-tools/search-connectors/document-level-security.md) for more details about this feature. +Refer to [document level security](es-dls.md) for more details about this feature. -::::{note} -Refer to [DLS in Search Applications](/reference/ingestion-tools/search-connectors/es-dls-e2e-guide.md) to learn how to ingest data with DLS enabled, when building a search application. +::::{note} +Refer to [DLS in Search Applications](es-dls-e2e-guide.md) to learn how to ingest data with DLS enabled, when building a search application. :::: -### Sync rules [es-connectors-onedrive-client-documents-sync-rules] +### Sync rules [es-connectors-onedrive-client-documents-sync-rules] -*Basic* sync rules are identical for all connectors and are available by default. 
For more information read [Types of sync rule](/reference/ingestion-tools/search-connectors/es-sync-rules.md#es-sync-rules-types). +*Basic* sync rules are identical for all connectors and are available by default. For more information read [Types of sync rule](es-sync-rules.md#es-sync-rules-types). -#### Advanced sync rules [es-connectors-onedrive-client-sync-rules-advanced] +#### Advanced sync rules [es-connectors-onedrive-client-sync-rules-advanced] -This connector supports [advanced sync rules](/reference/ingestion-tools/search-connectors/es-sync-rules.md#es-sync-rules-advanced) for remote filtering. These rules cover complex query-and-filter scenarios that cannot be expressed with basic sync rules. Advanced sync rules are defined through a source-specific DSL JSON snippet. +This connector supports [advanced sync rules](es-sync-rules.md#es-sync-rules-advanced) for remote filtering. These rules cover complex query-and-filter scenarios that cannot be expressed with basic sync rules. Advanced sync rules are defined through a source-specific DSL JSON snippet. -::::{note} -A [full sync](/reference/ingestion-tools/search-connectors/content-syncs.md#es-connectors-sync-types-full) is required for advanced sync rules to take effect. +::::{note} +A [full sync](es-connectors-sync-types.md#es-connectors-sync-types-full) is required for advanced sync rules to take effect. :::: @@ -312,6 +701,8 @@ This rule skips indexing for files with `.xlsx` and `.docx` extensions. 
All othe ] ``` +% NOTCONSOLE + $$$es-connectors-onedrive-client-sync-rules-advanced-examples-2$$$ **Example 2** @@ -326,6 +717,8 @@ This rule focuses on indexing files and folders owned by `user1-domain@onmicroso ] ``` +% NOTCONSOLE + $$$es-connectors-onedrive-client-sync-rules-advanced-examples-3$$$ **Example 3** @@ -340,6 +733,8 @@ This rule indexes only the files and folders directly inside the root folder, ex ] ``` +% NOTCONSOLE + $$$es-connectors-onedrive-client-sync-rules-advanced-examples-4$$$ **Example 4** @@ -355,6 +750,8 @@ This rule indexes files and folders owned by `user1-domain@onmicrosoft.com` and ] ``` +% NOTCONSOLE + $$$es-connectors-onedrive-client-sync-rules-advanced-examples-5$$$ **Example 5** @@ -371,6 +768,8 @@ This example contains two rules. The first rule indexes all files and folders ow ] ``` +% NOTCONSOLE + $$$es-connectors-onedrive-client-sync-rules-advanced-examples-6$$$ **Example 6** @@ -388,18 +787,20 @@ This example contains two rules. The first rule indexes all files owned by `user ] ``` +% NOTCONSOLE + -### Content Extraction [es-connectors-onedrive-client-content-extraction] +### Content Extraction [es-connectors-onedrive-client-content-extraction] -See [Content extraction](/reference/ingestion-tools/search-connectors/es-connectors-content-extraction.md). +See [Content extraction](es-connectors-content-extraction.md). -### Self-managed connector operations [es-connectors-onedrive-client-connector-client-operations] +### Self-managed connector operations [es-connectors-onedrive-client-connector-client-operations] -### End-to-end testing [es-connectors-onedrive-client-testing] +### End-to-end testing [es-connectors-onedrive-client-testing] -The connector framework enables operators to run functional tests against a real data source. Refer to [Connector testing](/reference/ingestion-tools/search-connectors/self-managed-connectors.md#es-build-connector-testing) for more details. 
+The connector framework enables operators to run functional tests against a real data source. Refer to [Connector testing](es-build-connector.md#es-build-connector-testing) for more details. To perform E2E testing for the GitHub connector, run the following command: @@ -414,21 +815,27 @@ make ftest NAME=onedrive DATA_SIZE=small ``` -### Known issues [es-connectors-onedrive-client-known-issues] +### Known issues [es-connectors-onedrive-client-known-issues] * **Enabling document-level security impacts performance.** Enabling DLS for your connector will cause a significant performance degradation, as the API calls to the data source required for this functionality are rate limited. This impacts the speed at which your content can be retrieved. -Refer to [Known issues](/release-notes/known-issues.md) for a list of known issues for all connectors. +Refer to [Known issues](es-connectors-known-issues.md) for a list of known issues for all connectors. + + +### Troubleshooting [es-connectors-onedrive-client-troubleshooting] + +See [Troubleshooting](es-connectors-troubleshooting.md). + +### Security [es-connectors-onedrive-client-security] -### Troubleshooting [es-connectors-onedrive-client-troubleshooting] +See [Security](es-connectors-security.md). -See [Troubleshooting](/reference/ingestion-tools/search-connectors/es-connectors-troubleshooting.md). +% Closing the collapsible section +:::::: -### Security [es-connectors-onedrive-client-security] -See [Security](/reference/ingestion-tools/search-connectors/es-connectors-security.md). 
\ No newline at end of file diff --git a/docs/reference/ingestion-tools/search-connectors/es-connectors-s3.md b/docs/reference/ingestion-tools/search-connectors/es-connectors-s3.md index 01ca7c42e842e..b9de11c23d246 100644 --- a/docs/reference/ingestion-tools/search-connectors/es-connectors-s3.md +++ b/docs/reference/ingestion-tools/search-connectors/es-connectors-s3.md @@ -1,38 +1,326 @@ --- navigation_title: "S3" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors-s3.html --- # Elastic S3 connector reference [es-connectors-s3] -The *Elastic S3 connector* is a [connector](/reference/ingestion-tools/search-connectors/index.md) for [Amazon S3](https://aws.amazon.com/s3/) data sources. -::::{important} -As of Elastic 9.0, managed connectors on Elastic Cloud Hosted are no longer available. All connectors must be [self-managed](/reference/ingestion-tools/search-connectors/self-managed-connectors.md). +% Attributes used in this file: + +The *Elastic S3 connector* is a [connector](es-connectors.md) for [Amazon S3](https://aws.amazon.com/s3/) data sources. + +% //////// //// //// //// //// //// //// //////// + +% //////// NATIVE CONNECTOR REFERENCE (MANAGED SERVICE) /////// + +% //////// //// //// //// //// //// //// //////// + + +## **Elastic managed connector reference** [es-connectors-s3-native-connector-reference] + +::::::{dropdown} View **Elastic managed connector** reference + +### Availability and prerequisites [es-connectors-s3-prerequisites] + +This connector is available natively in Elastic Cloud as of version **8.12.0**. To use this connector, satisfy all [managed connector requirements](es-native-connectors.md). + + +### Create a Amazon S3 connector [es-connectors-s3-create-native-connector] + + +## Use the UI [es-connectors-s3-create-use-the-ui] + +To create a new Amazon S3 connector: + +1. 
In the Kibana UI, navigate to the **Search → Content → Connectors** page from the main menu, or use the [global search field](https://www.elastic.co/guide/en/kibana/current/kibana-concepts-analysts.html#_finding_your_apps_and_objects). +2. Follow the instructions to create a new native **Amazon S3** connector. + +For additional operations, see [*Connectors UI in {{kib}}*](es-connectors-usage.md). + + +## Use the API [es-connectors-s3-create-use-the-api] + +You can use the {{es}} [Create connector API](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) to create a new native Amazon S3 connector. + +For example: + +```console +PUT _connector/my-s3-connector +{ + "index_name": "my-elasticsearch-index", + "name": "Content synced from Amazon S3", + "service_type": "s3", + "is_native": true +} +``` + +% TEST[skip:can’t test in isolation] + +:::::{dropdown} You’ll also need to **create an API key** for the connector to use. +::::{note} +The user needs the cluster privileges `manage_api_key`, `manage_connector` and `write_connector_secrets` to generate API keys programmatically. + +:::: + + +To create an API key for the connector: + +1. Run the following command, replacing values where indicated. Note the `id` and `encoded` return values from the response: + + ```console + POST /_security/api_key + { + "name": "my-connector-api-key", + "role_descriptors": { + "my-connector-connector-role": { + "cluster": [ + "monitor", + "manage_connector" + ], + "indices": [ + { + "names": [ + "my-index_name", + ".search-acl-filter-my-index_name", + ".elastic-connectors*" + ], + "privileges": [ + "all" + ], + "allow_restricted_indices": false + } + ] + } + } + } + ``` + +2. Use the `encoded` value to store a connector secret, and note the `id` return value from this response: + + ```console + POST _connector/_secret + { + "value": "encoded_api_key" + } + ``` + + +% TEST[skip:need to retrieve ids from the response] + ++ . 
Use the API key `id` and the connector secret `id` to update the connector: + ++ + +```console +PUT /_connector/my_connector_id>/_api_key_id +{ + "api_key_id": "API key_id", + "api_key_secret_id": "secret_id" +} +``` + +% TEST[skip:need to retrieve ids from the response] + +::::: + + +Refer to the [{{es}} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) for details of all available Connector APIs. + + +### Usage [es-connectors-s3-usage] + +To use this managed connector, see [*Elastic managed connectors*](es-native-connectors.md). + +For additional operations, see [*Connectors UI in {{kib}}*](es-connectors-usage.md). + +S3 users will also need to [Create an IAM identity](es-connectors-s3.md#es-connectors-s3-usage-create-iam) + + +#### Create an IAM identity [es-connectors-s3-usage-create-iam] + +Users need to create an IAM identity to use this connector as a **self-managed connector**. Refer to [the AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-set-up.md). + +The [policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.md) associated with the IAM identity must have the following **AWS permissions**: + +* `ListAllMyBuckets` +* `ListBucket` +* `GetBucketLocation` +* `GetObject` + + +### Compatibility [es-connectors-s3-compatibility] + +Currently the connector does not support S3-compatible vendors. + + +### Configuration [es-connectors-s3-configuration] + +The following configuration fields are required to **set up** the connector: + +AWS Buckets +: List of S3 bucket names. `*` will fetch data from all buckets. Examples: + + * `testbucket, prodbucket` + * `testbucket` + * `*` + + +::::{note} +This field is ignored when using advanced sync rules. 
+ :::: -## **Self-managed connector reference** [es-connectors-s3-connector-client-reference] -### Availability and prerequisites [es-connectors-s3-client-prerequisites] +AWS Access Key ID +: Access Key ID for the AWS identity that will be used for bucket access. + +AWS Secret Key +: Secret Access Key for the AWS identity that will be used for bucket access. + + +### Documents and syncs [es-connectors-s3-documents-syncs] + +::::{note} +* Content from files bigger than 10 MB won’t be extracted. (Self-managed connectors can use the [self-managed local extraction service](es-connectors-content-extraction.md#es-connectors-content-extraction-local) to handle larger binary files.) +* Permissions are not synced. ***All documents*** indexed to an Elastic deployment will be visible to ***all users with access*** to that Elastic Deployment. + +:::: + + + +### Sync rules [es-connectors-s3-sync-rules] -This connector is available as a self-managed connector. This self-managed connector is compatible with Elastic versions **8.6.0+**. To use this connector, satisfy all [self-managed connector requirements](/reference/ingestion-tools/search-connectors/self-managed-connectors.md). +[Basic sync rules](es-sync-rules.md#es-sync-rules-basic) are identical for all connectors and are available by default. -### Create a Amazon S3 connector [es-connectors-s3-create-connector-client] +#### Advanced sync rules [es-connectors-s3-sync-rules-advanced] +::::{note} +A [full sync](es-connectors-sync-types.md#es-connectors-sync-types-full) is required for advanced sync rules to take effect. -#### Use the UI [es-connectors-s3-client-create-use-the-ui] +:::: + + +Advanced sync rules are defined through a source-specific DSL JSON snippet. + +Use advanced sync rules to filter data to be fetched from Amazon S3 buckets. They take the following parameters: + +1. `bucket`: S3 bucket the rule applies to. +2. `extension` (optional): Lists which file types to sync. Defaults to syncing all types. +3. 
`prefix` (optional): String of prefix characters. The connector will fetch file and folder data that matches the string. Defaults to `""` (syncs all bucket objects). + +$$$es-connectors-s3-sync-rules-advanced-examples$$$ +**Advanced sync rules examples** + +**Fetching files and folders recursively by prefix** + +**Example**: Fetch files/folders in `folder1/docs`. + +```js +[ + { + "bucket": "bucket1", + "prefix": "folder1/docs" + } + +] +``` + +% NOTCONSOLE + +**Example**: Fetch files/folder starting with `folder1`. + +```js +[ + { + "bucket": "bucket2", + "prefix": "folder1" + } +] +``` + +% NOTCONSOLE + +**Fetching files and folders by specifying extensions** + +**Example**: Fetch all objects which start with `abc` and then filter using file extensions. + +```js +[ + { + "bucket": "bucket2", + "prefix": "abc", + "extension": [".txt", ".png"] + } +] +``` + +% NOTCONSOLE + + +### Content extraction [es-connectors-s3-content-extraction] + +See [Content extraction](es-connectors-content-extraction.md). + + +### Known issues [es-connectors-s3-known-issues] + +There are no known issues for this connector. + +See [Known issues](es-connectors-known-issues.md) for any issues affecting all connectors. + + +### Troubleshooting [es-connectors-s3-troubleshooting] + +See [Troubleshooting](es-connectors-troubleshooting.md). + + +### Security [es-connectors-s3-security] + +See [Security](es-connectors-security.md). + + +### Framework and source [es-connectors-s3-source] + +This connector is built with the [Elastic connector framework](https://github.com/elastic/connectors/tree/main). + +View the [source code for this connector](https://github.com/elastic/connectors/tree/main/connectors/sources/s3.py) (branch *main*, compatible with Elastic *9.0*). 
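
To make the `bucket`, `prefix`, and `extension` parameters of the advanced sync rules described earlier more concrete, here is a minimal sketch of how a single rule could be evaluated against an object key. It is not the connector's implementation: `matches_rule` is a hypothetical helper, and treating `prefix` as a plain string prefix and `extension` as a suffix check are assumptions drawn from the parameter descriptions.

```python
def matches_rule(rule: dict, bucket: str, key: str) -> bool:
    """Return True if the object `key` in `bucket` would be selected by `rule`.

    Hypothetical helper: `rule` has the same shape as one entry of the
    advanced sync rules JSON array.
    """
    # A rule only applies to the bucket it names.
    if rule["bucket"] != bucket:
        return False
    # `prefix` defaults to "" and therefore matches every key.
    if not key.startswith(rule.get("prefix", "")):
        return False
    # `extension` is optional; when absent, all file types are synced.
    extensions = rule.get("extension")
    if extensions is None:
        return True
    return any(key.endswith(ext) for ext in extensions)


if __name__ == "__main__":
    rule = {"bucket": "bucket2", "prefix": "abc", "extension": [".txt", ".png"]}
    for key in ["abc/notes.txt", "abc/report.pdf", "xyz/image.png"]:
        print(key, matches_rule(rule, "bucket2", key))
```

With this rule, only keys that start with `abc` and end in `.txt` or `.png` are selected, which mirrors the third example: filter by prefix first, then by extension.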
+ +% Closing the collapsible section + +:::::: + + +% //////// //// //// //// //// //// //// //////// + +% //////// CONNECTOR CLIENT REFERENCE (SELF-MANAGED) /////// + +% //////// //// //// //// //// //// //// //////// + + +## **Self-managed connector reference** [es-connectors-s3-connector-client-reference] + +::::::{dropdown} View **self-managed connector** reference + +### Availability and prerequisites [es-connectors-s3-client-prerequisites] + +This connector is available as a self-managed **self-managed connector**. This self-managed connector is compatible with Elastic versions **8.6.0+**. To use this connector, satisfy all [self-managed connector requirements](es-build-connector.md). + + +### Create a Amazon S3 connector [es-connectors-s3-create-connector-client] + + +## Use the UI [es-connectors-s3-client-create-use-the-ui] To create a new Amazon S3 connector: -1. In the Kibana UI, navigate to the **Search → Content → Connectors** page from the main menu, or use the [global search field](docs-content://explore-analyze/query-filter/filtering.md#_finding_your_apps_and_objects). +1. In the Kibana UI, navigate to the **Search → Content → Connectors** page from the main menu, or use the [global search field](https://www.elastic.co/guide/en/kibana/current/kibana-concepts-analysts.html#_finding_your_apps_and_objects). 2. Follow the instructions to create a new **Amazon S3** self-managed connector. -#### Use the API [es-connectors-s3-client-create-use-the-api] +## Use the API [es-connectors-s3-client-create-use-the-api] -You can use the {{es}} [Create connector API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-connector) to create a new self-managed Amazon S3 self-managed connector. +You can use the {{es}} [Create connector API](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) to create a new self-managed Amazon S3 self-managed connector. 
For example: @@ -45,8 +333,10 @@ PUT _connector/my-s3-connector } ``` -:::::{dropdown} You’ll also need to create an API key for the connector to use. -::::{note} +% TEST[skip:can’t test in isolation] + +:::::{dropdown} You’ll also need to **create an API key** for the connector to use. +::::{note} The user needs the cluster privileges `manage_api_key`, `manage_connector` and `write_connector_secrets` to generate API keys programmatically. :::: @@ -89,19 +379,19 @@ To create an API key for the connector: ::::: -Refer to the [{{es}} API documentation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-connector) for details of all available Connector APIs. +Refer to the [{{es}} API documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/connector-apis.html) for details of all available Connector APIs. -### Usage [es-connectors-s3-client-usage] +### Usage [es-connectors-s3-client-usage] -To use this connector as a **self-managed connector**, see [*Self-managed connectors*](/reference/ingestion-tools/search-connectors/self-managed-connectors.md). +To use this connector as a **self-managed connector**, see [*Self-managed connectors*](es-build-connector.md). -For additional operations, see [*Connectors UI in {{kib}}*](/reference/ingestion-tools/search-connectors/connectors-ui-in-kibana.md). +For additional operations, see [*Connectors UI in {{kib}}*](es-connectors-usage.md). -S3 users will also need to [Create an IAM identity](#es-connectors-s3-client-usage-create-iam) +S3 users will also need to [Create an IAM identity](es-connectors-s3.md#es-connectors-s3-client-usage-create-iam) -#### Create an IAM identity [es-connectors-s3-client-usage-create-iam] +#### Create an IAM identity [es-connectors-s3-client-usage-create-iam] Users need to create an IAM identity to use this connector as a **self-managed connector**. Refer to [the AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-set-up.md). 
@@ -113,14 +403,17 @@ The [policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.md * `GetObject` -### Compatibility [es-connectors-s3-client-compatibility] +### Compatibility [es-connectors-s3-client-compatibility] Currently the connector does not support S3-compatible vendors. -### Configuration [es-connectors-s3-client-configuration] +### Configuration [es-connectors-s3-client-configuration] +::::{tip} +When using the [self-managed connector](es-build-connector.md) workflow, these fields will use the default configuration set in the [connector source code](https://github.com/elastic/connectors/blob/a5976d20cd8277ae46511f7176662afc889e56ec/connectors/sources/s3.py#L231-L258). These configurable fields will be rendered with their respective **labels** in the Kibana UI. Once connected, you’ll be able to update these values in Kibana. +:::: The following configuration fields are required to **set up** the connector: @@ -133,7 +426,7 @@ The following configuration fields are required to **set up** the connector: * `*` -::::{note} +::::{note} This field is ignored when using advanced sync rules. :::: @@ -158,23 +451,25 @@ This field is ignored when using advanced sync rules. : Page size for iterating bucket objects in Amazon S3. Default value is `100`. -### Deployment using Docker [es-connectors-s3-client-docker] +### Deployment using Docker [es-connectors-s3-client-docker] You can deploy the Amazon S3 connector as a self-managed connector using Docker. Follow these instructions. -::::{dropdown} Step 1: Download sample configuration file +::::{dropdown} **Step 1: Download sample configuration file** Download the sample configuration file. 
You can either download it manually or run the following command: ```sh curl https://raw.githubusercontent.com/elastic/connectors/main/config.yml.example --output ~/connectors-config/config.yml ``` +% NOTCONSOLE + Remember to update the `--output` argument value if your directory name is different, or you want to use a different config file name. :::: -::::{dropdown} Step 2: Update the configuration file for your self-managed connector +::::{dropdown} **Step 2: Update the configuration file for your self-managed connector** Update the configuration file with the following settings to match your environment: * `elasticsearch.host` @@ -202,7 +497,7 @@ Note: You can change other default configurations by simply uncommenting specifi :::: -::::{dropdown} Step 3: Run the Docker image +::::{dropdown} **Step 3: Run the Docker image** Run the Docker image with the Connector Service using the following command: ```sh @@ -211,7 +506,7 @@ docker run \ --network "elastic" \ --tty \ --rm \ -docker.elastic.co/integrations/elastic-connectors:9.0.0 \ +docker.elastic.co/integrations/elastic-connectors:9.0.0-beta1.0 \ /app/bin/elastic-ingest \ -c /config/config.yml ``` @@ -223,32 +518,32 @@ Refer to [`DOCKER.md`](https://github.com/elastic/connectors/tree/main/docs/DOCK Find all available Docker images in the [official registry](https://www.docker.elastic.co/r/integrations/elastic-connectors). -::::{tip} +::::{tip} We also have a quickstart self-managed option using Docker Compose, so you can spin up all required services at once: Elasticsearch, Kibana, and the connectors service. Refer to this [README](https://github.com/elastic/connectors/tree/main/scripts/stack#readme) in the `elastic/connectors` repo for more information. :::: -### Documents and syncs [es-connectors-s3-client-documents-syncs] +### Documents and syncs [es-connectors-s3-client-documents-syncs] -::::{note} -* Content from files bigger than 10 MB won’t be extracted by default. 
You can use the [self-managed local extraction service](/reference/ingestion-tools/search-connectors/es-connectors-content-extraction.md#es-connectors-content-extraction-local) to handle larger binary files. -* Permissions are not synced. **All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic Deployment. +::::{note} +* Content from files bigger than 10 MB won’t be extracted by default. You can use the [self-managed local extraction service](es-connectors-content-extraction.md#es-connectors-content-extraction-local) to handle larger binary files. +* Permissions are not synced. ***All documents*** indexed to an Elastic deployment will be visible to ***all users with access*** to that Elastic Deployment. :::: -### Sync rules [es-connectors-s3-client-sync-rules] +### Sync rules [es-connectors-s3-client-sync-rules] -[Basic sync rules](/reference/ingestion-tools/search-connectors/es-sync-rules.md#es-sync-rules-basic) are identical for all connectors and are available by default. +[Basic sync rules](es-sync-rules.md#es-sync-rules-basic) are identical for all connectors and are available by default. -#### Advanced sync rules [es-connectors-s3-client-sync-rules-advanced] +#### Advanced sync rules [es-connectors-s3-client-sync-rules-advanced] -::::{note} -A [full sync](/reference/ingestion-tools/search-connectors/content-syncs.md#es-connectors-sync-types-full) is required for advanced sync rules to take effect. +::::{note} +A [full sync](es-connectors-sync-types.md#es-connectors-sync-types-full) is required for advanced sync rules to take effect. :::: @@ -278,6 +573,8 @@ $$$es-connectors-s3-client-sync-rules-advanced-examples$$$ ] ``` +% NOTCONSOLE + **Example**: Fetch files/folder starting with `folder1`. 
```js @@ -289,6 +586,8 @@ $$$es-connectors-s3-client-sync-rules-advanced-examples$$$ ] ``` +% NOTCONSOLE + **Fetching files and folders by specifying extensions** **Example**: Fetch all objects which start with `abc` and then filter using file extensions. @@ -303,15 +602,17 @@ $$$es-connectors-s3-client-sync-rules-advanced-examples$$$ ] ``` +% NOTCONSOLE -### Content extraction [es-connectors-s3-client-content-extraction] -See [Content extraction](/reference/ingestion-tools/search-connectors/es-connectors-content-extraction.md). +### Content extraction [es-connectors-s3-client-content-extraction] +See [Content extraction](es-connectors-content-extraction.md). -### End-to-end testing [es-connectors-s3-client-testing] -The connector framework enables operators to run functional tests against a real data source. Refer to [Connector testing](/reference/ingestion-tools/search-connectors/self-managed-connectors.md#es-build-connector-testing) for more details. +### End-to-end testing [es-connectors-s3-client-testing] + +The connector framework enables operators to run functional tests against a real data source. Refer to [Connector testing](es-build-connector.md#es-build-connector-testing) for more details. To execute a functional test for the Amazon S3 **self-managed connector**, run the following command: @@ -326,25 +627,31 @@ make ftest NAME=s3 DATA_SIZE=small ``` -### Known issues [es-connectors-s3-client-known-issues] +### Known issues [es-connectors-s3-client-known-issues] There are no known issues for this connector. -See [Known issues](/release-notes/known-issues.md) for any issues affecting all connectors. +See [Known issues](es-connectors-known-issues.md) for any issues affecting all connectors. -### Troubleshooting [es-connectors-s3-client-troubleshooting] +### Troubleshooting [es-connectors-s3-client-troubleshooting] -See [Troubleshooting](/reference/ingestion-tools/search-connectors/es-connectors-troubleshooting.md). 
+See [Troubleshooting](es-connectors-troubleshooting.md). -### Security [es-connectors-s3-client-security] +### Security [es-connectors-s3-client-security] -See [Security](/reference/ingestion-tools/search-connectors/es-connectors-security.md). +See [Security](es-connectors-security.md). -### Framework and source [es-connectors-s3-client-source] +### Framework and source [es-connectors-s3-client-source] This connector is built with the [Elastic connector framework](https://github.com/elastic/connectors/tree/main). -View the [source code for this connector](https://github.com/elastic/connectors/tree/main/connectors/sources/s3.py) (branch *main*, compatible with Elastic *9.0*). \ No newline at end of file +View the [source code for this connector](https://github.com/elastic/connectors/tree/main/connectors/sources/s3.py) (branch *main*, compatible with Elastic *9.0*). + +% Closing the collapsible section + +:::::: + + diff --git a/docs/reference/query-languages/query-dsl-match-all-query.md b/docs/reference/query-languages/query-dsl-match-all-query.md index 12d0dcaaab1ca..996bec29bfe5a 100644 --- a/docs/reference/query-languages/query-dsl-match-all-query.md +++ b/docs/reference/query-languages/query-dsl-match-all-query.md @@ -1,7 +1,5 @@ --- navigation_title: "Match all" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html --- # Match all query [query-dsl-match-all-query] @@ -30,7 +28,7 @@ GET /_search ``` -## Match None Query [query-dsl-match-none-query] +## Match None Query [query-dsl-match-none-query] This is the inverse of the `match_all` query, which matches no documents. 
diff --git a/docs/reference/query-languages/query-dsl-mlt-query.md b/docs/reference/query-languages/query-dsl-mlt-query.md index 17ef4afa32402..343f03711c68c 100644 --- a/docs/reference/query-languages/query-dsl-mlt-query.md +++ b/docs/reference/query-languages/query-dsl-mlt-query.md @@ -1,13 +1,11 @@ --- -navigation_title: "more_like_this" -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html +navigation_title: "More like this" --- -# more_like_this query [query-dsl-mlt-query] +# More like this query [query-dsl-mlt-query] -The `more_like_this` query finds documents that are "like" a given set of documents. To do so, MLT selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results. The user controls the input documents, how the terms should be selected and how the query is formed. +The More Like This Query finds documents that are "like" a given set of documents. In order to do so, MLT selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results. The user controls the input documents, how the terms should be selected and how the query is formed. The simplest use case consists of asking for documents that are similar to a provided piece of text. Here, we are asking for all movies that have some text similar to "Once upon a time" in their "title" and in their "description" fields, limiting the number of selected terms to 12. @@ -25,7 +23,7 @@ GET /_search } ``` -A more complicated use case consists of mixing texts with documents already existing in the index. In this case, the syntax to specify a document is similar to the one used in the [Multi GET API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-mget). +A more complicated use case consists of mixing texts with documents already existing in the index. 
In this case, the syntax to specify a document is similar to the one used in the [Multi GET API](docs-multi-get.md). ```console GET /_search @@ -51,7 +49,7 @@ GET /_search } ``` -Finally, users can mix some texts, a chosen set of documents but also provide documents not necessarily present in the index. To provide documents not present in the index, the syntax is similar to [artificial documents](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-termvectors). +Finally, users can mix some texts, a chosen set of documents but also provide documents not necessarily present in the index. To provide documents not present in the index, the syntax is similar to [artificial documents](docs-termvectors.md#docs-termvectors-artificial-doc). ```console GET /_search @@ -82,11 +80,11 @@ GET /_search } ``` -## How it works [_how_it_works] +## How it Works [_how_it_works] -Suppose we wanted to find all documents similar to a given input document. Obviously, the input document itself should be its best match for that type of query. And the reason would be mostly, according to [Lucene scoring formula](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html), due to the terms with the highest tf-idf. Therefore, the terms of the input document that have the highest tf-idf are good representatives of that document, and could be used within a disjunctive query (or `OR`) to retrieve similar documents. The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms. +Suppose we wanted to find all documents similar to a given input document. Obviously, the input document itself should be its best match for that type of query. 
And the reason would be mostly, according to [Lucene scoring formula](https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.md), due to the terms with the highest tf-idf. Therefore, the terms of the input document that have the highest tf-idf are good representatives of that document, and could be used within a disjunctive query (or `OR`) to retrieve similar documents. The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms. -::::{important} +::::{important} The fields on which to perform MLT must be indexed and of type `text` or `keyword`. Additionally, when using `like` with documents, either `_source` must be enabled or the fields must be `stored` or store `term_vector`. In order to speed up analysis, it could help to store term vectors at index time. :::: @@ -126,19 +124,19 @@ PUT /imdb The only required parameter is `like`, all other parameters have sensible defaults. There are three types of parameters: one to specify the document input, the other one for term selection and for query formation. -### Document input parameters [_document_input_parameters] +### Document Input Parameters [_document_input_parameters] `like` -: The only **required** parameter of the MLT query is `like` and follows a versatile syntax, in which the user can specify free form text and/or a single or multiple documents (see examples above). The syntax to specify documents is similar to the one used by the [Multi GET API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-mget). When specifying documents, the text is fetched from `fields` unless overridden in each document request. The text is analyzed by the analyzer at the field, but could also be overridden. 
The syntax to override the analyzer at the field follows a similar syntax to the `per_field_analyzer` parameter of the [Term Vectors API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-termvectors). Additionally, to provide documents not necessarily present in the index, [artificial documents](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-termvectors) are also supported. +: The only **required** parameter of the MLT query is `like` and follows a versatile syntax, in which the user can specify free form text and/or a single or multiple documents (see examples above). The syntax to specify documents is similar to the one used by the [Multi GET API](docs-multi-get.md). When specifying documents, the text is fetched from `fields` unless overridden in each document request. The text is analyzed by the analyzer at the field, but could also be overridden. The syntax to override the analyzer at the field follows a similar syntax to the `per_field_analyzer` parameter of the [Term Vectors API](docs-termvectors.md#docs-termvectors-per-field-analyzer). Additionally, to provide documents not necessarily present in the index, [artificial documents](docs-termvectors.md#docs-termvectors-artificial-doc) are also supported. `unlike` : The `unlike` parameter is used in conjunction with `like` in order not to select terms found in a chosen set of documents. In other words, we could ask for documents `like: "Apple"`, but `unlike: "cake crumble tree"`. The syntax is the same as `like`. `fields` -: A list of fields to fetch and analyze the text from. Defaults to the `index.query.default_field` index setting, which has a default value of `*`. The `*` value matches all fields eligible for [term-level queries](/reference/query-languages/term-level-queries.md), excluding metadata fields. +: A list of fields to fetch and analyze the text from. Defaults to the `index.query.default_field` index setting, which has a default value of `*`. 
The `*` value matches all fields eligible for [term-level queries](term-level-queries.md), excluding metadata fields. -### Term selection parameters [mlt-query-term-selection] +### Term Selection Parameters [mlt-query-term-selection] `max_query_terms` : The maximum number of query terms that will be selected. Increasing this value gives greater accuracy at the expense of query execution speed. Defaults to `25`. @@ -165,10 +163,10 @@ The only required parameter is `like`, all other parameters have sensible defaul : The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in `fields`. -### Query formation parameters [_query_formation_parameters] +### Query Formation Parameters [_query_formation_parameters] `minimum_should_match` -: After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the [minimum should match](/reference/query-languages/query-dsl-minimum-should-match.md). (Defaults to `"30%"`). +: After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the [minimum should match](query-dsl-minimum-should-match.md). (Defaults to `"30%"`). `fail_on_unsupported_field` : Controls whether the query should fail (throw an exception) if any of the specified fields are not of the supported types (`text` or `keyword`). Set this to `false` to ignore the field and continue processing. Defaults to `true`. @@ -185,6 +183,6 @@ The only required parameter is `like`, all other parameters have sensible defaul ## Alternative [_alternative] -To take more control over the construction of a query for similar documents it is worth considering writing custom client code to assemble selected terms from an example document into a Boolean query with the desired settings. 
The logic in `more_like_this` that selects "interesting" words from a piece of text is also accessible via the [TermVectors API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-termvectors). For example, using the termvectors API it would be possible to present users with a selection of topical keywords found in a document’s text, allowing them to select words of interest to drill down on, rather than using the more "black-box" approach of matching used by `more_like_this`. +To take more control over the construction of a query for similar documents it is worth considering writing custom client code to assemble selected terms from an example document into a Boolean query with the desired settings. The logic in `more_like_this` that selects "interesting" words from a piece of text is also accessible via the [TermVectors API](docs-termvectors.md). For example, using the termvectors API it would be possible to present users with a selection of topical keywords found in a document’s text, allowing them to select words of interest to drill down on, rather than using the more "black-box" approach of matching used by `more_like_this`. diff --git a/docs/reference/query-languages/sql-functions-geo.md b/docs/reference/query-languages/sql-functions-geo.md index 2bcdbc8bf1f3c..46030508b7471 100644 --- a/docs/reference/query-languages/sql-functions-geo.md +++ b/docs/reference/query-languages/sql-functions-geo.md @@ -1,11 +1,6 @@ ---- -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-geo.html ---- - # Geo Functions [sql-functions-geo] -::::{warning} +::::{warning} This functionality is in beta and is subject to change. The design and code is less mature than official GA features and is being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features. 
:::: @@ -14,7 +9,7 @@ The geo functions work with geometries stored in `geo_point`, `geo_shape` and `s ## Limitations [_limitations_4] -[`geo_point`](/reference/elasticsearch/mapping-reference/geo-point.md), [`geo_shape`](/reference/elasticsearch/mapping-reference/geo-shape.md) and [`shape`](/reference/elasticsearch/mapping-reference/shape.md) and types are represented in SQL as geometry and can be used interchangeably with the following exceptions: +[`geo_point`](geo-point.md), [`geo_shape`](geo-shape.md) and [`shape`](shape.md) and types are represented in SQL as geometry and can be used interchangeably with the following exceptions: * `geo_shape` and `shape` fields don’t have doc values, therefore these fields cannot be used for filtering, grouping or sorting. * `geo_points` fields are indexed and have doc values by default, however only latitude and longitude are stored and indexed with some loss of precision from the original values (4.190951585769653E-8 for the latitude and 8.381903171539307E-8 for longitude). The altitude component is accepted but not stored in doc values nor indexed. Therefore calling `ST_Z` function in the filtering, grouping or sorting will return `null`.