From a1ac87aa1ce2c005d4b3ff0cb7464094eba3023b Mon Sep 17 00:00:00 2001 From: Craig Taverner Date: Mon, 7 Apr 2025 11:05:51 +0200 Subject: [PATCH] Document special behaviour of ignore_malformed for geo_point mappings (#125692) With `geo_point` fields, there is the special case of values that have a syntactically valid format, but the numerical values for `latitude` and `longitude` are out of range. If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`, the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range. The special `_ignored` field will not be set. The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized. --- .../mapping-reference/geo-point.md | 87 +++++++++++++++---- .../mapping-reference/ignore-malformed.md | 19 +++- 2 files changed, 87 insertions(+), 19 deletions(-) diff --git a/docs/reference/elasticsearch/mapping-reference/geo-point.md b/docs/reference/elasticsearch/mapping-reference/geo-point.md index d69033c219695..faeada7cf15a4 100644 --- a/docs/reference/elasticsearch/mapping-reference/geo-point.md +++ b/docs/reference/elasticsearch/mapping-reference/geo-point.md @@ -9,14 +9,23 @@ mapped_pages: Fields of type `geo_point` accept latitude-longitude pairs, which can be used: -* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md), within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point, or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon). 
+* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md), + within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point, + or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon). * to aggregate documents by [distance](/reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md) from a central point. -* to aggregate documents by geographic grids: either [`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md), [`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or [`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md). -* to aggregate geopoints into a track using the metrics aggregation [`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md). +* to aggregate documents by geographic grids: either + [`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md), + [`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or + [`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md). +* to aggregate geopoints into a track using the metrics aggregation + [`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md). * to integrate distance into a document’s [relevance score](/reference/query-languages/query-dsl/query-dsl-function-score-query.md). * to [sort](/reference/elasticsearch/rest-apis/sort-search-results.md#geo-sorting) documents by distance. 
-As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats. However, there are a number of additional formats that are supported for convenience and historical reasons. In total there are six ways that a geopoint may be specified, as demonstrated below: +As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org) +and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats. +However, there are a number of additional formats that are supported for convenience and historical reasons. +In total there are six ways that a geopoint may be specified, as demonstrated below: ```console PUT my-index-000001 @@ -103,15 +112,28 @@ GET my-index-000001/_search ::::{admonition} Geopoints expressed as an array or string :class: important -Please note that string geopoints are ordered as `lat,lon`, while array geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`. +Please note that string geopoints are ordered as `lat,lon`, while array +geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`. -The reasons for this are historical. Geographers traditionally write `latitude` before `longitude`, while recent formats specified for geographic data like [GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) order `longitude` before `latitude` (easting before northing) in order to match the mathematical convention of ordering `x` before `y`. +The reasons for this are historical. 
Geographers traditionally write `latitude` +before `longitude`, while recent formats specified for geographic data like +[GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) +order `longitude` before `latitude` (easting before northing) in order to match +the mathematical convention of ordering `x` before `y`. :::: ::::{note} -A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash). Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of the bits of the latitude and longitude interleaved. Each character in a geohash adds additional 5 bits to the precision. So the longer the hash, the more precise it is. For the indexing purposed geohashs are translated into latitude-longitude pairs. During this process only first 12 characters are used, so specifying more than 12 characters in a geohash doesn’t increase the precision. The 12 characters provide 60 bits, which should reduce a possible error to less than 2cm. +A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash). +Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of +the bits of the latitude and longitude interleaved. Each character in a geohash +adds an additional 5 bits to the precision. So the longer the hash, the more +precise it is. For indexing purposes, geohashes are translated into +latitude-longitude pairs. During this process only the first 12 characters are +used, so specifying more than 12 characters in a geohash doesn’t increase the +precision. The 12 characters provide 60 bits, which should reduce a possible +error to less than 2 cm. :::: @@ -120,27 +142,54 @@ A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash). The following parameters are accepted by `geo_point` fields: [`ignore_malformed`](/reference/elasticsearch/mapping-reference/ignore-malformed.md) -: If `true`, malformed geopoints are ignored. 
If `false` (default), malformed geopoints throw an exception and reject the whole document. A geopoint is considered malformed if its latitude is outside the range -90 ⇐ latitude ⇐ 90, or if its longitude is outside the range -180 ⇐ longitude ⇐ 180. Note that this cannot be set if the `script` parameter is used. +: If `true`, malformed geopoints are ignored. + If `false` (default), malformed geopoints throw an exception and reject the whole document. + A geopoint is considered malformed if its latitude is outside the range -90 <= latitude <= 90, + or if its longitude is outside the range -180 <= longitude <= 180. + When set to `true`, a geopoint whose format is valid but whose values are out of range is not ignored: + the values are normalized into the valid range, and the document is indexed. + This special case [differs](/reference/elasticsearch/mapping-reference/ignore-malformed.md#_ignore_malformed_geo_point) from the usual behaviour of `ignore_malformed`. + Note that this cannot be set if the `script` parameter is used. `ignore_z_value` -: If `true` (default) three dimension points will be accepted (stored in source) but only latitude and longitude values will be indexed; the third dimension is ignored. If `false`, geopoints containing any more than latitude and longitude (two dimensions) values throw an exception and reject the whole document. Note that this cannot be set if the `script` parameter is used. +: If `true` (default), three-dimensional points will be accepted (stored in source) + but only latitude and longitude values will be indexed; the third dimension is + ignored. If `false`, geopoints containing any more than latitude and longitude + (two dimensions) values throw an exception and reject the whole document. Note + that this cannot be set if the `script` parameter is used. [`index`](/reference/elasticsearch/mapping-reference/mapping-index.md) -: Should the field be quickly searchable? Accepts `true` (default) and `false`. 
Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower. +: Should the field be quickly searchable? Accepts `true` (default) and + `false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) + enabled can still be queried, albeit more slowly. [`null_value`](/reference/elasticsearch/mapping-reference/null-value.md) -: Accepts an geopoint value which is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing. Note that this cannot be set if the `script` parameter is used. +: Accepts a geopoint value which is substituted for any explicit `null` values. + Defaults to `null`, which means the field is treated as missing. Note that this + cannot be set if the `script` parameter is used. `on_script_error` -: Defines what to do if the script defined by the `script` parameter throws an error at indexing time. Accepts `fail` (default), which will cause the entire document to be rejected, and `continue`, which will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue indexing. This parameter can only be set if the `script` field is also set. +: Defines what to do if the script defined by the `script` parameter + throws an error at indexing time. Accepts `fail` (default), which + will cause the entire document to be rejected, and `continue`, which + will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue + indexing. This parameter can only be set if the `script` parameter is + also set. `script` -: If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. 
If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points as a pair of (lat, lon) double values. +: If this parameter is set, then the field will index values generated + by this script, rather than reading the values directly from the + source. If a value is set for this field on the input document, then + the document will be rejected with an error. + Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points + as a pair of (lat, lon) double values. ## Using geopoints in scripts [_using_geopoints_in_scripts] -When accessing the value of a geopoint in a script, the value is returned as a `GeoPoint` object, which allows access to the `.lat` and `.lon` values respectively: +When accessing the value of a geopoint in a script, the value is returned as +a `GeoPoint` object, which allows access to the `.lat` and `.lon` values +respectively: ```painless def geopoint = doc['location'].value; @@ -159,11 +208,17 @@ def lon = doc['location'].lon; ## Synthetic source [geo-point-synthetic-source] ::::{important} -Synthetic `_source` is Generally Available only for TSDB indices (indices that have `index.mode` set to `time_series`). For other indices synthetic `_source` is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. +Synthetic `_source` is Generally Available only for TSDB indices +(indices that have `index.mode` set to `time_series`). For other indices +synthetic `_source` is in technical preview. Features in technical preview may +be changed or removed in a future release. 
Elastic will work to fix +any issues, but features in technical preview are not subject to the support SLA +of official GA features. :::: -Synthetic source may sort `geo_point` fields (first by latitude and then longitude) and reduces them to their stored precision. For example: +Synthetic source may sort `geo_point` fields (first by latitude and then +longitude) and reduces them to their stored precision. For example: $$$synthetic-source-geo-point-example$$$ diff --git a/docs/reference/elasticsearch/mapping-reference/ignore-malformed.md b/docs/reference/elasticsearch/mapping-reference/ignore-malformed.md index b5ae850adfce4..ea3f111abecfa 100644 --- a/docs/reference/elasticsearch/mapping-reference/ignore-malformed.md +++ b/docs/reference/elasticsearch/mapping-reference/ignore-malformed.md @@ -59,7 +59,7 @@ The `ignore_malformed` setting is currently supported by the following [mapping : `date_nanos` [Geopoint](/reference/elasticsearch/mapping-reference/geo-point.md) -: `geo_point` for lat/lon points +: `geo_point` for lat/lon points, although there is a [special case](#_ignore_malformed_geo_point) for out-of-range values [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md) : `geo_shape` for complex shapes like polygons @@ -103,8 +103,21 @@ PUT my-index-000001 ## Dealing with malformed fields [_dealing_with_malformed_fields] -Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on. Whenever possible it is recommended to keep the number of documents that have a malformed field contained, or queries on this field will become meaningless. Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field. - +Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on. 
+Whenever possible, keep the number of documents that have a malformed field small, +or queries on this field will become meaningless. +Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`, +`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field. + +## The special case of `geo_point` fields [_ignore_malformed_geo_point] + +[`geo_point`](/reference/elasticsearch/mapping-reference/geo-point.md) fields have a special case: +values whose format is syntactically valid, +but whose numerical `latitude` or `longitude` is out of range. +If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`, +the document will still be indexed, with the latitude and longitude values normalized into the valid range. +The special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field will not be set. +The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized. ## Limits for JSON Objects [json-object-limits]
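
The `geo_point` special case described above can be tried out with a minimal sketch along the following lines. The index name `my-index-000002`, the field name `location` and the sample coordinates are assumptions chosen for illustration, not part of the patch. The latitude `91.0` is syntactically valid but outside the range of -90 to 90.

```console
# Hypothetical index and field names, for illustration only
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point",
        "ignore_malformed": true
      }
    }
  }
}

# Valid format, but the latitude is out of range
PUT my-index-000002/_doc/1
{
  "location": { "lat": 91.0, "lon": 10.0 }
}

# Look for documents that had a field ignored
GET my-index-000002/_search
{
  "query": {
    "exists": { "field": "_ignored" }
  }
}
```

Going by the behaviour this patch documents, the document should be accepted and the `exists` query on `_ignored` should not match it, because the out-of-range value is normalized rather than ignored. With `ignore_malformed` set to `false`, the same indexing request would be expected to fail with an exception.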