Commit baa9703

Document special behaviour of ignore_malformed for geo_point mappings (#125692) (#126384)
With `geo_point` fields, there is the special case of values that have a syntactically valid format, but the numerical values for `latitude` and `longitude` are out of range. If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`, the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range. The special `_ignored` field will not be set. The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized.
1 parent 03503e8 commit baa9703

2 files changed, +87 -19 lines changed

docs/reference/elasticsearch/mapping-reference/geo-point.md

Lines changed: 71 additions & 16 deletions
@@ -9,14 +9,23 @@ mapped_pages:
99

1010
Fields of type `geo_point` accept latitude-longitude pairs, which can be used:
1111

12-
* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md), within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point, or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon).
12+
* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md),
13+
within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point,
14+
or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon).
1315
* to aggregate documents by [distance](/reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md) from a central point.
14-
* to aggregate documents by geographic grids: either [`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md), [`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or [`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md).
15-
* to aggregate geopoints into a track using the metrics aggregation [`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md).
16+
* to aggregate documents by geographic grids: either
17+
[`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md),
18+
[`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or
19+
[`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md).
20+
* to aggregate geopoints into a track using the metrics aggregation
21+
[`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md).
1622
* to integrate distance into a document’s [relevance score](/reference/query-languages/query-dsl/query-dsl-function-score-query.md).
1723
* to [sort](/reference/elasticsearch/rest-apis/sort-search-results.md#geo-sorting) documents by distance.
1824

19-
As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats. However, there are a number of additional formats that are supported for convenience and historical reasons. In total there are six ways that a geopoint may be specified, as demonstrated below:
25+
As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org)
26+
and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats.
27+
However, there are a number of additional formats that are supported for convenience and historical reasons.
28+
In total there are six ways that a geopoint may be specified, as demonstrated below:
2029

2130
```console
2231
PUT my-index-000001
@@ -103,15 +112,28 @@ GET my-index-000001/_search
103112
::::{admonition} Geopoints expressed as an array or string
104113
:class: important
105114

106-
Please note that string geopoints are ordered as `lat,lon`, while array geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`.
115+
Please note that string geopoints are ordered as `lat,lon`, while array
116+
geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`.
107117

108-
The reasons for this are historical. Geographers traditionally write `latitude` before `longitude`, while recent formats specified for geographic data like [GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) order `longitude` before `latitude` (easting before northing) in order to match the mathematical convention of ordering `x` before `y`.
118+
The reasons for this are historical. Geographers traditionally write `latitude`
119+
before `longitude`, while recent formats specified for geographic data like
120+
[GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html)
121+
order `longitude` before `latitude` (easting before northing) in order to match
122+
the mathematical convention of ordering `x` before `y`.
109123

110124
::::
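
The ordering difference described in the admonition above can be illustrated with a minimal Python sketch. The helper names `parse_string_point` and `parse_array_point` and the coordinates are hypothetical, chosen only for illustration; they are not part of Elasticsearch. The sketch simply shows that a `lat,lon` string and a `[lon, lat]` array can describe the same point.

```python
from typing import Sequence, Tuple


def parse_string_point(text: str) -> Tuple[float, float]:
    """Parse a "lat,lon" string and return (lat, lon)."""
    lat_text, lon_text = text.split(",")
    return float(lat_text), float(lon_text)


def parse_array_point(values: Sequence[float]) -> Tuple[float, float]:
    """Parse a [lon, lat] array and return (lat, lon)."""
    lon, lat = values[0], values[1]
    return float(lat), float(lon)


# Hypothetical coordinates, used only to show the two orderings.
from_string = parse_string_point("41.12,-71.34")  # string form: lat first
from_array = parse_array_point([-71.34, 41.12])   # array form: lon first
```

Both calls return the same `(lat, lon)` pair, which is the point of the warning: swapping the two conventions silently moves the point to a different location.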
111125

112126

113127
::::{note}
114-
A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash). Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of the bits of the latitude and longitude interleaved. Each character in a geohash adds additional 5 bits to the precision. So the longer the hash, the more precise it is. For the indexing purposed geohashs are translated into latitude-longitude pairs. During this process only first 12 characters are used, so specifying more than 12 characters in a geohash doesn’t increase the precision. The 12 characters provide 60 bits, which should reduce a possible error to less than 2cm.
128+
A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash).
129+
Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of
130+
the bits of the latitude and longitude interleaved. Each character in a geohash
131+
adds additional 5 bits to the precision. So the longer the hash, the more
132+
precise it is. For the indexing purposed geohashs are translated into
133+
latitude-longitude pairs. During this process only first 12 characters are
134+
used, so specifying more than 12 characters in a geohash doesn’t increase the
135+
precision. The 12 characters provide 60 bits, which should reduce a possible
136+
error to less than 2cm.
115137
::::
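
The precision arithmetic in the note above can be checked with a rough Python sketch. It assumes the standard geohash layout (bits alternate between longitude and latitude, starting with longitude) and an approximate 111,320 m per degree, which only holds for latitude and for longitude at the equator. The function name `geohash_max_error_m` is hypothetical.

```python
from typing import Tuple

BITS_PER_CHAR = 5            # each base32 character carries 5 bits
MAX_CHARS_USED = 12          # characters beyond the 12th are not used
METERS_PER_DEGREE = 111_320.0  # assumption: rough value at the equator


def geohash_max_error_m(length: int) -> Tuple[float, float]:
    """Return the worst-case (lat, lon) offset in metres from the cell centre."""
    used = min(length, MAX_CHARS_USED)
    total_bits = used * BITS_PER_CHAR
    lon_bits = (total_bits + 1) // 2  # interleaving starts with longitude
    lat_bits = total_bits // 2
    lat_cell_deg = 180.0 / (2 ** lat_bits)
    lon_cell_deg = 360.0 / (2 ** lon_bits)
    # The worst case along one axis is half the cell size along that axis.
    return (lat_cell_deg / 2 * METERS_PER_DEGREE,
            lon_cell_deg / 2 * METERS_PER_DEGREE)


lat_err, lon_err = geohash_max_error_m(12)
```

With 12 characters there are 30 bits per axis, and under these assumptions both per-axis offsets come out below 2 cm. Passing a longer length gives the same result, because only the first 12 characters are used.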
116138

117139

@@ -120,27 +142,54 @@ A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash).
120142
The following parameters are accepted by `geo_point` fields:
121143

122144
[`ignore_malformed`](/reference/elasticsearch/mapping-reference/ignore-malformed.md)
123-
: If `true`, malformed geopoints are ignored. If `false` (default), malformed geopoints throw an exception and reject the whole document. A geopoint is considered malformed if its latitude is outside the range -90 ⇐ latitude ⇐ 90, or if its longitude is outside the range -180 ⇐ longitude ⇐ 180. Note that this cannot be set if the `script` parameter is used.
145+
: If `true`, malformed geopoints are ignored.
146+
If `false` (default), malformed geopoints throw an exception and reject the whole document.
147+
A geopoint is considered malformed if its latitude is outside the range -90 ⇐ latitude ⇐ 90,
148+
or if its longitude is outside the range -180 ⇐ longitude ⇐ 180.
149+
When set to `true`, if the format is valid, but the values are out of range,
150+
the values will be normalized into the valid range, and the document will be indexed.
151+
This is a special case, and a [different behaviour](/reference/elasticsearch/mapping-reference/ignore-malformed.md#_ignore_malformed_geo_point) from the normal for `ignore_malformed`.
152+
Note that this cannot be set if the `script` parameter is used.
124153

125154
`ignore_z_value`
126-
: If `true` (default) three dimension points will be accepted (stored in source) but only latitude and longitude values will be indexed; the third dimension is ignored. If `false`, geopoints containing any more than latitude and longitude (two dimensions) values throw an exception and reject the whole document. Note that this cannot be set if the `script` parameter is used.
155+
: If `true` (default) three dimension points will be accepted (stored in source)
156+
but only latitude and longitude values will be indexed; the third dimension is
157+
ignored. If `false`, geopoints containing any more than latitude and longitude
158+
(two dimensions) values throw an exception and reject the whole document. Note
159+
that this cannot be set if the `script` parameter is used.
127160

128161
[`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
129-
: Should the field be quickly searchable? Accepts `true` (default) and `false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.
162+
: Should the field be quickly searchable? Accepts `true` (default) and
163+
`false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md)
164+
enabled can still be queried, albeit slower.
130165

131166
[`null_value`](/reference/elasticsearch/mapping-reference/null-value.md)
132-
: Accepts an geopoint value which is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing. Note that this cannot be set if the `script` parameter is used.
167+
: Accepts a geopoint value which is substituted for any explicit `null` values.
168+
Defaults to `null`, which means the field is treated as missing. Note that this
169+
cannot be set if the `script` parameter is used.
133170

134171
`on_script_error`
135-
: Defines what to do if the script defined by the `script` parameter throws an error at indexing time. Accepts `fail` (default), which will cause the entire document to be rejected, and `continue`, which will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue indexing. This parameter can only be set if the `script` field is also set.
172+
: Defines what to do if the script defined by the `script` parameter
173+
throws an error at indexing time. Accepts `fail` (default), which
174+
will cause the entire document to be rejected, and `continue`, which
175+
will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue
176+
indexing. This parameter can only be set if the `script` field is
177+
also set.
136178

137179
`script`
138-
: If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points as a pair of (lat, lon) double values.
180+
: If this parameter is set, then the field will index values generated
181+
by this script, rather than reading the values directly from the
182+
source. If a value is set for this field on the input document, then
183+
the document will be rejected with an error.
184+
Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points
185+
as a pair of (lat, lon) double values.
139186

140187

141188
## Using geopoints in scripts [_using_geopoints_in_scripts]
142189

143-
When accessing the value of a geopoint in a script, the value is returned as a `GeoPoint` object, which allows access to the `.lat` and `.lon` values respectively:
190+
When accessing the value of a geopoint in a script, the value is returned as
191+
a `GeoPoint` object, which allows access to the `.lat` and `.lon` values
192+
respectively:
144193

145194
```painless
146195
def geopoint = doc['location'].value;
@@ -159,11 +208,17 @@ def lon = doc['location'].lon;
159208
## Synthetic source [geo-point-synthetic-source]
160209

161210
::::{important}
162-
Synthetic `_source` is Generally Available only for TSDB indices (indices that have `index.mode` set to `time_series`). For other indices synthetic `_source` is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
211+
Synthetic `_source` is Generally Available only for TSDB indices
212+
(indices that have `index.mode` set to `time_series`). For other indices
213+
synthetic `_source` is in technical preview. Features in technical preview may
214+
be changed or removed in a future release. Elastic will work to fix
215+
any issues, but features in technical preview are not subject to the support SLA
216+
of official GA features.
163217
::::
164218

165219

166-
Synthetic source may sort `geo_point` fields (first by latitude and then longitude) and reduces them to their stored precision. For example:
220+
Synthetic source may sort `geo_point` fields (first by latitude and then
221+
longitude) and reduces them to their stored precision. For example:
167222

168223
$$$synthetic-source-geo-point-example$$$
169224

docs/reference/elasticsearch/mapping-reference/ignore-malformed.md

Lines changed: 16 additions & 3 deletions
@@ -59,7 +59,7 @@ The `ignore_malformed` setting is currently supported by the following [mapping
5959
: `date_nanos`
6060

6161
[Geopoint](/reference/elasticsearch/mapping-reference/geo-point.md)
62-
: `geo_point` for lat/lon points
62+
: `geo_point` for lat/lon points, although there is a [special case](#_ignore_malformed_geo_point) for out-of-range values
6363

6464
[Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md)
6565
: `geo_shape` for complex shapes like polygons
@@ -103,8 +103,21 @@ PUT my-index-000001
103103

104104
## Dealing with malformed fields [_dealing_with_malformed_fields]
105105

106-
Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on. Whenever possible it is recommended to keep the number of documents that have a malformed field contained, or queries on this field will become meaningless. Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field.
107-
106+
Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on.
107+
Whenever possible it is recommended to keep the number of documents that have a malformed field contained,
108+
or queries on this field will become meaningless.
109+
Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,
110+
`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field.
111+
112+
## The special case of `geo_point` fields [_ignore_malformed_geo_point]
113+
114+
With [`geo_point`](/reference/elasticsearch/mapping-reference/geo-point.md) fields,
115+
there is the special case of values that have a syntactically valid format,
116+
but the numerical values for `latitude` and `longitude` are out of range.
117+
If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`,
118+
the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range.
119+
The special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field will not be set.
120+
The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized.
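
The text above does not spell out how values are normalized into the valid range. The following Python sketch shows one plausible interpretation, offered as an assumption rather than as the Elasticsearch implementation: longitude wraps around the globe, and a latitude past a pole is reflected back while the longitude shifts by 180 degrees. The names `wrap_degrees` and `normalize_point` are hypothetical.

```python
from typing import Tuple


def wrap_degrees(value: float, period: float = 360.0) -> float:
    """Wrap an angle into the interval (-period/2, period/2]."""
    wrapped = value % period  # Python's % yields [0, period) for period > 0
    if wrapped > period / 2:
        wrapped -= period
    return wrapped


def normalize_point(lat: float, lon: float) -> Tuple[float, float]:
    """Return an in-range (lat, lon); in-range input is returned unchanged."""
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    lat = wrap_degrees(lat)
    shift = 0.0
    if lat > 90:
        # Crossed the north pole: reflect and move to the opposite meridian.
        lat = 180 - lat
        shift = 180.0
    elif lat < -90:
        # Crossed the south pole: reflect and move to the opposite meridian.
        lat = -180 - lat
        shift = 180.0
    return lat, wrap_degrees(lon + shift)


# Hypothetical out-of-range inputs, for illustration only.
east_overflow = normalize_point(0.0, 190.0)
north_overflow = normalize_point(100.0, 0.0)
```

Under this assumed scheme every syntactically valid pair maps to some in-range point, which matches the documented outcome that the document is indexed and `_ignored` is not set. The exact values Elasticsearch produces may differ.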
108121

109122
## Limits for JSON Objects [json-object-limits]
110123
