Commit 9bc751a

Merge branch 'main' into pinned-retriever
2 parents: 760415d + 212971a

130 files changed, 1949 additions and 295 deletions

Large commits have some content hidden by default, so only part of the diff appears below.

benchmarks/README.md

Lines changed: 31 additions & 5 deletions
````diff
@@ -82,19 +82,21 @@ To get realistic results, you should exercise care when running benchmarks. Here
 NOTE: Linux only. Sorry Mac and Windows.
 
 Disassembling is fun! Maybe not always useful, but always fun! Generally, you'll want to install `perf` and the JDK's `hsdis`.
-`perf` is generally available via `apt-get install perf` or `pacman -S perf`. `hsdis` is a little more involved: you'll want to compile it from source. This worked
+`perf` is generally available via `apt-get install perf` or `pacman -S perf linux-tools`. `hsdis` is a little more involved: you'll want to compile it from source. This worked
 on 2020-08-01:
 
 ```
 git clone git@github.com:openjdk/jdk.git
 cd jdk
-git checkout jdk-17-ga
-cd src/utils/hsdis
+git checkout jdk-24-ga
 # Get a known good binutils
 wget https://ftp.gnu.org/gnu/binutils/binutils-2.35.tar.gz
 tar xf binutils-2.35.tar.gz
-make BINUTILS=binutils-2.35 ARCH=amd64
-sudo cp build/linux-amd64/hsdis-amd64.so /usr/lib/jvm/java-17-openjdk/lib/server/
+bash configure --with-hsdis=binutils --with-binutils-src=binutils-2.35 \
+--with-boot-jdk=~/.gradle/jdks/oracle_corporation-24-amd64-linux.2
+make build-hsdis
+cp ./build/linux-x86_64-server-release/jdk/lib/hsdis-amd64.so \
+~/.gradle/jdks/oracle_corporation-24-amd64-linux.2/lib/hsdis.so
 ```
 
 If you want to disassemble a single method, do something like this:
````

````diff
@@ -105,6 +107,30 @@ gradlew -p benchmarks run --args ' MemoryStatsBenchmark -jvmArgs "-XX:+UnlockDia
 
 If you want `perf` to find the hot methods for you, then add `-prof perfasm`.
 
+NOTE: `perfasm` needs extra access to performance events:
+```
+sudo bash
+echo -1 > /proc/sys/kernel/perf_event_paranoid
+exit
+```
+
+If you get warnings like:
+```
+The perf event count is suspiciously low (0).
+```
+then check whether you are running into the [Intel hybrid CPU](https://man.archlinux.org/man/perf-stat.1.en#INTEL_HYBRID_SUPPORT) issue
+by running:
+```
+perf stat -B dd if=/dev/zero of=/dev/null count=1000000
+```
+
+If you see lines like:
+```
+765019980 cpu_atom/cycles/ # 1.728 GHz (0.60%)
+2258845959 cpu_core/cycles/ # 5.103 GHz (99.18%)
+```
+then `perf` is just not going to work for you.
+
 ## Async Profiler
 
 Note: Linux and Mac only. Sorry Windows.
````

Lines changed: 61 additions & 0 deletions
```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the "Elastic License
 * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
 * Public License v 1"; you may not use this file except in compliance with, at
 * your election, the "Elastic License 2.0", the "GNU Affero General Public
 * License v3.0 only", or the "Server Side Public License, v 1".
 */

package org.elasticsearch.benchmark.compute.operator;

import org.apache.lucene.document.InetAddressPoint;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.breaker.NoopCircuitBreaker;
import org.elasticsearch.common.network.InetAddresses;
import org.elasticsearch.compute.operator.BreakingBytesRefBuilder;
import org.elasticsearch.xpack.esql.expression.function.scalar.convert.ParseIp;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5)
@Measurement(iterations = 7)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
@Fork(1)
public class ParseIpBenchmark {
    private final BytesRef ip = new BytesRef("192.168.0.1");
    private final BreakingBytesRefBuilder scratch = ParseIp.buildScratch(new NoopCircuitBreaker("request"));

    @Benchmark
    public BytesRef leadingZerosRejected() {
        return ParseIp.leadingZerosRejected(ip, scratch);
    }

    @Benchmark
    public BytesRef leadingZerosAreDecimal() {
        return ParseIp.leadingZerosAreDecimal(ip, scratch);
    }

    @Benchmark
    public BytesRef leadingZerosAreOctal() {
        return ParseIp.leadingZerosAreOctal(ip, scratch);
    }

    @Benchmark
    public BytesRef original() {
        InetAddress inetAddress = InetAddresses.forString(ip.utf8ToString());
        return new BytesRef(InetAddressPoint.encode(inetAddress));
    }
}
```

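The benchmark compares three treatments of leading zeros in an IPv4 octet (rejected, read as decimal, read as octal) with the original `InetAddresses.forString` path. The internals of `ParseIp` are not part of this diff, so what follows is only a hypothetical, self-contained sketch of what those three policies could mean for an input such as `192.168.010.1`; the class, enum, and method names are invented for illustration. Like the `original()` benchmark, which goes through `InetAddressPoint.encode`, it produces the 16-byte IPv4-mapped IPv6 form (`::ffff:a.b.c.d`).

```java
/** Hypothetical sketch of three leading-zero policies for dotted-quad IPv4 input. */
final class LeadingZeroIpv4 {
    enum LeadingZeros { REJECT, DECIMAL, OCTAL }

    private LeadingZeroIpv4() {}

    /** Parses "a.b.c.d" into the 16-byte IPv4-mapped IPv6 form (::ffff:a.b.c.d). */
    static byte[] parse(String ip, LeadingZeros policy) {
        String[] parts = ip.split("\\.", -1); // limit -1 keeps empty parts, so "1.2.3." fails
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets in '" + ip + "'");
        }
        byte[] encoded = new byte[16];
        encoded[10] = (byte) 0xff; // ten zero bytes, then 0xff 0xff, then the IPv4 address
        encoded[11] = (byte) 0xff;
        for (int i = 0; i < 4; i++) {
            encoded[12 + i] = (byte) octet(parts[i], policy, ip);
        }
        return encoded;
    }

    private static int octet(String part, LeadingZeros policy, String ip) {
        if (part.isEmpty() || !part.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("non-digit octet '" + part + "' in '" + ip + "'");
        }
        int radix = 10;
        if (part.length() > 1 && part.charAt(0) == '0') {
            if (policy == LeadingZeros.REJECT) {
                throw new IllegalArgumentException("leading zero in '" + ip + "'");
            }
            if (policy == LeadingZeros.OCTAL) {
                radix = 8; // C inet_aton convention: "010" means 8
            }
        }
        int value;
        try {
            value = Integer.parseInt(part, radix); // fails for e.g. "09" in radix 8
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad octet '" + part + "' in '" + ip + "'", e);
        }
        if (value > 255) {
            throw new IllegalArgumentException("octet out of range in '" + ip + "'");
        }
        return value;
    }

    /** Renders the IPv4 part of the mapped form back to dotted-quad text. */
    static String toDotted(byte[] encoded) {
        StringBuilder sb = new StringBuilder();
        for (int i = 12; i < 16; i++) {
            if (i > 12) {
                sb.append('.');
            }
            sb.append(Byte.toUnsignedInt(encoded[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (LeadingZeros policy : LeadingZeros.values()) {
            try {
                System.out.println(policy + ": " + toDotted(parse("192.168.010.1", policy)));
            } catch (IllegalArgumentException e) {
                System.out.println(policy + ": rejected (" + e.getMessage() + ")");
            }
        }
    }
}
```

Under these assumptions `REJECT` refuses the input, `DECIMAL` yields `192.168.10.1`, and `OCTAL` yields `192.168.8.1`. A plain address such as `192.168.0.1` parses identically under all three policies, because a lone `0` is not treated as a leading zero.
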

docs/changelog/125562.yaml

Lines changed: 6 additions & 0 deletions
```yaml
pr: 125562
summary: Improve handling of empty response
area: Infra/REST API
type: bug
issues:
- 57639
```

docs/changelog/126237.yaml

Lines changed: 5 additions & 0 deletions
```yaml
pr: 126237
summary: Use `FallbackSyntheticSourceBlockLoader` for text fields
area: Mapping
type: enhancement
issues: []
```

docs/changelog/126296.yaml

Lines changed: 5 additions & 0 deletions
```yaml
pr: 126296
summary: Fail with 500 not 400 for `ValueExtractor` bugs
area: ES|QL
type: bug
issues: []
```

docs/changelog/126338.yaml

Lines changed: 5 additions & 0 deletions
```yaml
pr: 126338
summary: Speed up TO_IP
area: ES|QL
type: enhancement
issues: []
```

docs/reference/elasticsearch/mapping-reference/geo-point.md

Lines changed: 71 additions & 16 deletions
````diff
@@ -9,14 +9,23 @@ mapped_pages:
 
 Fields of type `geo_point` accept latitude-longitude pairs, which can be used:
 
-* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md), within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point, or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon).
+* to find geopoints within a [bounding box](/reference/query-languages/query-dsl/query-dsl-geo-bounding-box-query.md),
+within a certain [distance](/reference/query-languages/query-dsl/query-dsl-geo-distance-query.md) of a central point,
+or within a [`geo_shape` query](/reference/query-languages/query-dsl/query-dsl-geo-shape-query.md) (for example, points in a polygon).
 * to aggregate documents by [distance](/reference/aggregations/search-aggregations-bucket-geodistance-aggregation.md) from a central point.
-* to aggregate documents by geographic grids: either [`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md), [`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or [`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md).
-* to aggregate geopoints into a track using the metrics aggregation [`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md).
+* to aggregate documents by geographic grids: either
+[`geo_hash`](/reference/aggregations/search-aggregations-bucket-geohashgrid-aggregation.md),
+[`geo_tile`](/reference/aggregations/search-aggregations-bucket-geotilegrid-aggregation.md) or
+[`geo_hex`](/reference/aggregations/search-aggregations-bucket-geohexgrid-aggregation.md).
+* to aggregate geopoints into a track using the metrics aggregation
+[`geo_line`](/reference/aggregations/search-aggregations-metrics-geo-line.md).
 * to integrate distance into a document’s [relevance score](/reference/query-languages/query-dsl/query-dsl-function-score-query.md).
 * to [sort](/reference/elasticsearch/rest-apis/sort-search-results.md#geo-sorting) documents by distance.
 
-As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats. However, there are a number of additional formats that are supported for convenience and historical reasons. In total there are six ways that a geopoint may be specified, as demonstrated below:
+As with [geo_shape](/reference/elasticsearch/mapping-reference/geo-shape.md) and [point](/reference/elasticsearch/mapping-reference/point.md), `geo_point` can be specified in [GeoJSON](http://geojson.org)
+and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) formats.
+However, there are a number of additional formats that are supported for convenience and historical reasons.
+In total there are six ways that a geopoint may be specified, as demonstrated below:
 
 ```console
 PUT my-index-000001
````

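As the page notes further down, the accepted formats do not share an axis order: string geopoints are written `lat,lon`, while arrays, GeoJSON and WKT use `lon,lat`. A client that assembles documents from mixed sources therefore has to swap coordinates per format. The hypothetical helper below (names invented for this sketch; WKT support limited to a bare `POINT (lon lat)`) reads the string, array and WKT forms into one `(lat, lon)` record, so the three calls in `main` describe the same point.

```java
import java.util.Locale;

/** Hypothetical helper: reads three geo_point input forms into one (lat, lon) pair. */
final class GeoPointText {
    record LatLon(double lat, double lon) {}

    private GeoPointText() {}

    /** String form "41.12,-71.34" is ordered lat,lon. */
    static LatLon fromLatLonString(String s) {
        String[] parts = s.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lat,lon' but got '" + s + "'");
        }
        return new LatLon(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));
    }

    /** Array form [-71.34, 41.12] is ordered lon, lat (x before y). */
    static LatLon fromLonLatArray(double[] lonLat) {
        if (lonLat.length != 2) {
            throw new IllegalArgumentException("expected [lon, lat]");
        }
        return new LatLon(lonLat[1], lonLat[0]);
    }

    /** Simple WKT "POINT (-71.34 41.12)" is also ordered lon lat. */
    static LatLon fromWktPoint(String wkt) {
        String t = wkt.trim();
        int open = t.indexOf('(');
        if (!t.toUpperCase(Locale.ROOT).startsWith("POINT") || open < 0 || !t.endsWith(")")) {
            throw new IllegalArgumentException("expected 'POINT (lon lat)' but got '" + wkt + "'");
        }
        String[] xy = t.substring(open + 1, t.length() - 1).trim().split("\\s+");
        if (xy.length != 2) {
            throw new IllegalArgumentException("expected two coordinates in '" + wkt + "'");
        }
        return new LatLon(Double.parseDouble(xy[1]), Double.parseDouble(xy[0]));
    }

    public static void main(String[] args) {
        System.out.println(fromLatLonString("41.12,-71.34"));
        System.out.println(fromLonLatArray(new double[] {-71.34, 41.12}));
        System.out.println(fromWktPoint("POINT (-71.34 41.12)"));
    }
}
```

All three lines print the same record, which is the point of keeping the swap inside one helper per format rather than at each call site.
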
````diff
@@ -103,15 +112,28 @@ GET my-index-000001/_search
 ::::{admonition} Geopoints expressed as an array or string
 :class: important
 
-Please note that string geopoints are ordered as `lat,lon`, while array geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`.
+Please note that string geopoints are ordered as `lat,lon`, while array
+geopoints, GeoJSON and WKT are ordered as the reverse: `lon,lat`.
 
-The reasons for this are historical. Geographers traditionally write `latitude` before `longitude`, while recent formats specified for geographic data like [GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html) order `longitude` before `latitude` (easting before northing) in order to match the mathematical convention of ordering `x` before `y`.
+The reasons for this are historical. Geographers traditionally write `latitude`
+before `longitude`, while recent formats specified for geographic data like
+[GeoJSON](https://geojson.org/) and [Well-Known Text](https://docs.opengeospatial.org/is/12-063r5/12-063r5.html)
+order `longitude` before `latitude` (easting before northing) in order to match
+the mathematical convention of ordering `x` before `y`.
 
 ::::
 
 
 ::::{note}
-A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash). Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of the bits of the latitude and longitude interleaved. Each character in a geohash adds another 5 bits to the precision. So the longer the hash, the more precise it is. For indexing purposes, geohashes are translated into latitude-longitude pairs. During this process only the first 12 characters are used, so specifying more than 12 characters in a geohash doesn’t increase the precision. The 12 characters provide 60 bits, which should reduce a possible error to less than 2cm.
+A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash).
+Geohashes are [base32](https://en.wikipedia.org/wiki/Base32) encoded strings of
+the bits of the latitude and longitude interleaved. Each character in a geohash
+adds another 5 bits to the precision. So the longer the hash, the more
+precise it is. For indexing purposes, geohashes are translated into
+latitude-longitude pairs. During this process only the first 12 characters are
+used, so specifying more than 12 characters in a geohash doesn’t increase the
+precision. The 12 characters provide 60 bits, which should reduce a possible
+error to less than 2cm.
 ::::
 
 
````

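To make the precision figures in the geohash note concrete, here is a minimal decoder sketch. It assumes the standard geohash scheme (the base32 alphabet `0123456789bcdefghjkmnpqrstuvwxyz`, 5 bits per character, bits alternating between longitude and latitude starting with longitude), truncates input to 12 characters to mirror the behaviour the note describes, and converts degrees to metres with an approximate equatorial constant. `GeohashSketch` is a hypothetical name, not Elasticsearch code.

```java
import java.util.Locale;

/** Hypothetical geohash decoder used to illustrate precision per character. */
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    /** Approximate metres per degree of longitude at the equator (assumption for illustration). */
    static final double METRES_PER_DEGREE = 111_320.0;

    private GeohashSketch() {}

    /** Returns {lat, lon, latHalfWidth, lonHalfWidth} of the geohash cell, in degrees. */
    static double[] decode(String geohash) {
        // Mirror the note: characters beyond the 12th are ignored.
        String hash = geohash.length() > 12 ? geohash.substring(0, 12) : geohash;
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        boolean lonBit = true; // bits alternate lon, lat, lon, ... starting with longitude
        for (int i = 0; i < hash.length(); i++) {
            int value = BASE32.indexOf(hash.charAt(i));
            if (value < 0) {
                throw new IllegalArgumentException("invalid geohash character '" + hash.charAt(i) + "'");
            }
            for (int bit = 4; bit >= 0; bit--) { // 5 bits per character, most significant first
                boolean one = ((value >> bit) & 1) == 1;
                if (lonBit) {
                    double mid = (lonLo + lonHi) / 2;
                    if (one) { lonLo = mid; } else { lonHi = mid; }
                } else {
                    double mid = (latLo + latHi) / 2;
                    if (one) { latLo = mid; } else { latHi = mid; }
                }
                lonBit = !lonBit;
            }
        }
        return new double[] {(latLo + latHi) / 2, (lonLo + lonHi) / 2, (latHi - latLo) / 2, (lonHi - lonLo) / 2};
    }

    public static void main(String[] args) {
        double[] p = decode("ezs42");
        System.out.printf(Locale.ROOT, "ezs42 -> lat=%.5f lon=%.5f%n", p[0], p[1]); // lat=42.60498 lon=-5.60303
        double[] fine = decode("ezs42e44yx96"); // 12 characters = 60 bits
        System.out.printf(Locale.ROOT, "12-char lon error ~ %.4f m%n", fine[3] * METRES_PER_DEGREE);
    }
}
```

With 12 characters, longitude gets 30 of the 60 bits, so the half-width of the cell is 180/2^30 ≈ 1.68e-7 degrees, roughly 1.9 cm at the equator, which is consistent with the "less than 2cm" figure. Latitude also gets 30 bits over half the range, so its error is about half of that.
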

````diff
@@ -120,27 +142,54 @@ A point can be expressed as a [geohash](https://en.wikipedia.org/wiki/Geohash).
 The following parameters are accepted by `geo_point` fields:
 
 [`ignore_malformed`](/reference/elasticsearch/mapping-reference/ignore-malformed.md)
-: If `true`, malformed geopoints are ignored. If `false` (default), malformed geopoints throw an exception and reject the whole document. A geopoint is considered malformed if its latitude is outside the range -90 ≤ latitude ≤ 90, or if its longitude is outside the range -180 ≤ longitude ≤ 180. Note that this cannot be set if the `script` parameter is used.
+: If `true`, malformed geopoints are ignored.
+If `false` (default), malformed geopoints throw an exception and reject the whole document.
+A geopoint is considered malformed if its latitude is outside the range -90 ≤ latitude ≤ 90,
+or if its longitude is outside the range -180 ≤ longitude ≤ 180.
+When set to `true`, if the format is valid, but the values are out of range,
+the values will be normalized into the valid range, and the document will be indexed.
+This is a special case, and [differs from the normal behaviour](/reference/elasticsearch/mapping-reference/ignore-malformed.md#_ignore_malformed_geo_point) of `ignore_malformed`.
+Note that this cannot be set if the `script` parameter is used.
 
 `ignore_z_value`
-: If `true` (default) three-dimensional points will be accepted (stored in source) but only latitude and longitude values will be indexed; the third dimension is ignored. If `false`, geopoints containing any more than latitude and longitude (two dimensions) values throw an exception and reject the whole document. Note that this cannot be set if the `script` parameter is used.
+: If `true` (default) three-dimensional points will be accepted (stored in source)
+but only latitude and longitude values will be indexed; the third dimension is
+ignored. If `false`, geopoints containing any more than latitude and longitude
+(two dimensions) values throw an exception and reject the whole document. Note
+that this cannot be set if the `script` parameter is used.
 
 [`index`](/reference/elasticsearch/mapping-reference/mapping-index.md)
-: Should the field be quickly searchable? Accepts `true` (default) and `false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md) enabled can still be queried, albeit slower.
+: Should the field be quickly searchable? Accepts `true` (default) and
+`false`. Fields that only have [`doc_values`](/reference/elasticsearch/mapping-reference/doc-values.md)
+enabled can still be queried, albeit slower.
 
 [`null_value`](/reference/elasticsearch/mapping-reference/null-value.md)
-: Accepts an geopoint value which is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing. Note that this cannot be set if the `script` parameter is used.
+: Accepts a geopoint value which is substituted for any explicit `null` values.
+Defaults to `null`, which means the field is treated as missing. Note that this
+cannot be set if the `script` parameter is used.
 
 `on_script_error`
-: Defines what to do if the script defined by the `script` parameter throws an error at indexing time. Accepts `fail` (default), which will cause the entire document to be rejected, and `continue`, which will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue indexing. This parameter can only be set if the `script` field is also set.
+: Defines what to do if the script defined by the `script` parameter
+throws an error at indexing time. Accepts `fail` (default), which
+will cause the entire document to be rejected, and `continue`, which
+will register the field in the document’s [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) metadata field and continue
+indexing. This parameter can only be set if the `script` field is
+also set.
 
 `script`
-: If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points as a pair of (lat, lon) double values.
+: If this parameter is set, then the field will index values generated
+by this script, rather than reading the values directly from the
+source. If a value is set for this field on the input document, then
+the document will be rejected with an error.
+Scripts are in the same format as their [runtime equivalent](docs-content://manage-data/data-store/mapping/map-runtime-field.md), and should emit points
+as a pair of (lat, lon) double values.
 
 
 ## Using geopoints in scripts [_using_geopoints_in_scripts]
 
-When accessing the value of a geopoint in a script, the value is returned as a `GeoPoint` object, which allows access to the `.lat` and `.lon` values respectively:
+When accessing the value of a geopoint in a script, the value is returned as
+a `GeoPoint` object, which allows access to the `.lat` and `.lon` values
+respectively:
 
 ```painless
 def geopoint = doc['location'].value;
````

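The `ignore_z_value` rule above can be sketched for the array form `[lon, lat]` / `[lon, lat, z]`. This is a hypothetical client-side illustration with invented names, not the Elasticsearch parser: with `ignoreZValue` true the third value is simply not part of the indexed point, and with it false a three-element point is rejected. It also applies the range check that makes a point malformed when `ignore_malformed` is `false` (the default).

```java
/** Hypothetical sketch of the ignore_z_value rule for [lon, lat] and [lon, lat, z] arrays. */
final class GeoArrayPoint {
    record Indexed(double lat, double lon) {}

    private GeoArrayPoint() {}

    static Indexed toIndexed(double[] value, boolean ignoreZValue) {
        if (value.length != 2 && value.length != 3) {
            throw new IllegalArgumentException("expected [lon, lat] or [lon, lat, z]");
        }
        if (value.length == 3 && !ignoreZValue) {
            throw new IllegalArgumentException("z value found but ignore_z_value is false");
        }
        double lon = value[0]; // arrays are ordered lon, lat
        double lat = value[1];
        // With ignore_malformed false (the default), out-of-range values reject the document.
        // Written as negated ranges so that NaN is rejected too.
        if (!(lat >= -90 && lat <= 90) || !(lon >= -180 && lon <= 180)) {
            throw new IllegalArgumentException("malformed point: lat=" + lat + ", lon=" + lon);
        }
        // Only lat/lon are indexed; a z value, if present, is kept only in _source.
        return new Indexed(lat, lon);
    }

    public static void main(String[] args) {
        System.out.println(toIndexed(new double[] {-71.34, 41.12, 12.0}, true));
        try {
            toIndexed(new double[] {-71.34, 41.12, 12.0}, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```
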
````diff
@@ -159,11 +208,17 @@ def lon = doc['location'].lon;
 ## Synthetic source [geo-point-synthetic-source]
 
 ::::{important}
-Synthetic `_source` is Generally Available only for TSDB indices (indices that have `index.mode` set to `time_series`). For other indices synthetic `_source` is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
+Synthetic `_source` is Generally Available only for TSDB indices
+(indices that have `index.mode` set to `time_series`). For other indices
+synthetic `_source` is in technical preview. Features in technical preview may
+be changed or removed in a future release. Elastic will work to fix
+any issues, but features in technical preview are not subject to the support SLA
+of official GA features.
 ::::
 
 
-Synthetic source may sort `geo_point` fields (first by latitude and then longitude) and reduce them to their stored precision. For example:
+Synthetic source may sort `geo_point` fields (first by latitude and then
+longitude) and reduce them to their stored precision. For example:
 
 $$$synthetic-source-geo-point-example$$$
 
````

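The sentence about synthetic source reducing `geo_point` values to their stored precision can be illustrated with the sketch below. It assumes the stored precision is a 32-bit quantization per axis, modelled on how Lucene's `GeoEncodingUtils` encodes lat/lon points; the constants are re-derived here instead of imported, and `SyntheticGeoPoints` is a hypothetical name. It quantizes each point, decodes it back to a double, and sorts by latitude and then longitude, the order described in the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: 32-bit-per-axis quantization plus lat-then-lon sorting. */
final class SyntheticGeoPoints {
    record Point(double lat, double lon) {}

    // One quantization step per axis: 180 degrees (lat) or 360 degrees (lon) split into 2^32 steps.
    static final double LAT_DECODE = 180.0 / (1L << 32);
    static final double LON_DECODE = 360.0 / (1L << 32);

    private SyntheticGeoPoints() {}

    static int encodeLat(double lat) {
        // +90 itself would map to 2^31, one past Integer.MAX_VALUE, so step just below it.
        double l = lat == 90.0 ? Math.nextDown(lat) : lat;
        return (int) Math.floor(l / LAT_DECODE);
    }

    static int encodeLon(double lon) {
        double l = lon == 180.0 ? Math.nextDown(lon) : lon;
        return (int) Math.floor(l / LON_DECODE);
    }

    /** Round-trips a point through the quantized form, as a stored value would be. */
    static Point quantize(Point p) {
        return new Point(encodeLat(p.lat()) * LAT_DECODE, encodeLon(p.lon()) * LON_DECODE);
    }

    /** Quantizes every point, then sorts by latitude and, for ties, by longitude. */
    static List<Point> synthesize(List<Point> source) {
        List<Point> out = new ArrayList<>();
        for (Point p : source) {
            out.add(quantize(p));
        }
        out.sort(Comparator.comparingDouble(Point::lat).thenComparingDouble(Point::lon));
        return out;
    }

    public static void main(String[] args) {
        List<Point> source = List.of(new Point(41.12, -71.34), new Point(-3.0, 7.0), new Point(41.12, -80.0));
        for (Point p : synthesize(source)) {
            System.out.println(p);
        }
    }
}
```

Each printed coordinate differs from its input by less than one quantization step (about 4.2e-8 degrees for latitude), which is why synthetic source can return values that look slightly different from what was indexed.
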

docs/reference/elasticsearch/mapping-reference/ignore-malformed.md

Lines changed: 16 additions & 3 deletions
````diff
@@ -59,7 +59,7 @@ The `ignore_malformed` setting is currently supported by the following [mapping
 : `date_nanos`
 
 [Geopoint](/reference/elasticsearch/mapping-reference/geo-point.md)
-: `geo_point` for lat/lon points
+: `geo_point` for lat/lon points, although there is a [special case](#_ignore_malformed_geo_point) for out-of-range values
 
 [Geoshape](/reference/elasticsearch/mapping-reference/geo-shape.md)
 : `geo_shape` for complex shapes like polygons
````

````diff
@@ -103,8 +103,21 @@ PUT my-index-000001
 
 ## Dealing with malformed fields [_dealing_with_malformed_fields]
 
-Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on. Whenever possible it is recommended to keep the number of documents that have a malformed field contained, or queries on this field will become meaningless. Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field.
-
+Malformed fields are silently ignored at indexing time when `ignore_malformed` is turned on.
+Whenever possible it is recommended to keep the number of documents that have a malformed field contained,
+or queries on this field will become meaningless.
+Elasticsearch makes it easy to check how many documents have malformed fields by using `exists`,
+`term` or `terms` queries on the special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field.
+
+## The special case of `geo_point` fields [_ignore_malformed_geo_point]
+
+With [`geo_point`](/reference/elasticsearch/mapping-reference/geo-point.md) fields,
+there is the special case of values that have a syntactically valid format,
+but the numerical values for `latitude` and `longitude` are out of range.
+If `ignore_malformed` is `false`, an exception will be thrown as usual. But if it is `true`,
+the document will be indexed correctly, by normalizing the latitude and longitude values into the valid range.
+The special [`_ignored`](/reference/elasticsearch/mapping-reference/mapping-ignored-field.md) field will not be set.
+The original source document will remain as before, but indexed values, doc-values and stored fields will all be normalized.
 
 ## Limits for JSON Objects [json-object-limits]
 
````

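The special case above says out-of-range values are normalized into the valid range but does not spell out the rule. The sketch below uses one common convention as an assumption, not as a description of Elasticsearch's exact code: longitude wraps modulo 360°, and a latitude that runs past a pole is folded back into [-90°, 90°] while the longitude moves by 180°, since the point has crossed to the other side of the pole. `GeoNormalize` is a hypothetical name.

```java
/** Hypothetical sketch: fold out-of-range lat/lon values back into the valid range. */
final class GeoNormalize {
    record LatLon(double lat, double lon) {}

    private GeoNormalize() {}

    /** Wraps longitude into [-180, 180]; in-range values are returned unchanged. */
    static double normalizeLon(double lon) {
        if (lon >= -180 && lon <= 180) {
            return lon;
        }
        return ((lon + 180) % 360 + 360) % 360 - 180;
    }

    /** Folds latitude over the poles (moving longitude by 180) and then wraps longitude. */
    static LatLon normalize(double lat, double lon) {
        if (lat >= -90 && lat <= 90) {
            return new LatLon(lat, normalizeLon(lon));
        }
        // First wrap latitude into [-180, 180), i.e. one trip around a meridian circle.
        double wrapped = ((lat + 180) % 360 + 360) % 360 - 180;
        double newLat = wrapped;
        double newLon = lon;
        if (wrapped > 90) {          // past the north pole
            newLat = 180 - wrapped;
            newLon = lon + 180;
        } else if (wrapped < -90) {  // past the south pole
            newLat = -180 - wrapped;
            newLon = lon + 180;
        }
        return new LatLon(newLat, normalizeLon(newLon));
    }

    public static void main(String[] args) {
        System.out.println(normalize(91, 0));   // LatLon[lat=89.0, lon=180.0]
        System.out.println(normalize(0, 190));  // LatLon[lat=0.0, lon=-170.0]
        System.out.println(normalize(-95, 10)); // LatLon[lat=-85.0, lon=-170.0]
    }
}
```

Under this convention, a value that is valid in format but out of range always maps to a real location on the globe, which is what lets the document be indexed instead of being marked in `_ignored`.
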
