Commit b606732

Merge branch 'main' into brothermich/ES-10264
2 parents 30f3c31 + 943b224 commit b606732

318 files changed: +1185 -790 lines changed

docs/changelog/124825.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 124825
+summary: Check alias during update
+area: Transform
+type: bug
+issues: []

docs/changelog/127563.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 127563
+summary: "ESQL: Avoid unintended attribute removal"
+area: ES|QL
+type: bug
+issues:
+- 127468

docs/changelog/128161.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 128161
+summary: Fix system data streams incorrectly showing up in the list of template validation
+  problems
+area: Data streams
+type: bug
+issues: []

docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.

-Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
+Clearly, the naive implementation does not scalethe sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.

 The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).

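Review note: the naive approach described in the file above (keep every value in a sorted array, then index at `count * 0.5`) can be sketched in a few lines. This is an illustrative Python sketch only, not Elasticsearch code; the function name `naive_percentile` and the clamping of the index for a fraction of 1.0 are assumptions added here.

```python
def naive_percentile(values, fraction):
    """Return the value at sorted(values)[count * fraction].

    Hypothetical helper: it stores and sorts every value, so memory
    grows linearly with the dataset, which is why it does not scale.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty dataset")
    # Clamp so that fraction == 1.0 maps to the last element (assumption).
    index = min(int(len(ordered) * fraction), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    print(naive_percentile([5, 1, 3, 2, 4], 0.5))
```

The whole dataset has to be held and sorted before a single lookup, which is the cost the approximate algorithm avoids.
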
docs/reference/aggregations/pipeline.md

Lines changed: 1 addition & 1 deletion
@@ -230,7 +230,7 @@ An alternate syntax is supported to cope with aggregations or metrics which have

 ## Dealing with gaps in the data [gap-policy]

-Data in the real world is often noisy and sometimes contains **gaps** — places where data simply doesn’t exist. This can occur for a variety of reasons, the most common being:
+Data in the real world is often noisy and sometimes contains **gaps**places where data simply doesn’t exist. This can occur for a variety of reasons, the most common being:

 * Documents falling into a bucket do not contain a required field
 * There are no documents matching the query for one or more buckets

docs/reference/aggregations/search-aggregations-bucket-composite-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -606,7 +606,7 @@ PUT my-index-000001
606606
```
607607

608608
1. This index is sorted by `username` first then by `timestamp`.
609-
2. in ascending order for the `username` field and in descending order for the `timestamp` field.1. could be used to optimize these composite aggregations:
609+
2. in ascending order for the `username` field and in descending order for the `timestamp` field.1. could be used to optimize these composite aggregations:
610610

611611

612612

docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -679,7 +679,7 @@ Response:
679679
}
680680
```
681681

682-
The response will contain all the buckets having the relative day of the week as key : 1 for Monday, 2 for Tuesday… 7 for Sunday.
682+
The response will contain all the buckets having the relative day of the week as key : 1 for Monday, 2 for Tuesday… 7 for Sunday.
683683

684684

685685

docs/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ mapped_pages:
 # Rare terms aggregation [search-aggregations-bucket-rare-terms-aggregation]


-A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation
+A multi-bucket value source based aggregation which finds "rare" termsterms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation

 ## Syntax [_syntax_3]

@@ -117,7 +117,7 @@ This does, however, mean that a large number of results can be returned if chose

 ## Max Bucket Limit [search-aggregations-bucket-rare-terms-aggregation-max-buckets]

-The Rare Terms aggregation is more liable to trip the `search.max_buckets` soft limit than other aggregations due to how it works. The `max_bucket` soft-limit is evaluated on a per-shard basis while the aggregation is collecting results. It is possible for a term to be "rare" on a shard but become "not rare" once all the shard results are merged together. This means that individual shards tend to collect more buckets than are truly rare, because they only have their own local view. This list is ultimately pruned to the correct, smaller list of rare terms on the coordinating node… but a shard may have already tripped the `max_buckets` soft limit and aborted the request.
+The Rare Terms aggregation is more liable to trip the `search.max_buckets` soft limit than other aggregations due to how it works. The `max_bucket` soft-limit is evaluated on a per-shard basis while the aggregation is collecting results. It is possible for a term to be "rare" on a shard but become "not rare" once all the shard results are merged together. This means that individual shards tend to collect more buckets than are truly rare, because they only have their own local view. This list is ultimately pruned to the correct, smaller list of rare terms on the coordinating node… but a shard may have already tripped the `max_buckets` soft limit and aborted the request.

 When aggregating on fields that have potentially many "rare" terms, you may need to increase the `max_buckets` soft limit. Alternatively, you might need to find a way to filter the results to return fewer rare values (smaller time span, filter by category, etc), or re-evaluate your definition of "rare" (e.g. if something appears 100,000 times, is it truly "rare"?)

docs/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Re-analyzing *large* result sets will require a lot of time and memory. It is re
 * Suggesting "H5N1" when users search for "bird flu" to help expand queries
 * Suggesting keywords relating to stock symbol $ATI for use in an automated news classifier

-In these cases the words being selected are not simply the most popular terms in results. The most popular words tend to be very boring (*and, of, the, we, I, they*). The significant words are the ones that have undergone a significant change in popularity measured between a *foreground* and *background* set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.
+In these cases the words being selected are not simply the most popular terms in results. The most popular words tend to be very boring (*and, of, the, we, I, they* ). The significant words are the ones that have undergone a significant change in popularity measured between a *foreground* and *background* set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.

 ## Basic use [_basic_use_2]

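Review note: the "big swing in frequency" in the paragraph above (5/10,000,000 in the background set vs 4/100 in the foreground set) can be made concrete with a short calculation. This is a hedged Python sketch of a plain frequency ratio only; it is not the scoring heuristic the aggregation itself applies, and the function name `frequency_swing` is hypothetical.

```python
from fractions import Fraction


def frequency_swing(fg_hits, fg_total, bg_hits, bg_total):
    """Ratio of the foreground rate to the background rate.

    Hypothetical helper for illustration: exact rational arithmetic
    avoids floating-point noise in the ratio.
    """
    foreground = Fraction(fg_hits, fg_total)  # share of search results containing the term
    background = Fraction(bg_hits, bg_total)  # share of the whole index containing the term
    return foreground / background


if __name__ == "__main__":
    # Figures taken from the paragraph: 4 of 100 results vs 5 of 10 million documents.
    swing = frequency_swing(4, 100, 5, 10_000_000)
    print(f"The term is {int(swing)} times more frequent in the foreground set")
```

With the figures from the text, the term is 80,000 times more frequent in the search results than in the index as a whole.
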
docs/reference/aggregations/search-aggregations-bucket-terms-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -696,7 +696,7 @@ When aggregating on multiple indices the type of the aggregated field may not be

 ### Failed Trying to Format Bytes [_failed_trying_to_format_bytes]

-When running a terms aggregation (or other aggregation, but in practice usually terms) over multiple indices, you may get an error that starts with "Failed trying to format bytes…". This is usually caused by two of the indices not having the same mapping type for the field being aggregated.
+When running a terms aggregation (or other aggregation, but in practice usually terms) over multiple indices, you may get an error that starts with "Failed trying to format bytes… ". This is usually caused by two of the indices not having the same mapping type for the field being aggregated.

 **Use an explicit `value_type`** Although it’s best to correct the mappings, you can work around this issue if the field is unmapped in one of the indices. Setting the `value_type` parameter can resolve the issue by coercing the unmapped field into the correct type.
