
Commit d10ef76

[DOCS] Replace irregular whitespaces in docs (elastic#128199)
* Replace irregular whitespaces
* More chars
1 parent a2b4a6f commit d10ef76

46 files changed: +86 / -86 lines changed


docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.

-Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
+Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.

The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).
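
For context, a minimal sketch of a `percentiles` request of the kind this snippet describes; the `latency` index and `load_time` field are illustrative.

```console
GET latency/_search
{
  "size": 0,
  "aggs": {
    "load_time_percentiles": {
      "percentiles": { "field": "load_time" }
    }
  }
}
```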

docs/reference/aggregations/pipeline.md

Lines changed: 1 addition & 1 deletion
@@ -230,7 +230,7 @@ An alternate syntax is supported to cope with aggregations or metrics which have

## Dealing with gaps in the data [gap-policy]

-Data in the real world is often noisy and sometimes contains **gaps** — places where data simply doesn’t exist. This can occur for a variety of reasons, the most common being:
+Data in the real world is often noisy and sometimes contains **gaps** — places where data simply doesn’t exist. This can occur for a variety of reasons, the most common being:

* Documents falling into a bucket do not contain a required field
* There are no documents matching the query for one or more buckets
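
A minimal sketch of how a pipeline aggregation can state its gap handling via the `gap_policy` parameter (which accepts `skip` or `insert_zeros`); the index, field, and aggregation names here are hypothetical.

```console
POST sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": {
        "total_sales": { "sum": { "field": "price" } },
        "sales_deriv": {
          "derivative": {
            "buckets_path": "total_sales",
            "gap_policy": "skip"
          }
        }
      }
    }
  }
}
```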

docs/reference/aggregations/search-aggregations-bucket-composite-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -606,7 +606,7 @@ PUT my-index-000001
```

1. This index is sorted by `username` first then by `timestamp`.
-2. in ascending order for the `username` field and in descending order for the `timestamp` field.1. could be used to optimize these composite aggregations:
+2. in ascending order for the `username` field and in descending order for the `timestamp` field.1. could be used to optimize these composite aggregations:

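
As an illustrative sketch of a composite aggregation whose sources match that index sort (ascending `username`, descending `timestamp`); the aggregation and source names are hypothetical.

```console
GET my-index-000001/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "user_name": { "terms": { "field": "username" } } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ]
      }
    }
  }
}
```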

docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -679,7 +679,7 @@ Response:
}
```

-The response will contain all the buckets having the relative day of the week as key : 1 for Monday, 2 for Tuesday… 7 for Sunday.
+The response will contain all the buckets having the relative day of the week as key : 1 for Monday, 2 for Tuesday… 7 for Sunday.


docs/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ mapped_pages:
# Rare terms aggregation [search-aggregations-bucket-rare-terms-aggregation]


-A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation
+A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation

## Syntax [_syntax_3]
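
A minimal sketch of the `rare_terms` syntax introduced under that heading; the `genres` bucket name and `genre` field are illustrative.

```console
GET /_search
{
  "aggs": {
    "genres": {
      "rare_terms": {
        "field": "genre",
        "max_doc_count": 1
      }
    }
  }
}
```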

@@ -117,7 +117,7 @@ This does, however, mean that a large number of results can be returned if chose

## Max Bucket Limit [search-aggregations-bucket-rare-terms-aggregation-max-buckets]

-The Rare Terms aggregation is more liable to trip the `search.max_buckets` soft limit than other aggregations due to how it works. The `max_bucket` soft-limit is evaluated on a per-shard basis while the aggregation is collecting results. It is possible for a term to be "rare" on a shard but become "not rare" once all the shard results are merged together. This means that individual shards tend to collect more buckets than are truly rare, because they only have their own local view. This list is ultimately pruned to the correct, smaller list of rare terms on the coordinating node… but a shard may have already tripped the `max_buckets` soft limit and aborted the request.
+The Rare Terms aggregation is more liable to trip the `search.max_buckets` soft limit than other aggregations due to how it works. The `max_bucket` soft-limit is evaluated on a per-shard basis while the aggregation is collecting results. It is possible for a term to be "rare" on a shard but become "not rare" once all the shard results are merged together. This means that individual shards tend to collect more buckets than are truly rare, because they only have their own local view. This list is ultimately pruned to the correct, smaller list of rare terms on the coordinating node… but a shard may have already tripped the `max_buckets` soft limit and aborted the request.

When aggregating on fields that have potentially many "rare" terms, you may need to increase the `max_buckets` soft limit. Alternatively, you might need to find a way to filter the results to return fewer rare values (smaller time span, filter by category, etc), or re-evaluate your definition of "rare" (e.g. if something appears 100,000 times, is it truly "rare"?)
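
For the soft limit mentioned above, a minimal sketch of raising `search.max_buckets` as a dynamic cluster setting; the value is arbitrary and only for illustration.

```console
PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 100000
  }
}
```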

docs/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Re-analyzing *large* result sets will require a lot of time and memory. It is re
* Suggesting "H5N1" when users search for "bird flu" to help expand queries
* Suggesting keywords relating to stock symbol $ATI for use in an automated news classifier

-In these cases the words being selected are not simply the most popular terms in results. The most popular words tend to be very boring (*and, of, the, we, I, they*). The significant words are the ones that have undergone a significant change in popularity measured between a *foreground* and *background* set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.
+In these cases the words being selected are not simply the most popular terms in results. The most popular words tend to be very boring (*and, of, the, we, I, they* ). The significant words are the ones that have undergone a significant change in popularity measured between a *foreground* and *background* set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.

## Basic use [_basic_use_2]
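
A minimal sketch of the basic use pattern the next heading covers, with `significant_text` run inside a `sampler` to bound the re-analysis cost; the `news` index and `content` field are illustrative.

```console
GET news/_search
{
  "query": { "match": { "content": "Bird flu" } },
  "aggregations": {
    "my_sample": {
      "sampler": { "shard_size": 100 },
      "aggregations": {
        "keywords": {
          "significant_text": { "field": "content" }
        }
      }
    }
  }
}
```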

docs/reference/aggregations/search-aggregations-bucket-terms-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -696,7 +696,7 @@ When aggregating on multiple indices the type of the aggregated field may not be

### Failed Trying to Format Bytes [_failed_trying_to_format_bytes]

-When running a terms aggregation (or other aggregation, but in practice usually terms) over multiple indices, you may get an error that starts with "Failed trying to format bytes…". This is usually caused by two of the indices not having the same mapping type for the field being aggregated.
+When running a terms aggregation (or other aggregation, but in practice usually terms) over multiple indices, you may get an error that starts with "Failed trying to format bytes… ". This is usually caused by two of the indices not having the same mapping type for the field being aggregated.

**Use an explicit `value_type`** Although it’s best to correct the mappings, you can work around this issue if the field is unmapped in one of the indices. Setting the `value_type` parameter can resolve the issue by coercing the unmapped field into the correct type.
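
A minimal sketch of that `value_type` workaround; the index pattern, field name, and chosen type are illustrative.

```console
GET my-index-*/_search
{
  "size": 0,
  "aggs": {
    "ids": {
      "terms": {
        "field": "product_id",
        "value_type": "long"
      }
    }
  }
}
```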

docs/reference/aggregations/search-aggregations-metrics-boxplot-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ GET latency/_search
1. Compression controls memory usage and approximation error


-The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`.
+The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`.

Therefore, by increasing the compression value, you can increase the accuracy of your percentiles at the cost of more memory. Larger compression values also make the algorithm slower since the underlying tree data structure grows in size, resulting in more expensive operations. The default compression value is `100`.
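
To make the trade-off concrete, a minimal sketch of a boxplot aggregation with an explicit `compression` (so at most `20 * 200 = 4000` nodes); the index, field, and value are illustrative.

```console
GET latency/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "field": "load_time",
        "compression": 200
      }
    }
  }
}
```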

docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md

Lines changed: 3 additions & 3 deletions
@@ -60,7 +60,7 @@ By default, the `percentile` metric will generate a range of percentiles: `[ 1,

As you can see, the aggregation will return a calculated value for each percentile in the default range. If we assume response times are in milliseconds, it is immediately obvious that the webpage normally loads in 10-720ms, but occasionally spikes to 940-980ms.

-Often, administrators are only interested in outliers — the extreme percentiles. We can specify just the percents we are interested in (requested percentiles must be a value between 0-100 inclusive):
+Often, administrators are only interested in outliers — the extreme percentiles. We can specify just the percents we are interested in (requested percentiles must be a value between 0-100 inclusive):

```console
GET latency/_search
@@ -177,7 +177,7 @@ GET latency/_search

There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.

-Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
+Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.

The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).

@@ -222,7 +222,7 @@ GET latency/_search
1. Compression controls memory usage and approximation error


-The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`.
+The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`.

Therefore, by increasing the compression value, you can increase the accuracy of your percentiles at the cost of more memory. Larger compression values also make the algorithm slower since the underlying tree data structure grows in size, resulting in more expensive operations. The default compression value is `100`.

docs/reference/aggregations/search-aggregations-metrics-weight-avg-aggregation.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ mapped_pages:

A `single-value` metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

-When calculating a regular average, each datapoint has an equal "weight" … it contributes equally to the final value. Weighted averages, on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document.
+When calculating a regular average, each datapoint has an equal "weight" … it contributes equally to the final value. Weighted averages, on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document.

As a formula, a weighted average is the `∑(value * weight) / ∑(weight)`
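
A minimal sketch of that formula expressed as a `weighted_avg` aggregation; the `exams` index and `grade`/`weight` fields are illustrative.

```console
POST exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": { "field": "grade" },
        "weight": { "field": "weight" }
      }
    }
  }
}
```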

0 commit comments
