diff --git a/docs/images/create-index-template.png b/docs/images/create-index-template.png deleted file mode 100644 index 7506409897e15..0000000000000 Binary files a/docs/images/create-index-template.png and /dev/null differ
diff --git a/docs/images/hybrid-architecture.png b/docs/images/hybrid-architecture.png deleted file mode 100644 index 81d19179db3e2..0000000000000 Binary files a/docs/images/hybrid-architecture.png and /dev/null differ
diff --git a/docs/images/mongodb-connector-config.png b/docs/images/mongodb-connector-config.png deleted file mode 100644 index 2c4d2e2158908..0000000000000 Binary files a/docs/images/mongodb-connector-config.png and /dev/null differ
diff --git a/docs/images/mongodb-load-sample-data.png b/docs/images/mongodb-load-sample-data.png deleted file mode 100644 index f7bc9c4192b02..0000000000000 Binary files a/docs/images/mongodb-load-sample-data.png and /dev/null differ
diff --git a/docs/images/mongodb-sample-document.png b/docs/images/mongodb-sample-document.png deleted file mode 100644 index f462c41ad751c..0000000000000 Binary files a/docs/images/mongodb-sample-document.png and /dev/null differ
diff --git a/docs/images/token-graph-dns-invalid-ex.svg b/docs/images/token-graph-dns-invalid-ex.svg deleted file mode 100644 index 5614f39bfe35c..0000000000000 --- a/docs/images/token-graph-dns-invalid-ex.svg +++ /dev/null @@ -1,72 +0,0 @@ [72 deleted lines of SVG markup; only the title text "Slice 1" and "Created with Sketch." is recoverable] \ No newline at end of file
diff --git a/docs/images/token-graph-dns-synonym-ex.svg b/docs/images/token-graph-dns-synonym-ex.svg deleted file mode 100644 index cff5b1306b73b..0000000000000 --- a/docs/images/token-graph-dns-synonym-ex.svg +++ /dev/null @@ -1,72 +0,0 @@ [72 deleted lines of SVG markup; only the title text "Slice 1" and "Created with Sketch." is recoverable] \ No newline at end of file
diff --git a/docs/images/use-a-connector-workflow.png b/docs/images/use-a-connector-workflow.png deleted file mode 100644 index eb51863358e9a..0000000000000 Binary files a/docs/images/use-a-connector-workflow.png and /dev/null differ
diff --git a/docs/reference/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md b/docs/reference/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md index 5bd61d1b1f23c..ceb7d2602f2b8 100644 --- a/docs/reference/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md +++ b/docs/reference/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md @@ -6,7 +6,7 @@ For a precision threshold of `c`, the implementation that we are using requires The following chart shows how the error varies before and after the threshold: -![cardinality error](/images/cardinality_error.png "") +![cardinality error](/reference/query-languages/images/cardinality_error.png "") For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, this is likely to be the case. Accuracy in practice depends on the dataset in question. 
In general, diff --git a/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md b/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md index 87ba905bc1518..76a05164b1258 100644 --- a/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md +++ b/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md @@ -12,6 +12,6 @@ When using this metric, there are a few guidelines to keep in mind: The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile: -![percentiles error](/images/percentiles_error.png "") +![percentiles error](/reference/query-languages/images/percentiles_error.png "") It shows how precision is better for extreme percentiles. The reason why error diminishes for large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions. diff --git a/docs/images/cardinality_error.png b/docs/reference/aggregations/images/cardinality_error.png similarity index 100% rename from docs/images/cardinality_error.png rename to docs/reference/aggregations/images/cardinality_error.png diff --git a/docs/images/percentiles_error.png b/docs/reference/aggregations/images/percentiles_error.png similarity index 100% rename from docs/images/percentiles_error.png rename to docs/reference/aggregations/images/percentiles_error.png diff --git a/docs/reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md b/docs/reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md index f67175f4892ab..f99596a450ffd 100644 --- a/docs/reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md @@ -65,9 +65,23 @@ Computing exact counts requires loading values into a hash set and returning its This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties: -:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md -::: +* configurable precision, which decides on how to trade memory for accuracy, +* excellent accuracy on low-cardinality sets, +* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision. +For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes. + +The following chart shows how the error varies before and after the threshold: + +![cardinality error](/reference/aggregations/images/cardinality_error.png "") + +For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, +this is likely to be the case. Accuracy in practice depends on the dataset in question. In general, +most datasets show consistently good accuracy. Also note that even with a threshold as low as 100, +the error remains very low (1-6% as seen in the above graph) even when counting millions of items. 
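
To make the memory trade-off above concrete, here is a minimal sketch of a `cardinality` aggregation that sets the precision threshold explicitly. The `sales` index and `customer_id` field are hypothetical, but `precision_threshold` is the documented setting: with a threshold of 3000, counts stay near-exact up to roughly 3000 unique values, and the counter needs about `3000 * 8` bytes, or roughly 24 KB, per bucket.

```console
POST /sales/_search?size=0
{
  "aggs": {
    "unique_customers": {
      "cardinality": {
        "field": "customer_id",
        "precision_threshold": 3000
      }
    }
  }
}
```

Raising the threshold buys accuracy at a linear memory cost (up to the supported maximum of 40000); lowering it saves memory at the price of larger errors on high-cardinality fields.
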
+ +Because the HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact distribution of +hashes in a dataset can affect the accuracy of the cardinality. ## Pre-computed hashes [_pre_computed_hashes]
diff --git a/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md b/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md index 1d3665b572305..9d16953007749 100644 --- a/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md @@ -175,8 +175,23 @@ GET latency/_search ## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation] -:::{include} /reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md -::: +There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`. + +Clearly, the naive implementation does not scale: the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated instead. + +The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)). + +When using this metric, there are a few guidelines to keep in mind: + +* The error of a percentile estimate is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median. +* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough). +* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and the volume of data being aggregated. + +The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile: + +![percentiles error](images/percentiles_error.png "") + +It shows that precision is better for extreme percentiles. The error diminishes for a large number of values because the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can do a better job of summarizing it. This would not be the case for more skewed distributions. ::::{warning} Percentile aggregations are also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
diff --git a/docs/reference/query-languages/eql/eql-syntax.md b/docs/reference/query-languages/eql/eql-syntax.md index 5506f6bcdab2f..994b2089d7d6c 100644 --- a/docs/reference/query-languages/eql/eql-syntax.md +++ b/docs/reference/query-languages/eql/eql-syntax.md @@ -788,7 +788,7 @@ You cannot use EQL to search the values of a [`nested`](/reference/elasticsearch * If two pending sequences are in the same state at the same time, the most recent sequence overwrites the older one. 
* If the query includes [`by` fields](#eql-by-keyword), the query uses a separate state machine for each unique `by` field value. -:::::{dropdown} **Example** +:::::{dropdown} Example A data set contains the following `process` events in ascending chronological order: ```js @@ -831,13 +831,13 @@ The query’s event items correspond to the following states: * State B: `[process where process.name == "bash"]` * Complete: `[process where process.name == "cat"]` -:::{image} /images/sequence-state-machine.svg +:::{image} ../images/sequence-state-machine.svg :alt: sequence state machine ::: To find matching sequences, the query uses separate state machines for each unique `user.name` value. Based on the data set, you can expect two state machines: one for the `root` user and one for `elkbee`. -:::{image} /images/separate-state-machines.svg +:::{image} ../images/separate-state-machines.svg :alt: separate state machines ::: diff --git a/docs/images/Exponential.png b/docs/reference/query-languages/images/Exponential.png similarity index 100% rename from docs/images/Exponential.png rename to docs/reference/query-languages/images/Exponential.png diff --git a/docs/images/Gaussian.png b/docs/reference/query-languages/images/Gaussian.png similarity index 100% rename from docs/images/Gaussian.png rename to docs/reference/query-languages/images/Gaussian.png diff --git a/docs/images/Linear.png b/docs/reference/query-languages/images/Linear.png similarity index 100% rename from docs/images/Linear.png rename to docs/reference/query-languages/images/Linear.png diff --git a/docs/reference/query-languages/images/cardinality_error.png b/docs/reference/query-languages/images/cardinality_error.png new file mode 100644 index 0000000000000..cf405be69ab97 Binary files /dev/null and b/docs/reference/query-languages/images/cardinality_error.png differ diff --git a/docs/images/decay_2d.png b/docs/reference/query-languages/images/decay_2d.png similarity index 100% rename from docs/images/decay_2d.png rename to docs/reference/query-languages/images/decay_2d.png diff --git a/docs/images/exponential-decay-keyword-exp-1.png b/docs/reference/query-languages/images/exponential-decay-keyword-exp-1.png similarity index 100% rename from docs/images/exponential-decay-keyword-exp-1.png rename to docs/reference/query-languages/images/exponential-decay-keyword-exp-1.png diff --git a/docs/images/exponential-decay-keyword-exp-2.png b/docs/reference/query-languages/images/exponential-decay-keyword-exp-2.png similarity index 100% rename from docs/images/exponential-decay-keyword-exp-2.png rename to docs/reference/query-languages/images/exponential-decay-keyword-exp-2.png diff --git a/docs/images/lambda.png b/docs/reference/query-languages/images/lambda.png similarity index 100% rename from docs/images/lambda.png rename to docs/reference/query-languages/images/lambda.png diff --git a/docs/images/lambda_calc.png b/docs/reference/query-languages/images/lambda_calc.png similarity index 100% rename from docs/images/lambda_calc.png rename to docs/reference/query-languages/images/lambda_calc.png diff --git a/docs/images/linear-decay-keyword-linear-1.png b/docs/reference/query-languages/images/linear-decay-keyword-linear-1.png similarity index 100% rename from docs/images/linear-decay-keyword-linear-1.png rename to docs/reference/query-languages/images/linear-decay-keyword-linear-1.png diff --git a/docs/images/linear-decay-keyword-linear-2.png b/docs/reference/query-languages/images/linear-decay-keyword-linear-2.png similarity index 100% rename 
from docs/images/linear-decay-keyword-linear-2.png rename to docs/reference/query-languages/images/linear-decay-keyword-linear-2.png diff --git a/docs/images/normal-decay-keyword-gauss-1.png b/docs/reference/query-languages/images/normal-decay-keyword-gauss-1.png similarity index 100% rename from docs/images/normal-decay-keyword-gauss-1.png rename to docs/reference/query-languages/images/normal-decay-keyword-gauss-1.png diff --git a/docs/images/normal-decay-keyword-gauss-2.png b/docs/reference/query-languages/images/normal-decay-keyword-gauss-2.png similarity index 100% rename from docs/images/normal-decay-keyword-gauss-2.png rename to docs/reference/query-languages/images/normal-decay-keyword-gauss-2.png diff --git a/docs/reference/query-languages/images/percentiles_error.png b/docs/reference/query-languages/images/percentiles_error.png new file mode 100644 index 0000000000000..b57464e72e0f9 Binary files /dev/null and b/docs/reference/query-languages/images/percentiles_error.png differ diff --git a/docs/images/s_calc.png b/docs/reference/query-languages/images/s_calc.png similarity index 100% rename from docs/images/s_calc.png rename to docs/reference/query-languages/images/s_calc.png diff --git a/docs/images/separate-state-machines.svg b/docs/reference/query-languages/images/separate-state-machines.svg similarity index 100% rename from docs/images/separate-state-machines.svg rename to docs/reference/query-languages/images/separate-state-machines.svg diff --git a/docs/images/sequence-state-machine.svg b/docs/reference/query-languages/images/sequence-state-machine.svg similarity index 100% rename from docs/images/sequence-state-machine.svg rename to docs/reference/query-languages/images/sequence-state-machine.svg diff --git a/docs/images/sigma.png b/docs/reference/query-languages/images/sigma.png similarity index 100% rename from docs/images/sigma.png rename to docs/reference/query-languages/images/sigma.png diff --git a/docs/images/sigma_calc.png b/docs/reference/query-languages/images/sigma_calc.png similarity index 100% rename from docs/images/sigma_calc.png rename to docs/reference/query-languages/images/sigma_calc.png diff --git a/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md b/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md index 68894d271f525..f08785c22b27a 100644 --- a/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md +++ b/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md @@ -360,11 +360,11 @@ The `DECAY_FUNCTION` determines the shape of the decay: `gauss` : Normal decay, computed as: -![Gaussian](/images/Gaussian.png "") +![Gaussian](../images/Gaussian.png "") -where ![sigma](/images/sigma.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset` +where ![sigma](../images/sigma.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset` -![sigma calc](/images/sigma_calc.png "") +![sigma calc](../images/sigma_calc.png "") See [Normal decay, keyword `gauss`](#gauss-decay) for graphs demonstrating the curve generated by the `gauss` function. 
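
The `gauss` formula itself is only shipped as an image, so as a reading aid here is a LaTeX reconstruction of the normal decay using the parameter names from this page (`origin`, `offset`, `scale`, `decay`); it is a transcription of the standard definition, not a change to the function:

```latex
S(\mathit{doc}) = \exp\left( - \frac{ \max\left(0,\, \lvert \mathit{fieldvalue}_{doc} - \mathit{origin} \rvert - \mathit{offset} \right)^{2} }{ 2\,{\sigma'}^{2} } \right),
\qquad
{\sigma'}^{2} = - \frac{ \mathit{scale}^{2} }{ 2 \ln(\mathit{decay}) }
```

With this choice of sigma, the multiplier drops to exactly `decay` once the field value is `scale` away from `origin`+-`offset`.
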
@@ -372,11 +372,11 @@ See [Normal decay, keyword `gauss`](#gauss-decay) for graphs demonstrating the c `exp` : Exponential decay, computed as: -![Exponential](/images/Exponential.png "") +![Exponential](../images/Exponential.png "") -where again the parameter ![lambda](/images/lambda.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset` +where again the parameter ![lambda](../images/lambda.png "") is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset` -![lambda calc](/images/lambda_calc.png "") +![lambda calc](../images/lambda_calc.png "") See [Exponential decay, keyword `exp`](#exp-decay) for graphs demonstrating the curve generated by the `exp` function. @@ -384,18 +384,18 @@ See [Exponential decay, keyword `exp`](#exp-decay) for graphs demonstrating the `linear` : Linear decay, computed as: -![Linear](/images/Linear.png ""). +![Linear](../images/Linear.png ""). where again the parameter `s` is computed to assure that the score takes the value `decay` at distance `scale` from `origin`+-`offset` -![s calc](/images/s_calc.png "") +![s calc](../images/s_calc.png "") In contrast to the normal and exponential decay, this function actually sets the score to 0 if the field value exceeds twice the user given scale value. For single functions the three decay functions together with their parameters can be visualized like this (the field in this example called "age"): -![decay 2d](/images/decay_2d.png "") +![decay 2d](../images/decay_2d.png "") ### Multi-values fields [_multi_values_fields] @@ -510,10 +510,10 @@ Next, we show how the computed score looks like for each of the three possible d When choosing `gauss` as the decay function in the above example, the contour and surface plot of the multiplier looks like this: -:::{image} /images/normal-decay-keyword-gauss-1.png +:::{image} ../images/normal-decay-keyword-gauss-1.png ::: -:::{image} /images/normal-decay-keyword-gauss-2.png +:::{image} ../images/normal-decay-keyword-gauss-2.png ::: Suppose your original search results matches three hotels : @@ -529,20 +529,20 @@ Suppose your original search results matches three hotels : When choosing `exp` as the decay function in the above example, the contour and surface plot of the multiplier looks like this: -:::{image} /images/exponential-decay-keyword-exp-1.png +:::{image} ../images/exponential-decay-keyword-exp-1.png ::: -:::{image} /images/exponential-decay-keyword-exp-2.png +:::{image} ../images/exponential-decay-keyword-exp-2.png ::: ### Linear decay, keyword `linear` [linear-decay] When choosing `linear` as the decay function in the above example, the contour and surface plot of the multiplier looks like this: -:::{image} /images/linear-decay-keyword-linear-1.png +:::{image} ../images/linear-decay-keyword-linear-1.png ::: -:::{image} /images/linear-decay-keyword-linear-2.png +:::{image} ../images/linear-decay-keyword-linear-2.png ::: ## Supported fields for decay functions [_supported_fields_for_decay_functions]
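
Decay functions are supported on numeric, date, and geo_point fields. As a hedged sketch of how the hotel scenario above could use one field of each type in a single request, the query below applies `gauss` decay to a numeric `price`, a geo_point `location`, and a date `renovation_date`; the `hotels` index, the field names, and the literal origins and scales are illustrative assumptions, not values taken from this page.

```console
GET /hotels/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "properties": "balcony" } },
      "functions": [
        { "gauss": { "price": { "origin": "0", "scale": "20" } } },
        { "gauss": { "location": { "origin": "11, 12", "scale": "2km" } } },
        { "gauss": { "renovation_date": { "origin": "2014-05-21", "scale": "180d" } } }
      ],
      "score_mode": "multiply"
    }
  }
}
```

Each function contributes a multiplier between 0 and 1, and `"score_mode": "multiply"` combines them, so a cheap, nearby, recently renovated hotel would score highest.
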