Commit 5943cd5

[DOCS] Make doc_count error docs more searchable (#73870) (#73903)
Changes:

* Combines the `Document counts are approximate` and `Calculating document count error` sections.
* Rewrites the section to include `sum_other_doc_count` and `doc_count_error_upper_bound` for easier on-page (ctrl+f) searching.

Closes #73200
1 parent 9bde33a commit 5943cd5

1 file changed: 20 additions and 15 deletions

docs/reference/aggregations/bucket/terms-aggregation.asciidoc

Lines changed: 20 additions & 15 deletions
@@ -100,7 +100,7 @@ Response:
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\.//]
 
-<1> an upper bound of the error on the document counts for each term, see <<search-aggregations-bucket-terms-aggregation-approximate-counts,below>>
+<1> an upper bound of the error on the document counts for each term, see <<terms-agg-doc-count-error,below>>
 <2> when there are lots of unique terms, Elasticsearch only returns the top terms; this number is the sum of the document counts for all buckets that are not part of the response
 <3> the list of the top buckets, the meaning of `top` being defined by the <<search-aggregations-bucket-terms-aggregation-order,order>>
 
@@ -122,14 +122,6 @@ NOTE: If you want to retrieve **all** terms or all combinations of terms in a ne
 allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the
 `terms` aggregation. The `terms` aggregation is meant to return the `top` terms and does not allow pagination.
 
-[[search-aggregations-bucket-terms-aggregation-approximate-counts]]
-==== Document counts are approximate
-
-Document counts (and the results of any sub aggregations) in the terms
-aggregation are not always accurate. Each shard provides its own view of what
-the ordered list of terms should be. These views are combined to give a final
-view.
-
 ==== Shard Size
 
 The higher the requested `size` is, the more accurate the results will be, but also, the more expensive it will be to
@@ -149,15 +141,28 @@ NOTE: `shard_size` cannot be smaller than `size` (as it doesn't make much sens
 
 The default `shard_size` is `(size * 1.5 + 10)`.
 
-==== Calculating Document Count Error
+[[terms-agg-doc-count-error]]
+==== Document count error
+
+`doc_count` values for a `terms` aggregation may be approximate. As a result,
+any sub-aggregations on the `terms` aggregation may also be approximate.
+
+To calculate `doc_count` values, each shard provides its own top terms and
+document counts. The aggregation combines these shard-level results to calculate
+its final `doc_count` values. To measure the accuracy of `doc_count` values, the
+aggregation results include the following properties:
+
+`sum_other_doc_count`::
+(integer) The total document count for any terms not included in the results.
 
-There are two error values which can be shown on the terms aggregation. The first gives a value for the aggregation as
-a whole which represents the maximum potential document count for a term which did not make it into the final list of
-terms. This is calculated as the sum of the document count from the last term returned from each shard.
+`doc_count_error_upper_bound`::
+(integer) The highest possible document count for any single term not included
+in the results. If `0`, `doc_count` values are accurate.
 
 ==== Per bucket document count error
 
-The second error value can be enabled by setting the `show_term_doc_count_error` parameter to true:
+To get the `doc_count_error_upper_bound` for each term, set
+`show_term_doc_count_error` to `true`:
 
 [source,console]
 --------------------------------------------------
@@ -194,7 +199,7 @@ The order of the buckets can be customized by setting the `order` parameter. By
 their `doc_count` descending. It is possible to change this behaviour as documented below:
 
 WARNING: Sorting by ascending `_count` or by sub aggregation is discouraged as it increases the
-<<search-aggregations-bucket-terms-aggregation-approximate-counts,error>> on document counts.
+<<terms-agg-doc-count-error,error>> on document counts.
 It is fine when a single shard is queried, or when the field that is being aggregated was used
 as a routing key at index time: in these cases results will be accurate since shards have disjoint
 values. However otherwise, errors are unbounded. One particular case that could still be useful
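
Reviewer note: the removed wording says the aggregation-level error is "the sum of the document count from the last term returned from each shard", and the new wording describes `sum_other_doc_count` as the total count for terms left out of the results. The Python sketch below is a toy model of that description, not Elasticsearch code. The function name `merge_shard_terms`, the sample data, and the rule that only a shard which actually truncated its term list contributes to the error are assumptions made for illustration.

```python
from collections import Counter


def merge_shard_terms(shards, size, shard_size):
    """Toy model of combining shard-level top terms (not Elasticsearch code).

    shards: list of dicts mapping term -> document count on that shard.
    Returns (buckets, doc_count_error_upper_bound, sum_other_doc_count).
    """
    merged = Counter()
    error_upper_bound = 0
    total_docs = 0
    for shard in shards:
        total_docs += sum(shard.values())
        # Each shard reports only its own top `shard_size` terms.
        top = sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        merged.update(dict(top))
        # Assumption: a shard adds to the error only if it dropped terms.
        # A dropped term can have at most the count of the last term returned.
        if len(shard) > shard_size:
            error_upper_bound += top[-1][1]
    # The final response keeps the top `size` terms of the merged view.
    buckets = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    sum_other = total_docs - sum(count for _, count in buckets)
    return buckets, error_upper_bound, sum_other


# Hypothetical data: term "b" is dropped by the second shard.
shards = [
    {"a": 10, "b": 8, "c": 2},
    {"a": 9, "c": 7, "b": 6},
]
print(merge_shard_terms(shards, size=2, shard_size=2))
print(merge_shard_terms(shards, size=2, shard_size=3))
```

In the first call the merged count for `b` is 8 although its true count is 14, and the error bound is non-zero. In the second call every shard returns all of its terms, so the bound is `0` and the counts are exact, which matches the "If `0`, `doc_count` values are accurate" sentence in the new text.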
