/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the "Elastic License
 * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
 * Public License v 1"; you may not use this file except in compliance with, at
 * your election, the "Elastic License 2.0", the "GNU Affero General Public
 * License v3.0 only", or the "Server Side Public License, v 1".
 */

/**
 * This library provides an implementation of merging and analysis algorithms for exponential histograms based on the
 * <a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram">OpenTelemetry definition</a>.
 * It is designed as a complementary tool to the OpenTelemetry SDK, focusing specifically on efficient histogram merging and
 * accurate percentile estimation.
 *
 * <h2>Overview</h2>
 *
 * The library implements base-2 exponential histograms with perfect subsetting. The most important properties are:
 *
 * <ul>
 * <li>The histogram has a scale parameter, which defines the accuracy. A higher scale implies a higher accuracy.</li>
 * <li>The {@code base} for the buckets is defined as {@code base = 2^(2^-scale)}.</li>
 * <li>The histogram bucket at index {@code i} has the range {@code (base^i, base^(i+1)]} (see the sketch below).</li>
 * <li>Negative values are represented by a separate negative range of buckets with the boundaries {@code (-base^(i+1), -base^i]}.</li>
 * <li>Histograms support perfect subsetting: when the scale is decreased by one, each pair of adjacent buckets is merged into a
 * single bucket without introducing error.</li>
 * <li>A special zero bucket with a zero threshold is used to handle zero and close-to-zero values.</li>
 * </ul>
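 *
 * To make the bucket geometry concrete, here is a minimal sketch (illustrative only, not part of this library's API) that
 * computes the boundaries of the bucket at a given index:
 *
 * <pre>{@code
 * int scale = 2;
 * long index = 5;
 * double base = Math.pow(2.0, Math.scalb(1.0, -scale)); // base = 2^(2^-scale)
 * double lowerBound = Math.pow(base, index);            // exclusive lower boundary
 * double upperBound = Math.pow(base, index + 1);        // inclusive upper boundary
 * }</pre>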
 *
 * For more details, please refer to the
 * <a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram">OpenTelemetry definition</a>.
 * <p>
 * The library implements a sparse storage approach where only populated buckets consume memory and count towards the bucket limit.
 * This differs from the OpenTelemetry implementation, which uses dense storage. While dense storage allows O(1) insertion of
 * individual values, the sparse representation requires O(log m) time, where m is the bucket capacity. However, the sparse
 * representation enables more compact storage and a simple merging algorithm with runtime linear in the number of populated
 * buckets. Additionally, this library provides an array-backed sparse storage, ensuring cache efficiency.
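 * <p>
 * A minimal sketch of the sparse idea (the class name and the {@code TreeMap}-based storage are illustrative stand-ins; the
 * library's actual storage is array-backed, and the index computation below ignores edge cases such as zeros and negative
 * values):
 *
 * <pre>{@code
 * class SparseHistogramSketch {
 *     final java.util.TreeMap<Long, Long> buckets = new java.util.TreeMap<>();
 *
 *     void insert(double value, int scale) {
 *         // bucket i covers (base^i, base^(i+1)] with base = 2^(2^-scale),
 *         // hence i = ceil(log2(value) * 2^scale) - 1
 *         double log2 = Math.log(value) / Math.log(2.0);
 *         long index = (long) Math.ceil(Math.scalb(log2, scale)) - 1;
 *         buckets.merge(index, 1L, Long::sum); // only populated buckets consume memory
 *     }
 * }
 * }</pre>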
 * <p>
 * The sparse storage approach offers significant advantages for distributions with fewer distinct values than the bucket count,
 * allowing the library to represent such distributions with a relative error so small that it is negligible in practice.
 * This makes it suitable not only for exponential histograms but also as a universal solution for handling explicit bucket
 * histograms.
 *
 * <h2>Merging Algorithm</h2>
 *
 * The merging algorithm works similarly to the merge step of merge sort: we simultaneously walk through the buckets of both
 * histograms in order, merging them on the fly as needed. If the total number of resulting buckets would exceed the bucket limit,
 * we scale down as needed.
 * <p>
 * Before we merge the buckets, we need to take care of the special zero bucket and bring both histograms to the same scale.
 * <p>
 * For the zero bucket, we merge the zero thresholds from both histograms and collapse any buckets overlapping the resulting new
 * zero bucket into it.
 * <p>
 * To bring both histograms to the same scale, we can make adjustments in both directions: we can increase or decrease the
 * scale of histograms as needed.
 * <p>
 * See the upscaling section below for details on how upscaling works. Upscaling helps prevent the precision of a result histogram
 * merged from many histograms from being dragged down to the lowest scale of a potentially misconfigured input
 * histogram. For example, if a histogram is recorded with too low a zero threshold, this can result in a degraded scale when using
 * dense histogram storage, even if the histogram only contains two points.
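 * <p>
 * The following is a simplified sketch of the merge step (a hypothetical helper, not this library's API). It assumes both
 * histograms have already been brought to a common scale and the zero buckets have already been handled; a sorted map stands in
 * for the in-order walk over the populated buckets of both histograms.
 *
 * <pre>{@code
 * static java.util.TreeMap<Long, Long> mergeBuckets(
 *         java.util.SortedMap<Long, Long> a, java.util.SortedMap<Long, Long> b) {
 *     java.util.TreeMap<Long, Long> result = new java.util.TreeMap<>(a);
 *     // add the counts of the second histogram's populated buckets;
 *     // the runtime is linear in the number of populated buckets
 *     b.forEach((index, count) -> result.merge(index, count, Long::sum));
 *     return result;
 * }
 * }</pre>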
 *
 * <h3>Upscaling</h3>
 *
 * In general, we assume that all values in a bucket lie on a single point: the point of least relative error. This is the point
 * {@code x} in the bucket such that:
 *
 * <pre>
 * (x - l) / l = (u - x) / u
 * </pre>
 *
 * where {@code l} is the lower bucket boundary and {@code u} is the upper bucket boundary.
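 * <p>
 * Solving this equation for {@code x} yields the harmonic mean of the two boundaries, {@code x = 2 * l * u / (l + u)}. A minimal
 * sketch (a hypothetical helper, not part of the API):
 *
 * <pre>{@code
 * static double pointOfLeastRelativeError(double lower, double upper) {
 *     // (x - l) / l = (u - x) / u solved for x gives the harmonic mean
 *     return 2 * lower * upper / (lower + upper);
 * }
 * }</pre>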
 * <p>
 * This assumption allows us to increase the scale of histograms without increasing the bucket count. Buckets are simply mapped to
 * the ones in the new scale containing the point of least relative error of the original buckets.
 * <p>
 * This can introduce a small error, as the original center might be moved slightly. Therefore, we ensure that upscaling happens
 * at most once, to prevent errors from adding up. The larger the scale increase, the smaller the error (a higher scale means
 * smaller buckets, which in turn means we get a better fit around the original point of least relative error).
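 * <p>
 * A self-contained sketch of this mapping (a hypothetical helper; the library's actual index math differs in the details):
 *
 * <pre>{@code
 * static long upscaledIndex(long index, int scale, int shift) {
 *     double base = Math.pow(2.0, Math.scalb(1.0, -scale)); // base = 2^(2^-scale)
 *     double lower = Math.pow(base, index);
 *     double upper = Math.pow(base, index + 1);
 *     double center = 2 * lower * upper / (lower + upper);  // point of least relative error
 *     // index of the bucket containing 'center' at the increased scale
 *     double log2 = Math.log(center) / Math.log(2.0);
 *     return (long) Math.ceil(Math.scalb(log2, scale + shift)) - 1;
 * }
 * }</pre>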
 *
 * <h2>Distributions with Few Distinct Values</h2>
 *
 * The sparse storage model only requires memory linear in the number of populated buckets, while dense storage needs to cover
 * the entire index range between the smallest and largest populated bucket.
 * <p>
 * This offers significant benefits for distributions with few distinct values:
 * If we have at least as many buckets as we have distinct values to store in the histogram, we can represent the distribution
 * with a much smaller error than the dense representation.
 * This is achieved by maintaining the scale at the maximum supported value (so the buckets are as small as possible).
 * At the time of writing, the maximum scale is 38, so the ratio between the lower and upper boundary of a bucket is
 * {@code 2^(2^-38)}.
 * <p>
 * The impact of the error is best shown with a concrete example:
 * If we store, for example, a duration value of {@code 10^15} nanoseconds (roughly 11.5 days), this value will be stored in a
 * bucket that guarantees a relative error of at most {@code 2^(2^-38) - 1}, which is roughly 2.5 microseconds in this case.
 * As long as the number of distinct inserted values is lower than the bucket count, we are guaranteed that no downscaling happens:
 * in contrast to dense storage, the scale does not depend on the spread between the smallest and largest bucket index.
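 * <p>
 * A quick back-of-the-envelope check of these numbers:
 *
 * <pre>{@code
 * int maxScale = 38;
 * double base = Math.pow(2.0, Math.scalb(1.0, -maxScale)); // 2^(2^-38)
 * double relativeError = base - 1;                         // ~2.5e-12
 * double errorNanos = 1e15 * relativeError;                // ~2520 ns, i.e. roughly 2.5 microseconds
 * }</pre>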
 * <p>
 * To clarify the difference between dense and sparse storage, let's assume that we have an empty histogram, the maximum scale is
 * zero, and the maximum bucket count is four.
 * The same logic applies to higher scales and bucket counts, but these values give easier numbers for this example.
 * A scale of zero means that our bucket boundaries are {@code 1, 2, 4, 8, 16, 32, 64, 128, 256, ...}.
 * We now want to insert the value {@code 6} into the histogram. The dense storage works by storing an array of bucket counts
 * plus an initial offset.
 * This means that the first slot in the bucket counts array corresponds to the bucket with index {@code offset} and the last one
 * to {@code offset + bucketCounts.length - 1}.
 * So if we add the value {@code 6} to the histogram, it falls into the {@code (4,8]} bucket, which has the index {@code 2}.
 * <p>
 * So our dense histogram looks like this:
 *
 * <pre>
 * offset = 2
 * bucketCounts = [1, 0, 0, 0] // represents bucket counts for bucket indices 2 to 5
 * </pre>
 *
 * If we now insert the value {@code 20} ({@code (16,32]}, bucket index 4), everything is still fine:
 *
 * <pre>
 * offset = 2
 * bucketCounts = [1, 0, 1, 0] // represents bucket counts for bucket indices 2 to 5
 * </pre>
 *
 * However, we run into trouble if we insert the value {@code 100}, which corresponds to index 6: that index is outside the bounds
 * of our array.
 * We can't just increase the {@code offset}, because the first bucket in our array is populated too.
 * We have no option other than decreasing the scale of the histogram, so that the values {@code 6} and {@code 100} fall into the
 * range of four <strong>consecutive</strong> buckets, as required by the bucket count limit of the dense storage.
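 * <p>
 * Perfect subsetting makes this scale reduction cheap to express: decreasing the scale by one halves the bucket index, as each
 * pair of adjacent buckets collapses into a single bucket. A one-line sketch:
 *
 * <pre>{@code
 * long downscaledIndex = index >> 1; // e.g. index 6 -> 3 and index 2 -> 1 when the scale decreases by one
 * }</pre>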
 * <p>
 * In contrast, a sparse histogram has no trouble storing this data while keeping the scale of zero:
 *
 * <pre>
 * bucketIndicesToCounts: {
 *     "2" : 1,
 *     "4" : 1,
 *     "6" : 1
 * }
 * </pre>
 *
 * Downscaling of the sparse representation only happens if either:
 * <ul>
 * <li>The number of populated buckets would become bigger than our maximum bucket count. In that case, we downscale to combine
 * neighboring populated buckets into a single bucket until we are below the limit again (see the sketch below).</li>
 * <li>The largest or smallest indices require more bits to store than we allow. This does not happen in our implementation for
 * normal inputs, because we allow up to 62 bits for index storage, which fits the entire numeric range of IEEE 754 double
 * precision floats at our maximum scale.</li>
 * </ul>
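 *
 * A minimal sketch of such a downscale step on a sparse representation (again using a sorted map as a stand-in for the
 * array-backed storage):
 *
 * <pre>{@code
 * static java.util.TreeMap<Long, Long> downscale(java.util.SortedMap<Long, Long> buckets, int shift) {
 *     java.util.TreeMap<Long, Long> result = new java.util.TreeMap<>();
 *     // reducing the scale by 'shift' merges each run of 2^shift adjacent buckets into one
 *     buckets.forEach((index, count) -> result.merge(index >> shift, count, Long::sum));
 *     return result;
 * }
 * }</pre>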
 *
 * <h3>Handling Explicit Bucket Histograms</h3>
 *
 * We can make use of this property to convert explicit bucket histograms
 * (<a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/#histogram">OpenTelemetry Histogram</a>) to exponential
 * ones by again assuming that all values in a bucket lie on a single point:
 * <ul>
 * <li>For each explicit bucket, we take its point of least relative error and add it to the corresponding exponential histogram
 * bucket with the corresponding count (see the sketch below).</li>
 * <li>The open lower and upper buckets, whose boundaries extend to infinity, need special treatment, but these are not useful for
 * percentile estimates anyway.</li>
 * </ul>
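 *
 * A sketch of this conversion under the stated assumption. The method and its input format are hypothetical: {@code boundaries}
 * holds the finite bucket boundaries, {@code counts[i]} is the count of the bucket {@code (boundaries[i], boundaries[i + 1]]},
 * and the open-ended buckets are omitted.
 *
 * <pre>{@code
 * static java.util.TreeMap<Long, Long> fromExplicitBuckets(double[] boundaries, long[] counts, int scale) {
 *     java.util.TreeMap<Long, Long> result = new java.util.TreeMap<>();
 *     for (int i = 0; i + 1 < boundaries.length; i++) {
 *         double lower = boundaries[i];
 *         double upper = boundaries[i + 1];
 *         double center = 2 * lower * upper / (lower + upper);        // point of least relative error
 *         double log2 = Math.log(center) / Math.log(2.0);
 *         long index = (long) Math.ceil(Math.scalb(log2, scale)) - 1; // containing exponential bucket
 *         result.merge(index, counts[i], Long::sum);
 *     }
 *     return result;
 * }
 * }</pre>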
 *
 * This gives us a great solution for universally dealing with histograms:
 * When merging exponential histograms generated from explicit ones, the scale is not decreased (and therefore the error not
 * increased) as long as the number of distinct buckets from the original explicit bucket histograms does not exceed the
 * exponential histogram bucket count. As a result, the computed percentiles will be precise, with only the
 * <a href="#distributions-with-few-distinct-values">relative error of the initial conversion</a>.
 * In addition, this allows us to compute percentiles over mixed explicit bucket histograms, or even mix them with exponential
 * ones, by just using the exponential histogram algorithms.
 */
package org.elasticsearch.exponentialhistogram;