@dricross dricross commented Oct 27, 2025

Description

New implementation for converting OTel histograms to CloudWatch Values/Counts for emission to CloudWatch by the CloudWatch Agent. The OTel histogram format is incompatible with the CloudWatch APIs. A mapping algorithm is needed to transform OTel histograms to Values/Counts.

OTel histograms are in the format:

  • A series of buckets with:
    • Explicit boundary values. These values denote the lower and upper bounds of each bucket and determine whether or not a given observation is recorded in it.
    • A count of the number of observations that fell within this bucket.
  • Min (optional)
  • Max (optional)
  • Sum
  • Count
  • Attributes (key/value pairs)

See the following for more details on OTel histogram format: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#histogram
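
For reference, the input shape described above can be sketched roughly as follows. This is an illustration only, not the agent's code: `HistogramPoint` and `buckets()` are hypothetical names, while `explicit_bounds` and `bucket_counts` follow the field names in the OTel data model, where N boundaries define N+1 buckets and the two outer buckets are open-ended.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HistogramPoint:
    """Hypothetical container mirroring one OTel explicit-bucket histogram point."""
    explicit_bounds: List[float]       # N boundary values
    bucket_counts: List[int]           # N + 1 counts, one per bucket
    sum: float
    count: int
    min: Optional[float] = None        # optional in OTel
    max: Optional[float] = None        # optional in OTel
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.bucket_counts) != len(self.explicit_bounds) + 1:
            raise ValueError("expected len(bucket_counts) == len(explicit_bounds) + 1")

    def buckets(self) -> List[Tuple[float, float, int]]:
        """Return (lower, upper, count) per bucket; the outer buckets are unbounded."""
        edges = [float("-inf"), *self.explicit_bounds, float("inf")]
        return list(zip(edges[:-1], edges[1:], self.bucket_counts))
```

The open-ended first and last buckets are why Min and Max matter to the conversion: without them, the outermost buckets have no finite span to spread values across.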

For the purposes of this algorithm, the input histograms are assumed to always use Delta temporality, as the CloudWatch Agent uses the cumulativetodelta processor to convert them before emission.
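
As a simplified illustration of what Delta temporality means for bucket counts (this is not the cumulativetodelta processor's actual logic, which also has to deal with resets and missing points), a delta point is the per-bucket difference between two consecutive cumulative snapshots. `cumulative_to_delta` is a hypothetical helper name.

```python
from typing import List


def cumulative_to_delta(previous: List[int], current: List[int]) -> List[int]:
    """Per-bucket difference between two cumulative snapshots with the same layout."""
    if len(previous) != len(current):
        raise ValueError("bucket layouts differ")
    return [c - p for p, c in zip(previous, current)]
```

For example, cumulative counts `[1, 2, 3]` followed by `[4, 2, 10]` correspond to a delta point of `[3, 0, 7]`.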

CloudWatch accepts histograms using the Values/Counts model in the PutMetricData API. This API accepts:

  • Values: Array of numbers representing the values for the metric during the period. Each unique value is listed just once in this array, and the corresponding number in the Counts array specifies the number of times that value occurred during the period. You can include up to 150 unique values in each PutMetricData action that specifies a Values array.
  • Counts: Array of numbers that is used along with the Values array. Each number in the Counts array is the number of times the corresponding value in the Values array occurred during the period.
  • StatisticValues, which contains statistic values for the input data set:
    • Min (not optional)
    • Max (not optional)
    • Sum
    • SampleCount
  • Dimensions (key/value pairs)

See the following for more details: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html.
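
To make the 150-value limit concrete, here is a minimal sketch of grouping value/count pairs into chunks that each fit within it. `chunk_pairs` and `MAX_VALUES_PER_DATUM` are hypothetical names, and how the agent actually batches data and fills in StatisticValues for each chunk is not shown here; only the `Values` and `Counts` field names come from the API.

```python
from typing import Dict, List, Sequence, Tuple

MAX_VALUES_PER_DATUM = 150  # limit on unique values quoted above


def chunk_pairs(
    pairs: Sequence[Tuple[float, float]],
    limit: int = MAX_VALUES_PER_DATUM,
) -> List[Dict[str, List[float]]]:
    """Group (value, count) pairs into parallel Values/Counts arrays of at most `limit` entries."""
    chunks = []
    for start in range(0, len(pairs), limit):
        part = pairs[start:start + limit]
        chunks.append({
            "Values": [value for value, _ in part],
            "Counts": [count for _, count in part],
        })
    return chunks
```

With 320 pairs this yields three chunks holding 150, 150, and 20 entries.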

This algorithm converts each bucket of the input histogram into at most 10 value/count pairs, referred to as "inner buckets". The values of the inner buckets are spread evenly across the bucket span. The counts of the inner buckets are determined using an exponential mapping algorithm: counts are weighted more heavily toward one side of the bucket according to an exponential function that depends on how the density of the neighboring buckets is changing.
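
A rough sketch of the idea for a single bucket is shown below. It is not the implementation in this PR: the exact exponential function is not given in this description, so the skew rule here (log of the ratio of the neighboring bucket densities) is an assumption made for illustration, as are the names `split_bucket`, `prev_density`, and `next_density`. Counts are kept as floats for simplicity.

```python
import math
from typing import List, Optional, Tuple

MAX_INNER_BUCKETS = 10  # "at most 10" value/count pairs per input bucket


def split_bucket(
    lower: float,
    upper: float,
    count: int,
    prev_density: Optional[float] = None,
    next_density: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Split one finite bucket into evenly spaced values with skewed counts."""
    if count <= 0:
        return []
    n = max(1, min(MAX_INNER_BUCKETS, int(count)))
    width = (upper - lower) / n
    # Values: midpoints of n equal-width slices, spread evenly across the span.
    values = [lower + (i + 0.5) * width for i in range(n)]

    # Assumed skew rule (not from the PR): lean toward the denser neighbor.
    skew = 0.0
    if prev_density and next_density:
        skew = math.log(next_density / prev_density)

    # Exponential weights; skew == 0 degenerates to a uniform split.
    weights = [math.exp(skew * i / n) for i in range(n)]
    total = sum(weights)
    counts = [count * w / total for w in weights]
    return list(zip(values, counts))
```

In this sketch the inner counts always sum to the original bucket count, so SampleCount is preserved. When the density rises toward the next bucket, more of the count lands near the upper bound, and the reverse holds when it falls.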

The following image demonstrates how an example input histogram is converted to the values/count model. The red dots indicate the values/counts that are pushed to CloudWatch.
[Image: example input histogram converted to the values/counts model]

Testing

Unit testing

Used the new tools introduced previously to send histogram test cases to CloudWatch and then retrieve the percentile metrics.

TestCase                                                   P10         P25         P50         P75         P90         P99       P99.9         Min         Max         Sum       Count
126 Buckets                                             125.95      314.76      629.19      944.75      1132.9      1258.2      1271.5           5        1300  5.2233e+06        8316
176 Buckets                                             175.88      440.04      880.01      1318.6      1583.1      1771.3      1797.1           5        1800  1.0182e+07       11616
225 Buckets                                              226.5      564.39      1128.8      1695.1      2033.5      2239.9      2289.3           5        2300  1.6822e+07       14916
325 Buckets                                             325.96      814.79      1628.7      2443.9      2931.6      3260.9      3296.1           5        3300  3.4983e+07       21516
Basic Histogram                                         17.913      28.327      50.986      73.413      86.886       194.4      199.43          10         200       36000         606
Cumulative bucket starts at 0                         0.010662    0.049403     0.10823     0.23481     0.40067      2.7043      11.867           0          45        6600       19086
Large Numbers                                       3.5613e+05  1.8884e+06  9.4334e+06  4.9984e+07   9.722e+07  7.2107e+08  8.7259e+08       1e+05       1e+09       6e+11        6006
Many Buckets                                            6.0464      35.102      89.752      558.59      889.85      1043.9      1090.7         0.5        1100     2.1e+06        6744
Negative and Positive Boundaries                           N/A         N/A         N/A         N/A         N/A         N/A         N/A         -50          50           0         636
No Min or Max                                           2.1182      18.084      55.369      71.599      180.26       242.8      250.74           0         300       21000         450
No Min/Max with Single Value                            142.82      143.99      145.97      147.97      149.18      149.92      149.99          50         150         600           6
Only Max Defined                                        52.465      118.33      203.07      303.27      367.55      733.64      748.35           0         750    1.05e+05         606
Only Min Defined                                        37.583      56.621      86.121       110.6      128.82      170.21       171.7          25         200       24000         306
Only Negative Boundaries                                   N/A         N/A         N/A         N/A         N/A         N/A         N/A        -200         -10      -60000         606
Positive boundaries but implied Negative Values            N/A         N/A         N/A         N/A         N/A         N/A         N/A        -100          60        1200         606
Single Bucket                                           37.763      38.306       39.23      40.176      40.754      41.106      41.141           5          75        6000         306
Tail Heavy Histogram                                    128.84      139.48       144.7      147.85      149.77      150.93         151          10         151     8.7e+05        6060
Two Buckets                                               1.78      2.6881       4.278       5.429      6.3839      9.9766      9.9977           1          10         900         186
Unbounded Histogram                                          0           0           0           0           0           0           0           0           0       21000         450
Very Small Numbers                                  5.2363e-08  7.2171e-07   1.629e-06  2.7846e-06  3.3259e-06  4.5734e-06  5.9513e-06       1e-08       6e-06      0.0009         606
Zero Counts and Sparse Data                             1.0712      2.8607      7.7614      221.86      983.31      1271.3      1489.5           0        1500     1.5e+05         606

Most percentiles fall within the expected range. A few are off by a percent or two. I believe this is because the back end applies a further SEH1 mapping for efficient storage, which slightly modifies the values that the agent sends to CloudWatch.

For our accuracy tests, we see several improvements:

  • Maximum error reduced from 99% to 9%
  • Average error reduced from 30% to 3%
  • Throughput for histogram conversions improved by 60%
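
For readers who want to sanity-check error figures like these locally, one simple way to estimate a percentile from value/count pairs and compare it with an expected value is sketched below. This is an assumption-laden approximation: it is neither how CloudWatch computes percentiles nor how the accuracy tests in this PR measure error, and `weighted_percentile` and `relative_error_pct` are hypothetical helper names.

```python
from typing import Iterable, Tuple


def weighted_percentile(pairs: Iterable[Tuple[float, float]], p: float) -> float:
    """Smallest value whose cumulative count reaches p percent of the total count."""
    ordered = sorted(pairs)
    total = sum(count for _, count in ordered)
    if total <= 0:
        raise ValueError("no observations")
    threshold = total * p / 100.0
    running = 0.0
    for value, count in ordered:
        running += count
        if running >= threshold:
            return value
    return ordered[-1][0]


def relative_error_pct(estimate: float, expected: float) -> float:
    """Absolute error as a percentage of the expected value."""
    return abs(estimate - expected) / abs(expected) * 100.0
```

For example, with pairs `[(1.0, 5), (2.0, 3), (3.0, 2)]`, `weighted_percentile(pairs, 50)` returns `1.0` and `weighted_percentile(pairs, 90)` returns `3.0`.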
