-
Couldn't load subscription status.
- Fork 34
[awsemfexporter] Support gauge to cloudwatch histogram convertion in EMF exporter #371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aws-cwa-dev
Are you sure you want to change the base?
Conversation
1a663eb to
db2a1ad
Compare
exporter/awsemfexporter/config.go
Outdated
| // MetricDeclarations is the list of rules to be used to set dimensions for exported metrics. | ||
| MetricDeclarations []*MetricDeclaration `mapstructure:"metric_declarations"` | ||
|
|
||
| // List of string denoting gaugage metric names required for aggregation into cw histogram |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Type gaugage -> Gauge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
3e5c2d0 to
6ad6507
Compare
|
What is the configuration you used for the test? Which of the metrics are you aggregating? |
| for val, cnt := range countMap { | ||
| hist.Values = append(hist.Values, val) | ||
| hist.Counts = append(hist.Counts, cnt) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd recommend pre-allocating hist.Values and hist.Counts to reduce the number of allocations. It can help quite a lot with performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! will revise it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will pre-allocate hist.Values and Counts in below format, correct me if i was wrong:
// Pre-allocate slices to avoid multiple allocations during append
hist.Values = make([]float64, 0, len(countMap))
hist.Counts = make([]float64, 0, len(countMap))
| if v > hist.Max { | ||
| hist.Max = v | ||
| } | ||
| countMap[v]++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but if this is the core of the aggregation logic, then this isn't really creating a histogram. It's aggregating to a series of value/count pairs for each unique floating point value. Considering the precision of float64, I dont think we'll get much aggregation out of it.
If we wanted to create an actual histogram that aggregates a range of datapoints, we'd need define bucket where each bucket represents a range of values, store all of the incoming datapoints into those buckets, and then convert the buckets to values/counts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, in fact, we are aggregating to cloudwatch histogram instead of opentelemetry histogram. For cw histogram, it's exactly in the formact of values and counts, and cloudwatch backend would do the calcluation for percentile values e.g. P90.
9406579 to
a58c06e
Compare
|
|
||
| updatedMetadata = replacePatternsIfNeeded(metadata, labels, config, patternReplaceSucceeded) | ||
| // For compacted gauge metrics, use ExponentialHistogram type to convert it to values and counts in PutLogEvent | ||
| updatedMetadata.metricDataType = pmetric.MetricTypeExponentialHistogram |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this setting needed to force some other logic to run? The data isn't actually a histogram. It's in values/counts but its still a gauge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought there was some special logic for converting values and counts in PutLogEvent, but it is actually not. Make it back to Gauge.
a58c06e to
95b4340
Compare
95b4340 to
6b90657
Compare
| } | ||
|
|
||
| // compactGaugeMetrics converts a collection of gauge data points into a compact representation, e.g. values and counts. | ||
| func compactGaugeMetrics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc it's not really compacting gauges but converting to CWHistogram. The name could be more aligned with what it actually does like convertGaugesToCWHistogram or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously I named it as buildHistogram, but from Rick's perspective, it should be a kind of compaction or deduplication, so renamed all occurrence of Histogram to Compact. Pasted the comment from Rick for reference:
Correct me if I'm wrong, but if this is the core of the aggregation logic, then this isn't really creating a histogram. It's aggregating to a series of value/count pairs for each unique floating point value. Considering the precision of float64, I dont think we'll get much aggregation out of it.
If we wanted to create an actual histogram that aggregates a range of datapoints, we'd need define bucket where each bucket represents a range of values, store all of the incoming datapoints into those buckets, and then convert the buckets to values/counts.
Note
This PR is part of enabling high frequency gpu metrics, other related PRs are 370 and 1893.
Description
This PR is exposing a string list -
GaugeMetricsToCompactin config ofawsemfexporter, all matching metrics will be converted to cloudwatch histogram, e.g. values and counts format. This functionality is to support high frequency metrics. For ex, if eks gpu metrics are collected every second, there will be a couple metrics with the same name in current batch. Aggregate them into one cloudwatch histogram would reduce the metrics quantity to Cloudwatch service and customer cost on high frequency metrics.Refactor the code in the meantime to make it more clean.
Testing