-
Notifications
You must be signed in to change notification settings - Fork 56
The aggregation params of metrics can be defined #844 #845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
The aggregation params of metrics can be defined #844 #845
Conversation
|
Had a quick look. This appears to be a freeform yaml block. Could |
|
I did consider using annotations, but would rather be more explicit in this case and opted against a strongly typed option as custom aggregation types could be implemented. We should discuss if there is a valid use case for allowing semconv to specify a non default aggregation type. If yes that would become a seperate string property alongside the hashmap. At the same time, should the output schema have the default parameters populated if not provided in input schema. Also aggregation if buckets are different sizes will impact your metrics etc hence not just annotation. I did this small to make it easier/quicker to review and enable more of semconv to be auto generated. |
|
I don't think we make any requirement on delta vs. cumulative, but we should have a way to specify histogram boundaries. |
| pub struct AggregationSpec { | ||
| /// The parameters used in the aggregation | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub parameters: Option<HashMap<String, YamlValue>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in the tooling call:
bucket boundaries are advisory params, changing them would not be breaking, so we should put them into annotations.
It'd still be useful to define the specific format, but not in the rust code. We can define specific format in the JSON schema, but we can also do it later and start by suggesting the format inside semconv repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with having it as advisory won't you run into issues/inaccurate data if you have multiple instruments generating measurements with different boundaries but using the same metric. Hence hard to see it as advisory but more a requirement. Also what about the aggregation type, that is also a requirment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"won't you run into issues/inaccurate data if you have multiple instruments generating measurements with different boundaries but using the same metric"
Not when following best practices for histograms/distributions. in fact, our Exponential histograms are designed around bucket boundaries changing in the lifespan of the same time series.
The OpenTelemetry specification has explicitly allowed bucket boundaries to change during a timeseries lifespan. So "default advise" would not have this restriction, and many metric systems will handle this effectively (even prometheus if using functions like histogram_quantile()).
So, by default, weaver will not enforce this. If someone wanted to enforce this, weaver would ALLOW that via annotations and custom rego policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This aggregation object is for defining the properties based on the aggregation type being used. The supported settings are described in the spec at https://opentelemetry.io/docs/specs/otel/metrics/sdk/#aggregation
That document gives the impression that if you define explicit boundaries the sdk is instructed to use them.
I will make it more explicit by adding the type in there, that way it is possible to fall back to the defaults. That also provides a way to indicate it is exponential etc.
Note annotations are not emitted by weaver for group members/signal in resolved form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay in responding here. A few important points:
Note annotations are not emitted by weaver for group members/signal in resolved form.
This is a bug that should be fixed.
That document gives the impression that if you define explicit boundaries the sdk is instructed to use them.
Yes but there are two users of semconv here: The storage and visualization that uses the histogram and the instrumentation that produces it.
What we do NOT want is to pretend like there is a perfect set of boundaries in semconv for which all histograms should abide. We can provide a default to codegen. In reality, getting histogram boundaries right often needs specific knoweldge of the system being developed. Poor boundaries can lead to inefficient and inaccurate histograms for your services. This is why we have an "advice" API and want histogram boundaries as a "hint" to codegen vs. a first class thing. Additionally, this is why we're moving to exponential histograms, where you can more easily say "here's the resolution I want" and the histogram expands boundaries to fit the appropriate distribution.
Today - we took the approach the code generation + "advice" in Metrics should be a hint in weaver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we do NOT want is to pretend like there is a perfect set of boundaries in semconv for which all histograms should abide.
Agree it is bespoke to the metric and if not explicitly configured then the default is used which could be the metric default or the global default.
Additionally, this is why we're moving to exponential histograms, where you can more easily say "here's the resolution I want" and the histogram expands boundaries to fit the appropriate distribution.
For the exponential histogram do you see the scale and/or size properties as being general advice or requirements which should be followed?
Also what about aggregation method? Is this a requirement or advice?
Closes: #844
This is an attempt to help enable semconv to move towards an automated metric registry by closing the gap between current documented information and what is supported on the models.
This work has been completed by following: https://github.com/open-telemetry/weaver/blob/main/docs/developer-guide.md, if anything is missing lets also add it to the document.