@Blinkuu Blinkuu commented Dec 3, 2024

Fixes #4298

This PR adds the MeasurementProcessor concept to the Metrics SDK specification.

The goal is to allow functionality such as:

  • Adding attributes, e.g., injection of additional attributes to measurements based on Context
  • Modifying attributes
  • Dropping attributes
  • Dropping individual measurements
  • Modifying measurements
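
As a rough illustration of the listed capabilities, a processor could be modelled as an object that receives a measurement and a callback to the rest of the pipeline. This is a minimal sketch only: the names `Measurement`, `MeasurementProcessor`, `process`, and `next_` are hypothetical and are not taken from the specification text or from either prototype.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Protocol


@dataclass(frozen=True)
class Measurement:
    """Hypothetical stand-in for one recorded value plus its attributes."""
    value: float
    attributes: Mapping[str, str] = field(default_factory=dict)


# Callback that hands a measurement to the rest of the pipeline.
Next = Callable[[Measurement], None]


class MeasurementProcessor(Protocol):
    """Assumed shape of a processor; not the spec's actual signature."""

    def process(self, measurement: Measurement, next_: Next) -> None:
        """Skip next_ to drop, or call it with a (possibly changed) measurement."""
        ...


class PassThrough:
    """Simplest possible processor: forwards the measurement unchanged."""

    def process(self, measurement: Measurement, next_: Next) -> None:
        next_(measurement)


seen = []
PassThrough().process(Measurement(1.0, {"route": "/home"}), seen.append)
```

Under this assumed shape, adding, modifying, or dropping attributes means forwarding a changed copy, and dropping a measurement means not calling `next_` at all.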

Prototypes:
Python: open-telemetry/opentelemetry-python#4642
Rust: open-telemetry/opentelemetry-rust#2797

linux-foundation-easycla bot commented Dec 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pellared

This comment was marked as resolved.

Add status field

Co-authored-by: Robert Pająk <[email protected]>

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 14, 2024
@Blinkuu
Author

Blinkuu commented Dec 19, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Still working on this; will try to provide another iteration early next year.

@pellared pellared removed the Stale label Dec 19, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 27, 2024
@Blinkuu Blinkuu changed the title from "[WIP] Add MeasurementProcessor specification to Metrics SDK" to "Add MeasurementProcessor specification to Metrics SDK" on Jan 2, 2025
@Blinkuu Blinkuu marked this pull request as ready for review January 2, 2025 12:07
@Blinkuu Blinkuu requested review from a team and jmacd January 2, 2025 12:07
Contributor

@tedsuo tedsuo left a comment

In the spec meeting today, there was a request to clarify the motivations for and against this design:

  • Motivating use cases that we want to solve with this design.
  • Whether these use cases are rare enough that they could be solved once in the Collector, vs many times in the language SDKs.
  • Examples showing how the cost of SDK implementation for this design could be high in some languages, compared to alternatives.

@tylerbenson
Member

@pellared While I agree with the general reasoning about not allowing the value to be changed without valid use cases, I think the proposed API design is fairly clean, and restricting it to just modifying attributes would complicate things quite a bit.

@reyang
Member

reyang commented Jul 15, 2025

Therefore, I suggest to remove the part related to modifying/dropping measurements from the scope of this PR. I think it is worth mentioning that the Rust prototype is following our proposal and it does NOT allow modifying nor dropping measurements:

I disagree.

I have a couple of scenarios where I need to drop/modify measurements.
One example: we have critical metrics (e.g. security, SLA) for which we have carefully designed the attributes and values to control the cardinality centrally. There are cases where people made mistakes, so we got measurements that did not follow the rules and ended up with cardinality overflow. One approach is to wrap the API to check the value, but this only works if all the instrumentation code is controlled by the same team. MeasurementProcessor would allow such outliers to be captured and reported (e.g. via audit logs + exception callback), and the policy owner can decide what to do (e.g. drop the measurement, aggregate the value using a special dimension, or something else).
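
To make the scenario above concrete, here is a small sketch of a policy check that reports a violation and then either drops the measurement or redirects it to a special dimension. Everything in it is assumed for illustration: the `region` attribute, the allowed set, the `_other` dimension value, and the in-memory `audit_log` are hypothetical and do not come from the PR.

```python
ALLOWED_REGIONS = frozenset({"eu", "us"})  # hypothetical centrally owned policy
audit_log = []  # stand-in for a real audit sink


def enforce_policy(attributes, value, on_violation="drop"):
    """Return (attributes, value) to record, or None to drop the measurement."""
    if attributes.get("region") in ALLOWED_REGIONS:
        return attributes, value
    # Report the outlier before deciding what to do with it.
    audit_log.append({"attributes": dict(attributes), "value": value})
    if on_violation == "drop":
        return None
    # Otherwise aggregate the value under a special dimension.
    return {**attributes, "region": "_other"}, value


kept = enforce_policy({"region": "eu"}, 1)
dropped = enforce_policy({"region": "mars"}, 1)
redirected = enforce_policy({"region": "mars"}, 2, on_violation="special_dimension")
```

The policy owner chooses the outcome through `on_violation`; the instrumentation code that produced the measurement does not need to change.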

@pellared
Member

just modifying attributes would complicate things quite a bit.

@tylerbenson, how would it complicate things?

@pellared
Member

pellared commented Jul 15, 2025

There are cases where people made mistakes, so we got measurements that did not follow the rules and ended up with cardinality overflow.

Wouldn't modifying attributes for these measurements be also a viable solution?

One approach is to wrap the API to check the value,

Wouldn't the ability to only read the measurement value be enough (without the ability to modify it)?

@reyang
Member

reyang commented Jul 15, 2025

One approach is to wrap the API to check the value,

Wouldn't the ability to only read the measurement value be enough (without the ability to modify it)?

No, as the policy owner, you want to have the ability to drop illegal values.

@pellared
Member

pellared commented Jul 15, 2025

No, as the policy owner, you want to have the ability to drop illegal values.

What about the alternative approach of adding an attribute instead e.g. illegal=true (from #4318 (comment))?
Why would it not be acceptable?

@reyang
Member

reyang commented Jul 15, 2025

No, as the policy owner, you want to have the ability to drop illegal values.

What about the alternative approach of adding an attribute instead e.g. illegal=true (from #4318 (comment))? Why would it not be acceptable?

Compliance/contractual requirement.

@MrAlias
Contributor

MrAlias commented Jul 15, 2025

Therefore, I suggest to remove the part related to modifying/dropping measurements from the scope of this PR. I think it is worth mentioning that the Rust prototype is following our proposal and it does NOT allow modifying nor dropping measurements:

I disagree.

I have a couple of scenarios where I need to drop/modify measurements. One example: we have critical metrics (e.g. security, SLA) for which we have carefully designed the attributes and values to control the cardinality centrally. There are cases where people made mistakes, so we got measurements that did not follow the rules and ended up with cardinality overflow. One approach is to wrap the API to check the value, but this only works if all the instrumentation code is controlled by the same team. MeasurementProcessor would allow such outliers to be captured and reported (e.g. via audit logs + exception callback), and the policy owner can decide what to do (e.g. drop the measurement, aggregate the value using a special dimension, or something else).

Can you please clarify, how is this a measurement filter problem, not an attribute filter problem?

If you measure 0.826, does this get dropped?

@reyang
Member

reyang commented Jul 15, 2025

Please clarify how this is a measurement filter problem, not an attribute filter problem.

If you measure 0.826, does this get dropped?

Yes, it gets dropped. The value is considered illegal due to a compliance/contractual requirement and will be independently escalated via an auditing system.

@MrAlias
Contributor

MrAlias commented Jul 16, 2025

Please clarify how this is a measurement filter problem, not an attribute filter problem.
If you measure 0.826, does this get dropped?

Yes, it gets dropped. The value is considered illegal due to a compliance/contractual requirement and will be independently escalated via an auditing system.

What criteria are used to evaluate this choice? Just the number?

How does this centralized cardinality control factor in?

@MrAlias
Contributor

MrAlias commented Jul 16, 2025

Why are we not trying to solve the use-cases with views?

@Blinkuu
Author

Blinkuu commented Jul 18, 2025

Thanks, everyone, for chiming in and for the healthy debate. I will add my two cents here for why the MeasurementProcessor should be as flexible as we currently define it.

Let me capture a few real-world use cases that fit the functionality I described in the original PR comment.

Adding attributes, e.g., injection of additional attributes to measurements based on Context

This is probably the easiest to justify. From a high level, this is the last missing piece to allow great flexibility in correlation between all three telemetry signals: Metrics, Logs, and Traces.

Use cases here are infinite. In the past, @jsuereth shared a critical user interaction use case with me, which is very appealing.

I am personally excited about being able to isolate data from tests, such as k6, where a load generator would embed a "test id" inside the baggage. If the entire system under test is instrumented with OpenTelemetry, this baggage would propagate across all the services that are part of the tested path, and we can dynamically inject it as an attribute. Today, we can only easily isolate Traces and Logs, which both support the processor pattern. Metrics are opaque, and without MeasurementProcessor, this tight correlation will not be possible.
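
The "test id" idea above could look roughly like the following. This is a sketch under stated assumptions: the baggage is modelled as a plain dictionary rather than the real OpenTelemetry Baggage API, and the `Measurement` type, the `test.id` key, and the `inject_test_id` function are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Mapping


@dataclass(frozen=True)
class Measurement:
    """Hypothetical stand-in for a measurement passing through the SDK."""
    value: float
    attributes: Mapping[str, str]


def inject_test_id(baggage, measurement, next_):
    """Copy a propagated test id from baggage onto the measurement, if present."""
    test_id = baggage.get("test.id")  # assumed baggage key
    if test_id is not None:
        merged = {**measurement.attributes, "test.id": test_id}
        measurement = replace(measurement, attributes=merged)
    next_(measurement)


tagged, untagged = [], []
inject_test_id({"test.id": "run-42"}, Measurement(1.0, {"route": "/cart"}), tagged.append)
inject_test_id({}, Measurement(1.0, {"route": "/cart"}), untagged.append)
```

Measurements recorded outside a load test carry no baggage entry and pass through unchanged, so production streams keep their existing attribute sets.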

I also know of people who would like to dynamically inject, e.g., a tenant_id -- for obvious reasons.

I also want to stress that none of these would be possible if the processor concept were only implemented in the collector. It must be available via the SDK.

Modifying attributes and Dropping attributes

I grouped these two. The use cases here are also easy to spot. I'll just share a few off the top of my head that I had to deal with in the past.

First, modify attributes. Imagine you already have a tenant_id dimension on your metric, but you may have a contractual argument to anonymise it. The Measurement Processor makes it very easy to do that. Another use case -- you may want to collapse attributes into just a tiny set, controlling your cardinality this way. I'll skip listing use cases for dropping attributes, as they are similar and pretty obvious.
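
A sketch of the anonymisation case, assuming attributes are a plain dictionary and that replacing `tenant_id` with a truncated SHA-256 digest is acceptable. The function name and the 12-character truncation are illustrative choices, not part of the proposal.

```python
import hashlib


def anonymise_tenant(attributes):
    """Return a copy of the attributes with tenant_id replaced by a short digest."""
    out = dict(attributes)
    if "tenant_id" in out:
        digest = hashlib.sha256(out["tenant_id"].encode("utf-8")).hexdigest()
        out["tenant_id"] = digest[:12]  # same tenant always maps to the same value
    return out


result = anonymise_tenant({"tenant_id": "acme", "route": "/cart"})
```

Because the mapping is deterministic, the cardinality of the stream is unchanged. An unsalted hash is only a weak form of anonymisation, so a real deployment would likely need a keyed or salted scheme.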

Dropping individual measurements

I think @reyang nicely captures the need for this above. Let's say we have an instrument that reports several measurements for different tenants (tenant_id is a dimension). Now, a customer asks you not to track any metadata associated with their tenant. What do you do then? You want to drop measurements for that particular tenant, as dropping the metric after it's been sent over the wire violates a contract. Again, this cannot be facilitated with Views today, as they can only drop attributes.
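
The tenant opt-out case could be sketched as a processor that simply never forwards matching measurements. The `OPTED_OUT` set, the `Measurement` type, and the `drop_opted_out` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Mapping

OPTED_OUT = frozenset({"tenant-b"})  # hypothetical list of tenants to exclude


@dataclass(frozen=True)
class Measurement:
    """Hypothetical stand-in for a measurement passing through the SDK."""
    value: float
    attributes: Mapping[str, str]


def drop_opted_out(measurement, next_):
    """Forward the measurement unless it belongs to an opted-out tenant."""
    if measurement.attributes.get("tenant_id") in OPTED_OUT:
        return  # dropped inside the process, before aggregation or export
    next_(measurement)


recorded = []
drop_opted_out(Measurement(1.0, {"tenant_id": "tenant-a"}), recorded.append)
drop_opted_out(Measurement(1.0, {"tenant_id": "tenant-b"}), recorded.append)
```

In this sketch the opted-out tenant's data never reaches an aggregator, which is the property the contract in the example depends on.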

Modifying measurements

This one is interesting too. Modifying measurements makes it possible, e.g., to clamp values. I think there are a lot of use cases around this too, where the underlying instrument reports real numbers, but perhaps you're only interested in 0 or 1. Again, Measurement Processor allows you to do this in the simplest possible way.
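
The two value transformations mentioned above, clamping and reducing a real number to 0 or 1, are small enough to show directly. The function names and the 0.5 threshold are assumptions made for the example.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def to_binary(value, threshold=0.5):
    """Reduce a real-valued measurement to 0 or 1 around an assumed threshold."""
    return 1 if value >= threshold else 0


clamped_high = clamp(1.7, 0.0, 1.0)
clamped_low = clamp(-3.0, 0.0, 1.0)
binary = to_binary(0.826)
```

A processor that modifies measurements would apply a function like these to the value before forwarding it to the next stage.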


The best thing about this solution is that we can facilitate all this and more with a beautifully simple and concise design.


Built-in measurement processors are responsible for [Measurement Processing](#measurement-processing).

`MeasurementProcessors` can be registered directly on SDK `MeterProvider` and they are invoked in the same order as they were registered.
Contributor

It seems like it would be good to specify how these are meant to be registered, similar to what we have for span processors. Can they be provided when the MeterProvider is instantiated (similar to span processors)? Does it need to be possible to dynamically register/unregister them (similar to callbacks)?

Member

My suggestion is to use the same wording as we use for Provider configuration in general.
i.e., it MUST be possible to add measurement_processor as part of MeterProvider creation.
Implementations MAY allow modifications after MeterProvider is created. If they do, then all existing meter/instruments should reflect the effect of this.

Author

I used the exact same wording we use for LogRecordProcessor and SpanProcessor.

I think it should be consistent, it's the same concept.

Contributor

We should specify how MeterProviders are configured in the MeterProvider spec, not in the MeasurementProcessors spec. See this section of TracerProvider: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#configuration that @cijothomas quoted above. Make sure anything related to MeasurementProcessor is marked as Development status.


SDK MUST ensure that the pipeline concludes with the built-in [DefaultProcessor](#defaultprocessor).
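
The ordering rules quoted here (processors run in registration order, and the pipeline ends with the built-in default processor) can be sketched as a chain of callbacks. The helper names `build_pipeline`, `_link`, and `tag` are hypothetical, and the default processor is reduced to a function that records that it was reached.

```python
def _link(processor, next_):
    """Bind one processor to the step that follows it."""
    def step(measurement):
        processor(measurement, next_)
    return step


def build_pipeline(processors, default_processor):
    """Chain processors in registration order, ending with default_processor."""
    chain = default_processor
    for processor in reversed(processors):
        chain = _link(processor, chain)
    return chain


calls = []


def tag(name):
    """Build a processor that records its name and forwards the measurement."""
    def processor(measurement, next_):
        calls.append(name)
        next_(measurement)
    return processor


def default_processor(measurement):
    calls.append("default")


run = build_pipeline([tag("first"), tag("second")], default_processor)
run({"value": 1})
```

Building the chain once, when processors are registered, keeps the per-measurement work to one call per processor.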

The following diagram shows `MeasurementProcessor`'s relationship to other components in the SDK:
Contributor

Please include Views in this diagram. Importantly, is View configuration applied before the Measurement processor, or after, or both? We clearly want it to be applied before the aggregation portion of views is applied, but it would be nice if it was applied after the View's attribute filter so processors don't need to consider attributes that aren't going to be kept.

Author

Views are meant to be applied after Measurement Processors. They operate on a higher level of abstraction.

Contributor

I realized cardinality limits should probably also be included, but we can wait until the discussion below is resolved.


`MeasurementProcessors` can be registered directly on SDK `MeterProvider` and they are invoked in the same order as they were registered.

Each processor registered on the `MeterProvider` is part of a pipeline.
Contributor

Did we consider making processors part of the View configuration? Many of the use-cases seem to apply to specific metrics (e.g. changing the unit of a metric), or aggregating specific attribute values together. It also fits nicely within the scope of what Views are meant to be able to do: Make pre-aggregation adjustments to metrics.

Author

Yes, it was considered. It looks similar, but in my opinion, Views are meant to work on a higher level of abstraction, not with individual measurements. The existing design is also deeply rooted in analysis of existing SDK implementations, where incorporating this concept into Views would be extremely difficult.

Contributor

Can you explain further? Attribute filters from views are applied very early in the SDK's handling of a measurement. They apply even prior to cardinality limit enforcement. Implementing both of those features post-aggregation (i.e. not on measurements) would defeat one of their primary purposes, which is to limit memory consumption of the SDK, so I would be very surprised if they were not applied to individual measurements.

Author

I'm not an expert on the Metrics SDKs, but after looking into the code when preparing this PR I concluded that supporting all the use cases we envisioned would require a major overhaul of how Views work.

Please see #4298 (comment)

Perhaps I'm wrong, and there's a neat way to make Views fully programmable.

Regarding performance - MeasurementProcessor, like LogRecordProcessor and SpanProcessor, is an opt-in feature. If it's not used, the cost is a single if-statement, exactly the same as with https://github.com/MrAlias/opentelemetry-go/pull/1092/files#diff-2c6f5806e27a5ab81eeba6f9d6fff1a80ac35188c8e22deea1c97a77fdb04634

Contributor

My suggestion would be to apply measurement processors immediately after the view's attribute filter, and prior to cardinality limit enforcement. The benefit of this would be that the measurement processor could reduce cardinality, and prevent hitting the limit by merging attribute values together, which is one of the intended uses.
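
The ordering suggested above (view attribute filter, then measurement processor, then cardinality limit) can be illustrated with a toy aggregation. This is a simplified sketch: the limit of 2, the `status` and `user` attributes, and the `bucket_status` processor are assumptions, and the overflow handling is reduced to a single fallback key rather than the SDK's full rules.

```python
CARDINALITY_LIMIT = 2  # deliberately tiny, for demonstration only
OVERFLOW_KEY = (("otel.metric.overflow", True),)


def view_attribute_filter(attrs, keep=("status",)):
    """Step 1: keep only the attribute keys the view allows."""
    return {k: v for k, v in attrs.items() if k in keep}


def bucket_status(attrs):
    """Step 2 (hypothetical processor): merge status codes into classes."""
    out = dict(attrs)
    if "status" in out:
        out["status"] = out["status"][0] + "xx"
    return out


def record(sums, attrs, value, processor=None):
    """Filter, optionally process, then enforce the limit and aggregate a sum."""
    attrs = view_attribute_filter(attrs)
    if processor is not None:
        attrs = processor(attrs)
    key = tuple(sorted(attrs.items()))
    if key not in sums and len(sums) >= CARDINALITY_LIMIT:
        key = OVERFLOW_KEY  # Step 3: the limit was hit, fold into overflow
    sums[key] = sums.get(key, 0) + value


plain, processed = {}, {}
for status in ("200", "201", "204"):
    attrs = {"status": status, "user": "u1"}
    record(plain, attrs, 1)
    record(processed, attrs, 1, processor=bucket_status)
```

Without the processor the third distinct status lands in the overflow series; with it, all three measurements merge into one `2xx` series and the limit is never reached.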

In terms of implementation, this means you would just keep the measurement processor very close to the attribute filter (e.g. pass them to the same functions). The existing attribute filter could be thought of as a measurement processor that always comes first and has more limited capabilities. In Go, we would probably extend, or add something similar to, this function to add the measurement processor in addition to the existing filter.

Contributor

Regarding performance - MeasurementProcessor, like LogRecordProcessor and SpanProcessor, is an opt-in feature. If it's not used, the cost is a single if-statement, exactly the same as with https://github.com/MrAlias/opentelemetry-go/pull/1092/files#diff-2c6f5806e27a5ab81eeba6f9d6fff1a80ac35188c8e22deea1c97a77fdb04634

The if statement you are referencing is in the build path. The hot-path performance has no additional branch points for the views implementation.

@reyang
Member

reyang commented Jul 18, 2025

Why are we not trying to solve the use-cases with views?

The original ask is to understand the usage scenario. @MrAlias do you think we're still collecting scenarios, or do we agree on the scenarios and now want to discuss the right design?

I don't think View is the right place.

View is designed to handle very specific scenarios via user-provided configurations. MeasurementProcessor gives the user the ability to plug in their own logic (e.g. code).

Think about this - if the policy requires you to drop measurements for tenants that match a given list of names/ids, do you think it makes sense to add such a feature to View and make it so complicated? Do you think View will eventually have a callback mechanism?

@MrAlias
Contributor

MrAlias commented Jul 18, 2025

The original ask is to understand the usage scenario. @MrAlias do you think we're still collecting scenarios, or do we agree on the scenarios and now want to discuss the right design?

No, I'm still not following your example: #4318 (comment)

@MrAlias
Contributor

MrAlias commented Jul 18, 2025

do you think it makes sense to add such feature to View

yes

@reyang
Member

reyang commented Jul 18, 2025

The original ask is to understand the usage scenario. @MrAlias do you think we're still collecting scenarios, or do we agree on the scenarios and now want to discuss the right design?

No, I'm still not following your example: #4318 (comment)

do you think it makes sense to add such feature to View

yes

Sorry, would you summarize your position and thinking?
I'm confused as it sounds like you don't understand the scenario, and yet you think it should be solved by View?

@MrAlias
Contributor

MrAlias commented Jul 18, 2025

The original ask is to understand the usage scenario. @MrAlias do you think we're still collecting scenarios, or do we agree on the scenarios and now want to discuss the right design?

No, I'm still not following your example: #4318 (comment)

do you think it makes sense to add such feature to View

yes

Sorry, would you summarize your position and thinking? I'm confused as it sounds like you don't understand the scenario, and yet you think it should be solved by View?

I have no idea if your situation needs a solution, or what that solution is. Please clarify the problem you're trying to address so I can respond effectively.

@MrAlias
Contributor

MrAlias commented Jul 18, 2025

I'm confused as it sounds like you don't understand the scenario, and yet you think it should be solved by View?

None of the understandable use-cases provided so far have presented a problem that we couldn't solve by extending our existing View definition.

Adding a competing processing pipeline has yet to be justified.

@cijothomas
Member

Narrow use case: We could not find concrete examples in either the issue or the PR description that demonstrate where such a feature is necessary. Meanwhile, implementing it would require significant refactoring, at least on the OTel Go side.

Concrete examples are now covered in this comment; perhaps make them explicit in the PR description itself to avoid confusion about the use cases?

Sharing two common requests I have been encountering:

  1. Ability to add extra attributes to measurements (e.g., from Context/Baggage, such as an "is_synthetic" setting in Baggage).
  2. Ability to "collapse" attributes to reduce cardinality (e.g., /route/user/user1 -> /route/user/{userid}).

Both of the above assume we have no control over the instrumentation side.
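
The route-collapsing request can be sketched with a single regular expression. The function name `collapse_route` and the pattern are assumptions for illustration; a real deployment would need one pattern per templated route.

```python
import re

# Matches a concrete user id segment directly under /route/user/.
_USER_ROUTE = re.compile(r"^/route/user/[^/]+")


def collapse_route(route):
    """Replace a concrete user id in the route with a template placeholder."""
    return _USER_ROUTE.sub("/route/user/{userid}", route)


collapsed = collapse_route("/route/user/user1")
untouched = collapse_route("/route/health")
```

Applied to an attribute before aggregation, this keeps one series per route template instead of one per user, without touching the instrumentation code.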

Contributor

@MrAlias MrAlias left a comment

Adding an additional processing pipeline to the metric signal (in addition to views) will negatively impact the usability of the metric SDK APIs and impose global performance degradation across SDKs. It will add a competing stream processing configuration endpoint for users to understand and use in concert with all views and it will require all streams be passed through the new Metric processor (imposing non-zero overhead in the process) instead of just matching instruments (as would be the case with a view).

I do not think this proposal should be accepted. We should look into how we can make Views support use-cases we would like to support.

There are already two proof-of-concepts for how to support attribute mutation in a view that will address the original concern of this PR:

I see moving forward in the direction proposed by this PR as a misstep and I do not think it should be taken at this point in time.

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 30, 2025

github-actions bot commented Aug 6, 2025

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Aug 6, 2025

Development

Successfully merging this pull request may close these issues.

Support measurement processors in Metrics SDK