[DRAFT] Telemetry Policy #4738
Changes from 6 commits
e70377e
da34640
6396605
4a092aa
4063e20
1e8697b
26645d4
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,379 @@ | ||||||
| # Telemetry Policies | ||||||
|
|
||||||
| Defines a new concept for OpenTelemetry: Telemetry Policy. | ||||||
|
|
||||||
| ## Motivation | ||||||
|
|
||||||
| OpenTelemetry provides a robust, standards-based instrumentation solution. | ||||||
| This includes many great components, e.g. | ||||||
|
|
||||||
| - Declarative configuration | ||||||
| - Control Protocol via OpAMP | ||||||
| - X-language extension points in the SDK (samplers, processors, views) | ||||||
| - Telemetry-Plane controls via the OpenTelemetry collector. | ||||||
|
|
||||||
| However, OpenTelemetry still struggles to provide true "remote control" | ||||||
| capabilities that are implementation agnostic. When using OpAMP with an | ||||||
| OpenTelemetry collector, the "controlling server" of OpAMP needs to understand | ||||||
| the configuration layout of an OpenTelemetry collector. If a user asked the | ||||||
| server to "filter out all attributes starting with `x.`", the server would | ||||||
| need to understand/parse the OpenTelemetry collector configuration. If the | ||||||
| controlling server was also managing an OpenTelemetry SDK, then it would need | ||||||
| a *second* implementation of the "filter attribute" feature for the SDK vs. | ||||||
| the Collector. Additionally, as the OpenTelemetry collector allows custom | ||||||
| configuration file formats, there is no way for a "controlling server" to | ||||||
| operate with an OpenTelemetry Collector distribution without understanding all | ||||||
| possible implementations it may need to talk to. | ||||||
|
|
||||||
| Additionally, existing remote-control capabilities in OpenTelemetry are not | ||||||
| "guaranteed" to be usable due to specification language. For example, today | ||||||
| one can use the Jaeger Remote Sampler specified for OpenTelemetry SDKs and the | ||||||
| Jaeger remote sampler extension in the OpenTelemetry collector to dynamically | ||||||
| control the sampling of spans in SDKs. However, file-based configuration does | ||||||
|
|
||||||
| not require dynamic reloading of configuration. This means attempting to | ||||||
| provide a solution like Jaeger-remote-sampler with just OpAMP + file-based | ||||||
| config is impossible, today. | ||||||
|
Member
Dynamic control of SDKs is something that should be able to be built on top of or as an evolution of declarative config. I / we have been conscious of this eventually while building declarative config and I don't think anything will get in the way. Also, I hope that minimally, the declarative config data model can be used as a way for servers to communicate the desired configuration state of components in a dynamic config scenario.
Contributor
Author
While I agree to a degree, the type of control and abstraction this proposal seeks to enable is NOT possible without agreement on semantics and use-cases across diverse implementations. E.g. the declarative config + OpAMP could be used to send any config to any component. What it doesn't do, and what we need to sort out, is how to understand what config can be sent to what component, and how to drive control / policy independent of implementation or pipeline set-up, e.g. Imagine a world where we can control the reporting of metrics across OpenTelemetry SDKs, custom implementations and Prometheus SDKs because we agreed to the semantics of policy independent of configuration.
So I see Declarative config as encompassing more than just policies, where policies would be a subset of what you'd find. Additionally, Policies can be independent things that you can bundle together. I should be able to "add" a policy at any point without needing to understand how it interacts with other components. An example of this: if I have a configuration reporting metrics, that configuration would have a MetricReader->MetricExporter, right? What if there are multiple? How would I know what to change generically, if I just wanted to say "stop producing metric X"? Policies are ignorant of this. They just push a policy down and the SDK would be expected to enforce this via a
Apologies not all of this is fleshed out, as it's a working draft, and one we're working on in the repo. Please continue to ask questions and I'll use that to flesh out the motivation more.
This is an exact case I have, turning off metrics. And turning them back on. I implement this by having a flag in a custom exporter which stops/restarts exports. A generic solution to turning it off would be to change the exporter config to none, then I guess you could re-enable by setting again to otlp, but that implies a much more complex action in the SDK rather than switching a boolean on/off.
Contributor
Author
I added this to the alternatives considered discussion
Member
A bold vision. I think I was definitely misunderstanding the scope. I'll revise my position: If we want dynamic control solutions specifically for otel SDKs, the declarative config data model should play a role, because not using it means introducing yet another config interface (YACI 😛). With a broader scope targeting other tools besides otel SDKs, we would of course need something not loaded with otel SDK vocabulary / baggage. Should this type of thing even live in otel or in some neutral territory? (reminds me of the relationship between W3C Trace Context and OpenTelemetry) Are there other ecosystems that have expressed interest in or that we've reached out to for collaborating?
Contributor
Author
Great questions!
100%!
Great question. I personally think this belongs in OTEL and should "feel native" to otel, but allow any component in o11y space to interact with it. This can increase the reach of "effective opentelemetry" as components which support writing OTLP can also participate with policies. However, to your question above, if this wasn't first-class in otel, how would we make sure our declarative config data model plays an important role?
The idea is the outcome of discussions with both Envoy (and their xDS control plane folks) and Google's Monarch team (see #4672). I would love to pull in more folks to collaborate for sure. First, I want to make sure we all understand the vision, scope and goals. This PR was meant to be a place for those of us who started discussing to flesh out the proposal in place (as draft), so this PR is meant to be collecting that interest and refining the message. Apologies it was rough when you first reviewed it.
|
||||||
|
|
||||||
| However, we believe there is a way to achieve our goals without changing | ||||||
| the direction of OpAMP or file-based configuration. Instead we can break apart | ||||||
|
|
||||||
| the notion of "Configuration" from "Policy", providing a new capability in | ||||||
| OpenTelemetry. | ||||||
|
|
||||||
| ## Explanation | ||||||
|
|
||||||
| We define a new concept called a `Telemetry Policy`. A Policy is an | ||||||
| intent-based specification from a user of OpenTelemetry. | ||||||
|
|
||||||
| - **Typed**: A policy self identifies its "type". Policies of different types | ||||||
| cannot be merged, but policies of the same type MUST be merged together. | ||||||
| - **Clearly specified behavior**: A policy type enforces a specific behavior for | ||||||
| a clear use case, e.g. trace sampling, metric aggregation, attribute | ||||||
| filtering. | ||||||
| - **Implementation Agnostic**: I can use the exact same policy in the collector | ||||||
| or an SDK or any other component supporting OpenTelemetry's ecosystem. | ||||||
| - **Standalone**: I don't need to understand how a pipeline is configured to define | ||||||
| policy. | ||||||
| - **Dynamic**: We expect policies to be defined and driven outside the lifecycle | ||||||
| of a single collector or SDK. This means the SDK behavior needs the ability | ||||||
| to change post-instantiation. | ||||||
| - **Idempotent**: I can give a policy to multiple components in a | ||||||
| telemetry-plane safely. E.g. if both an SDK and collector obtain an | ||||||
| attribute-filter policy, the result is the same as if it were applied only once. | ||||||
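The **Typed** property above can be illustrated with a minimal Python sketch. The `Policy` class, its `spec` field, and the shallow "later keys win" merge are assumptions made for illustration; the proposal deliberately leaves the concrete merge algorithm to the transport.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Policy:
    """Hypothetical in-memory policy: a type tag plus a JSON-like body."""
    type: str
    spec: Dict[str, Any] = field(default_factory=dict)


def merge(a: Policy, b: Policy) -> Policy:
    """Merge two policies of the same type; top-level keys in `b` override `a`."""
    if a.type != b.type:
        # Policies of different types cannot be merged.
        raise ValueError(f"cannot merge policy types {a.type!r} and {b.type!r}")
    return Policy(a.type, {**a.spec, **b.spec})


base = Policy("metric-rate", {"rpc.server.latency": {"sampling_rate_ms": 60000}})
update = Policy("metric-rate", {"rpc.server.latency": {"sampling_rate_ms": 10000}})
merged = merge(base, update)
```

In this sketch, merging `update` into `merged` a second time leaves it unchanged, which is the kind of repeat-safe behaviour the list above asks for.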
|
|
||||||
| Every policy is defined with the following: | ||||||
|
|
||||||
| - A `type` denoting the use case for the policy | ||||||
| - A JSON schema denoting what a valid definition of the policy entails, | ||||||
| describing how servers should present the policy to customers. | ||||||
| - A specification denoting the behavior the policy enforces, i.e., for a given | ||||||
| JSON entry, to which elements the policy applies and which behavior is | ||||||
| expected from an agent or collector implementing the policy. | ||||||
|
|
||||||
| Policies MUST NOT: | ||||||
|
|
||||||
| - Specify configuration relating to the underlying policy applier implementation. | ||||||
| - A policy cannot know where the policy is going to be run. | ||||||
| - Specify its transport methodology. | ||||||
| - Interfere with telemetry upon failure. | ||||||
| - Policies MUST be fail-open. | ||||||
| - Contain logical waterfalls. | ||||||
| - Each policy is applied independently of the others and, at this moment, | ||||||
| MUST NOT depend on another policy running. This is in keeping with the idempotency | ||||||
| principle. | ||||||
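The fail-open requirement can be sketched as a wrapper around a policy function. This is only an illustration: `drop_prefix` and `fail_open` are hypothetical helpers, and attributes are modelled as a plain string dictionary.

```python
from typing import Callable, Dict

Attributes = Dict[str, str]
PolicyFn = Callable[[Attributes], Attributes]


def drop_prefix(prefix: str) -> PolicyFn:
    """Hypothetical attribute-filter policy: drop keys starting with `prefix`."""
    def apply(attrs: Attributes) -> Attributes:
        return {k: v for k, v in attrs.items() if not k.startswith(prefix)}
    return apply


def fail_open(policy: PolicyFn) -> PolicyFn:
    """Wrap a policy so that any error leaves the telemetry untouched."""
    def apply(attrs: Attributes) -> Attributes:
        try:
            return policy(attrs)
        except Exception:
            # Fail-open: a broken policy must not interfere with telemetry.
            return attrs
    return apply


def broken(attrs: Attributes) -> Attributes:
    raise RuntimeError("policy bug")


attrs = {"x.secret": "1", "http.method": "GET"}
filtered = fail_open(drop_prefix("x."))(attrs)
passthrough = fail_open(broken)(attrs)
```

Here `filtered` no longer carries the `x.` attribute, while `passthrough` is the original attribute set because the failing policy was skipped.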
|
|
||||||
| Example policy types include: | ||||||
|
Contributor
Could we think of this as new "subtypes" of declarative config that can be used in a standalone way? E.g. if we think of the current declarative config as configuration as type "SDK", we could define sub-types like "sampler", "view", or "log-record-processor"? If we can, I would love to keep the same yaml structure / definitions for these policies that we currently have in the declarative config so we avoid introducing another structured definition of what a "sampler" is. Or do you think because this is targeted at the collector as well that isn't feasible?
Contributor
Author
I'd expect the declarative config for a policy-component to be used directly in declarative config: So something like: The primary difference between the policy for sampling and a "sampler" will actually be in flexibility. A sampler component could be written in any language, allow any code and its configuration must be open. A sampler policy MUST have a well-defined behavior, have the same configuration and behavior in all languages or implementations. So primarily, a policy is highly limited in a way extension points are not. |
||||||
| - `trace-sampling`: define how traces are sampled | ||||||
| - `metric-rate`: define sampling period for metrics | ||||||
| - `log-filter`: define how logs are sampled/filtered | ||||||
| - `attribute-redaction`: define attributes which need redaction/removal. | ||||||
| - `metric-aggregation`: define how metrics should be aggregated (i.e. views). | ||||||
| - `exemplar-sampling`: define how exemplars are sampled | ||||||
|
Member
This reads like a subset of declarative configuration capabilities. Wouldn't it be easier to unify on one data model (i.e. declarative config) for expressing the desired configuration, and build tooling to detect / apply diffs when a change is pushed from a remote server? I.e. an app starts with: Later, a remote server pushes a new configuration state with an updated ratio for the trace id ratio sampler: Some controller is responsible for evaluating the diff between the current state and the desired state, and computing / executing update steps as allowed. In this case, substitute the sampler.
Contributor
Author
You can read some of my rationale at the bottom of the OTEP. Effectively:
So there are a lot of similarities, but the key difference is the limitations. |
||||||
| - `attribute-filter`: define data that should be rejected based on attributes | ||||||
|
|
||||||
| ## Policy Ecosystem | ||||||
|
|
||||||
| Policies are designed to be straightforward objects with little to no logic | ||||||
| tied to them. Policies are also designed to be agnostic to the transport, | ||||||
| implementation, and data type. It is the goal of the ecosystem to support | ||||||
| policies in various ways. Policies MUST be additive and MUST NOT break existing | ||||||
| standards. It is therefore our goal to extend the ecosystem by recommending | ||||||
| implementations through the following architecture. | ||||||
|
|
||||||
| The architectural decisions are meant to be flexible to allow users optionality | ||||||
| in their infrastructure. For example, a user may decide to run a multi-stage | ||||||
| policy architecture where the SDK, daemon collector, and gateway collector work | ||||||
| in tandem where the SDK and Daemons are given set policies while the gateway is | ||||||
| remotely managed. Another user may choose to solely remotely manage their SDKs. | ||||||
| As a result of this scalable architecture, it is recommended that policy | ||||||
| provider updates be asynchronous. An out-of-date policy (i.e. one updated in a policy | ||||||
| provider but not yet in the applier) should not be lethal to the functionality | ||||||
| of the system. | ||||||
|
|
||||||
| ```mermaid | ||||||
| --- | ||||||
| title: Policy Architecture | ||||||
| --- | ||||||
| flowchart TB | ||||||
| subgraph providers ["Policy Providers"] | ||||||
| direction TB | ||||||
| PP["«interface» Policy Provider"] | ||||||
| File["File Provider"] | ||||||
| HTTP["HTTP Server Provider"] | ||||||
| OpAMP["OpAMP Server Provider"] | ||||||
| Custom["Custom Provider"] | ||||||
| PP -.->|implements| File | ||||||
| PP -.->|implements| HTTP | ||||||
| PP -.->|implements| OpAMP | ||||||
| PP -.->|implements| Custom | ||||||
| end | ||||||
| subgraph aggregator ["Policy Aggregator"] | ||||||
| PA["Policy Aggregator (Special Provider)"] | ||||||
| end | ||||||
| subgraph implementation ["Policy Implementation"] | ||||||
| PI["Policy Implementation"] | ||||||
| PT["Supported Policy Types"] | ||||||
| PI --- PT | ||||||
| end | ||||||
| subgraph policies ["Policies"] | ||||||
| P1["Policy 1"] | ||||||
| P2["Policy 2"] | ||||||
| P3["Policy N..."] | ||||||
| end | ||||||
| %% Provider relationships | ||||||
| PP -.->|implements| PA | ||||||
| %% Aggregator pulls from providers | ||||||
| File -->|policies| PA | ||||||
| HTTP -->|policies| PA | ||||||
| OpAMP -->|policies| PA | ||||||
| Custom -->|policies| PA | ||||||
| %% Providers supply policies to implementation | ||||||
| File -->|supplies policies| PI | ||||||
| HTTP -->|supplies policies| PI | ||||||
| OpAMP -->|supplies policies| PI | ||||||
| Custom -->|supplies policies| PI | ||||||
| PA -->|supplies policies| PI | ||||||
| %% Policies relationship | ||||||
| PP -->|provides| policies | ||||||
| PI -->|runs| policies | ||||||
| %% Optional type info | ||||||
| PP -.->|"may supply supported policy types (optional)"| PI | ||||||
| ``` | ||||||
|
|
||||||
| ## Example Ecosystem implementations | ||||||
|
|
||||||
| We make the following observations and recommendations for how the community may | ||||||
| integrate with this specification. | ||||||
|
|
||||||
| ### OpenTelemetry SDKs | ||||||
|
|
||||||
| An SDK's declarative configuration may be extended to support a list of policy | ||||||
| providers. An SDK with no policy providers set behaves the same as today, since | ||||||
| policies are fail-open. The simplest policy provider is the file provider. The SDK | ||||||
| should read this file upon startup, and optionally watch the file for changes. The | ||||||
| policy provider may supply the configuration for watching. | ||||||
|
|
||||||
| The policy providers for the SDK push policies into the SDK, allowing the SDK to become | ||||||
| a policy implementation. An SDK may receive updates at any time for these policies, so | ||||||
| its extension points must support reloading them. Sample SDK extension points: | ||||||
|
|
||||||
| - `PolicySampler`: Pulls relevant `trace-sampling` policies from | ||||||
| PolicyProvider and uses them. | ||||||
| - `PolicyLogProcessor`: Pulls relevant `log-filter` policies from | ||||||
| PolicyProvider and uses them. | ||||||
| - `PolicyPeriodicMetricReader`: Pulls relevant `metric-rate` policies | ||||||
| from PolicyProvider and uses them to export metrics. | ||||||
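As a rough sketch of how such an extension point might pick up policy changes after start-up, consider the toy sampler below. `InMemoryProvider`, the `policies(policy_type)` method, and the `ratio` field are hypothetical names chosen for illustration; the OTEP does not define a provider API or a `trace-sampling` schema.

```python
import threading
from typing import Dict, List, Protocol


class PolicyProvider(Protocol):
    def policies(self, policy_type: str) -> List[Dict]: ...


class InMemoryProvider:
    """Hypothetical provider whose policies can be replaced at runtime."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_type: Dict[str, List[Dict]] = {}

    def update(self, policy_type: str, policies: List[Dict]) -> None:
        with self._lock:
            self._by_type[policy_type] = list(policies)

    def policies(self, policy_type: str) -> List[Dict]:
        with self._lock:
            return list(self._by_type.get(policy_type, []))


class PolicySampler:
    """Toy sampler: keeps a trace when its ID falls under the configured ratio."""

    def __init__(self, provider: PolicyProvider, default_ratio: float = 1.0) -> None:
        self._provider = provider
        self._default_ratio = default_ratio

    def should_sample(self, trace_id: int) -> bool:
        found = self._provider.policies("trace-sampling")
        # With no policy present, behave exactly as if policies did not exist.
        ratio = found[-1].get("ratio", self._default_ratio) if found else self._default_ratio
        # Compare the lower 64 bits of the trace ID against the ratio threshold.
        return (trace_id & (2**64 - 1)) < ratio * 2**64


provider = InMemoryProvider()
sampler = PolicySampler(provider)
before = sampler.should_sample(12345)
provider.update("trace-sampling", [{"ratio": 0.0}])  # pushed after instantiation
after = sampler.should_sample(12345)
```

Because the sampler asks the provider on every decision, the update takes effect without rebuilding the SDK pipeline. A real implementation would likely cache the lookup and be notified of changes instead.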
|
|
||||||
| ### OpenTelemetry Collector | ||||||
|
|
||||||
| The collector is a natural place to run these policies. A policy processor may be | ||||||
| introduced to execute its set of policies. It is recommended that the collector uses | ||||||
| the same declarative configuration the SDK uses for policy provider configuration. The | ||||||
| collector may introduce an inline policy provider that provides a set of default policies | ||||||
| to execute in addition to whatever may be received from the policy providers. | ||||||
|
|
||||||
| The collector may also have a policy extension which allows it to serve as a policy | ||||||
| aggregator. In this world, the collector's policy extension would have a list of policy | ||||||
| providers it pulls from while other policy implementations set the collector as a policy | ||||||
| provider. This is akin to the proxy pattern you see in other control plane implementations. | ||||||
| This pattern should allow for a horizontally scalable architecture where all extensions | ||||||
| eventually report the same policies. | ||||||
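The aggregator pattern described above can be sketched as a provider that wraps other providers. The `id` field used to identify a policy and the "later upstream wins" rule are assumptions for illustration only; the proposal does not yet define policy identity or precedence.

```python
from typing import Dict, List, Protocol


class PolicyProvider(Protocol):
    def policies(self, policy_type: str) -> List[Dict]: ...


class StaticProvider:
    """Hypothetical provider serving a fixed list of policies."""

    def __init__(self, policies: List[Dict]) -> None:
        self._policies = list(policies)

    def policies(self, policy_type: str) -> List[Dict]:
        return [p for p in self._policies if p["type"] == policy_type]


class PolicyAggregator:
    """A provider that re-serves the combined policies of its upstreams."""

    def __init__(self, upstreams: List[PolicyProvider]) -> None:
        self._upstreams = list(upstreams)

    def policies(self, policy_type: str) -> List[Dict]:
        merged: Dict[str, Dict] = {}
        for upstream in self._upstreams:
            for policy in upstream.policies(policy_type):
                # Assumed rule: a later upstream replaces an earlier policy with the same id.
                merged[policy["id"]] = policy
        return list(merged.values())


file_provider = StaticProvider([
    {"type": "log-filter", "id": "a", "min_severity": "INFO"},
])
http_provider = StaticProvider([
    {"type": "log-filter", "id": "a", "min_severity": "WARN"},
    {"type": "log-filter", "id": "b", "min_severity": "ERROR"},
    {"type": "trace-sampling", "id": "c"},
])
aggregator = PolicyAggregator([file_provider, http_provider])
log_policies = aggregator.policies("log-filter")
```

Since the aggregator exposes the same interface as any other provider, aggregators can be chained, which is what allows the proxy-style, horizontally scalable layout described above.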
|
|
||||||
| ### OpAMP | ||||||
|
|
||||||
| Per the constraints above, this specification makes NO requirements of the transport layer | ||||||
| for policy providers. OpAMP is a great extension point which may serve as a | ||||||
| policy provider through the use of custom messages. A policy implementation with OpAMP | ||||||
| support may use the OpAMP connection to transport policies. This specification currently makes no | ||||||
| recommendation as to what that custom message should look like. | ||||||
|
|
||||||
| ### Summary | ||||||
|
|
||||||
| While we make no requirements of these groups in this specification, it is recommended | ||||||
| that they all adhere to a consistent experience for users to enhance portability. The | ||||||
| authors here will coordinate with other SIGs to ensure agreement upon this configuration. | ||||||
| This may involve a follow up to this specification recommending policy provider specifics | ||||||
| such as an HTTP/gRPC definition. This definition would then serve as a basis for custom | ||||||
| implementations like that for OpAMP. More on this in `Future Possibilities`. | ||||||
|
|
||||||
| ## Internal details | ||||||
|
|
||||||
| ### Merging policies | ||||||
|
|
||||||
| Since the policy itself does not enforce a transport mechanism or format, it is | ||||||
| natural that the merge algorithm is also not enforced by the policy. As such, | ||||||
| whenever a policy is transmitted it should specify how it is expected to be merged, either by | ||||||
| relying on a standard merge mechanism from the protocol or by setting it up explicitly during transmission. | ||||||
|
|
||||||
| For JSON, a service can follow either [JSON Patch](https://datatracker.ietf.org/doc/html/rfc6902) or | ||||||
| [JSON Merge Patch](https://datatracker.ietf.org/doc/html/rfc7396) to create policies that can be merged and | ||||||
| remain idempotent. Below we have the same update for a hypothetical `metric-rate` policy that can be merged following the RFCs: | ||||||
|
|
||||||
| ```json | ||||||
| # JSON Merge Patch | ||||||
| { | ||||||
| "rpc.server.latency": { "sampling_rate_ms": 10000 } | ||||||
| } | ||||||
|
|
||||||
| # JSON Patch | ||||||
| [ | ||||||
| { "op": "add", | ||||||
| "path": "/rpc.server.latency", | ||||||
| "value": { | ||||||
| "sampling_rate_ms": 10000 | ||||||
| } | ||||||
| } | ||||||
| ] | ||||||
| ``` | ||||||
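To make the merge semantics concrete, here is a small Python sketch of the JSON Merge Patch algorithm from RFC 7396, applied to the hypothetical `metric-rate` update shown above. The `current` policy state and its second metric are invented for the example.

```python
import copy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch (RFC 7396) and return the result without mutating inputs."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value removes the member.
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Assumed existing policy state (illustrative values).
current = {
    "rpc.server.latency": {"sampling_rate_ms": 60000},
    "http.server.duration": {"sampling_rate_ms": 30000},
}
patch = {"rpc.server.latency": {"sampling_rate_ms": 10000}}

once = merge_patch(current, patch)
twice = merge_patch(once, patch)
```

Applying the same patch a second time yields the same document, which is the property that lets several components receive the same update safely.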
|
|
||||||
| Proto-based transmission protocols can rely on [`Merge`](https://pkg.go.dev/google.golang.org/protobuf/proto#Merge) or [`MergeFrom`](https://protobuf.dev/reference/cpp/api-docs/google.protobuf.message/#Message.MergeFrom.details) provided by the library. | ||||||
|
|
||||||
| The mechanism for negotiating a protocol will depend on the specific `PolicyProvider` implementation. Some options are: | ||||||
|
|
||||||
| * A `FileProvider` will either use a default merger from the format (like the default proto merge), or accept a parameter that specifies which merger is expected when reading the specific file format (for example, for JSON). | ||||||
| * An HTTP provider can use different media types (for example, `application/json-patch+json` and `application/merge-patch+json`) to decide which merger to use, as specified in the RFCs for the JSON patch formats. | ||||||
| * OpAMP providers could add a field specifying the merger as well as the data being transmitted, plus a mechanism for systems to inform each other which mergers are available and how the data is expected to be merged. | ||||||
|
|
||||||
|
|
||||||
| ## Trade-offs and mitigations | ||||||
|
|
||||||
| TODO - write | ||||||
|
|
||||||
| What are some (known!) drawbacks? What are some ways that they might be mitigated? | ||||||
|
|
||||||
| Note that mitigations do not need to be complete *solutions*, and that they do not need to be accomplished directly through your proposal. A suggested mitigation may even warrant its own OTEP! | ||||||
|
|
||||||
| ## Prior art and alternatives | ||||||
|
Contributor
I would love to see alternatives here. We've discussed things like dynamically-reloadable or merge rules for declarative config, and it would help reinforce why we need a new concept to solve the problems you are interested in.
Contributor
Author
Agreed, I need to write down the treatment of why dynamically reloaded config doesn't solve the problems that motivate the proposal. My answer to your other comment, hopefully, hints at that, but it'll be a longer write-up. |
||||||
|
|
||||||
| TODO - discuss https://github.com/open-telemetry/opentelemetry-specification/pull/4672 | ||||||
|
|
||||||
| ### Declarative Config + OpAMP as sole control for telemetry | ||||||
|
|
||||||
| The declarative config + OpAMP could be used to send any config to any | ||||||
| component in OpenTelemetry. Here, we would leverage OpAMP configuration passing | ||||||
| and the open-extension and definitions of Declarative Config to pass the whole | ||||||
| behavior of an SDK or Collector from an OpAMP "controlling server" down to a | ||||||
| component and have them dynamically reload behavior. | ||||||
|
|
||||||
| What this solution doesn't do is answer how to understand what config can be | ||||||
| sent to what component, and how to drive control / policy independent of | ||||||
| implementation or pipeline set-up. For example, imagine a simple collector | ||||||
| configuration: | ||||||
|
|
||||||
| ```yaml | ||||||
| receivers: | ||||||
|   otlp: | ||||||
|   prometheus: | ||||||
|     # ... config ... | ||||||
| processors: | ||||||
|   batch: | ||||||
|   memory_limiter: | ||||||
|   transform/drop_attribute: | ||||||
|     # config to drop an attribute | ||||||
| exporters: | ||||||
|   otlp: | ||||||
| service: | ||||||
|   pipelines: | ||||||
|     metrics/critical: | ||||||
|       receivers: [otlp] | ||||||
|       processors: [batch, transform/drop_attribute] | ||||||
|       exporters: [otlp] | ||||||
|     metrics/all: | ||||||
|       receivers: [prometheus] | ||||||
|       processors: [memory_limiter] | ||||||
|       exporters: [otlp] | ||||||
| ``` | ||||||
| Here, we have two pipelines with intended purposes and tuned configurations. | ||||||
| One which will *not* drop metrics when memory limits are reached and another | ||||||
| that will. Now - if we want to drop a particular metric from being reported, | ||||||
| which pipeline do we modify? Should we construct a new processor for that | ||||||
| purpose? Should we always do so? | ||||||
| Now imagine we *also* have an SDK we're controlling with declarative config. If | ||||||
| we want to control metric inclusion in that SDK, we'd need to generate a | ||||||
| completely different looking configuration file, as follows: | ||||||
| ```yaml | ||||||
| file_format: '1.0-rc.1' | ||||||
| # ... other config ... | ||||||
| meter_provider: | ||||||
|   readers: | ||||||
|     - my_custom_metric_filtering_reader: | ||||||
|         my_filter_config: # defines what to filter | ||||||
|
Member
You want to filter metrics using a filtering reader (this component doesn't exist in the SDK spec and so would have to be custom) vs. views or meter config?
Contributor
Author
I'm not sure, I can update this to use views instead as well. I was taking from the proposed OTEP where you can control both the reporting of a metric and the report interval (i.e. periodic metric reader would need configuration for how often to report each set of metrics). |
||||||
|         wrapped: | ||||||
|           periodic: | ||||||
|             exporter: | ||||||
|               otlp_http: | ||||||
|                 endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4318}/v1/metrics | ||||||
| ``` | ||||||
| Here, I've created a custom component in Java to allow filtering which metrics are read. | ||||||
| However, to insert / use this component I need to have all of the following: | ||||||
| - Know that this component exists in the Java SDK | ||||||
|
Member
If this is a popular use case we should extend the SDK spec to add an additional built in component. We're too reluctant to do this right now.
Contributor
Author
That still won't tell me if it's safe to send configuration to an SDK or not. I need to know, at runtime, that the version of the SDK I'm trying to control will support that config or if I'll crash a key component. Additionally, it doesn't help me ignore the implementation detail. E.g. what if I also want to control the Prometheus client library? We don't own their config or their specification. However, we could build something that interacts with remote policies, similar to Jaeger-Remote-Sampler of today for traces. |
||||||
| - Know how to wire it into any existing metric export pipeline (e.g. my reader | ||||||
| wraps another reader that has the real export config). | ||||||
| Note: This likely means I need to understand the rest of the exporter | ||||||
| configuration or be able to parse it. | ||||||
| This is not ideal for a few reasons: | ||||||
| - Anyone designing a server that can control telemetry flow MUST have a deep | ||||||
| understanding of all components it could control and their implementations. | ||||||
| - We don't have a "safe" mechanism to declare what configuration is supported | ||||||
| or could be sent to a specific component (note: we can design one) | ||||||
| - The level of control we'd expose from our telemetry systems is *expansive* | ||||||
| and possibly dangerous. | ||||||
| - We cannot limit the impact of any remote configuration on the working of a | ||||||
| system. We cannot prevent changes that may take down a process. | ||||||
| - We cannot limit the execution overhead of configuration or exercise fine-grained | ||||||
| control over what changes would be allowed remotely. | ||||||
| ## Open questions | ||||||
| What are some questions that you know aren't resolved yet by the OTEP? These may be questions that could be answered through further discussion, implementation experiments, or anything else that the future may bring. | ||||||
| - Should this specification give recommendations for the server protobufs? | ||||||
| - How should we handle policy merging? | ||||||
| - (jacob) Could policies contain a priority and it's up to the providers to design around this? | ||||||
|
@andrzej-stencel i want to continue the OTTL discussion in a thread here.
IMO, we shouldn't tie ourselves to OTTL just yet for a few big reasons:
Currently, none of the example policy types have to do with transformation (save the redaction policy), and most are focused on filtration or minor configuration. I have a POC for the policy proto, and one of the biggest things for this project to be successful is that policies can be run efficiently based on matching conditions (i.e. run policy X when Y condition is met). For that matching logic, I think it makes sense to be based on semantic convention which is a well-defined standard at this point. I think for now, given we're proposing the overarching Policy model, I don't want to get too caught up in a single policy type and rather focus on the concept as a whole. You could imagine in the future as we design specific policy types, one would be of type |
||||||
| ## Prototypes | ||||||
| Link to any prototypes or proof-of-concept implementations that you have created. | ||||||
| This may include code, design documents, or anything else that demonstrates the | ||||||
| feasibility of your proposal. | ||||||
| Depending on the scope of the change, prototyping in multiple programming | ||||||
| languages might be required. | ||||||
| ## Future possibilities | ||||||
| What are some future changes that this proposal would enable? | ||||||