[RFC] tracer: add interface for custom probes #1326
florianl wants to merge 7 commits into open-telemetry:main
> [!IMPORTANT]
> This is a preliminary draft intended to initiate discussion about potential changes and opportunities, not for immediate merging.
Package `tracer` was initially designed and architected for continuous sampling-based profiling. The subsequent addition of off-CPU and generic probe profiling capabilities has increased the complexity of the codebase. Requests for further features, such as memory profiling or process-specific profiling, arrive regularly, so a discussion is needed on how to incorporate them while keeping package `tracer` reliable and maintainable. To that end, this draft addresses the identified limitations and showcases one possible approach.
Limitations
========
This draft showcases some potential options for current limitations.
Hardcoded origin IDs
---------------------
The current mechanism for sending stack traces from the tracer package via the reporter package relies on hardcoded origin IDs. This draft proposes removing these hardcoded IDs. Instead, it introduces an API within the reporter package, allowing the tracer package to register known origin IDs and provide associated metadata to the reporter.
> [!NOTE]
> Resolving this limitation can be an independent change.
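
To make the registration idea concrete, here is a minimal sketch of what such a registry could look like. All names (`OriginID`, `OriginMeta`, `OriginRegistry`, `Register`, `Lookup`) and the `Unit` field are hypothetical and only illustrate the shape of the API; they are not taken from this draft or from the reporter package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// OriginID identifies where a stack trace came from (hypothetical type).
type OriginID int

// OriginMeta is the metadata a probe hands to the reporter (hypothetical).
type OriginMeta struct {
	Name string // e.g. "offcpu"
	Unit string // illustrative: unit of the reported values
}

// OriginRegistry replaces a fixed list of origin IDs with runtime registration.
type OriginRegistry struct {
	mu      sync.RWMutex
	next    OriginID
	origins map[OriginID]OriginMeta
	byName  map[string]OriginID
}

func NewOriginRegistry() *OriginRegistry {
	return &OriginRegistry{
		origins: make(map[OriginID]OriginMeta),
		byName:  make(map[string]OriginID),
	}
}

// Register assigns the next free ID to meta and rejects empty or duplicate names.
func (r *OriginRegistry) Register(meta OriginMeta) (OriginID, error) {
	if meta.Name == "" {
		return 0, errors.New("origin name must not be empty")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.byName[meta.Name]; ok {
		return 0, fmt.Errorf("origin %q already registered", meta.Name)
	}
	r.next++
	id := r.next
	r.origins[id] = meta
	r.byName[meta.Name] = id
	return id, nil
}

// Lookup returns the metadata registered for id, if any.
func (r *OriginRegistry) Lookup(id OriginID) (OriginMeta, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	meta, ok := r.origins[id]
	return meta, ok
}

func main() {
	reg := NewOriginRegistry()
	id, err := reg.Register(OriginMeta{Name: "offcpu", Unit: "nanoseconds"})
	if err != nil {
		panic(err)
	}
	meta, _ := reg.Lookup(id)
	fmt.Println(id, meta.Name, meta.Unit)
}
```

In a layout like this, the reporter would resolve metadata through `Lookup` when it encodes a trace, so adding a probe would not require touching a central list of IDs.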
Maintainability
---------------
Similar to interpreters, this draft also shows that people who are involved in a custom probe could become code owners for their probe. This helps keep the project maintainable and encourages people to contribute.
Configuration
-------------
Choosing names is notoriously difficult in programming. When this task involves configuration settings, the challenge becomes even greater. As individual features require individual configuration, introducing the concept of custom probes allows each probe to have its own configuration.
As an example, this draft can be compiled and used with the following OTel collector configuration:
```
receivers:
  profiling:
    custom_probes:
      oom: {}
      offcpu:
        threshold: 0.1
```
This configuration enables two custom probes: a very basic out-of-memory profiling probe, `oom`, that requires no configuration, and off-CPU profiling, `offcpu`, with a threshold of 0.1.
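
As a rough illustration of per-probe configuration, the sketch below decodes each entry under `custom_probes` into its own typed struct. The struct names, the `decodeProbeConfig` helper, and the assumption that `threshold` must lie in [0, 1] are hypothetical. The JSON round trip is a standard-library stand-in for whatever decoding the Collector performs; it is not what this draft does.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// OffCPUConfig mirrors the offcpu entry of the example (hypothetical struct).
type OffCPUConfig struct {
	Threshold float64 `json:"threshold"`
}

// Validate assumes the threshold is a fraction in [0, 1]; this range is a guess.
func (c OffCPUConfig) Validate() error {
	if c.Threshold < 0 || c.Threshold > 1 {
		return fmt.Errorf("offcpu: threshold %v outside [0, 1]", c.Threshold)
	}
	return nil
}

// OOMConfig is empty because the oom probe takes no settings.
type OOMConfig struct{}

// decodeProbeConfig converts an untyped config value into T and rejects
// keys that T does not declare, so typos surface as errors.
func decodeProbeConfig[T any](raw any) (T, error) {
	var cfg T
	if raw == nil {
		return cfg, nil
	}
	data, err := json.Marshal(raw)
	if err != nil {
		return cfg, err
	}
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return cfg, err
	}
	return cfg, nil
}

func main() {
	// The custom_probes section as it might look after YAML parsing.
	raw := map[string]any{
		"oom":    map[string]any{},
		"offcpu": map[string]any{"threshold": 0.1},
	}
	offcpu, err := decodeProbeConfig[OffCPUConfig](raw["offcpu"])
	if err != nil {
		panic(err)
	}
	if err := offcpu.Validate(); err != nil {
		panic(err)
	}
	if _, err := decodeProbeConfig[OOMConfig](raw["oom"]); err != nil {
		panic(err)
	}
	fmt.Printf("offcpu threshold: %v\n", offcpu.Threshold)
}
```

Because each probe owns its struct, a new probe can add settings without widening a shared configuration type.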
Reporting values
----------------
At the moment, only the current implementation of off-CPU profiling allows reporting values along with collected stack traces. This draft lifts this limitation and replaces `OffTimes` with `Values`. Custom probes can benefit from this and report a value of their own, such as a cookie, along with collected stack traces.
> [!NOTE]
> Resolving this limitation can be an independent change.
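
The sketch below shows one way a generic `Values` field could be carried next to aggregated stack traces. It assumes one `int64` value per event; the `TraceEvents` and `Aggregator` types and their methods are hypothetical and do not reflect the actual data structures in this draft.

```go
package main

import "fmt"

// TraceEvents collects what was recorded for one unique stack trace.
// Values holds one probe-defined number per event (assumed layout).
type TraceEvents struct {
	Timestamps []uint64
	Values     []int64
}

// Aggregator groups events by a trace key (hypothetical helper).
type Aggregator struct {
	events map[string]*TraceEvents
}

func NewAggregator() *Aggregator {
	return &Aggregator{events: make(map[string]*TraceEvents)}
}

// Record appends one event with its value to the given trace.
func (a *Aggregator) Record(traceKey string, ts uint64, value int64) {
	ev, ok := a.events[traceKey]
	if !ok {
		ev = &TraceEvents{}
		a.events[traceKey] = ev
	}
	ev.Timestamps = append(ev.Timestamps, ts)
	ev.Values = append(ev.Values, value)
}

// Total sums all values for a trace. Whether a sum is meaningful depends
// on the probe: it fits durations, but not opaque values such as cookies.
func (a *Aggregator) Total(traceKey string) int64 {
	var sum int64
	if ev, ok := a.events[traceKey]; ok {
		for _, v := range ev.Values {
			sum += v
		}
	}
	return sum
}

func main() {
	agg := NewAggregator()
	agg.Record("stack-A", 100, 2500) // e.g. an off-CPU duration
	agg.Record("stack-A", 200, 1500)
	agg.Record("stack-B", 300, 42) // e.g. a cookie set by a custom probe
	fmt.Println(agg.Total("stack-A"), len(agg.events["stack-B"].Values))
}
```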
Generic probe
-------------
Backends currently cannot differentiate between multiple events that were reported as `TraceOriginProbe` from package `tracer`. The idea is therefore to introduce a custom probe that can be attached to hooks as needed and reports the correct information, making its events distinguishable from other events of the same generic probe.
Next
===
As this draft is not intended to be merged in its current state, the idea is to get feedback on it and start a discussion. Is there interest in introducing custom probes and resolving the limitations listed above? Are there alternative options?
Memory profiling
----------------
Custom probes could enable language-specific memory profiling, acknowledging the diverse ways programming languages manage memory.
Process filtering
-----------------
To address use cases requiring process-specific filtering, a custom probe could be implemented. This probe would trigger stack unwinding specifically for a configured process, allowing the general system sampling frequency to be reduced for all other activity.
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
rogercoll
left a comment
Overall the probe-as-plugin model seems like the right approach to me since all probes share the unwinding infrastructure. It reminds me of the hostmetrics receiver
pattern, where a single receiver owns the shared lifecycle and pipeline while individual scrapers register via factories with their own typed configs. Following
that same model here, a factory registry per probe instead of a central switch, and ProbeRegistrar decoupled from TraceReporter would align well with Collector conventions and keep the probe surface clean as new ones are added.
        return nil
    }

    func createCustomProbe(name string, cfg any) (tracer.Probe, error) {
This could be moved to a map of factories instead.
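
A minimal sketch of that suggestion, assuming a lookup table keyed by probe name. The local `Probe` interface stands in for `tracer.Probe`, whose method set is not shown here; `ProbeFactory`, `RegisterProbeFactory`, and `oomProbe` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Probe stands in for tracer.Probe; the single method is an assumption.
type Probe interface {
	Name() string
}

// ProbeFactory builds a probe from its untyped configuration.
type ProbeFactory func(cfg any) (Probe, error)

// probeFactories replaces a central switch over probe names.
var probeFactories = map[string]ProbeFactory{}

// RegisterProbeFactory adds a factory and panics on duplicate names,
// since a duplicate is a programming error detectable at startup.
func RegisterProbeFactory(name string, f ProbeFactory) {
	if _, dup := probeFactories[name]; dup {
		panic(fmt.Sprintf("probe factory %q registered twice", name))
	}
	probeFactories[name] = f
}

// knownProbes lists registered names in sorted order for error messages.
func knownProbes() []string {
	names := make([]string, 0, len(probeFactories))
	for name := range probeFactories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// createCustomProbe looks up the factory instead of switching on name.
func createCustomProbe(name string, cfg any) (Probe, error) {
	f, ok := probeFactories[name]
	if !ok {
		return nil, fmt.Errorf("unknown custom probe %q (known: %v)", name, knownProbes())
	}
	return f(cfg)
}

// oomProbe is a placeholder implementation used only for this example.
type oomProbe struct{}

func (oomProbe) Name() string { return "oom" }

func init() {
	RegisterProbeFactory("oom", func(any) (Probe, error) { return oomProbe{}, nil })
}

func main() {
	p, err := createCustomProbe("oom", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("created probe:", p.Name())
}
```

With this layout each probe package could register itself in its own `init`, so adding a probe would not require editing `createCustomProbe`.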
I need to look at this more closely but I like the concept! One thing we do that I've been meaning to bring up is the notion of a trace interceptor, which might fit naturally into the probe interface, or at least be something other probe impls want to do. The idea is that a trace might not be complete when it hits loadBpfTrace/HandleTrace etc., and we want to (post symbolizing) park it somewhere and report it later after anointing it with some extra data (in this case GPU kernel executions). This should give you a rough idea: parca-dev@decb784. This can't really be done downstream in the reporter interface because anointing may add/modify frames and change the hash, so we need to get in before the hash calculation to participate in caching goodness. If you don't think this needs to be tied up w/ probes let me know; pulling this out into a PR to upstream is on my TODO list.
For visibility @open-telemetry/ebpf-profiler-maintainers