[RFC] tracer: add interface for custom probes#1326

Draft
florianl wants to merge 7 commits into open-telemetry:main from florianl:probes

Conversation

Member

@florianl florianl commented Apr 7, 2026

Important

This is a preliminary draft intended to initiate discussion about potential changes and opportunities, not for immediate merging.

Package tracer was originally designed for continuous sampling-based profiling. The later addition of off-CPU and generic probe profiling has increased the complexity of the codebase. Requests for further features, such as memory profiling or process-specific profiling, come up regularly, so a discussion is needed on how to incorporate them while keeping package tracer reliable and maintainable. To that end, this draft addresses the limitations identified below and showcases one possible option.

Limitations

This draft showcases potential ways to address the following current limitations.

Hardcoded origin IDs

The current mechanism for sending stack traces from the tracer package via the reporter package relies on hardcoded origin IDs. This draft proposes removing these hardcoded IDs. Instead, it introduces an API within the reporter package, allowing the tracer package to register known origin IDs and provide associated metadata to the reporter.
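
A minimal sketch of what such a registration API could look like. All names here (`OriginID`, `OriginMetadata`, `OriginRegistry`, `Register`, `Lookup`) are hypothetical and are not taken from the draft; the point is only that the reporter learns about origins at startup instead of compiling them in.

```go
package main

import (
	"fmt"
	"sync"
)

// OriginID identifies the source of a stack trace (hypothetical type).
type OriginID int

// OriginMetadata describes an origin to the reporter (hypothetical fields).
type OriginMetadata struct {
	Name string // human-readable origin name, e.g. "offcpu"
	Unit string // unit of the values reported with traces of this origin
}

// OriginRegistry stores the origins that were registered at runtime.
type OriginRegistry struct {
	mu      sync.Mutex
	origins map[OriginID]OriginMetadata
}

// NewOriginRegistry returns an empty registry.
func NewOriginRegistry() *OriginRegistry {
	return &OriginRegistry{origins: make(map[OriginID]OriginMetadata)}
}

// Register associates id with md and rejects an ID that is already taken.
func (r *OriginRegistry) Register(id OriginID, md OriginMetadata) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if prev, ok := r.origins[id]; ok {
		return fmt.Errorf("origin %d already registered as %q", id, prev.Name)
	}
	r.origins[id] = md
	return nil
}

// Lookup returns the metadata for id, if it was registered.
func (r *OriginRegistry) Lookup(id OriginID) (OriginMetadata, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	md, ok := r.origins[id]
	return md, ok
}

func main() {
	reg := NewOriginRegistry()
	// The tracer side would call Register once per origin it can emit.
	if err := reg.Register(1, OriginMetadata{Name: "sampling", Unit: "count"}); err != nil {
		fmt.Println("register failed:", err)
	}
	if err := reg.Register(2, OriginMetadata{Name: "offcpu", Unit: "nanoseconds"}); err != nil {
		fmt.Println("register failed:", err)
	}
	if md, ok := reg.Lookup(2); ok {
		fmt.Printf("origin 2: name=%s unit=%s\n", md.Name, md.Unit)
	}
}
```

Rejecting duplicate IDs is a design choice of this sketch; it surfaces two probes claiming the same origin early rather than silently overwriting metadata.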

Note

Resolving this limitation can be an independent change.

Maintainability

Similar to interpreters, this draft also shows that people who are involved in a custom probe could become code owners for that probe. This helps keep the project maintainable and encourages people to contribute.

Configuration

Choosing names is notoriously difficult in programming. When this task involves configuration settings, the challenge becomes even greater. As individual features require individual configuration, introducing custom probes allows each probe to have its own configuration.

As an example, this draft can be compiled and used with the following OTel collector configuration:

```
receivers:
  profiling:
    custom_probes:
      oom: {}
      offcpu:
        threshold: 0.1
```

This configuration enables two custom probes: a very basic out-of-memory profiling probe, oom, which requires no configuration, and off-CPU profiling, offcpu, with a threshold of 0.1.
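
To illustrate per-probe configuration, here is a hedged sketch of how the untyped `custom_probes` section could be decoded into typed structs, one per probe. The struct names, the `decodeProbeConfig` helper, and the assumption that the threshold lies in [0, 1] are illustrative and not taken from the draft. The sketch uses only the standard library; a real Collector component would more likely rely on the Collector's own config decoding.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// OOMConfig is empty because the oom probe takes no settings (hypothetical type).
type OOMConfig struct{}

// OffCPUConfig holds the settings of the offcpu probe (hypothetical type).
type OffCPUConfig struct {
	Threshold float64 `json:"threshold"`
}

// Validate assumes the threshold is a fraction between 0 and 1.
func (c OffCPUConfig) Validate() error {
	if c.Threshold < 0 || c.Threshold > 1 {
		return fmt.Errorf("offcpu: threshold %v outside [0, 1]", c.Threshold)
	}
	return nil
}

// decodeProbeConfig converts the untyped section of one probe into out.
// Unknown keys are rejected so that typos in the config are reported.
func decodeProbeConfig(raw any, out any) error {
	buf, err := json.Marshal(raw)
	if err != nil {
		return err
	}
	dec := json.NewDecoder(bytes.NewReader(buf))
	dec.DisallowUnknownFields()
	return dec.Decode(out)
}

func main() {
	// Stand-in for the parsed custom_probes section shown above.
	customProbes := map[string]any{
		"oom":    map[string]any{},
		"offcpu": map[string]any{"threshold": 0.1},
	}

	var oom OOMConfig
	if err := decodeProbeConfig(customProbes["oom"], &oom); err != nil {
		fmt.Println("oom config:", err)
		return
	}

	var off OffCPUConfig
	if err := decodeProbeConfig(customProbes["offcpu"], &off); err != nil {
		fmt.Println("offcpu config:", err)
		return
	}
	if err := off.Validate(); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("offcpu threshold: %v\n", off.Threshold)
}
```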

Reporting values

At the moment, only the off-CPU profiling implementation can report values along with collected stack traces. This draft lifts this limitation and replaces OffTimes with Values. Custom probes can benefit from this and also report a value, e.g. a cookie, along with collected stack traces.
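
The idea of a probe-agnostic value can be sketched as follows. The `TraceEvents` type below is a simplified stand-in written for this example, not the project's actual type; only the field name `Values` comes from the draft. Each recorded event carries one value whose meaning is defined by the probe that produced it (an off-CPU duration, a cookie, an allocation size, and so on).

```go
package main

import "fmt"

// TraceEvents collects the events seen for one unique stack trace
// (simplified, hypothetical stand-in).
type TraceEvents struct {
	Timestamps []uint64 // when each event was observed
	Values     []int64  // one probe-defined value per event
}

// Add records one event together with its value.
func (e *TraceEvents) Add(timestamp uint64, value int64) {
	e.Timestamps = append(e.Timestamps, timestamp)
	e.Values = append(e.Values, value)
}

// Total sums all values, which is meaningful for additive values
// such as durations or byte counts.
func (e *TraceEvents) Total() int64 {
	var sum int64
	for _, v := range e.Values {
		sum += v
	}
	return sum
}

func main() {
	var ev TraceEvents
	ev.Add(1000, 250) // e.g. 250ns spent off-CPU
	ev.Add(2000, 750)
	fmt.Printf("%d events, total value %d\n", len(ev.Values), ev.Total())
}
```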

Note

Resolving this limitation can be an independent change.

Generic probe

Backends currently cannot differentiate between multiple events that were reported as TraceOriginProbe from package tracer. The idea is therefore to introduce a custom probe that can be attached to hooks as needed and reports with the correct information, making its events distinguishable from other events of the same generic probe.

Next

As this draft is not intended to be merged in its current state, the idea is to get feedback on it and start a discussion. Is there interest in introducing custom probes and resolving the limitations listed above? Are there alternative options?

Memory profiling

Custom probes could enable language-specific memory profiling, acknowledging the diverse ways programming languages manage memory.

Process filtering

To address use cases requiring process-specific filtering, a custom probe could be implemented. This probe would trigger stack unwinding specifically for a configured process, allowing the general system sampling frequency to be reduced for all other activity.

For visibility @open-telemetry/ebpf-profiler-maintainers

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Contributor

@rogercoll rogercoll left a comment

Overall, the probe-as-plugin model seems like the right approach to me, since all probes share the unwinding infrastructure. It reminds me of the hostmetrics receiver pattern, where a single receiver owns the shared lifecycle and pipeline while individual scrapers register via factories with their own typed configs. Following that same model here, a factory registry per probe instead of a central switch, and a ProbeRegistrar decoupled from TraceReporter, would align well with Collector conventions and keep the probe surface clean as new ones are added.

return nil
}

func createCustomProbe(name string, cfg any) (tracer.Probe, error) {
Contributor

This could be moved to a map of factories instead.
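
A rough sketch of that suggestion, assuming a simplified `Probe` interface as a stand-in for `tracer.Probe` and hypothetical factory functions; none of these names besides `createCustomProbe` come from the PR. The lookup replaces a `switch` on the probe name, so adding a probe means adding one map entry.

```go
package main

import "fmt"

// Probe is a simplified stand-in for tracer.Probe.
type Probe interface {
	Name() string
}

// ProbeFactory builds a probe from its untyped configuration section.
type ProbeFactory func(cfg any) (Probe, error)

type oomProbe struct{}

func (oomProbe) Name() string { return "oom" }

type offCPUProbe struct{ threshold float64 }

func (offCPUProbe) Name() string { return "offcpu" }

func newOOMProbe(_ any) (Probe, error) { return oomProbe{}, nil }

func newOffCPUProbe(cfg any) (Probe, error) {
	m, ok := cfg.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("offcpu: expected a mapping, got %T", cfg)
	}
	threshold, ok := m["threshold"].(float64)
	if !ok {
		return nil, fmt.Errorf("offcpu: threshold must be a number")
	}
	return offCPUProbe{threshold: threshold}, nil
}

// probeFactories maps a probe name to the factory that creates it.
var probeFactories = map[string]ProbeFactory{
	"oom":    newOOMProbe,
	"offcpu": newOffCPUProbe,
}

// createCustomProbe looks up the factory by name instead of switching on it.
func createCustomProbe(name string, cfg any) (Probe, error) {
	factory, ok := probeFactories[name]
	if !ok {
		return nil, fmt.Errorf("unknown custom probe %q", name)
	}
	return factory(cfg)
}

func main() {
	p, err := createCustomProbe("offcpu", map[string]any{"threshold": 0.1})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("created probe:", p.Name())

	if _, err := createCustomProbe("does-not-exist", nil); err != nil {
		fmt.Println(err)
	}
}
```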

@gnurizen
Contributor

I need to look at this more closely but I like the concept! One thing we do that I've been meaning to bring up is the notion of a trace interceptor which might fit naturally into the probe interface, or at least be something other probe impls want to do. The idea is that a trace might not be complete when it hits loadBpfTrace/HandleTrace etc and we want to (post symbolizing) park it somewhere and report it later after anointing it with some extra data (in this case GPU kernel executions). This should give you a rough idea: parca-dev@decb784

This can't really be done downstream in the reporter interface because anointing may add/modify frames and change the hash, so we need to get in before the hash calculation to participate in caching goodness.

If you don't think this needs to be tied up w/ probes let me know, pulling this out into a PR to upstream is on my TODO list.
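
For readers unfamiliar with the interceptor idea described in this comment, here is a simplified sketch under stated assumptions. The `Trace` type, the `Interceptor` interface, and `handleTrace` are invented for illustration and do not mirror the linked commit or the project's real types. It shows only the ordering constraint: an interceptor may park a trace and add frames to it, and the hash is computed only after the trace is released.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Trace is a simplified, already symbolized stack trace (hypothetical type).
type Trace struct {
	ID     uint64 // correlation key for data that arrives later
	Frames []string
}

// hashTrace stands in for the hash calculation that drives caching.
func hashTrace(tr *Trace) uint64 {
	h := fnv.New64a()
	for _, f := range tr.Frames {
		h.Write([]byte(f))
		h.Write([]byte{0}) // separator so frame boundaries affect the hash
	}
	return h.Sum64()
}

// Interceptor sees a trace before it is hashed. Returning false means the
// interceptor has parked the trace and will release it later.
type Interceptor interface {
	Intercept(tr *Trace) bool
}

// parkingInterceptor holds every trace until extra frames arrive for it.
type parkingInterceptor struct {
	parked map[uint64]*Trace
}

func (p *parkingInterceptor) Intercept(tr *Trace) bool {
	p.parked[tr.ID] = tr
	return false
}

// Complete appends the extra frames and returns the trace for reporting.
func (p *parkingInterceptor) Complete(id uint64, extra []string) (*Trace, bool) {
	tr, ok := p.parked[id]
	if !ok {
		return nil, false
	}
	delete(p.parked, id)
	tr.Frames = append(tr.Frames, extra...)
	return tr, true
}

// handleTrace runs the interceptors first; only traces that pass all of
// them are hashed and reported.
func handleTrace(tr *Trace, interceptors []Interceptor, report func(uint64, *Trace)) {
	for _, i := range interceptors {
		if !i.Intercept(tr) {
			return
		}
	}
	report(hashTrace(tr), tr)
}

func main() {
	park := &parkingInterceptor{parked: make(map[uint64]*Trace)}
	report := func(hash uint64, tr *Trace) {
		fmt.Printf("reported %d frames, hash %x\n", len(tr.Frames), hash)
	}

	// The trace is parked, so nothing is reported yet.
	handleTrace(&Trace{ID: 1, Frames: []string{"main", "launchKernel"}},
		[]Interceptor{park}, report)

	// Later, the missing data arrives and the completed trace is reported.
	if tr, ok := park.Complete(1, []string{"gpu_kernel"}); ok {
		handleTrace(tr, nil, report)
	}
}
```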
