
feat: Dynamic target PIDs and target_pids_file for collector#1249

Closed
shivanshuraj1333 wants to merge 1 commit into open-telemetry:main from shivanshuraj1333:feat/dynamic-target-pids

Conversation

@shivanshuraj1333
Member

The eBPF profiler currently instruments every process it sees. In large environments there should be a way to restrict which PIDs are used before instrumentation starts, but today there is no way to:

  1. Restrict profiling to a given set of PIDs.
  2. Change that set at runtime.

This PR adds the following functionality to solve those problems:

  • PID allowlist: Restrict instrumentation to a set of host PIDs via target_pids (static) or target_pids_file (path). Empty means instrument all.

  • Dynamic API: UpdateTargetPIDs(newSet), AddTargetPIDs(pids...), RemoveTargetPIDs(pids...) for runtime updates; removed PIDs are torn down.

  • ProcessManager: TrackedPIDs() and RemoveFromInstrumentation(pid) to support revoking instrumentation.

  • target_pids_file: When set, the internal controller polls the file every 10s and applies the parsed PID list (newline or comma-separated).

  • CLI: -target-pids=1234,5678 and OTEL_PROFILING_AGENT_TARGET_PIDS for standalone runs.
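
As a sketch of the parsing behavior described above for target_pids_file (newline- or comma-separated entries, empty meaning "instrument all"), a Go helper could look roughly like this. `parsePIDs` is a hypothetical name for illustration, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePIDs parses a PID list in the format described for
// target_pids_file: entries separated by newlines or commas.
// Hypothetical helper, not the PR's actual implementation.
func parsePIDs(raw string) ([]int, error) {
	fields := strings.FieldsFunc(raw, func(r rune) bool {
		return r == '\n' || r == ',' || r == '\r' || r == ' '
	})
	pids := make([]int, 0, len(fields))
	for _, f := range fields {
		pid, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid PID %q: %w", f, err)
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

func main() {
	pids, err := parsePIDs("1234,5678\n91011")
	if err != nil {
		panic(err)
	}
	fmt.Println(pids) // [1234 5678 91011]
}
```

An empty file would yield an empty slice, which under the semantics above means "instrument everything".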

@shivanshuraj1333 shivanshuraj1333 requested review from a team as code owners March 12, 2026 07:14
@shivanshuraj1333 shivanshuraj1333 marked this pull request as draft March 12, 2026 07:19
@florianl
Member

This request aligns with #358 and #351.

The eBPF profiler is designed as a whole-system profiler. Benchmarks of real-world deployments show that its overhead when profiling all processes is minimal and negligible. Adding filtering at the eBPF level would introduce noticeable complexity and possibly race conditions.

The Profiling SIG is working on making information needed for decisions like filtering on a PID easier to access. For this reason some changes were made to the Profiling signal that will be implemented in the eBPF profiler with #1234. Starting with #1234, the PID will be a regular resource attribute and can be used by any OTel component like any other resource attribute.

@christos68k
Member

Hi @shivanshuraj1333, as @florianl wrote, we do not plan to accept changes to the profiler core that will introduce filtering of processes since that's best performed by other components after profiles have been collected (e.g. using the regular OpenTelemetry Collector processing pipeline).

@shivanshuraj1333
Member Author

@florianl do you mean to say that even at scale there's no need to profile only a subset of PIDs?

Say there are thousands of processes running on my node and I want profile data for just a handful of them: why profile everything, send all that data to the collector, and filter and drop it there?

IMO we should allow filtering by process IDs, so any agent extending the eBPF profiler can provide the relevant PIDs and enable instrumentation for only those.

Something similar is being done in OBI as well; e.g. open-telemetry/opentelemetry-ebpf-instrumentation#1321 and open-telemetry/opentelemetry-ebpf-instrumentation#1388 do exactly that.

Do we have some benchmarks about profiling 10000s of processes vs a handful of processes somewhere? That'd be helpful.

@christos68k
Member

christos68k commented Mar 12, 2026

@florianl do you mean to say that even at scale there's no need to only profile a subset of PIDs?
Do we have some benchmarks about profiling 10000s of processes vs a handful of processes somewhere? That'd be helpful.

Number of processes is the wrong granularity to use if you're trying to model the agent's behavior. The agent is not really profiling 10000s of processes at any given moment in time but is bounded by the number of CPU cores on the system. Periodically, it interrupts code execution and begins unwinding the stack for every non-idle CPU core.

@shivanshuraj1333
Member Author

Thanks for clarifying, BUT

Two things I’m still weighing:

  1. Each sample still goes through the full HandleTrace path (symbolization, frame cache, reporting) for whatever PID was on that core, so the work per sample does depend on which PIDs we allow through.

  2. We call SynchronizeProcess for every PID in the events map, so how many processes we track and how much /proc work we do scale with that — an allowlist would cap it.
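
To illustrate the opt-in allowlist shape being argued for here (no behavior change when unset), a minimal sketch in Go. All names are illustrative, not the PR's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// pidAllowlist sketches the opt-in filter: an empty set means
// "instrument everything" (the default), a non-empty set restricts
// per-sample work to the listed host PIDs.
type pidAllowlist struct {
	mu   sync.RWMutex
	pids map[int]struct{}
}

// Allowed reports whether traces for pid should be processed.
func (a *pidAllowlist) Allowed(pid int) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	if len(a.pids) == 0 {
		return true // empty allowlist: profile the whole node
	}
	_, ok := a.pids[pid]
	return ok
}

// Update atomically replaces the set, e.g. after re-reading a PID file.
func (a *pidAllowlist) Update(pids []int) {
	set := make(map[int]struct{}, len(pids))
	for _, p := range pids {
		set[p] = struct{}{}
	}
	a.mu.Lock()
	a.pids = set
	a.mu.Unlock()
}

func main() {
	var a pidAllowlist
	fmt.Println(a.Allowed(42)) // true: empty set instruments all
	a.Update([]int{1234})
	fmt.Println(a.Allowed(42), a.Allowed(1234)) // false true
}
```

A check like `Allowed(pid)` early in the trace-handling path would let disallowed samples be dropped before symbolization and reporting.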

Even if we filter by PID in the collector (e.g. once #1234 lands), we still have to get the data there.

At scale - thousands of processes per node and thousands of nodes - “profile everyone and drop in the pipeline” means the collector has to receive, parse, and process every profile from every node before it can filter.

Doing PID filtering at the agent means we only instrument what we want, not the whole node.

I’m fine relying on downstream filtering if that’s the direction you prefer; I’m just not sure about the impact at that scale.

If there’s a doc or benchmark for “profile all, filter in pipeline” under those conditions, that would help me understand the tradeoff.

If an opt-in PID allowlist (no change when unset) is acceptable, we could keep the patch small and still give integrators a way to reduce load on the agent and the collector.

Thanks again, let me know your thoughts!

@christos68k
Member

christos68k commented Mar 12, 2026

At scale - thousands of processes per node and thousands of nodes - “profile everyone and drop in the pipeline” means the collector has to receive, parse, and process every profile from every node before it can filter.

There are multiple ways to deploy the profiling agent. The recommended deployment method is to run it inside an OTel collector as a receiver. Then, the entire system (collector+profiler) runs on the edge Node and can perform filtering for that Node before it forwards the filtered data to a second-level backend collector for additional processing. This means that the filtering overhead that you describe (for thousands of nodes) goes away, as filtering (and any other post-processing) stays local to the Node.

Doing PID filtering at the agent means we only instrument what we want not the whole node.

We've discussed this a couple of times and for now this is the direction that we want to pursue. This is not a disconnected opinion, as we've also been running the profiler in production environments for years, including on Nodes with 100+ CPU cores, and we're using that experience to guide our decisions.

That's not to say that we'll not re-evaluate in the future; just that the focus for now remains on whole-system profiling.

@shivanshuraj1333
Member Author

I see, but I don't understand what the drawback of instrumenting based on the provided PIDs is.

By default let's instrument the whole node, but if a list of PIDs is available, how about respecting it and instrumenting just those? I don't see any problems, only benefits, with this design. Thoughts?

@christos68k
Member

christos68k commented Mar 12, 2026

I see, but I don't understand what the drawback of instrumenting based on the provided PIDs is.

By default let's instrument the whole node, but if a list of PIDs is available, how about respecting it and instrumenting just those? I don't see any problems, only benefits, with this design. Thoughts?

As a maintainer I have to be cognizant of the second order effects (e.g. maintenance, testing burden, sidelining of focus, change of precedent) that a given contribution brings with it.

There are multiple use-cases for the eBPF profiler that do not align with our current focus. This doesn't mean that one can't still pursue them: downstream users can use and extend the OpenTelemetry eBPF profiler and keep their modifications local to themselves (outside of upstream).

@shivanshuraj1333
Member Author

sounds reasonable. If wanted, I can help with the maintenance of this part of the code; IMO it's a minimal change that can have a good payoff.

let me know if we want to have this in upstream, happy to help!

@damemi
Member

damemi commented Mar 12, 2026

@florianl @christos68k Out of curiosity, is there any reason to be concerned about the security of full-system profiling? Even though the output can be filtered, I can imagine that some end users might want to exclude certain apps from eBPF profiling.

I'll admit that I'm not familiar with the project's overall architecture, but for example if we were to offer this to an end user with sensitive applications how would you best address a security concern like that?

@christos68k
Member

christos68k commented Mar 12, 2026

I'll admit that I'm not familiar with the project's overall architecture, but for example if we were to offer this to an end user with sensitive applications how would you best address a security concern like that?

What is the exact security concern / threat model? The eBPF profiler loads and runs code inside the kernel that can arbitrarily access userspace memory. The information that the profiling agent currently extracts (read-only) is limited to symbols and some process / executable metadata. If this information is deemed security sensitive, then filtering it out in the same process the profiling agent executes in (typically running with elevated privileges) is one option.

If that won't do then one can always add more fine-grained filtering inside the profiler in a downstream solution.

@florianl
Member

One additional reason I'm not in favor of such a change is the complexity that it puts on users to manage/orchestrate the right PID(s) for such a filter.

Whether source code is sensitive information can be discussed, but other than lines of code, the profiler does not extract sensitive information that would justify security concerns, from my point of view.

If you want to see the eBPF profiler in a more isolated environment, my preference would be to make it sidecar-deployment compatible; see WIP like #1172.

@shivanshuraj1333
Member Author

One additional reason I'm not in favor of such a change is the complexity that it puts on users to manage/orchestrate the right PID(s) for such a filter.

I think there's a gap in communication here; the main idea I want to push for is like below:


User:
"I only want profiling data for svc A & svc B, BUT I also run thousands of other svcs on my node too"

eBPF profiler:

  • Okay, I'll instrument the whole node (all the thousands of processes)
  • The collector will process profiling data from the whole node, and then you can get profiling data for svc A and svc B by filtering it out

Drawbacks:

  1. too much data processing on the collector (imagine the same collector is also doing MELT processing)
  2. users may shy away from the profiler over security concerns, as they don't want it to touch other processes for whatever reason

Proposed solution to solve the above two problems:

  1. add a knob in the eBPF profiler to respect a PID list for instrumentation
  2. hide it behind a feature flag
  3. keep the default setting as "instrument the whole node"

Out of scope:

  • How the user pipes those PIDs to the profiler if they enable the flag

They can use a collector extension, or some piece of code, to pipe those PIDs to the profiler, but this is out of scope; we will only provide the knob to run the profiler against specific PIDs, that's it.


Now the user can be happy, as there's a way to do that dynamically.

The default setting here will still be "pipe all PIDs and instrument everything"; it's just that the user now has the freedom to tweak it if they are facing problems at the collector and want minimal data collection on the left side (the agent).

So, IMO it would be a great feature, maybe hidden behind a feature flag; having something like this in the profiler only gives the user a knob to tweak, if they have to.

@christos68k
Member

christos68k commented Mar 12, 2026

@shivanshuraj1333 I see code as a liability, which goes back to my previous comment re: focus and maintainability. A feature flag doesn't change the fact that there is more code for something we consider not a current focus, more potential failure modes, incompatibilities with the current process-handling model (calling processPIDExit in this way is a hack), and additional constraints on us, in that we now have to take the newly added logic into account when we make changes to those subsystems. Moreover, your PR contains a lot of references to "instrumentation", but that's not what the profiler is doing.

It seems to me that you'd be a lot happier if you implemented your ask downstream rather than trying to get it accepted upstream when the synergy isn't there.

I also agree with @florianl and his comment here: #1172 is much more in tune with the existing model of the profiler, but I still think it's not justifiable complexity-wise. However, if it can be drastically simplified as @florianl suggested, my current concerns would largely disappear.

@damemi
Member

damemi commented Mar 13, 2026

my question re: security was more in a general sense. we have a lot of users who are very conscious about running eBPF in production, and I could see the idea of full-system eBPF profiling being a no-go for them. it's hypothetical and proactive, but I was just asking @florianl and @christos68k how you would explain that to a user, say someone who asks, "we want to profile everything in this namespace, but for compliance/legal/whatever we cannot touch anything in that namespace".

it's clear that this isn't a feature that aligns with upstream, so at this point we're really just trying to understand it more, since there seems to be a gap in our assumptions about it. if that's the case, then even implementing it downstream would be the wrong move (building something that isn't correct). thanks for your time in answering our questions!
