feat: Dynamic target PIDs and target_pids_file for collector (#1249)
Conversation
This request aligns with #358 and #351. The eBPF profiler is designed as a whole-system profiler. Benchmarks of real-world deployments show that its overhead when profiling all processes is minimal and negligible. Adding filtering at the eBPF level would introduce noticeable complexity and possibly race conditions. The Profiling SIG is working on making information like the PID easier to access for filtering. For this reason some changes were made to the Profiling signal that will be implemented in the eBPF profiler with #1234. Starting with #1234, the PID will be a regular resource attribute and can be used by any OTel component like any other resource attribute.
Hi @shivanshuraj1333, as @florianl wrote, we do not plan to accept changes to the profiler core that will introduce filtering of processes since that's best performed by other components after profiles have been collected (e.g. using the regular OpenTelemetry Collector processing pipeline).
@florianl do you mean to say that even at scale there's no need to profile only a subset of PIDs? Say there are thousands of processes running on my node and I want profile data for just a handful of them: why do we want to profile everything, send that data to the collector, and filter and drop it there? IMO we should allow filtering by process IDs, so that any agent extending the eBPF profiler can just provide the correct PIDs and enable instrumentation for only those PIDs. Something similar is being done in OBI as well; e.g. open-telemetry/opentelemetry-ebpf-instrumentation#1321 and open-telemetry/opentelemetry-ebpf-instrumentation#1388 help with exactly that. Do we have benchmarks somewhere comparing profiling 10000s of processes vs. a handful of processes? That'd be helpful.
Number of processes is the wrong granularity to use if you're trying to model the agent's behavior. The agent is not really profiling 10000s of processes at any given moment in time but is bounded by the number of CPU cores on the system. Periodically, it interrupts code execution and begins unwinding the stack for every non-idle CPU core. |
Thanks for clarifying, but two things I'm still weighing:
Even if we filter by PID in the collector (e.g. once #1234 lands), we still have to get the data there. At scale, with thousands of processes per node and thousands of nodes, "profile everyone and drop in the pipeline" means the collector has to receive, parse, and process every profile from every node before it can filter. Doing PID filtering at the agent means we only instrument what we want, not the whole node. I'm fine relying on downstream filtering if that's the direction you prefer; I'm just not sure about the impact at that scale. If there's a doc or benchmark for "profile all, filter in pipeline" under those conditions, that would help me understand the tradeoff. If an opt-in PID allowlist (no change when unset) is acceptable, we could keep the patch small and still give integrators a way to reduce load on the agent and the collector. Thanks again, let me know your thoughts!
There are multiple ways to deploy the profiling agent. The recommended deployment method is to run it inside an OTel collector as a receiver. Then, the entire system (collector+profiler) runs on the edge Node and can perform filtering for that Node before it forwards the filtered data to a second-level backend collector for additional processing. This means that the filtering overhead that you describe (for thousands of nodes) goes away, as filtering (and any other post-processing) stays local to the Node.
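The edge deployment described above could be sketched as a collector configuration roughly like the following. This is a hypothetical sketch only: the receiver and processor names and the pipeline wiring are assumptions for illustration, not verified component names from the collector or the profiler.

```yaml
# Edge-node collector: the profiler runs as a receiver, so filtering
# (e.g. by PID resource attribute once #1234 lands) stays local to the node.
receivers:
  profiling:            # hypothetical name for the eBPF profiler receiver
processors:
  filter/pids: {}       # hypothetical processor dropping unwanted profiles
exporters:
  otlp:
    endpoint: backend-collector:4317   # second-level collector
service:
  pipelines:
    profiles:
      receivers: [profiling]
      processors: [filter/pids]
      exporters: [otlp]
```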
We've discussed this a couple of times and for now this is the direction that we want to pursue. This is not a disconnected opinion, as we've also been running the profiler in production environments for years, including on Nodes with 100+ CPU cores, and we're using that experience to guide our decisions. That's not to say that we'll never re-evaluate in the future, just that the focus for now remains on whole-system profiling.
I see, but I don't understand what the drawback of instrumenting based on provided PIDs is. By default, let's instrument the whole node, but if a list of PIDs is available, how about respecting it and instrumenting only those processes? I see no problems, only benefits, with this design. Thoughts?
As a maintainer I have to be cognizant of the second-order effects (e.g. maintenance, testing burden, sidelining of focus, change of precedent) that a given contribution brings with it. There are multiple use-cases for the eBPF profiler that do not align with our current focus. This doesn't mean that one can't still pursue them: downstream users can use and extend the OpenTelemetry eBPF profiler and keep their modifications local to themselves (outside of upstream).
Sounds reasonable. If we want, I can help with the maintenance of this side of the code; IMO it looks like a minimal change but can have some good output. Let me know if we want to have this in upstream, happy to help!
@florianl @christos68k Out of curiosity, is there any reason to be concerned about the security of full-system profiling? Even though the output can be filtered, I can imagine that some end users might want to exclude some apps from eBPF profiling. I'll admit that I'm not familiar with the project's overall architecture, but for example, if we were to offer this to an end user with sensitive applications, how would you best address a security concern like that?
What is the exact security concern / threat model? The eBPF profiler loads and runs code inside the kernel that can arbitrarily access userspace memory. The information that the profiling agent currently extracts (read-only) is limited to symbols and some process / executable metadata. If this information is deemed security sensitive, then filtering it out in the same process (typically running with elevated privileges) the profiling agent executes in is one option. If that won't do, then one can always add more fine-grained filtering inside the profiler in a downstream solution.
One additional reason I'm not in favor of such a change is the complexity it puts on users to manage/orchestrate the right PID(s) for such a filter. Whether source code is sensitive information can be discussed, but other than lines of code, the profiler does not extract sensitive information that would justify a security concern, from my point of view. If you want to see the eBPF profiler in a more isolated environment, my preference would be to make it sidecar-deployment compatible; see WIP like #1172.
I think there's some gap in communication here. The main idea I want to push for is as below:
User:
eBPF profiler:
Drawbacks:
Proposed solution to solve the above two problems:
Out of scope:
They can use a collector extension, or some piece of code, to pipe those PIDs to the profiler, but this is out of scope; we will only provide the knob to run the profiler against specific PIDs, that's it. Now the user can be happy, as there's a way to do that dynamically. The default setting here will still be "pipe all PIDs and instrument everything"; it's just that the user now has the freedom to tweak it if they are facing problems at the collector and want minimal data collection on the left side (the agent). So IMO it would be a great feature, maybe hidden behind a feature flag, and having something like this in the profiler only gives the user a knob to tweak if they have to.
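The opt-in semantics proposed above (empty allowlist means current behavior, non-empty restricts instrumentation) can be sketched in Go. This is an illustrative model only; the type and method names are made up for the example and are not the profiler's actual API.

```go
package main

import "fmt"

// pidAllowlist models the proposed opt-in filter: an empty set means
// "instrument everything" (today's default), a non-empty set restricts
// instrumentation to the listed host PIDs.
type pidAllowlist map[uint32]struct{}

// shouldInstrument reports whether a PID would be instrumented under
// the allowlist semantics described in the thread.
func (a pidAllowlist) shouldInstrument(pid uint32) bool {
	if len(a) == 0 {
		return true // unset allowlist: whole-system profiling
	}
	_, ok := a[pid]
	return ok
}

func main() {
	var all pidAllowlist // unset: no behavior change
	fmt.Println(all.shouldInstrument(1234)) // true

	restricted := pidAllowlist{1234: {}, 5678: {}}
	fmt.Println(restricted.shouldInstrument(1234)) // true
	fmt.Println(restricted.shouldInstrument(9999)) // false
}
```

The key design point is that the zero value changes nothing, which is what makes the knob safe to ship behind a feature flag.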
@shivanshuraj1333 I see code as a liability, which goes back to my previous comment to you re: focus and maintainability. A feature flag doesn't change the fact that there is more code for something we consider not a current focus, more potential failure modes, and incompatibilities with the current process handling model. It seems to me that you'd be a lot happier if you implemented your ask downstream rather than trying to get it accepted upstream, when the synergy isn't there. I also agree with @florianl and his comment here: #1172 is much more in tune with the existing model of the profiler, but I still think it's not justifiable complexity-wise. However, if it can be drastically simplified as @florianl suggested, my current concerns would largely disappear.
My question re: security was more in a general sense. We have a lot of users who are very conscious about running eBPF in production, and I could see the idea of full-system eBPF profiling being a no-go for them. It's hypothetical and proactive, but I was just asking @florianl and @christos68k how you would explain that to a user? Say someone asks, "we want to profile everything in this namespace, but for compliance/legal/whatever we cannot touch anything in that namespace". It's clear that this isn't a feature that aligns with upstream, so at this point we're really just trying to understand it more, since there seems to be a gap in our assumptions about it. If that's the case, then even implementing it downstream would be the wrong move (building something that isn't correct). Thanks for your time in answering our questions!
The eBPF profiler currently instruments every process it sees. In huge environments there should be a way to filter and provide which PIDs to use before starting instrumentation; today there's no way to:
This PR adds the below functionality to solve the above problems:
- PID allowlist: Restrict instrumentation to a set of host PIDs via `target_pids` (static) or `target_pids_file` (path). Empty means instrument all.
- Dynamic API: `UpdateTargetPIDs(newSet)`, `AddTargetPIDs(pids...)`, `RemoveTargetPIDs(pids...)` for runtime updates; removed PIDs are torn down.
- ProcessManager: `TrackedPIDs()` and `RemoveFromInstrumentation(pid)` to support revoking instrumentation.
- `target_pids_file`: When set, the internal controller polls the file every 10s and applies the parsed PID list (newline- or comma-separated).
- CLI: `-target-pids=1234,5678` and `OTEL_PROFILING_AGENT_TARGET_PIDS` for standalone runs.
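The `target_pids_file` format described above (PIDs separated by newlines or commas) could be parsed along these lines. This is an illustrative sketch of the described behavior, not the code from this PR; the function name is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePIDList parses the target_pids_file format described in the PR:
// host PIDs separated by newlines or commas. Blank entries are skipped;
// a malformed entry aborts the update with an error.
func parsePIDList(data string) ([]uint32, error) {
	var pids []uint32
	fields := strings.FieldsFunc(data, func(r rune) bool {
		return r == '\n' || r == ','
	})
	for _, tok := range fields {
		tok = strings.TrimSpace(tok)
		if tok == "" {
			continue
		}
		n, err := strconv.ParseUint(tok, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid PID %q: %w", tok, err)
		}
		pids = append(pids, uint32(n))
	}
	return pids, nil
}

func main() {
	pids, err := parsePIDList("1234,5678\n910")
	fmt.Println(pids, err) // [1234 5678 910] <nil>
}
```

A 10s polling loop would then diff the parsed list against the currently tracked PIDs and tear down instrumentation for any PID that disappeared.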