feat: add PID namespace translation support for sidecar deployments #1172
7za wants to merge 3 commits into open-telemetry:main
```c
if (ret < 0) {
  // Namespace PID/TGID lookup failed: log the target namespace identifiers
  // and return early.
  DEBUG_PRINT(
    "failed to get namespace PID/TGID (%llu, %llu)", target_pid_ns_dev, target_pid_ns_inode);
  return 0;
}
```
Isn't this incompatible with the whole-host nature of the profiler? E.g. won't this limit the processes that the profiler can profile to those seen from the namespace the profiler runs in?
Thank you for your comment!
This change does not restrict the profiler's native "whole-host" visibility. The eBPF program remains attached to the host kernel and continues to intercept events across the entire system. The `bpf_get_ns_current_pid_tgid` helper is used specifically to perform an in-kernel translation that retrieves the "container PID" only when a match is found with the target namespace.
Key points:
- This feature is disabled by default. If it is not explicitly configured, no translation is performed and the original behavior is preserved.
- This is specifically designed for sidecar deployments (e.g., using Grafana Alloy) where security constraints favor `shareProcessNamespace: true` over the more permissive `hostPID: true`.
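For context on the translation described above: `bpf_get_ns_current_pid_tgid` takes the device number and inode of the target PID namespace as its first two arguments. One way for userspace to obtain that pair is to `stat()` the `/proc/self/ns/pid` link. The sketch below assumes the target is the profiler's own PID namespace (as the commit message further down states); `struct pidns_id`, `get_own_pidns` and `pidns_equal` are hypothetical names, not code from the PR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical holder for the (dev, inode) pair identifying a PID
 * namespace, i.e. the values the eBPF side receives as
 * target_pid_ns_dev / target_pid_ns_inode. */
struct pidns_id {
    uint64_t dev;
    uint64_t ino;
};

/* Identify the PID namespace of the calling process by stat()-ing its
 * nsfs link. Returns 0 on success, -1 if the link cannot be read. */
static int get_own_pidns(struct pidns_id *out)
{
    struct stat st;

    if (stat("/proc/self/ns/pid", &st) != 0)
        return -1;
    out->dev = (uint64_t)st.st_dev;
    out->ino = (uint64_t)st.st_ino;
    return 0;
}

/* Two namespace identifiers refer to the same PID namespace only when both
 * the device number and the inode match; this mirrors the dev/inode pair
 * that bpf_get_ns_current_pid_tgid() checks against the current task. */
static bool pidns_equal(const struct pidns_id *a, const struct pidns_id *b)
{
    return a->dev == b->dev && a->ino == b->ino;
}
```

How these values are then passed into the eBPF program (global variable, map, etc.) is not shown in this thread and is left out of the sketch.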
I'll run some tests locally to better understand the use case. Just from reading the code, it looks like the whole-host visibility of the profiler is indeed restricted when running in this mode (we shouldn't look at the eBPF side in isolation, but at the entire profiler as a system), but maybe I'm misunderstanding.
For more context, we've had similar requests in the past (e.g. limiting profiling to "special" processes only for performance reasons) that we decided not to support. Maybe it's worth it to make an exception in this case, but let's first understand better what the tradeoffs are.
Just to make sure we are on the same page:
When running in this mode, the profiler translates each host PID into the corresponding PID in its own namespace. This is necessary when the application embedding the profiler runs inside a container without `hostPID: true` (for security reasons).
In that case, the profiling application can only see the processes running in the containers of the same pod (using `shareProcessNamespace`). This is where the translation is needed: to match the PID seen from inside the namespace with the PID reported by the host.
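The host-to-namespace PID mapping discussed here is also visible from userspace: since Linux 4.1, the `NSpid:` line of `/proc/<pid>/status` lists a process's PID in each nested PID namespace it belongs to, from outermost to innermost. The sketch below only illustrates that mapping; it is not how the PR works (the PR translates in-kernel via the BPF helper), and `parse_innermost_nspid` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Parse an "NSpid:" line from /proc/<pid>/status and return the last
 * (innermost-namespace) PID, or -1 if the line is not a valid NSpid line.
 * For example, "NSpid:\t48213\t7\n" yields 7. */
static long parse_innermost_nspid(const char *line)
{
    static const char prefix[] = "NSpid:";
    const char *p;
    long last = -1;

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    p = line + sizeof(prefix) - 1;

    for (;;) {
        char *end;
        /* strtol skips leading whitespace, including the tab separators. */
        long v = strtol(p, &end, 10);

        if (end == p)       /* no further number on the line */
            break;
        if (v <= 0)         /* PIDs are always positive */
            return -1;
        last = v;
        p = end;
    }
    return last;
}
```

A line with a single entry means the process is only in one PID namespace, so the host and namespace PIDs coincide.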
This is not for performance reasons, but because of deployment (as a sidecar) and security constraints (not using `hostPID: true`).
Using Alloy (>= v1.11.0) with a simple set of YAML manifests for kind (I can share them) is a good way to test this.
The question here is whether we want to support this execution mode and assume the maintenance/support burden. We designed the profiler to be a whole system profiler and as such we require the profiler to be able to access all processes running on the host (not just limited to the container the userspace process executes in) and thus run with hostPID: true.
Other configurations and deployment scenarios of course exist, but we're not required to support them. In the past we've turned away people with similar feature requests (ones that conflicted with whole-host profiling), which set a precedent. If we accept this PR, we'd both be going against this precedent and setting a new one.
Personally, I want to focus on the whole-host nature of the profiler and not be side-tracked with code that works against this paradigm but I'm not the only maintainer.
CC: @open-telemetry/ebpf-profiler-maintainers
Thank you for this clarification.
I get your point, and I understand the overall design behind it (a host profiler rather than a containerized profiler), although I didn't know that the eBPF profiler was required to work only in this mode.
When `enable_namespace_pid` is set, translate host PIDs/TGIDs into the profiler’s PID namespace so sidecar deployments report container PIDs (e.g. PID 1) instead of host PIDs. This is useful when the profiler is embedded in an application running in a sidecar container. The feature must be enabled through configuration and requires BTF support.
- Add a simple test checking the struct offset computation used by namespace PID translation.
- Tracer initialization fails if BTF is not available but namespace PID translation is enabled.
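The startup rule in the commit message (translation disabled: no change; translation enabled without BTF: refuse to initialize) can be sketched as a small validation function. This is a hypothetical illustration in C only; the profiler's userspace is written in Go, and both function names and the use of `/sys/kernel/btf/vmlinux` as the BTF availability probe are assumptions rather than details taken from the PR.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical probe: kernels built with CONFIG_DEBUG_INFO_BTF expose
 * their BTF type information at this path. */
static bool kernel_btf_available(void)
{
    return access("/sys/kernel/btf/vmlinux", R_OK) == 0;
}

/* Hypothetical validation of the namespace PID option:
 * - translation disabled: always OK, original behavior preserved;
 * - translation enabled and BTF present: OK;
 * - translation enabled without BTF: fail tracer initialization. */
static int validate_namespace_pid_config(bool enable_namespace_pid,
                                         bool btf_available)
{
    if (!enable_namespace_pid)
        return 0;
    if (!btf_available)
        return -ENOTSUP;
    return 0;
}
```

Keeping the BTF probe separate from the decision makes the rule testable without depending on the host kernel's configuration.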
Personally, I would prefer that we first reach general agreement on #1178. This would allow bumping the minimum supported Linux kernel version to 5.10 and would allow the use of BPF helpers like
Fixes open-telemetry#1178 and enables work on open-telemetry#1172 and open-telemetry#1257.
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This change introduces the ability for the eBPF profiler to translate host-level PIDs and TGIDs into their corresponding values within a target PID namespace.
In sidecar scenarios, the profiler often runs in a different PID namespace than the target application, leading to a mismatch between the PIDs reported in profiles (host view) and the PIDs seen by operators inside the container (typically PID 1).
Changes:
- `get_ns_current_pid_tgid` function to resolve namespaced IDs.
- `EnableNamespacePID` configuration options to control the translation logic.

Test:
Running kind (or any Kubernetes cluster) with a grafana/alloy sidecar container (built from this branch) that watches the processes of the other containers in the same pod. This works with `shareProcessNamespace` and does not require `hostPID: true`.