feat: add PID namespace translation support for sidecar deployments #1172

Open
7za wants to merge 3 commits into open-telemetry:main from 7za:ff10/nspid

@7za 7za commented Feb 13, 2026

This change introduces the ability for the eBPF profiler to translate host-level PIDs and TGIDs into their corresponding values within a target PID namespace.

In sidecar scenarios, the profiler often runs in a different PID namespace than the target application, leading to a mismatch between the PIDs reported in profiles (host view) and the PIDs seen by operators inside the container (typically PID 1).

Changes:

  • Implemented the get_ns_current_pid_tgid function to resolve namespaced IDs.
  • Added EnableNamespacePID configuration options to control the translation logic.
  • Implemented automatic PID namespace metadata retrieval in the Go component to feed the BPF RODATA variables.
  • Added safety checks to fall back to host PIDs if the namespace translation fails (e.g., on kernels without BTF support).

Test

Run kind or any Kubernetes cluster with a grafana/alloy sidecar container (compiled with this branch) that watches the processes of the other containers in the same pod. This works with shareProcessNamespace and doesn't require hostPID: true.

@7za 7za requested review from a team as code owners February 13, 2026 10:53

linux-foundation-easycla bot commented Feb 13, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@7za 7za marked this pull request as draft February 13, 2026 10:53
@7za 7za marked this pull request as ready for review February 13, 2026 10:54
@7za 7za marked this pull request as draft February 13, 2026 10:54
    if (ret < 0) {
      DEBUG_PRINT(
        "failed to get namespace PID/TGID (%llu, %llu)", target_pid_ns_dev, target_pid_ns_inode);
      return 0;
Member

Isn't this incompatible with the whole-host nature of the profiler? E.g. won't this limit the processes that the profiler can profile to those seen from the namespace the profiler runs in?

Author

Thank you for your comment!

This change does not restrict the profiler's native "whole-host" visibility. The eBPF program remains attached to the host kernel and continues to intercept events across the entire system. The bpf_get_ns_current_pid_tgid helper is used specifically to perform an in-kernel translation that retrieves the "container PID" only when a match is found with the target namespace.

Key points:

  • This feature is disabled by default. If not explicitly configured, no translation is performed, preserving the original behavior.
  • This is specifically designed for sidecar deployments (e.g., using Grafana Alloy) where security constraints favor shareProcessNamespace: true over the more permissive hostPID: true.
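
The behavior described above (disabled by default, translation only on a namespace match, host PID otherwise) can be modeled in userspace as a simplified sketch. This is not the PR's eBPF code: nsID, task and reportedPID are hypothetical names, the PID and inode values are made up for illustration, and the "no match" branch stands in for the helper returning an error.

```go
package main

import "fmt"

// nsID identifies a PID namespace by device and inode (hypothetical type).
type nsID struct{ Dev, Ino uint64 }

// task is a simplified view of a process: its host PID, the PID namespace
// it runs in, and its PID as seen from inside that namespace.
type task struct {
	HostPID uint32
	NS      nsID
	NSPID   uint32
}

// reportedPID returns the PID the profiler would report for t.
// With translation disabled, or when t is not in the target namespace,
// the host PID is kept, so no process is filtered out.
func reportedPID(enabled bool, target nsID, t task) uint32 {
	if !enabled || t.NS != target {
		return t.HostPID
	}
	return t.NSPID
}

func main() {
	target := nsID{Dev: 4, Ino: 4026532500} // illustrative values
	app := task{HostPID: 48213, NS: target, NSPID: 1}
	other := task{HostPID: 977, NS: nsID{Dev: 4, Ino: 4026531836}, NSPID: 977}

	fmt.Println(reportedPID(false, target, app))  // 48213: feature disabled
	fmt.Println(reportedPID(true, target, app))   // 1: namespace matches
	fmt.Println(reportedPID(true, target, other)) // 977: no match, host PID kept
}
```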

Member

@christos68k christos68k Feb 19, 2026

I'll run some tests locally to better understand the use case. Just looking at the code tells me that the whole-host visibility of the profiler is indeed restricted when running in this mode (we shouldn't only be looking at eBPF in isolation, but at the entire profiler as a system) but maybe I'm misunderstanding.

For more context, we've had similar requests in the past (e.g. limiting profiling to "special" processes only for performance reasons) that we decided not to support. Maybe it's worth it to make an exception in this case, but let's first understand better what the tradeoffs are.

Author

@7za 7za Feb 20, 2026

Just to be sure we are on the same page:
When running in this mode, the profiler itself translates the host PID to the corresponding namespace PID. This is necessary when the application embedding the profiler runs inside a container without hostPID: true (for security reasons).
In that case, the profiling application can only see the processes running inside containers of the same pod (using shareProcessNamespace). This is where the translation is needed, to match the PID seen from the namespace to the PID reported by the host.
This is not for performance reasons, but because of deployment (as a sidecar) and security constraints (not using hostPID: true).
Using Alloy (>= v1.11.0) with a simple collection of YAML manifests for kind (I can share it) can be a good way to test this.

Member

@christos68k christos68k Feb 20, 2026

The question here is whether we want to support this execution mode and assume the maintenance/support burden. We designed the profiler to be a whole system profiler and as such we require the profiler to be able to access all processes running on the host (not just limited to the container the userspace process executes in) and thus run with hostPID: true.

Other configurations and deployment scenarios of course exist, but we're not required to support them. In the past we've turned away people who had similar functionality requests (ones that conflict with whole-host profiling), which set a precedent. If we accept this PR, we'd be both going against that precedent and setting a new one.

Personally, I want to focus on the whole-host nature of the profiler and not be side-tracked with code that works against this paradigm but I'm not the only maintainer.

CC: @open-telemetry/ebpf-profiler-maintainers

Author

Thank you for this clarification.
I get your point, and I understand the overall design behind it (having a host profiler instead of a containerized profiler), although I didn't know that the eBPF profiler was required to work in this mode only.

7za added 2 commits February 17, 2026 10:18
When enable_namespace_pid is set, translate host PIDs/TGIDs into the
profiler’s PID namespace so sidecar deployments report container PIDs
(e.g. PID 1) instead of host PIDs.

This is useful when the profiler is embedded into an application running in a
sidecar container.

The feature needs to be enabled through configuration and requires BTF support
to be available.
- Add simple test checking struct offset computation used by namespace pid
  translation.
- Tracer initialization fails if BTF is not available but namespace PID
  translation is enabled.
@7za 7za marked this pull request as ready for review February 19, 2026 10:28
@7za 7za changed the title from "feat: add PID namespace translation support for sidecar deployments" to "[WIP] feat: add PID namespace translation support for sidecar deployments" Feb 19, 2026
@florianl
Member

Personally, I would prefer to reach general agreement on #1178 first. That would allow bumping the minimum supported Linux kernel version to 5.10 and using BPF helpers like bpf_get_ns_current_pid_tgid(), which would significantly reduce the complexity of this change.

florianl added a commit to florianl/opentelemetry-ebpf-profiler that referenced this pull request Mar 17, 2026
Fixes open-telemetry#1178

And enables work on open-telemetry#1172 and open-telemetry#1257.

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 21, 2026
Member

florianl commented Apr 8, 2026

@7za as #1310 got merged and bumped the minimum required Linux kernel version to 5.10, can you update this PR and use the bpf helper?

Author

7za commented Apr 8, 2026

> @7za as #1310 got merged and bumped the minimum required Linux kernel version to 5.10, can you update this PR and use the bpf helper?

Perfect, will do it, thank you

@github-actions github-actions bot removed the Stale label Apr 8, 2026