Skip to content

Fix unreadable Enhanced Session Recording events#64581

Open
capnspacehook wants to merge 4 commits intomasterfrom
capnspacehook/ebpf-tracing-refactor
Open

Fix unreadable Enhanced Session Recording events#64581
capnspacehook wants to merge 4 commits intomasterfrom
capnspacehook/ebpf-tracing-refactor

Conversation

@capnspacehook
Copy link
Contributor

@capnspacehook capnspacehook commented Mar 12, 2026

Sometimes the eBPF code isn't able to read the arguments of an execve call or the path of a file open call due to the kernel not having had a chance to page in the memory yet. Command and disk events previously traced execve and open syscalls and the eBPF code is called before the syscall is handled, before the kernel pages in the appropriate memory. The fix is to trace at more reliable points that happen after the kernel has had a chance to provision things a bit.

Instead of tracing execve syscalls we now hook when the scheduler in the kernel is starting a process, and instead of tracing open syscalls we hook when the LSM system checks if the caller is allowed to open a file.

The different trace/hook points result in slightly different events that are or are not emitted, but most users will likely never notice. The main difference is that instead of the path passed to the open syscall being emitted in an event, the absolute path of the path is emitted instead. By the time the LSM system is checking a file symlinks have been resolved for example.

The handling of arguments when handling command events was simplified, somewhat out of necessity. Previously when an execve call first started the eBPF code sent one event to the userland Go code for every argument in the traced execve call. When the execve call finished the rest of the details were sent to the Go code, and an audit event was emitted. This was easy to do as the arguments to the execve call were presented as an array of strings. When hooking the scheduler though we are given a large buffer containing all arguments delimited by null bytes. I opted to simplify the process of sending and receiving command arguments and just send the buffer over and letting Go split the buffer up.

Previously when parsing command event arguments we limited emitting N arguments and truncated each to M bytes. Now since the args are given as one big buffer I opted to just truncate the given args to fit in our buffer, which renders many of the argument handling specific sub-tests irrelevant.

I thought the file open code had to be expanded, as getting an absolute path from a struct path is no easy feat. I originally copied tracee's code that does so until I discovered bpf_d_path, a helper func that does exactly what I want. The only catch is it requires a kernel of 5.10, and we require 5.8 currently. I left in the commit that uses tracee's code just in case bumping the minimum kernel requirement is a non-starter.

Finally, because we previously traced syscalls the audit events contained a return code field which was the return value of the syscall invocation. Because we are now hooking the kernel's process scheduler and LSM internals a useful return code isn't available which means the same thing, so I don't set the return code for events.

Fixes https://github.com/gravitational/teleport-private/issues/2230.

Changelog: fix Enhanced Session Recording dropping command and disk events under specific circumstances

Manual Test Plan

Test Environment

All tests were preformed on both amd64 and arm64 hosts, and both with PAM disabled and enabled.
Additionally the ESR integration tests were run as well (they don't currently run in CI).
The integration tests can be run manually with make test-bpf

Test Cases

  • Verify that expected command audit events are emitted
  • Verify that expected disk audit events are emitted
  • Verify that expected network audit events are emitted

@capnspacehook capnspacehook added bpf Used to bugs with bpf and enhanced session recording. server-access labels Mar 12, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d932a819e0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@capnspacehook
Copy link
Contributor Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c335adbc3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +71 to +75
data->flags = BPF_CORE_READ(f, f_flags);
data->cgroup = bpf_get_current_cgroup_id();
data->audit_session_id = BPF_CORE_READ(task, sessionid);

bpf_map_delete_elem(&infotmp, &id);
bpf_ringbuf_submit(data, 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Initialize disk event return code before submit

After switching to bpf_ringbuf_reserve, this code never writes data->return_code before bpf_ringbuf_submit. The reserved ringbuf payload is not guaranteed to be zeroed, so SessionDisk.ReturnCode can contain stale values from prior records, which corrupts audit data for every emitted disk event and breaks consumers that rely on return codes to distinguish successful vs failed opens.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bpf Used to bugs with bpf and enhanced session recording. server-access size/sm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant