Conversation

@pinoOgni
Contributor

This PR is a proposal to try to find a more uniform way to print ebpf debug information.

I only modified a few debug prints as a test and wrote the bpf-print-format.md file to document the proposal.

I'm open to suggestions.

@pinoOgni pinoOgni requested a review from a team as a code owner November 27, 2025 09:44
@codecov

codecov bot commented Nov 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.85%. Comparing base (b4c3b17) to head (cf8fcf3).

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #941      +/-   ##
==========================================
+ Coverage   47.13%   55.85%   +8.71%
==========================================
  Files         255      255
  Lines       22139    22139
==========================================
+ Hits        10436    12366    +1930
+ Misses      11041     8935    -2106
- Partials      662      838     +176
```
| Flag | Coverage Δ |
| --- | --- |
| integration-test | 22.86% <ø> (?) |
| integration-test-arm | 0.00% <ø> (?) |
| integration-test-vm-${ARCH}-${KERNEL_VERSION} | 0.00% <ø> (?) |
| k8s-integration-test | 2.67% <ø> (?) |
| oats-test | 0.00% <ø> (?) |
| unittests | 47.13% <ø> (ø) |

Flags with carried forward coverage won't be shown.


@marctc
Contributor

marctc commented Nov 27, 2025

Is there a way to avoid passing __FUNCTION__ all the time? In the logs you changed, it's almost always there; I wonder if it could be part of the logger function itself.

@pinoOgni
Contributor Author

Is there a way to avoid passing __FUNCTION__ all the time? In the logs you changed, it's almost always there; I wonder if it could be part of the logger function itself.

Hi @marctc I hope I understand the question. I pass __FUNCTION__ so that the name of the current function is printed. I don't know of any other way for the logger function to print the name of the caller.

Contributor

@grcevski grcevski left a comment

This is a good idea; I'd propose that we think a bit more about how to do this so it has the best outcome. At the moment the log line is limited to 80 chars, since we send a lot of them and we want to capture as many as possible. I'm afraid that if we prefix the line with the function name, we'll end up losing valuable information, which is the values we keep.

@grcevski
Contributor

Another thing to watch out for is that kernel 5.15 and lower support only up to 3 arguments in the format, so adding one more will cause us to go to separate lines for many of our prints.

Perhaps the best way to do this is to ensure that all probes have printed the function name on all entry points.

@pinoOgni
Contributor Author

Another thing to watch out for is that kernel 5.15 and lower support only up to 3 arguments in the format, so adding one more will cause us to go to separate lines for many of our prints.

There's a section at the end of the document about this problem. The solutions I could think of were splitting or hard-coding the function name.

Perhaps the best way to do this is to ensure that all probes have printed the function name on all entry points.

Yes, but that's not enough. If there are two printouts with the same (or similar) message in two different probes, we could have a mix-up. The only way to ensure uniqueness is for each printout to print the function name.

I had also thought about adding an additional argument to some functions so that the probe name would be passed and then printed, but it seemed like too much.

@mmat11
Contributor

mmat11 commented Nov 27, 2025

This is a good idea; I'd propose that we think a bit more about how to do this so it has the best outcome. At the moment the log line is limited to 80 chars, since we send a lot of them and we want to capture as many as possible. I'm afraid that if we prefix the line with the function name, we'll end up losing valuable information, which is the values we keep.

Good point. I thought about this and wondered why we send data to userspace via a ringbuffer instead of reading directly from the trace pipe. That just requires tracefs/debugfs to be mounted, but I think debug will only be enabled in dev/test environments, so it should be ok.

That will have benefits such as:

  • halve the number of objects we compile
  • remove the limitation of 80 bytes per log event
  • keep the instruction count at a similar level to prod builds, since we won't need to use bpf_dbg_helper anymore
  • make the code simpler

Are there things I'm not taking into consideration for this approach?

@marctc
Contributor

marctc commented Nov 27, 2025

requires tracefs/debugfs to be mounted but I think debug will only be enabled in dev/test environments so it should be ok.

I don't know how tracefs/debugfs works, but sometimes we ask users to enable BPF logging to investigate bugs regardless of the environment they use to run OBI.

Contributor

@rafaelroquetto rafaelroquetto left a comment

Nice, I think this will definitely help with the current state of the onion.

In addition to my inline comments, I think we need some sort of linter to validate this (for the cases which cannot be enforced by code), otherwise people will eventually and unintentionally deviate from these rules.

```c
if (invocation) {
bpf_dbg_printk(
    "Found Go HTTP invocation, resetting the span id to %x%x", tcp->seq, tcp->ack);
bpf_dbg_printk("%s: found Go HTTP invocation, resetting the span id to seq=%x, ack=%x",
```
Contributor

I believe this is a mistake - the span id here is indeed meant to be the combination of %x%x, i.e. the concatenation of seq and ack.

Contributor Author

oops, thanks!

Addressing issues like logging multiple variables without a separator (e.g., avoiding `%x%x`). Example:

```c
bpf_dbg_printk("%s: found HTTP info, resetting the span id to seq=%x, ack=%x", __FUNCTION__, tcp->seq, tcp->ack);
```
Contributor

I get the point, but this is a bad example (see previous comment)

Contributor Author

yes, thanks

At the beginning of an eBPF probe (not every probe, as it can be too verbose!), print the function name in the following way:

```c
bpf_dbg_printk("=== %s ===", __FUNCTION__);
```
Contributor

we could wrap this in a macro, so that one simply writes:

bpf_dbg_log_func()


#### Function name in generic functions

To print the function name in a generic function, use the **`__FUNCTION__`** identifier in **every log statement** without the bounding **`===`**. The triple equals signs are reserved exclusively for entry points of eBPF probes. Example:
Contributor

I see the value here, but there's an issue as others have pointed out: some kernels only allow at most 3 arguments, and now we are "wasting" one with __FUNCTION__ - this has been a problem before, and you will notice that because of that some bpf_dbg_printk statements had to be split into at least 2 as a workaround.

This is why I've explicitly opted to spell out the function name as part of the format string rather than using __FUNCTION__ in tpinjector.c and friends. I think it's a fair trade-off, given we are dealing with eBPF, but YMMV.

Now, if we want something like that anyway, we could probably have a macro like this:

```c
#define bpf_func_printk(fmt, ...) \
    bpf_printk("%s: " fmt, __func__, ##__VA_ARGS__)
```

(or wrap around bpf_dbg_printk, I haven't tried with that).

If we do that, it should be a separate macro, whilst bpf_dbg_printk remains as is, giving the caller the flexibility to pick - think of it as a convenience macro.

It can happen, sometimes, that the number of arguments to print is 3, but when `__FUNCTION__` is included, the total reaches 4. In these cases, `bpf_dbg_printk` calls `bpf_printk`, which subsequently calls `___bpf_pick_printk`. This chain then selects and calls `__bpf_vprintk` (for 4+ arguments), which finally invokes `bpf_trace_vprintk`. Since `bpf_trace_vprintk` is only available starting in [kernel version 5.16](https://docs.ebpf.io/linux/helper-function/bpf_trace_vprintk/), two options are available to avoid errors when loading the eBPF program:
1. Split the print into two separate calls: one with one argument and one with two arguments, where `__FUNCTION__` is added to both.
Contributor

So, in light of this comment, perhaps the way is to simply provide the convenience macro above - makes it flexible.

@rafaelroquetto
Contributor

Good point. I thought about this and wondered why we send data to userspace via a ringbuffer instead of reading directly from the trace pipe. That just requires tracefs/debugfs to be mounted, but I think debug will only be enabled in dev/test environments, so it should be ok.

As @marctc pointed out, this is because 99.9% of the logs end users can provide us come from the usual OBI logs - people don't know how, or don't have access, to trace_pipe.

It also means log lines can sometimes get lost when the ring buffer is under pressure, but that's the nature of it.

@mmat11
Contributor

mmat11 commented Nov 27, 2025

I don't know how tracefs/debugfs works, but sometimes we ask users to enable BPF logging to investigate bugs regardless of the environment they use to run OBI.

As @marctc pointed out, this is because 99.9% of the logs end users can provide us come from the usual OBI logs - people don't know how, or don't have access, to trace_pipe.

Could mounting the host's /sys in the helm chart work?

@grcevski
Contributor

I don't know how tracefs/debugfs works, but sometimes we ask users to enable BPF logging to investigate bugs regardless of the environment they use to run OBI.

As @marctc pointed out, this is because 99.9% of the logs end users can provide us come from the usual OBI logs - people don't know how, or don't have access, to trace_pipe.

Could mounting the host's /sys in the helm chart work?

It might, but in a sense it's not always enough. For some platforms like GKE Autopilot, we need to explicitly declare that access as something we require, which reminds me I need to follow up on the update about the cgroup. Also, sometimes the trace pipe might be held by another program, although it's rare.

@grcevski
Contributor

grcevski commented Nov 27, 2025

Yes, but that's not enough. If there are two printouts with the same (or similar) message in two different probes, we could have a mix-up. The only way to ensure uniqueness is for each printout to print the function name.

I had also thought about adding an additional argument to some functions so that the probe name would be passed and then printed, but it seemed like too much.

I know, how about this? What if we changed bpf_dbg_printk to add the line for __FUNCTION__ automatically, but not do it for the userspace log shipping? Can we do macro magic here, assuming we keep the rule that we only do this if snprintf is available, which will also remove the 3-argument max?

@rafaelroquetto
Contributor

@mmat11 in addition to what @grcevski said, we removed the dependency on bpffs because it was not possible to mount it without privileged mode and messing with the k8s/docker AppArmor - I suspect the same could be true for debugfs and anything under /sys.

In my experience, we tend to know which customers have access to the node/host, and those few select ones are indeed able to provide us with trace_pipe output - for those who cannot, debugfs would fall into the same bucket, and we'd still need to live with the actual logs from stdout/userspace instead.

Some people literally only have access to logs from a web interface (e.g. Grafana Dashboard/logs) and they end up pasting that. Or sometimes granting us access to their web UI where we can check the logs ourselves, so regardless of trace_pipe and friends, we still need to be able to see logs coming via the ring buffer (and it sucks they get truncated).

@mmat11
Contributor

mmat11 commented Nov 27, 2025

@mmat11 in addition to what @grcevski said, we removed the dependency on bpffs because it was not possible to mount it without privileged mode and messing with the k8s/docker AppArmor - I suspect the same could be true for debugfs and anything under /sys.

In my experience, we tend to know which customers have access to the node/host, and those few select ones are indeed able to provide us with trace_pipe output - for those who cannot, debugfs would fall into the same bucket, and we'd still need to live with the actual logs from stdout/userspace instead.

Some people literally only have access to logs from a web interface (e.g. Grafana Dashboard/logs) and they end up pasting that. Or sometimes granting us access to their web UI where we can check the logs ourselves, so regardless of trace_pipe and friends, we still need to be able to see logs coming via the ring buffer (and it sucks they get truncated).

I see - I think my proposal is not viable for now, then. What about using dynamic log sizes, similar to large buffer events? It would at least remove the "truncated" limit.

@rafaelroquetto
Contributor

What about using dynamic log sizes, similar to large buffer events? It would at least remove the "truncated" limit.

@mmat11 I think that's a great idea, and probably the way to go.

One thing along those lines to consider (not sure if it's a good idea): if we want true structured logging, that can be achieved in this fashion - imagine something like this:

```c
obi_bpf_log("found TP", "tp", tp, "pid", pid);
```

this gets serialised into a "log event" struct that gets shipped to userspace, which can then decide to unpack it as JSON or as a line (similar to TracePrinter).

but I digress
