Remove bpfilter daemon and consolidate every feature in libbpfilter #463

Open
qdeslandes wants to merge 5 commits into facebook:main from qdeslandes:daemon-less

Conversation

@qdeslandes
Contributor

  • Eliminate the bpfilter daemon process; all filtering logic now lives in libbpfilter and is called directly by bfcli
  • Move the code generation engine (cgen/), BPF stubs (bpf/), context management (ctx.c), and ELF stub handling from src/bpfilter/ into src/libbpfilter/
  • Remove the IPC layer (request.c, response.c, io.c, ns.c) and the daemon entry point, systemd service, and socket
  • Rewrite bfcli to initialise the library context on startup, call libbpfilter APIs directly, and tear down on exit
  • Adapt all e2e tests to run without a daemon: helpers no longer start/stop a background process, and daemon-specific tests are replaced by equivalent persistence and namespace tests
  • Scrub every remaining "daemon" reference from comments, docs, and user-facing strings (external kernel headers excluded)

As it is now, this PR can be merged into main, but more polishing will be done before tagging a new release.

The cgen/, bpf/, ctx, and xlate modules were compiled directly into the
bpfilter binary. Move them into libbpfilter so the filtering logic lives
entirely in the shared library and the daemon is a thin entry point.
@meta-cla meta-cla bot added the cla signed label Mar 7, 2026
pindir_fd = bf_ctx_get_pindir_fd();
if (pindir_fd < 0)
    return bf_err_r(pindir_fd, "failed to get pin directory FD");

Claude: Global context leak on bf_ctx_setup early failure paths (must-fix)

After _bf_global_ctx = TAKE_PTR(_ctx) at line 393, the local _ctx is NULL and its _free_bf_ctx_ cleanup attribute becomes a no-op. If bf_ctx_get_pindir_fd() fails (line 395-396) or flock() fails (line 399-400), the function returns without freeing _bf_global_ctx. The caller in main.c does not call bf_ctx_teardown() on failure, so the global context, its BPF token fd, BTF state, and ELF stubs are all leaked.

Only the _bf_ctx_discover() failure path at line 404-405 properly calls _bf_ctx_free(&_bf_global_ctx).

Suggested fix: add _bf_ctx_free(&_bf_global_ctx) to the bf_ctx_get_pindir_fd and flock failure paths (matching the _bf_ctx_discover path), or restructure with a single goto err label.

_bf_global_ctx = TAKE_PTR(_ctx);
pindir_fd = bf_ctx_get_pindir_fd();
if (pindir_fd < 0)
    return bf_err_r(pindir_fd, "failed to get pin directory FD");
r = flock(pindir_fd, LOCK_EX | LOCK_NB);
if (r)
    return bf_err_r(-errno, "failed to lock pin directory");
r = _bf_ctx_discover();
if (r) {
    _bf_ctx_free(&_bf_global_ctx);
    return bf_err_r(r, "failed to discover chains");
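
The single-exit variant of the suggested fix could look roughly like the sketch below. It is a self-contained illustration rather than bpfilter code: `struct demo_ctx`, `demo_global`, `demo_ctx_free()`, `demo_setup_single_exit()` and the `fail_at` parameter are hypothetical stand-ins, and the three simulated steps only mimic the pin directory lookup, the lock, and chain discovery.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the library context. */
struct demo_ctx {
    int lock_fd;
};

struct demo_ctx *demo_global;

void demo_ctx_free(struct demo_ctx **ctx)
{
    assert(ctx);

    if (!*ctx)
        return;

    free(*ctx);
    *ctx = NULL;
}

/* fail_at simulates a failure in step 1 (pin directory), 2 (lock) or
 * 3 (discovery). 0 means every step succeeds. */
int demo_setup_single_exit(int fail_at)
{
    struct demo_ctx *ctx;
    int r;

    ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return -ENOMEM;

    /* Ownership moves to the global here, so every later failure
     * has to release the global, not the local. */
    demo_global = ctx;

    if (fail_at == 1) {
        r = -ENOENT;
        goto err;
    }

    if (fail_at == 2) {
        r = -EWOULDBLOCK;
        goto err;
    }

    if (fail_at == 3) {
        r = -EINVAL;
        goto err;
    }

    return 0;

err:
    /* One label covers all failure paths, so none can be forgotten. */
    demo_ctx_free(&demo_global);
    return r;
}
```

With this shape, adding a new fallible step after the ownership transfer only requires a `goto err`, and the global is always either fully set up or NULL when the function returns.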

if (r)
    return bf_err_r(r, "failed to load vmlinux BTF");

_ctx = calloc(1, sizeof(*_ctx));

Claude: BTF resource leak if calloc fails after bf_btf_setup succeeds (suggestion)

bf_btf_setup() is called at line 106 before calloc() at line 110. If calloc fails, _ctx is NULL, so the _free_bf_ctx_ cleanup attribute is a no-op and bf_btf_teardown() is never called. This leaks whatever global BTF state was initialized.

Suggested fix: move bf_btf_setup() to after the calloc succeeds, or add an explicit bf_btf_teardown() in the calloc failure path.

r = bf_btf_setup();
if (r)
    return bf_err_r(r, "failed to load vmlinux BTF");
_ctx = calloc(1, sizeof(*_ctx));
if (!_ctx)
    return -ENOMEM;
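
The first suggested fix (allocate before touching global state) can be sketched as follows. Everything here is hypothetical and exists only for illustration: `demo_btf_users` stands in for the process-wide BTF state, and the allocator is passed as a parameter purely so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter standing in for process-wide BTF state. */
int demo_btf_users;

int demo_btf_setup(void)
{
    demo_btf_users++;
    return 0;
}

void demo_btf_teardown(void)
{
    demo_btf_users--;
}

struct demo_btf_ctx {
    int unused;
};

/* Allocator that always fails, used to exercise the error path. */
void *demo_failing_calloc(size_t n, size_t size)
{
    (void)n;
    (void)size;
    return NULL;
}

int demo_btf_ctx_new(struct demo_btf_ctx **out,
                     void *(*alloc)(size_t, size_t))
{
    struct demo_btf_ctx *ctx;
    int r;

    assert(out);
    assert(alloc);

    /* Allocate first: a failure here leaves no global state behind. */
    ctx = alloc(1, sizeof(*ctx));
    if (!ctx)
        return -ENOMEM;

    /* Acquire the global resource only once the local one exists. */
    r = demo_btf_setup();
    if (r) {
        free(ctx);
        return r;
    }

    *out = ctx;
    return 0;
}
```

Calling `demo_btf_ctx_new(&ctx, demo_failing_calloc)` returns `-ENOMEM` and leaves `demo_btf_users` at 0, which is the property the review comment asks for.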

#include "cgen/prog/link.h"
#include "cgen/prog/map.h"

static int copy_hookopts(struct bf_hookopts **dest,

Claude: Static functions copy_hookopts and copy_set missing _bf_ prefix (must-fix)

Per doc/developers/style.rst and CLAUDE.md, static functions use a leading underscore with the _bf_ prefix (e.g., _bf_ctx_free()). Every other static function across src/libbpfilter/*.c follows this convention. These should be _bf_copy_hookopts and _bf_copy_set respectively.

static int copy_hookopts(struct bf_hookopts **dest,

static int copy_set(struct bf_set **dest, const struct bf_set *src)

"BF_REQ_CHAIN_PROG_FD failed");

return TAKE_FD(prog_fd);
return dup(cgen->handle->prog_fd);

Claude: bf_chain_prog_fd and bf_chain_logs_fd return bare dup() failure (-1) without proper errno (suggestion)

dup() returns -1 on failure, which happens to equal -EPERM on Linux. The API contract says "negative error value on failure", but returning -1 without converting errno means: (1) the caller cannot distinguish dup() failure from a permission error, and (2) error logging will report "Operation not permitted" rather than the actual error.

The codebase pattern for returning fds (e.g., _bf_ctx_gen_token, bf_ctx_get_pindir_fd) is to use _cleanup_close_ with TAKE_FD():

Suggested change
- return dup(cgen->handle->prog_fd);
+ _cleanup_close_ int fd = dup(cgen->handle->prog_fd);
+ if (fd < 0)
+     return bf_err_r(-errno, "failed to dup program fd for '%s'", name);
+ return TAKE_FD(fd);

Same applies to bf_chain_logs_fd at line 286.

return dup(cgen->handle->prog_fd);

return dup(cgen->handle->lmap->fd);
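
The difference between a bare `dup()` return and a negative errno value can be shown with plain POSIX calls. The helper name `demo_dup_fd()` is hypothetical. It does not use bpfilter's `_cleanup_close_` or `TAKE_FD()` macros, only the errno conversion the comment asks for.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: duplicate fd and return the new descriptor on
 * success, or a negative errno value on failure (never a bare -1). */
int demo_dup_fd(int fd)
{
    int new_fd = dup(fd);

    if (new_fd < 0)
        return -errno;

    return new_fd;
}
```

For an invalid descriptor, `dup(-1)` returns -1 and sets `errno` to `EBADF`, so `demo_dup_fd(-1)` yields `-EBADF` instead of a value that callers would read as `-EPERM`.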

@github-actions
github-actions bot commented Mar 7, 2026

Claude: review of facebook/bpfilter #463 (31553a8)

Must fix

  • Double-free / dangling global in bf_ctx_setup (src/libbpfilter/ctx.c:393): Commit 2 changed _bf_global_ctx = TAKE_PTR(_ctx) to a plain _bf_global_ctx = _ctx. Both pointers now alias the same allocation. On _bf_ctx_discover() failure, _bf_ctx_free(&_bf_global_ctx) frees the object, then the _free_bf_ctx_ cleanup on _ctx frees it again (double-free). On bf_ctx_get_pindir_fd / flock failure, the cleanup frees via _ctx but _bf_global_ctx is left dangling.
  • Static functions copy_hookopts and copy_set missing _bf_ prefix (src/libbpfilter/cli.c:27,377): Per doc/developers/style.rst, static functions use the _bf_ prefix. Should be _bf_copy_hookopts and _bf_copy_set.
  • Commit 5 subject typo, "bffps" should be "bpffs": Commit "lib: lock bffps during bf_ctx setup" has a transposition that will live permanently in git history.

Suggestions

  • BTF resource leak if calloc fails after bf_btf_setup (src/libbpfilter/ctx.c:103-110): bf_btf_setup() succeeds, then calloc fails → _free_bf_ctx_ is a no-op → BTF state leaks. Move bf_btf_setup() after calloc, or add explicit cleanup.
  • bf_chain_prog_fd/bf_chain_logs_fd return bare dup() failure (src/libbpfilter/cli.c:269,286): dup() returns -1 on failure, which equals -EPERM. Should use the _cleanup_close_ + TAKE_FD() pattern and convert errno, matching existing codebase patterns.
  • Missing <errno.h> include in cli.c (src/libbpfilter/cli.c:6): The file uses ENOMEM, EINVAL, ENOENT, EEXIST, EBUSY, ENODEV directly but relies on transitive inclusion. Nearly every other .c file in src/libbpfilter/ that uses errno constants includes <errno.h> directly.
  • Unit test coverage regressed, removed tests not replaced (tests/unit/libbpfilter/cli.c): 8 of 11 test functions removed. Now that the functions are direct calls they are more testable, but coverage dropped. Functions like bf_chain_set, bf_chain_load, bf_chain_attach, bf_chain_update have non-trivial logic that should have at least basic unit test coverage.
  • Persistence tests no longer verify cross-process state restoration (tests/e2e/persistence/restore_attached.sh): The old daemon tests verified that state survived a process restart. The new tests exercise discovery within a single sandbox session. Each bfcli invocation does call bf_ctx_setup/_bf_ctx_discover, but the assertion is weaker than before.
  • Commit 2 uses daemon as component but this PR removes it: The subject uses daemon: but commit 4 removes daemon from the canonical component list. lib or lib,cli would be more accurate.

Nits

  • Doxygen for bf_ctx_setup should note the bpffs_path lifetime requirement (src/libbpfilter/include/bpfilter/ctx.h): The string is stored as a raw pointer without copying. Since this is now a public API, the lifetime requirement should be documented.
  • Commits 3 and 4 use comma-separated components (tests,cli: and lib,cli,tests,doc:): The style guide defines component: subcomponent: description with a single component. Consider documenting this convention or using the primary component.
  • Commit 2 body describes "what" rather than "why": The body lists all the mechanical changes but could better explain the rationale (the first paragraph does well, the rest could be trimmed).

CLAUDE.md improvements

  • The ## Directory structure section in CLAUDE.md still lists src/bpfilter/ with cgen/, xlate.c, and bpf/ as daemon-specific directories. After this PR, those move to src/libbpfilter/ and the daemon directory is significantly reduced. The directory tree should be updated.
  • The ## Components section lists bpfilter - Daemon that generates and manages BPF programs. After this PR, the daemon no longer exists. This should be updated to reflect the new architecture.
  • The ## Commit messages Components list includes daemon. After this PR, daemon is no longer a valid component.

Replace the daemon IPC path in libbpfilter/cli.c with direct calls to
the cgen and ctx APIs, allowing bfcli to load, attach, and manage chains
without a running daemon.

Move ctx and elfstub headers to the public libbpfilter API, pull BTF
lifetime into bf_ctx_setup/teardown, remove namespace switching and
transient mode, and extend bfcli's stage-1 parser with global flags
(--verbose, --bpffs-path, --with-bpf-token, deprecated compat options).

Replace daemon e2e tests with namespace and persistence tests that
exercise bpffs-based chain discovery directly, without starting or
stopping the daemon process.

Remove the daemon lifecycle (start_bpfilter/stop_bpfilter) from the
test harness and propagate --bpffs-path through the ${BFCLI} variable
so all test invocations use the sandbox-local bpffs mount.

Skip bf_ctx_setup in bfcli when --dry-run is used so matcher tests
no longer require bpffs access. Fix clang-analyzer BitwiseShift
warning in --verbose option parsing.

Drop namespace unit tests and request namespace accessors that are no
longer part of the public API.

The daemon has been removed in favour of direct library calls. Update
comments, documentation, and user-facing strings that still referred to
"the daemon" so the terminology is consistent with the new architecture.

External kernel headers (src/external/) are left unchanged.

Lock bpfilter directory in bpffs to prevent concurrent access to the
chains and BPF objects.
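
For readers unfamiliar with directory locking, the idea behind this commit can be sketched with `open()` and `flock()`. This is an illustration only: `demo_lock_dir()` is a hypothetical helper, and the real code obtains its descriptor through `bf_ctx_get_pindir_fd()` instead of opening a path.

```c
#define _GNU_SOURCE /* O_DIRECTORY and O_CLOEXEC under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open dir_path and take an exclusive, non-blocking
 * advisory lock on it. Returns the locked fd, or a negative errno value.
 * The lock is dropped when the returned fd is closed. */
int demo_lock_dir(const char *dir_path)
{
    int saved_errno;
    int fd;

    assert(dir_path);

    fd = open(dir_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -errno;

    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        /* Save errno before close(), which may overwrite it. */
        saved_errno = errno;
        close(fd);
        return -saved_errno;
    }

    return fd;
}
```

Because `flock()` locks belong to the open file description, a second `demo_lock_dir()` call on the same directory fails with `-EWOULDBLOCK` while the first descriptor is still open, whether it comes from the same process or another one. The lock is advisory, so it only protects against callers that also take it.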
return bf_err_r(r, "failed to create new context");

- _bf_global_ctx = TAKE_PTR(_ctx);
+ _bf_global_ctx = _ctx;

Claude: Double-free on _bf_ctx_discover() failure; dangling _bf_global_ctx on other early-return paths (must-fix)

Commit 2 changed this from _bf_global_ctx = TAKE_PTR(_ctx) to a plain pointer copy. Now both _bf_global_ctx and _ctx point to the same allocation, and _ctx retains its _free_bf_ctx_ cleanup attribute.

_bf_ctx_discover() failure path (double-free): _bf_ctx_free(&_bf_global_ctx) frees the object and nulls the global, but _ctx still holds the stale pointer. On function return, the cleanup fires _bf_ctx_free(&_ctx) which sees non-NULL and frees again.

bf_ctx_get_pindir_fd() / flock() failure paths (dangling global): The cleanup frees through _ctx and nulls it, but _bf_global_ctx still holds the freed pointer. Any subsequent use of the global is use-after-free.

Suggested fix: either restore _bf_global_ctx = TAKE_PTR(_ctx) and add _bf_ctx_free(&_bf_global_ctx) to all early-return paths, or defer the global assignment to after all validation succeeds:

// defer assignment to success path only
r = _bf_ctx_discover();
...
_bf_global_ctx = _ctx;
_bf_global_ctx->lock_fd = TAKE_FD(pindir_fd);
TAKE_PTR(_ctx);

_bf_global_ctx = _ctx;
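
The "defer the global assignment" option can be sketched in isolation with the GCC/Clang `cleanup` attribute. All names below (`struct owned_ctx`, `owned_global`, `owned_ctx_take()`, `owned_free_calls`) are hypothetical, and the sketch assumes the fallible steps can run against the local pointer, which may not hold if discovery needs the global context.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct owned_ctx {
    int lock_fd;
};

struct owned_ctx *owned_global;
int owned_free_calls; /* counts real frees, for demonstration only */

void owned_ctx_free(struct owned_ctx **ctx)
{
    assert(ctx);

    if (!*ctx)
        return;

    free(*ctx);
    *ctx = NULL;
    owned_free_calls++;
}

/* Hypothetical TAKE_PTR-style helper: return the pointer and reset the
 * source so that its cleanup handler becomes a no-op. */
static inline struct owned_ctx *owned_ctx_take(struct owned_ctx **ctx)
{
    struct owned_ctx *taken = *ctx;

    *ctx = NULL;
    return taken;
}

/* discover_result simulates the outcome of the last fallible step. */
int owned_ctx_setup(int discover_result)
{
    /* Compiler extension: owned_ctx_free(&ctx) runs on every return. */
    __attribute__((cleanup(owned_ctx_free))) struct owned_ctx *ctx = NULL;

    ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return -ENOMEM;

    /* Failure before the transfer: the cleanup handler frees ctx
     * exactly once and the global is never touched. */
    if (discover_result < 0)
        return discover_result;

    /* Success: hand ownership to the global in a single step. */
    owned_global = owned_ctx_take(&ctx);
    return 0;
}
```

At every point in the function exactly one of `ctx` and `owned_global` owns the allocation, which rules out both the double-free and the dangling global described above.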

Comment on lines 103 to 110
assert(ctx);
assert(bpffs_path);

r = bf_btf_setup();
if (r)
    return bf_err_r(r, "failed to load vmlinux BTF");

_ctx = calloc(1, sizeof(*_ctx));

Claude: BTF resource leak if calloc fails after bf_btf_setup succeeds (suggestion)

bf_btf_setup() succeeds at line 103, then calloc fails at line 107. Since _ctx is NULL, the _free_bf_ctx_ cleanup is a no-op and bf_btf_teardown() is never called, leaking the global BTF state.

Suggested fix: move bf_btf_setup() after calloc succeeds, or add bf_btf_teardown() to the calloc failure path.

assert(ctx);
assert(bpffs_path);
r = bf_btf_setup();
if (r)
    return bf_err_r(r, "failed to load vmlinux BTF");
_ctx = calloc(1, sizeof(*_ctx));

#include "cgen/prog/link.h"
#include "cgen/prog/map.h"

static int copy_hookopts(struct bf_hookopts **dest,

Claude: Static functions copy_hookopts and copy_set missing _bf_ prefix (must-fix)

Per doc/developers/style.rst, static/internal functions use a leading underscore with the _bf_ prefix. Over 115 static functions across src/libbpfilter/ follow this convention (e.g. _bf_ctx_free, _bf_dynbuf_grow, _bf_parse_l4_port_range). These should be _bf_copy_hookopts and _bf_copy_set.

static int copy_hookopts(struct bf_hookopts **dest,

"BF_REQ_CHAIN_PROG_FD failed");

return TAKE_FD(prog_fd);
return dup(cgen->handle->prog_fd);

Claude: Bare dup() returns -1 on failure, conflating with -EPERM (suggestion)

dup() returns -1 on failure, which equals -EPERM. The API contract expects negative errno values. The codebase pattern for returning fds uses _cleanup_close_ with TAKE_FD() and proper errno conversion (see _bf_ctx_gen_token, bf_ctx_get_pindir_fd):

Suggested change
- return dup(cgen->handle->prog_fd);
+ _cleanup_close_ int fd = dup(cgen->handle->prog_fd);
+ if (fd < 0)
+     return bf_err_r(-errno, "failed to dup program fd for '%s'", name);
+ return TAKE_FD(fd);

Same pattern should be applied to bf_chain_logs_fd at line ~286.

return dup(cgen->handle->prog_fd);
