Skip to content

Ebuild backend#274

Open
Kangie wants to merge 8 commits intolinux-application-whitelisting:mainfrom
Kangie:emerge-backend
Open

Ebuild backend#274
Kangie wants to merge 8 commits intolinux-application-whitelisting:mainfrom
Kangie:emerge-backend

Conversation

@Kangie
Copy link
Contributor

@Kangie Kangie commented Dec 2, 2023

I stumbled across fapolicyd and thought "that sounds pretty cool" and "I bet I could teach it to read the portage database".

After a month or so of "Fun" this PR is the result.

It's currently a WIP but I figured that good enough to request feedback and see if (once I write some tests and validate that I didn't break anything else...) we can sneak it in before the next release.

I am a C novice, so please let me know what can be improved!

Highlights:

  • New ebuild backend
  • init -> extra (and categorised some files)

TODOs:

  • Investigate use of libmd
  • Go over this code after some sleep
  • Tests!
  • Documentation
  • Work out how to whitelist portage properly (we can handle this downstream)
  • Add openrc init files
  • Fix ASAN identified memory leaks in new code
  • Work out how I broke unescape_shell during the refactoring (I didn't, just noticed something extant with asan enabled, will investigate further)

@Kangie Kangie force-pushed the emerge-backend branch 3 times, most recently from 8a345e5 to a13fe6b Compare December 3, 2023 05:06
Copy link

@thesamesam thesamesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments. I'd consider rebasing and cleanly splitting the refactoring before adding the ebuild backend as well, like the changes to the Debian side & md5 parts

AS_HELP_STRING([--with-ebuild],[Use the ebuild database as a trust source]),
use_ebuild=$withval,use_ebuild=no)

if test x$use_ebuild = xyes ; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use AS_IF. Bare if has issues with quoting and is no longer recommended in the autoconf manual.

@Kangie Kangie force-pushed the emerge-backend branch 6 times, most recently from ff73fd5 to 1796a0f Compare December 9, 2023 01:30
@Kangie
Copy link
Contributor Author

Kangie commented Dec 10, 2023

@stevegrubb

Here's the other work I'm hoping to land before the next release, but if we don't make it I can snapshot the code for Gentoo after it's merged.

#277 has been raised to separate out the deb-backend changes; I'll rebase once that's merged so that there's less to review here.

Would you like me to split out this commit too, or should we squash all of the ebuild-related mess in the middle down into a few logical commits?

Edit: I have my TODO up the top, but aside from getting derailed by some 'not caused by my changes' ASAN stuff that I'm debugging in the background, I need to:

  • fix some memory leaks in the ebuild-backend code
  • Write some basic ebuild tests

On my todo, but not part of this PR, is adding a Gentoo CI/CD stage - there's some work that I'll need to do to come up with something suitable for downstream use. I suppose a short-term alternative, as the ebuild backend adds no new dependencies, would be to tarball that up as test data and just extract it in the test container?

I'll ping you again once I think the code is ready for review, but if you have any early feedback I'm willing to address it. :)

Edit the second: It won't make this PR / release, but would you be receptive to a Meson build system as an alternative to (or replacement for) autotools? It's starting to get to me a bit, so I'm willing to put in the porting work.

@Kangie Kangie force-pushed the emerge-backend branch 4 times, most recently from bffb243 to e4eabce Compare December 10, 2023 05:09
@stevegrubb
Copy link
Member

PR #277 appears ready to merge. Do we want to do that first and then rework this patch (if needed)? I have not yet looked this patch over.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from 7c256e3 to 2f3e330 Compare December 16, 2023 02:59
@Kangie
Copy link
Contributor Author

Kangie commented Dec 17, 2023

PR #277 appears ready to merge. Do we want to do that first and then rework this patch (if needed)? I have not yet looked this patch over.

Yes, please - that's a prereq for this PR. I've had some people better at C than I provide some feedback offline and I'll be addressing it over the next few days.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from b239ec1 to 08b0dfa Compare December 18, 2023 12:17
@Kangie
Copy link
Contributor Author

Kangie commented Dec 18, 2023

I've adapted the deb test to provide some basic assurances that the ebuild backend can read a vdb. I spent a bit of time today working on generating a psuedo-vdb that would enable us to test this without relying on a real Portage installation / database, but I didn't get there in the end. I'll keep plugging away at it in the background, but it may not make it in here.

I'm reasonably happy with the current state, reading the VDB is fast enough, though I could stand to work out if there's anything else on a Gentoo system can be safely filtered to reduce the amount of md5ing that happens when populating the trust db.

Happy to address any feedback!

image

@Kangie Kangie changed the title [WIP] Ebuild backend Ebuild backend Dec 18, 2023
@stevegrubb
Copy link
Member

stevegrubb commented Dec 18, 2023

OK, the other patch was merged which appears to create a couple conflicts that need fixing. I'll see if I can look this PR over soon.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from 3149256 to 65041a5 Compare December 19, 2023 12:43
@stevegrubb
Copy link
Member

Was it really necessary to rename init.d to extra? That added a whole lot of churn.

There is a lot of code in /usr/share. It's in the HFS docs as the place for purely interpreted libraries.

Have you compiled and run this with address/undefined behavior sanitizers?

@Kangie
Copy link
Contributor Author

Kangie commented Jan 15, 2024

Apologies for the delayed response, I got deployed to help out with the response to the local severe weather event - today is my first day back at the keyboard.

Was it really necessary to rename init.d to extra? That added a whole lot of churn.

It seemed like the best way to separate the systemd init scripts, new openrc init stuff, and the application data / configuration that was all stored together under the 'init' folder. If you have a better way of organising it (or just one that you prefer) I'm happy to amend that commit.

There is a lot of code in /usr/share. It's in the HFS docs as the place for purely interpreted libraries.

In Gentoo it's 'architecture independent application data which is not modified at runtime.', but I do see some python, JS, and... C header files in amongst all the other binary files installed there on my development box.

I'll have to come up with a better filter. I'll force push and drop that for the short term.

Have you compiled and run this with address/undefined behavior sanitizers?

Yes. I actually have a branch where I'm tracking down some weirdness that I thought I caused but apparently existed in the code before I started touching it. I have a minimal broken example so if I can't work that out I'll log a bug:

library/escape.c:100:9: runtime error: load of address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
 72 6f 6f 74 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00
             ^ 
    #0 0x55bd93a119a6 in unescape_shell library/escape.c:100
    #1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
    #2 0x55bd939f8661 in main daemon/fapolicyd.c:637
    #3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
    #4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
    #5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)

library/escape.c:116:7: runtime error: store to address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
 72 6f 6f 74 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00
             ^ 
    #0 0x55bd93a11cad in unescape_shell library/escape.c:116
    #1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
    #2 0x55bd939f8661 in main daemon/fapolicyd.c:637
    #3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
    #4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
    #5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)

I am currently leaking 13 bytes allocated here that I haven't been able to track down, I'll try and look into that soon but I'm still recovering from working a week and a half of my vacation!

@stevegrubb
Copy link
Member

stevegrubb commented Jan 16, 2024

Hello, sounds like you've been busy. I think it might be best to not change the name of the directory as part of this patch set. That can/should be done separately. The idea is to minimize the changes in case we need to bisect back to this. The original idea was that everything related to config and initialization would live in the init directory. Let's address this after this patch is merged. Also, I think we should schedule this patch for after the 1.3.3 release so that we can get other bug fixes out to the community. Maybe since this is a new platform being on-boarded, we call the next release 1.4.

Regarding /usr/share, you might look at how we do this for rpm. We use a config file, fapolicyd-filter.conf. There is a library/filter.h file that shows the API that uses it. We had to resort to this because we found that we may be getting rid of files that the user really wants to trust. So, we gave them a way to take control.

As for the address sanitizer issue, I think that should be filed separately and treated as a 1.3.3 release blocker. I can see an issue it my testing too. Thanks for spotting it. Issue #281 was opened for this. @radosroka , have you seen the above address sanitizer issue?

@Kangie
Copy link
Contributor Author

Kangie commented Jan 18, 2024

I think it might be best to not change the name of the directory as part of this patch set. That can/should be done separately. The idea is to minimize the changes in case we need to bisect back to this. The original idea was that everything related to config and initialization would live in the init directory

Would you be happy for a patch/PR that moves the existing files into the new structure so that the 'gentoo PR' can just add the openrc files, or is there a strong preference for sticking with the existing approach?

Also, I think we should schedule this patch for after the 1.3.3 release so that we can get other bug fixes out to the community. Maybe since this is a new platform being on-boarded, we call the next release 1.4.

No worries, less time pressure to look at filtering. 😄

Gentoo has a binary package host now, so it will be sane to add a containerised CI/CD option where it still wasn't late last year - I'll look into that too if I have time!

Regarding /usr/share, you might look at how we do this for rpm. We use a config file, fapolicyd-filter.conf. There is a library/filter.h file that shows the API that uses it. We had to resort to this because we found that we may be getting rid of files that the user really wants to trust. So, we gave them a way to take control.

Will do.

@radosroka
Copy link
Member

library/escape.c:100:9: runtime error: load of address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
72 6f 6f 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
#0 0x55bd93a119a6 in unescape_shell library/escape.c:100
#1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
#2 0x55bd939f8661 in main daemon/fapolicyd.c:637
#3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
#4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
#5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)

library/escape.c:116:7: runtime error: store to address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
72 6f 6f 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
#0 0x55bd93a11cad in unescape_shell library/escape.c:116
#1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
#2 0x55bd939f8661 in main daemon/fapolicyd.c:637
#3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
#4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
#5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)


I am currently leaking 13 bytes [allocated here](https://github.com/linux-application-whitelisting/fapolicyd/pull/274/files#diff-b2e2388c4837a311031ffe6082dfdc129c60bad8def1a9f1fbe5f73ad3f90622R284) that I haven't been able to track down, I'll try and look into that soon but I'm still recovering from working a week and a half of my vacation!

Is this from the daemon itself or CLI?

@Kangie
Copy link
Contributor Author

Kangie commented Jan 25, 2024

It's from the daemon itself.

@radosroka
Copy link
Member

It's from the daemon itself.

And was it on this branch or main?

@Kangie
Copy link
Contributor Author

Kangie commented Apr 12, 2024

Not forgotten, just had a few other projects eat up my time. I'll aim to get this finished off for late April / Early may; I have some leave coming up :)

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from f513fbf to 396803f Compare October 26, 2024 09:15
@Kangie
Copy link
Contributor Author

Kangie commented Oct 26, 2024

OK I put fvwm3 to bed and finally have some time for this.

Thanks for the pointer on the filter - I'm almost there with it's implementation - I seem to be having some difficulty getting anything to match the filters though, which is annoying. I'll try and track that down over the next little bit.

All that I can tell for now is that the filter file is loaded (just a slightly modified fedora config because our kernel path is /usr/src/linux-*gentoo/) , does not seem to be invalid (as filter_load_file seems happy) however none of the full paths that I pass to filter_check seem to result in a match.

 - usr/src/linux-*/
  + */scripts/*
  + */tools/objtool/*

My "sophisticated debugging" techniques here seem to reveal that filter_check always returns 1:

10/26/24 19:06:22 [ DEBUG ]: filter_check: /usr/src/linux-6.10.1-gentoo/drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp10b.c
10/26/24 19:06:22 [ DEBUG ]: filter_check: result 1
10/26/24 19:06:22 [ DEBUG ]: Adding /usr/src/linux-6.10.1-gentoo/drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp10b.c
10/26/24 19:06:22 [ DEBUG ]:    Expected MD5: 8ff4c8c354f0ea101adb83290e56679b
10/26/24 19:06:22 [ DEBUG ]:    File size: 2076

I also note that /usr/include, a default excluded path in the fedora config (if I'm reading that correctly) is also returning a 1:

+ /
 - usr/include/
10/26/24 19:44:06 [ INFO ]: Adding app-text/ghostscript-gpl-10.03.1:0/10.03 (::gentoo) to the ebuild backend; 341 files
10/26/24 19:44:06 [ DEBUG ]: filter_check: /usr/include/ijs/ijs_server.h
10/26/24 19:44:06 [ DEBUG ]: filter_check: result 1
10/26/24 19:44:06 [ DEBUG ]: Adding /usr/include/ijs/ijs_server.h
10/26/24 19:44:06 [ DEBUG ]:    Expected MD5: 61b6bc496441226d94acbf3bec7c4c65
10/26/24 19:44:06 [ DEBUG ]:    File size: 4223

Otherwise we seem to be looking pretty good; after 15 minutes or so in permissive mode; it's even happy with the package manager (at least in terms of it removing things!):

10/26/24 19:24:05 [ INFO ]: shutting down...
10/26/24 19:24:05 [ DEBUG ]: rule=1 dec=allow perm=open auid=1000 pid=459150 exe=/usr/bin/python3.12 : path=/var/tmp/portage/._unmerge_/sys-kernel/gentoo-sources-6.11.4/temp/build.log ftype=text/plain trust=0
10/26/24 19:24:05 [ DEBUG ]: Exiting decision thread
10/26/24 19:24:05 [ DEBUG ]: Update poll interrupted
10/26/24 19:24:05 [ DEBUG ]: Update poll timeout expired
10/26/24 19:24:06 [ DEBUG ]: Inter-thread max queue depth 7
10/26/24 19:24:06 [ DEBUG ]: Allowed accesses: 110003
10/26/24 19:24:06 [ DEBUG ]: Denied accesses: 12
10/26/24 19:24:06 [ DEBUG ]: Trust database max pages: 256000
10/26/24 19:24:06 [ DEBUG ]: Trust database pages in use: 66174 (25%)
10/26/24 19:24:06 [ DEBUG ]: Subject cache size: 1549
10/26/24 19:24:06 [ DEBUG ]: Subject slots in use: 1469 (94%)
10/26/24 19:24:06 [ DEBUG ]: Subject hits: 108546
10/26/24 19:24:06 [ DEBUG ]: Subject misses: 4798
10/26/24 19:24:06 [ DEBUG ]: Subject evictions: 3329 (3%)
10/26/24 19:24:06 [ DEBUG ]: Object cache size: 8191
10/26/24 19:24:06 [ DEBUG ]: Object slots in use: 8184 (99%)
10/26/24 19:24:06 [ DEBUG ]: Object hits: 101831
10/26/24 19:24:06 [ DEBUG ]: Object misses: 59834
10/26/24 19:24:06 [ DEBUG ]: Object evictions: 51650 (50%)

typedef struct contents {
char *md5;
char *path;
} ebuildfiles;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but this describes a single file, so it should properly be ebuildfile - otherwise it is confusing if the object is an array of file objects or just a single file.

Comment on lines +32 to +33
#include "fapolicyd-backend.h" // for SRC_EBUILD, backend
#include "filter.h" // for filter_destroy, filter_init, filter_l...
#include "llist.h" // for list_empty, list_init
#include "md5-backend.h" // for add_file_to_backend_by_md5
#include "message.h" // for msg
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From other code in this repository, 8 column tabs are used; these comments don't align with the others when using 8 column tabs.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from 156e552 to 1d9c07d Compare December 3, 2025 22:01
Kangie and others added 3 commits December 4, 2025 08:08
This commit introduces a new backend for handling Gentoo ebuilds via the
paortage var database (VDB), allowing fapolicyd to trust files managed
by the Portage package manager. The implementation is modeled after the
existing Debian (MD5) and RPM backends.

Key changes:
- Implemented `ebuild-backend` to ingest trust data from the ebuild VDB.
- Integrated the new backend into the backend manager and database structures.
- Added OpenRC init scripts (`init.d` and `conf.d`) to support
  Gentoo and other non-systemd environments.
- Updated the build system to exclude OpenRC files when building RPMs.

This enables fapolicyd to function on Gentoo-based systems by
automatically trusting installed packages.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
If the main event loop and decision thread try to write out at the same
time, this can result in interleaved output like this:

```
12/03/25 10:28:29 [ DEBUG ]: Main poll interrupted
12/03/25 10:28:29 [ DEBUG12/03/25 10:28:29 [ DEBUG ]: Main poll interrupted
 ]: rule=6 dec=allow perm=open auid=1000 pid=15627 exe=/usr/bin/zsh : path=/usr/lib64/ld-linux-x86-64.so.2 ftype=application/x-sharedlib trust=1
. . .
12/03/25 10:28:29 [ DEBUG ]: Main poll interrupted
12/03/25 10:28:29 [ 12/03/25 10:28:29 [ DEBUG ]: rule=29 dec=allow perm=open auid=1000 pid=15627 exe=/usr/bin/env : path=/etc/ld.so.cache ftype=application/octet-stream trust=0
DEBUG ]: Main poll interrupted
```

Add a `pthread_mutex` to serialise access to `stderr` and ensure that
output comes out as desired.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
My initial implementation of colour logging output unconditionally put
ANSI escape sequences in log output.

Improve this by respecting `NO_COLOR` and detecting if `stderr` is a tty,
and only adding colours if that's not the case (i.e. interactive
operation).

Signed-off-by: Matt Jolly <kangie@gentoo.org>
Lower level APIs, like `md5` have been deprecated since OpenSSL 3.0. Use
the modern high-level `EVP` (envelope) API instead.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
The `sanity_check_array` function in `src/library/object.c` was accepting
a mutable `o_array *` but was being called with `const o_array *` from
`object_access` and `object_find_file`.

Since the function only reads from the array, the signature was updated
to accept `const o_array *` to resolve the "discards 'const' qualifier"
warning.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
When adding items to the subject or object arrays, the code previously
unconditionally incremented the item count (`cnt`). If an entry already
existed at the target index (e.g., overwriting an existing attribute),
this caused the count to drift from the actual number of elements.

This desynchronisation triggered the `sanity_check_array` assertion:
`[ DEBUG ]: subject_add 2 - array corruption 3!=4`

This patch modifies `subject_add` and `object_add` to check for existing
entries. If an entry exists, it is properly freed and replaced without
incrementing the count. The count is now only incremented when a new
slot is filled.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
If we exit during early init (say by pressing `^C` during the ebuild
backend's long init before the queue has been initialised), it could
cause a crash.

Modify `nudge_queue` to safely handle this case and properly exit early.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
This commit introduces a meson build system option for fapolicyd.

It has the same functionality as the existing autotools implementation,
and tests are wired up.

Signed-off-by: Matt Jolly <kangie@gentoo.org>
@Kangie
Copy link
Contributor Author

Kangie commented Dec 3, 2025

OK it's been a hot minute. Revisited the code and fixed my issue with filters (oops, it was trivial).

The ebuild backend is not quick to initialise (and will never be while we have to re-hash every file to get a sha256), but it is functional and I'm not unhappy with the results. If consumers are interested, the next step would be to update the VDB format on-disk to use SHA256 instead of MD5, which would be performant and avoid re-hashing. That's on us downstream though, and the current state is perfectly fine as a demonstration that parsing the VDB format for fapolicy is usable.

The ebuild backend is self contained and does not rely on any package-manager dependencies. Some minimal VDB parsing tests have been included; since these tests should pass on any build (and currently are) I don't see too much value in adding additional Gentoo CI, but I'm happy to do so if you want "modern compiler" checks as I will be packaging this downstream and we tend to run bleeding-edge GCC - let me know if that's desirable.

ASAN and Valgrind are happy, and I've included a few bugfixes that I encountered along the way:

Logging:

  • respect NO_COLOR and check for tty before using colour.
  • Interleaved message output caused by two threads writing logs at the same time

Stability:

  • Early exit before the queue has been setup results in crash
  • sanity_check_array caught some array corruption
    • fapolicyd was dying some minutes after init. I traced this to cnt incrementing even when overwriting existing entries, causing an overflow. The fix prevents the increment on overwrite. Might only happen with massive ebuild backend databases?
    • Definitely needs a review, but my fapolicyd has been running solidly since running this fix.

QA:

  • sanity_check_array - fix "discards 'const' qualifier" warning

While reviewing the MD5 code (and the deprecation warning annoying me) I took the opportunity to update it (and SHA hashing) to use the modern/high-level OpenSSL EVP API. It seems fine, and results are consistent with previous behaviour but I'd definitely look over it. Looking over recent commits, this may have been on the agenda for the next release already?

Additionally, while I was waiting for ASAN and Valgrind in some very long ebuild backend inits I implemented a quick meson port. It's fast, has feature-parity with the existing autotools build, has tests wired up, and provides a nice summary at the end of the configure stage. If you're interested in merging this I'm happy to update CI to cater for it, too.

This is a big PR, and although I've done my best to isolate changes into their own commits I'm happy to split any change off into an individual PR for separate review, drop any changes that you're not interested in, or respond to any feedback on the code. As-is we're passing tests and I've been able to run fapolicyd in Valgrind (in permissive mode) for >24h without issue or any leaks showing up.

autoreconf -f --install && ./configure --with-audit --with-rpm --with-deb  )  12.79s user 3.01s system 158% cpu 9.997 total
( meson setup build_meson -Daudit=true -Drpm=enabled -Ddeb=enabled && meson  )  2.79s user 1.36s system 299% cpu 1.383 total

I'm helping out with recovery from some severe storms we had on this side of the timecube, so my availability will be a little limited in the near future, but I'll do my best to respond to any feedback promptly.

@stevegrubb - When you have time, let me know what changes you'd like. No rush given that it's been about a year since I last had time to progress this one :)

@stevegrubb
Copy link
Member

stevegrubb commented Dec 13, 2025

This patch is trying to do a lot:

  1. Move files around
  2. Fix things in the main program unrelated to ebuild
  3. Add ebuild backend

These should be separate patches so they can be reviewed and accepted on their own merit. The ebuild backend should probably be done with a new PR with an up to date commit message so that we have a merge with an appropriate introduction. I would suggest making the PR's in the common code/moving files around first so that changes I mention below don't cause this whole patch to have conflicts. Better to make smaller PRs to reduce merge surface area. There are going to be some patches landing soon that may conflict with this, so we should probably get through this asap.

ebuild-backend - it would be preferable if you followed the project code style so that it has a common look and feel. It follows the linux kernel programming style. Also, it's nice that you followed the memfd approach, that is much faster and doesn't waste as much memory. On my fedora system with a little over 6000 rpms, startup time is around 3 seconds. Of course it doesn't have to rehash everything. We did a whole lot of performance work on the rpm backend to get it there. You might look at the commit history of that file to see what performance changes we had to make.

In the rpm-backend, filter_check is called when processing a file in the package. If it is not a keeper, you do not need to malloc, hash, and save it on a list. Just move on to the next file. The point is to call the filter when the path is assembled and it's known to be a regular file to see if there's any reason to continue with it.

Other notes, SHA256() is on the hot path. This patch replaces it with EVP_ functions which breaks the hashing up into multiple calls. Does this make it faster? Anything changing crypto should be it's own patch with the rationale for doing so in it's commit message. How to measure performance of fapolicyd is here: https://github.com/linux-application-whitelisting/fapolicyd-extras/tree/main/performance

Updates in md5-backend.c could be its own patch

The interleaving TTY output is a direct result of using multiple calls to fputs which is much faster than the original which was a monster, big fprintf. This should be it's own patch with a commit message explaining it. But I'm considering to defer a patch addressing this issue because I've got plans for next year to redesign a major part of the application such that there is more than one decision thread. That is going to involve doing a lot of isolation and synchronization work. I'd rather have this to be part of that. (Also, this code is on the critical path when using the debug mode.) So, maybe consider breaking this up into multiple patches with the mutex locking the last in the series. Colorization changes could go in on it's own merit.

What is the code change in object for? Should be its own patch with its own commit message.

What is the code change in subject for? Should be its own patch with its own commit message.

@Kangie
Copy link
Contributor Author

Kangie commented Dec 27, 2025

Thanks for the feedback. I'm finally back from travel and relief work so I'll get this stuff split up over the next couple of days for review - the commits should already be independent; I anticipated that this would need to be broken up.

WRT the SHA256/EVP stuff - it's the only non-deprecated way to access those hashing functions in modern OpenSSL, but we can discuss that on its own PR. :)

@stevegrubb
Copy link
Member

stevegrubb commented Dec 30, 2025

ok, just for planning purposes...we are planning to do a release sometime in January. Probably mid to late just to roll up everything since the last release. (The magic lookup optimization is the main driver for the release.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants