Add ability to set thread affinity #51

esyr · 2025-09-25T23:53:20Z

Includes the relevant update to the pkeyread test, as it already tries to report some thread indices in the -v mode.

Sashan

Looks good; I found just a few nits in threads.c you might want to address.
Thanks.

Sashan · 2025-09-30T13:38:34Z

source/perflib/threads.c

+    unsigned int ret = 0;
+
+    for (size_t i = 0; i < sizeof(a) * CHAR_BIT; i++)
+        ret += ((a & (1ULL << i)) == 0);


I think this is what I messed up in my suggested change. We discussed this off-list: we are supposed to count the bits which are set, right? If so, then we need ret += ((a & (1ULL << i)) != 0); here.
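A minimal sketch of the corrected counting loop, for illustration only. It assumes `affinity_t` is an unsigned integer bit mask (typedef'd to `unsigned long long` here; the real typedef in threads.c is not shown in the diff). The only functional change from the quoted hunk is `!= 0`, so the loop counts set bits rather than clear ones.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: affinity_t is an unsigned integer bit mask; the actual
 * definition in source/perflib/threads.c may differ. */
typedef unsigned long long affinity_t;

/* Population count: number of bits that are SET in the mask. */
static unsigned int popcount(affinity_t a)
{
    unsigned int ret = 0;

    for (size_t i = 0; i < sizeof(a) * CHAR_BIT; i++)
        ret += ((a & ((affinity_t)1 << i)) != 0); /* 1 if bit i is set */

    return ret;
}
```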

Sashan · 2025-09-30T13:40:06Z

source/perflib/threads.c

+        goto err;
    }

+    ta = OPENSSL_malloc(sizeof(*ta) * threadcount);


I think it is good to use OPENSSL_malloc() here, so the tests work with libraries that don't provide OPENSSL_malloc_array().
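As I understand it, the main thing OPENSSL_malloc_array() adds over `OPENSSL_malloc(size * count)` is a check that the multiplication does not overflow. A minimal sketch of doing that check by hand, so the plain-OPENSSL_malloc() path keeps the same safety. `perflib_malloc_array` is a hypothetical name (not an existing perflib or OpenSSL function), and plain `malloc()` stands in for `OPENSSL_malloc()` so the sketch builds without OpenSSL.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n elements of the given size, returning NULL
 * instead of silently wrapping when n * size would overflow size_t.
 * In perflib, malloc() would be replaced by OPENSSL_malloc(). */
static void *perflib_malloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL; /* n * size does not fit in size_t */

    return malloc(n * size);
}
```

In the hunk above this would read `ta = perflib_malloc_array(threadcount, sizeof(*ta));`.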

Sashan · 2025-09-30T13:40:37Z

source/perflib/threads.c

        args[i].num = i;
-        perflib_run_thread(&threads[i], &args[i]);
+        if (!(run_threads[i] = perflib_run_thread_(&threads[i], &args[i],
+                                                   ta + i)))


Can we use &ta[i] here, so it is clear we are working with an array? Thanks.

nhorman · 2025-10-16T15:31:06Z

source/perflib/threads.c

+static ossl_inline unsigned int popcount(affinity_t a)
+{
+    return __builtin_popcountl(a);
+}


Do we really need to special-case the use of a compiler built-in here? It seems like the balance between the ifdeffery here and a single function that counts up to sizeof(unsigned long) * 8 bits is biased in favor of just having one function.

I just don't like the idea of rolling our own implementation when the built-in is right there, but I don't really care either way.

If we want to use compiler built-ins, can we also enable them for clang?

diff --git a/source/perflib/threads.c b/source/perflib/threads.c
index 8cf3a76..4f9187c 100644
--- a/source/perflib/threads.c
+++ b/source/perflib/threads.c
@@ -22,7 +22,7 @@
 /** affinity_t-typed value with nth bit set. */
 #define AFFINITY_BIT(n) ((affinity_t)1U << (n))
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__clang__)
 static ossl_inline unsigned int popcount(affinity_t a)
 {
@@ -41,7 +41,7 @@ static ossl_inline unsigned int popcount(affinity_t a)
     return ret;
 }
-#endif /* __GNUC__ */
+#endif /* __GNUC__ or __clang__ */
 int perflib_roundrobin_affinity(affinity_t *cpu_set_bits, size_t cpu_set_size, size_t num, size_t cnt, void *arg)
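A sketch of one way to address the "dead/untested code" concern while keeping the built-in: compile the portable loop unconditionally and select the built-in only where it exists, so tests on GCC/clang can still cross-check the fallback. Two assumptions worth flagging: clang defines `__GNUC__` by default, so the extra `defined(__clang__)` test is mostly belt-and-braces; and `__builtin_popcountl()` takes an `unsigned long`, which is only 32 bits wide on LLP64 targets such as 64-bit Windows, so this sketch uses `__builtin_popcountll()` on the assumption that `affinity_t` may be 64 bits. The portable loop uses the clear-lowest-set-bit trick rather than testing every bit position. `popcount_portable` and `popcount_fast` are hypothetical names.

```c
#include <stddef.h>

/* Assumption: affinity_t is no wider than unsigned long long. */
typedef unsigned long long affinity_t;

/* Portable fallback, always compiled so it can be tested on every compiler. */
static unsigned int popcount_portable(affinity_t a)
{
    unsigned int ret = 0;

    while (a != 0) {
        a &= a - 1; /* clear the lowest set bit */
        ret++;
    }
    return ret;
}

/* Use the compiler built-in where available, the portable loop otherwise. */
static unsigned int popcount_fast(affinity_t a)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned int)__builtin_popcountll(a);
#else
    return popcount_portable(a);
#endif
}
```

Whether the built-in is worth the extra path at all is exactly the trade-off discussed in this thread; the sketch only shows that the fallback need not go untested.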

nhorman · 2025-10-16T16:04:42Z

source/pkeyread.c

        "\t-v  verbose output, includes min, max, stddev, and median times\n"
-        "\t-T  timeout for each test run in seconds, can be fractional"
+        "\t-T  timeout for each test run in seconds, can be fractional\n"
+        "\t-b  Set CPU affinity for the threads (in round robin fashion)\n"


What about adding this option to all the other tests in the repo?

I was prototyping on pkeyread, but, yeah, adding it to other tests should be trivial.

nhorman · 2025-10-16T16:07:54Z

source/pkeyread.c

 OSSL_TIME max_time;

-int err = 0;
+int error = 0;


Why this change? There's a good portion of this PR dedicated to renaming variables, which doesn't really have anything to do with the addition of thread affinity management.

This addresses linker issues: there is a function err() which conflicts with variables named err. The changes in this PR just exposed this conflict, so the fix got included here.

wouldn't it be more prudent to just rename the err() function that commit 953a33f introduced then?

Though I'm surprised that a reasonable compiler can't tell the difference between a variable reference and a function call of the same name.

> wouldn't it be more prudent to just rename the err() function that commit 953a33f introduced then?
>
> Though I'm surprised that a reasonable compiler can't tell the difference between a variable reference and a function call of the same name.

err() is a Windows version of err(3), which is commonly used on *nix; it would make me sad to see it go.

jogme · 2025-10-16T15:41:30Z

source/perflib/perfhelper.c


 #include <string.h>
 #include <openssl/crypto.h>
+#include <openssl/macros.h>


Why is this needed? There is no other change in this file.

jogme · 2025-10-16T16:57:50Z

source/perflib/err.c

 }

+void
+err(int status, const char *fmt, ...)


Why duplicate the errx function? Same for warn and warnx.

err/warn append the output of perror() to the message, while errx/warnx just print the provided string (along with the program name as a prefix).

I see now; sorry for the noise
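To illustrate the distinction described above (err()/warn() append the description of the current errno, errx()/warnx() do not), a minimal portable sketch of the warn(3)/warnx(3) message layout, minus the trailing newline. It formats into a caller-supplied buffer so the result is easy to check. `format_warning` is a hypothetical helper, not the perflib/err.c implementation; the program name is passed explicitly instead of coming from program_invocation_name or argv[0], and unlike the real functions it does not accept a NULL format.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper producing "prog: message[: strerror(errno)]".
 * with_errno != 0 mimics warn()/err(); with_errno == 0 mimics warnx()/errx().
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_warning(char *buf, size_t len, const char *prog,
                          int with_errno, const char *fmt, ...)
{
    int saved_errno = errno; /* capture before library calls can change it */
    va_list ap;
    size_t used;
    int n;

    n = snprintf(buf, len, "%s: ", prog);
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    va_start(ap, fmt);
    n = vsnprintf(buf + used, len - used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - used)
        return -1;
    used += (size_t)n;

    if (with_errno) {
        n = snprintf(buf + used, len - used, ": %s", strerror(saved_errno));
        if (n < 0 || (size_t)n >= len - used)
            return -1;
    }
    return 0;
}
```

A real err()/errx() replacement would then write the buffer and a newline to stderr and call exit(status).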

npajkovsky · 2025-10-16T20:44:07Z

source/perflib/err.h

+#  include <err.h>
+
+# else /* _WIN32 */
+


We don't use blank lines around #include inside #if blocks.

npajkovsky · 2025-10-16T21:02:57Z

The work is ok, but I'm a little bit lost why the work is needed.

Sashan · 2025-10-17T09:52:28Z

> The work is ok, but I'm a little bit lost why the work is needed.

My understanding is that you want to pin a thread to a CPU so the scheduler does not migrate the thread running the performance test around the system. I think this does not show up on systems with a low number of cores; it becomes more important on large multicore systems.

Sashan · 2025-10-17T12:11:06Z

source/perflib/threads.c

+static ossl_inline unsigned int popcount(affinity_t a)
+{
+    return __builtin_popcountl(a);
+}


> if we want to use compiler built-ins can we also enable them for clang?

To be honest, I'm with Neil here. My reasoning is that the perftools need to be portable to as many platforms/compilers as (conveniently) possible. You are rolling your own implementation anyway, so using the built-in here does not buy us much.

On the other hand, if we limit ourselves to clang and GCC toolchains, then I'm fine with going with the built-in-only version.

The real reason I don't like the if/else here is that it leaves dead/untested code behind. In my opinion, the real choice here should be:

be portable, and roll your own
or

rely on the compiler, and then the code will work on platforms where the built-in is provided.

In my view, the perftools are a roll-your-own case.

Sashan · 2025-10-17T12:44:19Z

source/pkeyread.c

        "\t-v  verbose output, includes min, max, stddev, and median times\n"
-        "\t-T  timeout for each test run in seconds, can be fractional"
+        "\t-T  timeout for each test run in seconds, can be fractional\n"
+        "\t-b  Set CPU affinity for the threads (in round robin fashion)\n"


I think I understand Nikola's question better now. and I think he is making a good point. let me ask the question different way: what is a difference between running the test using the command:

./pkeyread -f all -k all -b 16

and

taskset 0xffff ./pkeyread -f all -k all 16

If I understand things right, then -b is a shortcut so people don't need to think of using taskset(1). Is my understanding correct?

nhorman · 2025-10-17T13:03:07Z

> I think I understand Nikola's question better now. and I think he is making a good point. let me ask the question different way: what is a difference between running the test using the command:

I think the difference between:

./pkeyread -f all -k all -b 16

and

taskset 0xffff ./pkeyread -f all -k all 16

is that in the latter case, we rely on the OS scheduler to place threads on unique cores.

In the former case, thread 1 is guaranteed to have an affinity of 0x1, thread 2 an affinity of 0x2, thread 3 an affinity of 0x4, etc.

In the latter, all threads can run on any core in the affinity set. Will they likely be scheduled onto unique cores? Probably. Are they guaranteed to be? No.

I guess the question to ask is "Does that matter to us?", and honestly, I'm not sure of the answer there.
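A minimal sketch of the round-robin assignment described above: each thread gets a single-CPU mask containing the n-th set bit of the allowed CPU mask, wrapping around when there are more threads than CPUs. With an allowed mask of 0xffff this reproduces the 0x1, 0x2, 0x4, ... sequence (thread indices here are 0-based). `roundrobin_cpu_mask` and the `affinity_t` typedef are assumptions for illustration; the PR's perflib_roundrobin_affinity() takes different arguments and may behave differently.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: a CPU set is a plain bit mask, bit i meaning CPU i. */
typedef unsigned long long affinity_t;

/* Return a mask with exactly one bit set: the (thread_idx mod k)-th set bit
 * of allowed, where k is the number of set bits. Returns 0 if allowed is 0. */
static affinity_t roundrobin_cpu_mask(affinity_t allowed, size_t thread_idx)
{
    size_t nbits = sizeof(allowed) * CHAR_BIT;
    size_t count = 0, seen = 0, target;

    for (size_t i = 0; i < nbits; i++)
        count += (allowed >> i) & 1;
    if (count == 0)
        return 0;

    target = thread_idx % count;
    for (size_t i = 0; i < nbits; i++) {
        if ((allowed >> i) & 1) {
            if (seen == target)
                return (affinity_t)1 << i;
            seen++;
        }
    }
    return 0; /* not reached: target < count */
}
```

With taskset(1) alone, every thread would instead carry the whole 0xffff mask and the scheduler would pick the core.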

esyr · 2025-10-17T13:22:16Z

> The work is ok, but I'm a little bit lost why the work is needed.

So, the original reason I ended up writing that is that while working on x509storeissuer updates, I started seeing some anomalous results, and wanted to exclude that aspect from the list of possible factors. In general, pinning threads helps with the following:

it minimises noise from rescheduling and from differences in the performance of specific CPU cores across test runs;
it allows referring to thread numbers (which is sometimes useful when some of them show anomalous performance), as they correlate with CPU cores that way;
it allows providing specific thread mappings onto the system's topology, which is useful in conjunction with other aspects of test runs, like the way some resources are shared across threads, the way some threads perform work, and/or the CPU mask set for the whole test.

All those factors are predominantly relevant only when running on NUMA systems, naturally.
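For reference, a Linux/glibc-only sketch of what pinning a worker thread looks like at the API level: the thread reads its current CPU mask (initially inherited from the process, i.e. whatever taskset(1) set), picks the n-th allowed CPU, and restricts itself to that single CPU with pthread_setaffinity_np(). `pin_to_nth_allowed_cpu` is a hypothetical helper, not the code from this PR; the `_np` functions are non-portable GNU extensions, which is part of the portability concern raised in this thread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: pin the calling thread to the n-th CPU (mod the
 * number of allowed CPUs) in its current affinity mask.
 * Returns the chosen CPU number, or -1 on error. Linux/glibc only. */
static int pin_to_nth_allowed_cpu(size_t n)
{
    cpu_set_t allowed, one;
    int count, seen = 0;

    /* pid 0: the calling thread's mask, inherited from the process. */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    count = CPU_COUNT(&allowed);
    if (count == 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        if ((size_t)seen++ == n % (size_t)count) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (pthread_setaffinity_np(pthread_self(), sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

In a perf test, each worker would call this with its own index at the start of its thread routine.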

Sashan · 2025-10-17T15:52:41Z

> All those factors are predominantly relevant only when running on NUMA systems, naturally.

Understood. My preference here is to get away with taskset(1) (if possible). It also looks like Windows offers a similar mechanism, according to Stack Overflow. The taskset command seems to be available on FreeBSD, and Solaris has prset(1M) to set the affinity of a process. I believe other systems that can manage thread affinity expose their own command-line tooling.

In my opinion the less we do here the better.

Sashan

To be honest, I'm not entirely convinced about the affinity changes here. My preference is to get away with taskset (and similar command-line tools on OSes other than Linux, if such a command is provided). Would you mind keeping it in your perftools fork for a while? I would include it once I am sure perftools need it. At the moment, I just think the time has not come yet. Thanks.

Sashan · 2025-10-20T14:22:06Z

> All those factors are predominantly relevant only when running on NUMA systems, naturally.

Back when you were hunting down the x509storeissuer performance issue, would running the test under taskset have helped you, or was it not sufficient, so you had to implement thread pinning using glibc?

bob-beck · 2025-10-23T15:04:58Z

source/perflib/err.c

 }

+void
+err(int status, const char *fmt, ...)


I don't follow why we need to duplicate all this stuff for thread affinity. This seems unrelated.

If we want this it should probably be done separately, or we should ask ourselves why we can't use the standard stuff.

bob-beck · 2025-10-23T15:05:37Z

source/perflib/err.h

+    do { \
+        fprintf(stderr, "%s:%d(%s): ", __FILE__, __LINE__, __FUNCTION__); \
+        errx(__VA_ARGS__); \
+    } while (0)


This seems like yet another way to do this, just for this PR; I'm questioning why this needs to be included.

bob-beck · 2025-10-23T15:09:50Z

source/perflib/threads.c

+static ossl_inline unsigned int popcount(affinity_t a)
+{
+    return __builtin_popcountl(a);
+}


So what's the goal, and what's the problem we are seeking to solve here that needs code to support it?

How will we support this when people start fiddling with it and complain about the impact because they do it poorly, trying to be smarter than the scheduler and doing it badly?

To ask this another way: if our goal is to have less noisy data for falling-down-the-hill performance tests, does this belong in testing support for us rather than in the main library?

Mea culpa, that's what this is, so yeah, it's in the test library; I have fewer objections.

Sashan · 2025-10-23T15:23:48Z

IMO we should try to rely on command-line tools to deal with thread affinity (like taskset(1) and similar commands provided by the platform where the script is running).

So I would drop the affinity-related changes from here. The other pieces left behind still seem useful, as they make things tidier.

esyr · 2025-10-30T12:31:44Z

As mentioned in [1], I haven't managed to find any discernible difference in performance and noise levels on the pkeyread and x509storeissuer tests when thread pinning is applied, and, since there's strong hesitance towards having such a capability, I see no reason to pursue acceptance of this patch set.

[1] openssl/project#1693 (comment)

esyr requested review from Sashan and jogme September 25, 2025 23:53

Sashan requested changes Sep 30, 2025

esyr linked an issue Sep 30, 2025 that may be closed by this pull request

[perftools] Add support for setting thread affinity in tests openssl/project#1660

Open

Sashan and others added 2 commits October 16, 2025 11:36

s/err/error where apropriate easiest way to fix liner issues on windows

d568228

Use perflib/err.h unconditionally

caed5ee

Signed-off-by: Eugene Syromiatnikov <[email protected]>

esyr force-pushed the esyr/thread-affinity branch 2 times, most recently from efab928 to c033606 Compare October 16, 2025 12:18

esyr and others added 8 commits October 16, 2025 15:15

perflib/err.c: use program_invocation_name on glibc

393f457

Signed-off-by: Eugene Syromiatnikov <[email protected]>

perflib: add vwarn/err/warn

953a33f

Signed-off-by: Eugene Syromiatnikov <[email protected]>

perflib/err.h: add WARN/WARNX/ERR/ERRX

e78cd6a

Signed-off-by: Eugene Syromiatnikov <[email protected]>

perflib: add ability to set thread affinity

1b1e38c

Co-Authored-by: Alexandr Nedvedicky <[email protected]> Signed-off-by: Eugene Syromiatnikov <[email protected]>

pkeyread: output counts array allocation error to stderr

2484c08

Signed-off-by: Eugene Syromiatnikov <[email protected]>

pkeyread: tfix

0d9c3c8

Signed-off-by: Eugene Syromiatnikov <[email protected]>

README.md: update pkeyread documentation

a1b53d4

Signed-off-by: Eugene Syromiatnikov <[email protected]>

pkeyread: add an option to bind threads to cores

30f2a1f

Signed-off-by: Eugene Syromiatnikov <[email protected]>

esyr force-pushed the esyr/thread-affinity branch from c033606 to 30f2a1f Compare October 16, 2025 13:16

esyr requested a review from Sashan October 16, 2025 13:30

esyr marked this pull request as ready for review October 16, 2025 13:31

vavroch2010 requested review from nhorman and npajkovsky October 16, 2025 15:03

nhorman requested changes Oct 16, 2025

jogme suggested changes Oct 16, 2025

npajkovsky reviewed Oct 16, 2025

Sashan requested changes Oct 17, 2025

Sashan requested changes Oct 19, 2025

bob-beck suggested changes Oct 23, 2025

esyr closed this Oct 30, 2025
