Replies: 12 comments 32 replies
-
I had a bit of a question on it. Do you think the recent addition of Linux's sched-ext and its BPF userland part, the scx project, could potentially help HAProxy performance to some extent, even though the latter handles its own NUMA nodes, LLC and CPU topology?
-
@awlx @pierrecdn @cpaillet @dclaisse @jvgutierrez @philos @exander77 @jvinolas @JB0925: this can also be of interest to you based on some of your previous reports indicating you're dealing with large setups.
-
We use it in Docker. Do you know if you will build it for Docker (https://hub.docker.com/_/haproxy/tags?name=3.2-dev8)?
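If and when that tag gets published on Docker Hub (the exact tag name is my assumption based on the link above), pulling and sanity-checking it would look roughly like this:

```
# hypothetical tag name; adjust to whatever actually gets published
docker pull haproxy:3.2-dev8
# print the built-in version to confirm what you pulled
docker run --rm --entrypoint haproxy haproxy:3.2-dev8 -v
```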
-
Hi everybody! @rnsanchez and I tested 3.2-dev8 on our production machines. Just FYI, they are 2 x Intel(R) Xeon(R) Gold 6430. Using our original conf (no
Using
On this machine specifically, we also tried with:
This got different results:
Now, with this
One thing to notice: we're forcing
Please let us know if you need us to run any more tests. Happy to help! :-)
-
Hi wtarreau! This work looks promising in general, but I do have a question. How is the load-balancing handled internally when using the new setup? Right now our go-to solution
So the question is, am I correct in thinking that using thread-groups is effectively the same as running
Having said that, I'd be willing to test this relatively soon on a massive AMD 9684X CPU.
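For context, a hand-written equivalent of what the new policy automates might look like the sketch below. The group and CPU numbers are made up for a hypothetical machine with two 8-core CCXs; as noted later in this thread, the automatic policy computes the thread groups and cpu-maps from the detected topology instead:

```
global
    # hypothetical manual layout for two 8-core CCXs;
    # cpu-policy group-by-cluster derives something equivalent automatically
    nbthread 16
    thread-groups 2
    cpu-map 1/1-8 0-7      # group 1 pinned to the first CCX
    cpu-map 2/1-8 8-15     # group 2 pinned to the second CCX
```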
-
Hello @wtarreau 👋! From our perspective, this works well. Here it is:
So, from that point of view, it's a win: less toil to configure this, and probably no headaches when new hardware arrives. But now I have a question: is it possible to make use of the computed
-
The bug just happened in one of our setups, and restarting the Docker container failed again. I've used the latest
-
In case it helps:
With "cpu-policy group-by-cluster":
Without "cpu-policy group-by-cluster":
-
Thanks Christian, for this feedback. However, I can't find any reason for the stats socket to stop working, especially in relation to thread groups (since the policy basically only does that: it configures thread groups and CPU maps for you). During our tests we constantly have commands looping over the CLI using socat (like the ones running "show threads" etc.). Do you know if you have any monitoring scripts connecting and issuing commands? This makes me think that maybe one command doesn't end well and leaves a connection pending, and this would be serious enough that I'd like to figure out what it is :-/ The fact that it happened on multiple LBs is not a coincidence; there's definitely a pattern that triggers this problem. I'll try to issue various debugging commands like "show sess" etc. just in case I spot anything.
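For reference, the kind of CLI polling loop being described might look like this (the socket path is an assumption; use whatever your "stats socket" line points to):

```
# assumed socket path; adjust to your "stats socket" setting
while true; do
    echo "show threads" | socat stdio /var/run/haproxy.sock
    sleep 1
done
```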
-
I just managed to block a CLI session by sending a command at the same moment the timeout expired. The stream is still there, with no timeout:
I'll now try to reproduce this more consistently.
-
Update on the last point: we finally found the root cause thanks to Christian's help. It's indeed thread-groups + stats socket with a certain connection rate that causes incoming connections to be bounced to other listeners. The connection accounting was not transferred in this case, resulting in listeners being blocked after several iterations. Now fixed in mainline, so better pull the latest mainline when testing thread groups.
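A minimal way to build the current mainline from source, assuming a standard Linux/glibc target (the build options are my assumption; add USE_OPENSSL=1 etc. as your setup requires):

```
# clone and build current mainline, then confirm the version you got
git clone https://github.com/haproxy/haproxy.git
cd haproxy
make -j"$(nproc)" TARGET=linux-glibc
./haproxy -v
```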
-
We got another hang with the proposed 3.1.6 version. It's the Docker Alpine version, running within a KVM virtual machine now. dmesg:
haproxy docker:
-
In 3.2-dev8 (to come soon), we've finally merged the so-called NUMA series that lasted for almost 2 years, which detects CPU topologies and arranges threads into groups based on nodes, CCX, clusters etc. By default nothing is changed (at least changes are not expected), but you can now adjust the default CPU binding policy in the "global" section using the "cpu-policy" directive, set to one of these:

- "group-by-cluster": will create at least one thread group per cluster, a cluster here having boundaries on CPU types, declared clusters, LLC, nodes and packages; if more than 64 threads are present in the same cluster, more groups will automatically be created.
- "performance": same as above but only uses P-cores on mixed platforms.
- "efficiency": same but only uses E-cores.
- "resource": only uses the smallest detected cluster, probably just for containers and/or VMs.
- "first-usable-node": that's the current default since 2.5 or so; it uses the first node only, within the limit of 64 threads.

These policies are ignored if "nbthread", "thread-groups" or "cpu-map" are defined, so in order to test them, just use the directive on its own, as in the sketch below.
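A minimal sketch of such a test configuration, assuming nothing else about the setup (and with no "nbthread", "thread-groups" or "cpu-map" lines present alongside it):

```
global
    # pick one of the policies listed above; the rest is detected automatically
    cpu-policy group-by-cluster
```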
And that will be all. You can check what is detected with "-dc" on the command line (though that's a bit too developer-oriented at this point).

I'd be interested in gathering feedback from those running on large setups (previously this could be done by hand with "cpu-map", except that this is a PITA on some machines). For now I've tested on various x86 (64-core EPYC, 24- and 8-core Xeon, Atoms, Skylake, Core i7 8th gen and 14th gen), ARM (v7, v8, v9, various models of each, including big.LITTLE, 16-core LX2), and a dual-core quad-thread MIPS.
The detection only works on Linux (everything is exposed under /sys). On FreeBSD and MacOS we only know the number of CPUs (and the node on FreeBSD) and that's still used to set the number of threads.
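To see what the detection found on a given machine, something like the following should work; "-dc" is the diagnostic switch mentioned above, while combining it with a config check ("-c") is my assumption:

```
# dump the detected CPU topology/binding (output is developer-oriented);
# -c only validates the config instead of starting the proxy
haproxy -dc -c -f /etc/haproxy/haproxy.cfg
```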
For the curious, there's also "cpu-set", which allows you to reset the inherited taskset and to use only, or ignore, certain CPUs (by CPU number, core number, thread number, cluster number, node number etc). That's only used to pre-select the ones the cpu-policy will operate on (e.g. "I want to disable HT0 because I reserve it for network IRQs"); see the sketch at the end of this post.

The goal, if this works well, is to change the default policy for 3.3 from "first-usable-node" to one still to be determined, probably "performance" or "group-by-cluster", or any new one that's left to be designed based on the feedback.

In particular, @idl0r @jaroslawr @felipewd @rnsanchez @majedrze: you've often reported issues related to large setups, so if you could run some tests before the final release, that would be great ;-) Others are absolutely welcome of course!
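And a rough illustration of the "cpu-set" pre-selection described above. The exact directive arguments here are an assumption on my part, so please check the 3.2 documentation before copying this:

```
global
    # hypothetical syntax: pre-select CPUs before the policy groups them
    cpu-set reset              # ignore any taskset inherited from the parent
    cpu-set drop-thread 0      # e.g. keep HT0 of each core free for network IRQs
    cpu-policy group-by-cluster
```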