* libreactor: enable SO_ATTACH_REUSEPORT_CBPF
SO_ATTACH_REUSEPORT_CBPF is a socket option that attaches a classic BPF "program" to a group of SO_REUSEPORT sockets. The program used here assigns each packet to a socket based on the id of the CPU core that initially received the packet and did the IRQ processing. This improves data locality and therefore increases performance.
* libreactor: Improve SO_ATTACH_REUSEPORT_CBPF performance by controlling worker forking order
Rename setup() to fork_workers() to make its purpose clearer.
The standard BPF program used with SO_ATTACH_REUSEPORT_CBPF assigns each packet to a socket based on the id of the CPU core that initially received the packet and did the IRQ processing: CPU 0 -> socket 0, CPU 1 -> socket 1, and so on. The idea is that handing the packet to userland code running on the same CPU is more efficient. However, contrary to my initial assumption, there is no automatic mapping between the id of a socket and the id of the CPU that the userland process (which opened the socket) is running on. The "id" of a socket is determined by the order in which the sockets are opened, so this only works well if that order is controlled to match the order in which processes are pinned to CPUs.
Previously, the for loop in setup() (a) forked a child process, (b) pinned it to a CPU, and then (c) started up an instance of the libreactor server. However, since fork() was called inside the loop without waiting for each child to open its socket, the order in which the sockets got opened in the child processes was not deterministic. In some cases the process pinned to CPU 0 would end up being the third process to open a socket, so it would receive packets that had been handled on the kernel side by CPU 2, which of course brings no efficiency gain.
To resolve this, I am using an eventfd semaphore to communicate between the parent and child processes and ensure that the forking happens sequentially, and the order of the sockets being opened matches the order of the CPUs being pinned. Now I am seeing a much more consistent performance improvement.
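The sequencing described above could look roughly like the sketch below. It is not the actual fork_workers() code: `fork_workers_in_order`, `open_listener`, and the `log_fd` parameter are hypothetical, and `open_listener` merely records the worker index where a real worker would open its listening socket. Error handling is minimal, and the parent would block forever if a child died before signalling.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for CPU_SET() and sched_setaffinity() */
#endif
#include <sched.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for "open the listening socket": it only writes
   the worker index to log_fd so the opening order can be observed. */
static void open_listener(int worker, int log_fd)
{
  unsigned char id = (unsigned char) worker;
  ssize_t r = write(log_fd, &id, 1);
  (void) r;
}

/* Fork n workers one at a time. The parent does not fork worker k+1 until
   worker k has signalled through the eventfd that its socket is open.
   Returns 0 in the parent on success, -1 on error. */
static int fork_workers_in_order(int n, int log_fd)
{
  int efd = eventfd(0, EFD_SEMAPHORE);
  if (efd == -1)
    return -1;

  for (int cpu = 0; cpu < n; cpu++) {
    pid_t pid = fork();
    if (pid == -1) {
      close(efd);
      return -1;
    }
    if (pid == 0) {
      /* Child: pin to the CPU first (best effort in this sketch)... */
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      (void) sched_setaffinity(0, sizeof set, &set);
      /* ...then open the socket, then tell the parent it may continue. */
      open_listener(cpu, log_fd);
      (void) eventfd_write(efd, 1);
      close(efd);
      /* A real worker would enter its event loop here. */
      _exit(0);
    }
    /* Parent: block until the child just forked has posted to the eventfd. */
    eventfd_t ready;
    if (eventfd_read(efd, &ready) == -1) {
      close(efd);
      return -1;
    }
  }
  close(efd);
  return 0;
}
```

Because the parent blocks in eventfd_read() after every fork(), worker k always opens its socket before worker k+1 exists, so the socket order matches the CPU pinning order.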
* libreactor: Upgrade to newly released libdynamic 2.2.0