| 1 | +.. SPDX-License-Identifier: GPL-2.0 |
| 2 | + |
| 3 | +====== |
| 4 | +futex2 |
| 5 | +====== |
| 6 | + |
| 7 | +:Author: André Almeida < [email protected]> |
| 8 | + |
| 9 | +futex, or fast user mutex, is a set of syscalls that allow userspace to create |
| 10 | +performant synchronization mechanisms, such as mutexes, semaphores and |
| 11 | +condition variables. C standard libraries, like glibc, use it as a means to |
| 12 | +implement higher-level interfaces like pthreads. |
| 13 | + |
| 14 | +futex2 is a follow-up version of the initial futex syscall, designed to overcome |
| 15 | +limitations of the original interface. |
| 16 | + |
| 17 | +User API |
| 18 | +======== |
| 19 | + |
| 20 | +``futex_waitv()`` |
| 21 | +----------------- |
| 22 | + |
| 23 | +Wait on an array of futexes, wake on any:: |
| 24 | + |
| 25 | + futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, |
| 26 | + unsigned int flags, struct timespec *timeout, clockid_t clockid) |
| 27 | + |
| 28 | + struct futex_waitv { |
| 29 | + __u64 val; |
| 30 | + __u64 uaddr; |
| 31 | + __u32 flags; |
| 32 | + __u32 __reserved; |
| 33 | + }; |
| 34 | + |
| 35 | +Userspace sets up an array of struct futex_waitv (up to a maximum of 128 entries), |
| 36 | +using ``uaddr`` for the address to wait on, ``val`` for the expected value |
| 37 | +and ``flags`` to specify the type (e.g. private) and size of the futex. |
| 38 | +``__reserved`` needs to be 0, but it can be used for future extension. The |
| 39 | +pointer to the first item of the array is passed as ``waiters``. An invalid |
| 40 | +address for ``waiters`` or for any ``uaddr`` makes the syscall return ``-EFAULT``. |
| 41 | + |
| 42 | +If userspace has 32-bit pointers, it should do an explicit cast to make sure |
| 43 | +the upper bits are zeroed. ``uintptr_t`` does the trick and it works for |
| 44 | +both 32-bit and 64-bit pointers. |
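The cast described above can be sketched as a small helper. The name ``futex_uaddr`` is hypothetical and not part of the kernel API; it only illustrates going through ``uintptr_t`` before widening to the 64-bit ``uaddr`` field, so that the value is zero-extended rather than possibly sign-extended on 32-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* The 64-bit uaddr field must be able to hold any userspace pointer. */
static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
	      "uaddr must be wide enough for a pointer");

/*
 * Hypothetical helper (not a kernel API): convert a pointer into the
 * value stored in futex_waitv.uaddr. uintptr_t is unsigned, so the
 * widening to 64 bits zero-extends on 32-bit targets and is a no-op
 * on 64-bit ones.
 */
static inline uint64_t futex_uaddr(const volatile void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```

With this helper, an entry would be filled with something like ``w.uaddr = futex_uaddr(&word);``.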
| 45 | + |
| 46 | +``nr_futexes`` specifies the size of the array. Values outside the [1, 128] |
| 47 | +interval make the syscall return ``-EINVAL``. |
| 48 | + |
| 49 | +The ``flags`` argument of the syscall needs to be 0, but it can be used for |
| 50 | +future extension. |
| 51 | + |
| 52 | +For each entry in the ``waiters`` array, the current value at ``uaddr`` is compared |
| 53 | +to ``val``. If they differ, the syscall undoes all the work done so far and |
| 54 | +returns ``-EAGAIN``. If all tests and verifications succeed, the syscall waits until |
| 55 | +one of the following happens: |
| 56 | + |
| 57 | +- The timeout expires, returning ``-ETIMEDOUT``. |
| 58 | +- A signal is sent to the sleeping task, returning ``-ERESTARTSYS``. |
| 59 | +- A futex in the list is woken, returning the index of one of the woken futexes. |
| 60 | + |
| 61 | +An example of how to use the interface can be found at ``tools/testing/selftests/futex/functional/futex_waitv.c``. |
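As a complement to the selftest, the following is a minimal sketch of how a caller might wrap the syscall; it is not taken from the kernel tree. The names ``waitv_entry``, ``waitv_fill``, ``waitv_wait`` and the ``WAITV_*`` constants are hypothetical. The struct mirrors the ``struct futex_waitv`` layout shown earlier so that the sketch also builds against older headers. The fallback syscall number 449 is an assumption that holds for x86 and the generic syscall table but not for every architecture. The sketch also assumes a 64-bit ``struct timespec``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumption: fallback for pre-5.16 headers; valid on x86 and arm64. */
#ifndef SYS_futex_waitv
#define SYS_futex_waitv 449
#endif

/* Hypothetical local names mirroring the uapi constants. */
#define WAITV_FUTEX_32     2	/* mirrors FUTEX_32 */
#define WAITV_PRIVATE_FLAG 128	/* mirrors FUTEX_PRIVATE_FLAG */
#define WAITV_MAX          128	/* maximum number of entries */

/* Local mirror of struct futex_waitv (same field order and sizes). */
struct waitv_entry {
	uint64_t val;
	uint64_t uaddr;
	uint32_t flags;
	uint32_t reserved;
};
static_assert(sizeof(struct waitv_entry) == 24, "unexpected layout");

/* Fill one entry for a private 32-bit futex word. */
static void waitv_fill(struct waitv_entry *e, uint32_t *word, uint32_t expected)
{
	e->val = expected;
	e->uaddr = (uint64_t)(uintptr_t)word;	/* zero-extended address */
	e->flags = WAITV_FUTEX_32 | WAITV_PRIVATE_FLAG;
	e->reserved = 0;			/* must be 0 */
}

/* Returns the index of a woken futex, or a negative errno value. */
static long waitv_wait(struct waitv_entry *w, unsigned int n,
		       const struct timespec *abs_timeout, clockid_t clk)
{
	long r;

	if (n < 1 || n > WAITV_MAX)
		return -EINVAL;	/* same check the kernel performs */

	/* The syscall-level flags argument must be 0. */
	r = syscall(SYS_futex_waitv, w, n, 0u, abs_timeout, clk);
	return r < 0 ? -errno : r;
}
```

A caller would typically fill one entry per futex word, call ``waitv_wait()``, and re-read the words and retry when the result is ``-EAGAIN``, since that only means one of the values changed before the task went to sleep.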
| 62 | + |
| 63 | +Timeout |
| 64 | +------- |
| 65 | + |
| 66 | +The ``struct timespec *timeout`` argument is optional and points to an |
| 67 | +absolute timeout. The type of clock being used must be specified in the |
| 68 | +``clockid`` argument. ``CLOCK_MONOTONIC`` and ``CLOCK_REALTIME`` are supported. |
| 69 | +This syscall accepts only 64-bit timespec structs. |
| 70 | + |
| 71 | +Types of futex |
| 72 | +-------------- |
| 73 | + |
| 74 | +A futex can be either private or shared. Private futexes are used by processes |
| 75 | +that share the same memory space, so the virtual address of the futex is the |
| 76 | +same for all of them. This allows for optimizations in the kernel. To use |
| 77 | +private futexes, it's necessary to specify ``FUTEX_PRIVATE_FLAG`` in the futex |
| 78 | +flag. Processes that don't share the same memory space, and therefore can |
| 79 | +have different virtual addresses for the same futex (using, for instance, |
| 80 | +file-backed shared memory), require different internal mechanisms to get |
| 81 | +properly enqueued. This is the default behavior, and it works with both private |
| 82 | +and shared futexes. |
| 83 | + |
| 84 | +Futexes can be of different sizes: 8, 16, 32 or 64 bits. Currently, the only |
| 85 | +supported size is 32 bits, and it needs to be specified using the |
| 86 | +``FUTEX_32`` flag. |