@@ -88,6 +88,7 @@ is distributed under the [ISC license](LICENSE.md).
8888 - [Understanding transactions](#understanding-transactions)
8989 - [A three-stack lock-free queue](#a-three-stack-lock-free-queue)
9090 - [A rehashable lock-free hash table](#a-rehashable-lock-free-hash-table)
91+ - [Avoid false sharing](#avoid-false-sharing)
9192 - [Beware of torn reads](#beware-of-torn-reads)
9293
9394## A quick tour
@@ -1984,6 +1985,131 @@ What we have here is a lock-free hash table with rehashing that should not be
19841985highly prone to starvation. In other respects this is a fairly naive hash table
19851986implementation. You might want to think about various ways to improve upon it.
19861987
1988+ ### Avoid false sharing
1989+
1990+ [False sharing](https://en.wikipedia.org/wiki/False_sharing) is a form of
1991+ contention that arises when a location that is written to by at least one core
1992+ happens to reside next to another location in memory, that is, within the same
1993+ cache line aligned region of memory, and that other location is accessed, read
1994+ or written, by other cores.
1995+
1996+ Perhaps contrary to how it is often described, false sharing doesn't require the
1997+ use of atomic variables or atomic instructions. Consider the following example:
1998+
1999+ ``` ocaml
2000+ # type state = { mutable counter : int; mutable finished: bool }
2001+ type state = { mutable counter : int; mutable finished : bool; }
2002+
2003+ # let state = { counter = 1_000; finished = false }
2004+ val state : state = {counter = 1000; finished = false}
2005+
2006+ # let reader = Domain.spawn @@ fun () ->
2007+     while not state.finished do
2008+       Domain.cpu_relax ()
2009+     done
2010+ val reader : unit Domain.t = <abstr>
2011+
2012+ # while 0 < state.counter do
2013+     state.counter <- state.counter - 1
2014+   done;
2015+ - : unit = ()
2016+
2017+ # state.finished <- true;
2018+ - : unit = ()
2019+
2020+ # Domain.join reader
2021+ - : unit = ()
2022+ ```
2023+
2024+ The `state` is a record with two fields, `counter` and `finished`, next to each
2025+ other, which makes it rather likely that they reside in the same cache line
2026+ aligned region of memory. The main domain repeatedly mutates the `counter`
2027+ field and the other domain repeatedly reads the `finished` field. What this
2028+ means in practice is that the reads of the `finished` field by the other domain
2029+ will be very expensive, because the cache line holding it is repeatedly
2030+ invalidated by the `counter` updates done by the main domain.
2031+
2032+ The above example is contrived, of course, but this sort of false sharing can
2033+ happen very easily. Cache lines are typically relatively large: 8, 16, or even
2034+ 32 words wide. Typically many, if not most, heap allocated objects in OCaml are
2035+ smaller than a cache line, which means that false sharing may easily happen
2036+ even between seemingly unrelated objects.
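
To make the sizes concrete, here is a small back-of-the-envelope sketch. It assumes a 64-byte cache line, which is common but not universal, and relies on the fact that every OCaml heap block carries a one-word header in addition to its fields. The names `cache_line_bytes`, `block_words`, and `blocks_per_cache_line` are made up for this illustration.

```ocaml
(* Assumption: a 64-byte cache line. Some CPUs use 128 bytes or other sizes. *)
let cache_line_bytes = 64

(* [Sys.word_size] is the size of a word in bits. *)
let word_bytes = Sys.word_size / 8

let words_per_cache_line = cache_line_bytes / word_bytes

(* Size in words of a heap block with [n] fields, including the header word. *)
let block_words n = 1 + n

(* How many blocks with [n] fields fit entirely within one cache line. *)
let blocks_per_cache_line n = words_per_cache_line / block_words n

let () =
  Printf.printf "words per cache line: %d\n" words_per_cache_line;
  Printf.printf "two-field records per cache line: %d\n"
    (blocks_per_cache_line 2);
  Printf.printf "ref cells per cache line: %d\n" (blocks_per_cache_line 1)
```

On a 64-bit system with 64-byte cache lines this gives 8 words per cache line, so two records like `state` or four reference cells can share a single line.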
2037+
2038+ To completely avoid false sharing one would need to make sure that mutable
2039+ locations (atomic or otherwise) are not allocated next to locations that might
2040+ be accessed from other domains. Unfortunately, that is difficult to achieve
2041+ cheaply, because doing so tends to increase both memory usage and the number of
2042+ initializing stores.
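
For plain atomics, as opposed to **kcas** locations, the same trade-off can be sketched with the standard library. The following assumes OCaml 5.2 or later, where `Atomic.make_contended` allocates an atomic with padding. The names `hits` and `finished` are made up for this illustration, and this is only a sketch of the idea rather than a recommendation to pad everything.

```ocaml
(* Requires OCaml 5.2 or later for [Atomic.make_contended]. *)

(* A long lived counter that is updated frequently: allocate it with padding. *)
let hits = Atomic.make_contended 0

(* A long lived flag polled by another domain: padded so that updates to
   neighboring locations do not invalidate the cache line it lives in. *)
let finished = Atomic.make_contended false

let () =
  let reader =
    Domain.spawn @@ fun () ->
    while not (Atomic.get finished) do
      Domain.cpu_relax ()
    done
  in
  for _ = 1 to 1_000 do
    Atomic.incr hits
  done;
  Atomic.set finished true;
  Domain.join reader;
  assert (Atomic.get hits = 1_000)
```

Each padded atomic occupies considerably more memory than an unpadded one, which is why padding is best reserved for a small number of long lived, heavily contended locations.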
2043+
2044+ The
2045+ [`Loc.make`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Loc/index.html#val-make)
2046+ function takes an optional `padded` argument, which can be explicitly specified
2047+ as `~padded:true` to request that the location be allocated in a way that avoids
2048+ false sharing. Using `~padded:true` on long lived shared memory locations that
2049+ are being repeatedly modified can improve performance significantly. Using
2050+ `~padded:true` on short lived shared memory locations is not recommended.
2051+
2052+ Using `~padded:true` does not eliminate all false sharing, however. Consider the
2053+ following sketch of a queue data structure:
2054+
2055+ ``` ocaml
2056+ type 'a queue = {
2057+   head: 'a list Loc.t;
2058+   tail: 'a list Loc.t
2059+ }
2060+ ```
2061+
2062+ Even if you allocate the locations with padding
2063+
2064+ ``` ocaml
2065+ # let queue () = {
2066+     head = Loc.make ~padded:true [];
2067+     tail = Loc.make ~padded:true []
2068+   }
2069+ val queue : unit -> 'a queue = <fun>
2070+ ```
2071+
2072+ the queue record will still be vulnerable to the same kind of false sharing as
2073+ in the earlier example:
2074+
2075+ ``` ocaml
2076+ # let a_queue : int queue = queue ()
2077+ val a_queue : int queue = {head = <abstr>; tail = <abstr>}
2078+
2079+ # let counter = ref 1_000
2080+ val counter : int ref = {contents = 1000}
2081+ ```
2082+
2083+ Above, the reference cell `counter` might exhibit false sharing with the queue
2084+ record (which is read-only) and significantly degrade the performance of the
2085+ queue for passing messages between domains.
2086+
2087+ To avoid the above kind of problems, one approach is to also allocate the queue
2088+ record in a way that avoids false sharing. Unfortunately, OCaml does not
2089+ currently provide a standard way to do so. The
2090+ [multicore-magic](https://github.com/ocaml-multicore/multicore-magic) library
2091+ provides a function
2092+ [`copy_as_padded`](https://ocaml-multicore.github.io/multicore-magic/doc/multicore-magic/Multicore_magic/index.html#val-copy_as_padded)
2093+ for this purpose. Using
2094+ [`copy_as_padded`](https://ocaml-multicore.github.io/multicore-magic/doc/multicore-magic/Multicore_magic/index.html#val-copy_as_padded)
2095+ one would write
2096+
2097+ ``` ocaml
2098+ # let queue () =
2099+     Multicore_magic.copy_as_padded {
2100+       head = Loc.make ~padded:true [];
2101+       tail = Loc.make ~padded:true []
2102+     }
2103+ val queue : unit -> 'a queue = <fun>
2104+ ```
2105+
2105+ to allocate the queue record in a way that avoids false sharing.
2106+
2107+ Note that allocating long lived data structures, like queues, used for
2108+ inter-domain communication in the way described above does not eliminate all
2109+ false sharing, but it is likely to reduce false sharing significantly with
2110+ relatively low effort.
2112+
19872113### Beware of torn reads
19882114
19892115The algorithm underlying **kcas** ensures that it is not possible to read