@@ -53,13 +53,14 @@ is distributed under the [ISC license](LICENSE.md).
5353 - [ Composing transactions] ( #composing-transactions )
5454 - [ Blocking transactions] ( #blocking-transactions )
5555 - [ A transactional lock-free leftist heap] ( #a-transactional-lock-free-leftist-heap )
56- - [ A composable Michael-Scott style queue] ( #a-composable-michael-scott-style-queue )
5756 - [ Programming with primitive operations] ( #programming-with-primitive-operations )
5857- [ Designing lock-free algorithms with k-CAS] ( #designing-lock-free-algorithms-with-k-cas )
5958 - [ Minimize accesses] ( #minimize-accesses )
6059 - [ Prefer compound accesses] ( #prefer-compound-accesses )
6160 - [ Log updates optimistically] ( #log-updates-optimistically )
6261 - [ Postcompute] ( #postcompute )
62+ - [ Post commit actions] ( #post-commit-actions )
63+ - [ A composable Michael-Scott style queue] ( #a-composable-michael-scott-style-queue )
6364 - [ Race to cooperate] ( #race-to-cooperate )
6465 - [ Understanding transactions] ( #understanding-transactions )
6566 - [ A three-stack lock-free queue] ( #a-three-stack-lock-free-queue )
@@ -712,134 +713,6 @@ Notice how we were able to use a `while` loop, rather than recursion, in
712713> lock-free heap implementation, but it was pretty straightforward to implement
713714> using k-CAS based on a textbook imperative implementation.
714715
715- #### A composable Michael-Scott style queue
716-
717- One of the most famous lock-free algorithms is
718- [ the Michael-Scott queue] ( https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf ) .
719- Perhaps its characteristic feature is that the tail pointer of the queue is
720- allowed to momentarily fall behind and that operations on the queue perform
721- cooperative CASes to update the tail. The tail pointer can be seen as an
722- optimization &mdash ; whether it points to the true tail or not does not change
723- the logical state of the queue. Let's implement a composable queue that allows
724- the tail to momentarily lag behind.
725-
726- First we define a type for nodes:
727-
728- ``` ocaml
729- type 'a node = Nil | Node of 'a * 'a node Loc.t
730- ```
731-
732- A queue is then a pair of pointers to the head and tail of a queue:
733-
734- ``` ocaml
735- type 'a queue = {
736- head : 'a node Loc.t Loc.t;
737- tail : 'a node Loc.t Atomic.t
738- }
739- ```
740-
741- Note that we used an ` Atomic.t ` for the tail. We do not need to operate on the
742- tail transactionally.
743-
744- To create a queue we allocate a shared memory location for the pointer to the
745- first node to be enqueued and make both the head and tail point to the location:
746-
747- ``` ocaml
748- # let queue () =
749- let next = Loc.make Nil in
750- { head = Loc.make next; tail = Atomic.make next }
751- val queue : unit -> 'a queue = <fun>
752- ```
753-
754- To dequeue a node, only the head of the queue is examined. If the location
755- pointed to by the head points to a node we update the head to point to the
756- location pointing to the next node:
757-
758- ``` ocaml
759- # let try_dequeue ~xt { head; _ } =
760- let old_head = Xt.get ~xt head in
761- match Xt.get ~xt old_head with
762- | Nil -> None
763- | Node (value, next) ->
764- Xt.set ~xt head next;
765- Some value
766- val try_dequeue : xt:'a Xt.t -> 'b queue -> 'b option = <fun>
767- ```
768-
769- To enqueue a value into the queue, only the tail of the queue needs to be
770- examined. We allocate a new location for the new tail and a node. We then need
771- to find the true tail of the queue and update it to point to the new node. The
772- reason we need to find the true tail is that we explicitly allow the tail to
773- momentarily fall behind. We then add a post commit action to the transaction to
774- update the tail after the transaction has been successfully committed:
775-
776- ``` ocaml
777- # let enqueue ~xt { tail; _ } value =
778- let new_tail = Loc.make Nil in
779- let new_node = Node (value, new_tail) in
780- let rec find_and_set_tail old_tail =
781- match Xt.compare_and_swap ~xt old_tail Nil new_node with
782- | Nil -> ()
783- | Node (_, old_tail) -> find_and_set_tail old_tail
784- in
785- find_and_set_tail (Atomic.get tail);
786- let rec fix_tail () =
787- let old_tail = Atomic.get tail in
788- if
789- Loc.get new_tail == Nil
790- && not (Atomic.compare_and_set tail old_tail new_tail)
791- then fix_tail ()
792- in
793- Xt.post_commit ~xt fix_tail
794- val enqueue : xt:'a Xt.t -> 'b queue -> 'b -> unit = <fun>
795- ```
796-
797- The post commit action, registered using
798- [ ` post_commit ` ] ( https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit ) ,
799- checks that the tail is still the true tail and then attempts to update the
800- tail. The order of accesses is very subtle as always with non-transactional
801- atomic operations. Can you see why it works? Although we allow the tail to
802- momentarily fall behind, it is important that we do not let the tail fall behind
803- indefinitely, because then we would risk leaking memory &mdash ; nodes that have
804- been dequeued from the queue would still be pointed to by the tail.
805-
806- Using the Michael-Scott style queue is as easy as any other transactional queue:
807-
808- ``` ocaml
809- # let a_queue : int queue = queue ()
810- val a_queue : int queue = {head = <abstr>; tail = <abstr>}
811- # Xt.commit { tx = enqueue a_queue 19 }
812- - : unit = ()
813- # Xt.commit { tx = try_dequeue a_queue }
814- - : int option = Some 19
815- # Xt.commit { tx = try_dequeue a_queue }
816- - : int option = None
817- ```
818-
819- The queue implementation in this section is an example of using ** kcas** to
820- implement a fine-grained lock-free algorithm. Instead of recording all shared
821- memory accesses and performing them atomically all at once, the implementation
822- updates the tail outside of the transaction. This can potentially improve
823- performance and scalability.
824-
825- This sort of algorithm design requires careful reasoning. Consider the dequeue
826- operation. Instead of recording the ` Xt.get ~xt old_head ` operation in the
827- transaction log, one could propose to bypass the log as ` Loc.get old_head ` . That
828- may seem like a valid optimization, because logging the update of the head in
829- the transaction is sufficient to ensure that each transaction dequeues a unique
830- node. Unfortunately that would change the semantics of the operation.
831-
832- Suppose, for example, that you have two queues, _ A_ and _ B_ , and you must
833- maintain the invariant that at least one of the queues is empty. One domain
834- tries to dequeue from _ A_ and, if _ A_ was empty, enqueue to _ B_ . Another domain
835- does the opposite, dequeue from _ B_ and enqueue to _ A_ (when _ B_ was empty).
836- When such operations are performed in isolation, the invariant would be
837- maintained. However, if the access of ` old_head ` is not recorded in the log, it
838- is possible to end up with both _ A_ and _ B_ non-empty. This kind of
839- [ race condition] ( https://en.wikipedia.org/wiki/Race_condition ) is common enough
840- that it has been given a name: _ write skew_ . As an exercise, write out the
841- sequence of atomic accesses that leads to that result.
842-
843716### Programming with primitive operations
844717
845718The [ ` Op ` ] ( https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Op/index.html )
@@ -1132,6 +1005,168 @@ transactions:
11321005val n_elems : int = 2
11331006```
11341007
1008+ ### Post commit actions
1009+
1010+ Closely related to moving compute outside of transactions, it is also sometimes
1011+ possible or necessary to perform some side-effects or actions, such as
1012+ non-transactional IO operations, only after a transaction has been committed
1013+ successfully. These cases are supported via the ability to register
1014+ [ ` post_commit ` ] ( https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit )
1015+ actions.
1016+
1017+ As a basic example, one might want to log a message when some transactional
1018+ operation is performed. Instead of directly logging the message
1019+
1020+ ``` ocaml
1021+ # let enqueue_and_log ~xt queue message =
1022+ enqueue ~xt queue message;
1023+ (* BAD: The printf could be executed many times! *)
1024+ Printf.printf "sent %s" message
1025+ val enqueue_and_log : xt:'a Xt.t -> string queue -> string -> unit = <fun>
1026+ ```
1027+
1028+ one should use
1029+ [ ` post_commit ` ] ( https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit )
1030+
1031+ ``` ocaml
1032+ # let enqueue_and_log ~xt queue message =
1033+ enqueue ~xt queue message;
1034+ Xt.post_commit ~xt @@ fun () ->
1035+ Printf.printf "sent %s" message
1036+ val enqueue_and_log : xt:'a Xt.t -> string queue -> string -> unit = <fun>
1037+ ```
1038+
1039+ to make sure that the message is only printed once after the transaction has
1040+ actually completed successfully.
1041+
1042+ #### A composable Michael-Scott style queue
1043+
1044+ One of the most famous lock-free algorithms is
1045+ [ the Michael-Scott queue] ( https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf ) .
1046+ Perhaps its characteristic feature is that the tail pointer of the queue is
1047+ allowed to momentarily fall behind and that operations on the queue perform
1048+ cooperative CASes to update the tail. The tail pointer can be seen as an
1049+ optimization &mdash ; whether it points to the true tail or not does not change
1050+ the logical state of the queue. Let's implement a composable queue that allows
1051+ the tail to momentarily lag behind.
1052+
1053+ First we define a type for nodes:
1054+
1055+ ``` ocaml
1056+ type 'a node = Nil | Node of 'a * 'a node Loc.t
1057+ ```
1058+
1059+ A queue is then a pair of pointers to the head and tail of a queue:
1060+
1061+ ``` ocaml
1062+ type 'a queue = {
1063+ head : 'a node Loc.t Loc.t;
1064+ tail : 'a node Loc.t Atomic.t
1065+ }
1066+ ```
1067+
1068+ Note that we used an ` Atomic.t ` for the tail. We do not need to operate on the
1069+ tail transactionally.
1070+
1071+ To create a queue we allocate a shared memory location for the pointer to the
1072+ first node to be enqueued and make both the head and tail point to the location:
1073+
1074+ ``` ocaml
1075+ # let queue () =
1076+ let next = Loc.make Nil in
1077+ { head = Loc.make next; tail = Atomic.make next }
1078+ val queue : unit -> 'a queue = <fun>
1079+ ```
1080+
1081+ To dequeue a node, only the head of the queue is examined. If the location
1082+ pointed to by the head points to a node we update the head to point to the
1083+ location pointing to the next node:
1084+
1085+ ``` ocaml
1086+ # let try_dequeue ~xt { head; _ } =
1087+ let old_head = Xt.get ~xt head in
1088+ match Xt.get ~xt old_head with
1089+ | Nil -> None
1090+ | Node (value, next) ->
1091+ Xt.set ~xt head next;
1092+ Some value
1093+ val try_dequeue : xt:'a Xt.t -> 'b queue -> 'b option = <fun>
1094+ ```
1095+
1096+ To enqueue a value into the queue, only the tail of the queue needs to be
1097+ examined. We allocate a new location for the new tail and a node. We then need
1098+ to find the true tail of the queue and update it to point to the new node. The
1099+ reason we need to find the true tail is that we explicitly allow the tail to
1100+ momentarily fall behind. We then add a post commit action to the transaction to
1101+ update the tail after the transaction has been successfully committed:
1102+
1103+ ``` ocaml
1104+ # let enqueue ~xt { tail; _ } value =
1105+ let new_tail = Loc.make Nil in
1106+ let new_node = Node (value, new_tail) in
1107+ let rec find_and_set_tail old_tail =
1108+ match Xt.compare_and_swap ~xt old_tail Nil new_node with
1109+ | Nil -> ()
1110+ | Node (_, old_tail) -> find_and_set_tail old_tail
1111+ in
1112+ find_and_set_tail (Atomic.get tail);
1113+ let rec fix_tail () =
1114+ let old_tail = Atomic.get tail in
1115+ if
1116+ Loc.get new_tail == Nil
1117+ && not (Atomic.compare_and_set tail old_tail new_tail)
1118+ then fix_tail ()
1119+ in
1120+ Xt.post_commit ~xt fix_tail
1121+ val enqueue : xt:'a Xt.t -> 'b queue -> 'b -> unit = <fun>
1122+ ```
1123+
1124+ The post commit action, registered using
1125+ [ ` post_commit ` ] ( https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit ) ,
1126+ checks that the tail is still the true tail and then attempts to update the
1127+ tail. The order of accesses is very subtle as always with non-transactional
1128+ atomic operations. Can you see why it works? Although we allow the tail to
1129+ momentarily fall behind, it is important that we do not let the tail fall behind
1130+ indefinitely, because then we would risk leaking memory &mdash ; nodes that have
1131+ been dequeued from the queue would still be pointed to by the tail.
1132+
1133+ Using the Michael-Scott style queue is as easy as any other transactional queue:
1134+
1135+ ``` ocaml
1136+ # let a_queue : int queue = queue ()
1137+ val a_queue : int queue = {head = <abstr>; tail = <abstr>}
1138+ # Xt.commit { tx = enqueue a_queue 19 }
1139+ - : unit = ()
1140+ # Xt.commit { tx = try_dequeue a_queue }
1141+ - : int option = Some 19
1142+ # Xt.commit { tx = try_dequeue a_queue }
1143+ - : int option = None
1144+ ```
1145+
1146+ The queue implementation in this section is an example of using ** kcas** to
1147+ implement a fine-grained lock-free algorithm. Instead of recording all shared
1148+ memory accesses and performing them atomically all at once, the implementation
1149+ updates the tail outside of the transaction. This can potentially improve
1150+ performance and scalability.
1151+
1152+ This sort of algorithm design requires careful reasoning. Consider the dequeue
1153+ operation. Instead of recording the ` Xt.get ~xt old_head ` operation in the
1154+ transaction log, one could propose to bypass the log as ` Loc.get old_head ` . That
1155+ may seem like a valid optimization, because logging the update of the head in
1156+ the transaction is sufficient to ensure that each transaction dequeues a unique
1157+ node. Unfortunately that would change the semantics of the operation.
1158+
1159+ Suppose, for example, that you have two queues, _ A_ and _ B_ , and you must
1160+ maintain the invariant that at least one of the queues is empty. One domain
1161+ tries to dequeue from _ A_ and, if _ A_ was empty, enqueue to _ B_ . Another domain
1162+ does the opposite, dequeue from _ B_ and enqueue to _ A_ (when _ B_ was empty).
1163+ When such operations are performed in isolation, the invariant would be
1164+ maintained. However, if the access of ` old_head ` is not recorded in the log, it
1165+ is possible to end up with both _ A_ and _ B_ non-empty. This kind of
1166+ [ race condition] ( https://en.wikipedia.org/wiki/Race_condition ) is common enough
1167+ that it has been given a name: _ write skew_ . As an exercise, write out the
1168+ sequence of atomic accesses that leads to that result.
1169+
11351170### Race to cooperate
11361171
11371172Sometimes it is necessary to perform slower transactions that access many shared
0 commit comments