Skip to content

Commit 7aa7e70

Browse files
committed
Move the Michael-Scott queue example to techniques section and...
...also describe `post_commit` more generally.
1 parent d168d20 commit 7aa7e70

File tree

1 file changed

+164
-129
lines changed

1 file changed

+164
-129
lines changed

README.md

Lines changed: 164 additions & 129 deletions
Original file line numberDiff line numberDiff line change
@@ -53,13 +53,14 @@ is distributed under the [ISC license](LICENSE.md).
5353
- [Composing transactions](#composing-transactions)
5454
- [Blocking transactions](#blocking-transactions)
5555
- [A transactional lock-free leftist heap](#a-transactional-lock-free-leftist-heap)
56-
- [A composable Michael-Scott style queue](#a-composable-michael-scott-style-queue)
5756
- [Programming with primitive operations](#programming-with-primitive-operations)
5857
- [Designing lock-free algorithms with k-CAS](#designing-lock-free-algorithms-with-k-cas)
5958
- [Minimize accesses](#minimize-accesses)
6059
- [Prefer compound accesses](#prefer-compound-accesses)
6160
- [Log updates optimistically](#log-updates-optimistically)
6261
- [Postcompute](#postcompute)
62+
- [Post commit actions](#post-commit-actions)
63+
- [A composable Michael-Scott style queue](#a-composable-michael-scott-style-queue)
6364
- [Race to cooperate](#race-to-cooperate)
6465
- [Understanding transactions](#understanding-transactions)
6566
- [A three-stack lock-free queue](#a-three-stack-lock-free-queue)
@@ -712,134 +713,6 @@ Notice how we were able to use a `while` loop, rather than recursion, in
712713
> lock-free heap implementation, but it was pretty straightforward to implement
713714
> using k-CAS based on a textbook imperative implementation.
714715
715-
#### A composable Michael-Scott style queue
716-
717-
One of the most famous lock-free algorithms is
718-
[the Michael-Scott queue](https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf).
719-
Perhaps its characteristic feature is that the tail pointer of the queue is
720-
allowed to momentarily fall behind and that operations on the queue perform
721-
cooperative CASes to update the tail. The tail pointer can be seen as an
722-
optimization — whether it points to the true tail or not does not change
723-
the logical state of the queue. Let's implement a composable queue that allows
724-
the tail to momentarily lag behind.
725-
726-
First we define a type for nodes:
727-
728-
```ocaml
729-
type 'a node = Nil | Node of 'a * 'a node Loc.t
730-
```
731-
732-
A queue is then a pair of pointers to the head and tail of a queue:
733-
734-
```ocaml
735-
type 'a queue = {
736-
head : 'a node Loc.t Loc.t;
737-
tail : 'a node Loc.t Atomic.t
738-
}
739-
```
740-
741-
Note that we used an `Atomic.t` for the tail. We do not need to operate on the
742-
tail transactionally.
743-
744-
To create a queue we allocate a shared memory location for the pointer to the
745-
first node to be enqueued and make both the head and tail point to the location:
746-
747-
```ocaml
748-
# let queue () =
749-
let next = Loc.make Nil in
750-
{ head = Loc.make next; tail = Atomic.make next }
751-
val queue : unit -> 'a queue = <fun>
752-
```
753-
754-
To dequeue a node, only the head of the queue is examined. If the location
755-
pointed to by the head points to a node we update the head to point to the
756-
location pointing to the next node:
757-
758-
```ocaml
759-
# let try_dequeue ~xt { head; _ } =
760-
let old_head = Xt.get ~xt head in
761-
match Xt.get ~xt old_head with
762-
| Nil -> None
763-
| Node (value, next) ->
764-
Xt.set ~xt head next;
765-
Some value
766-
val try_dequeue : xt:'a Xt.t -> 'b queue -> 'b option = <fun>
767-
```
768-
769-
To enqueue a value into the queue, only the tail of the queue needs to be
770-
examined. We allocate a new location for the new tail and a node. We then need
771-
to find the true tail of the queue and update it to point to the new node. The
772-
reason we need to find the true tail is that we explicitly allow the tail to
773-
momentarily fall behind. We then add a post commit action to the transaction to
774-
update the tail after the transaction has been successfully committed:
775-
776-
```ocaml
777-
# let enqueue ~xt { tail; _ } value =
778-
let new_tail = Loc.make Nil in
779-
let new_node = Node (value, new_tail) in
780-
let rec find_and_set_tail old_tail =
781-
match Xt.compare_and_swap ~xt old_tail Nil new_node with
782-
| Nil -> ()
783-
| Node (_, old_tail) -> find_and_set_tail old_tail
784-
in
785-
find_and_set_tail (Atomic.get tail);
786-
let rec fix_tail () =
787-
let old_tail = Atomic.get tail in
788-
if
789-
Loc.get new_tail == Nil
790-
&& not (Atomic.compare_and_set tail old_tail new_tail)
791-
then fix_tail ()
792-
in
793-
Xt.post_commit ~xt fix_tail
794-
val enqueue : xt:'a Xt.t -> 'b queue -> 'b -> unit = <fun>
795-
```
796-
797-
The post commit action, registered using
798-
[`post_commit`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit),
799-
checks that the tail is still the true tail and then attempts to update the
800-
tail. The order of accesses is very subtle as always with non-transactional
801-
atomic operations. Can you see why it works? Although we allow the tail to
802-
momentarily fall behind, it is important that we do not let the tail fall behind
803-
indefinitely, because then we would risk leaking memory &mdash; nodes that have
804-
been dequeued from the queue would still be pointed to by the tail.
805-
806-
Using the Michael-Scott style queue is as easy as any other transactional queue:
807-
808-
```ocaml
809-
# let a_queue : int queue = queue ()
810-
val a_queue : int queue = {head = <abstr>; tail = <abstr>}
811-
# Xt.commit { tx = enqueue a_queue 19 }
812-
- : unit = ()
813-
# Xt.commit { tx = try_dequeue a_queue }
814-
- : int option = Some 19
815-
# Xt.commit { tx = try_dequeue a_queue }
816-
- : int option = None
817-
```
818-
819-
The queue implementation in this section is an example of using **kcas** to
820-
implement a fine-grained lock-free algorithm. Instead of recording all shared
821-
memory accesses and performing them atomically all at once, the implementation
822-
updates the tail outside of the transaction. This can potentially improve
823-
performance and scalability.
824-
825-
This sort of algorithm design requires careful reasoning. Consider the dequeue
826-
operation. Instead of recording the `Xt.get ~xt old_head` operation in the
827-
transaction log, one could propose to bypass the log as `Loc.get old_head`. That
828-
may seem like a valid optimization, because logging the update of the head in
829-
the transaction is sufficient to ensure that each transaction dequeues a unique
830-
node. Unfortunately that would change the semantics of the operation.
831-
832-
Suppose, for example, that you have two queues, _A_ and _B_, and you must
833-
maintain the invariant that at least one of the queues is empty. One domain
834-
tries to dequeue from _A_ and, if _A_ was empty, enqueue to _B_. Another domain
835-
does the opposite, dequeue from _B_ and enqueue to _A_ (when _B_ was empty).
836-
When such operations are performed in isolation, the invariant would be
837-
maintained. However, if the access of `old_head` is not recorded in the log, it
838-
is possible to end up with both _A_ and _B_ non-empty. This kind of
839-
[race condition](https://en.wikipedia.org/wiki/Race_condition) is common enough
840-
that it has been given a name: _write skew_. As an exercise, write out the
841-
sequence of atomic accesses that leads to that result.
842-
843716
### Programming with primitive operations
844717

845718
The [`Op`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Op/index.html)
@@ -1132,6 +1005,168 @@ transactions:
11321005
val n_elems : int = 2
11331006
```
11341007

1008+
### Post commit actions
1009+
1010+
Closely related to moving compute outside of transactions, it is also sometimes
1011+
possible or necessary to perform some side-effects or actions, such as
1012+
non-transactional IO operations, only after a transaction has been committed
1013+
successfully. These cases are supported via the ability to register
1014+
[`post_commit`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit)
1015+
actions.
1016+
1017+
As a basic example, one might want to log a message when some transactional
1018+
operation is performed. Instead of directly logging the message
1019+
1020+
```ocaml
1021+
# let enqueue_and_log ~xt queue message =
1022+
enqueue ~xt queue message;
1023+
(* BAD: The printf could be executed many times! *)
1024+
Printf.printf "sent %s" message
1025+
val enqueue_and_log : xt:'a Xt.t -> string queue -> string -> unit = <fun>
1026+
```
1027+
1028+
one should use
1029+
[`post_commit`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit)
1030+
1031+
```ocaml
1032+
# let enqueue_and_log ~xt queue message =
1033+
enqueue ~xt queue message;
1034+
Xt.post_commit ~xt @@ fun () ->
1035+
Printf.printf "sent %s" message
1036+
val enqueue_and_log : xt:'a Xt.t -> string queue -> string -> unit = <fun>
1037+
```
1038+
1039+
to make sure that the message is only printed once after the transaction has
1040+
actually completed successfully.
1041+
1042+
#### A composable Michael-Scott style queue
1043+
1044+
One of the most famous lock-free algorithms is
1045+
[the Michael-Scott queue](https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf).
1046+
Perhaps its characteristic feature is that the tail pointer of the queue is
1047+
allowed to momentarily fall behind and that operations on the queue perform
1048+
cooperative CASes to update the tail. The tail pointer can be seen as an
1049+
optimization &mdash; whether it points to the true tail or not does not change
1050+
the logical state of the queue. Let's implement a composable queue that allows
1051+
the tail to momentarily lag behind.
1052+
1053+
First we define a type for nodes:
1054+
1055+
```ocaml
1056+
type 'a node = Nil | Node of 'a * 'a node Loc.t
1057+
```
1058+
1059+
A queue is then a pair of pointers to the head and tail of a queue:
1060+
1061+
```ocaml
1062+
type 'a queue = {
1063+
head : 'a node Loc.t Loc.t;
1064+
tail : 'a node Loc.t Atomic.t
1065+
}
1066+
```
1067+
1068+
Note that we used an `Atomic.t` for the tail. We do not need to operate on the
1069+
tail transactionally.
1070+
1071+
To create a queue we allocate a shared memory location for the pointer to the
1072+
first node to be enqueued and make both the head and tail point to the location:
1073+
1074+
```ocaml
1075+
# let queue () =
1076+
let next = Loc.make Nil in
1077+
{ head = Loc.make next; tail = Atomic.make next }
1078+
val queue : unit -> 'a queue = <fun>
1079+
```
1080+
1081+
To dequeue a node, only the head of the queue is examined. If the location
1082+
pointed to by the head points to a node we update the head to point to the
1083+
location pointing to the next node:
1084+
1085+
```ocaml
1086+
# let try_dequeue ~xt { head; _ } =
1087+
let old_head = Xt.get ~xt head in
1088+
match Xt.get ~xt old_head with
1089+
| Nil -> None
1090+
| Node (value, next) ->
1091+
Xt.set ~xt head next;
1092+
Some value
1093+
val try_dequeue : xt:'a Xt.t -> 'b queue -> 'b option = <fun>
1094+
```
1095+
1096+
To enqueue a value into the queue, only the tail of the queue needs to be
1097+
examined. We allocate a new location for the new tail and a node. We then need
1098+
to find the true tail of the queue and update it to point to the new node. The
1099+
reason we need to find the true tail is that we explicitly allow the tail to
1100+
momentarily fall behind. We then add a post commit action to the transaction to
1101+
update the tail after the transaction has been successfully committed:
1102+
1103+
```ocaml
1104+
# let enqueue ~xt { tail; _ } value =
1105+
let new_tail = Loc.make Nil in
1106+
let new_node = Node (value, new_tail) in
1107+
let rec find_and_set_tail old_tail =
1108+
match Xt.compare_and_swap ~xt old_tail Nil new_node with
1109+
| Nil -> ()
1110+
| Node (_, old_tail) -> find_and_set_tail old_tail
1111+
in
1112+
find_and_set_tail (Atomic.get tail);
1113+
let rec fix_tail () =
1114+
let old_tail = Atomic.get tail in
1115+
if
1116+
Loc.get new_tail == Nil
1117+
&& not (Atomic.compare_and_set tail old_tail new_tail)
1118+
then fix_tail ()
1119+
in
1120+
Xt.post_commit ~xt fix_tail
1121+
val enqueue : xt:'a Xt.t -> 'b queue -> 'b -> unit = <fun>
1122+
```
1123+
1124+
The post commit action, registered using
1125+
[`post_commit`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-post_commit),
1126+
checks that the tail is still the true tail and then attempts to update the
1127+
tail. The order of accesses is very subtle as always with non-transactional
1128+
atomic operations. Can you see why it works? Although we allow the tail to
1129+
momentarily fall behind, it is important that we do not let the tail fall behind
1130+
indefinitely, because then we would risk leaking memory &mdash; nodes that have
1131+
been dequeued from the queue would still be pointed to by the tail.
1132+
1133+
Using the Michael-Scott style queue is as easy as any other transactional queue:
1134+
1135+
```ocaml
1136+
# let a_queue : int queue = queue ()
1137+
val a_queue : int queue = {head = <abstr>; tail = <abstr>}
1138+
# Xt.commit { tx = enqueue a_queue 19 }
1139+
- : unit = ()
1140+
# Xt.commit { tx = try_dequeue a_queue }
1141+
- : int option = Some 19
1142+
# Xt.commit { tx = try_dequeue a_queue }
1143+
- : int option = None
1144+
```
1145+
1146+
The queue implementation in this section is an example of using **kcas** to
1147+
implement a fine-grained lock-free algorithm. Instead of recording all shared
1148+
memory accesses and performing them atomically all at once, the implementation
1149+
updates the tail outside of the transaction. This can potentially improve
1150+
performance and scalability.
1151+
1152+
This sort of algorithm design requires careful reasoning. Consider the dequeue
1153+
operation. Instead of recording the `Xt.get ~xt old_head` operation in the
1154+
transaction log, one could propose to bypass the log as `Loc.get old_head`. That
1155+
may seem like a valid optimization, because logging the update of the head in
1156+
the transaction is sufficient to ensure that each transaction dequeues a unique
1157+
node. Unfortunately that would change the semantics of the operation.
1158+
1159+
Suppose, for example, that you have two queues, _A_ and _B_, and you must
1160+
maintain the invariant that at least one of the queues is empty. One domain
1161+
tries to dequeue from _A_ and, if _A_ was empty, enqueue to _B_. Another domain
1162+
does the opposite, dequeue from _B_ and enqueue to _A_ (when _B_ was empty).
1163+
When such operations are performed in isolation, the invariant would be
1164+
maintained. However, if the access of `old_head` is not recorded in the log, it
1165+
is possible to end up with both _A_ and _B_ non-empty. This kind of
1166+
[race condition](https://en.wikipedia.org/wiki/Race_condition) is common enough
1167+
that it has been given a name: _write skew_. As an exercise, write out the
1168+
sequence of atomic accesses that leads to that result.
1169+
11351170
### Race to cooperate
11361171

11371172
Sometimes it is necessary to perform slower transactions that access many shared

0 commit comments

Comments
 (0)