-
Notifications
You must be signed in to change notification settings - Fork 15
Description
This issue is related to the multi threaded implementation of HVM4 runtime.
The minimal example to reproduce the issue is the following program:
@c2 = λs. !S0&C=s; λx0.S0₀(S0₁(x0))
@k2 = λs. !S0&K=s; λx0.S0₀(S0₁(x0))
@main = @c2(@k2)
The manual full reduction of the term can be found here: https://gist.github.com/Lorenzobattistela/89c6d6ff07f01402a92f2f93540732e7
The bug is caused by the following term structure that is found middle reduction:
!A&K = G₁ ;
!B&K = x0 ;
!G&C = λy0.&C{A₀, B₀}(&C{A₁, B₁}(y0)) ;
λx0.G₀
Note how the G dup refers to the A dup, which itself refers to G. So we have a recursive dup structure, where one dup needs the other to be resolved.
In the single-threaded environment, this is not a problem as shown in the manual reduction, because the dups resolve correctly with the lazy evaluation. However, in a multi threaded environment, two different threads pick up each of this DUP's (example, thread A picks up the G dup, thread B picks up the A dup that was enqueued as a task and stolen by thread B).
Initial state of the heap:
heap[G_loc] = term_loc
heap[A_loc] = DP1 to G_loc
...
So, thread A will do heap_take in the dup G location and set it to 0.
heap[G_loc] = 0 // acquired by thread A
heap[A_loc] = DP1 to G_loc
...
And then, thread B steals a task and finds the A dup, so it does heap_take on it and sets it to 0.
heap[G_loc] = 0 // acquired by thread A
heap[A_loc] = 0 // acquired by thread B
...
However, now, thread A reaches a point where it needs the A dup to continue G's reduction. So it tries to heap_take there. It finds a 0, and spin waits there.
Finally, thread B reaches a point where it needs the G value to continue A's reduction. Tries to heap_take there, and also finds 0, spin waits there.
The general problem can be stated as:
-> Thread A starts reducing a term. At some point in that term, it finds a dup, DUP_A. It enqueues other work
-> Thread B steals work from A's queue and start reducing. At some point, it finds DUP_B and starts reducing it
-> While reducing DUP_A, thread A finds a DP0 or DP1 of DUP_B. It tries to heap_take on it and receives 0 because thread B is working on it, so spin wait until someone set the value.
-> Now thread B is reducing, but it now finds a DP0 or DP1 of DUP_A, tries heap_take and receives 0, because thread A is working on it, so spin wait.
The problem was confirmed by implementing a global deadlock on DUP's, which solved the issue (but is slow and not a complete / correct solution)