Bug: recursive dups cause race conditions

This issue is related to the multi threaded implementation of HVM4 runtime.

The minimal example to reproduce the issue is the following program:
```
@c2   = λs. !S0&C=s; λx0.S0₀(S0₁(x0))
@k2   = λs. !S0&K=s; λx0.S0₀(S0₁(x0))
@main = @c2(@k2)
```

The manual full reduction of the term can be found here: https://gist.github.com/Lorenzobattistela/89c6d6ff07f01402a92f2f93540732e7

The bug is caused by the following term structure that is found middle reduction:
```
!A&K = G₁ ;
!B&K = x0 ;
!G&C = λy0.&C{A₀, B₀}(&C{A₁, B₁}(y0)) ;
λx0.G₀
```

Note how the `G` dup refers to the `A` dup, which itself refers to `G`. So we have a recursive dup structure, where one dup needs the other to be resolved. 

In the single-threaded environment, this is not a problem as shown in the manual reduction, because the dups resolve correctly with the lazy evaluation. However, in a multi threaded environment, two different threads pick up each of this DUP's (example, thread A picks up the G dup, thread B picks up the A dup that was enqueued as a task and stolen by thread B).

Initial state of the heap:
```
heap[G_loc] = term_loc
heap[A_loc] = DP1 to G_loc
...
```

So, thread `A` will do `heap_take` in the dup `G` location and set it to 0.

```
heap[G_loc] = 0 // acquired by thread A
heap[A_loc] = DP1 to G_loc
...
```

And then, thread `B` steals a task and finds the `A` dup, so it does `heap_take` on it and sets it to 0.
```
heap[G_loc] = 0 // acquired by thread A
heap[A_loc] = 0 // acquired by thread B
...
```

However, now, thread A reaches a point where it needs the `A` dup to continue `G`'s reduction. So it tries to `heap_take` there. It finds a `0`, and spin waits there.

Finally, thread `B` reaches a point where it needs the `G` value to continue `A`'s reduction. Tries to `heap_take` there, and also finds `0`, spin waits there. 

The general problem can be stated as:

```
-> Thread A starts reducing a term. At some point in that term, it finds a dup, DUP_A. It enqueues other work
-> Thread B steals work from A's queue and start reducing. At some point, it finds DUP_B and starts reducing it
-> While reducing DUP_A, thread A finds a DP0 or DP1 of DUP_B. It tries to heap_take on it and receives 0 because thread B is working on it, so spin wait until someone set the value.
-> Now thread B is reducing, but it now finds a DP0 or DP1 of DUP_A, tries heap_take and receives 0, because thread A is working on it, so spin wait.
```

The problem was confirmed by implementing a global deadlock on DUP's, which solved the issue (but is slow and not a complete / correct solution)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: recursive dups cause race conditions #21

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Bug: recursive dups cause race conditions #21

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions