Skip to content

Commit 4238154

Browse files
committed
Fix IOSimPOR test failure
Adds an entry about how to investigate an IOSimPOR failure. Closes #154
1 parent 434f33a commit 4238154

File tree

2 files changed

+53
-15
lines changed

2 files changed

+53
-15
lines changed

io-sim/how-to-use-IOSimPOR.md

Lines changed: 38 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ IOSimPOR will usually not explore all possible schedules, because
2020
there are too many for this to be feasible (even infinitely many, in
2121
some cases). It prioritises races that occur earlier in a test
2222
execution, and prioritises making a small number of race reversals
23-
over making many.
23+
over making many.
2424

2525
It can test non-terminating programs with an infinite trace; in such
2626
cases it only reverse races in the part of the trace that the property
@@ -309,14 +309,48 @@ Exception:
309309
then the probable cause is a bug in IOSimPOR itself. The message
310310
indicates that IOSimPOR scheduler is trying to follow a schedule
311311
modification that specifies that thread `[5]` should run next, but this
312-
is impossible because thread `[5]` is not in the runqueue (the list of
312+
is impossible because thread `[5]` is not in the `runqueue` (the list of
313313
runnable threads). If you supplied a schedule control explicitly,
314-
using withReplay, then you may perhaps have supplied a schedule
314+
using `withReplay`, then you may perhaps have supplied a schedule
315315
control that does not match the version of the code you are running:
316316
in this case the exception is your fault. But if this message appears
317317
while you are exploring races, then it indicates a problem in
318318
IOSimPOR's dependency analysis: IOSimPOR has constructed a schedule as
319319
a result of race reversal that tries to run thread `[5]` at this point,
320320
because the dependency analysis indicates that thread `[5]` ought to be
321-
runnable---but it is not. Best to consult Quviq at this point.
321+
runnable---but it is not.
322322
323+
Another similar instance of such a problem is the following assertion failure:
324+
325+
```
326+
InternalError "assertion failure: Thread {4} not runnable"
327+
assertion failure: Thread {4} not runnable
328+
```
329+
330+
This indicates that IOSimPOR was following a schedule where the next scheduled
331+
thread was thread 4. However, this thread does not exist in the `runqueue`,
332+
possibly because it was blocked and not unblocked before being scheduled
333+
again.
334+
335+
To debug this, follow these steps:
336+
337+
1. **Get the shrunken test input**: Minimize the test input to simplify the
338+
failure case.
339+
2. **Retrieve the failing schedule**: Obtain the failing Schedule control the
340+
failure case.
341+
3. **Create a unit test**: Use the failing input in a unit test and run it
342+
with `explorationDebugLevel = 2` to display the races in each scheduled
343+
run.
344+
4. **Analyze the output**: Manually review the output, focusing on the
345+
schedule that leads to the failure. Look for the schedule in the
346+
`RacesFound` log messages.
347+
5. **Examine the faulty schedule**: Investigate the trace of the identified
348+
schedule to understand why it causes the failure, i.e. find the race which
349+
leads to the error.
350+
6. **Implement the fix**: Correct the identified issue.
351+
352+
Most likely, the root cause lies within the vector clocks logic or the
353+
`updateRaces` function, particularly in the management of the
354+
`stepInfoConcurrent` and `stepInfoNonDep` sets, which are crucial for race
355+
discovery. So analyse `updateRaces` function along the execution trace to see
356+
if everything makes sense.

io-sim/src/Control/Monad/IOSimPOR/Internal.hs

Lines changed: 15 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1771,17 +1771,27 @@ updateRaces thread@Thread { threadId = tid }
17711771
stepInfoRaces } =
17721772
-- if this step depends on the previous step, or is not concurrent,
17731773
-- then any threads that it wakes up become non-concurrent also.
1774-
let !lessConcurrent = concurrent Set.\\ effectWakeup newEffect in
1774+
let !lessConcurrent = concurrent Set.\\ effectWakeup newEffect
1775+
1776+
-- `step` happened before `newStep` (`newStep` happened after
1777+
-- `step`)
1778+
happensBefore = step `happensBeforeStep` newStep
1779+
1780+
!stepInfoNonDep'
1781+
-- `newStep` happened after `step`
1782+
| happensBefore = stepInfoNonDep
1783+
-- `newStep` did not happen after `step`
1784+
| otherwise = newStep : stepInfoNonDep in
17751785

17761786
if tid `notElem` concurrent
1777-
then stepInfo { stepInfoConcurrent = lessConcurrent }
1787+
then let
1788+
in stepInfo { stepInfoConcurrent = lessConcurrent
1789+
, stepInfoNonDep = stepInfoNonDep'
1790+
}
17781791

17791792
-- The core of IOSimPOR. Detect if `newStep` is racing with any
17801793
-- previous steps and update each `StepInfo`.
17811794
else let theseStepsRace = step `racingSteps` newStep
1782-
-- `step` happened before `newStep` (`newStep` happened after
1783-
-- `step`)
1784-
happensBefore = step `happensBeforeStep` newStep
17851795
-- `newStep` happens after any of the racing steps
17861796
afterRacingStep = any (`happensBeforeStep` newStep) stepInfoRaces
17871797

@@ -1795,12 +1805,6 @@ updateRaces thread@Thread { threadId = tid }
17951805
| afterRacingStep = Set.delete tid concurrent
17961806
| otherwise = concurrent
17971807

1798-
!stepInfoNonDep'
1799-
-- `newStep` happened after `step`
1800-
| happensBefore = stepInfoNonDep
1801-
-- `newStep` did not happen after `step`
1802-
| otherwise = newStep : stepInfoNonDep
1803-
18041808
-- Here we record discovered races. We only record a new
18051809
-- race if we are following the default schedule, to avoid
18061810
-- finding the same race in different parts of the search

0 commit comments

Comments
 (0)