@@ -20,7 +20,7 @@ IOSimPOR will usually not explore all possible schedules, because
20
20
there are too many for this to be feasible (even infinitely many, in
21
21
some cases). It prioritises races that occur earlier in a test
22
22
execution, and prioritises making a small number of race reversals
23
- over making many.
23
+ over making many.
24
24
25
25
It can test non-terminating programs with an infinite trace; in such
26
26
cases it only reverse races in the part of the trace that the property
@@ -309,14 +309,48 @@ Exception:
309
309
then the probable cause is a bug in IOSimPOR itself. The message
310
310
indicates that IOSimPOR scheduler is trying to follow a schedule
311
311
modification that specifies that thread `[5]` should run next, but this
312
- is impossible because thread `[5]` is not in the runqueue (the list of
312
+ is impossible because thread `[5]` is not in the ` runqueue` (the list of
313
313
runnable threads). If you supplied a schedule control explicitly,
314
- using withReplay, then you may perhaps have supplied a schedule
314
+ using ` withReplay` , then you may perhaps have supplied a schedule
315
315
control that does not match the version of the code you are running:
316
316
in this case the exception is your fault. But if this message appears
317
317
while you are exploring races, then it indicates a problem in
318
318
IOSimPOR's dependency analysis: IOSimPOR has constructed a schedule as
319
319
a result of race reversal that tries to run thread `[5]` at this point,
320
320
because the dependency analysis indicates that thread `[5]` ought to be
321
- runnable---but it is not. Best to consult Quviq at this point.
321
+ runnable---but it is not.
322
322
323
+ Another similar instance of such a problem is the following assertion failure:
324
+
325
+ ```
326
+ InternalError "assertion failure: Thread {4} not runnable"
327
+ assertion failure: Thread {4} not runnable
328
+ ```
329
+
330
+ This indicates that IOSimPOR was following a schedule where the next scheduled
331
+ thread was thread 4. However, this thread does not exist in the `runqueue`,
332
+ possibly because it was blocked and not unblocked before being scheduled
333
+ again.
334
+
335
+ To debug this, follow these steps:
336
+
337
+ 1. **Get the shrunken test input**: Minimize the test input to simplify the
338
+ failure case.
339
+ 2. **Retrieve the failing schedule**: Obtain the failing Schedule control the
340
+ failure case.
341
+ 3. **Create a unit test**: Use the failing input in a unit test and run it
342
+ with `explorationDebugLevel = 2` to display the races in each scheduled
343
+ run.
344
+ 4. **Analyze the output**: Manually review the output, focusing on the
345
+ schedule that leads to the failure. Look for the schedule in the
346
+ `RacesFound` log messages.
347
+ 5. **Examine the faulty schedule**: Investigate the trace of the identified
348
+ schedule to understand why it causes the failure, i.e. find the race which
349
+ leads to the error.
350
+ 6. **Implement the fix**: Correct the identified issue.
351
+
352
+ Most likely, the root cause lies within the vector clocks logic or the
353
+ `updateRaces` function, particularly in the management of the
354
+ `stepInfoConcurrent` and `stepInfoNonDep` sets, which are crucial for race
355
+ discovery. So analyse `updateRaces` function along the execution trace to see
356
+ if everything makes sense.
0 commit comments