Repository scans sometimes skip non-problematic experiments #2799
Description
name: Bug report
about: Report a bug in ARTIQ
Bug Report
One-Line Summary
Currently, when we rescan the experiment repository, ARTIQ master sometimes skips non-problematic experiments.
Issue Details
See other instances of the issue on the m-labs forum:
- https://forum.m-labs.hk/d/786-scan-repository-error
- https://forum.m-labs.hk/d/690-workererror-when-scanning-repository-head
When an experiment is skipped, the worker raises a WorkerError (“Worker ended while attempting to receive data (RID scan)”).
This can happen multiple times within a single repo scan, and is more likely the more experiments there are.
Which experiments are skipped depends on what files are in the repo, but the result seems to be deterministic (i.e. two scans will skip the same experiments if nothing is changed).
The issue seems to happen only when a repo scan takes longer than ~20s, at which point a WorkerError is raised every ~20s.
A straightforward workaround is to put a “dummy” experiment immediately before (alphabetically) the skipped experiment, though this gets a bit silly.
Expected Behavior
ARTIQ master is able to rescan the repo and process all valid (i.e. no underlying problem) experiments.
Actual (undesired) Behavior
Experiments get skipped, often several per repo scan; the same experiment files are skipped each time as long as the underlying experiment repo stays the same.
Errors occur roughly 20s apart (about 18-19s by eye), which is consistent with the timeout in Worker.examine (i.e. timeout=20).
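To illustrate why the ~20s spacing points at that timeout, here is a toy model in plain Python. It is only a sketch of a hypothesis, not ARTIQ code: it assumes the 20s deadline effectively runs for the lifetime of the scan worker rather than per file, and that a fresh worker is started after each failure. The function name `simulate_scan` and the per-file durations are made up for illustration.

```python
def simulate_scan(files, timeout=20.0):
    """Return the names of files skipped under a hypothetical
    per-worker (not per-file) deadline.

    files: list of (name, seconds_to_examine) in scan order.
    """
    skipped = []
    elapsed = 0.0  # seconds since the current (hypothetical) worker started
    for name, cost in files:
        if elapsed + cost > timeout:
            # Assumption: the worker is killed mid-file and the file is skipped.
            skipped.append(name)
            # Assumption: a fresh worker handles the next file.
            elapsed = 0.0
        else:
            elapsed += cost
    return skipped


# Hypothetical repo: 30 experiments, 1.5 s each to examine.
files = [("exp_%02d" % i, 1.5) for i in range(30)]
print(simulate_scan(files))

# Workaround from this report: a dummy file sorted just before the victim.
with_dummy = files[:13] + [("dummy", 1.5)] + files[13:]
print(simulate_scan(with_dummy))
```

Under these assumptions the model reproduces the observations: the same files are skipped on every scan, skips are spaced by about one timeout, and a dummy experiment placed just before a skipped file gets skipped in its place. Whether this is what actually happens inside the master would need to be confirmed against the worker code.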
Attached image shows the dashboard log when experiments get skipped - here, ARTIQ master skips 2 experiments.

Your System (omit irrelevant parts)
- Operating System: Windows 10
- ARTIQ version: v8.0+unknown.beta
- commit: 69c0f81