|
| 1 | +How does it work? |
| 2 | +================= |
| 3 | + |
| 4 | +``xdist`` works by spawning one or more **workers**, which are |
| 5 | +controlled by the **controller**. Each **worker** is responsible for |
| 6 | +performing a full test collection and afterwards running tests as |
| 7 | +dictated by the **controller**. |
| 8 | + |
| 9 | +The execution flow is: |
| 10 | + |
| 11 | +1. The **controller** spawns one or more **workers** at the beginning of the |
| 12 | + test session. The communication between **controller** and **worker** |
| 13 | + nodes makes use of `execnet <https://codespeak.net/execnet/>`__ and |
| 14 | + its |
| 15 | + `gateways <https://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters>`__. |
| 16 | + The actual interpreters executing the code for the **workers** might |
| 17 | + be remote or local. |
| 18 | + |
| 19 | +2. Each **worker** is itself a mini pytest runner. At this point the |
| 20 | + **workers** perform a full test collection and send the collected |
| 21 | + test-ids back to the **controller**, which does not perform any |
| 22 | + collection itself. |
| 23 | + |
| 24 | +3. The **controller** receives the result of the collection from all |
| 25 | + nodes. At this point the **controller** performs a sanity check to |
| 26 | + ensure that all **workers** collected the same tests (including |
| 27 | + order), bailing out otherwise. If all is well, it converts the list |
| 28 | + of test-ids into a list of simple indexes, where each index |
| 29 | + corresponds to the position of that test in the original collection |
| 30 | + list. This works because all nodes have the same collection list, and |
| 31 | + saves bandwidth because the **controller** can now tell one of the |
| 32 | + workers to just *execute test index 3* instead of passing the full test |
| 33 | + id. |
| 34 | + |
| 35 | +4. If **dist-mode** is **each**: the **controller** just sends the full |
| 36 | + list of test indexes to each node at this moment. |
| 37 | + |
| 38 | +5. If **dist-mode** is **load**: the **controller** takes around 25% of |
| 39 | + the tests and sends them one by one to each **worker** in a |
| 40 | + round-robin fashion. The rest of the tests will be distributed later as |
| 41 | + **workers** finish tests (see below). |
| 42 | + |
| 43 | +6. Note that the ``pytest_xdist_make_scheduler`` hook can be used to |
| 44 | + implement custom test distribution logic. |
| 45 | + |
| 46 | +7. **workers** re-implement ``pytest_runtestloop``: pytest’s default |
| 47 | + implementation basically loops over all collected items in the |
| 48 | + ``session`` object and executes the ``pytest_runtest_protocol`` for |
| 49 | + each test item, but in xdist **workers** sit idly waiting for the |
| 50 | + **controller** to send tests for execution. As tests are received by |
| 51 | + **workers**, ``pytest_runtest_protocol`` is executed for each test. |
| 52 | + One implementation detail is worth noting: **workers** must always |
| 53 | + keep at least one test item on their queue due to how the |
| 54 | + ``pytest_runtest_protocol(item, nextitem)`` hook is defined: in order |
| 55 | + to pass the ``nextitem`` to the hook, the worker must wait for more |
| 56 | + instructions from the controller before executing that remaining test. If |
| 57 | + it receives more tests, then it can safely call |
| 58 | + ``pytest_runtest_protocol`` because it knows what the ``nextitem`` |
| 59 | + parameter will be. If it receives a “shutdown” signal, then it can |
| 60 | + execute the hook passing ``nextitem`` as ``None``. |
| 61 | + |
| 62 | +8. As tests are started and completed at the **workers**, the results |
| 63 | + are sent back to the **controller**, which then just forwards the |
| 64 | + results to the appropriate pytest hooks: ``pytest_runtest_logstart`` |
| 65 | + and ``pytest_runtest_logreport``. This way other plugins (for example |
| 66 | + ``junitxml``) can work normally. The **controller** (when in |
| 67 | + dist-mode **load**) decides to send more tests to a node when a test |
| 68 | + completes, using some heuristics such as test durations and how many |
| 69 | + tests each **worker** still has to run. |
| 70 | + |
| 71 | +9. When the **controller** has no more pending tests it will send a |
| 72 | + “shutdown” signal to all **workers**, which will then run their |
| 73 | + remaining tests to completion and shut down. At this point the |
| 74 | + **controller** will sit waiting for **workers** to shut down, still |
| 75 | + processing events such as ``pytest_runtest_logreport``. |
| 76 | + |
| 77 | +FAQ |
| 78 | +--- |
| 79 | + |
| 80 | +**Question**: Why does each worker do its own collection, as opposed to having the |
| 81 | +controller collect once and distribute from that collection to the |
| 82 | +workers? |
| 83 | + |
| 84 | +If collection were performed by the controller, it would have to |
| 85 | +serialize the collected items to send them over the wire, as workers live |
| 86 | +in another process. The problem is that test items are not easy |
| 87 | +(perhaps impossible) to serialize, as they contain references to the test |
| 88 | +functions, fixture managers, config objects, etc. Even if one managed to |
| 89 | +serialize them, it would be very hard to get right and easy to |
| 90 | +break with any small change in pytest. |
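
To make steps 3 to 5 of the execution flow more concrete, here is a minimal Python sketch of the controller-side bookkeeping: checking that every worker reported the same collection, then handing out test indexes for the **each** and **load** modes. It is an illustration only, not xdist's actual code. The function names, the ``fraction`` parameter, the use of ``math.ceil`` to round the initial batch, and the worker names in the usage note are all assumptions made for this example.

```python
import math
from itertools import cycle


def check_same_collection(collections):
    """Return the shared list of test ids, or raise if workers disagree.

    `collections` maps a worker name to the ordered list of test ids
    that worker collected (hypothetical data shape for this sketch).
    """
    reference = None
    for worker, test_ids in collections.items():
        if reference is None:
            reference = list(test_ids)
        elif list(test_ids) != reference:
            # Different tests, or the same tests in a different order.
            raise RuntimeError(f"{worker} collected a different test list")
    return reference or []


def distribute_each(num_tests, workers):
    """dist-mode each: every worker gets the full list of indexes."""
    return {worker: list(range(num_tests)) for worker in workers}


def distribute_load_initial(num_tests, workers, fraction=0.25):
    """dist-mode load: send about `fraction` of the indexes round-robin.

    Returns the initial assignments and the indexes still pending, which
    would be handed out later as workers finish tests.
    """
    initial_count = min(num_tests, math.ceil(num_tests * fraction))
    assignments = {worker: [] for worker in workers}
    # zip stops as soon as the range of initial indexes is exhausted.
    for index, worker in zip(range(initial_count), cycle(workers)):
        assignments[worker].append(index)
    pending = list(range(initial_count, num_tests))
    return assignments, pending
```

For example, with eight collected tests and two workers named ``gw0`` and ``gw1``, ``distribute_load_initial(8, ["gw0", "gw1"])`` assigns index 0 to ``gw0`` and index 1 to ``gw1``, leaving indexes 2 to 7 pending. Because only integers cross the wire, the workers look each index up in their own copy of the collection list.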