Split deterministically regardless of test order Fix #23
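
The idea behind the fix can be sketched outside the plugin as follows. This is a minimal standalone illustration, not the code in `algorithms.py`: the function name `split_least_duration` and the use of plain string test ids instead of pytest items are assumptions made for the example. It sorts by name before the stable longest-first sort, so ties in duration are always broken the same way.

```python
import heapq
import itertools
from typing import Dict, List, Sequence


def split_least_duration(
    test_ids: Sequence[str], durations: Dict[str, float], splits: int
) -> List[List[str]]:
    """Greedy split: longest tests first, each into the currently lightest group."""
    # Sort by name first. Python's sort is stable (also with reverse=True),
    # so tests with equal durations keep this name order in the next sort,
    # whatever order they arrived in.
    by_name = sorted(test_ids)
    longest_first = sorted(by_name, key=lambda t: durations[t], reverse=True)

    groups: List[List[str]] = [[] for _ in range(splits)]
    # Heap entries have the form (summed_durations, group_index).
    heap = [(0.0, i) for i in range(splits)]
    heapq.heapify(heap)
    for test_id in longest_first:
        total, index = heapq.heappop(heap)
        groups[index].append(test_id)
        heapq.heappush(heap, (total + durations[test_id], index))
    return groups


# Hypothetical data: five tests with identical durations, shuffled every way.
tests = ["a", "b", "c", "d", "e"]
durations = {t: 1.0 for t in tests}
groupings = {
    tuple(tuple(g) for g in split_least_duration(order, durations, 2))
    for order in itertools.permutations(tests)
}
print(len(groupings))  # 1: every input order yields the same groups
```

Without the initial sort by name, tests with equal durations would keep their incoming (possibly shuffled) order, and each group could end up with a different selection on every runner.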

bullfest · bullfest · commit 70e75e4d1432 · 2022-04-22T15:38:05.000+02:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 ## [Unreleased]
 
+### Fixed
+- The `least_duration` algorithm should now split deterministically regardless of starting test order.
+  This should fix the main problem when running with test-randomization packages such as `pytest-randomly` or `pytest-random-order`.
+  See #52
+
 ## [0.7.0] - 2022-03-13
 ### Added
 - Support for pytest 7.x, see https://github.com/jerry-git/pytest-split/pull/47
diff --git a/README.md b/README.md
@@ -27,20 +27,20 @@ This is of course a fundamental problem in the suite itself but sometimes it's n
 Additionally, `pytest-split` may be a better fit in some use cases considering distributed execution.
 
 ## Installation
-```
+```sh
 pip install pytest-split
 ```
 
 ## Usage
 First we have to store test durations from a complete test suite run.
 This produces .test_durations file which should be stored in the repo in order to have it available during future test runs.
 The file path is configurable via `--durations-path` CLI option.
-```
+```sh
 pytest --store-durations
 ```
 
 Then we can have as many splits as we want:
-```
+```sh
 pytest --splits 3 --group 1
 pytest --splits 3 --group 2
 pytest --splits 3 --group 3
@@ -59,7 +59,10 @@ Lists the slowest tests based on the information stored in the test durations fi
  information.
 
 ## Interactions with other pytest plugins
-* [`pytest-random-order`](https://github.com/jbasko/pytest-random-order): ⚠️ The **default settings** of that plugin (setting only `--random-order` to activate it) are **incompatible** with `pytest-split`. Test selection in the groups happens after randomization, potentially causing some tests to be selected in several groups and others not at all. Instead, a global random seed needs to be computed before running the tests (for example using `$RANDOM` from the shell) and that single seed then needs to be used for all groups by setting the `--random-order-seed` option.
+* [`pytest-random-order`](https://github.com/jbasko/pytest-random-order) and [`pytest-randomly`](https://github.com/pytest-dev/pytest-randomly):
+  ⚠️ `pytest-split` running with the `duration_based_chunks` algorithm is **incompatible** with test-order-randomization plugins.
+  Test selection in the groups happens after randomization, potentially causing some tests to be selected in several groups and others not at all.
+  To work around this, compute a single global random seed before running the tests (for example using `$RANDOM` from the shell) and pass that same seed to every group, for example via the `--random-order-seed` option of `pytest-random-order`.
 
 * [`nbval`](https://github.com/computationalmodelling/nbval): `pytest-split` could, in principle, break up a single IPython Notebook into different test groups. This most likely causes broken up pieces to fail (for the very least, package `import`s are usually done at Cell 1, and so, any broken up piece that doesn't contain Cell 1 will certainly fail). To avoid this, after splitting step is done, test groups are reorganized based on a simple algorithm illustrated in the following cartoon:
 
@@ -71,14 +74,15 @@ where the letters (A to E) refer to individual IPython Notebooks, and the number
 The plugin supports multiple algorithms to split tests into groups.
 Each algorithm makes different tradeoffs, but generally `least_duration` should give more balanced groups.
 
-| Algorithm      | Maintains Absolute Order | Maintains Relative Order | Split Quality |
-|----------------|--------------------------|--------------------------|---------------|
-| duration_based_chunks | ✅                | ✅                        | Good          |
-| least_duration | ❌                       | ✅                        | Better        |
+| Algorithm      | Maintains Absolute Order | Maintains Relative Order | Split Quality | Works with random ordering |
+|----------------|--------------------------|--------------------------|---------------|----------------------------|
+| duration_based_chunks | ✅                | ✅                       | Good          | ❌                         |
+| least_duration | ❌                       | ✅                       | Better        | ✅                         |
 
 Explanation of the terms in the table:
 * Absolute Order: whether each group contains all tests between first and last element in the same order as the original list of tests
 * Relative Order: whether each test in each group has the same relative order to its neighbours in the group as in the original list of tests
+* Works with random ordering: whether the algorithm works with test-shuffling tools such as [`pytest-randomly`](https://github.com/pytest-dev/pytest-randomly)
 
 The `duration_based_chunks` algorithm aims to find optimal boundaries for the list of tests and every test group contains all tests between the start and end boundary.
 The `least_duration` algorithm walks the list of tests and assigns each test to the group with the smallest current duration.
diff --git a/src/pytest_split/algorithms.py b/src/pytest_split/algorithms.py
@@ -41,14 +41,19 @@ def least_duration(
         (*tup, i) for i, tup in enumerate(items_with_durations)
     ]
 
+    # Sort by name first so the stable duration sort below breaks ties deterministically
+    items_with_durations_indexed = sorted(
+        items_with_durations_indexed, key=lambda tup: str(tup[0])
+    )
+
     # sort in ascending order
     sorted_items_with_durations = sorted(
         items_with_durations_indexed, key=lambda tup: tup[1], reverse=True
     )
 
-    selected: "List[List[Tuple[nodes.Item, int]]]" = [[] for i in range(splits)]
-    deselected: "List[List[nodes.Item]]" = [[] for i in range(splits)]
-    duration: "List[float]" = [0 for i in range(splits)]
+    selected: "List[List[Tuple[nodes.Item, int]]]" = [[] for _ in range(splits)]
+    deselected: "List[List[nodes.Item]]" = [[] for _ in range(splits)]
+    duration: "List[float]" = [0 for _ in range(splits)]
 
     # create a heap of the form (summed_durations, group_index)
     heap: "List[Tuple[float, int]]" = [(0, i) for i in range(splits)]
diff --git a/tests/test_algorithms.py b/tests/test_algorithms.py
@@ -1,7 +1,13 @@
+import itertools
 from collections import namedtuple
+from typing import TYPE_CHECKING
 
 import pytest
 
+if TYPE_CHECKING:
+    from typing import List, Set
+    from _pytest.nodes import Item
+
 from pytest_split.algorithms import Algorithms
 
 item = namedtuple("item", "nodeid")
@@ -110,3 +116,18 @@ def test__split_tests_maintains_relative_order_of_tests(self, algo_name, expecte
         expected_first, expected_second = expected
         assert first.selected == expected_first
         assert second.selected == expected_second
+
+    def test__split_tests_same_set_regardless_of_order(self):
+        """NOTE: only least_duration does this correctly"""
+        tests = ["a", "b", "c", "d", "e", "f", "g"]
+        durations = {t: 1 for t in tests}
+        items = [item(t) for t in tests]
+        algo = Algorithms["least_duration"].value
+        for n in (2, 3, 4):
+            selected_each: "List[Set[Item]]" = [set() for _ in range(n)]
+            for order in itertools.permutations(items):
+                splits = algo(splits=n, items=order, durations=durations)
+                for i, group in enumerate(splits):
+                    if not selected_each[i]:
+                        selected_each[i] = set(group.selected)
+                    assert selected_each[i] == set(group.selected)
diff --git a/tests/test_plugin.py b/tests/test_plugin.py
@@ -141,8 +141,8 @@ class TestSplitToSuites:
             ["test_1", "test_2", "test_3", "test_4", "test_5", "test_6", "test_7"],
         ),
         (2, 2, "duration_based_chunks", ["test_8", "test_9", "test_10"]),
-        (2, 1, "least_duration", ["test_3", "test_5", "test_6", "test_8", "test_10"]),
-        (2, 2, "least_duration", ["test_1", "test_2", "test_4", "test_7", "test_9"]),
+        (2, 1, "least_duration", ["test_3", "test_5", "test_7", "test_9", "test_10"]),
+        (2, 2, "least_duration", ["test_1", "test_2", "test_4", "test_6", "test_8"]),
         (
             3,
             1,
@@ -151,17 +151,17 @@ class TestSplitToSuites:
         ),
         (3, 2, "duration_based_chunks", ["test_6", "test_7", "test_8"]),
         (3, 3, "duration_based_chunks", ["test_9", "test_10"]),
-        (3, 1, "least_duration", ["test_3", "test_6", "test_9"]),
-        (3, 2, "least_duration", ["test_4", "test_7", "test_10"]),
-        (3, 3, "least_duration", ["test_1", "test_2", "test_5", "test_8"]),
+        (3, 1, "least_duration", ["test_3", "test_8", "test_10"]),
+        (3, 2, "least_duration", ["test_4", "test_6", "test_9"]),
+        (3, 3, "least_duration", ["test_1", "test_2", "test_5", "test_7"]),
         (4, 1, "duration_based_chunks", ["test_1", "test_2", "test_3", "test_4"]),
         (4, 2, "duration_based_chunks", ["test_5", "test_6", "test_7"]),
         (4, 3, "duration_based_chunks", ["test_8", "test_9"]),
         (4, 4, "duration_based_chunks", ["test_10"]),
-        (4, 1, "least_duration", ["test_6", "test_10"]),
-        (4, 2, "least_duration", ["test_1", "test_4", "test_7"]),
-        (4, 3, "least_duration", ["test_2", "test_5", "test_8"]),
-        (4, 4, "least_duration", ["test_3", "test_9"]),
+        (4, 1, "least_duration", ["test_9", "test_10"]),
+        (4, 2, "least_duration", ["test_1", "test_4", "test_6"]),
+        (4, 3, "least_duration", ["test_2", "test_5", "test_7"]),
+        (4, 4, "least_duration", ["test_3", "test_8"]),
     ]
     legacy_duration = [True, False]
     all_params = [