-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Improve stepwise to not forget failed tests #13122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
43064f5
40b38f9
1837858
b6f1dad
0c30a58
1f7a151
2fcaa79
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| The ``--stepwise`` mode received a number of improvements: | ||
|
|
||
| * It no longer forgets the last failed test in case pytest is executed later without the flag. | ||
|
|
||
| This enables the following workflow: | ||
|
|
||
| 1. Execute pytest with ``--stepwise``, pytest then stops at the first failing test; | ||
| 2. Iteratively update the code and run the test in isolation, without the ``--stepwise`` flag (for example in an IDE), until it is fixed. | ||
| 3. Execute pytest with ``--stepwise`` again and pytest will continue from the previously failed test, and if it passes, continue on to the next tests. | ||
|
|
||
| Previously, at step 3, pytest would start from the beginning, forgetting the previously failed test. | ||
|
|
||
| This change however might cause issues if the ``--stepwise`` mode is used far apart in time, as the state might get stale, so the internal state will be reset automatically in case the test suite changes (for now only the number of tests are considered for this, we might change/improve this on the future). | ||
|
|
||
| * New ``--stepwise-reset``/``--sw-reset`` flag, allowing the user to explicitly reset the stepwise state and restart the workflow from the beginning. |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,5 +1,10 @@ | ||||||
| from __future__ import annotations | ||||||
|
|
||||||
| import dataclasses | ||||||
| from datetime import datetime | ||||||
| from typing import Any | ||||||
| from typing import TYPE_CHECKING | ||||||
|
|
||||||
| from _pytest import nodes | ||||||
| from _pytest.cacheprovider import Cache | ||||||
| from _pytest.config import Config | ||||||
|
|
@@ -8,6 +13,9 @@ | |||||
| from _pytest.reports import TestReport | ||||||
|
|
||||||
|
|
||||||
| if TYPE_CHECKING: | ||||||
| from typing_extensions import Self | ||||||
|
|
||||||
| STEPWISE_CACHE_DIR = "cache/stepwise" | ||||||
|
|
||||||
|
|
||||||
|
|
@@ -30,11 +38,20 @@ def pytest_addoption(parser: Parser) -> None: | |||||
| help="Ignore the first failing test but stop on the next failing test. " | ||||||
| "Implicitly enables --stepwise.", | ||||||
| ) | ||||||
| group.addoption( | ||||||
| "--sw-reset", | ||||||
| "--stepwise-reset", | ||||||
| action="store_true", | ||||||
| default=False, | ||||||
| dest="stepwise_reset", | ||||||
| help="Resets stepwise state, restarting the stepwise workflow. " | ||||||
| "Implicitly enables --stepwise.", | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def pytest_configure(config: Config) -> None: | ||||||
| if config.option.stepwise_skip: | ||||||
| # allow --stepwise-skip to work on its own merits. | ||||||
| # --stepwise-skip/--stepwise-reset implies stepwise. | ||||||
| if config.option.stepwise_skip or config.option.stepwise_reset: | ||||||
| config.option.stepwise = True | ||||||
| if config.getoption("stepwise"): | ||||||
| config.pluginmanager.register(StepwisePlugin(config), "stepwiseplugin") | ||||||
|
|
@@ -47,43 +64,105 @@ def pytest_sessionfinish(session: Session) -> None: | |||||
| # Do not update cache if this process is a xdist worker to prevent | ||||||
| # race conditions (#10641). | ||||||
| return | ||||||
| # Clear the list of failing tests if the plugin is not active. | ||||||
| session.config.cache.set(STEPWISE_CACHE_DIR, []) | ||||||
|
|
||||||
|
|
||||||
| @dataclasses.dataclass | ||||||
| class StepwiseCacheInfo: | ||||||
| # The nodeid of the last failed test. | ||||||
| last_failed: str | None | ||||||
|
|
||||||
| # The number of tests in the last time --stepwise was run. | ||||||
| # We use this information as a simple way to invalidate the cache information, avoiding | ||||||
| # confusing behavior in case the cache is stale. | ||||||
| last_test_count: int | None | ||||||
|
|
||||||
| # The date when the cache was last updated, for information purposes only. | ||||||
| last_cache_date_str: str | ||||||
|
|
||||||
| @property | ||||||
| def last_cache_date(self) -> datetime: | ||||||
| return datetime.fromisoformat(self.last_cache_date_str) | ||||||
|
|
||||||
| @classmethod | ||||||
| def empty(cls) -> Self: | ||||||
| return cls( | ||||||
| last_failed=None, | ||||||
| last_test_count=None, | ||||||
| last_cache_date_str=datetime.now().isoformat(), | ||||||
| ) | ||||||
|
|
||||||
| def update_date_to_now(self) -> None: | ||||||
| self.last_cache_date_str = datetime.now().isoformat() | ||||||
|
|
||||||
|
|
||||||
| class StepwisePlugin: | ||||||
| def __init__(self, config: Config) -> None: | ||||||
| self.config = config | ||||||
| self.session: Session | None = None | ||||||
| self.report_status = "" | ||||||
| self.report_status: list[str] = [] | ||||||
| assert config.cache is not None | ||||||
| self.cache: Cache = config.cache | ||||||
| self.lastfailed: str | None = self.cache.get(STEPWISE_CACHE_DIR, None) | ||||||
| self.skip: bool = config.getoption("stepwise_skip") | ||||||
| self.reset: bool = config.getoption("stepwise_reset") | ||||||
| self.cached_info = self._load_cached_info() | ||||||
|
|
||||||
| def _load_cached_info(self) -> StepwiseCacheInfo: | ||||||
| cached_dict: dict[str, Any] | None = self.cache.get(STEPWISE_CACHE_DIR, None) | ||||||
| if cached_dict: | ||||||
| try: | ||||||
| return StepwiseCacheInfo( | ||||||
| cached_dict["last_failed"], | ||||||
| cached_dict["last_test_count"], | ||||||
| cached_dict["last_cache_date_str"], | ||||||
| ) | ||||||
| except (KeyError, TypeError) as e: | ||||||
| error = f"{type(e).__name__}: {e}" | ||||||
| self.report_status.append(f"error reading cache, discarding ({error})") | ||||||
|
|
||||||
| # Cache not found or error during load, return a new cache. | ||||||
| return StepwiseCacheInfo.empty() | ||||||
|
|
||||||
| def pytest_sessionstart(self, session: Session) -> None: | ||||||
| self.session = session | ||||||
|
|
||||||
| def pytest_collection_modifyitems( | ||||||
| self, config: Config, items: list[nodes.Item] | ||||||
| ) -> None: | ||||||
| if not self.lastfailed: | ||||||
| self.report_status = "no previously failed tests, not skipping." | ||||||
| last_test_count = self.cached_info.last_test_count | ||||||
| self.cached_info.last_test_count = len(items) | ||||||
|
|
||||||
| if self.reset: | ||||||
| self.report_status.append("resetting state, not skipping.") | ||||||
| self.cached_info.last_failed = None | ||||||
| return | ||||||
|
|
||||||
| if not self.cached_info.last_failed: | ||||||
| self.report_status.append("no previously failed tests, not skipping.") | ||||||
| return | ||||||
|
|
||||||
| if last_test_count is not None and last_test_count != len(items): | ||||||
The-Compiler marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
| self.report_status.append( | ||||||
| f"test count changed, not skipping (now {len(items)} tests, previously {last_test_count})." | ||||||
| ) | ||||||
| self.cached_info.last_failed = None | ||||||
| return | ||||||
|
|
||||||
| # check all item nodes until we find a match on last failed | ||||||
| # Check all item nodes until we find a match on last failed. | ||||||
| failed_index = None | ||||||
| for index, item in enumerate(items): | ||||||
| if item.nodeid == self.lastfailed: | ||||||
| if item.nodeid == self.cached_info.last_failed: | ||||||
| failed_index = index | ||||||
| break | ||||||
|
|
||||||
| # If the previously failed test was not found among the test items, | ||||||
| # do not skip any tests. | ||||||
| if failed_index is None: | ||||||
| self.report_status = "previously failed test not found, not skipping." | ||||||
| self.report_status.append("previously failed test not found, not skipping.") | ||||||
| else: | ||||||
| self.report_status = f"skipping {failed_index} already passed items." | ||||||
| self.report_status.append( | ||||||
| f"skipping {failed_index} already passed items (cache from {self.cached_info.last_cache_date}," | ||||||
|
||||||
| f"skipping {failed_index} already passed items (cache from {self.cached_info.last_cache_date}," | |
| f"skipping {failed_index} already passed items (cache from {self.cached_info.last_cache_date:%c}," |
maybe, where %c is "Locale’s appropriate date and time representation."?
Without this, it gets displayed as e.g. cache from 2025-01-15 14:25:45.458813 which seems a bit verbose, and the fractional seconds don't really add any useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try that initially, but to me it showed up as the default locale (C): IIUC we need to call locale.setlocale at some point explicitly for %c to work properly, otherwise it will always use the C locale (and we cannot do that because this is a global setting, which pytest definitely should not touch IMO).
But I might be wrong, can you test that in your machine?
If %c does not work, we might explicitly use %YY-%M-%d %h:%m:%s to get rid of the fractional seconds, which I agree is not useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, looks like it! My locale setup is a bit weird at the moment anyways (experimented with en_DK to get proper dates, but turns out it's not supported everywhere so now I get a mess of formats...). Because of that, I expected that to be an issue on my end.
What do you think about making a timedelta to now and then just formatting that in instead? That would then show up as 0:00:10 ago or 42 days, 1:23:45 ago which seems a much nicer format for the information we actually want to convey here.
edit: Whoops, I completely neglected that this would show microseconds too... I suppose that could be fixed with a datetime.timedelta(seconds=int(td.total_seconds())) (with td being the original delta).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea about timedelta, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I considered only showing that information if the cache was "old", but then what could be considered an "old" cache? 1 hr? 1 day? In the ended I kept it simple and always show the cache age for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, I have been thinking about that as well, and came to the same conclusion as you did 😅
Uh oh!
There was an error while loading. Please reload this page.