fix(ci_visibility): improve compatibility with external retry plugins [backport 3.10] (#13877)

github-actions[bot] · vitor-de-araujo · brettlangdon · web-flow · commit 710c6fc3f780 · 2025-07-07T08:17:23.000-04:00
Backport 775aaf0 from #13854 to 3.10.

Currently, if Test Optimization is used together with [`pytest-rerunfailures`](https://github.com/pytest-dev/pytest-rerunfailures) or [`flaky`](https://github.com/box/flaky), the plugins interfere with each other, because all of them are based on defining a custom `pytest_runtest_protocol` hook. Depending on the plugin load order, one of two things happens. Either the external plugin works but ddtrace reports the test to the backend as failed (because it did not collect the test run data), or ddtrace works but the external plugin does nothing.

This PR does two things:

- Collect report information for executed tests in `pytest_runtest_makereport` and save it in a global dictionary. This makes the information available to us even if another plugin runs the test instead of ours, so we can still report the test result correctly to the backend.
- If `flaky` or `rerunfailures` is present, we let them do their thing and don't run our `pytest_runtest_protocol`. This means our advanced features such as EFD, ATR, and Flaky Test Management will not work, but at least the external plugin will work correctly and we will report its results to the backend.

### Roads not taken

I tried to overwrite the builtin pytest runner's `pytest_runtest_protocol`, similar to how [`flaky` overwrites `call_and_report`](https://github.com/box/flaky/blob/v3.8.1/flaky/flaky_pytest_plugin.py#L87). The problem is that we then run _inside_ `flaky`'s `pytest_runtest_protocol` and end up logging and counting every `flaky` run, not just the last one (as should happen with `flaky`). As a consequence, failed retries count as failures even if the test eventually succeeds. One way to overcome this might be for us to wrap _`flaky`_'s `pytest_runtest_protocol`, but I would rather avoid patching an external plugin that itself patches the builtin runner.
The `flaky` and `rerunfailures` functionality largely overlaps with ATR, so users will probably have a better experience using either `flaky`/`rerunfailures` or ATR, but not both.

## Checklist

- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Vítor De Araújo <vitor.dearaujo@datadoghq.com>
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
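Collecting reports per phase only helps if a single test status can later be resolved from them. In the diff, `_process_reports_dict` does this by walking the phases in a fixed setup, call, teardown order over a dictionary keyed by `call.when`, and stopping at the first phase that yields a status. The following is a minimal sketch of that resolution step. `Report` is a hypothetical stand-in for pytest's report object, and `phase_status` is an assumed, simplified per-phase rule, not ddtrace's actual `_process_result`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Phases in execution order; the keys match pytest's `call.when` values.
PHASES = ("setup", "call", "teardown")


@dataclass
class Report:
    """Hypothetical, simplified stand-in for a pytest TestReport."""

    when: str
    outcome: str  # "passed", "failed" or "skipped"


def phase_status(report: Report) -> Optional[str]:
    """Assumed rule: return a final status, or None if this phase alone does not decide it."""
    if report.outcome == "failed":
        return "fail"
    if report.outcome == "skipped":
        return "skip"
    if report.when == "call":
        return "pass"
    return None  # a passing setup or teardown does not decide the result on its own


def resolve_status(reports: Dict[str, Report]) -> Optional[str]:
    """Walk the phases in execution order and return the first decisive status."""
    final: Optional[str] = None
    for when in PHASES:
        report = reports.get(when)
        if report is None:
            continue  # the phase never ran, e.g. no "call" report after a failed setup
        final = phase_status(report)
        if final is not None:
            return final
    return final
```

Iterating over the fixed `PHASES` tuple rather than over the dictionary itself keeps the result independent of the order in which the reports were stored.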
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -66,6 +66,7 @@ tests/contrib/asynctest                               @DataDog/ci-app-libraries
 tests/contrib/pytest                                  @DataDog/ci-app-libraries
 tests/contrib/pytest_bdd                              @DataDog/ci-app-libraries
 tests/contrib/pytest_benchmark                        @DataDog/ci-app-libraries
+tests/contrib/pytest_flaky                            @DataDog/ci-app-libraries
 tests/contrib/unittest                                @DataDog/ci-app-libraries
 tests/integration/test_integration_civisibility.py    @DataDog/ci-app-libraries
 ddtrace/ext/ci.py                                     @DataDog/ci-app-libraries
diff --git a/.riot/requirements/10f023c.txt b/.riot/requirements/10f023c.txt
@@ -0,0 +1,21 @@
+#
+# This file is autogenerated by pip-compile with Python 3.12
+# by the following command:
+#
+#    pip-compile --allow-unsafe --no-annotate .riot/requirements/10f023c.in
+#
+attrs==25.3.0
+coverage[toml]==7.9.2
+flaky==3.8.1
+hypothesis==6.45.0
+iniconfig==2.1.0
+mock==5.2.0
+opentracing==2.4.0
+packaging==25.0
+pluggy==1.6.0
+pygments==2.19.2
+pytest==8.4.1
+pytest-cov==6.2.1
+pytest-mock==3.14.1
+pytest-randomly==3.16.0
+sortedcontainers==2.4.0
diff --git a/.riot/requirements/131a701.txt b/.riot/requirements/131a701.txt
@@ -0,0 +1,26 @@
+#
+# This file is autogenerated by pip-compile with Python 3.9
+# by the following command:
+#
+#    pip-compile --allow-unsafe --no-annotate .riot/requirements/131a701.in
+#
+attrs==25.3.0
+coverage[toml]==7.9.2
+exceptiongroup==1.3.0
+flaky==3.8.1
+hypothesis==6.45.0
+importlib-metadata==8.7.0
+iniconfig==2.1.0
+mock==5.2.0
+opentracing==2.4.0
+packaging==25.0
+pluggy==1.6.0
+pygments==2.19.2
+pytest==8.4.1
+pytest-cov==6.2.1
+pytest-mock==3.14.1
+pytest-randomly==3.16.0
+sortedcontainers==2.4.0
+tomli==2.2.1
+typing-extensions==4.14.0
+zipp==3.23.0
diff --git a/.riot/requirements/60dc244.txt b/.riot/requirements/60dc244.txt
@@ -0,0 +1,21 @@
+#
+# This file is autogenerated by pip-compile with Python 3.11
+# by the following command:
+#
+#    pip-compile --allow-unsafe --no-annotate .riot/requirements/60dc244.in
+#
+attrs==25.3.0
+coverage[toml]==7.9.2
+flaky==3.8.1
+hypothesis==6.45.0
+iniconfig==2.1.0
+mock==5.2.0
+opentracing==2.4.0
+packaging==25.0
+pluggy==1.6.0
+pygments==2.19.2
+pytest==8.4.1
+pytest-cov==6.2.1
+pytest-mock==3.14.1
+pytest-randomly==3.16.0
+sortedcontainers==2.4.0
diff --git a/.riot/requirements/98ec6ba.txt b/.riot/requirements/98ec6ba.txt
@@ -0,0 +1,24 @@
+#
+# This file is autogenerated by pip-compile with Python 3.10
+# by the following command:
+#
+#    pip-compile --allow-unsafe --no-annotate .riot/requirements/98ec6ba.in
+#
+attrs==25.3.0
+coverage[toml]==7.9.2
+exceptiongroup==1.3.0
+flaky==3.8.1
+hypothesis==6.45.0
+iniconfig==2.1.0
+mock==5.2.0
+opentracing==2.4.0
+packaging==25.0
+pluggy==1.6.0
+pygments==2.19.2
+pytest==8.4.1
+pytest-cov==6.2.1
+pytest-mock==3.14.1
+pytest-randomly==3.16.0
+sortedcontainers==2.4.0
+tomli==2.2.1
+typing-extensions==4.14.0
diff --git a/.riot/requirements/c826075.txt b/.riot/requirements/c826075.txt
@@ -0,0 +1,25 @@
+#
+# This file is autogenerated by pip-compile with Python 3.8
+# by the following command:
+#
+#    pip-compile --allow-unsafe --no-annotate .riot/requirements/c826075.in
+#
+attrs==25.3.0
+coverage[toml]==7.6.1
+exceptiongroup==1.3.0
+flaky==3.8.1
+hypothesis==6.45.0
+importlib-metadata==8.5.0
+iniconfig==2.1.0
+mock==5.2.0
+opentracing==2.4.0
+packaging==25.0
+pluggy==1.5.0
+pytest==8.3.5
+pytest-cov==5.0.0
+pytest-mock==3.14.1
+pytest-randomly==3.15.0
+sortedcontainers==2.4.0
+tomli==2.2.1
+typing-extensions==4.13.2
+zipp==3.20.2
diff --git a/ddtrace/contrib/internal/pytest/_plugin_v2.py b/ddtrace/contrib/internal/pytest/_plugin_v2.py
@@ -39,6 +39,7 @@
 from ddtrace.contrib.internal.pytest._utils import _pytest_version_supports_retries
 from ddtrace.contrib.internal.pytest._utils import _TestOutcome
 from ddtrace.contrib.internal.pytest._utils import excinfo_by_report
+from ddtrace.contrib.internal.pytest._utils import reports_by_item
 from ddtrace.contrib.internal.pytest.constants import FRAMEWORK
 from ddtrace.contrib.internal.pytest.constants import USER_PROPERTY_QUARANTINED
 from ddtrace.contrib.internal.pytest.constants import XFAIL_REASON
@@ -99,6 +100,9 @@
 _NODEID_REGEX = re.compile("^((?P<module>.*)/(?P<suite>[^/]*?))::(?P<name>.*?)$")
 OUTCOME_QUARANTINED = "quarantined"
 DISABLED_BY_TEST_MANAGEMENT_REASON = "Flaky test is disabled by Datadog"
+INCOMPATIBLE_PLUGINS = ("flaky", "rerunfailures")
+
+skip_pytest_runtest_protocol = False
 
 
 class XdistHooks:
@@ -259,6 +263,8 @@ def _pytest_load_initial_conftests_pre_yield(early_config, parser, args):
 
 
 def pytest_configure(config: pytest_Config) -> None:
+    global skip_pytest_runtest_protocol
+
     if os.getenv("DD_PYTEST_USE_NEW_PLUGIN_BETA"):
         # Logging the warning at this point ensures it shows up in output regardless of the use of the -s flag.
         deprecate(
@@ -275,6 +281,19 @@ def pytest_configure(config: pytest_Config) -> None:
             if _is_pytest_cov_enabled(config):
                 patch_coverage()
 
+            skip_pytest_runtest_protocol = False
+
+            for plugin in INCOMPATIBLE_PLUGINS:
+                if config.pluginmanager.hasplugin(plugin):
+                    log.warning(
+                        "The pytest `%s` plugin is in use; Test Optimization advanced features will be disabled. "
+                        "You can run `pytest` with `-p no:%s` to disable the plugin and enable Test Optimization "
+                        "features.",
+                        plugin,
+                        plugin,
+                    )
+                    skip_pytest_runtest_protocol = True
+
             # pytest-bdd plugin support
             if config.pluginmanager.hasplugin("pytest-bdd"):
                 from ddtrace.contrib.internal.pytest._pytest_bdd_subplugin import _PytestBddSubPlugin
@@ -460,7 +479,13 @@ def _pytest_runtest_protocol_post_yield(item, nextitem, coverage_collector):
 
     if not InternalTest.is_finished(test_id):
         log.debug("Test %s was not finished normally during pytest_runtest_protocol, finishing it now", test_id)
-        InternalTest.finish(test_id)
+        reports_dict = reports_by_item.get(item)
+        if reports_dict:
+            test_outcome = _process_reports_dict(item, reports_dict)
+            InternalTest.finish(test_id, test_outcome.status, test_outcome.skip_reason, test_outcome.exc_info)
+        else:
+            log.debug("Test %s has no entry in reports_by_item", test_id)
+            InternalTest.finish(test_id)
 
     if coverage_collector is not None:
         _handle_collected_coverage(test_id, coverage_collector)
@@ -505,6 +530,13 @@ def pytest_runtest_protocol(item, nextitem) -> t.Optional[bool]:
     if not is_test_visibility_enabled():
         return None
 
+    if skip_pytest_runtest_protocol:
+        # Retry-based features such as Early Flake Detection, Auto Test Retries, and Attempt-to-Fix do not work properly
+        # with external retry plugins such as `flaky` and `pytest-rerunfailures`. If those plugins are in use, we let
+        # their `pytest_runtest_protocol` run and report their results to the backend, and do not run our advanced
+        # features.
+        return None
+
     try:
         _pytest_run_one_test(item, nextitem)
         return True  # Do not run pytest's internal `pytest_runtest_protocol`.
@@ -518,9 +550,8 @@ def pytest_runtest_protocol(item, nextitem) -> t.Optional[bool]:
 def _pytest_run_one_test(item, nextitem):
     item.ihook.pytest_runtest_logstart(nodeid=item.nodeid, location=item.location)
     reports = runtestprotocol(item, nextitem=nextitem, log=False)
-    test_outcome = _process_reports(item, reports)
-
     reports_dict = {report.when: report for report in reports}
+    test_outcome = _process_reports_dict(item, reports_dict)
 
     test_id = _get_test_id_from_item(item)
     is_quarantined = InternalTest.is_quarantined_test(test_id)
@@ -573,14 +604,20 @@ def _pytest_run_one_test(item, nextitem):
     item.ihook.pytest_runtest_logfinish(nodeid=item.nodeid, location=item.location)
 
 
-def _process_reports(item, reports) -> _TestOutcome:
+def _process_reports_dict(item, reports) -> _TestOutcome:
     final_outcome = None
-    for report in reports:
+
+    for when in (TestPhase.SETUP, TestPhase.CALL, TestPhase.TEARDOWN):
+        report = reports.get(when)
+        if not report:
+            continue
+
         outcome = _process_result(item, report)
         if final_outcome is None or final_outcome.status is None:
             final_outcome = outcome
             if final_outcome.status is not None:
                 return final_outcome
+
     return final_outcome
 
 
@@ -681,6 +718,7 @@ def pytest_runtest_makereport(item: pytest.Item, call: pytest_CallInfo) -> None:
     # DEV: Make excinfo available for later use, when we don't have the `call` object anymore.
     # We cannot stash it directly into the report because pytest-xdist fails to serialize the report if we do that.
     excinfo_by_report[outcome.get_result()] = call.excinfo
+    reports_by_item.setdefault(item, {})[call.when] = outcome.get_result()
 
     if not is_test_visibility_enabled():
         return
diff --git a/ddtrace/contrib/internal/pytest/_utils.py b/ddtrace/contrib/internal/pytest/_utils.py
@@ -248,3 +248,4 @@ def get_user_property(report, key, default=None):
 
 
 excinfo_by_report = weakref.WeakKeyDictionary()
+reports_by_item = weakref.WeakKeyDictionary()
diff --git a/ddtrace/internal/ci_visibility/constants.py b/ddtrace/internal/ci_visibility/constants.py
@@ -87,7 +87,7 @@ class LIBRARY_CAPABILITIES(str, Enum):
 CIVISIBILITY_LOG_FILTER_RE = re.compile(
     "|".join(
         [
-            r"^ddtrace\.contrib\.(coverage|pytest|unittest)",
+            r"^ddtrace\.contrib\.internal\.(coverage|pytest|unittest)",
             r"ddtrace\.internal\.(ci_visibility|gitmetadata).*",
             r"ddtrace\.ext\.(git|ci_visibility|test)",
         ]
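The log-filter change follows the move of the pytest plugin modules to `ddtrace.contrib.internal`. The old first alternative required `coverage`, `pytest`, or `unittest` immediately after `ddtrace.contrib.`, so a logger named after `ddtrace/contrib/internal/pytest/_plugin_v2.py` no longer matched. The check below compares both versions. It assumes that loggers are named after their modules and that the filter is applied with `re.match` to the logger name; the diff does not show how `CIVISIBILITY_LOG_FILTER_RE` is consumed.

```python
import re

# The two alternatives the diff leaves unchanged.
_SHARED = [
    r"ddtrace\.internal\.(ci_visibility|gitmetadata).*",
    r"ddtrace\.ext\.(git|ci_visibility|test)",
]

OLD_FILTER_RE = re.compile("|".join([r"^ddtrace\.contrib\.(coverage|pytest|unittest)"] + _SHARED))
NEW_FILTER_RE = re.compile("|".join([r"^ddtrace\.contrib\.internal\.(coverage|pytest|unittest)"] + _SHARED))


def is_ci_visibility_logger(pattern: "re.Pattern[str]", logger_name: str) -> bool:
    """Assumed usage: match the pattern against the start of the logger name."""
    return pattern.match(logger_name) is not None
```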
diff --git a/releasenotes/notes/ci_visibility-fix-pytest-retry-plugins-compat-7ae7c8dc81676195.yaml b/releasenotes/notes/ci_visibility-fix-pytest-retry-plugins-compat-7ae7c8dc81676195.yaml
@@ -0,0 +1,7 @@
+---
+fixes:
+  - |
+    CI Visibility: This fix resolves an issue where using Test Optimization together with external retry plugins such as ``flaky`` or
+    ``pytest-rerunfailures`` would cause the test results not to be reported correctly to Datadog. With this change,
+    those plugins can be used with ddtrace, and test results will be reported to Datadog, but Test Optimization advanced
+    features such as Early Flake Detection and Auto Test Retries will not be available when such plugins are used.
diff --git a/riotfile.py b/riotfile.py
@@ -1815,6 +1815,15 @@ def select_pys(min_version: str = MIN_PYTHON_VERSION, max_version: str = MAX_PYT
                 ),
             ],
         ),
+        Venv(
+            name="pytest:flaky",
+            pys=select_pys(min_version="3.8", max_version="3.12"),
+            command="pytest {cmdargs} --no-ddtrace --no-cov -p no:flaky tests/contrib/pytest_flaky/",
+            pkgs={
+                "flaky": latest,
+                "pytest-randomly": latest,
+            },
+        ),
         Venv(
             name="grpc",
             command="python -m pytest -v {cmdargs} tests/contrib/grpc",
diff --git a/tests/ci_visibility/suitespec.yml b/tests/ci_visibility/suitespec.yml
@@ -62,6 +62,7 @@ suites:
       - tests/contrib/pytest/*
       - tests/contrib/pytest_benchmark/*
       - tests/contrib/pytest_bdd/*
+      - tests/contrib/pytest_flaky/*
       - tests/snapshots/tests.contrib.pytest.*
     runner: riot
     snapshot: true
diff --git a/tests/contrib/pytest_flaky/__init__.py b/tests/contrib/pytest_flaky/__init__.py
diff --git a/tests/contrib/pytest_flaky/test_pytest_flaky.py b/tests/contrib/pytest_flaky/test_pytest_flaky.py
@@ -0,0 +1,72 @@
+from unittest import mock
+
+import pytest
+
+from ddtrace.internal.ci_visibility._api_client import TestVisibilityAPISettings
+from tests.contrib.pytest.test_pytest import PytestTestCaseBase
+from tests.contrib.pytest.test_pytest import _get_spans_from_list
+
+
+_TEST_CONTENT = """
+import flaky
+
+def test_func_pass():
+    assert True
+
+def test_func_fail():
+    assert False
+
+flaky_counter = 0
+
+@flaky.flaky
+def test_func_flaky():
+    global flaky_counter
+    flaky_counter += 1
+    assert flaky_counter >= 2
+
+"""
+
+
+class TestPytestFlakyPlugin(PytestTestCaseBase):
+    """
+    Check that the Test Optimization pytest plugin interacts correctly with the `flaky` plugin.
+    """
+
+    @pytest.fixture(autouse=True, scope="function")
+    def set_up_atr(self):
+        with mock.patch(
+            "ddtrace.internal.ci_visibility.recorder.CIVisibility._check_enabled_features",
+            return_value=TestVisibilityAPISettings(flaky_test_retries_enabled=True),
+        ):
+            yield
+
+    def test_pytest_flaky(self):
+        self.testdir.makepyfile(test_sample=_TEST_CONTENT)
+        rec = self.inline_run("--ddtrace", "-p", "flaky")
+        spans = self.pop_spans()
+        pass_spans = _get_spans_from_list(spans, "test", "test_func_pass")
+        fail_spans = _get_spans_from_list(spans, "test", "test_func_fail")
+        flaky_spans = _get_spans_from_list(spans, "test", "test_func_flaky")
+        assert len(pass_spans) == 1
+        assert len(fail_spans) == 1  # ATR is off because the `flaky` plugin is enabled
+        assert len(flaky_spans) == 1  # ATR is off because the `flaky` plugin is enabled
+        assert pass_spans[0].get_tag("test.status") == "pass"
+        assert fail_spans[0].get_tag("test.status") == "fail"
+        assert flaky_spans[0].get_tag("test.status") == "pass"  # `flaky` plugin made it pass
+        assert rec.ret == 1
+
+    def test_pytest_no_flaky(self):
+        self.testdir.makepyfile(test_sample=_TEST_CONTENT)
+        rec = self.inline_run("--ddtrace", "-p", "no:flaky")
+        spans = self.pop_spans()
+        pass_spans = _get_spans_from_list(spans, "test", "test_func_pass")
+        fail_spans = _get_spans_from_list(spans, "test", "test_func_fail")
+        flaky_spans = _get_spans_from_list(spans, "test", "test_func_flaky")
+        assert len(pass_spans) == 1
+        assert len(fail_spans) == 6  # ATR is on
+        assert len(flaky_spans) == 2  # ATR is on, passed on 2nd attempt
+        assert pass_spans[0].get_tag("test.status") == "pass"
+        assert fail_spans[0].get_tag("test.status") == "fail"
+        assert flaky_spans[0].get_tag("test.status") == "fail"
+        assert flaky_spans[1].get_tag("test.status") == "pass"
+        assert rec.ret == 1
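The span counts asserted in `test_pytest_no_flaky` follow from a simple retry-until-pass model. The always-failing test yields 6 spans (the initial run plus 5 retries), and `test_func_flaky` yields 2 because it passes on its second attempt. The sketch below reproduces those numbers. The retry limit of 5 is inferred from the test's expectation rather than taken from ddtrace's ATR code. `run_with_retries` and `make_flaky_test` are hypothetical helpers, and the model ignores any other limits the real feature may apply.

```python
from typing import Callable, List


def run_with_retries(test_fn: Callable[[], bool], max_retries: int) -> List[str]:
    """Hypothetical model: run once, then retry failures until a pass or until retries run out.

    Returns one status per attempt, i.e. one entry per reported test span.
    """
    statuses: List[str] = []
    for _ in range(1 + max_retries):
        passed = test_fn()
        statuses.append("pass" if passed else "fail")
        if passed:
            break
    return statuses


def make_flaky_test() -> Callable[[], bool]:
    """Mirror `test_func_flaky`: fail on the first call, pass from the second call on."""
    counter = 0

    def test_fn() -> bool:
        nonlocal counter
        counter += 1
        return counter >= 2

    return test_fn
```

With the `flaky` plugin enabled, ATR is skipped and `flaky` (which runs a test at most twice by default) handles the rerun itself. Because ddtrace resolves the result from the most recently stored reports, `test_pytest_flaky` expects a single `pass` span for `test_func_flaky`.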
