ci: skip riot runs based on diff from last successful run (second attempt) [backport 1.18] (#6636)

github-actions[bot] · emmettbutler · web-flow · commit b5b2b2c884d9 · 2023-08-11T13:54:26.000+01:00
Backport 5875e8a from #6631 to 1.18.

__Based on #6600, which was reverted from 1.x__

This pull request reduces the amount of duplicated work done in CI on pull request builds by skipping a given riot job based on the changes since the last time that job succeeded. The approach taken here reduces the amount of time spent waiting for jobs to pass after updating a branch from trunk, ensuring that only those jobs that might have been affected by the update are run.

This logic is supported by CircleCI's `save_cache` and `restore_cache` directives. The cache key includes the branch name, job name, and node number to ensure uniqueness for each branch/job/node combination. `restore_cache` uses a prefix match on this key format.

No new skips are applied on `*.x` branches, which mitigates much of the risk of this change.

Here's a detailed description of the logical flow.

1. A commit with SHA `A` is pushed to a new branch, changing `contrib/cherrypy/middleware.py`.
2. As on current 1.x, CI jobs are selected based on the commit's diff. The `cherrypy` job starts.
3. The `restore_cache` step in the `cherrypy` job finds no cache.
4. The `cherrypy` job succeeds and writes a file to the CI runner's filesystem containing the Git hash of the current HEAD, `A`. This is the "latest successful commit" for the `cherrypy` job on this branch.
5. The `save_cache` step caches this file under a key that includes the branch name, `cherrypy`, the node index, and the epoch timestamp.
6. Another commit with SHA `B` is pushed to the branch, changing `contrib/falcon/patch.py`. The branch now contains changes to `falcon` and `cherrypy`.
7. The dynamic configuration generator selects the `falcon` and `cherrypy` jobs to run.
8. In the `falcon` job, the `restore_cache` step finds no cache at the key `*-falcon-*`. In the `cherrypy` job, this step finds a cache.
9. The `falcon` job runs similarly to steps 4 and 5.
10. The `cherrypy` job opens the file it found in the cache and passes the resulting Git SHA to `needs_testrun.py`. This script runs `git diff B A`, finds that none of the changes in that diff affect `cherrypy`, and exits before running `riot`.

**Why not do this check during the setup step?**

The new diff check added in this pull request requires access to the CircleCI cache on a per-job basis, which is only available from inside each job.

**How was this tested?**

1. [This commit](https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/43108/workflows/c149c29b-d3b2-47dc-b1f0-06696d865134) changed the `pylons` and `asynctest` suites, causing the `asynctest` suite to fail. CI correctly shows both the pass and the fail.
2. [This commit](https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/43117/workflows/0ffa0c7e-7e9c-41ee-852c-3b691b120f1a) fixed the `asynctest` failure. CI shows that the `pylons` job was skipped and the `asynctest` job was re-run.
3. [This commit](https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/43118/workflows/00174ec3-98ad-46a1-8b58-35811feda777) reverted the changes to both suites. CI did not queue either of them.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included in the PR.
- [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed. If no release note is required, add label `changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Emmett Butler <723615+emmettbutler@users.noreply.github.com>
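
The cache-key behaviour described in this message (a `save_cache` key ending in the epoch timestamp, and a `restore_cache` key that is a prefix of it) can be modelled roughly as below. This is only an illustrative sketch: the `restore` function and the in-memory list of keys are hypothetical, CircleCI performs the real lookup itself, and the epoch suffix is used here as a stand-in for "most recently saved".

```python
import typing as t


def restore(saved_keys: t.Iterable[str], prefix: str) -> t.Optional[str]:
    """Return the newest saved key starting with `prefix`, or None.

    Assumes every saved key ends in "-<epoch>", as in the save_cache key
    of this change; the largest epoch is treated as the newest cache.
    """
    matches = [key for key in saved_keys if key.startswith(prefix)]
    if not matches:
        return None  # corresponds to "restore_cache finds no cache"
    return max(matches, key=lambda key: int(key.rsplit("-", 1)[1]))


# Two successful runs of the cherrypy job on node 0 of the same branch.
saved = [
    "lastsuccess-my-branch-cherrypy-0-1691750000",
    "lastsuccess-my-branch-cherrypy-0-1691753600",
]

latest = restore(saved, "lastsuccess-my-branch-cherrypy-0")
missing = restore(saved, "lastsuccess-my-branch-falcon-0")
```

Because each successful run saves under a new epoch-suffixed key, the checkpoint can advance over time even though an existing cache key is never overwritten.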
diff --git a/.circleci/config.templ.yml b/.circleci/config.templ.yml
@@ -126,6 +126,9 @@ commands:
       - checkout
       - attach_workspace:
           at: .
+      - restore_cache:
+          keys:
+            - lastsuccess-{{ .Environment.CIRCLE_BRANCH }}-<<parameters.pattern>>-{{ .Environment.CIRCLE_NODE_INDEX }}
       - when:
           condition:
               << parameters.snapshot >>
@@ -139,9 +142,7 @@ commands:
                   DD_TRACE_AGENT_URL: << parameters.trace_agent_url >> 
                   RIOT_RUN_RECOMPILE_REQS: "<< pipeline.parameters.riot_run_latest >>"
                 command: |
-                  # Sort the hashes to ensure a consistent ordering/division between each node
-                  riot list --hash-only '<<parameters.pattern>>' | sort | circleci tests split | xargs -n 1 -I {} ./scripts/ddtest riot -v run --exitfirst --pass-env -s {} $([ -v _CI_DD_API_KEY ] && echo '--ddtrace' ) $([[ << pipeline.parameters.coverage >> == false ]] && echo '--no-cov' )
-                  ./scripts/check-diff ".riot/requirements/" "Changes detected after running riot. Consider deleting changed files, running scripts/compile-and-prune-test-requirements and committing the result."
+                  ./scripts/run-test-suite '<<parameters.pattern>>' <<pipeline.parameters.coverage>> 1
       - unless:
           condition:
               << parameters.snapshot >>
@@ -166,9 +167,7 @@ commands:
                 environment:
                   RIOT_RUN_RECOMPILE_REQS: "<< pipeline.parameters.riot_run_latest >>"
                 command: |
-                  # Sort the hashes to ensure a consistent ordering/division between each node
-                  riot list --hash-only '<<parameters.pattern>>' | sort | circleci tests split | xargs -n 1 -I {} riot -v run --exitfirst --pass-env -s {} $([ -v _CI_DD_API_KEY ] && echo '--ddtrace' ) $([[ << pipeline.parameters.coverage >> == false ]] && echo '--no-cov' )
-                  ./scripts/check-diff ".riot/requirements/" "Changes detected after running riot. Consider deleting changed files, running scripts/compile-and-prune-test-requirements and committing the result."
+                  ./scripts/run-test-suite '<<parameters.pattern>>' <<pipeline.parameters.coverage>>
       - when:
           condition:
             and:
@@ -180,6 +179,10 @@ commands:
           path: test-results
       - store_artifacts:
           path: test-results
+      - save_cache:
+          key: lastsuccess-{{ .Environment.CIRCLE_BRANCH }}-<<parameters.pattern>>-{{ .Environment.CIRCLE_NODE_INDEX }}-{{ epoch }}
+          paths:
+            - ./latest-success-commit
       - run:
           name: Get APM Test Agent Trace Check Results
           when: always
@@ -427,9 +430,7 @@ jobs:
           environment:
             RIOT_RUN_RECOMPILE_REQS: "<< pipeline.parameters.riot_run_latest >>"
           command: |
-            # Sort the hashes to ensure a consistent ordering/division between each node
-            riot list --hash-only 'integration-latest' | sort | circleci tests split | xargs -n 1 -I {} ./scripts/ddtest riot -v run --pass-env -s {}
-            ./scripts/check-diff ".riot/requirements/" "Changes detected after running riot. Consider deleting changed files, running scripts/compile-and-prune-test-requirements and committing the result."
+            ./scripts/run-test-suite 'integration-latest' <<pipeline.parameters.coverage>> 1
 
   integration_testagent:
     <<: *machine_executor
diff --git a/scripts/needs_testrun.py b/scripts/needs_testrun.py
@@ -58,43 +58,36 @@ def get_merge_base(pr_number: int) -> str:
 
 
 @cache
-def get_changed_files(pr_number: int) -> t.Set[str]:
+def get_changed_files(pr_number: int, sha: t.Optional[str] = None) -> t.Set[str]:
     """Get the files changed in a PR
 
+    Try with the GitHub REST API for the most accurate result. If that fails,
+    or if there is a specific SHA given, use the less accurate method of
+    diffing against a base commit, either the given SHA or the merge-base.
+
     >>> sorted(get_changed_files(6388))  # doctest: +NORMALIZE_WHITESPACE
     ['ddtrace/debugging/_expressions.py',
     'releasenotes/notes/fix-debugger-expressions-none-literal-30f3328d2e386f40.yaml',
     'tests/debugging/test_expressions.py']
     """
-    try:
-        # Try with the GitHub REST API for the most accurate result
-        url = f"https://api.github.com/repos/datadog/dd-trace-py/pulls/{pr_number}/files"
-        headers = {"Accept": "application/vnd.github+json"}
-
-        return {_["filename"] for _ in json.load(urlopen(Request(url, headers=headers)))}
-
-    except Exception:
-        # If that fails use the less accurate method of diffing against the
-        # merge-base w.r.t. the base branch
-        LOGGER.warning("Failed to get changed files from GitHub API, using git diff instead")
-        return set(
-            check_output(
-                [
-                    "git",
-                    "diff",
-                    "--name-only",
-                    "HEAD",
-                    get_merge_base(pr_number),
-                ]
-            )
-            .decode("utf-8")
-            .strip()
-            .splitlines()
-        )
+    rest_check_failed = False
+    if sha is None:
+        try:
+            url = f"https://api.github.com/repos/datadog/dd-trace-py/pulls/{pr_number}/files"
+            headers = {"Accept": "application/vnd.github+json"}
+            return {_["filename"] for _ in json.load(urlopen(Request(url, headers=headers)))}
+        except Exception:
+            rest_check_failed = True
+            LOGGER.warning("Failed to get changed files from GitHub API")
+
+    if sha is not None or rest_check_failed:
+        diff_base = sha or get_merge_base(pr_number)
+        LOGGER.info("Checking changed files against commit %s", diff_base)
+        return set(check_output(["git", "diff", "--name-only", "HEAD", diff_base]).decode("utf-8").strip().splitlines())
 
 
 @cache
-def needs_testrun(suite: str, pr_number: int) -> bool:
+def needs_testrun(suite: str, pr_number: int, sha: t.Optional[str] = None) -> bool:
     """Check if a testrun is needed for a suite and PR
 
     >>> needs_testrun("debugger", 6485)
@@ -115,7 +108,7 @@ def needs_testrun(suite: str, pr_number: int) -> bool:
         return True
 
     try:
-        changed_files = get_changed_files(pr_number)
+        changed_files = get_changed_files(pr_number, sha=sha)
     except Exception:
         LOGGER.error("Failed to get changed files")
         return True
@@ -181,15 +174,16 @@ def main() -> bool:
     argp = ArgumentParser()
 
     argp.add_argument("suite", help="The suite to use", type=str)
-    argp.add_argument("pr", help="The PR number", type=int)
+    argp.add_argument("--pr", help="The PR number", type=int, default=_get_pr_number())
+    argp.add_argument("--sha", help="Commit hash to use as diff base (defaults to PR merge root)", type=lambda v: v or None)
     argp.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
 
     args = argp.parse_args()
 
     if args.verbose:
         LOGGER.setLevel(logging.INFO)
 
-    return needs_testrun(args.suite, args.pr)
+    return needs_testrun(args.suite, args.pr, sha=args.sha)
 
 
 if __name__ == "__main__":
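
A note on the `--sha` option added above: its `type=lambda v: v or None` converter turns an empty string into `None`, so an empty `--sha` value behaves as if the option had been omitted and the PR-based lookup is used. The standalone sketch below shows that behaviour in isolation; `build_parser` is a hypothetical helper, and `--pr` is left out because its default depends on `_get_pr_number()`.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    # Hypothetical reduced parser: only the arguments needed to show --sha.
    argp = ArgumentParser()
    argp.add_argument("suite", help="The suite to use", type=str)
    # An empty value is normalized to None by the type converter.
    argp.add_argument("--sha", help="Commit hash to use as diff base", type=lambda v: v or None)
    return argp


empty = build_parser().parse_args(["cherrypy", "--sha", ""])
given = build_parser().parse_args(["cherrypy", "--sha", "abc123"])
omitted = build_parser().parse_args(["cherrypy"])
```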
diff --git a/scripts/run-test-suite b/scripts/run-test-suite
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+CHECKPOINT_FILENAME="latest-success-commit"
+RIOT_PATTERN=${1}
+if [[ -v CIRCLECI ]]; then
+    RIOT_HASHES=$(riot list --hash-only $RIOT_PATTERN | sort | circleci tests split)
+else
+    RIOT_HASHES=$(riot list --hash-only $RIOT_PATTERN | sort)
+fi
+DDTRACE_FLAG=$([ -v _CI_DD_API_KEY ] && echo '--ddtrace')
+COVERAGE_FLAG=$([[ "${2:-false}" == false ]] && echo '--no-cov')
+DDTEST_CMD=$([[ ${3} == "1" ]] && echo "./scripts/ddtest")
+
+set -e
+
+if ! [[ -v CIRCLECI && $CIRCLE_BRANCH =~ [0-9]\.x ]]; then
+    if [[ -f "$CHECKPOINT_FILENAME" ]]; then
+        latest_success_commit=$(cat $CHECKPOINT_FILENAME)
+        if ! ./scripts/needs_testrun.py $CIRCLE_JOB --sha $latest_success_commit; then
+            echo "The $CIRCLE_JOB job succeeded at commit $latest_success_commit."
+            echo "None of the changes on this branch since that commit affect the $CIRCLE_JOB job."
+            echo "Skipping this job."
+            circleci step halt
+            exit 0
+        fi
+    fi
+fi
+
+for hash in $RIOT_HASHES; do
+    if ! $DDTEST_CMD riot -v run --exitfirst --pass-env -s $hash $DDTRACE_FLAG $COVERAGE_FLAG; then
+        if [[ -v CIRCLECI ]]; then
+            circleci step halt
+        fi
+        exit 1
+    fi
+done
+
+rm -f $CHECKPOINT_FILENAME
+echo $CIRCLE_SHA1 > $CHECKPOINT_FILENAME
+echo "All tests passed. Saved $CIRCLE_SHA1 as the latest successful commit for job $CIRCLE_JOB"
+
+./scripts/check-diff \
+    ".riot/requirements/" \
+    "Changes detected after running riot. Consider deleting changed files, \
+    running scripts/compile-and-prune-test-requirements and committing the result."
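
The skip decision at the top of `scripts/run-test-suite` can be sketched in Python as follows. This is a simplified model under stated assumptions, not the script itself: `SUITE_PATTERNS`, `skip_check_enabled`, and `should_skip` are hypothetical names, the glob patterns are invented for the example, and the set of changed files is passed in directly instead of being computed by `needs_testrun.py`.

```python
import fnmatch
import re
import typing as t

# Hypothetical job-to-file-pattern mapping, invented for this example.
SUITE_PATTERNS: t.Dict[str, t.List[str]] = {
    "cherrypy": ["contrib/cherrypy/*"],
    "falcon": ["contrib/falcon/*"],
}


def skip_check_enabled(env: t.Mapping[str, str]) -> bool:
    # Models the shell guard: the check is disabled only when running on
    # CircleCI AND the branch name contains a digit followed by ".x".
    on_circleci = "CIRCLECI" in env
    is_x_branch = re.search(r"[0-9]\.x", env.get("CIRCLE_BRANCH", "")) is not None
    return not (on_circleci and is_x_branch)


def should_skip(
    job: str,
    env: t.Mapping[str, str],
    checkpoint_sha: t.Optional[str],
    changed_since_checkpoint: t.Set[str],
) -> bool:
    if not skip_check_enabled(env):
        return False  # no new skips on *.x branches
    if checkpoint_sha is None:
        return False  # no checkpoint file restored: always run
    affected = any(
        fnmatch.fnmatch(path, pattern)
        for path in changed_since_checkpoint
        for pattern in SUITE_PATTERNS[job]
    )
    return not affected


feature_env = {"CIRCLECI": "true", "CIRCLE_BRANCH": "my-feature"}
release_env = {"CIRCLECI": "true", "CIRCLE_BRANCH": "1.x"}
changed = {"contrib/falcon/patch.py"}  # diff between commits B and A

skip_cherrypy = should_skip("cherrypy", feature_env, "A", changed)
skip_falcon = should_skip("falcon", feature_env, None, changed)
skip_on_release = should_skip("cherrypy", release_env, "A", changed)
```

With the values from the description, the `cherrypy` job is skipped at commit `B`, the `falcon` job runs because it has no checkpoint yet, and nothing is skipped on a `1.x` branch.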