Fix any linting errors before continuing with building and testing.
22
-
23
-
24
-
Building
25
-
--------
26
-
27
-
Rust code is built using both `cargo build` and Bazel.
28
-
29
-
After changing a package under `rs/$PACKAGE` run `bazel build //rs/$PACKAGE`.
30
-
31
-
32
-
Changing crate dependencies
33
-
---------------------------
34
-
35
-
If crate dependencies need to be changed or added:
36
-
37
-
1. First modify the `Cargo.toml` local to the package.
38
-
2. If a crate is used by multiple packages add it to the workspace `Cargo.toml` in the root of the repo and reference it in the `Cargo.toml` local to the package using `{ workspace = true }`.
39
-
3. Add the crate to `bazel/rust.MODULE.bazel`.
40
-
4. Run a `cargo check` such that the `Cargo.lock` files get updated.
41
-
5. Run `bin/bazel-pin.sh --force` to sync `Cargo.lock` with `Cargo.Bazel.json.lock`.
42
-
43
-
44
-
Testing
45
-
=======
46
-
47
-
After code can be built it needs to be tested.
48
-
49
-
After changing a package under `rs/$PACKAGE` run `bazel test //rs/$PACKAGE`.
1
+
# General
2
+
3
+
All commands should be run from the repository root (`/ic`).
4
+
5
+
# Rust
6
+
7
+
After changing Rust code (`*.rs`) follow these steps in order:
8
+
9
+
1. **Format** by running the following from the root of the repository:
10
+
```
11
+
cd "$(git rev-parse --show-toplevel)"
12
+
rustfmt <MODIFIED_RUST_FILES>
13
+
```
14
+
where `<MODIFIED_RUST_FILES>` is a space-separated list of the paths of all modified Rust files, relative to the root of the repository.
15
+
2. **Lint** by running the following from the root of the repository:
16
+
```
17
+
cd "$(git rev-parse --show-toplevel)"
18
+
cargo clippy --all-features <CRATES> -- \
19
+
-D warnings \
20
+
-D clippy::all \
21
+
-D clippy::mem_forget \
22
+
-A clippy::uninlined_format_args
23
+
```
24
+
where `<CRATES>` is a space-separated list of
25
+
`-p <CRATE>` options for all modified crates.
26
+
For example, use `-p ic-crypto -p ic-types` if both were modified.
27
+
Run a single clippy invocation covering all modified crates.
28
+
29
+
To determine the crate name, check the `name` field in the nearest
30
+
ancestor `Cargo.toml` relative to the modified file.
31
+
32
+
Fix any linting errors.
33
+
3. **Build** the directly affected bazel targets by running the following from the root of the repository:
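For the **Lint** step above, the `<CRATES>` list can be derived mechanically from the modified files. The following is a minimal sketch, not part of the guide itself: the function names `crate_name_for_file` and `clippy_package_args` are hypothetical, and it assumes each crate's `Cargo.toml` has a `[package]` section with a plain `name = "..."` line.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): turn modified Rust files into `-p <CRATE>` options.
set -euo pipefail

# Print the [package] name from the nearest ancestor Cargo.toml of a file.
crate_name_for_file() {
  local dir
  dir="$(dirname "$1")"
  while :; do
    if [ -f "$dir/Cargo.toml" ]; then
      # Take the first `name = "..."` line inside the [package] section.
      awk '
        /^\[/ { in_pkg = ($0 == "[package]") }
        in_pkg && /^name[ \t]*=/ {
          gsub(/^[^"]*"|".*$/, "")
          print
          exit
        }
      ' "$dir/Cargo.toml"
      return 0
    fi
    # Stop once the top of the (relative or absolute) path is reached.
    if [ "$dir" = "." ] || [ "$dir" = "/" ]; then
      return 1
    fi
    dir="$(dirname "$dir")"
  done
}

# Print one deduplicated `-p <CRATE>` option per modified crate, on a single line.
clippy_package_args() {
  local file name
  for file in "$@"; do
    name="$(crate_name_for_file "$file")" || continue
    if [ -n "$name" ]; then
      printf '%s\n' "$name"
    fi
  done | sort -u | sed 's/^/-p /' | paste -sd' ' -
}
```

Sourced from the repository root, `clippy_package_args <MODIFIED_RUST_FILES>` prints a string that could be substituted for `<CRATES>` in the clippy command, so that a single invocation covers all modified crates.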
.claude/skills/fix-flaky-tests/SKILL.md
49 additions & 25 deletions
@@ -5,54 +5,78 @@ description: Use this when asked to fix flaky bazel tests.
5
5
6
6
This guide explains how to find flaky tests to fix and how to debug them. Flaky tests are bazel tests that run on GitHub workflows that pass after having failed in a previous attempt.
7
7
8
-
1. Make sure you're on an up-to-date `master` branch to make sure you're using and reading the latest code:
8
+
# Prerequisites
9
+
10
+
1. Make sure you're on an up-to-date `master` branch to ensure you're using and reading the latest code:
9
11
```
10
12
git checkout master && git pull
11
13
```
12
14
13
-
2. Determine which flaky bazel test to fix by picking the most flaky test in the last week which has not yet been fixed. To do this:
15
+
2. Run `gh auth status` to check if `gh` is authenticated with `github.com` using `Git operations protocol: ssh`.
This prints a one-time device code and a URL. Instruct the user to open the URL in their browser and enter the code.
22
+
23
+
**Do not** use the bare `gh auth login` command, as the interactive prompts are unreliable when run from an AI agent.
14
24
15
-
1. Run the following command to get the top 10 tests ordered by the number of times they flaked in the last week:
25
+
# Fix a flaky test
26
+
27
+
1. If you were not instructed to fix a test with a specific `label`, determine which test to fix by picking the most flaky test in the last week that has not yet been fixed. To do this:
28
+
29
+
1. Run the following command to get the top 100 tests, in descending order of the percentage of their total runs that flaked in the last week, showing only tests that flaked in 1% or more of their runs:
16
30
```
17
-
bazel run //ci/githubstats:query -- top 10 flaky --week
31
+
bazel run //ci/githubstats:query -- top 100 flaky% --ge 1 --week
32
+
```
33
+
34
+
2. Pick the `label` of the topmost test that doesn't have an open PR or a git commit in the last week mentioning its `<test_name>`, which is the part of the `label` after the `:`.
35
+
36
+
`<test_name>` might be suffixed with `_head_nns` or `_colocate`, which denote variants of the same test. Strip those suffixes when checking for open PRs or commits to avoid missing matches.
37
+
38
+
To check if there is an open PR mentioning the test, run the following command
39
+
(replace underscores with spaces because GitHub search doesn't match underscored compound words):
40
+
```
41
+
gh pr list --search "$(echo '<test_name>' | tr '_' ' ')" --state open
18
42
```
19
-
2. Pick the `label` of the top most test which doesn't have an open PR or git commit in the last week mentioning its `<test_name>` which is the part of the `label` after the `:`. Also strip `_head_nns` or `_colocate` from the `<test_name>` to get a more fuzzy match.
20
43
21
44
To check if there is a git commit mentioning the test, run the following command:
Continue with the next test if you find an open PR or commit mentioning `<test_name>`
26
50
even if it seems the commit is not about fixing flakiness.
27
51
It's better to pick a test which has no other work being done on it to avoid conflicts.
28
52
29
-
3. Get the last flaky runs of the test named `label` in the last week by running the following command, replacing `<label>` with the label of the test:
53
+
2. Get the last flaky runs of the test named `label` in the last week by running the following command, replacing `<label>` with the label of the test:
30
54
```
31
55
bazel run //ci/githubstats:query -- last --flaky --week --download-ic-logs --download-console-logs <label>
32
56
```
33
57
Note the command will print `Downloading logs to: <LOG_DIR>`.
34
58
35
-
The directory `<LOG_DIR>` will contain an "invocation" directory, named like `<bazel_invocation_timestamp>_<bazel_invocation_id>`,
36
-
per bazel invocation that had a flaky run of the test.
59
+
Read `<LOG_DIR>/README.md` to understand how the logs are organized.
37
60
38
-
That invocation directory will have a directory per attempt of the test, named like `1`, `2`, `3`, etc.
61
+
3. Analyze the source code of `label` and the logs in `<LOG_DIR>` to determine the root cause of the flakiness.
39
62
40
-
Each attempt directory will either contain a `FAILED.log` or `PASSED.log` file with the log of the test if the attempt failed or passed, respectively.
63
+
4. Once you have determined the root cause,
64
+
fix the test taking `.claude/CLAUDE.md` into account.
41
65
42
-
In case the test was a system-test, i.e. when the `label` starts with `//rs/tests/`, the attempt directory will also contain:
43
-
* an `ic_logs` directory containing the logs of IC nodes that were deployed as part of the test.
44
-
Each IC node will have its own log file named `<node_id>.log` and there will be a symlink pointing to it with the IPv6 of the node: `<node_IPv6>.log`.
45
-
* a `console_logs` directory containing a `<vm_name>.log` file for each VM deployed as part of the test containing the console output of that VM. Often `<vm_name>` equals `<node_id>`.
46
-
47
-
4. Analyze the source code of `label` and the logs in `<LOG_DIR>` to determine the root cause of the flakiness.
48
-
49
-
5. Once you have determined the root cause, fix the test.
66
+
5. Verify the test still passes by running:
67
+
```
68
+
bazel test --test_output=errors --runs_per_test=3 --jobs=3 <label>
69
+
```
70
+
This executes 3 runs of the test in parallel to increase the chances of reproducing the flakiness. If it fails, analyze the failure and fix it until it passes reliably.
50
71
51
-
6. Run `bazel test <label>` to verify the test still passes.
72
+
6. Make a draft Pull Request with the fix, following these steps:
52
73
53
-
7. Create a new git branch named like `ai/deflake-<test_name>`, replacing `<test_name>` with the name of the test
54
-
and commit your fix to that branch.
74
+
1. From the root of the repository, create a new git branch named `ai/deflake-<test_name>-<date>`,
75
+
replacing `<test_name>` with the name of the test
76
+
and `<date>` with the current date in `YYYY-MM-DD` format,
77
+
and commit your fix to that branch.
55
78
56
-
8. Submit a draft PR with the fix.
57
-
Name it: `fix: deflake <label>`.
58
-
Include the root cause analysis in the PR description and link to this `SKILL.md` file.
79
+
2. Submit a draft PR using `gh` with the fix.
80
+
Name it: `fix: deflake <label>`.
81
+
Include the root cause analysis in the PR description
82
+
and mention the PR was created following the steps in `.claude/skills/fix-flaky-tests/SKILL.md`.
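The naming conventions used in the steps above (the `<test_name>` part of a `label`, the variant suffixes, the PR search query, and the branch name) can be sketched as small shell helpers. This is an illustrative sketch only: the function names are hypothetical and do not exist in the repository, and it assumes at most one `_head_nns` and one `_colocate` suffix appear, in that trailing order.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers) of the naming conventions described in the steps above.
set -euo pipefail

# <test_name> is the part of the bazel label after the ':'.
test_name_from_label() {
  printf '%s\n' "${1##*:}"
}

# Drop the `_head_nns` / `_colocate` variant suffixes before searching PRs or commits.
strip_variant_suffix() {
  local name="$1"
  name="${name%_head_nns}"
  name="${name%_colocate}"
  printf '%s\n' "$name"
}

# Build the PR search query: underscores become spaces.
pr_search_query() {
  printf '%s\n' "$1" | tr '_' ' '
}

# Build the branch name ai/deflake-<test_name>-<date>, with <date> as YYYY-MM-DD.
# The optional second argument overrides today's date.
deflake_branch_name() {
  printf 'ai/deflake-%s-%s\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}
```

With these helpers, `gh pr list --search "$(pr_search_query "$(strip_variant_suffix "$(test_name_from_label '<label>')")")" --state open` performs the same search as the command in the guide, applied to the stripped test name.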