dfinity
diff --git a/.bazelversion b/.bazelversion
Lines changed: 1 addition & 1 deletion
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
Lines changed: 54 additions & 49 deletions
diff --git a/.claude/skills/fix-flaky-tests/SKILL.md b/.claude/skills/fix-flaky-tests/SKILL.md
Lines changed: 49 additions & 25 deletions
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
Lines changed: 1 addition & 1 deletion
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
Lines changed: 3 additions & 1 deletion
diff --git a/.github/actions/bazel/action.yaml b/.github/actions/bazel/action.yaml
Lines changed: 0 additions & 8 deletions
diff --git a/.github/actions/netrc/action.yaml b/.github/actions/netrc/action.yaml
Lines changed: 20 additions & 0 deletions
diff --git a/.github/workflows/api-bn-recovery-test.yml b/.github/workflows/api-bn-recovery-test.yml
Lines changed: 43 additions & 0 deletions
@@ -1 +1 @@
-8.5.1
+8.6.0
@@ -1,49 +1,54 @@
-Rust
-====
-
-After changing Rust code (`*.rs`) first format the code using:
-
-```
-cargo fmt -- <MODIFIED_RUST_FILES>
-````
-
-Then check the code for linting errors using:
-
-```
-cargo clippy --all-features --workspace --all-targets -- \
-    -D warnings \
-    -D clippy::all \
-    -D clippy::mem_forget \
-    -C debug-assertions=off \
-    -A clippy::uninlined_format_args
-```
-
-Fix any linting errors before continuing with building and testing.
-
-
-Building
---------
-
-Rust code is built using both `cargo build` and Bazel.
-
-After changing a package under `rs/$PACKAGE` run `bazel build //rs/$PACKAGE`.
-
-
-Changing crate dependencies
----------------------------
-
-If crate dependencies need to be changed or added:
-
-1. First modify the `Cargo.toml` local to the package.
-2. If a crate is used by multiple packages add it to the workspace `Cargo.toml` in the root of the repo and reference it in the `Cargo.toml` local to the package using `{ workspace = true }`.
-3. Add the crate to `bazel/rust.MODULE.bazel`.
-4. Run a `cargo check` such that the `Cargo.lock` files get updated.
-5. Run `bin/bazel-pin.sh --force` to sync `Cargo.lock` with `Cargo.Bazel.json.lock`.
-
-
-Testing
-=======
-
-After code can be built it needs to be tested.
-
-After changing a package under `rs/$PACKAGE` run `bazel test //rs/$PACKAGE`.
+# General
+
+All commands should be run from the repository root (`/ic`).
+
+# Rust
+
+After changing Rust code (`*.rs`) follow these steps in order:
+
+1. **Format** by running the following from the root of the repository:
+   ```
+   cd "$(git rev-parse --show-toplevel)"
+   rustfmt <MODIFIED_RUST_FILES>
+   ```
+   where `<MODIFIED_RUST_FILES>` is a space separated list of paths of all modified Rust files relative to the root of the repository.
+2. **Lint** by running the following from the root of the repository:
+   ```
+   cd "$(git rev-parse --show-toplevel)"
+   cargo clippy --all-features <CRATES> -- \
+       -D warnings \
+       -D clippy::all \
+       -D clippy::mem_forget \
+       -A clippy::uninlined_format_args
+   ```
+   where `<CRATES>` is a space separated list of
+   `-p <CRATE>` options for all modified crates.
+   e.g., `-p ic-crypto -p ic-types` if both were modified.
+   Run a single clippy invocation covering all modified crates.
+
+   To determine the crate name, check the `name` field in the nearest
+   ancestor `Cargo.toml` relative to the modified file.
+
+   Fix any linting errors.
+3. **Build** the directly affected bazel targets by running the following from the root of the repository:
+   ```
+   cd "$(git rev-parse --show-toplevel)"
+   TARGETS="$(bazel query 'kind(rule, rdeps(//..., set(<MODIFIED_FILES>), 1))' --keep_going 2>/dev/null)"
+   if [ -n "$TARGETS" ]; then
+       bazel build $TARGETS
+   fi
+   ```
+   where `<MODIFIED_FILES>` is a space separated list of paths of all modified files relative to the root of the repository.
+
+   Fix all build errors.
+4. **Test** the directly affected bazel tests by running the following from the root of the repository:
+   ```
+   cd "$(git rev-parse --show-toplevel)"
+   TESTS="$(bazel query 'kind(".*_test|test_suite", kind(rule, rdeps(//..., set(<MODIFIED_FILES>), 2)))' --keep_going 2>/dev/null)"
+   if [ -n "$TESTS" ]; then
+       bazel test --test_output=errors $TESTS
+   fi
+   ```
+   (Use a depth of 2 in `rdeps` because tests usually depend on source files indirectly through a `rust_library` for example).
+
+   Fix all test failures.
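
Reader note on the lint step above: the new `CLAUDE.md` says to find the crate name in the `name` field of the nearest ancestor `Cargo.toml`. A minimal sketch of that lookup follows. The helper name `crate_name_for` is hypothetical and not part of the repository, and the sketch assumes the first top-level `name = "..."` line in the manifest is the package name, which holds for typical manifests but is not guaranteed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the `name` field of
# the nearest ancestor Cargo.toml that declares one, for a given file path.
crate_name_for() {
    local dir name
    dir="$(dirname "$1")"
    while :; do
        if [ -f "$dir/Cargo.toml" ]; then
            # Assumption: the first line starting with `name = "..."` is the package name.
            name="$(sed -n 's/^name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$dir/Cargo.toml" | head -n 1)"
            if [ -n "$name" ]; then
                printf '%s\n' "$name"
                return 0
            fi
        fi
        # Stop once the top of the (relative or absolute) path has been checked.
        if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            return 1
        fi
        dir="$(dirname "$dir")"
    done
}
```

Under these assumptions, a call such as `cargo clippy --all-features -p "$(crate_name_for <MODIFIED_RUST_FILE>)" -- ...` would build a single `-p` option. Collecting the names for all modified files and de-duplicating them before one clippy invocation is left out of the sketch.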
@@ -5,54 +5,78 @@ description: Use this when asked to fix flaky bazel tests.
 
 This guide explains how to find flaky tests to fix and how to debug them. Flaky tests are bazel tests that run on GitHub workflows that pass after having failed in a previous attempt.
 
-1. Make sure you're on an up-to-date `master` branch to make sure you're using and reading the latest code:
+# Prerequisites
+
+1. Make sure you're on an up-to-date `master` branch to ensure you're using and reading the latest code:
    ```
    git checkout master && git pull
    ```
 
-2. Determine which flaky bazel test to fix by picking the most flaky test in the last week which has not yet been fixed. To do this:
+2. Run `gh auth status` to check if `gh` is authenticated with `github.com` using `Git operations protocol: ssh`.
+
+   If not run:
+   ```
+   gh auth login --hostname github.com --git-protocol ssh --skip-ssh-key --web
+   ```
+   This prints a one-time device code and a URL. Instruct the user to open the URL in their browser and enter the code.
+
+   **Do not** use the bare `gh auth login` command, as the interactive prompts are unreliable when run from an AI agent.
 
-    1. Run the following command to get the top 10 tests ordered by the number of times they flaked in the last week:
+# Fix a flaky test
+
+1. If not instructed to fix a test with a specified `label` determine which test to fix by picking the most flaky test in the last week which has not yet been fixed. To do this:
+
+    1. Run the following command to get the top 100 tests ordered descendingly by how much percent of their total runs they flaked in the last week, showing only tests which flaked 1% or more of their runs:
        ```
-       bazel run //ci/githubstats:query -- top 10 flaky --week
+       bazel run //ci/githubstats:query -- top 100 flaky% --ge 1 --week
+       ```
+
+    2. Pick the `label` of the top most test which doesn't have an open PR or git commit in the last week mentioning its `<test_name>` which is the part of the `label` after the `:`.
+
+       `<test_name>` might be suffixed with `_head_nns` or `_colocate` which are variants of the same test. Strip those suffixes when checking for open PRs or commits to avoid missing matches.
+
+       To check if there is an open PR mentioning the test, run the following command
+       (replace underscores with spaces because GitHub search doesn't match underscored compound words):
+       ```
+       gh pr list --search "$(echo '<test_name>' | tr '_' ' ')" --state open
        ```
-    2. Pick the `label` of the top most test which doesn't have an open PR or git commit in the last week mentioning its `<test_name>` which is the part of the `label` after the `:`. Also strip `_head_nns` or `_colocate` from the `<test_name>` to get a more fuzzy match.
 
        To check if there is a git commit mentioning the test, run the following command:
        ```
-       git log --since 'last week' | grep <test_name>
+       git log --oneline --since 'last week' | grep "$(echo '<test_name>' | tr '_' '.')"
        ```
+
        Continue with the next test if you find an open PR or commit mentioning `<test_name>`
        even if it seems the commit is not about fixing flakiness.
        It's better to pick a test which has no other work being done on it to avoid conflicts.
 
-3. Get the last flaky runs of the test named `label` in the last week by running the following command, replacing `<label>` with the label of the test:
+2. Get the last flaky runs of the test named `label` in the last week by running the following command, replacing `<label>` with the label of the test:
    ```
    bazel run //ci/githubstats:query -- last --flaky --week --download-ic-logs --download-console-logs <label>
    ```
    Note the command will print `Downloading logs to: <LOG_DIR>`.
 
-   The directory `<LOG_DIR>` will contain an "invocation" directory, named like `<bazel_invocation_timestamp>_<bazel_invocation_id>`,
-   per bazel invocation that had a flaky run of the test.
+   Read `<LOG_DIR>/README.md` to understand how the logs are organized.
 
-   That invocation directory will have a directory per attempt of the test, named like `1`, `2`, `3`, etc.
+3. Analyze the source code of `label` and the logs in `<LOG_DIR>` to determine the root cause of the flakiness.
 
-   Each attempt directory will either contain a `FAILED.log` or `PASSED.log` file with the log of the test if the attempt failed or passed, respectively.
+4. Once you have determined the root cause,
+   fix the test taking `.claude/CLAUDE.md` into account.
 
-   In case the test was a system-test, i.e. when the `label` starts with `//rs/tests/`, the attempt directory will also contain:
-   * an `ic_logs` directory containing the logs of IC nodes that were deployed as part of the test.
-     Each IC node will have its own log file named `<node_id>.log` and there will be a symlink pointing to it with the IPv6 of the node: `<node_IPv6>.log`.
-   * a `console_logs` directory containing a `<vm_name>.log` file for each VM deployed as part of the test containing the console output of that VM. Often `<vm_name>` equals `<node_id>`.
-
-4. Analyze the source code of `label` and the logs in `<LOG_DIR>` to determine the root cause of the flakiness.
-
-5. Once you have determined the root cause, fix the test.
+5. Verify the test still passes by running:
+   ```
+   bazel test --test_output=errors --runs_per_test=3 --jobs=3 <label>
+   ```
+   This executes 3 runs of the test in parallel to increase the chances of reproducing the flakiness. If it fails, analyze the failure and fix it until it passes reliably.
 
-6. Run `bazel test <label>` to verify the test still passes.
+6. Make a draft Pull Request with the fix, following these steps:
 
-7. Create a new git branch named like `ai/deflake-<test_name>`, replacing `<test_name>` with the name of the test
-   and commit your fix to that branch.
+   1. From the root of the repository, create a new git branch named `ai/deflake-<test_name>-<date>`,
+      replacing `<test_name>` with the name of the test
+      and `<date>` with the current date in `YYYY-MM-DD` format,
+      and commit your fix to that branch.
 
-8. Submit a draft PR with the fix.
-   Name it: `fix: deflake <label>`.
-   Include the root cause analysis in the PR description and link to this `SKILL.md` file.
+   2. Submit a draft PR using `gh` with the fix.
+      Name it: `fix: deflake <label>`.
+      Include the root cause analysis in the PR description
+      and mention the PR was created following the steps in `.claude/skills/fix-flaky-tests/SKILL.md`.
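
Reader note on the test-selection step above: the new `SKILL.md` derives `<test_name>` from the part of the `label` after the `:`, strips the `_head_nns` and `_colocate` variant suffixes, and then swaps underscores for spaces (PR search) or for `.` (commit grep, where `.` matches any single character). A minimal sketch of that preparation follows. The function names and the example labels are hypothetical, and stripping `_colocate` before `_head_nns` is an assumption made so that a name carrying both suffixes in that order would also be reduced.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository) for preparing search terms.

# Print the test name of a bazel label with the variant suffixes removed.
base_test_name() {
    local name="${1##*:}"      # keep only the part after the last ':'
    name="${name%_colocate}"   # drop a trailing _colocate, if present
    name="${name%_head_nns}"   # drop a trailing _head_nns, if present
    printf '%s\n' "$name"
}

# Term for `gh pr list --search`: underscores become spaces.
pr_search_term() { base_test_name "$1" | tr '_' ' '; }

# Pattern for `git log ... | grep`: underscores become '.', matching any character.
commit_grep_pattern() { base_test_name "$1" | tr '_' '.'; }
```

With these, the two checks could be written as `gh pr list --search "$(pr_search_term '<label>')" --state open` and `git log --oneline --since 'last week' | grep "$(commit_grep_pattern '<label>')"`.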
@@ -1,5 +1,5 @@
 {
-  "image": "ghcr.io/dfinity/ic-build@sha256:62ef680d95901d1c191442d2a5f382315ed4b916e97a81335c8d13baba2fe30d",
+  "image": "ghcr.io/dfinity/ic-dev@sha256:fe06783d9cf8e9fc901a5996d9fc8b726f15769f2fd6bd86969d1fbbf77ae025",
   "remoteUser": "ubuntu",
   "privileged": true,
   "runArgs": [
 
@@ -137,7 +137,9 @@ go.sum                    @dfinity/idx
 /rs/interfaces/                                         @dfinity/ic-interface-owners
 /rs/interfaces/adapter_client/                          @dfinity/consensus
 /rs/interfaces/certified_stream_store/                  @dfinity/team-dsm
-/rs/interfaces/mocks/src/payload_builder                @dfinity/consensus
+/rs/interfaces/mocks/src/consensus_pool.rs              @dfinity/consensus
+/rs/interfaces/mocks/src/crypto.rs                      @dfinity/consensus
+/rs/interfaces/mocks/src/payload_builder.rs             @dfinity/consensus
 /rs/interfaces/registry/                                @dfinity/governance-team
 /rs/interfaces/src/canister_http.rs                     @dfinity/consensus
 /rs/interfaces/src/consensus.rs                         @dfinity/consensus
 
@@ -65,14 +65,6 @@ runs:
       run: |
         set -euo pipefail
 
-        # Set up .netrc so that bazel can authenticate with GitHub to have higher rate limits for fetching dependencies.
-        touch  ~/.netrc
-        chmod 600 ~/.netrc
-        echo "machine github.com login x-access-token password ${{ github.token }}" > ~/.netrc
-        echo "machine api.github.com login x-access-token password ${{ github.token }}" >> ~/.netrc
-        echo "Current GitHub API rate limits:"
-        curl -s --netrc https://api.github.com/rate_limit | jq '.resources.core'
-
         # Here we overwrite the PATH with our custom bazel wrapper to ensure
         # the specified commands are run with the CI-specific options included
         # in the wrapper.
 
@@ -0,0 +1,20 @@
+name: 'GitHub Authentication with netrc'
+description: Set up authentication in netrc using github token. Picked up by tools like Bazel.
+
+runs:
+  using: "composite"
+  steps:
+    # write netrc to $HOME
+    - shell: bash
+      run: |
+        # Set up .netrc so that tools can authenticate with GitHub to have higher rate limits for fetching dependencies.
+        touch  ~/.netrc
+        chmod 600 ~/.netrc
+        echo "machine github.com login x-access-token password ${{ github.token }}" > ~/.netrc
+        echo "machine api.github.com login x-access-token password ${{ github.token }}" >> ~/.netrc
+        echo "Current GitHub API rate limits:"
+
+        # Show how close we are to the limits (ignore errors from curl or invalid JSON)
+        curl -sf --netrc https://api.github.com/rate_limit 2>/dev/null | \
+          jq -r '.resources.core | "Rate limit remaining: \(.remaining)/\(.limit) (resets at \(.reset | strftime("%Y-%m-%d %H:%M:%S %Z")))"' 2>/dev/null \
+          || echo "Could not retrieve rate limit information"
@@ -0,0 +1,43 @@
+name: API BN Recovery Test
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: '0 6 * * 1'  # weekly on Monday at 06:00 UTC
+  pull_request:
+    paths:
+      - '.github/workflows/api-bn-recovery-test.yml'
+      - 'ic-os/api-bn-recovery/**'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+jobs:
+  test:
+    name: API BN Recovery Smoke Test
+    runs-on:
+      labels: dind-large
+    container:
+      image: ghcr.io/dfinity/ic-build@sha256:18d23aef1f5e9e7e1eef94c32563f8ed15531ae79065bb00bb5206a643fc49fe
+      options: >-
+        -e NODE_NAME --privileged --cgroupns host
+        --mount type=tmpfs,target="/home/buildifier/.local/share/containers"
+    timeout-minutes: 15
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - uses: ./.github/actions/netrc
+
+      - name: Install dfx
+        run: |
+          DFXVM_INIT_YES=1 sh -ci "$(curl -fsSL https://internetcomputer.org/install.sh)"
+          echo "$HOME/.local/share/dfx/bin" >> "$GITHUB_PATH"
+
+      - name: Run smoke test
+        uses: ./.github/actions/bazel
+        with:
+          run: ./ic-os/api-bn-recovery/test.sh