golemcloud · vigoo · Mar 3, 2026 · Mar 2, 2026
diff --git a/.agents/skills/testing/SKILL.md b/.agents/skills/testing/SKILL.md
@@ -0,0 +1,95 @@
+---
+name: testing
+description: "Running and debugging tests in the Golem workspace. Use when writing tests, running specific tests, filtering tests, debugging test failures, or understanding test infrastructure."
+---
+
+# Testing in Golem
+
+Tests use [test-r](https://test-r.vigoo.dev). Each test file **must** import `test_r::test` or the tests will silently not run:
+
+```rust
+use test_r::test;
+
+#[test]
+fn my_test() {
+    // ...
+}
+```
+
+## Choosing the Right Test Command
+
+**Do not run `cargo make test`** — it runs all tests and takes a very long time.
+
+| Change Type | Test Command |
+|-------------|--------------|
+| Core logic, utilities | `cargo make unit-tests` |
+| Worker executor functionality | `cargo make worker-executor-tests` |
+| Service integration | `cargo make integration-tests` |
+| CLI changes | `cargo make cli-tests` |
+
+**Whenever tests are modified, always run the affected tests to verify they still pass before considering the task complete.**
+
+For running specific tests during development:
+```shell
+cargo test -p <crate> -- <test_name> --report-time
+```
+
+## Test Filtering Rules (test-r)
+
+This project uses `test-r`, which supports **multiple filter arguments after `--`**. Filters are OR-matched (a test runs if its name matches any filter). Each filter is a **substring match**, not a regex.
+
+```shell
+# Run a single specific test:
+cargo test -p <crate> -- <test_name> --report-time
+
+# Run multiple specific tests (filters go AFTER --, not before):
+cargo test -p <crate> -- test_name_1 test_name_2 test_name_3 --report-time
+
+# WRONG - multiple filters before -- causes "unexpected argument" error:
+# cargo test -p <crate> test1 test2 -- --report-time
+
+# WRONG - regex patterns don't work (filters are substring matches, not regex):
+# cargo test -p <crate> -- "test_a|test_b" --report-time
+# cargo test -p <crate> -- "test_.*pattern" --report-time
+```
+
+**Note:** `--list` in test-r ignores filters and always lists all tests. Do not use `--list` to verify that filters are working. Instead, do a real run and check the `filtered out` count in the result line.
+
+## Debugging Test Failures
+
+Use `--nocapture` when debugging tests:
+```shell
+cargo test -p <crate> -- <test> --nocapture
+```
+
+**Always save test output to a file** when running worker executor tests, integration tests, or CLI tests. These tests are slow and produce potentially thousands of lines of logs. Never pipe output directly to `grep`, `head`, `tail`, etc. — if you need to examine different parts of the output, you would have to re-run the entire slow test. Instead:
+```shell
+cargo test -p <crate> -- <test> --nocapture > tmp/test_output.txt 2>&1
+# Then search/inspect the saved file as needed
+grep -n "pattern" tmp/test_output.txt
+```
+
+**Handling hanging tests:** Load the `debugging-hanging-tests` skill for a step-by-step workflow.
+
+## Test Components
+
+Worker executor tests and integration tests use pre-compiled WASM files from the `test-components/` directory. These are checked into the repository and **rebuilding them is not automated**. Do not attempt to rebuild test components; use the existing compiled WASM files, unless the test component itself has an AGENTS.md file with instructions on how to do so.
+
+Load the `modifying-test-components` skill when rebuilding is needed.
+
+## Timeouts
+
+Add a `#[timeout]` attribute for tests that should fail rather than hang:
+
+```rust
+use test_r::test;
+use test_r::timeout;
+
+#[test]
+#[timeout("30s")]
+async fn my_test() {
+    // ...
+}
+```
+
+Choose a timeout generous enough for normal execution but short enough to fail quickly when hung (30s–60s for most tests, up to 120s for complex integration tests).
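
Note on the filtering rules in `SKILL.md` above: the OR-matched, substring-only behaviour can be modelled with a few lines of Rust. This is only a sketch of the semantics as the skill file describes them, not test-r's actual implementation. The function name `is_selected` is hypothetical, and treating an empty filter list as "run everything" is an assumption based on how `cargo test` behaves without filters.

```rust
/// Hypothetical model of the filter semantics described in the skill file.
/// A test is selected when its name contains at least one filter as a
/// plain substring. Filters are never interpreted as regular expressions.
/// Assumption: an empty filter list selects every test.
fn is_selected(test_name: &str, filters: &[&str]) -> bool {
    filters.is_empty() || filters.iter().any(|f| test_name.contains(*f))
}
```

Under this model, `is_selected("worker::test_a", &["test_a", "test_b"])` is true because one filter matches, while `is_selected("worker::test_a", &["test_a|test_b"])` is false: the `|` is taken literally, which is why the regex-style invocations marked WRONG in the skill file match nothing.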
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -158,6 +158,8 @@ jobs:
         if: always()
         with:
           report-path: "**/target/ctrf-*.json"
+          upload-artifact: 'true'
+          artifact-name: unit-tests-report
           summary: true
           summary-report: true
           failed-report: true
@@ -230,6 +232,8 @@ jobs:
         if: always()
         with:
           report-path: "**/target/ctrf-*.json"
+          upload-artifact: 'true'
+          artifact-name: worker-executor-tests-${{ matrix.group.name }}-report
           summary: true
           summary-report: true
           failed-report: true
@@ -339,6 +343,8 @@ jobs:
         if: always()
         with:
           report-path: "**/target/ctrf-*.json"
+          upload-artifact: 'true'
+          artifact-name: '${{ matrix.group.name }}-report'
           summary: true
           summary-report: true
           failed-report: true

diff --git a/AGENTS.md b/AGENTS.md
@@ -26,18 +26,9 @@ Always run `cargo make build` before starting work to ensure all dependencies ar
 
 ## Testing
 
-Tests use [test-r](https://test-r.vigoo.dev). **Important:** Each test file must import `test_r::test` or tests will not run:
+Tests use [test-r](https://test-r.vigoo.dev). **Important:** Each test file must import `test_r::test` or tests will not run.
 
-```rust
-use test_r::test;
-
-#[test]
-fn my_test() {
-    // ...
-}
-```
-
-**Do not run `cargo make test`** - it runs all tests and takes a very long time. Instead, choose the appropriate test command:
+**Do not run `cargo make test`** — it runs all tests and takes a very long time. Instead, choose the appropriate test command:
 
 | Change Type | Test Command |
 |-------------|--------------|
@@ -46,18 +37,11 @@ fn my_test() {
 | Service integration | `cargo make integration-tests` |
 | CLI changes | `cargo make cli-tests` |
 
-**Whenever tests are modified, always run the affected tests to verify they still pass before considering the task complete.**
+For specific tests: `cargo test -p <crate> -- <test_name> --report-time`
 
-For specific tests during development:
-```shell
-cargo test -p <crate> <test_module> -- --report-time
-```
-
-## Test Components
-
-Worker executor tests and integration tests use pre-compiled WASM files from the `test-components/` directory. These are checked into the repository and **rebuilding them is not automated**. Do not attempt to rebuild test components - use the existing compiled WASM files, EXCEPT if the test component itself has an AGENTS.md file with instructions of how to do so.
+**Whenever tests are modified, always run the affected tests to verify they still pass before considering the task complete.**
 
-Load the `modifying-test-components` skill when rebuilding is needed.
+Load the `testing` skill for detailed guidance on test filtering, debugging failures, test components, and timeouts.
 
 ## Running Locally
 
@@ -74,6 +58,7 @@ Load these skills for guided workflows on complex tasks:
 |-------|-------------|
 | `modifying-http-endpoints` | Adding or modifying REST API endpoints (covers OpenAPI regeneration, golem-client rebuild, type mappings) |
 | `adding-dependencies` | Adding or updating crate dependencies (covers workspace dependency management, versioning, features) |
+| `testing` | Running and debugging tests (covers test filtering, debugging failures, test components, timeouts) |
 | `debugging-hanging-tests` | Diagnosing worker executor or integration tests that hang indefinitely |
 | `modifying-test-components` | Building or modifying test WASM components, or rebuilding after SDK changes |
 | `modifying-wit-interfaces` | Adding or modifying WIT interfaces and synchronizing across sub-projects |
@@ -101,22 +86,6 @@ This runs `rustfmt` and `clippy` with automatic fixes. Load `pre-pr-checklist` s
 
 All crate dependencies must have their versions specified in the root workspace `Cargo.toml` under `[workspace.dependencies]`. Workspace members must reference them using `x = { workspace = true }` in their own `Cargo.toml` rather than specifying versions directly.
 
-## Debugging Tests
-
-Use `--nocapture` when debugging tests to allow debugger attachment:
-```shell
-cargo test -p <crate> <test> -- --nocapture
-```
-
-**Always save test output to a file** when running worker executor tests, integration tests, or CLI tests. These tests are slow and produce potentially thousands of lines of logs. Never pipe output directly to `grep`, `head`, `tail`, etc. — if you need to examine different parts of the output, you would have to re-run the entire slow test. Instead:
-```shell
-cargo test -p <crate> <test> -- --nocapture > tmp/test_output.txt 2>&1
-# Then search/inspect the saved file as needed
-grep -n "pattern" tmp/test_output.txt
-```
-
-**Handling hanging tests:** Load the `debugging-hanging-tests` skill for a step-by-step workflow.
-
 ## Project Structure
 
 - `golem-worker-executor/` - Worker execution engine

diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -239,7 +239,7 @@ system-interface = "0.27.3"
 tap = "1.0.1"
 tempfile = "3.18.0"
 terminal_size = "0.4.2"
-test-r = { version = "3.0.0", default-features = true }
+test-r = { version = "3.0.3", default-features = true }
 testcontainers = { version = "0.23.3" }
 testcontainers-modules = { version = "0.11.6", features = ["postgres", "redis", "minio", "mysql", ] }
 textwrap = "0.16.1"

diff --git a/golem-common/Cargo.toml b/golem-common/Cargo.toml
@@ -132,6 +132,7 @@ semver = { workspace = true, optional = true }
 [dev-dependencies]
 anyhow = { workspace = true }
 assert2 = { workspace = true }
+futures = { workspace = true }
 pretty_assertions = { workspace = true }
 proptest = { workspace = true }
 test-r = { workspace = true }