Commit 26e6daa

vigoo and noise64 authored
Agent invocation fixes (#2842)
* Fail on missing test reports
* Using a fixed cargo-test-r
* agent metadata casing inconsistency fix
* deleted component-transformer-exampel1
* Proper embedding of spawned service logs into the test logs
* Nicer integration test output
* Hide debug logs from docker client
* Agent skill
* Deployment revision support on the invocation API
* Clean-up unused component_revision and fix oplog query matcher
* Fix idempotency key handling on the new api
* Extracted some skills from agents.md
* Transfer more information (nested spans, source location) from child processes to test runner
* Clippy & format
* Regenerated configs
* backtrace_on_stack_overflow to debug stack overflow on CI
* Removed obsolete api tests makefile and ci entries
* Test stability fix
* Format

---------

Co-authored-by: Dávid István Bíró <noise64@gmail.com>
1 parent 53a2ee7 commit 26e6daa

59 files changed, +1669 -256 lines changed

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
---
name: adding-dependencies
description: "Adding or updating crate dependencies in the Golem workspace. Use when adding a new Rust dependency, changing dependency versions, or configuring dependency features."
---

# Adding Dependencies

All crate dependencies in the Golem workspace are centrally managed. Versions and default features are specified **once** in the root `Cargo.toml` under `[workspace.dependencies]`, and workspace members reference them with `{ workspace = true }`.

## Adding a New Dependency

### Step 1: Add to root workspace Cargo.toml

Add the dependency under `[workspace.dependencies]` in the root `Cargo.toml`, specifying the version and any default features:

```toml
# Simple version
my-crate = "1.2.3"

# With features
my-crate = { version = "1.2.3", features = ["feature1", "feature2"] }

# With default-features disabled
my-crate = { version = "1.2.3", default-features = false }
```

Keep entries **alphabetically sorted** within the section. Internal workspace crates are listed first (with `path`), followed by external dependencies.

### Step 2: Reference from workspace member

In the member crate's `Cargo.toml`, add the dependency using `workspace = true`:

```toml
[dependencies]
my-crate = { workspace = true }

# To add extra features beyond what the workspace specifies
my-crate = { workspace = true, features = ["extra-feature"] }

# To make it optional
my-crate = { workspace = true, optional = true }
```

**Never** specify a version directly in a member crate's `Cargo.toml`. Always use `{ workspace = true }`.

The same pattern applies to `[dev-dependencies]` and `[build-dependencies]`.
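
The rule above can be checked mechanically. The following is a minimal sketch, not part of the Golem tooling: `non_workspace_deps` is a hypothetical helper that scans a member manifest line by line and reports dependency entries that do not use `workspace = true`. It is a naive text scan rather than a TOML parser, so it assumes one dependency per line and ignores target-specific dependency tables.

```rust
/// Returns the dependency lines of a member `Cargo.toml` that do not use
/// `workspace = true`. Naive line-based scan, not a real TOML parser.
fn non_workspace_deps(manifest: &str) -> Vec<String> {
    let mut in_deps = false;
    let mut offenders = Vec::new();
    for raw in manifest.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line.starts_with('[') {
            // Section header: track whether we are inside a dependency table.
            in_deps = matches!(
                line,
                "[dependencies]" | "[dev-dependencies]" | "[build-dependencies]"
            );
            continue;
        }
        // Ignore spacing so `workspace=true` and `workspace = true` both match.
        if in_deps && !line.replace(' ', "").contains("workspace=true") {
            offenders.push(line.to_string());
        }
    }
    offenders
}

fn main() {
    // Hypothetical member manifest used only for illustration.
    let manifest = r#"
[package]
name = "example-member"

[dependencies]
my-crate = { workspace = true }
other-crate = "1.2.3"

[dev-dependencies]
test-helper = { workspace = true, features = ["extra-feature"] }
"#;
    for line in non_workspace_deps(manifest) {
        println!("not using the workspace version: {line}");
    }
}
```

Here only the `other-crate` line is reported, because it pins a version in the member crate instead of inheriting it from the workspace.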

### Step 3: Verify

```shell
cargo build -p <crate>    # Build the specific crate
cargo make build          # Full workspace build
```

## Updating a Dependency Version

Change the version **only** in the root `Cargo.toml` under `[workspace.dependencies]`. All workspace members automatically pick up the new version.

## Pinned and Patched Dependencies

Some dependencies use exact versions (`=x.y.z`) to ensure compatibility. Check the `[patch.crates-io]` section in the root `Cargo.toml` for git-overridden crates (e.g., `wasmtime`). When updating patched dependencies, both the version under `[workspace.dependencies]` and the corresponding `[patch.crates-io]` entry must be updated together.

## Checklist

1. Version specified in root `Cargo.toml` under `[workspace.dependencies]`
2. Member crate references it with `{ workspace = true }`
3. No version numbers in member crate `Cargo.toml` files
4. Entry is alphabetically sorted in the workspace dependencies list
5. `cargo make build` succeeds
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
name: debugging-hanging-tests
description: "Diagnosing and fixing hanging worker executor or integration tests. Use when a test hangs indefinitely, times out, or appears stuck during execution."
---

# Debugging Hanging Tests

Worker executor and integration tests can hang indefinitely due to `unimplemented!()` panics in async tasks, deadlocks, missing shard assignments, or other async runtime issues. This skill provides a systematic workflow for diagnosing and resolving these hangs.

## Common Causes

| Cause | Symptom |
|-------|---------|
| `unimplemented!()` panic in async task | Test hangs after a log line mentioning the unimplemented feature |
| Deadlock | Test hangs with no further log output |
| Missing shard assignment | Worker never starts executing |
| Channel sender dropped | Receiver awaits forever with no error |
| Infinite retry loop | Repeated log lines with the same error |

## Step 1: Add a Timeout

Add a `#[timeout]` attribute so the test fails with a clear error instead of hanging forever:

```rust
use test_r::test;
use test_r::timeout;

#[test]
#[timeout("30s")]
async fn my_hanging_test() {
    // ...
}
```

Choose a timeout generous enough for normal execution but short enough to fail quickly when hung (30s–60s for most tests, up to 120s for complex integration tests).

## Step 2: Capture Full Output

Run the test with `--nocapture` and save **all output** to a file. The root cause often appears far before the point where the test hangs:

```shell
mkdir -p tmp    # the redirect below fails if the directory does not exist
cargo test -p <crate> <test_name> -- --nocapture > tmp/test_output.txt 2>&1
```

**Important:** Always redirect to a file. The output can be thousands of lines, and the relevant error may be near the beginning while the hang occurs at the end.

## Step 3: Search for Root Cause

Search the saved output file for these patterns, in order of likelihood:

```shell
grep -n "unimplemented" tmp/test_output.txt
grep -n "panic" tmp/test_output.txt
grep -n "ERROR" tmp/test_output.txt
grep -n "WARN" tmp/test_output.txt
```

### What to look for

- **`not yet implemented`** or **`unimplemented`**: An async task hit an unimplemented code path and panicked. The panic is silently swallowed by the async runtime, causing the caller to await forever.
- **`panic`**: Similar to above: a panic in a spawned task won't propagate to the test.
- **`ERROR` with retry**: A service call failing repeatedly, causing an infinite retry loop.
- **Repeated identical log lines**: Indicates a retry loop or polling cycle that never succeeds.
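
Repeated lines are harder to spot with `grep` than a fixed keyword. As a rough sketch, the hypothetical helper `repeated_lines` below counts identical lines in the captured output and lists those that occur at least a given number of times. It assumes the lines are byte-for-byte identical; real log lines usually carry timestamps, which would have to be stripped first.

```rust
use std::collections::HashMap;

/// Returns lines that occur at least `threshold` times, most frequent first.
fn repeated_lines(log: &str, threshold: usize) -> Vec<(usize, String)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for line in log.lines().map(str::trim).filter(|l| !l.is_empty()) {
        *counts.entry(line).or_insert(0) += 1;
    }
    let mut repeated: Vec<(usize, String)> = counts
        .into_iter()
        .filter(|(_, n)| *n >= threshold)
        .map(|(line, n)| (n, line.to_string()))
        .collect();
    // Sort by count descending, then by text so the order is deterministic.
    repeated.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(&b.1)));
    repeated
}

fn main() {
    // In practice the input would be read from the saved file, e.g. with
    // std::fs::read_to_string("tmp/test_output.txt").
    let log = "starting worker\n\
               ERROR connect failed, retrying\n\
               ERROR connect failed, retrying\n\
               ERROR connect failed, retrying\n";
    for (count, line) in repeated_lines(log, 3) {
        println!("{count}x {line}");
    }
}
```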

## Step 4: Fix the Root Cause

### If caused by `unimplemented!()`
Implement the missing functionality, or if it's a test-only issue, provide a stub/mock.
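
To see why this failure mode produces a hang rather than a test failure, the sketch below reproduces it with standard-library threads and channels. This is an analogy under stated assumptions, not the async runtime the Golem tests use: `compute` and `wait_for_panicking_worker` are hypothetical names, and a second sender handle is kept alive on purpose so the receiver is never told that the worker is gone.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Stand-in for a code path that has not been implemented yet.
fn compute() -> u32 {
    unimplemented!("feature X")
}

/// Spawns a worker that panics before sending, then waits for its result with
/// a deadline. Returns what the receiver saw and whether the worker panicked.
fn wait_for_panicking_worker(deadline: Duration) -> (Result<u32, RecvTimeoutError>, bool) {
    let (tx, rx) = mpsc::channel::<u32>();
    // A second handle that stays alive keeps the channel open.
    let _keep_alive = tx.clone();

    let worker = thread::spawn(move || {
        // `compute` panics, so nothing is ever sent.
        let _ = tx.send(compute());
    });

    // A plain `rx.recv()` would block forever here; the deadline turns the
    // hang into an explicit error.
    let received = rx.recv_timeout(deadline);
    // The panic only becomes visible when the join handle is checked.
    let worker_panicked = worker.join().is_err();
    (received, worker_panicked)
}

fn main() {
    let (received, panicked) = wait_for_panicking_worker(Duration::from_millis(200));
    println!("receiver saw: {received:?}, worker panicked: {panicked}");
}
```

The panic message is printed to standard error by the worker thread, far from the point where the receiver eventually gives up, which mirrors why the full output has to be searched.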

### If caused by a deadlock
Look for:
- Multiple `lock()` calls on the same mutex in nested scopes
- `await` while holding a lock guard
- Circular lock dependencies between tasks
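
The first pattern in the list can be illustrated with a small sketch. It uses `std::sync::Mutex` as a stand-in (the mutex types in the actual code base may differ), and both function names are hypothetical. `try_lock` is used for the problematic case so the example reports the conflict instead of hanging.

```rust
use std::sync::Mutex;

/// Increments the counter, then reads it back through a second lock.
/// The first guard is confined to an inner scope so it is released
/// before the mutex is locked again.
fn increment_then_read(counter: &Mutex<u32>) -> u32 {
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // guard dropped here
    let value = *counter.lock().unwrap();
    value
}

/// Reports whether the mutex can be acquired while a guard is still held.
/// A blocking `lock()` in the same position would never return.
fn second_lock_succeeds_while_held(counter: &Mutex<u32>) -> bool {
    let _guard = counter.lock().unwrap();
    let acquired = counter.try_lock().is_ok();
    acquired
}

fn main() {
    let counter = Mutex::new(0);
    println!("value after increment: {}", increment_then_read(&counter));
    println!(
        "second lock while guard is held: {}",
        second_lock_succeeds_while_held(&counter)
    );
}
```

The same reasoning applies to holding a guard across an `await`: the guard stays alive until the end of its scope, so narrowing that scope is usually the fix.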

### If caused by missing shard assignment
Check that the test setup properly initializes the shard manager and assigns shards before starting workers.

### If caused by a dropped sender
Ensure all channel senders are kept alive for the duration the receiver needs them. Check for early returns or error paths that drop the sender.

## Checklist

1. `#[timeout("30s")]` added to the hanging test
2. Test run with `--nocapture`, output saved to file
3. Output searched for `unimplemented`, `panic`, `ERROR`
4. Root cause identified and fixed
5. Test passes within the timeout
6. Remove the `#[timeout]` if it was only added for debugging (or keep it as a safety net)
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
---
name: modifying-http-endpoints
description: "Adding or modifying HTTP REST API endpoints in Golem services. Use when creating new endpoints, changing existing API routes, or updating request/response types for the Golem REST API."
---

# Modifying HTTP Endpoints

## Framework

Golem uses **Poem** with **poem-openapi** for REST API endpoints. Endpoints are defined as methods on API structs annotated with `#[OpenApi]` and `#[oai]`.

## Where Endpoints Live

- **Worker service**: `golem-worker-service/src/api/` (worker lifecycle, invocation, oplog)
- **Registry service**: `golem-registry-service/src/api/` (components, environments, deployments, plugins, accounts)

Each service has an `api/mod.rs` that defines an `Apis` type tuple and a `make_open_api_service` function combining all API structs.

## Adding a New Endpoint

### 1. Define the endpoint method

Add a method to the appropriate API struct (e.g., `WorkerApi`, `ComponentsApi`):

```rust
// Goes inside the `#[OpenApi] impl` block of the API struct.
// `Path` is `poem_openapi::param::Path`, `Json` is `poem_openapi::payload::Json`.
#[oai(
    path = "/:component_id/workers/:worker_name/my-action",
    method = "post",
    operation_id = "my_action"
)]
async fn my_action(
    &self,
    component_id: Path<ComponentId>, // bound to `:component_id` in the path
    worker_name: Path<String>,       // bound to `:worker_name` in the path
    request: Json<MyRequest>,        // JSON request body
    token: GolemSecurityScheme,      // security scheme for the caller's credentials
) -> Result<Json<MyResponse>> {
    // ...
}
```

### 2. If adding a new API struct

1. Create a new file in the service's `api/` directory
2. Define a struct and impl block with `#[OpenApi(prefix_path = "/v1/...", tag = ApiTags::...)]`
3. Add it to the `Apis` type tuple in `api/mod.rs`
4. Instantiate it in `make_open_api_service`

### 3. Request/response types

- Define types in `golem-common/src/model/` with `poem_openapi::Object` derive
- If the type is used in the generated client, add it to the type mapping in `golem-client/build.rs`

## After Modifying Endpoints

After any endpoint change, you **must** regenerate and rebuild:

### Step 1: Regenerate OpenAPI specs

```shell
cargo make generate-openapi
```

This builds the services, dumps their OpenAPI YAML, merges them, and stores the result in `openapi/`.

### Step 2: Clean and rebuild golem-client

The `golem-client` crate auto-generates its code from the OpenAPI spec at build time via `build.rs`. After regenerating the specs:

```shell
cargo clean -p golem-client
cargo build -p golem-client
```

The clean step is a safeguard: the build script declares `rerun-if-changed` on the YAML file, but cargo may still reuse stale generated code.
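
For readers unfamiliar with the mechanism, `rerun-if-changed` is a directive a build script prints to standard output so that Cargo reruns the script when the named file changes. The sketch below shows only that directive; the spec path is a placeholder, and the actual contents of `golem-client/build.rs` are not reproduced here.

```rust
/// Formats the Cargo directive that makes a build script rerun when `path` changes.
fn rerun_if_changed(path: &str) -> String {
    format!("cargo:rerun-if-changed={path}")
}

fn main() {
    // Placeholder location; the real path is whatever golem-client/build.rs uses.
    let spec_path = "../openapi/spec.yaml";
    // Cargo reads directives from the build script's standard output.
    println!("{}", rerun_if_changed(spec_path));
    // ... code generation from the spec would follow here ...
}
```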

### Step 3: If new types are used in the client

Add type mappings in `golem-client/build.rs` to the `gen()` call's type replacement list. This maps OpenAPI schema names to existing Rust types from `golem-common` or `golem-wasm`.

### Step 4: Build and verify

```shell
cargo make build
```

Then run the appropriate tests:

- HTTP API tests: `cargo make api-tests-http`
- gRPC API tests: `cargo make api-tests-grpc`

## Checklist

1. Endpoint method added with `#[oai]` annotation
2. New API struct registered in `api/mod.rs` `Apis` tuple and `make_open_api_service` (if applicable)
3. Request/response types defined in `golem-common` with `poem_openapi::Object`
4. Type mappings added in `golem-client/build.rs` (if applicable)
5. `cargo make generate-openapi` run
6. `cargo clean -p golem-client && cargo build -p golem-client` run
7. `cargo make build` succeeds
8. `cargo make fix` run before PR
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
---
name: modifying-service-configs
description: "Modifying service configuration types or defaults. Use when changing config structs, adding config fields, or updating default values for any Golem service."
---

# Modifying Service Configs

Golem services use a configuration system built on [Figment](https://github.com/SergioBenitez/Figment) via a custom `ConfigLoader`. Configuration defaults are serialized to TOML and env-var reference files that are checked into the repository and validated in CI.

## How Configuration Works

Each service has a configuration struct that implements:
- `Default`: provides default values
- `Serialize` / `Deserialize`: for TOML and env-var serialization
- `SafeDisplay`: for logging without exposing secrets

Services load config by merging (in order): defaults → TOML file → environment variables.
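
The merge order means that a later source overrides an earlier one for the same key. The sketch below illustrates only that precedence rule with flat string maps; it does not use Figment or Golem's `ConfigLoader`, and the keys and values are made up for the example.

```rust
use std::collections::BTreeMap;

type Layer = BTreeMap<String, String>;

/// Merges layers in order; a key in a later layer overrides earlier ones.
fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Convenience constructor for a layer from string pairs.
fn layer(entries: &[(&str, &str)]) -> Layer {
    entries
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    // Hypothetical keys, in the order the sources are applied.
    let defaults = layer(&[("http_port", "8080"), ("log_level", "info")]);
    let toml_file = layer(&[("log_level", "debug")]);
    let env_vars = layer(&[("http_port", "9000")]);

    let config = merge_layers(&[defaults, toml_file, env_vars]);
    println!("{config:?}");
}
```

In this example `log_level` ends up with the value from the TOML layer and `http_port` with the value from the environment layer, while any key set only in the defaults would keep its default.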

## Service Config Locations

| Service | Config struct | File |
|---------|--------------|------|
| Worker Executor | `GolemConfig` | `golem-worker-executor/src/services/golem_config.rs` |
| Worker Service | `WorkerServiceConfig` | `golem-worker-service/src/config.rs` |
| Registry Service | `RegistryServiceConfig` | `golem-registry-service/src/config.rs` |
| Shard Manager | `ShardManagerConfig` | `golem-shard-manager/src/shard_manager_config.rs` |
| Compilation Service | `ServerConfig` | `golem-component-compilation-service/src/config.rs` |

The all-in-one `golem` binary has its own merged config that combines multiple service configs.

## Modifying a Config

### Step 1: Edit the config struct

Add, remove, or modify fields in the appropriate config struct. Update the `Default` implementation if default values change.

### Step 2: Regenerate config files

```shell
cargo make generate-configs
```

This builds the service binaries and runs them with `--dump-config-default-toml` and `--dump-config-default-env-var` flags, producing reference files that reflect the current `Default` implementation.

### Step 3: Verify

```shell
cargo make build
```

### Step 4: Check configs match

CI runs `cargo make check-configs` which regenerates configs and diffs them against committed files. If this fails, you forgot to run `cargo make generate-configs`.
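
Conceptually, the check compares freshly generated text with what is committed. The sketch below is not the implementation of the `check-configs` task, which is not shown here; `first_difference` is a hypothetical helper that reports the first line where two versions of a reference file disagree.

```rust
/// Returns the 1-based number and both versions of the first line that
/// differs, or `None` when the two texts are identical line by line.
fn first_difference<'a>(
    committed: &'a str,
    regenerated: &'a str,
) -> Option<(usize, &'a str, &'a str)> {
    let mut a = committed.lines();
    let mut b = regenerated.lines();
    let mut line_no = 0;
    loop {
        line_no += 1;
        match (a.next(), b.next()) {
            (None, None) => return None,
            (x, y) if x == y => continue,
            // A missing line on either side is reported as an empty string.
            (x, y) => return Some((line_no, x.unwrap_or(""), y.unwrap_or(""))),
        }
    }
}

fn main() {
    // Made-up reference file contents for illustration.
    let committed = "http_port = 8080\nlog_level = \"info\"\n";
    let regenerated = "http_port = 8080\nlog_level = \"debug\"\n";
    match first_difference(committed, regenerated) {
        None => println!("configs match"),
        Some((line, old, new)) => {
            println!("line {line} differs: committed `{old}`, regenerated `{new}`")
        }
    }
}
```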

## Adding a New Config Field

1. Add the field to the config struct with a `serde` attribute if needed
2. Set its default value in the `Default` impl
3. Run `cargo make generate-configs` to update reference files
4. If the field requires a new environment variable, the env-var mapping is derived automatically from the field path
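
Deriving a variable name from a field path typically means upper-casing the path segments and joining them with a separator. The sketch below shows that idea only. The `GOLEM` prefix and the `__` separator used in `main` are assumptions made for illustration; the generated env-var reference files are the authoritative source for the real names.

```rust
/// Builds an environment variable name from a config field path.
/// Prefix and separator are parameters because the exact convention
/// is an assumption here, not something this document states.
fn env_var_name(prefix: &str, separator: &str, field_path: &[&str]) -> String {
    let mut parts = vec![prefix.to_uppercase()];
    parts.extend(field_path.iter().map(|segment| segment.to_uppercase()));
    parts.join(separator)
}

fn main() {
    // Hypothetical nested field `limits.max_connections`.
    println!(
        "{}",
        env_var_name("GOLEM", "__", &["limits", "max_connections"])
    );
}
```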

## Removing a Config Field

1. Remove the field from the struct and `Default` impl
2. Run `cargo make generate-configs`
3. Check for any code that references the removed field

## Nested Config Types

Many config structs compose sub-configs (e.g., `GolemConfig` contains `WorkersServiceConfig`, `BlobStoreServiceConfig`, etc.). When modifying a sub-config type that's shared across services, regenerate configs for all affected services; `cargo make generate-configs` handles this automatically.

## Checklist

1. Config struct modified with appropriate `serde` attributes
2. `Default` implementation updated
3. `cargo make generate-configs` run
4. Generated TOML and env-var files committed
5. `cargo make build` succeeds
6. `cargo make check-configs` passes (CI validation)
7. `cargo make fix` run before PR
