Engine Build, Run, and Test Guide

This guide covers:

how to build and start tandem-engine
how to run automated tests
how to run the end-to-end smoke/runtime proof flow on Windows, macOS, and Linux

Windows quickstart (engine + tauri dev)

From tandem/:

pnpm install
pnpm engine:stop:windows
cargo build -p tandem-ai
New-Item -ItemType Directory -Force -Path .\src-tauri\binaries | Out-Null
Copy-Item .\target\debug\tandem-engine.exe .\src-tauri\binaries\tandem-engine.exe -Force
pnpm tauri dev

macOS/Linux quickstart (engine + tauri dev)

From tandem/:

pnpm install
# Kill any existing engine instance
pkill tandem-engine || true
cargo build -p tandem-ai
mkdir -p src-tauri/binaries
cp target/debug/tandem-engine src-tauri/binaries/tandem-engine
pnpm tauri dev

PowerShell equivalent (Windows):

pnpm install
Get-Process | Where-Object { $_.ProcessName -in @('tandem-engine','tandem') } | Stop-Process -Force -ErrorAction SilentlyContinue
cargo build -p tandem-ai
New-Item -ItemType Directory -Force -Path .\src-tauri\binaries | Out-Null
Copy-Item .\target\debug\tandem-engine.exe .\src-tauri\binaries\tandem-engine.exe -Force
pnpm tauri dev

Quick commands

From tandem/:

cargo build -p tandem-ai
cargo run -p tandem-ai -- serve --host 127.0.0.1 --port 39731
cargo test -p tandem-server -p tandem-core -p tandem-ai

Testing with packages/tandem-control-panel

cargo build -p tandem-ai --profile fast-release
sudo install -m 755 target/fast-release/tandem-engine /usr/local/bin/tandem-engine
sudo systemctl restart tandem-engine

Control panel testing locally

cd packages/tandem-control-panel && pnpm build && sudo systemctl restart tandem-control-panel.service

API Token Security Validation

Verify token-gated API behavior:

cargo run -p tandem-ai -- serve --host 127.0.0.1 --port 39731 --state-dir .tandem --api-token tk_test_token

In a second terminal:

# Health remains public
curl -s http://127.0.0.1:39731/global/health | jq .

# Non-health endpoints require token
curl -i -s http://127.0.0.1:39731/config/providers
curl -s http://127.0.0.1:39731/config/providers -H "X-Tandem-Token: tk_test_token" | jq .

Desktop/TUI token checks:

Desktop Settings now shows Engine API Token masked by default with Reveal and Copy.
TUI supports /engine token for masked output and /engine token show for full output plus storage backend/path.
Storage backend is reported as keychain, file, env, or memory.

OS-Aware Runtime Validation

Validate that engine-detected environment is surfaced to every client path:

curl -s http://127.0.0.1:39731/global/health | jq .environment

Expected fields:

os
shell_family
path_style
arch

Optional: toggle environment prompt block:

TANDEM_OS_AWARE_PROMPTS=0 cargo run -p tandem-ai -- serve --host 127.0.0.1 --port 39731

On Windows, validate shell guardrails:

Unix-only commands that cannot be safely translated are blocked with metadata.os_guardrail_applied=true and guardrail_reason.
Translatable commands (for example ls -la, find ... -type f -name ...) include metadata.translated_command.

CLI flags

serve supports:

--host or --hostname (same option)
--port
--state-dir

State directory resolution order:

--state-dir
TANDEM_STATE_DIR
canonical shared storage data dir (.../tandem/data)

Tool testing (CLI)

Use the tool subcommand to invoke built-in tools directly with JSON input. webfetch is especially useful because it converts noisy HTML into clean Markdown, extracts links + metadata, and reports size reductions. It should work against any public HTTP/HTTPS webpage (subject to site limits, auth, or anti-bot protections). Use webfetch_html when raw HTML is explicitly required.

The Markdown output is returned inline on stdout as JSON in the output field:

output.markdown holds the Markdown
output.text holds the plain-text fallback
output.stats includes raw vs Markdown size

Size savings example (proven from the Frumu.ai run above):

raw chars: 36,141
markdown chars: 7,292
reduction: 79.82%
bytes in: 36,188
bytes out: 7,292

Windows (PowerShell)

@'
{"tool":"webfetch","args":{"url":"https://frumu.ai","return":"both","mode":"auto"}}
'@ | cargo run -p tandem-ai -- tool --json -

@'
{"tool":"mcp_debug","args":{"url":"https://mcp.exa.ai/mcp","tool":"web_search_exa","args":{"query":"tandem engine","numResults":1}}}
'@ | cargo run -p tandem-ai -- tool --json -

macOS/Linux (bash)

cat << 'JSON' | cargo run -p tandem-ai -- tool --json -
{"tool":"webfetch","args":{"url":"https://frumu.ai","return":"both","mode":"auto"}}
JSON

Automated test layers

1) Rust unit/integration tests (fast, CI-friendly)

Run:

cargo test -p tandem-server -p tandem-core -p tandem-ai

Coverage includes route shape/contracts like:

/global/health
/provider
/api/session alias behavior
/mission create/list/get/apply-event
/routines create/list/patch/delete/run-now/history/events
/session/{id}/message
/session/{id}/run
/session/{id}/run/{run_id}/cancel
SSE message.part.updated
prompt_async?return=run (202 with runID + attach stream)
same-session conflict (409 with nested activeRun)
permission approve/deny compatibility routes

Mission/routine policy-focused tests:

cargo test -p tandem-server mission_ -- --nocapture
cargo test -p tandem-server routines_ -- --nocapture
cargo test -p tandem-server routine_policy_ -- --nocapture
cargo test -p tandem-server routines_run_now_ -- --nocapture

Contract-focused tests (JSON-first orchestrator parsing):

cargo test -p tandem test_parse_task_list_strict -- --nocapture
cargo test -p tandem test_parse_validation_result_strict_rejects_prose -- --nocapture

Desktop/CLI runtime contract closure tests:

# Desktop sidecar tests (includes reconnect recovery + conflict parsing + run-id cancel)
cargo test -p tandem sidecar::tests::recover_active_run_attach_stream_uses_get_run_endpoint -- --nocapture
cargo test -p tandem sidecar::tests::test_parse_prompt_async_response_409_includes_retry_and_attach -- --nocapture
cargo test -p tandem sidecar::tests::cancel_run_by_id_posts_expected_endpoint -- --nocapture
cargo test -p tandem sidecar::tests::mission_list_reads_engine_missions_endpoint -- --nocapture
cargo test -p tandem sidecar::tests::mission_get_reads_engine_mission_endpoint -- --nocapture
cargo test -p tandem sidecar::tests::mission_create_posts_to_engine_mission_endpoint -- --nocapture
cargo test -p tandem sidecar::tests::mission_apply_event_posts_event_payload -- --nocapture

# CLI (tandem-tui) run-id cancel client path
cargo test -p tandem-tui cancel_run_by_id_posts_expected_endpoint -- --nocapture

Example MCP auth challenge shape:

Authorization is required before I can continue with this action.

Tool mcp.arcade.jira_getboards result: Authorization required for mcp.arcade.jira_getboards.
Authorize here: https://example.com/oauth/start

Use a sanitized example in docs. Do not paste live OAuth challenge payloads with real redirect URIs, scopes, or state values into committed documentation. MCP-focused regression checks:

# Runtime MCP auth challenge extraction + schema normalization coverage
cargo test -p tandem-runtime mcp::tests::extract_auth_challenge_from_result_payload -- --nocapture
cargo test -p tandem-runtime mcp::tests::normalize_mcp_tool_args_maps_clickup_aliases -- --nocapture

# Full runtime MCP test module
cargo test -p tandem-runtime mcp::tests -- --nocapture

Manual MCP smoke checks:

Connect MCP server and verify tools are present in /tool.
Trigger an auth-gated tool and verify mcp.auth.required is emitted.
Complete auth and retry tool call (no engine restart required).
Force refresh/disconnect failure and verify stale MCP tools are not left active.
In web quickstart, verify run failures are rendered (no blank chat state on failure).

TUI mission quick-action commands (manual runtime validation):

/missions
/mission_create Mission Demo :: Validate quick actions :: Initial task
/mission_get <mission_id>
/mission_start <mission_id>
/mission_review_ok <mission_id> <work_item_id>
/mission_test_ok <mission_id> <work_item_id>
/mission_review_no <mission_id> <work_item_id> needs_revision

2) Engine smoke/runtime proof (process + HTTP + SSE + memory)

This is the automated version of the manual proof steps and writes artifacts to runtime-proof/.

Windows (PowerShell)

./scripts/engine_smoke.ps1

Optional args:

./scripts/engine_smoke.ps1 -HostName 127.0.0.1 -Port 39731 -StateDir .tandem-smoke -OutDir runtime-proof

macOS/Linux (bash)

Prerequisites:

jq
curl
ps
pkill

Run:

bash ./scripts/engine_smoke.sh

Optional env vars:

HOSTNAME=127.0.0.1 PORT=39731 STATE_DIR=.tandem-smoke OUT_DIR=runtime-proof bash ./scripts/engine_smoke.sh

What smoke scripts validate

engine starts and becomes healthy
session create/list endpoints
session message list endpoint has entries
provider catalog endpoint
SSE stream emits message.part.updated
idle memory sample after 60s
peak memory during tool-using prompt (with permission reply)
cleanup leaves no rogue tandem-engine process

Starting engine manually

Windows

cargo run -p tandem-ai -- serve --host 127.0.0.1 --port 39731 --state-dir .tandem

Running with `pnpm tauri dev`

Tauri dev must be able to find the tandem-engine sidecar binary in a dev lookup path. Use the binary built in target/ and copy it into src-tauri/binaries/.

Important: the filename is the same (tandem-engine or tandem-engine.exe), but the directories are different.

Shared Engine Mode (Desktop + TUI together)

Desktop and TUI now use shared engine mode by default:

default engine port: 127.0.0.1:39731
clients attach to an already-running engine when available
closing one client detaches instead of force-stopping the shared engine
set TANDEM_ENGINE_PORT to override the shared default for both desktop and TUI

Disable shared mode (legacy single-client behavior) by setting:

$env:TANDEM_SHARED_ENGINE_MODE="0"

If the app is stuck on Connecting... or fails to load, do a clean dev restart:

pnpm tauri:dev:clean

Manual equivalent:

Get-Process | Where-Object { $_.ProcessName -in @('tandem','tandem-engine','node') } | Stop-Process -Force -ErrorAction SilentlyContinue
pnpm tauri dev

JSON-first orchestrator feature flag

Strict planner/validator contract mode can be forced via env:

Windows (PowerShell)

$env:TANDEM_ORCH_STRICT_CONTRACT="1"
pnpm tauri dev

macOS/Linux (bash)

TANDEM_ORCH_STRICT_CONTRACT=1 pnpm tauri dev

Behavior in strict mode:

planner uses strict JSON parse first
validator uses strict JSON parse first
prose fallback is still allowed by default (allow_prose_fallback=true) during phase 1
contract degradation/failures emit contract_warning / contract_error orchestrator events

macOS/Linux (bash)

From tandem/:

cargo build -p tandem-ai
mkdir -p ./src-tauri/binaries
cp ./target/debug/tandem-engine ./src-tauri/binaries/tandem-engine
chmod +x ./src-tauri/binaries/tandem-engine
pnpm tauri dev

macOS/Linux

cargo run -p tandem-ai -- serve --host 127.0.0.1 --port 39731 --state-dir .tandem

Build commands by OS

Windows

cargo build -p tandem-ai

macOS/Linux

cargo build -p tandem-ai

Rogue process cleanup

Windows

Get-Process | Where-Object { $_.ProcessName -like 'tandem-engine*' } | Stop-Process -Force

macOS/Linux

pkill -f tandem-engine || true

Troubleshooting

Access is denied (os error 5) on Windows build usually means tandem-engine.exe is still running and locked by the OS loader.
Stop rogue engine processes, then rebuild.
If bind fails, verify no process is already listening on your port.
If startup log shows Another instance tried to launch, a previous app instance is still running. Close/kill all tandem processes and relaunch.
For writable state/config, use --state-dir with a project-local directory.

Windows copy/paste recovery for os error 5:

Get-Process | Where-Object { $_.ProcessName -in @('tandem-engine','tandem') } | Stop-Process -Force -ErrorAction SilentlyContinue
cargo build -p tandem-ai
New-Item -ItemType Directory -Force -Path .\src-tauri\binaries | Out-Null
Copy-Item .\target\debug\tandem-engine.exe .\src-tauri\binaries\tandem-engine.exe -Force

Note: mkdir -p and cp target/debug/tandem-engine ... are macOS/Linux commands. On Windows, use New-Item and Copy-Item with the .exe filename.

Storage migration verification (legacy upgrades)

When upgrading from builds that used ai.frumu.tandem, verify canonical migration:

Ensure %APPDATA%/ai.frumu.tandem exists with legacy data.
Ensure %APPDATA%/tandem is empty or absent.
Start Tandem app or engine once.
Verify %APPDATA%/tandem/storage_version.json and %APPDATA%/tandem/migration_report.json exist.
Verify sessions/tool history are visible without manual copying.
Verify %APPDATA%/ai.frumu.tandem remains intact (copy + keep legacy).

Startup migration wizard verification (blocking progress UX)

Seed legacy data under either:
%APPDATA%/ai.frumu.tandem
%APPDATA%/opencode
%USERPROFILE%/.local/share/opencode/storage
Launch Tandem and unlock vault.
Verify full-screen migration overlay appears and blocks interaction until completion.
Verify progress updates through phases:
scanning -> copying -> rehydrating -> finalizing.
On success/partial, verify summary card shows repaired session/message counts.
Click Continue and verify chat history loads for migrated sessions.

Settings migration rerun verification

Open Settings -> Data Migration.
Run Dry Run and verify result status reports dry_run.
Run Run Migration Again and verify counters update.
Verify migration_report.json timestamp updates and report path is shown.

Workspace namespace migration verification (`.opencode` -> `.tandem`)

For an existing workspace that contains legacy metadata:

Ensure <workspace>/.opencode/plans and/or <workspace>/.opencode/skill exists.
Start Tandem and set/switch active workspace to that folder.
Verify <workspace>/.tandem/plans and <workspace>/.tandem/skill are created.
Verify plan list and skills list still include legacy entries.
Create a new plan and install/import a skill; verify new files are written under .tandem/*.
Confirm legacy .opencode/* remains untouched (read-compatible window).

Workspace-scoped session history checks

With multiple projects configured, switch active folder in the project switcher.
Verify sidebar session list shows only sessions belonging to that folder.
Create a new chat in the active folder; verify it appears immediately in sidebar.
Verify sessions with legacy directory = "." still appear under current workspace.

Observability verification (JSONL + correlation)

After launching desktop (pnpm tauri dev) and sending one prompt:

Open %APPDATA%\\tandem\\logs and verify files exist:
- tandem.desktop.YYYY-MM-DD.jsonl
- tandem.engine.YYYY-MM-DD.jsonl
Search for provider.call.start in engine JSONL.
Search for chat.dispatch.start in desktop JSONL.
Verify a matching correlation_id exists across desktop dispatch and engine provider events.
If stream fails, verify one of:
- stream.subscribe.error
- stream.disconnected
- stream.watchdog.no_events

PowerShell helpers:

Select-String -Path "$env:APPDATA\tandem\logs\tandem.desktop.*.jsonl" -Pattern "chat.dispatch.start"
Select-String -Path "$env:APPDATA\tandem\logs\tandem.engine.*.jsonl" -Pattern "provider.call.start"
Select-String -Path "$env:APPDATA\tandem\logs\tandem.desktop.*.jsonl" -Pattern "stream.subscribe.error|stream.disconnected|stream.watchdog.no_events"

Uh oh!

FilesExpand file tree

ENGINE_TESTING.md

Latest commit

History

ENGINE_TESTING.md

File metadata and controls

Engine Build, Run, and Test Guide

Windows quickstart (engine + tauri dev)

macOS/Linux quickstart (engine + tauri dev)

Quick commands

Testing with packages/tandem-control-panel

Control panel testing locally

API Token Security Validation

OS-Aware Runtime Validation

CLI flags

Tool testing (CLI)

Windows (PowerShell)

macOS/Linux (bash)

Automated test layers

1) Rust unit/integration tests (fast, CI-friendly)

2) Engine smoke/runtime proof (process + HTTP + SSE + memory)

Windows (PowerShell)

macOS/Linux (bash)

What smoke scripts validate

Starting engine manually

Windows

Running with pnpm tauri dev

Shared Engine Mode (Desktop + TUI together)

JSON-first orchestrator feature flag

Windows (PowerShell)

macOS/Linux (bash)

macOS/Linux (bash)

macOS/Linux

Build commands by OS

Windows

macOS/Linux

Rogue process cleanup

Windows

macOS/Linux

Troubleshooting

Storage migration verification (legacy upgrades)

Startup migration wizard verification (blocking progress UX)

Settings migration rerun verification

Workspace namespace migration verification (.opencode -> .tandem)

Workspace-scoped session history checks

Observability verification (JSONL + correlation)

Running with `pnpm tauri dev`

Workspace namespace migration verification (`.opencode` -> `.tandem`)