Mng/run tmr by qi-imbue · Pull Request #910 · imbue-ai/mng

qi-imbue · 2026-03-18T19:47:36Z

No description provided.

Introduce skitwright, a lightweight end-to-end testing framework for CLI applications (a nod to Playwright). It provides:

- Session: runs shell commands and records a text transcript
- CommandResult: structured result with exit code, stdout, stderr
- expect(): fluent assertion API for results and strings
- Transcript: annotated text recording of all commands and outputs

Add 10 basic e2e tests for the mng CLI that exercise it exclusively through its CLI interface (no library imports from mng):

- Help output (mng --help, mng create --help)
- List with no agents (table and JSON formats)
- Create + list (verifies agent appears in list)
- Create with JSON output format
- Create in headless mode
- Create + destroy lifecycle
- Create + rename
- Create with labels (verified via JSON list output)

The e2e test fixture provides full isolation: separate MNG_HOST_DIR, MNG_PREFIX, MNG_ROOT_NAME, TMUX_TMPDIR, and disabled remote providers. Each test saves a transcript file for debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
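The Session / CommandResult / expect() trio described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not skitwright's actual API: it assumes commands run through the shell via `subprocess.run`, the transcript is a plain list of strings, and the `Expectation` class with its `to_succeed()` / `stdout_contains()` methods is a hypothetical stand-in for the fluent assertion API.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommandResult:
    """Structured result of one shell command (illustrative fields)."""

    command: str
    exit_code: int
    stdout: str
    stderr: str


@dataclass(frozen=True)
class Expectation:
    """Hypothetical fluent assertions; each method returns self for chaining."""

    result: CommandResult

    def to_succeed(self) -> "Expectation":
        assert self.result.exit_code == 0, f"{self.result.command!r} exited {self.result.exit_code}"
        return self

    def stdout_contains(self, text: str) -> "Expectation":
        assert text in self.result.stdout, f"{text!r} not found in stdout"
        return self


def expect(result: CommandResult) -> Expectation:
    return Expectation(result)


@dataclass
class Session:
    """Runs shell commands and records an annotated text transcript (assumed format)."""

    transcript: list[str] = field(default_factory=list)

    def run(self, command: str, timeout: float = 60.0) -> CommandResult:
        completed = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        result = CommandResult(command, completed.returncode, completed.stdout, completed.stderr)
        # Record the command, its output streams, and the exit code for later debugging.
        self.transcript.append(f"$ {command}")
        self.transcript.extend(f"  {line}" for line in result.stdout.splitlines())
        self.transcript.extend(f"! {line}" for line in result.stderr.splitlines())
        self.transcript.append(f"# exit code: {result.exit_code}")
        return result
```

A test would then read as `expect(session.run("mng --help")).to_succeed().stdout_contains("create")`, with the transcript available for writing to a file afterwards.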

- Replace uuid4().hex[:8] with get_short_random_string()
- Replace relative import with absolute import
- Convert MngRunner from class with __init__ to NamedTuple
- Replace subprocess.run with skitwright run_command for tmux cleanup
- Add @pytest.mark.release to all e2e tests (subprocess-based, no coverage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add test_ratchets.py for skitwright (required by meta ratchet)
- Add pytest config to skitwright pyproject.toml
- Replace MngRunner class with lambda-based mng fixture (avoids __init__, NamedTuple, dataclass, and inline function ratchets)
- Update MngRunFn type alias and test_basic.py to use function style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ix markers

- Add unit tests for skitwright (expect, transcript, session): 33 new tests, 100% coverage
- Factor repetitive agent creation into create_agent fixture in e2e conftest
- Change e2e test markers from @pytest.mark.release to @pytest.mark.acceptance so they run in CI on every PR
- Fix incorrect docstring claiming "no library imports from mng"
- Document skitwright as test-only dependency in mng pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The message command's CEL filter builder discarded the host/provider part of agent addresses (e.g. agent@host.modal). Now host_name and provider_name from parsed addresses are incorporated into the CEL filter, and the CEL context includes host.name for matching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
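One way to picture the fix: parse each address into agent, host, and provider parts and emit one conjunction per address. This is a hedged sketch only. The `agent@host.provider` split rule (first `.` separates host from provider), the CEL field names `name` and `host.provider`, and the helper names are assumptions; only `host.name` comes from the commit message.

```python
from typing import NamedTuple, Optional


class AgentAddress(NamedTuple):
    agent_name: str
    host_name: Optional[str]
    provider_name: Optional[str]


def parse_agent_address(address: str) -> AgentAddress:
    """Split 'agent@host.provider' (host and provider optional) into parts (assumed grammar)."""
    agent_name, at, host_part = address.partition("@")
    if not at:
        return AgentAddress(agent_name, None, None)
    host_name, dot, provider_name = host_part.partition(".")
    return AgentAddress(agent_name, host_name or None, provider_name if dot else None)


def _cel_string(value: str) -> str:
    # Quote as a CEL string literal, escaping backslashes and double quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_cel_filter(addresses: list[str]) -> str:
    """OR together one clause per address; host/provider parts are no longer dropped."""
    clauses = []
    for address in addresses:
        parsed = parse_agent_address(address)
        terms = [f"name == {_cel_string(parsed.agent_name)}"]
        if parsed.host_name is not None:
            terms.append(f"host.name == {_cel_string(parsed.host_name)}")
        if parsed.provider_name is not None:
            terms.append(f"host.provider == {_cel_string(parsed.provider_name)}")
        clauses.append("(" + " && ".join(terms) + ")")
    return " || ".join(clauses)
```

With this shape, `agent@host.modal` only matches an agent on that specific host, rather than any agent named `agent`.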

In _partition_destroy_targets, the loop over online host agents silently skipped matched agents that were no longer present. Now raises AgentNotFoundError if any matched agent ID is not found in get_agents(). Also removes the redundant seen_hosts set (dict keys are already unique). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
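A minimal sketch of the stricter behaviour, assuming matched agents arrive as an `agent_id -> host_id` mapping and each online host reports its current agent IDs. The function name and data shapes are hypothetical; the point is that a vanished agent now raises instead of being skipped, and the result dict's keys already deduplicate hosts.

```python
class AgentNotFoundError(Exception):
    """Stand-in for mng's error type of the same name."""


def partition_by_host(
    matched: dict[str, str], agents_by_host: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Group matched agent IDs by host, failing loudly if any matched agent is gone."""
    targets: dict[str, list[str]] = {}
    for agent_id, host_id in matched.items():
        if agent_id not in agents_by_host.get(host_id, []):
            raise AgentNotFoundError(f"Agent {agent_id} is no longer present on host {host_id}")
        # Dict keys are unique, so no separate seen-hosts set is needed.
        targets.setdefault(host_id, []).append(agent_id)
    return targets
```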

Fixes test_every_project_has_pypi_readme CI failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…cleanup

- Remove succeeded/failed computed properties from CommandResult; use exit_code checks directly (per user feedback)
- Replace subprocess.run with Popen+threads for real-time interleaved stdout/stderr capture in the transcript (line-buffered)
- Add OutputSource enum and OutputLine data type for interleaved output
- Remove uv run prefix from mng commands (already in PATH via uv run pytest)
- Add mng destroy --all --force cleanup in e2e fixture teardown
- Add MNG_E2E_KEEP_ON_FAILURE env var to keep agents running on failure
- Print transcript path to stderr on test failure
- Update README, ratchets, and tests accordingly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
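The Popen+threads approach can be sketched like this: one reader thread per pipe appends tagged lines to a shared list as they arrive. `OutputSource` and `OutputLine` mirror the names in the commit, but their fields and the `run_interleaved` helper are assumptions. Ordering is preserved within each stream; the relative order across stdout and stderr depends on when the child flushes.

```python
import subprocess
import threading
from dataclasses import dataclass
from enum import Enum
from typing import IO


class OutputSource(Enum):
    STDOUT = "stdout"
    STDERR = "stderr"


@dataclass(frozen=True)
class OutputLine:
    source: OutputSource
    text: str


def _pump(stream: IO[str], source: OutputSource, lines: list[OutputLine], lock: threading.Lock) -> None:
    # Iterate line by line so each line is recorded as soon as the child emits it.
    for raw in stream:
        with lock:
            lines.append(OutputLine(source, raw.rstrip("\n")))
    stream.close()


def run_interleaved(command: str) -> tuple[int, list[OutputLine]]:
    """Run a shell command, capturing stdout/stderr lines in arrival order."""
    lines: list[OutputLine] = []
    lock = threading.Lock()
    proc = subprocess.Popen(
        command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        text=True, bufsize=1,  # line-buffered on the reading side
    )
    assert proc.stdout is not None and proc.stderr is not None
    readers = [
        threading.Thread(target=_pump, args=(proc.stdout, OutputSource.STDOUT, lines, lock)),
        threading.Thread(target=_pump, args=(proc.stderr, OutputSource.STDERR, lines, lock)),
    ]
    for reader in readers:
        reader.start()
    exit_code = proc.wait()
    for reader in readers:
        reader.join()
    return exit_code, lines
```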

Revert message.py, message_test.py, destroy.py changes that belong to another branch. Fix timeout test failure in CI by using process_group=0 and os.killpg() to kill the entire process tree (not just the shell). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add --cov=imbue.skitwright to root pyproject.toml addopts so coverage is tracked in monorepo-level CI runs. Also add proc.wait() after os.killpg() in timeout path to deterministically reap the process. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The emit tests previously had no assertions (only verified functions don't crash). Now they verify actual output: human format checks the value appears in stdout, JSONL checks the parsed event structure. Removed JSON-format variants since emit_event is silent in JSON mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The timeout code path was discarding all pre-timeout stderr output, replacing it with just the timeout message. Now reconstructs real stderr from captured lines and appends the timeout message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JSON mode emit tests were removed in the previous commit since emit_event is silent in JSON mode. Restored them with assertions that stdout is empty, verifying the code path executes without crashing and produces no output as expected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-- splitting

- Add --provider option to select which provider to launch agents on (e.g. docker, modal). All code that previously hardcoded LOCAL_PROVIDER_NAME now uses the configurable provider. Each agent tracks its own host reference to support providers that create a separate host per agent.
- Add --env option to pass environment variables to agents (KEY=VALUE, repeatable). Uses the same resolve_env_vars utility as mng create.
- Add --label option to attach labels to all launched agents (KEY=VALUE, repeatable). Labels are applied to both test agents and integrator agents.
- Add --prompt-suffix option to append custom text to the agent prompt.
- Replace _split_pytest_args with _TmrCommand (same pattern as _CreateCommand in mng create) for robust -- separator handling at the Click parse level. Test collection args go before --, testing flags go after --.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
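Handling `--` "at the Click parse level" can be done by overriding `click.Command.parse_args`, which sees the raw argument list before Click interprets it. The sketch below assumes that pattern; `SeparatorCommand`, the `passthrough_args` meta key, and the toy `tmr` command are hypothetical and not the actual `_TmrCommand`.

```python
import click


class SeparatorCommand(click.Command):
    """Splits argv at the first '--' before Click parses options (hypothetical class)."""

    def parse_args(self, ctx: click.Context, args: list[str]) -> list[str]:
        if "--" in args:
            split_at = args.index("--")
            # Everything after '--' is forwarded verbatim (e.g. pytest flags).
            ctx.meta["passthrough_args"] = args[split_at + 1 :]
            args = args[:split_at]
        else:
            ctx.meta["passthrough_args"] = []
        return super().parse_args(ctx, args)


@click.command(cls=SeparatorCommand)
@click.option("--provider", default="local")
@click.argument("collection_args", nargs=-1)
def tmr(provider: str, collection_args: tuple[str, ...]) -> None:
    ctx = click.get_current_context()
    click.echo(f"provider={provider}")
    click.echo(f"collect={list(collection_args)}")
    click.echo(f"flags={ctx.meta['passthrough_args']}")
```

Because the split happens before option parsing, flags such as `-k create` after `--` are never mistaken for unknown options of the command itself.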

Script parses a tutorial shell script into command blocks and matches them against pytest functions by checking docstrings. Reports unmatched blocks (needing tests) and unmatched tests (needing blocks). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add tutorial_matcher_test.py with 12 unit tests covering all functions
- Fix redundant file reads in find_pytest_functions (read once, reuse)
- Warn on stderr when skipping files with syntax errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… setup

- Move label KEY=VALUE parsing from tmr cli.py and create.py into a shared resolve_labels() function in env_utils.py (alongside resolve_env_vars). Both create.py and tmr cli.py now call resolve_labels().
- Extract _invoke_tmr_command() helper to eliminate duplicated Click command setup boilerplate across 5 _TmrCommand tests.
- Rename test_cli_help_contains_new_options to test_cli_help_contains_provider_env_label_options for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
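A shared `resolve_labels()` might look like the following. The signature (a tuple of raw `KEY=VALUE` strings, as Click delivers for a repeatable option) and the error type are assumptions; splitting on the first `=` lets values themselves contain `=`.

```python
def resolve_labels(raw_labels: tuple[str, ...]) -> dict[str, str]:
    """Parse repeatable KEY=VALUE options into a dict (assumed signature)."""
    labels: dict[str, str] = {}
    for raw in raw_labels:
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = raw.partition("=")
        if not sep or not key:
            raise ValueError(f"Invalid label {raw!r}; expected KEY=VALUE")
        labels[key] = value
    return labels
```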

When a remote agent's host becomes unreachable (e.g. Modal sandbox terminated), operations like reading results, stopping agents, and pulling branches would crash the entire coordinator. Now:

- read_agent_result catches HostError and returns REMOTE_AGENT_ERROR with a descriptive summary instead of crashing.
- _stop_agent_on_host catches HostError (it only caught MngError before, but HostError extends BaseMngError, not MngError).
- pull_agent_branch catches HostError for the same reason.
- New REMOTE_AGENT_ERROR outcome is added to TestOutcome with its own color (purple) in the HTML report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
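The hierarchy issue is easy to miss, so here is a small sketch of why `except MngError` never saw `HostError`, and how a result reader can map it to an outcome instead of crashing. That `MngError` also derives from `BaseMngError`, the other `TestOutcome` member, and the `read_agent_result` signature (taking a fetch callable) are assumptions for illustration.

```python
from enum import Enum
from typing import Callable


class BaseMngError(Exception):
    """Root of the hierarchy."""


class MngError(BaseMngError):
    """Assumed sibling of HostError under BaseMngError."""


class HostError(BaseMngError):
    """Not a subclass of MngError, so `except MngError` lets it through."""


class TestOutcome(Enum):
    PASSED = "passed"  # hypothetical member, for illustration
    REMOTE_AGENT_ERROR = "remote_agent_error"


def read_agent_result(fetch_result_file: Callable[[], str]) -> tuple[TestOutcome, str]:
    """Return an outcome plus summary; an unreachable host becomes REMOTE_AGENT_ERROR."""
    try:
        return TestOutcome.PASSED, fetch_result_file()
    except HostError as e:
        # Name the failing stage so the report says exactly what went wrong.
        return TestOutcome.REMOTE_AGENT_ERROR, f"Failed while fetching result file: {e}"
```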

For remote agents (modal/docker), branches don't exist locally. To pull changes from agents with FIX_*_SUCCEEDED outcomes:

- Save the base commit hash at the start of the run.
- Before pulling, create a local branch from the base commit.
- Then pull_git fetches the remote agent's changes into that branch.

Add --integrator-provider option (defaults to "local") so the integrator agent runs locally. This makes sense because there is only one integrator and it needs access to the local branches just pulled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The send_message call can fail with a raw TimeoutError from pyinfra (SSH command timeout) in addition to SendMessageError. Broaden the catch to handle TimeoutError and HostError as well. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix: Only pass base_commit to gather_results when using a remote provider. For the local provider, agents use WORKTREE mode so their branches already exist locally -- calling git branch would fail with 'already exists'.

Fix: Catch ProcessError in pull_agent_branch so git command failures (from _create_local_branch or other git operations) are handled gracefully instead of crashing the coordinator.

Fix: Make _create_local_branch tolerate pre-existing branches by using is_checked_after=False and falling back to reuse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --integrator-provider is set to a remote provider, the integrator's branch doesn't exist locally. Pass base_commit through to _run_integrator_phase so pull_agent_branch can create the local branch before pulling, matching the pattern used for test agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --snapshot is provided, all agents are launched from that snapshot directly, skipping the --use-snapshot build-and-snapshot flow. This is useful when a snapshot was created in a previous run and can be reused. When both --snapshot and --use-snapshot are provided, --snapshot takes precedence (no need to build a new snapshot). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When recording REMOTE_AGENT_ERROR, the summary now says exactly which stage failed (fetching result file, pulling branch, confirming message delivery). Connection failures during agent stop are already ignored since stopping is purely cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The list_agents call in poll_until_all_done can fail with connection errors (e.g. Docker daemon temporarily unavailable, network blip). Catch MngError, HostError, ConcurrencyGroupError, and OSError, log a warning, and retry on the next polling cycle instead of crashing. Bump the time_sleep ratchet count from 2 to 3 for the new sleep in the error retry path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
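The retry-on-transient-error loop can be sketched as below. To stay self-contained, the sketch takes the transient exception types as a parameter (in mng these would be MngError, HostError, ConcurrencyGroupError, and OSError) and injects `sleep` so tests need not wait. The callable-based signature and the state strings are assumptions, not the real `poll_until_all_done`.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def poll_until_all_done(
    list_states: Callable[[], dict[str, str]],
    is_done: Callable[[str], bool],
    transient_errors: tuple[type[BaseException], ...],
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Poll agent states until all are done, tolerating transient listing failures."""
    while True:
        try:
            states = list_states()
        except transient_errors as e:
            # A blip (e.g. Docker daemon briefly unavailable) should not crash the coordinator.
            logger.warning("Listing agents failed, retrying next cycle: %s", e)
            sleep(poll_interval)  # avoid a tight retry loop
            continue
        pending = [name for name, state in states.items() if not is_done(state)]
        if not pending:
            return states
        logger.info("Waiting on agents: %s", ", ".join(pending))
        sleep(poll_interval)
```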

Apply the same transient error handling to the integrator polling loop as poll_until_all_done: catch MngError/HostError/ConcurrencyGroupError/OSError, log a warning, and retry on the next cycle. Also rename unused snapshot_name to _snapshot_name in cli.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Default HTML report path is now tmr_reports/tmr-report-<timestamp>.html instead of the current directory. The directory is created automatically by generate_html_report (which already calls mkdir). Added tmr_reports/ to .gitignore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the `mng` and `create_agent` fixture wrappers so that each test shows the exact CLI command being run as a plain string via `e2e.run()`. Replace `create_agent` with a simple `agent_name` fixture that provides a unique name without hiding the command construction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update the --output-html help to reflect the new default path (tmr_reports/tmr-report-<timestamp>.html). Fix duplicate Step 4 comment by renumbering steps sequentially (1-10). The time_sleep ratchet increase (2 -> 4) in earlier commits is justified: both new sleeps are in polling error retry paths to prevent tight-loop retries after transient network failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When launching many agents concurrently on remote providers like Modal, the API rate limit (25 req/s) can be hit. Add two configurable options:

- --max-parallel (default 4): max concurrent agent launches
- --launch-delay (default 2.0s): delay between submitting each launch

Agent launches are staggered by sleeping between submissions while the executor limits concurrency. This keeps the request rate well below provider limits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
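The two knobs combine as shown in this sketch: a `ThreadPoolExecutor` sized by `max_parallel` caps concurrency, while the submitting loop sleeps `launch_delay` seconds between submissions. The `launch_staggered` helper and its callable parameter are hypothetical; the defaults match the option defaults above.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def launch_staggered(
    launch: Callable[[str], T],
    names: list[str],
    max_parallel: int = 4,
    launch_delay: float = 2.0,
) -> list[T]:
    """Launch agents with bounded concurrency and a fixed gap between submissions."""
    futures: list[Future] = []
    with ThreadPoolExecutor(max_workers=max_parallel) as executor:
        for index, name in enumerate(names):
            if index > 0:
                # Spacing submissions bounds the request rate, not just concurrency.
                time.sleep(launch_delay)
            futures.append(executor.submit(launch, name))
    # Results come back in submission order; any launch exception re-raises here.
    return [future.result() for future in futures]
```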

Resolve conflicts by adopting the new fixture pattern from e2e-tests-deux (e2e: Session + agent_name: str) while keeping tutorial block docstrings. Remove 4 orphaned tests (list, destroy, rename) that have no tutorial blocks. Update test_tutorial_create.py to use the new fixtures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- test_create_with_disabled_provider: verify error output mentions the provider/disabled, not just that the command failed
- test_create_plugins: add assertions on both success and failure paths
- test_create_bare: verify agent runs in a worktree (different pwd)
- test_create_different_agent_type: verify agent_type == "codex" in JSON
- test_create_source_path: verify agent's pwd differs, use unique temp path
- test_create_shallow_clone: verify git rev-list --count == 1
- test_create_from_agent: verify target has same git HEAD as source
- test_create_copy_with_branch: verify no new branch created, agent on main
- test_create_connect_command: verify connect_command in JSON output
- test_create_template: verify in_place=true applied (pwd matches main repo)
- test_create_no_git: use unique temp path with cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The --no-connect and --no-ensure-clean flags were placed after the -- separator, causing them to be passed to python as arguments instead of to mng create as flags. Move them before -- so they are correctly handled by mng. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- test_create_connect_command: remove assertion on connect_command field which doesn't exist in AgentDetails JSON output; verify agent is RUNNING/WAITING instead
- test_create_plugins: assert deterministically that nonexistent plugin causes failure with plugin-related error message; also exercise --disable-plugin flag to match the tutorial block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ModalProxyError extends Exception directly (not MngError or HostError), so it was not caught by the polling error handlers. Widen the catch to include Exception alongside the specific types. Root cause note: the Modal plugin's list_agents path calls get_tags() per-host, each of which triggers a separate sandbox_list() API call. With N Modal hosts, this means N sandbox_list() calls per poll cycle, easily hitting Modal's 25/s rate limit. The fix should be in the Modal plugin (caching sandbox_list results), but for now we tolerate the error in the tmr polling loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The Modal plugin's _list_running_host_ids() already fetches all sandboxes and their tags during discovery. Populate _sandbox_cache_by_id and _sandbox_cache_by_name with these results so that subsequent get_host_tags() calls (triggered by _build_host_details_from_host in list.py:441) hit the cache instead of making N additional sandbox_list() API calls (one per host). This was the root cause of hitting Modal's 25/s rate limit during polling. Also make wait_for_integrator's polling error catch consistent with poll_until_all_done. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
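The caching fix can be illustrated with a small stand-in: discovery makes one listing call and fills caches keyed by ID and by name, so later per-host tag lookups are dictionary hits instead of N extra `sandbox_list()` calls. `SandboxInfo`, `HostDiscovery`, and the callable standing in for the Modal API are hypothetical; only the by-ID/by-name caching idea comes from the commit.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SandboxInfo:
    sandbox_id: str
    name: str
    tags: dict[str, str]


@dataclass
class HostDiscovery:
    """Hypothetical stand-in for the Modal provider's discovery and tag lookup."""

    list_sandboxes: Callable[[], list[SandboxInfo]]  # one remote API call per invocation
    cache_by_id: dict[str, SandboxInfo] = field(default_factory=dict)
    cache_by_name: dict[str, SandboxInfo] = field(default_factory=dict)

    def list_running_host_ids(self) -> list[str]:
        sandboxes = self.list_sandboxes()
        for sandbox in sandboxes:
            # Reuse the discovery result instead of re-listing per host later.
            self.cache_by_id[sandbox.sandbox_id] = sandbox
            self.cache_by_name[sandbox.name] = sandbox
        return [sandbox.sandbox_id for sandbox in sandboxes]

    def get_host_tags(self, sandbox_id: str) -> dict[str, str]:
        cached = self.cache_by_id.get(sandbox_id)
        if cached is None:
            # Cache miss: refresh once; raises KeyError if the sandbox is truly gone.
            self.list_running_host_ids()
            cached = self.cache_by_id[sandbox_id]
        return cached.tags
```

With N hosts, a poll cycle then costs one listing call rather than N + 1.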

qi-imbue and others added 30 commits March 6, 2026 17:52

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

6edd122

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

db15814

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

bb47d2d

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

91fc8d4

Revert e2e test markers to @pytest.mark.release

cbdfddf

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove unused tmp_path param, reuse project_config_dir fixture

991b553

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix skitwright trailing comments ratchet count (3 type: ignore misfires)

e8a0a9c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

414c8e7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add missing readme field to skitwright pyproject.toml

a46e5cb

Fixes test_every_project_has_pypi_readme CI failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

caa9856

Fix test.

804673c

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

ef91b5e

Merge branch 'mng/basic-e2e-tests' into mng/run-tmr

e243607

Merge branch 'mng/fix-modal' into mng/run-tmr

1999149

qi-imbue and others added 30 commits March 18, 2026 14:19

Merge branch 'mng/tmr-deux' into mng/run-tmr

013c82c

Merge branch 'mng/tmr-deux' into mng/run-tmr

859afcc

dockerignore

6d78ce8

Merge branch 'mng/tmr-deux' into mng/run-tmr

f1b6cee

Merge branch 'mng/tmr-deux' into mng/run-tmr

29fed88

Merge branch 'mng/tmr-deux' into mng/run-tmr

3bdabf6

Include agent names in polling log message

4150735

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove --await-ready flag from e2e test commands

db4804c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'mng/tmr-deux' into mng/run-tmr

b22512c

Merge remote-tracking branch 'origin/main' into mng/tmr-deux

1715b83

Merge branch 'mng/tmr-deux' into mng/run-tmr

accc1e0

qi-imbue wants to merge 94 commits into main from mng/run-tmr

qi-imbue commented Mar 18, 2026


1 participant
