Mng/run tmr deux by qi-imbue · Pull Request #917 · imbue-ai/mng

qi-imbue · 2026-03-19T18:25:04Z

No description provided.

Introduce skitwright, a lightweight end-to-end testing framework for CLI applications (a nod to Playwright). It provides: - Session: runs shell commands and records a text transcript - CommandResult: structured result with exit code, stdout, stderr - expect(): fluent assertion API for results and strings - Transcript: annotated text recording of all commands and outputs Add 10 basic e2e tests for the mng CLI that exercise it exclusively through its CLI interface (no library imports from mng): - Help output (mng --help, mng create --help) - List with no agents (table and JSON formats) - Create + list (verifies agent appears in list) - Create with JSON output format - Create in headless mode - Create + destroy lifecycle - Create + rename - Create with labels (verified via JSON list output) The e2e test fixture provides full isolation: separate MNG_HOST_DIR, MNG_PREFIX, MNG_ROOT_NAME, TMUX_TMPDIR, and disabled remote providers. Each test saves a transcript file for debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace uuid4().hex[:8] with get_short_random_string() - Replace relative import with absolute import - Convert MngRunner from class with __init__ to NamedTuple - Replace subprocess.run with skitwright run_command for tmux cleanup - Add @pytest.mark.release to all e2e tests (subprocess-based, no coverage) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add test_ratchets.py for skitwright (required by meta ratchet) - Add pytest config to skitwright pyproject.toml - Replace MngRunner class with lambda-based mng fixture (avoids __init__, NamedTuple, dataclass, and inline function ratchets) - Update MngRunFn type alias and test_basic.py to use function style Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ix markers - Add unit tests for skitwright (expect, transcript, session): 33 new tests, 100% coverage - Factor repetitive agent creation into create_agent fixture in e2e conftest - Change e2e test markers from @pytest.mark.release to @pytest.mark.acceptance so they run in CI on every PR - Fix incorrect docstring claiming "no library imports from mng" - Document skitwright as test-only dependency in mng pyproject.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The message command's CEL filter builder discarded the host/provider part of agent addresses (e.g. agent@host.modal). Now host_name and provider_name from parsed addresses are incorporated into the CEL filter, and the CEL context includes host.name for matching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

In _partition_destroy_targets, the loop over online host agents silently skipped matched agents that were no longer present. Now raises AgentNotFoundError if any matched agent ID is not found in get_agents(). Also removes the redundant seen_hosts set (dict keys are already unique). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes test_every_project_has_pypi_readme CI failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…cleanup - Remove succeeded/failed computed properties from CommandResult; use exit_code checks directly (per user feedback) - Replace subprocess.run with Popen+threads for real-time interleaved stdout/stderr capture in the transcript (line-buffered) - Add OutputSource enum and OutputLine data type for interleaved output - Remove uv run prefix from mng commands (already in PATH via uv run pytest) - Add mng destroy --all --force cleanup in e2e fixture teardown - Add MNG_E2E_KEEP_ON_FAILURE env var to keep agents running on failure - Print transcript path to stderr on test failure - Update README, ratchets, and tests accordingly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Revert message.py, message_test.py, destroy.py changes that belong to another branch. Fix timeout test failure in CI by using process_group=0 and os.killpg() to kill the entire process tree (not just the shell). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add --cov=imbue.skitwright to root pyproject.toml addopts so coverage is tracked in monorepo-level CI runs. Also add proc.wait() after os.killpg() in timeout path to deterministically reap the process. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The emit tests previously had no assertions (only verified functions don't crash). Now they verify actual output: human format checks the value appears in stdout, JSONL checks the parsed event structure. Removed JSON-format variants since emit_event is silent in JSON mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The timeout code path was discarding all pre-timeout stderr output, replacing it with just the timeout message. Now reconstructs real stderr from captured lines and appends the timeout message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JSON mode emit tests were removed in the previous commit since emit_event is silent in JSON mode. Restored them with assertions that stdout is empty, verifying the code path executes without crashing and produces no output as expected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-- splitting - Add --provider option to select which provider to launch agents on (e.g. docker, modal). All code that previously hardcoded LOCAL_PROVIDER_NAME now uses the configurable provider. Each agent tracks its own host reference to support providers that create a separate host per agent. - Add --env option to pass environment variables to agents (KEY=VALUE, repeatable). Uses the same resolve_env_vars utility as mng create. - Add --label option to attach labels to all launched agents (KEY=VALUE, repeatable). Labels are applied to both test agents and integrator agents. - Add --prompt-suffix option to append custom text to the agent prompt. - Replace _split_pytest_args with _TmrCommand (same pattern as _CreateCommand in mng create) for robust -- separator handling at the Click parse level. Test collection args go before --, testing flags go after --. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Script parses a tutorial shell script into command blocks and matches them against pytest functions by checking docstrings. Reports unmatched blocks (needing tests) and unmatched tests (needing blocks). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add tutorial_matcher_test.py with 12 unit tests covering all functions - Fix redundant file reads in find_pytest_functions (read once, reuse) - Warn on stderr when skipping files with syntax errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… setup - Move label KEY=VALUE parsing from tmr cli.py and create.py into a shared resolve_labels() function in env_utils.py (alongside resolve_env_vars). Both create.py and tmr cli.py now call resolve_labels(). - Extract _invoke_tmr_command() helper to eliminate duplicated Click command setup boilerplate across 5 _TmrCommand tests. - Rename test_cli_help_contains_new_options to test_cli_help_contains_provider_env_label_options for clarity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tests for resolve_labels belong alongside the other env_utils tests in the mng library, not in mng_tmr. This ensures the shared utility is tested when running mng tests independently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Reorder steps: handle unmatched functions before adding new tests, since some may pair with modified script blocks - Unmatched functions: adapt docstring+logic if a close-match block exists, remove if the block was deleted entirely - Unmatched blocks: clarify that one block may need multiple tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract _assert_provider_disabled helper in test_create_remote.py to eliminate 80 lines of duplicated assertion code (MAJOR) - Tighten regex in test_create_address_syntax_existing_host - Add exit code verification to test_create_with_plugin_flags - Remove misleading @pytest.mark.docker from test_create_docker_start_args - Remove scroll container from asciinema player (rows param is enough) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Breadcrumbs now include the current page title as bold text (not a link) at the end of the trail, on all three page levels - Player uses fit:'none' with terminalFontSize:'12px' instead of fit:'width' with rows:20, giving a more compact native-sized player - Agent names in transcripts (matching cast file stems) are linked to their corresponding recording sections via anchor tags Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move tutorial block text from docstrings into explicit e2e.write_tutorial_block() calls that write tutorial_block.txt to the test output directory. This makes the blocks visible in the test output viewer alongside transcripts and recordings. - Add E2eSession subclass in conftest with write_tutorial_block() that dedents and writes the block to the output directory - Convert all 6 test files from docstrings to write_tutorial_block() - Simplify tutorial_matcher.py to use text matching instead of AST walking: parses functions by "def test_" lines, extracts block text from write_tutorial_block() calls or docstrings via regex - Update test output viewer to show tutorial_block.txt content - Update slash command documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace AST walking and regex extraction with simple line-by-line comparison: strip leading whitespace from both script block lines and function body lines, then check if all block lines appear in order in the body. No docstring or write_tutorial_block parsing needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resource guard requires the docker marker when the command string contains 'docker'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add collapsible left sidebar on test pages showing all tests in the same run for quick navigation. Collapse state persists across page loads via localStorage. Current test is bolded. - Change transcript comment color from green to yellow (#dcdcaa) - Render tutorial_block.txt with the same color scheme as transcripts: yellow for comment lines (starting with #), blue for command lines - Add @pytest.mark.docker to test_create_docker_start_args Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The sidebar CSS was defined as a plain string with {{ }} (f-string escape syntax) but was interpolated into an f-string via {sidebar_css}, causing literal {{ }} to appear in the rendered CSS instead of { }. Switch to string concatenation to avoid the escaping issue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Sidebar toggle is now a button outside the collapsible content so it stays visible when the sidebar is collapsed - Widen sidebar from 240px to 300px to avoid cropping test names - Run listing page now uses the same sidebar layout as the test detail page, eliminating layout shift when navigating between them - Extract _build_test_sidebar helper shared by both pages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Run listing page shows the full test list directly (no sidebar) - Sidebar only appears on test detail pages - Breadcrumb (nav) is rendered ABOVE the sidebar layout so it stays in the same position on all pages -- no layout shift when navigating - Refactored _html_page to take nav as a separate parameter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Rename sections: "Tutorial Block" -> "Tutorial block", "Transcript" -> "CLI transcript", "Recording" -> "TUI recording" - Remove redundant cast filename label (already in heading) - Match asciinema player font size to transcript (0.85em) - Player already left-aligned (fit: 'none' renders at native size) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add pytest CLI flags for controlling e2e test behavior: - --mng-e2e-keep-env={yes,on-failure,no} replaces the MNG_E2E_KEEP_ON_FAILURE env var. Controls whether agents and tmux sessions survive after tests. - --mng-e2e-artifacts={yes,on-failure,no} controls whether transcript, asciinema recordings, and tutorial block files are saved. When the environment is kept alive, a destroy-env shell script is written to the test output directory that cleans up agents and kills tmux. On test failure, stderr now includes a pointer to DEBUGGING.md which documents how to inspect test artifacts and interact with a kept environment (mng commands, tmux attach, env vars to set). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The matcher was rewritten to use line-based matching instead of AST docstring extraction. Update tests to use _block_lines_in_body instead of block_matches_docstring, and test body extraction instead of docstring extraction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test_prevent_bare_print ratchet scans all .py files under libs/mng/imbue/mng/ and disallows bare print() calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove E2eSession.__init__ (init methods ratchet): use classmethod create() that sets output_dir after construction instead - Replace all CSS hex color codes with rgb() notation to avoid triggering the trailing comments ratchet - Reword docstring to avoid false positive trailing comment match - Add else: pass to ANSI parser if/elif chain Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…atches tutorial_matcher_test.py: - Replace nonexistent block_matches_docstring with _block_lines_in_body - Fix assertion that body is None (string parser always returns a string) - Fix assertion that syntax errors produce empty results (string parser still extracts functions from malformed files) api.py: - Remove bare Exception from two except clauses in polling loops, keeping only the specific expected exceptions (MngError, HostError, ConcurrencyGroupError, OSError) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The fixture was duplicated in notifier_test.py and watcher_test.py. Per CLAUDE.md, all fixtures must be in conftest.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

install.sh: Defer bash PATH and mng PATH warnings to the end of the script so they remain visible after all interactive prompts. test_create_commands.py: Add behavioral assertions beyond just checking mng list: verify custom command running via mng exec, check idle_timeout in JSON, dirty working tree before --no-ensure-clean test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --mng-e2e-artifacts=no and --mng-e2e-keep-env=yes, the output directory was deleted before _write_destroy_script tried to write to it. Now the directory is preserved whenever keep_env is true. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Reject --mng-e2e-artifacts lower than --mng-e2e-keep-env at config time (yes=2 > on-failure=1 > no=0) since keeping the env requires the output directory for the destroy-env script - Print all env vars (including MNG_ROOT_NAME and CWD) when keeping the environment alive - Update DEBUGGING.md: document mng capture and mng message as portable alternatives to tmux attach, include MNG_ROOT_NAME in example env var block - Add comment to test_create_remote.py explaining why tests are kept as separate functions instead of parametrized (1:1 tutorial mapping) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Tighten _PROVIDER_ERROR_PATTERN to require both provider name and error type (e.g. "modal.*not authorized") instead of matching any word individually - test_create_with_env: verify env var via mng exec instead of just checking mng list - test_create_with_agent_args: verify "--model opus" appears in the agent's command field in JSON output - Add explanatory comment in test_create_remote.py about intentional 1:1 test-to-block correspondence Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- tmux examples use TMUX= (empty string) instead of unset TMUX - All mng command examples include MNG_HOST_DIR, MNG_PREFIX, and MNG_ROOT_NAME as inline env variable prefixes - Remove the export block since inline prefixes are more copy-pasteable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MNG_HOST_DIR alone is sufficient to isolate the test environment from the host mng -- it segregates all agent data. MNG_PREFIX and MNG_ROOT_NAME are already set by the parent autouse fixture and inherited via os.environ.copy(), so explicitly setting them in the e2e subprocess env was redundant and confusing. Remove MNG_PREFIX and MNG_ROOT_NAME from: - The e2e fixture's env dict construction - The e2e fixture's parameter list - The destroy-env script - The keep-env debug output - DEBUGGING.md examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

qi-imbue and others added 30 commits March 6, 2026 17:52

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

6edd122

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

db15814

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

bb47d2d

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

91fc8d4

Revert e2e test markers to @pytest.mark.release

cbdfddf

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove unused tmp_path param, reuse project_config_dir fixture

991b553

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix skitwright trailing comments ratchet count (3 type: ignore misfires)

e8a0a9c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

414c8e7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add missing readme field to skitwright pyproject.toml

a46e5cb

Fixes test_every_project_has_pypi_readme CI failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

caa9856

Fix test.

804673c

Merge remote-tracking branch 'origin/main' into mng/basic-e2e-tests

ef91b5e

qi-imbue and others added 30 commits March 19, 2026 05:23

Add @pytest.mark.docker to test_create_docker_start_args

20cc5a3

Resource guard requires the docker marker when the command string contains 'docker'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Left-align asciinema player by overriding default centering

96bd6f4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use flexbox to left-align asciinema player

45826b6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' into mng/run-tmr-deux

00da413

Merge branch 'mng/e2e-tests-deux' into mng/run-tmr-deux

0c9a978

Replace bare print() with sys.stdout.write() in serve_test_output

ad92b16

The test_prevent_bare_print ratchet scans all .py files under libs/mng/imbue/mng/ and disallows bare print() calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix ruff E741: rename ambiguous variable l to bl

0709a69

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move notification_cg fixture to conftest.py

bc7a954

The fixture was duplicated in notifier_test.py and watcher_test.py. Per CLAUDE.md, all fixtures must be in conftest.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'mng/e2e-tests-deux' into mng/run-tmr-deux

3388047

Merge branch 'mng/e2e-tests-deux' into mng/run-tmr-deux

9a9273d

Merge branch 'mng/e2e-tests-deux' into mng/run-tmr-deux

56a6241

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mng/run tmr deux#917

Mng/run tmr deux#917
qi-imbue wants to merge 110 commits intomainfrom
mng/run-tmr-deux

qi-imbue commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

qi-imbue commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant