Skip to content

Mng/run tmr deux#917

Draft
qi-imbue wants to merge 110 commits intomainfrom
mng/run-tmr-deux
Draft

Mng/run tmr deux#917
qi-imbue wants to merge 110 commits intomainfrom
mng/run-tmr-deux

Conversation

@qi-imbue
Copy link
Contributor

No description provided.

qi-imbue and others added 30 commits March 6, 2026 17:52
Introduce skitwright, a lightweight end-to-end testing framework for CLI
applications (a nod to Playwright). It provides:
- Session: runs shell commands and records a text transcript
- CommandResult: structured result with exit code, stdout, stderr
- expect(): fluent assertion API for results and strings
- Transcript: annotated text recording of all commands and outputs

Add 10 basic e2e tests for the mng CLI that exercise it exclusively
through its CLI interface (no library imports from mng):
- Help output (mng --help, mng create --help)
- List with no agents (table and JSON formats)
- Create + list (verifies agent appears in list)
- Create with JSON output format
- Create in headless mode
- Create + destroy lifecycle
- Create + rename
- Create with labels (verified via JSON list output)

The e2e test fixture provides full isolation: separate MNG_HOST_DIR,
MNG_PREFIX, MNG_ROOT_NAME, TMUX_TMPDIR, and disabled remote providers.
Each test saves a transcript file for debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace uuid4().hex[:8] with get_short_random_string()
- Replace relative import with absolute import
- Convert MngRunner from class with __init__ to NamedTuple
- Replace subprocess.run with skitwright run_command for tmux cleanup
- Add @pytest.mark.release to all e2e tests (subprocess-based, no coverage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add test_ratchets.py for skitwright (required by meta ratchet)
- Add pytest config to skitwright pyproject.toml
- Replace MngRunner class with lambda-based mng fixture (avoids
  __init__, NamedTuple, dataclass, and inline function ratchets)
- Update MngRunFn type alias and test_basic.py to use function style

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ix markers

- Add unit tests for skitwright (expect, transcript, session): 33 new tests,
  100% coverage
- Factor repetitive agent creation into create_agent fixture in e2e conftest
- Change e2e test markers from @pytest.mark.release to @pytest.mark.acceptance
  so they run in CI on every PR
- Fix incorrect docstring claiming "no library imports from mng"
- Document skitwright as test-only dependency in mng pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The message command's CEL filter builder discarded the host/provider part
of agent addresses (e.g. agent@host.modal). Now host_name and provider_name
from parsed addresses are incorporated into the CEL filter, and the CEL
context includes host.name for matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In _partition_destroy_targets, the loop over online host agents silently
skipped matched agents that were no longer present. Now raises
AgentNotFoundError if any matched agent ID is not found in get_agents().
Also removes the redundant seen_hosts set (dict keys are already unique).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes test_every_project_has_pypi_readme CI failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cleanup

- Remove succeeded/failed computed properties from CommandResult; use
  exit_code checks directly (per user feedback)
- Replace subprocess.run with Popen+threads for real-time interleaved
  stdout/stderr capture in the transcript (line-buffered)
- Add OutputSource enum and OutputLine data type for interleaved output
- Remove uv run prefix from mng commands (already in PATH via uv run pytest)
- Add mng destroy --all --force cleanup in e2e fixture teardown
- Add MNG_E2E_KEEP_ON_FAILURE env var to keep agents running on failure
- Print transcript path to stderr on test failure
- Update README, ratchets, and tests accordingly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert message.py, message_test.py, destroy.py changes that belong to
another branch. Fix timeout test failure in CI by using process_group=0
and os.killpg() to kill the entire process tree (not just the shell).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --cov=imbue.skitwright to root pyproject.toml addopts so coverage
is tracked in monorepo-level CI runs. Also add proc.wait() after
os.killpg() in timeout path to deterministically reap the process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The emit tests previously had no assertions (only verified functions
don't crash). Now they verify actual output: human format checks the
value appears in stdout, JSONL checks the parsed event structure.
Removed JSON-format variants since emit_event is silent in JSON mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The timeout code path was discarding all pre-timeout stderr output,
replacing it with just the timeout message. Now reconstructs real
stderr from captured lines and appends the timeout message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSON mode emit tests were removed in the previous commit since
emit_event is silent in JSON mode. Restored them with assertions
that stdout is empty, verifying the code path executes without
crashing and produces no output as expected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-- splitting

- Add --provider option to select which provider to launch agents on
  (e.g. docker, modal). All code that previously hardcoded LOCAL_PROVIDER_NAME
  now uses the configurable provider. Each agent tracks its own host reference
  to support providers that create a separate host per agent.

- Add --env option to pass environment variables to agents (KEY=VALUE,
  repeatable). Uses the same resolve_env_vars utility as mng create.

- Add --label option to attach labels to all launched agents (KEY=VALUE,
  repeatable). Labels are applied to both test agents and integrator agents.

- Add --prompt-suffix option to append custom text to the agent prompt.

- Replace _split_pytest_args with _TmrCommand (same pattern as _CreateCommand
  in mng create) for robust -- separator handling at the Click parse level.
  Test collection args go before --, testing flags go after --.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Script parses a tutorial shell script into command blocks and matches
them against pytest functions by checking docstrings. Reports unmatched
blocks (needing tests) and unmatched tests (needing blocks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add tutorial_matcher_test.py with 12 unit tests covering all functions
- Fix redundant file reads in find_pytest_functions (read once, reuse)
- Warn on stderr when skipping files with syntax errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… setup

- Move label KEY=VALUE parsing from tmr cli.py and create.py into a shared
  resolve_labels() function in env_utils.py (alongside resolve_env_vars).
  Both create.py and tmr cli.py now call resolve_labels().

- Extract _invoke_tmr_command() helper to eliminate duplicated Click command
  setup boilerplate across 5 _TmrCommand tests.

- Rename test_cli_help_contains_new_options to
  test_cli_help_contains_provider_env_label_options for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for resolve_labels belong alongside the other env_utils tests in
the mng library, not in mng_tmr. This ensures the shared utility is
tested when running mng tests independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reorder steps: handle unmatched functions before adding new tests,
  since some may pair with modified script blocks
- Unmatched functions: adapt docstring+logic if a close-match block
  exists, remove if the block was deleted entirely
- Unmatched blocks: clarify that one block may need multiple tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
qi-imbue and others added 30 commits March 19, 2026 05:23
- Extract _assert_provider_disabled helper in test_create_remote.py
  to eliminate 80 lines of duplicated assertion code (MAJOR)
- Tighten regex in test_create_address_syntax_existing_host
- Add exit code verification to test_create_with_plugin_flags
- Remove misleading @pytest.mark.docker from test_create_docker_start_args
- Remove scroll container from asciinema player (rows param is enough)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Breadcrumbs now include the current page title as bold text (not a
  link) at the end of the trail, on all three page levels
- Player uses fit:'none' with terminalFontSize:'12px' instead of
  fit:'width' with rows:20, giving a more compact native-sized player
- Agent names in transcripts (matching cast file stems) are linked to
  their corresponding recording sections via anchor tags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move tutorial block text from docstrings into explicit
e2e.write_tutorial_block() calls that write tutorial_block.txt to the
test output directory. This makes the blocks visible in the test output
viewer alongside transcripts and recordings.

- Add E2eSession subclass in conftest with write_tutorial_block() that
  dedents and writes the block to the output directory
- Convert all 6 test files from docstrings to write_tutorial_block()
- Simplify tutorial_matcher.py to use text matching instead of AST
  walking: parses functions by "def test_" lines, extracts block text
  from write_tutorial_block() calls or docstrings via regex
- Update test output viewer to show tutorial_block.txt content
- Update slash command documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace AST walking and regex extraction with simple line-by-line
comparison: strip leading whitespace from both script block lines and
function body lines, then check if all block lines appear in order in
the body. No docstring or write_tutorial_block parsing needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resource guard requires the docker marker when the command string
contains 'docker'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add collapsible left sidebar on test pages showing all tests in the
  same run for quick navigation. Collapse state persists across page
  loads via localStorage. Current test is bolded.
- Change transcript comment color from green to yellow (#dcdcaa)
- Render tutorial_block.txt with the same color scheme as transcripts:
  yellow for comment lines (starting with #), blue for command lines
- Add @pytest.mark.docker to test_create_docker_start_args

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sidebar CSS was defined as a plain string with {{ }} (f-string
escape syntax) but was interpolated into an f-string via {sidebar_css},
causing literal {{ }} to appear in the rendered CSS instead of { }.
Switch to string concatenation to avoid the escaping issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sidebar toggle is now a button outside the collapsible content so it
  stays visible when the sidebar is collapsed
- Widen sidebar from 240px to 300px to avoid cropping test names
- Run listing page now uses the same sidebar layout as the test detail
  page, eliminating layout shift when navigating between them
- Extract _build_test_sidebar helper shared by both pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Run listing page shows the full test list directly (no sidebar)
- Sidebar only appears on test detail pages
- Breadcrumb (nav) is rendered ABOVE the sidebar layout so it stays in
  the same position on all pages -- no layout shift when navigating
- Refactored _html_page to take nav as a separate parameter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename sections: "Tutorial Block" -> "Tutorial block",
  "Transcript" -> "CLI transcript", "Recording" -> "TUI recording"
- Remove redundant cast filename label (already in heading)
- Match asciinema player font size to transcript (0.85em)
- Player already left-aligned (fit: 'none' renders at native size)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add pytest CLI flags for controlling e2e test behavior:
- --mng-e2e-keep-env={yes,on-failure,no} replaces the MNG_E2E_KEEP_ON_FAILURE
  env var. Controls whether agents and tmux sessions survive after tests.
- --mng-e2e-artifacts={yes,on-failure,no} controls whether transcript,
  asciinema recordings, and tutorial block files are saved.

When the environment is kept alive, a destroy-env shell script is written
to the test output directory that cleans up agents and kills tmux.

On test failure, stderr now includes a pointer to DEBUGGING.md which
documents how to inspect test artifacts and interact with a kept
environment (mng commands, tmux attach, env vars to set).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The matcher was rewritten to use line-based matching instead of AST
docstring extraction. Update tests to use _block_lines_in_body instead
of block_matches_docstring, and test body extraction instead of
docstring extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test_prevent_bare_print ratchet scans all .py files under
libs/mng/imbue/mng/ and disallows bare print() calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove E2eSession.__init__ (init methods ratchet): use classmethod
  create() that sets output_dir after construction instead
- Replace all CSS hex color codes with rgb() notation to avoid
  triggering the trailing comments ratchet
- Reword docstring to avoid false positive trailing comment match
- Add else: pass to ANSI parser if/elif chain

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…atches

tutorial_matcher_test.py:
- Replace nonexistent block_matches_docstring with _block_lines_in_body
- Fix assertion that body is None (string parser always returns a string)
- Fix assertion that syntax errors produce empty results (string parser
  still extracts functions from malformed files)

api.py:
- Remove bare Exception from two except clauses in polling loops, keeping
  only the specific expected exceptions (MngError, HostError,
  ConcurrencyGroupError, OSError)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fixture was duplicated in notifier_test.py and watcher_test.py.
Per CLAUDE.md, all fixtures must be in conftest.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
install.sh: Defer bash PATH and mng PATH warnings to the end of the
script so they remain visible after all interactive prompts.

test_create_commands.py: Add behavioral assertions beyond just checking
mng list: verify custom command running via mng exec, check idle_timeout
in JSON, dirty working tree before --no-ensure-clean test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --mng-e2e-artifacts=no and --mng-e2e-keep-env=yes, the output
directory was deleted before _write_destroy_script tried to write to
it. Now the directory is preserved whenever keep_env is true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reject --mng-e2e-artifacts lower than --mng-e2e-keep-env at config
  time (yes=2 > on-failure=1 > no=0) since keeping the env requires
  the output directory for the destroy-env script
- Print all env vars (including MNG_ROOT_NAME and CWD) when keeping
  the environment alive
- Update DEBUGGING.md: document mng capture and mng message as
  portable alternatives to tmux attach, include MNG_ROOT_NAME in
  example env var block
- Add comment to test_create_remote.py explaining why tests are kept
  as separate functions instead of parametrized (1:1 tutorial mapping)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tighten _PROVIDER_ERROR_PATTERN to require both provider name and
  error type (e.g. "modal.*not authorized") instead of matching any
  word individually
- test_create_with_env: verify env var via mng exec instead of just
  checking mng list
- test_create_with_agent_args: verify "--model opus" appears in the
  agent's command field in JSON output
- Add explanatory comment in test_create_remote.py about intentional
  1:1 test-to-block correspondence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tmux examples use TMUX= (empty string) instead of unset TMUX
- All mng command examples include MNG_HOST_DIR, MNG_PREFIX, and
  MNG_ROOT_NAME as inline env variable prefixes
- Remove the export block since inline prefixes are more copy-pasteable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MNG_HOST_DIR alone is sufficient to isolate the test environment from
the host mng -- it segregates all agent data. MNG_PREFIX and
MNG_ROOT_NAME are already set by the parent autouse fixture and
inherited via os.environ.copy(), so explicitly setting them in the e2e
subprocess env was redundant and confusing.

Remove MNG_PREFIX and MNG_ROOT_NAME from:
- The e2e fixture's env dict construction
- The e2e fixture's parameter list
- The destroy-env script
- The keep-env debug output
- DEBUGGING.md examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant