feat(cookbook): local-model lifecycle management (scan/recommend/download/serve/list/stop)#173

Merged
AVADSA25 merged 1 commit into main from cookbook-phase1 on Jun 1, 2026

AVADSA25 commented Jun 1, 2026

Summary

CODEC Cookbook (Build Brief v2) — local-model lifecycle management for the M1 Ultra, implemented as CODEC skills + a non-skill helper package so the SkillRegistry hot-loads them and codec-mcp-http exposes the read-only ones automatically. Scan hardware → recommend models that fit → download → serve under PM2 on dedicated ports → list → stop.

Additive only — the entire design goal is to be structurally incapable of disturbing the running stack.

Hard safety constraints (enforced + tested)

  • Serve range 8110-8119. Allocation skips any port that a live socket probe, pm2 jlist, or our own served.json reports as taken. Protected ports (8083/8090/8094/9223/5678) are out of range and also explicitly skipped.
  • stop() is guarded four ways, in order: not-a-process-we-started → refused; resolved port is protected → refused; pm2 name not cookbook- → refused; confirm != True → dry-run. Only then pm2 delete. 13 stop-guard tests cover every protected port, every core service name (qwen3.6, codec-dashboard, pilot-runner, n8n), bound-but-not-ours ports, and the confirm gate.
  • Never docker stop/rm, never changes a service's port, never restarts/stops a non-cookbook process.
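For reviewers, a minimal sketch of the ordered stop-guard described above. It assumes served.json has been loaded into a dict keyed by PM2 name with a `port` field, and that the actual `pm2 delete` is injected as a callable so the sketch never touches PM2. `guarded_stop`, `StopResult` and that record shape are hypothetical, not the real serve.py API.

```python
from dataclasses import dataclass
from typing import Callable

# Protected set as listed in this PR (#174 later widens it).
PROTECTED_PORTS = frozenset({8083, 8090, 8094, 9223, 5678})


@dataclass
class StopResult:
    ok: bool
    action: str  # "refused", "dry-run" or "deleted"
    reason: str


def guarded_stop(name: str, served: dict, pm2_delete: Callable[[str], None],
                 confirm: bool = False) -> StopResult:
    """Hypothetical ordered stop-guard; `served` mimics served.json."""
    # 1. Only processes the cookbook itself recorded are eligible.
    record = served.get(name)
    if record is None:
        return StopResult(False, "refused", "not a process cookbook started")
    # 2. A protected port is refused even if it somehow got recorded.
    port = record.get("port")
    if port in PROTECTED_PORTS:
        return StopResult(False, "refused", f"port {port} is protected")
    # 3. The PM2 name must be inside the cookbook- namespace.
    if not name.startswith("cookbook-"):
        return StopResult(False, "refused", "pm2 name outside cookbook- namespace")
    # 4. Anything other than an explicit confirm=True is a dry-run.
    if confirm is not True:
        return StopResult(False, "dry-run", f"would run pm2 delete {name}")
    pm2_delete(name)
    return StopResult(True, "deleted", name)
```

Checking the served.json record first means a name that merely looks like `cookbook-…` but was never recorded by us is still refused, and `confirm is not True` keeps truthy strings like "yes" on the dry-run path.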

Structure

codec_cookbook/ (helper package — keeps OS/subprocess/PM2/network out of the AST-gated skill files):

  • probe.py — read-only hw + vm_stat + pm2 jlist + socket port probe + mlx version gate
  • catalog.py + catalog.json — 5 verified MLX models (the live qwen3.6@8083 is deliberately not included)
  • fit.py — Hub-derived weight size + KV cache computed from the model's real config.json; required = (w+kv)×1.10+1.5, available_gb = total−24−resident, fits (margin 8), recommend
  • serve.py — interpreter discovery (config.json:cookbook.mlx_python → sys.executable), port allocation, PM2 launch with the corrected commands (python -m mlx_lm server … --max-tokens 16384; llama-server … -ngl 999), /v1/models health poll, served.json, and the stop-guard
  • download.py — detached HF snapshot_download jobs + file-based status (survives across skill calls; reconciles dead pids)
  • args.py — parse model_id/context/flags/port/role from the task string
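The fit.py arithmetic, sketched under stated assumptions: the KV term uses the standard 2 × layers × kv_heads × head_dim × context × bytes formula read from config.json, falling back to `num_attention_heads` when `num_key_value_heads` is absent; GB means 10^9 bytes; and "margin 8" is read as 8 GB that must stay free after loading. The function names are hypothetical, and this sketch does not reproduce the golden anchors.

```python
def kv_cache_gb(config: dict, context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB (1e9 bytes) for `context` tokens, fp16 by default."""
    layers = config["num_hidden_layers"]
    heads = config["num_attention_heads"]
    # GQA configs publish num_key_value_heads; without it, assume plain multi-head attention.
    kv_heads = config.get("num_key_value_heads") or heads
    head_dim = config.get("head_dim") or config["hidden_size"] // heads
    # Leading 2 = one K and one V tensor per layer.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9


def required_gb(weights_gb: float, kv_gb: float) -> float:
    # Formula from the PR: 10% overhead on weights + KV, plus a fixed 1.5.
    return (weights_gb + kv_gb) * 1.10 + 1.5


def available_gb(total_gb: float, resident_gb: float, reserve_gb: float = 24.0) -> float:
    # total - 24 - resident, as in fit.py's description.
    return total_gb - reserve_gb - resident_gb


def fits(weights_gb: float, kv_gb: float, total_gb: float, resident_gb: float,
         margin_gb: float = 8.0) -> bool:
    # Assumption: "margin 8" means 8 GB must remain free after loading.
    return required_gb(weights_gb, kv_gb) + margin_gb <= available_gb(total_gb, resident_gb)
```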

Six thin skills (skills/cookbook_*.py), each run(task, app, ctx) -> str (CODEC's real contract), parsing args + formatting helper output. All pass the load-time AST safety gate (import only codec_cookbook + re). Manifest regenerated → 82 built-ins.
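The import half of the load-time gate can be pictured as an `ast` walk over each skill file. This is a sketch assuming an allowlist of top-level modules ({codec_cookbook, re}); the real gate may check more than imports, and `import_violations` is a hypothetical name.

```python
import ast

# Assumed allowlist, matching "import only codec_cookbook + re".
ALLOWED_ROOTS = frozenset({"codec_cookbook", "re"})


def import_violations(source: str, allowed: frozenset = ALLOWED_ROOTS) -> list:
    """Return every imported module whose top-level package is not allowed."""
    bad = []
    # ast.walk also reaches imports nested inside functions or classes.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a.b.c as x  ->  check "a"
            bad += [a.name for a in node.names if a.name.split(".")[0] not in allowed]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) are rejected outright.
            if node.level or node.module is None:
                bad.append("." * node.level + (node.module or ""))
            elif node.module.split(".")[0] not in allowed:
                bad.append(node.module)
    return bad
```

Keeping subprocess, socket and PM2 calls inside codec_cookbook is what lets a check this strict pass on every skill file.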

Deviations from the brief (flagged)

  1. Skill contract. The brief's async def run(model_id=…, context_length=…) isn't CODEC's contract — skills are sync run(task, app, ctx) -> str. So skills parse structured args from task and the helpers return the structured dicts (testable); skills format them to strings.
  2. SKILL_TAGS isn't extracted by the registry (only NAME/DESCRIPTION/TRIGGERS/MCP_EXPOSE/observation) — included for docs, but SKILL_TRIGGERS is what's functional.
  3. MCP exposure, an operator decision. Read-only scan/recommend/list → SKILL_MCP_EXPOSE=True. The three mutating skills (serve/download/stop) → False, matching the pm2_control precedent (usable via dashboard chat / voice / local, not MCP). The brief said "codec-mcp-http exposes them automatically"; flipping the mutating ones on is a one-line SKILL_MCP_EXPOSE=True plus adding them to _HTTP_BLOCKED (so stdio-MCP works but claude.ai-over-HTTP can't drive process lifecycle). I left that to you since it touches the _HTTP_BLOCKED security boundary; see the question below.
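A hypothetical read-only skill in the sync shape from deviation 1. SKILL_TRIGGERS and SKILL_MCP_EXPOSE are the attributes named above; SKILL_NAME and SKILL_DESCRIPTION are assumed from the registry's NAME/DESCRIPTION fields, the trigger strings are invented, and `_served_models()` is a stand-in for a codec_cookbook helper so the file runs on its own (a real skill would import that helper instead).

```python
import re

SKILL_NAME = "cookbook_list"                              # assumed attribute name
SKILL_DESCRIPTION = "List models the cookbook is serving."
SKILL_TRIGGERS = ["list served models", "cookbook list"]  # illustrative triggers
SKILL_MCP_EXPOSE = True                                   # read-only, so exposed


def _served_models() -> list:
    # Stand-in for a codec_cookbook helper; a real skill imports it instead.
    return [{"name": "cookbook-demo-8110", "port": 8110, "status": "online"}]


def run(task, app, ctx) -> str:
    """Sync skill contract: parse the task string, call a helper, format a string."""
    rows = _served_models()
    # Optional "port N" filter parsed from the free-text task.
    m = re.search(r"\bport\s+(\d+)\b", task or "")
    if m:
        rows = [r for r in rows if r["port"] == int(m.group(1))]
    if not rows:
        return "No matching cookbook models."
    return "\n".join(f"{r['name']} on :{r['port']} ({r['status']})" for r in rows)
```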

Test plan

  • tests/test_cookbook.py → 42 tests: catalog, fit golden anchors (30B→17.2, 80B→42) + KV math + GQA fallback + offline anchor fallback, args parsing, port-allocation bounds, 13 stop-guard cases, serve→list→stop integration (PM2 + health mocked), skill discovery + refusal/force behavior, corrected MLX command shape
  • python3.13 -m pytest --ignore=tests/test_skills.py -q → 2,140 passed, 77 skipped
  • test_skills.py: only the 4 pre-existing pilot_* failures (no cookbook failures)
  • AST safety gate: all 6 skill files clean; registry discovers all 6 with correct MCP exposure + triggers
  • ruff check: 0 issues; manifest drift-check passes (82 skills)
  • Environment check: huggingface_hub 1.14.0, mlx_lm 0.31.3 (≥ 0.25.2 gate)

Deferred to Phase 1.5 per the brief: SSH remote-serve to the 192.168.1.167 Linux box.

🤖 Generated with Claude Code

…load/serve/list/stop)

CODEC Cookbook — Build Brief v2. Adds local-model lifecycle management for the
M1 Ultra as CODEC skills + a non-skill helper package, so the SkillRegistry
hot-loads them and codec-mcp-http exposes the read-only ones automatically.

Structurally incapable of disturbing the running stack:
  * Serve range is 8110-8119; allocation skips any port a live socket probe +
    pm2 jlist + our served.json say is taken. Protected ports
    (8083/8090/8094/9223/5678) are out of range AND explicitly skipped.
  * stop() only ever deletes a process we recorded in served.json, named
    `cookbook-…`, not on a protected port, and only with confirm=True. Anything
    else → layered refusal (or a dry-run). Verified by 13 stop-guard tests.
  * Nothing issues docker stop/rm, changes a service's port, or restarts/stops
    a non-cookbook process.

Helper package codec_cookbook/ (NOT skills — so the OS/subprocess/PM2/network
work stays out of the AST-gated skill files):
  probe.py    read-only hw + vm_stat + pm2 jlist + socket port probe + mlx ver
  catalog.py  + catalog.json — 5 verified MLX models (qwen3.6@8083 NOT included)
  fit.py      Hub-derived weight size + real-config KV cache + (w+kv)*1.10+1.5;
              available_gb (total−24−resident), fits (margin 8), recommend
  serve.py    interpreter discovery (config→sys.executable), port alloc, PM2
              launch (corrected `python -m mlx_lm server --max-tokens 16384`
              / `llama-server -ngl 999`), /v1/models health poll, served.json,
              and the STOP-GUARD
  download.py detached HF snapshot_download jobs + file-based status
  args.py     parse model_id/context/flags/port/role from the task string
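A sketch of the /v1/models health poll using only the standard library. The host, timeout and interval defaults are assumptions, and treating a JSON body with a `data` key as healthy assumes the OpenAI-compatible list shape; `wait_healthy` is a hypothetical name.

```python
import http.client
import json
import time
import urllib.request


def wait_healthy(port: int, timeout_s: float = 120.0, interval_s: float = 1.0,
                 host: str = "127.0.0.1") -> bool:
    """Poll http://host:port/v1/models until it returns a model list or time runs out."""
    url = f"http://{host}:{port}/v1/models"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                body = json.loads(resp.read() or b"{}")
                # OpenAI-compatible servers answer {"object": "list", "data": [...]}.
                if resp.status == 200 and isinstance(body, dict) and "data" in body:
                    return True
        except (OSError, http.client.HTTPException, ValueError):
            # Not bound yet, still loading weights, or a non-JSON / non-2xx reply.
            pass
        time.sleep(interval_s)
    return False
```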

Six thin skills (skills/cookbook_*.py) — each `run(task, app, ctx) -> str`
(CODEC's real contract; the brief's async-kwargs shape was adapted), parse args,
call a helper, format the result. All pass the load-time AST safety gate (they
import only codec_cookbook + re). Manifest regenerated (82 built-ins).

MCP exposure: read-only scan/recommend/list = SKILL_MCP_EXPOSE True; the three
process/network-mutating skills (serve/download/stop) = False, matching the
pm2_control precedent (local + dashboard + voice, not MCP). The brief wanted all
exposed over MCP (see PR notes); flipping the mutating ones on is a one-line
change plus an _HTTP_BLOCKED decision, left to the operator.

Tests: tests/test_cookbook.py — 42 tests (catalog, fit golden anchors + KV +
offline fallback, args, port allocation, 13 stop-guard cases, serve→list→stop
integration with PM2/health mocked, skill discovery + refusal behavior). Full
suite: 2,140 passed / 77 skipped. ruff clean.

Deferred to Phase 1.5 per brief: SSH remote-serve to the Linux box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
AVADSA25 merged commit 112db67 into main on Jun 1, 2026
1 check passed
AVADSA25 added a commit that referenced this pull request Jun 1, 2026
…rged at stuck head) (#174)

* fix(cookbook): protected-port set = verified live core stack (add 8084/8085/9222)

Probed the actual box with lsof rather than trusting the brief's enumerated
list. Findings:
  8083  mlx_vlm.server   Qwen3.6 LLM + UI-TARS/VLM vision   LIVE
  8084  whisper_server   STT                                LIVE  ← was missing
  8085  mlx_audio.server TTS                                LIVE  ← was missing
  8090  codec-dashboard                                     LIVE
  8094  pilot-runner                                        LIVE
  9222  Chrome DevTools CDP (routes/cdp.py + chrome skills) on-demand ← added
  9223  pilot CDP                                           on-demand
  5678  n8n                                                 LIVE
  8081/8082  FREE — Qwen+vision consolidated onto 8083, slots vacated

PROTECTED_PORTS {8083,8090,8094,9223,5678} → {8083,8084,8085,8090,8094,9222,9223,5678}.
8084/8085 were live core services (STT/TTS) absent from the denylist; 9222 is
the Chrome CDP sibling of 9223 that the chrome skills probe. 8081/8082 left out
(genuinely free now) but documented inline so a future reader knows they were
checked, not forgotten.

This is belt-and-suspenders only — allocate_port() already skips any live-bound
port at call time, the serve range (8110-8119) never intersects these, and
stop() refuses anything outside the cookbook- namespace regardless. The point
is an accurate static denylist so a manual pm2 delete reads off the right list.
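A minimal sketch of allocate_port() with the widened denylist, assuming the pm2 jlist and served.json ports have already been collected into iterables and that "live-bound" means a TCP connect to 127.0.0.1 succeeds. The `probe` parameter (injected so tests need no real sockets), the `preferred` handling and the signature are assumptions, not serve.py's.

```python
import socket
from typing import Callable, Iterable, Optional

SERVE_RANGE = range(8110, 8120)  # 8110-8119 inclusive
# Widened static denylist from this commit.
PROTECTED_PORTS = frozenset({8083, 8084, 8085, 8090, 8094, 9222, 9223, 5678})


def port_is_bound(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Live probe: True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def allocate_port(pm2_ports: Iterable[int], served_ports: Iterable[int],
                  probe: Callable[[int], bool] = port_is_bound,
                  preferred: Optional[int] = None) -> Optional[int]:
    """First free port in SERVE_RANGE, or None; a preferred port is honoured only if free."""
    taken = set(pm2_ports) | set(served_ports)
    candidates = [preferred] if preferred is not None else list(SERVE_RANGE)
    for port in candidates:
        if port not in SERVE_RANGE or port in PROTECTED_PORTS:
            continue  # never leave the range; the denylist is belt-and-suspenders
        if port in taken or probe(port):
            continue  # claimed by pm2 jlist / served.json, or live-bound right now
        return port
    return None
```

Because SERVE_RANGE and PROTECTED_PORTS are disjoint, the denylist check never fires for in-range ports; it only matters if the range or a requested port is ever misconfigured, which is exactly the belt-and-suspenders role described above.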

Tests: protected-port parametrize widened to all 8; new
test_protected_set_covers_live_core_stack pins the live set. 46 cookbook tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(cookbook): cover download.py with mocked HF spawn (no network in CI)

Adds TestDownload (4 tests): start() writes 'starting' + spawns a DETACHED
python -c snapshot_download runner (asserts the argv, never executes it),
status() reconciles a dead pid to 'interrupted', not_started default, and
idempotent start when a job is already running. subprocess.Popen is mocked
throughout — the HF network path never runs under CI (pre-empts the
local↔CI divergence on the download path). 50 cookbook tests.
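The dead-pid reconciliation that status() is tested for, sketched assuming one JSON status file per job holding `state` and `pid`; `read_status`, `pid_alive`, `ACTIVE_STATES` and the file layout are hypothetical. `os.kill(pid, 0)` sends no signal and only checks whether the pid exists, which keeps the check cheap on every skill call.

```python
import json
import os
from pathlib import Path

ACTIVE_STATES = {"starting", "running"}


def pid_alive(pid: int) -> bool:
    """Signal 0 sends nothing; it only checks that the pid exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def read_status(path: Path) -> dict:
    """Read a job's status file, downgrading a dead active job to 'interrupted'."""
    if not path.exists():
        return {"state": "not_started"}
    status = json.loads(path.read_text())
    pid = status.get("pid")
    # Only reconcile when a real pid was recorded; pid 0 would address the process group.
    if status.get("state") in ACTIVE_STATES and isinstance(pid, int) and pid > 0:
        if not pid_alive(pid):
            status["state"] = "interrupted"
            path.write_text(json.dumps(status))  # persist so later skill calls agree
    return status
```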

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>