feat(cookbook): local-model lifecycle management (scan/recommend/download/serve/list/stop) by AVADSA25 · Pull Request #173 · AVADSA25/codec

AVADSA25 · 2026-06-01T09:26:39Z

Summary

CODEC Cookbook (Build Brief v2) — local-model lifecycle management for the M1 Ultra, implemented as CODEC skills + a non-skill helper package so the SkillRegistry hot-loads them and codec-mcp-http exposes the read-only ones automatically. Scan hardware → recommend models that fit → download → serve under PM2 on dedicated ports → list → stop.

Additive only — the entire design goal is to be structurally incapable of disturbing the running stack.

Hard safety constraints (enforced + tested)

Serve range 8110-8119. Allocation skips any port a live socket probe + pm2 jlist + our own served.json say is taken. Protected ports (8083/8090/8094/9223/5678) are out of range and explicitly skipped.
stop() is guarded four ways, in order: not-a-process-we-started → refused; resolved port is protected → refused; pm2 name not cookbook- → refused; confirm != True → dry-run. Only then pm2 delete. 13 stop-guard tests cover every protected port, every core service name (qwen3.6, codec-dashboard, pilot-runner, n8n), bound-but-not-ours ports, and the confirm gate.
Never docker stop/rm, never changes a service's port, never restarts/stops a non-cookbook process.
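
The ordering of the four stop() guards can be sketched as below. This is an illustration only, not the code in serve.py: the function name `stop_decision`, the shape of the served.json data (a dict keyed by pm2 name with a `port` field), and the return tuple are all assumptions made for the example. The protected-port set is the one listed in this description.

```python
# Hypothetical sketch of the layered stop-guard; names and data shapes are assumed.
PROTECTED_PORTS = {8083, 8090, 8094, 9223, 5678}  # as listed in the PR description
COOKBOOK_PREFIX = "cookbook-"


def stop_decision(name, served, confirm=False):
    """Return (action, reason); action is 'refused', 'dry-run', or 'delete'.

    `served` stands in for the parsed served.json: {pm2_name: {"port": int}}.
    """
    entry = served.get(name)
    # Guard 1: only processes we recorded ourselves are eligible.
    if entry is None:
        return ("refused", "not a process we started")
    # Guard 2: never touch a protected port, even if it is somehow recorded.
    if entry["port"] in PROTECTED_PORTS:
        return ("refused", "resolved port is protected")
    # Guard 3: the pm2 name must live in the cookbook- namespace.
    if not name.startswith(COOKBOOK_PREFIX):
        return ("refused", "pm2 name is not cookbook-")
    # Guard 4: without an explicit confirm=True, report what would happen.
    if confirm is not True:
        return ("dry-run", f"would run: pm2 delete {name}")
    # Only here would the real helper shell out to pm2.
    return ("delete", f"pm2 delete {name}")
```

Returning a decision instead of calling pm2 directly keeps each guard testable without a running PM2 daemon, which is in the spirit of the mocked stop-guard tests mentioned above.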

Structure

codec_cookbook/ (helper package — keeps OS/subprocess/PM2/network out of the AST-gated skill files):

probe.py — read-only hw + vm_stat + pm2 jlist + socket port probe + mlx version gate
catalog.py + catalog.json — 5 verified MLX models (the live qwen3.6@8083 is deliberately not included)
fit.py — Hub-derived weight size + real-config.json KV cache, (w+kv)×1.10+1.5; available_gb = total−24−resident, fits (margin 8), recommend
serve.py — interpreter discovery (config.json:cookbook.mlx_python → sys.executable), port allocation, PM2 launch with the corrected commands (python -m mlx_lm server … --max-tokens 16384; llama-server … -ngl 999), /v1/models health poll, served.json, and the stop-guard
download.py — detached HF snapshot_download jobs + file-based status (survives across skill calls; reconciles dead pids)
args.py — parse model_id/context/flags/port/role from the task string
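
The fit arithmetic described for fit.py can be sketched as follows. The constants (×1.10, +1.5, −24, margin 8) come from the description above; the function names, the KV-cache formula (keys plus values, per layer, per KV head), the 2-byte cache element size, the GiB divisor, and the reading of "margin 8" as required headroom are assumptions for illustration and may differ from the real helper.

```python
# Hypothetical sketch of the memory-fit arithmetic; not the actual fit.py.

def kv_cache_gb(layers, kv_heads, head_dim, context, bytes_per_value=2):
    """Approximate KV cache size in GiB: keys + values for every layer and KV head."""
    total_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_value
    return total_bytes / 1024**3


def estimate_gb(weights_gb, kv_gb):
    """Footprint estimate from the description: (w + kv) x 1.10 + 1.5."""
    return (weights_gb + kv_gb) * 1.10 + 1.5


def available_gb(total_gb, resident_gb):
    """Memory left for a new model: total minus a 24 GB reserve minus resident use."""
    return total_gb - 24 - resident_gb


def fits(estimated_gb, avail_gb, margin_gb=8):
    """Assumed reading of 'margin 8': require 8 GB of headroom beyond the estimate."""
    return estimated_gb + margin_gb <= avail_gb


# Example with made-up inputs (not one of the catalog models):
kv = kv_cache_gb(layers=48, kv_heads=4, head_dim=128, context=32768)
need = estimate_gb(weights_gb=16.0, kv_gb=kv)
print(round(need, 1), fits(need, available_gb(total_gb=128, resident_gb=30)))
```

Using the KV-head count rather than the attention-head count is what makes the estimate sensitive to GQA models, which is presumably why the test plan mentions a GQA fallback.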

Six thin skills (skills/cookbook_*.py), each run(task, app, ctx) -> str (CODEC's real contract), parsing args + formatting helper output. All pass the load-time AST safety gate (import only codec_cookbook + re). Manifest regenerated → 82 built-ins.
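
A thin skill of this shape might look like the sketch below. Only the `run(task, app, ctx) -> str` contract and the `SKILL_TRIGGERS` / `SKILL_MCP_EXPOSE` names are taken from this PR; `SKILL_NAME`, `SKILL_DESCRIPTION`, the regexes, the example model id, and `_serve_stub` (a stand-in for a codec_cookbook helper, inlined so the example runs on its own) are hypothetical.

```python
# Hypothetical sketch of a thin cookbook skill; the real skills import codec_cookbook.
import re

SKILL_NAME = "cookbook_serve_sketch"            # assumed metadata name
SKILL_DESCRIPTION = "Illustrative serve skill"  # assumed metadata name
SKILL_TRIGGERS = ["serve model"]
SKILL_MCP_EXPOSE = False                        # mutating skills stay off MCP


def parse_task(task):
    """Pull model_id, context, and port out of a free-form task string."""
    model = re.search(r"\b([\w.-]+/[\w.-]+)\b", task)
    context = re.search(r"\bcontext[ =:]+(\d+)", task, re.IGNORECASE)
    port = re.search(r"\bport[ =:]+(\d+)", task, re.IGNORECASE)
    return {
        "model_id": model.group(1) if model else None,
        "context": int(context.group(1)) if context else None,
        "port": int(port.group(1)) if port else None,
    }


def _serve_stub(model_id, context, port):
    """Stand-in for the helper call; returns a structured dict like the helpers do."""
    return {"ok": True, "model_id": model_id, "context": context, "port": port}


def run(task, app, ctx):
    """Sync skill entry point: parse args, call the helper, format a string."""
    args = parse_task(task)
    if args["model_id"] is None:
        return "No model id found in task."
    result = _serve_stub(**args)
    return (
        f"Would serve {result['model_id']} "
        f"(context={result['context']}, port={result['port']})"
    )
```

Keeping the skill to parsing and formatting is what lets it pass an import-restricted AST gate, while the structured dict from the helper stays easy to assert on in tests.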

Deviations from the brief (flagged)

Skill contract. The brief's async def run(model_id=…, context_length=…) isn't CODEC's contract — skills are sync run(task, app, ctx) -> str. So skills parse structured args from task and the helpers return the structured dicts (testable); skills format them to strings.
SKILL_TAGS isn't extracted by the registry (only NAME/DESCRIPTION/TRIGGERS/MCP_EXPOSE/observation) — included for docs, but SKILL_TRIGGERS is what's functional.
MCP exposure — operator decision. Read-only scan/recommend/list → SKILL_MCP_EXPOSE=True. The three mutating skills (serve/download/stop) → False, matching the pm2_control precedent (usable via dashboard chat / voice / local, not MCP). The brief said "codec-mcp-http exposes them automatically" — flipping the mutating ones on is a one-line SKILL_MCP_EXPOSE=True + adding them to _HTTP_BLOCKED (so stdio-MCP works but claude.ai-over-HTTP can't drive process lifecycle). I left that to you since it touches the _HTTP_BLOCKED security boundary — see the question below.

Test plan

tests/test_cookbook.py — 42 tests: catalog, fit golden anchors (30B→17.2, 80B→42) + KV math + GQA fallback + offline anchor fallback, args parsing, port-allocation bounds, 13 stop-guard cases, serve→list→stop integration (PM2 + health mocked), skill discovery + refusal/force behavior, corrected MLX command shape
python3.13 -m pytest --ignore=tests/test_skills.py -q → 2,140 passed, 77 skipped
test_skills.py: only the 4 pre-existing pilot_* failures (no cookbook failures)
AST safety gate: all 6 skill files clean; registry discovers all 6 with correct MCP exposure + triggers
ruff check: 0 issues; manifest drift-check passes (82 skills)
Environment check: huggingface_hub 1.14.0, mlx_lm 0.31.3 (≥ 0.25.2 gate)
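
The "≥ 0.25.2 gate" on mlx_lm amounts to a numeric version comparison. A minimal sketch, assuming plain dotted versions; the function names are hypothetical and the real probe.py may parse versions differently:

```python
# Hypothetical sketch of a minimum-version gate; not the actual probe.py logic.
import re


def version_tuple(version):
    """Turn '0.31.3' into (0, 31, 3); non-numeric suffixes on a part are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def passes_gate(installed, minimum="0.25.2"):
    """Compare numerically so that 0.9.9 is correctly below 0.25.2."""
    return version_tuple(installed) >= version_tuple(minimum)


print(passes_gate("0.31.3"))
```

Comparing tuples of integers avoids the string-comparison trap where "0.9.9" sorts after "0.25.2". In practice the installed version could be read with `importlib.metadata.version`, assuming the distribution name is known.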

Deferred to Phase 1.5 per the brief: SSH remote-serve to the 192.168.1.167 Linux box.

🤖 Generated with Claude Code

…load/serve/list/stop)

CODEC Cookbook, Build Brief v2. Adds local-model lifecycle management for the M1 Ultra as CODEC skills + a non-skill helper package, so the SkillRegistry hot-loads them and codec-mcp-http exposes the read-only ones automatically.

Structurally incapable of disturbing the running stack:

* Serve range is 8110-8119; allocation skips any port a live socket probe + pm2 jlist + our served.json say is taken. Protected ports (8083/8090/8094/9223/5678) are out of range AND explicitly skipped.
* stop() only ever deletes a process we recorded in served.json, named `cookbook-…`, not on a protected port, and only with confirm=True. Anything else → layered refusal (or a dry-run). Verified by 13 stop-guard tests.
* Nothing issues docker stop/rm, changes a service's port, or restarts/stops a non-cookbook process.

Helper package codec_cookbook/ (NOT skills, so the OS/subprocess/PM2/network work stays out of the AST-gated skill files):

  probe.py                    read-only hw + vm_stat + pm2 jlist + socket port probe + mlx ver
  catalog.py + catalog.json   5 verified MLX models (qwen3.6@8083 NOT included)
  fit.py                      Hub-derived weight size + real-config KV cache + (w+kv)*1.10+1.5; available_gb (total−24−resident), fits (margin 8), recommend
  serve.py                    interpreter discovery (config→sys.executable), port alloc, PM2 launch (corrected `python -m mlx_lm server --max-tokens 16384` / `llama-server -ngl 999`), /v1/models health poll, served.json, and the STOP-GUARD
  download.py                 detached HF snapshot_download jobs + file-based status
  args.py                     parse model_id/context/flags/port/role from the task string

Six thin skills (skills/cookbook_*.py), each `run(task, app, ctx) -> str` (CODEC's real contract; the brief's async-kwargs shape was adapted): parse args, call a helper, format the result. All pass the load-time AST safety gate (they import only codec_cookbook + re). Manifest regenerated (82 built-ins).

MCP exposure: read-only scan/recommend/list = SKILL_MCP_EXPOSE True; the three process/network-mutating skills (serve/download/stop) = False, matching the pm2_control precedent (local + dashboard + voice, not MCP). The brief wanted all exposed over MCP (see PR notes); flipping the mutating ones on is a one-line + _HTTP_BLOCKED decision left to the operator.

Tests: tests/test_cookbook.py, 42 tests (catalog, fit golden anchors + KV + offline fallback, args, port allocation, 13 stop-guard cases, serve→list→stop integration with PM2/health mocked, skill discovery + refusal behavior). Full suite: 2,140 passed / 77 skipped. ruff clean.

Deferred to Phase 1.5 per brief: SSH remote-serve to the Linux box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rged at stuck head) (#174)

* fix(cookbook): protected-port set = verified live core stack (add 8084/8085/9222)

Probed the actual box with lsof rather than trusting the brief's enumerated list. Findings:

  8083        mlx_vlm.server     Qwen3.6 LLM + UI-TARS/VLM vision                LIVE
  8084        whisper_server     STT                                             LIVE        ← was missing
  8085        mlx_audio.server   TTS                                             LIVE        ← was missing
  8090        codec-dashboard                                                    LIVE
  8094        pilot-runner                                                       LIVE
  9222        Chrome DevTools    CDP (routes/cdp.py + chrome skills)             on-demand   ← added
  9223        pilot              CDP                                             on-demand
  5678        n8n                                                                LIVE
  8081/8082   FREE (Qwen+vision consolidated onto 8083, slots vacated)

PROTECTED_PORTS {8083,8090,8094,9223,5678} → {8083,8084,8085,8090,8094,9222,9223,5678}.

8084/8085 were live core services (STT/TTS) absent from the denylist; 9222 is the Chrome CDP sibling of 9223 that the chrome skills probe. 8081/8082 left out (genuinely free now) but documented inline so a future reader knows they were checked, not forgotten.

This is belt-and-suspenders only: allocate_port() already skips any live-bound port at call time, the serve range (8110-8119) never intersects these, and stop() refuses anything outside the cookbook- namespace regardless. The point is an accurate static denylist so a manual pm2 delete reads off the right list.

Tests: protected-port parametrize widened to all 8; new test_protected_set_covers_live_core_stack pins the live set. 46 cookbook tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(cookbook): cover download.py with mocked HF spawn (no network in CI)

Adds TestDownload (4 tests): start() writes 'starting' + spawns a DETACHED python -c snapshot_download runner (asserts the argv, never executes it), status() reconciles a dead pid to 'interrupted', not_started default, and idempotent start when a job is already running. subprocess.Popen is mocked throughout, so the HF network path never runs under CI (pre-empts the local↔CI divergence on the download path). 50 cookbook tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
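
The port-allocation behaviour described in these commits (serve range 8110-8119, skip anything reported by a live socket probe, pm2 jlist, or served.json, plus the static denylist) can be sketched as below. `allocate_port` is named in the commit message, but its signature here, the `port_is_bound` helper, and the injected `probe` parameter are assumptions for illustration.

```python
# Hypothetical sketch of port allocation; the real serve.py may be structured differently.
import socket

SERVE_RANGE = range(8110, 8120)  # 8110-8119 inclusive
PROTECTED_PORTS = {8083, 8084, 8085, 8090, 8094, 9222, 9223, 5678}  # widened set from #174


def port_is_bound(port, host="127.0.0.1"):
    """Live probe: True if something accepts a TCP connection on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.2)
        return sock.connect_ex((host, port)) == 0


def allocate_port(pm2_ports, served_ports, probe=port_is_bound):
    """Return the first free port in the serve range, or None if all are taken.

    pm2_ports / served_ports stand in for ports gathered from pm2 jlist and served.json.
    """
    taken = set(pm2_ports) | set(served_ports) | PROTECTED_PORTS
    for port in SERVE_RANGE:
        # Skip ports claimed by bookkeeping first, then confirm with the live probe.
        if port in taken or probe(port):
            continue
        return port
    return None
```

Passing the probe as a parameter lets a test substitute a fake and avoid opening real sockets, similar to how the download tests mock subprocess.Popen.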

AVADSA25 merged commit 112db67 into main Jun 1, 2026
1 check passed

AVADSA25 mentioned this pull request Jun 1, 2026

fix(cookbook): re-land stranded port-fix + download tests (PR #173 merged at stuck head) #174

Merged

3 tasks

AVADSA25 merged 1 commit into main from cookbook-phase1

2 participants
