feat(cookbook): local-model lifecycle management (scan/recommend/download/serve/list/stop) #173
Merged
…load/serve/list/stop)
CODEC Cookbook — Build Brief v2. Adds local-model lifecycle management for the
M1 Ultra as CODEC skills + a non-skill helper package, so the SkillRegistry
hot-loads them and codec-mcp-http exposes the read-only ones automatically.
Structurally incapable of disturbing the running stack:
* Serve range is 8110-8119; allocation skips any port that a live socket
probe, pm2 jlist, or our served.json reports as taken. Protected ports
(8083/8090/8094/9223/5678) are out of range AND explicitly skipped.
* stop() only ever deletes a process we recorded in served.json, named
`cookbook-…`, not on a protected port, and only with confirm=True. Anything
else → layered refusal (or a dry-run). Verified by 13 stop-guard tests.
* Nothing issues docker stop/rm, changes a service's port, or restarts/stops
a non-cookbook process.
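The layered refusal in stop() can be sketched roughly as below. This is a minimal illustration, not the code in serve.py: the `ServedRecord` shape, the `check_stop` name, and the verdict strings are hypothetical. Only the four checks themselves (recorded in served.json, not on a protected port, `cookbook-` name prefix, confirm=True) come from this PR, applied in the order given in the PR summary, and the port set is the one listed in this commit.

```python
from dataclasses import dataclass
from typing import Optional

# Protected ports as enumerated in this commit message.
PROTECTED_PORTS = frozenset({8083, 8090, 8094, 9223, 5678})
NAME_PREFIX = "cookbook-"


@dataclass(frozen=True)
class ServedRecord:
    """Hypothetical shape of one served.json entry."""
    pm2_name: str
    port: int


def check_stop(record: Optional[ServedRecord], confirm: bool = False) -> str:
    """Return a verdict; only "delete" would permit a pm2 delete."""
    # 1. Anything we did not record ourselves is refused outright.
    if record is None:
        return "refused: not a process we started"
    # 2. A protected port is refused even if a record exists.
    if record.port in PROTECTED_PORTS:
        return "refused: protected port"
    # 3. The pm2 name must live in the cookbook- namespace.
    if not record.pm2_name.startswith(NAME_PREFIX):
        return "refused: not in the cookbook- namespace"
    # 4. Without an explicit confirm=True the call is only a dry-run.
    if confirm is not True:
        return "dry-run: pass confirm=True to delete"
    return "delete"
```

Returning a verdict instead of calling pm2 directly keeps each layer testable on its own, which is presumably what the 13 stop-guard tests rely on.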
Helper package codec_cookbook/ (NOT skills — so the OS/subprocess/PM2/network
work stays out of the AST-gated skill files):
probe.py     read-only hw + vm_stat + pm2 jlist + socket port probe + mlx ver
catalog.py   + catalog.json: 5 verified MLX models (qwen3.6@8083 NOT included)
fit.py       Hub-derived weight size + real-config KV cache + (w+kv)*1.10+1.5;
             available_gb (total−24−resident), fits (margin 8), recommend
serve.py     interpreter discovery (config→sys.executable), port alloc, PM2
             launch (corrected `python -m mlx_lm server --max-tokens 16384`
             / `llama-server -ngl 999`), /v1/models health poll, served.json,
             and the STOP-GUARD
download.py  detached HF snapshot_download jobs + file-based status
args.py      parse model_id/context/flags/port/role from the task string
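The fit arithmetic listed for fit.py can be worked through in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the KV-cache formula is the standard one for grouped-query attention (2 tensors × layers × KV heads × head dim × context × bytes per element) rather than a copy of fit.py, a GB is taken as 2^30 bytes, and "margin 8" is read as requiring 8 GB of headroom on top of the estimate.

```python
GIB = 1024 ** 3  # assumption: sizes are reported in GiB


def kv_cache_gb(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Standard KV-cache size: keys and values for every layer and position."""
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * context_len * bytes_per_elem)
    return total_bytes / GIB


def footprint_gb(weights_gb, kv_gb):
    """(w + kv) * 1.10 + 1.5, the estimate quoted in the commit message."""
    return (weights_gb + kv_gb) * 1.10 + 1.5


def available_gb(total_gb, resident_gb):
    """total - 24 - resident, as quoted in the commit message."""
    return total_gb - 24 - resident_gb


def fits(needed_gb, avail_gb, margin_gb=8):
    """Assumed reading of "margin 8": keep 8 GB free beyond the estimate."""
    return needed_gb + margin_gb <= avail_gb
```

With illustrative numbers (not taken from the catalog), `kv_cache_gb(48, 4, 128, 32768)` is 3.0, and `footprint_gb(20.0, 3.0)` comes to about 26.8; on a machine with 128 GB total and 40 GB resident, `available_gb` gives 64, so that model would fit.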
Six thin skills (skills/cookbook_*.py) — each `run(task, app, ctx) -> str`
(CODEC's real contract; the brief's async-kwargs shape was adapted), parse args,
call a helper, format the result. All pass the load-time AST safety gate (they
import only codec_cookbook + re). Manifest regenerated (82 built-ins).
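A thin skill in this shape might look like the sketch below. Only the `run(task, app, ctx) -> str` contract, the parse/call/format split, and the `SKILL_TRIGGERS` / `SKILL_MCP_EXPOSE` names come from this PR. The `SKILL_NAME` and `SKILL_DESCRIPTION` spellings, the `key=value` task syntax, the example model id, and the `_serve_stub` helper (standing in for a real codec_cookbook call so the snippet runs on its own) are assumptions.

```python
import re

SKILL_NAME = "cookbook_serve"              # assumed constant name
SKILL_DESCRIPTION = "Serve a local model"  # assumed constant name
SKILL_TRIGGERS = ["serve model"]           # illustrative trigger
SKILL_MCP_EXPOSE = False                   # mutating skill, so not exposed

# Hypothetical task syntax: "<org>/<model> context=<int> port=<int>".
_MODEL = re.compile(r"\b([\w.-]+/[\w.-]+)\b")
_KEY_VALUE = re.compile(r"\b(context|port)=(\d+)\b")


def parse_task(task: str) -> dict:
    """Pull structured args out of the free-form task string."""
    args = {"model_id": None, "context": None, "port": None}
    match = _MODEL.search(task)
    if match:
        args["model_id"] = match.group(1)
    for key, value in _KEY_VALUE.findall(task):
        args[key] = int(value)
    return args


def _serve_stub(model_id, context, port) -> dict:
    """Stand-in for a codec_cookbook helper; returns a structured dict."""
    return {"ok": model_id is not None, "model_id": model_id,
            "context": context, "port": port}


def run(task, app, ctx) -> str:
    """Sync skill entry point: parse args, call the helper, format a string."""
    result = _serve_stub(**parse_task(task))
    if not result["ok"]:
        return "No model id found in the task."
    return (f"Serving {result['model_id']} "
            f"(context={result['context']}, port={result['port']})")
```

Keeping the helper's return value a plain dict and doing all string formatting in `run` is what lets the helper be tested without going through the skill layer.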
MCP exposure: read-only scan/recommend/list = SKILL_MCP_EXPOSE True; the three
process/network-mutating skills (serve/download/stop) = False, matching the
pm2_control precedent (local + dashboard + voice, not MCP). The brief wanted all
exposed over MCP — see PR notes; flipping the mutating ones on is a one-line +
_HTTP_BLOCKED decision left to the operator.
Tests: tests/test_cookbook.py — 42 tests (catalog, fit golden anchors + KV +
offline fallback, args, port allocation, 13 stop-guard cases, serve→list→stop
integration with PM2/health mocked, skill discovery + refusal behavior). Full
suite: 2,140 passed / 77 skipped. ruff clean.
Deferred to Phase 1.5 per brief: SSH remote-serve to the Linux box.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
AVADSA25 added a commit that referenced this pull request Jun 1, 2026
…rged at stuck head) (#174)
* fix(cookbook): protected-port set = verified live core stack (add 8084/8085/9222)
Probed the actual box with lsof rather than trusting the brief's enumerated
list. Findings:
8083       mlx_vlm.server: Qwen3.6 LLM + UI-TARS/VLM vision      LIVE
8084       whisper_server: STT                                   LIVE       ← was missing
8085       mlx_audio.server: TTS                                 LIVE       ← was missing
8090       codec-dashboard                                       LIVE
8094       pilot-runner                                          LIVE
9222       Chrome DevTools CDP (routes/cdp.py + chrome skills)   on-demand  ← added
9223       pilot CDP                                             on-demand
5678       n8n                                                   LIVE
8081/8082  FREE (Qwen+vision consolidated onto 8083, slots vacated)
PROTECTED_PORTS {8083,8090,8094,9223,5678} →
{8083,8084,8085,8090,8094,9222,9223,5678}. 8084/8085 were live core services
(STT/TTS) absent from the denylist; 9222 is the Chrome CDP sibling of 9223
that the chrome skills probe. 8081/8082 left out (genuinely free now) but
documented inline so a future reader knows they were checked, not forgotten.
This is belt-and-suspenders only: allocate_port() already skips any live-bound
port at call time, the serve range (8110-8119) never intersects these, and
stop() refuses anything outside the cookbook- namespace regardless. The point
is an accurate static denylist so a manual pm2 delete reads off the right list.
Tests: protected-port parametrize widened to all 8; new
test_protected_set_covers_live_core_stack pins the live set. 46 cookbook tests
pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(cookbook): cover download.py with mocked HF spawn (no network in CI)
Adds TestDownload (4 tests): start() writes 'starting' + spawns a DETACHED
python -c snapshot_download runner (asserts the argv, never executes it),
status() reconciles a dead pid to 'interrupted', not_started default, and
idempotent start when a job is already running. subprocess.Popen is mocked
throughout, so the HF network path never runs under CI (pre-empts the local↔CI
divergence on the download path). 50 cookbook tests.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
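The relationship between the static denylist and call-time allocation can be illustrated as follows. A rough sketch only: `allocate_port` is named in the commit above, but its signature here, the `port_is_bound` probe, and the `known_taken` parameter (standing in for whatever pm2 jlist and served.json report) are assumptions, not the serve.py implementation.

```python
import socket
from typing import Callable, Iterable, Optional

SERVE_RANGE = range(8110, 8120)  # 8110-8119 inclusive
# Widened protected set from the follow-up commit.
PROTECTED_PORTS = frozenset({8083, 8084, 8085, 8090, 8094, 9222, 9223, 5678})


def port_is_bound(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Live socket probe: True if something accepts TCP connections."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def allocate_port(
    known_taken: Iterable[int] = (),
    is_bound: Callable[[int], bool] = port_is_bound,
) -> Optional[int]:
    """Return the first free port in the serve range, or None if exhausted."""
    taken = set(known_taken)
    for port in SERVE_RANGE:
        # Belt and suspenders: the denylist never intersects the range,
        # but it is checked anyway before the bookkeeping and live probe.
        if port in PROTECTED_PORTS or port in taken or is_bound(port):
            continue
        return port
    return None
```

Passing the probe in as `is_bound` lets tests substitute a fake instead of opening sockets, in the same spirit as the mocked PM2 and health checks described in the test plan.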
Summary
CODEC Cookbook (Build Brief v2): local-model lifecycle management for the M1 Ultra, implemented as CODEC skills + a non-skill helper package so the `SkillRegistry` hot-loads them and `codec-mcp-http` exposes the read-only ones automatically. Scan hardware → recommend models that fit → download → serve under PM2 on dedicated ports → list → stop.
Additive only: the entire design goal is to be structurally incapable of disturbing the running stack.
Hard safety constraints (enforced + tested)
* Serve range is 8110-8119; allocation skips any port that a live socket probe + `pm2 jlist` + our own `served.json` say is taken. Protected ports (8083/8090/8094/9223/5678) are out of range and explicitly skipped.
* `stop()` is guarded four ways, in order: not-a-process-we-started → refused; resolved port is protected → refused; pm2 name not `cookbook-` → refused; `confirm != True` → dry-run. Only then `pm2 delete`. 13 stop-guard tests cover every protected port, every core service name (`qwen3.6`, `codec-dashboard`, `pilot-runner`, `n8n`), bound-but-not-ours ports, and the confirm gate.
* Never issues `docker stop/rm`, never changes a service's port, never restarts/stops a non-cookbook process.
Structure
`codec_cookbook/` (helper package; keeps OS/subprocess/PM2/network out of the AST-gated skill files):
* `probe.py`: read-only hw + `vm_stat` + `pm2 jlist` + socket port probe + mlx version gate
* `catalog.py` + `catalog.json`: 5 verified MLX models (the live `qwen3.6@8083` is deliberately not included)
* `fit.py`: Hub-derived weight size + real-`config.json` KV cache, `(w+kv)×1.10+1.5`; `available_gb = total−24−resident`, `fits` (margin 8), `recommend`
* `serve.py`: interpreter discovery (`config.json:cookbook.mlx_python` → `sys.executable`), port allocation, PM2 launch with the corrected commands (`python -m mlx_lm server … --max-tokens 16384`; `llama-server … -ngl 999`), `/v1/models` health poll, `served.json`, and the stop-guard
* `download.py`: detached HF `snapshot_download` jobs + file-based status (survives across skill calls; reconciles dead pids)
* `args.py`: parse `model_id`/context/flags/port/role from the task string
Six thin skills (`skills/cookbook_*.py`), each `run(task, app, ctx) -> str` (CODEC's real contract), parsing args + formatting helper output. All pass the load-time AST safety gate (import only `codec_cookbook` + `re`). Manifest regenerated → 82 built-ins.
Deviations from the brief (flagged)
* `async def run(model_id=…, context_length=…)` isn't CODEC's contract; skills are sync `run(task, app, ctx) -> str`. So skills parse structured args from `task` and the helpers return the structured dicts (testable); skills format them to strings.
* `SKILL_TAGS` isn't extracted by the registry (only NAME/DESCRIPTION/TRIGGERS/MCP_EXPOSE/observation); included for docs, but `SKILL_TRIGGERS` is what's functional.
* `scan`/`recommend`/`list` → `SKILL_MCP_EXPOSE=True`. The three mutating skills (`serve`/`download`/`stop`) → `False`, matching the `pm2_control` precedent (usable via dashboard chat / voice / local, not MCP). The brief said "codec-mcp-http exposes them automatically"; flipping the mutating ones on is a one-line `SKILL_MCP_EXPOSE=True` + adding them to `_HTTP_BLOCKED` (so stdio-MCP works but claude.ai-over-HTTP can't drive process lifecycle). I left that to you since it touches the `_HTTP_BLOCKED` security boundary; see the question below.
Test plan
* `tests/test_cookbook.py`: 42 tests covering catalog, fit golden anchors (30B→17.2, 80B→42) + KV math + GQA fallback + offline anchor fallback, args parsing, port-allocation bounds, 13 stop-guard cases, serve→list→stop integration (PM2 + health mocked), skill discovery + refusal/force behavior, corrected MLX command shape
* `python3.13 -m pytest --ignore=tests/test_skills.py -q` → 2,140 passed, 77 skipped
* `test_skills.py`: only the 4 pre-existing `pilot_*` failures (no cookbook failures)
* `ruff check`: 0 issues; manifest drift-check passes (82 skills)
* `huggingface_hub 1.14.0`, `mlx_lm 0.31.3` (≥ 0.25.2 gate)
Deferred to Phase 1.5 per the brief: SSH remote-serve to the 192.168.1.167 Linux box.
🤖 Generated with Claude Code