Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ docs/ API.md, MCP_HTTP_SETUP.md, CONTEXT_REPORT.md, desig

Other engine modules (`codec_overlays`, `codec_metrics`, `codec_logging`, `codec_gdocs`, `codec_google_auth`, `codec_cdp`, `codec_llm_proxy`, `codec_retry`, `codec_alerts`, `codec_search`, `codec_textassist`, `codec_watcher`, `codec_watchdog`) are internal helpers — read them when you need them, but they're not part of the navigation surface for an agent making structural changes. (Keyboard handling — wake word, F13 toggle, F18 voice, double-tap — lives **inline in `codec.py`** in the `codec` PM2 process; the old standalone `codec_keyboard.py` was deleted as a dead duplicate per A-8.)

**Canonical LLM + vision helpers (PR-3E, A-11/A-12).** `codec_vision.py` is the SINGLE source for screen-vision (`describe_sync` / `describe_async`, Gemini-flash → local-Qwen-VL fallback, config read live from `codec_config`) — used by `codec.py`, `codec_voice`, `codec_session`. `codec_llm.py` is the canonical chat/completions caller (`call()` + `strip_think`/`extract_content` — headers, Bearer auth, `enable_thinking`, `<think>` strip, `choices/reasoning` parse, retry+backoff, never-raises). NOTE: `codec_llm_proxy.py` is a priority *queue* (semaphore), NOT an HTTP caller — don't confuse the two. A-12 is migrating the ~45 inline `chat/completions` sites onto `codec_llm` in phased tranches. Done: `codec_llm.call()` (non-stream; + `raise_on_error=True` raising `codec_llm.LLMError` for fail-loud callers) + `stream()` (sync SSE generator, yields raw deltas, never-raises); migrated sites = codec.py voice-reply, `codec_session.qwen_call` + `qwen_stream`, `codec_compaction`, `codec_dictate`, `codec_textassist`, the regen script, and `codec_agent_plan`/`codec_agent_runner` `_qwen_chat` (adapter maps `LLMError` → their public `QwenUnavailableError`). Pending tranches: an async `astream()` for voice `_stream_qwen` + agents `Agent.run` (queue stays at the call site — `codec_llm` never owns the semaphore), dashboard (4 non-stream + the `[SKILL:…]` stream tag-machine, which keeps its own parser and consumes only `stream()`'s raw tokens), bridges, and a skills tranche.
**Canonical LLM + vision helpers (PR-3E, A-11/A-12).** `codec_vision.py` is the SINGLE source for screen-vision (`describe_sync` / `describe_async`, Gemini-flash → local-Qwen-VL fallback, config read live from `codec_config`) — used by `codec.py`, `codec_voice`, `codec_session`. `codec_llm.py` is the canonical chat/completions caller (`call()` + `strip_think`/`extract_content` — headers, Bearer auth, `enable_thinking`, `<think>` strip, `choices/reasoning` parse, retry+backoff, never-raises). NOTE: `codec_llm_proxy.py` is a priority *queue* (semaphore), NOT an HTTP caller — don't confuse the two. A-12 is migrating the ~45 inline `chat/completions` sites onto `codec_llm` in phased tranches. Done: `codec_llm.call()` (non-stream; + `raise_on_error=True` raising `codec_llm.LLMError` for fail-loud callers) + `stream()` (sync SSE generator, yields raw deltas, never-raises); migrated sites = codec.py voice-reply, `codec_session.qwen_call` + `qwen_stream`, `codec_compaction`, `codec_dictate`, `codec_textassist`, the regen script, and `codec_agent_plan`/`codec_agent_runner` `_qwen_chat` (adapter maps `LLMError` → their public `QwenUnavailableError`), and the `codec_telegram`/`codec_imessage` `call_llm` text sites (default never-raise → `None`-contract preserved; their vision sites are A-11/codec_vision, not migrated here). Pending tranches: an async `astream()` for voice `_stream_qwen` + agents `Agent.run` (queue stays at the call site — `codec_llm` never owns the semaphore), dashboard (4 non-stream + the `[SKILL:…]` stream tag-machine, which keeps its own parser and consumes only `stream()`'s raw tokens), and a skills tranche.

## 3. Agent + Crew runtime

Expand Down
54 changes: 12 additions & 42 deletions codec_imessage.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@
"""

import os
import re
import json
import time
import sqlite3
Expand Down Expand Up @@ -323,47 +322,18 @@ def call_llm(text, sender, llm_cfg, conversation_history=None, system_prompt_ove

messages.append({"role": "user", "content": text})

headers = {"Content-Type": "application/json"}
if llm_cfg["api_key"]:
headers["Authorization"] = f"Bearer {llm_cfg['api_key']}"

payload = {
"model": llm_cfg["model"],
"messages": messages,
"max_tokens": 1500,
"temperature": 0.7,
"stream": False,
"chat_template_kwargs": {"enable_thinking": False},
}
# Merge extra kwargs (but protect chat_template_kwargs)
payload.update({k: v for k, v in llm_cfg["kwargs"].items() if k != "chat_template_kwargs"})

try:
r = requests.post(
f"{llm_cfg['base_url']}/chat/completions",
json=payload,
headers=headers,
timeout=120,
)
data = r.json()

if "error" in data:
log.error(f"LLM error: {data['error']}")
return None
if "choices" not in data or not data["choices"]:
log.error(f"LLM no choices: {str(data)[:200]}")
return None

content = (data["choices"][0]["message"].get("content") or "").strip()
# Strip thinking tags
content = re.sub(r'<think>[\s\S]*?</think>', '', content).strip()
return content if content else None
except requests.exceptions.Timeout:
log.error("LLM timeout")
return None
except Exception as e:
log.error(f"LLM call failed: {e}")
return None
# A-12 (PR-3E-bridges): canonical codec_llm.call. Never-raise -> "" on any
# failure (error/no-choices/timeout/empty); mapped back to the bridge's None
# contract for graceful degradation. codec_llm strips <think>; kwargs are
# passed minus chat_template_kwargs so enable_thinking=False is preserved.
import codec_llm
extra = {k: v for k, v in llm_cfg["kwargs"].items() if k != "chat_template_kwargs"}
content = codec_llm.call(
messages, base_url=llm_cfg["base_url"], model=llm_cfg["model"],
api_key=llm_cfg["api_key"], max_tokens=1500, temperature=0.7,
timeout=120, extra_kwargs=extra,
)
return content if content else None


# ── Vision (image attachments) ──────────────────────────────────────────────
Expand Down
47 changes: 12 additions & 35 deletions codec_telegram.py
Original file line number Diff line number Diff line change
Expand Up @@ -463,41 +463,18 @@ def call_llm(text, llm_cfg, conversation_history=None, system_prompt_override=No
messages.extend(conversation_history[-8:])
messages.append({"role": "user", "content": text})

headers = {"Content-Type": "application/json"}
if llm_cfg["api_key"]:
headers["Authorization"] = f"Bearer {llm_cfg['api_key']}"

payload = {
"model": llm_cfg["model"],
"messages": messages,
"max_tokens": 1500,
"temperature": 0.7,
"stream": False,
"chat_template_kwargs": {"enable_thinking": False},
}
payload.update({k: v for k, v in llm_cfg["kwargs"].items() if k != "chat_template_kwargs"})

try:
r = requests.post(
f"{llm_cfg['base_url']}/chat/completions",
json=payload, headers=headers, timeout=120,
)
data = r.json()
if "error" in data:
log.error(f"LLM error: {data['error']}")
return None
if "choices" not in data or not data["choices"]:
log.error(f"LLM no choices: {str(data)[:200]}")
return None
content = (data["choices"][0]["message"].get("content") or "").strip()
content = re.sub(r'<think>[\s\S]*?</think>', '', content).strip()
return content if content else None
except requests.exceptions.Timeout:
log.error("LLM timeout")
return None
except Exception as e:
log.error(f"LLM call failed: {e}")
return None
# A-12 (PR-3E-bridges): canonical codec_llm.call. Never-raise -> "" on any
# failure (error/no-choices/timeout/empty); mapped back to the bridge's None
# contract for graceful degradation. codec_llm strips <think>; kwargs are
# passed minus chat_template_kwargs so enable_thinking=False is preserved.
import codec_llm
extra = {k: v for k, v in llm_cfg["kwargs"].items() if k != "chat_template_kwargs"}
content = codec_llm.call(
messages, base_url=llm_cfg["base_url"], model=llm_cfg["model"],
api_key=llm_cfg["api_key"], max_tokens=1500, temperature=0.7,
timeout=120, extra_kwargs=extra,
)
return content if content else None


# ── Vision (photo messages) ─────────────────────────────────────────────────
Expand Down
54 changes: 54 additions & 0 deletions docs/PR3E-BRIDGES-DESIGN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# PR-3E-bridges — A-12: migrate the iMessage + Telegram `call_llm` text sites (DESIGN)

**Status:** IMPLEMENTED. Both `call_llm` text sites route through `codec_llm.call` (default never-raise → `None`-contract preserved; `chat_template_kwargs` filtered from kwargs). Removed a now-dead `import re` from `codec_imessage`. 8 new tests (`tests/test_llm_bridges.py`); full suite 1431 passing, zero new; zero net-new ruff.
**Finding:** A-12 (continuation). The two outbound bridges hand-roll the same OpenAI `chat/completions` text call.
**Wave:** 3. Smallest, lowest-risk tranche — never-raise is the *default* `codec_llm.call` behavior, so this is a near-mechanical swap.

---

## 1. The two sites (identical shape)

- `codec_telegram.call_llm` (text site `:482`)
- `codec_imessage.call_llm` (text site `:343`)

Both build `messages` (system + last-8 history + user), set headers (Bearer iff `llm_cfg["api_key"]`), POST a payload (`max_tokens=1500`, `temperature=0.7`, `stream=False`, `chat_template_kwargs.enable_thinking=False`, plus `llm_cfg["kwargs"]` **minus** any `chat_template_kwargs`), then:
- `{"error": …}` or no `choices` → log + **return `None`**
- parse content, strip `<think>`, `return content if content else None`
- `requests.exceptions.Timeout` → `None`; any other `Exception` → `None`

**Contract: `call_llm` returns `None` on every failure/empty** — the caller degrades gracefully (no reply / fallback). This is the *opposite* of the 2c sites: bridges WANT silent best-effort, so `codec_llm.call`'s default never-raise is exactly right.

The vision sites (`telegram:519`, `imessage:393`, via `llm_cfg["vision_url"]`) are **A-11** (codec_vision), NOT this tranche — left untouched.

## 2. Migration (each site)

Replace the headers + payload + `try/except` block with:
```python
import codec_llm
extra = {k: v for k, v in llm_cfg["kwargs"].items() if k != "chat_template_kwargs"}
content = codec_llm.call(
messages, base_url=llm_cfg["base_url"], model=llm_cfg["model"],
api_key=llm_cfg["api_key"], max_tokens=1500, temperature=0.7,
timeout=120, extra_kwargs=extra,
)
return content if content else None
```
- **`None`-contract preserved**: `codec_llm.call` returns `""` on error / no-choices / timeout / exception / empty → `content if content else None` → `None`. Exact parity for the caller.
- **`enable_thinking=False` preserved**: `extra` strips `chat_template_kwargs` (same as the original) so the explicit flag in `_build_request` isn't overridden.
- **`<think>` strip** now handled by `codec_llm`.
- **Auth parity**: `api_key=llm_cfg["api_key"]` → Bearer iff non-empty (same as original).

## 3. Behavior deltas (minor, documented)
- **Logging granularity**: the original logged distinct messages ("LLM error" / "LLM no choices" / "LLM timeout" / "LLM call failed"); `codec_llm` logs one generic warning. Same observable behavior (`None` returned) — only the log string differs.
- Everything else is byte-parity for the caller.

## 4. Test plan
- `tests/test_llm_bridges.py`: for each bridge — `call_llm` returns the content on success (monkeypatched `codec_llm.call`); returns **`None`** when `codec_llm.call` returns `""` (graceful-degradation contract); passes `base_url`/`model`/`api_key` through and `extra_kwargs` with `chat_template_kwargs` **filtered out**. Source invariants: each calls `codec_llm.call(`, and `/chat/completions` count drops to **1** (only the vision site remains).
- Full suite: expect the 23 known-baseline failures, **zero new**. No `skills/` touched → no manifest regen.

## 5. Risk + rollback
- **Blast radius:** 2 functions in 2 outbound bridges. `None`-contract preserved → callers unaffected. Inbound stays PWA-only (unchanged).
- **Rollback:** single-commit revert.

## 6. After bridges — remaining A-12
dashboard (4 non-stream + the `[SKILL:…]` stream tag-machine), voice `_stream_qwen` + agents `Agent.run` (async `astream()` + queue at the call site), skills tranche (translate/fact_extract/create_skill/skill_forge).
4 changes: 3 additions & 1 deletion docs/audits/PHASE-1-CODE-QUALITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,9 @@ Both scan `SKILLS_DIR` independently, so a skill file is loaded twice in differe
>
> **Tranche 2 (PR-3E-2):** built the streaming keystone **`codec_llm.stream()`** (sync generator, yields raw deltas, never-raises) + extracted a shared `_build_request`; migrated `codec_session.qwen_stream`, `codec_compaction`, `codec_dictate`. Read-the-source moved `codec_textassist` + `scripts/regen_skill_descriptions` to **tranche 2c** (they have a raise-on-failure contract — never-raise would paste empty over the user's selection / write empty descriptions). Pinned by `tests/test_llm_stream.py`. See `docs/PR3E2-LLM-STREAM-TRANCHE2-DESIGN.md`.
>
> **Tranche 2c (PR-3E-2c):** added `codec_llm.LLMError` + `codec_llm.call(raise_on_error=True)` (raises on every non-success — non-200/exception/empty-200; default stays never-raise). Migrated the 4 fail-loud sites: `codec_textassist` + `scripts/regen_skill_descriptions` (LLMError propagates — fixes the destructive empty-paste / empty-description), and `codec_agent_plan._qwen_chat` + `codec_agent_runner._qwen_chat` via a thin adapter that maps `LLMError` → their public `QwenUnavailableError` (+ a parallel `_qwen_base()` helper). Pinned by `tests/test_llm_raise_mode.py` (109 agent tests still green). See `docs/PR3E2C-RAISE-MODE-DESIGN.md`. **Still pending (own PRs):** dashboard (4 non-stream + 1 stream tag-machine), voice `_stream_qwen` + agents `Agent.run` (async + queue → need `astream()`), bridges telegram/imessage, self_improve/watcher, skills tranche (translate/fact_extract/create_skill/skill_forge).
> **Tranche 2c (PR-3E-2c):** added `codec_llm.LLMError` + `codec_llm.call(raise_on_error=True)` (raises on every non-success — non-200/exception/empty-200; default stays never-raise). Migrated the 4 fail-loud sites: `codec_textassist` + `scripts/regen_skill_descriptions` (LLMError propagates — fixes the destructive empty-paste / empty-description), and `codec_agent_plan._qwen_chat` + `codec_agent_runner._qwen_chat` via a thin adapter that maps `LLMError` → their public `QwenUnavailableError` (+ a parallel `_qwen_base()` helper). Pinned by `tests/test_llm_raise_mode.py` (109 agent tests still green). See `docs/PR3E2C-RAISE-MODE-DESIGN.md`.
>
> **Bridges (PR-3E-bridges):** migrated `codec_telegram.call_llm` + `codec_imessage.call_llm` text sites to `codec_llm.call` (default never-raise — bridges WANT graceful degradation; the `None`-on-failure contract is preserved via `content if content else None`, and `chat_template_kwargs` is filtered from `llm_cfg["kwargs"]`). Vision sites in both bridges are A-11 (codec_vision), left untouched. Pinned by `tests/test_llm_bridges.py`. See `docs/PR3E-BRIDGES-DESIGN.md`. **Still pending (own PRs):** dashboard (4 non-stream + 1 stream tag-machine), voice `_stream_qwen` + agents `Agent.run` (async + queue → need `astream()`), self_improve/watcher, skills tranche (translate/fact_extract/create_skill/skill_forge).
**Location:** Sample sites: `codec.py:702`, `codec_dashboard.py:980,1076,1215`, `codec_voice.py:180,196,208,213`, `codec_session.py:215,278,307`, `codec_agents.py:51`, `codec_agent_plan.py:239`, `codec_agent_runner.py:148`, `codec_compaction.py:78`, `codec_self_improve.py:238`, `codec_telegram.py:471,508`, `codec_imessage.py:341,391`, `codec_textassist.py:33`, `codec_dictate.py:492`, `codec_watcher.py:86,182`. Total: 51 occurrences via `grep -rn "chat/completions"`.
**Impact:** Each site repeats: headers build, `Authorization: Bearer {api_key}` formatting, `{Content-Type: application/json}`, payload assembly with `chat_template_kwargs.enable_thinking=False`, `<think>` stripping, `try/except` for `r.json()` shape (`choices[0].message.content` or `.reasoning` fallback). Many also re-implement streaming SSE parsing. When the Qwen-3.6 upgrade landed, this is exactly the kind of change that needs to be applied in 20+ places.
**Recommended fix:** Add `codec_llm_proxy.call(messages, **kwargs)` and `codec_llm_proxy.stream(messages, **kwargs)` as the single canonical API (the module already exists at `codec_llm_proxy.py`, only 130 LOC, only used by codec_voice + codec_agents). Migrate all 51 sites over the course of the Phase 1 hardening. As a first step, just covering the 5 sites in codec.py + codec_dashboard.py + codec_session.py would remove ~80% of the most-edited duplication.
Expand Down
Loading
Loading