feat(email-triage): read-only inbox classification + ranked digest (local-first) #176
Merged
Phase 3 of the sequence (Cookbook → Compare → Email triage → Image). v1 is
deliberately READ-ONLY per the agreed scope: read a recent inbox window via the
user's existing Gmail OAuth, classify each message, return a ranked digest.
Applies NO labels, creates NO drafts, sends NOTHING — labels/drafts are later
consent-gated phases (outbound is bridge-only per the operating principles).
Local-first: classification runs on the LOCAL Qwen (config llm_base_url) in a
single batched call, so email content never leaves the machine; only metadata
(sender/subject/category/priority/one-line reason) is returned.
codec_email_triage.py (engine; the Gmail API and LLM calls live here so the
skill stays AST-gate-clean):
  fetch_recent  reuses codec_google_auth.build_service (same path as google_gmail);
                read-only list + get(metadata) only; a test asserts no mutating call
  classify      one LLM call classifies the whole batch → JSON
                [{idx,category,priority,reason}]; tolerant parse (strips code
                fences / prose, coerces unknown enums); LLM failure → every
                message 'unclassified'/'medium' (digest still works)
  triage        fetch → classify → rank (priority, then category, unread-first)
                → {count, items, by_priority, by_category}
  categories: lead / support / personal / transactional / noise; priority: high/medium/low.
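The tolerant parse described for classify could look roughly like the sketch below. It is not the code from codec_email_triage.py: the function name parse_classification, the 0-based idx, and the choice to coerce unknown enums to 'unclassified'/'medium' are assumptions made for illustration.

```python
import json
import re

CATEGORIES = {"lead", "support", "personal", "transactional", "noise"}
PRIORITIES = {"high", "medium", "low"}


def parse_classification(raw: str, n_messages: int) -> list[dict]:
    """Parse the LLM reply into one result per message, tolerating noise.

    Hypothetical helper: the name, the 0-based idx, and the fallback values
    are assumptions, not the merged implementation.
    """
    # Every message starts with the fallback, so a failed parse still yields a digest.
    results = [
        {"idx": i, "category": "unclassified", "priority": "medium", "reason": ""}
        for i in range(n_messages)
    ]
    # Take the span from the first '[' to the last ']'; this skips code fences
    # and any prose the model wrapped around the JSON.
    match = re.search(r"\[.*\]", raw or "", re.DOTALL)
    if not match:
        return results
    try:
        rows = json.loads(match.group(0))
    except json.JSONDecodeError:
        return results
    for row in rows:
        if not isinstance(row, dict):
            continue
        idx = row.get("idx")
        # Drop non-integer and out-of-range indexes (bool is an int subclass).
        if isinstance(idx, bool) or not isinstance(idx, int):
            continue
        if not 0 <= idx < n_messages:
            continue
        category = str(row.get("category", "")).strip().lower()
        priority = str(row.get("priority", "")).strip().lower()
        results[idx] = {
            "idx": idx,
            "category": category if category in CATEGORIES else "unclassified",
            "priority": priority if priority in PRIORITIES else "medium",
            "reason": str(row.get("reason", "")).strip(),
        }
    return results
```

Pre-filling the result list with the fallback keeps the output length equal to the batch size, so a partial or garbled reply degrades per message rather than failing the whole digest.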
skills/email_triage.py (thin — imports codec_email_triage + re): parses count +
unread scope, formats a priority-grouped digest, turns an auth error into a
"connect Google first" message. SKILL_MCP_EXPOSE=True (read-only; mirrors
google_gmail's exposure — digest is metadata only). Manifest → 84.
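As a rough illustration of "parses count + unread scope", a skill could pull both out of the request text with re alone. The names parse_request and gmail_query, the default of 20, and the cap of 50 are hypothetical; only the Gmail search operators in:inbox and is:unread are real.

```python
import re

DEFAULT_COUNT = 20  # hypothetical default window size
MAX_COUNT = 50      # hypothetical upper bound on the window


def parse_request(text: str) -> tuple[int, bool]:
    """Return (count, unread_only) parsed from a free-text request."""
    # First standalone number of up to three digits, if any.
    match = re.search(r"\b(\d{1,3})\b", text)
    count = int(match.group(1)) if match else DEFAULT_COUNT
    # Clamp to a sane window so one request cannot pull the whole mailbox.
    count = max(1, min(count, MAX_COUNT))
    unread_only = re.search(r"\bunread\b", text, re.IGNORECASE) is not None
    return count, unread_only


def gmail_query(unread_only: bool) -> str:
    """Build the Gmail search string for the inbox window."""
    return "in:inbox is:unread" if unread_only else "in:inbox"
```

For example, parse_request("triage my last 15 unread emails") yields a count of 15 with the unread flag set, which gmail_query turns into "in:inbox is:unread".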
Tests: tests/test_email_triage.py, 17 tests, Gmail service + local LLM mocked
(offline, no real inbox): fetch parse/cleanup + read-only-calls-only assertion,
classification parsing (fences/prose/enum-coercion/garbage/out-of-range),
single-batch-call + LLM-failure fallback, triage ranking + empty inbox, skill
format/unread-query/auth-error/no-messages + discovery. Full suite: 2,184 passed.
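A read-only-calls-only assertion of the kind listed above can be sketched with unittest.mock. This is not the test from tests/test_email_triage.py: fetch_recent here is a simplified stand-in that takes the service as an argument, and make_fake_service, called_method_names, and the MUTATING set are hypothetical helpers. The Gmail calls used (users().messages().list and get with format="metadata") are the real API methods.

```python
from unittest.mock import MagicMock

# Gmail API methods that change mailbox state (messages, drafts, labels).
MUTATING = {"send", "modify", "trash", "untrash", "delete", "insert", "import_",
            "batchModify", "batchDelete", "create", "update", "patch"}


def fetch_recent(service, max_results=10, query="in:inbox"):
    """Simplified stand-in: list message ids, then read headers only."""
    messages = service.users().messages()
    listing = messages.list(userId="me", q=query, maxResults=max_results).execute()
    items = []
    for ref in listing.get("messages", []):
        msg = messages.get(userId="me", id=ref["id"], format="metadata",
                           metadataHeaders=["From", "Subject"]).execute()
        headers = {h["name"]: h["value"]
                   for h in msg.get("payload", {}).get("headers", [])}
        items.append({"id": ref["id"],
                      "sender": headers.get("From", ""),
                      "subject": headers.get("Subject", ""),
                      "unread": "UNREAD" in msg.get("labelIds", [])})
    return items


def make_fake_service():
    """Build a mocked Gmail service that returns one unread message."""
    service = MagicMock()
    messages = service.users.return_value.messages.return_value
    messages.list.return_value.execute.return_value = {"messages": [{"id": "m1"}]}
    messages.get.return_value.execute.return_value = {
        "labelIds": ["INBOX", "UNREAD"],
        "payload": {"headers": [{"name": "From", "value": "a@example.com"},
                                {"name": "Subject", "value": "Hi"}]},
    }
    return service


def called_method_names(mock):
    """Collect every method name that appears in the mock's recorded calls."""
    names = set()
    for call in mock.mock_calls:
        # call[0] is a dotted path such as "users().messages().list".
        names.update(part.replace("()", "") for part in call[0].split("."))
    return names


def test_fetch_is_read_only():
    service = make_fake_service()
    items = fetch_recent(service)
    assert items[0]["subject"] == "Hi"
    assert not (called_method_names(service) & MUTATING)
```

Checking the recorded call names against a deny-list means the test fails on any future mutating call, not just the ones a reviewer thought to assert against.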
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Phase 3 (Cookbook → Compare → Email triage → Image). The agreed v1 scope: read-only triage digest. Read a recent inbox window → classify each message → return a ranked digest. No labels, no drafts, no send — those are later consent-gated phases.
Local-first by design: classification runs on the local Qwen (llm_base_url) in one batched call, so email content never leaves the machine; only the digest metadata (sender / subject / category / priority / reason) comes back. This aligns with CODEC's zero-data-leaves principle and reads the user's own mailbox (pull, not an inbound channel).
Engine: codec_email_triage.py
fetch_recent: reuses codec_google_auth.build_service (same path as google_gmail); read-only list + get(metadata); a test asserts no mutating Gmail method is ever called
classify: one LLM call for the whole batch → JSON [{idx,category,priority,reason}]; tolerant parse (strips fences/prose, coerces unknown enums); LLM failure → all unclassified/medium (digest still works)
triage: fetch → classify → rank → {count, items, by_priority, by_category}
categories: lead / support / personal / transactional / noise · priority: high / medium / low
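The ranking step (priority, then category, unread first, per the commit message) could be expressed as a single sort key. This is a sketch under stated assumptions: the function name rank is hypothetical, the category order is taken from the order the categories are listed in, and by_priority / by_category are shown as counts, which the PR does not specify.

```python
from collections import Counter

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
# Assumed order: the order in which the PR lists the categories.
CATEGORY_ORDER = {"lead": 0, "support": 1, "personal": 2,
                  "transactional": 3, "noise": 4}


def rank(items: list[dict]) -> dict:
    """Sort classified messages and summarise them (hypothetical shape)."""
    ranked = sorted(
        items,
        key=lambda m: (
            # Unknown values such as 'unclassified' sort after the known ones.
            PRIORITY_ORDER.get(m["priority"], len(PRIORITY_ORDER)),
            CATEGORY_ORDER.get(m["category"], len(CATEGORY_ORDER)),
            # False sorts before True, so unread messages come first.
            not m.get("unread", False),
        ),
    )
    return {
        "count": len(ranked),
        "items": ranked,
        "by_priority": dict(Counter(m["priority"] for m in ranked)),
        "by_category": dict(Counter(m["category"] for m in ranked)),
    }
```

Because sorted is stable, messages that tie on all three keys keep the order in which they were fetched.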
Skill: skills/email_triage.py
Thin (imports codec_email_triage + re → passes the AST gate). Parses count + unread scope, renders a priority-grouped digest (🔴/🟡/⚪), and turns an auth error into a friendly "connect Google first" message. SKILL_MCP_EXPOSE=True: read-only, mirrors google_gmail.
Test plan
tests/test_email_triage.py (17 tests; Gmail service + local LLM mocked, offline): fetch parse/cleanup + read-only-calls-only assertion, classification parsing (code-fence/prose/enum-coercion/garbage/out-of-range-idx), single-batch-call + LLM-failure fallback, triage ranking + empty inbox, skill format/unread-query/auth-error/no-messages, discovery + MCP exposure
python3.13 -m pytest --ignore=tests/test_skills.py -q → 2,184 passed, 77 skipped
ruff check: 0 issues
Roadmap note
v2 (when you want it) layers on the action surface you deferred: first Gmail labels, then draft replies. Both build on this read-only base; send stays out (bridge + consent).
🤖 Generated with Claude Code