feat: learn from user label actions as AI classification hints #1993
gentlemandev wants to merge 7 commits into main
Capture Gmail label add/remove events as sender classification feedback and inject them as context hints into the AI rule-selection prompt. When a user drags an email to a folder in their mail client, that classification signal is now recorded and used to improve future AI decisions for that sender.

- New SenderClassification table stores individual events with threadId/messageId for auditability (not aggregated counts)
- Subjects fetched at prompt time via batch Gmail API (no email content stored in DB for privacy)
- Self-labeling filter prevents feedback loop from system-applied labels
- Conversation-tracking rules (To Reply, FYI, etc.) excluded via existing shouldLearn config
- Unique constraint prevents duplicate rows from webhook retries
- Shared GMAIL_SYSTEM_LABELS constant and findRuleByLabelId helper extracted to reduce duplication across webhook handlers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
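As a rough illustration of the dedup behavior described above, the sketch below models the feedback table as an in-memory map keyed on the same columns as the unique constraint quoted later in this PR. `FeedbackStore`, `FeedbackEvent`, and the `LABEL_ADDED` event name are hypothetical stand-ins, not the PR's actual code; only the key columns, `LABEL_REMOVED`, and the lowercase sender normalization come from the PR text.

```typescript
// Hypothetical in-memory stand-in for the feedback table (assumption, not the PR's code).
type EventType = "LABEL_ADDED" | "LABEL_REMOVED"; // "LABEL_ADDED" is an assumed name

interface FeedbackEvent {
  emailAccountId: string;
  sender: string;
  ruleId: string;
  threadId: string;
  messageId: string;
  eventType: EventType;
}

class FeedbackStore {
  private rows = new Map<string, FeedbackEvent>();

  // Mirrors the columns of the unique constraint:
  // (emailAccountId, sender, ruleId, messageId, eventType).
  private key(e: FeedbackEvent): string {
    return JSON.stringify([e.emailAccountId, e.sender, e.ruleId, e.messageId, e.eventType]);
  }

  // Upsert semantics: a retried webhook overwrites the same row instead of adding one.
  // Returns true only when the event was not seen before.
  save(event: FeedbackEvent): boolean {
    const normalized: FeedbackEvent = { ...event, sender: event.sender.toLowerCase() };
    const key = this.key(normalized);
    const isNew = !this.rows.has(key);
    this.rows.set(key, normalized);
    return isNew;
  }

  count(): number {
    return this.rows.size;
  }
}
```

Delivering the same event twice, as a webhook retry would, leaves a single row, so the history shown to the AI is not inflated by retries.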
4 issues found across 10 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/web/prisma/schema.prisma">
<violation number="1" location="apps/web/prisma/schema.prisma:690">
P3: Remove the redundant `@@index([emailAccountId, sender])` because it is already covered by the composite unique index.
(Based on your team's feedback about avoiding redundant indexes when a unique constraint already provides the same indexing.) [FEEDBACK_USED]</violation>
</file>
<file name="apps/web/utils/rule/sender-classification.ts">
<violation number="1" location="apps/web/utils/rule/sender-classification.ts:130">
P2: Escape subject and rule name before injecting them into the XML-like prompt block.</violation>
<violation number="2" location="apps/web/utils/rule/sender-classification.ts:164">
P1: Avoid ambiguous `findFirst` for label-to-rule mapping; handle multiple matches explicitly to prevent misclassification.</violation>
</file>
<file name="apps/web/app/api/google/webhook/process-label-added-event.ts">
<violation number="1" location="apps/web/app/api/google/webhook/process-label-added-event.ts:180">
P1: The self-labeling check is too broad: any past system-applied label on the message causes future user re-adds of that label to be skipped.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
  labelId: string;
  emailAccountId: string;
}) {
  return prisma.rule.findFirst({
P1: Avoid ambiguous findFirst for label-to-rule mapping; handle multiple matches explicitly to prevent misclassification.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/rule/sender-classification.ts, line 164:
<comment>Avoid ambiguous `findFirst` for label-to-rule mapping; handle multiple matches explicitly to prevent misclassification.</comment>
<file context>
@@ -0,0 +1,176 @@
+ labelId: string;
+ emailAccountId: string;
+}) {
+ return prisma.rule.findFirst({
+ where: {
+ emailAccountId,
</file context>
This is consistent with the existing codebase pattern — process-label-removed-event.ts uses the same findFirst query for the same label-to-rule lookup. In practice, a labelId maps to a single rule because each rule creates/owns its label. If multiple rules ever share a label, both the existing and new code would need updating.
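For readers weighing the reviewer's suggestion, here is a minimal sketch of what handling multiple matches explicitly could look like. It works on an in-memory array so it stays self-contained; the `Rule` shape, `LookupResult`, and `findRulesByLabelId` are hypothetical names for illustration and do not reflect the repository's Prisma schema or the PR's `findRuleByLabelId` helper.

```typescript
// Hypothetical rule shape; the real model lives in the Prisma schema.
interface Rule {
  id: string;
  name: string;
  emailAccountId: string;
  labelIds: string[];
}

type LookupResult =
  | { kind: "none" }
  | { kind: "unique"; rule: Rule }
  | { kind: "ambiguous"; rules: Rule[] };

// Returns every rule that owns the label and reports ambiguity to the caller
// instead of silently picking the first match.
function findRulesByLabelId(
  rules: Rule[],
  labelId: string,
  emailAccountId: string,
): LookupResult {
  const matches = rules.filter(
    (r) => r.emailAccountId === emailAccountId && r.labelIds.includes(labelId),
  );
  const [first, ...rest] = matches;
  if (!first) return { kind: "none" };
  if (rest.length === 0) return { kind: "unique", rule: first };
  return { kind: "ambiguous", rules: matches };
}
```

Under the invariant described in the reply above (each rule owns its label), the `ambiguous` branch would never be taken, which is the argument for keeping `findFirst`.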
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
  if (!isEligibleForClassificationFeedback(rule.systemType)) return;

  // Self-labeling filter: skip if Inbox Zero already applied this label
  const systemApplied = await wasLabelAppliedBySystem({
P1: The self-labeling check is too broad: any past system-applied label on the message causes future user re-adds of that label to be skipped.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/app/api/google/webhook/process-label-added-event.ts, line 180:
<comment>The self-labeling check is too broad: any past system-applied label on the message causes future user re-adds of that label to be skipped.</comment>
<file context>
@@ -152,3 +154,73 @@ export async function handleLabelAddedEvent(
+ if (!isEligibleForClassificationFeedback(rule.systemType)) return;
+
+ // Self-labeling filter: skip if Inbox Zero already applied this label
+ const systemApplied = await wasLabelAppliedBySystem({
+ messageId,
+ emailAccountId,
</file context>
This is an intentional trade-off. The scenario (system applies label → user removes → user re-adds same label) is rare, and the LABEL_REMOVED event from step 2 already captures the user's disagreement. Without the filter, every system-applied label would create a feedback row, creating a self-reinforcing loop where the AI's own decisions inflate the classification counts — which is much worse.
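The trade-off can be made concrete with a small sketch. It assumes the system keeps a record of the (messageId, labelId) pairs it applied itself; `SystemAppliedLabel`, `wasAppliedBySystem`, and `userFeedbackEvents` are hypothetical names, and the real `wasLabelAppliedBySystem` in this PR may check something different.

```typescript
// Hypothetical record of a label that the system applied on its own.
interface SystemAppliedLabel {
  messageId: string;
  labelId: string;
}

interface LabelAddedEvent {
  messageId: string;
  labelId: string;
}

// True when this exact label was ever applied to this message by the system.
function wasAppliedBySystem(
  applied: SystemAppliedLabel[],
  event: LabelAddedEvent,
): boolean {
  return applied.some(
    (a) => a.messageId === event.messageId && a.labelId === event.labelId,
  );
}

// Keeps only events that can be attributed to the user.
// Known limitation (the case raised in review): a user re-adding a label the
// system once applied to the same message is filtered out as well.
function userFeedbackEvents(
  events: LabelAddedEvent[],
  applied: SystemAppliedLabel[],
): LabelAddedEvent[] {
  return events.filter((e) => !wasAppliedBySystem(applied, e));
}
```

A label the system never applied to a message passes through as user feedback, while the system's own label on that message does not, which is what breaks the self-reinforcing loop.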
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
  const lines: string[] = [];

  for (const classification of classifications) {
    const ruleName = classification.rule?.name ?? "Unknown";
P2: Escape subject and rule name before injecting them into the XML-like prompt block.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/rule/sender-classification.ts, line 130:
<comment>Escape subject and rule name before injecting them into the XML-like prompt block.</comment>
<file context>
@@ -0,0 +1,176 @@
+ const lines: string[] = [];
+
+ for (const classification of classifications) {
+ const ruleName = classification.rule?.name ?? "Unknown";
+ const subject = subjects.get(classification.messageId);
+
</file context>
The existing codebase already injects raw email subjects and body content into LLM prompts without escaping (see stringifyEmail in ai-choose-rule.ts). The PROMPT_SECURITY_INSTRUCTIONS handle prompt injection concerns at the model level. Adding escaping here would be inconsistent with the rest of the prompt construction and wouldn't provide additional security since these XML-like tags are LLM prompt structure, not parsed by an XML processor.
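For reference, the escaping the reviewer asked for would be small. The sketch below is an assumption about what such a helper might look like, not code from this PR or the existing codebase; `escapeForPromptTag` is a hypothetical name.

```typescript
// Hypothetical helper: neutralizes characters that could close or open an
// XML-like tag inside a prompt block. Ampersands are replaced first so the
// entities produced by the later replacements are not escaped twice.
function escapeForPromptTag(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

Whether this is worth adding depends on the point made in the reply: if other prompt sections inject subjects unescaped, escaping only this block changes little.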
Thanks for the feedback! I've updated an existing learning with this new information.
The @@unique([emailAccountId, sender, ruleId, messageId, eventType]) already serves as a prefix index for (emailAccountId, sender) queries in PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update mocks for GMAIL_SYSTEM_LABELS, fetchSenderFromMessage, findRuleByLabelId, and sender-classification dependencies
- Fix "should skip non-SPAM labels" test to reflect new behavior (system labels skip, non-system labels now record classification)
- Update label-removed tests to use findRuleByLabelId mock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unit tests (sender-classification.test.ts):
- saveSenderClassification: lowercase normalization, upsert dedup
- getSenderClassificationsForPrompt: null when empty, subject formatting, LABEL_REMOVED formatting, deleted messages, batch fetch failure
- findRuleByLabelId: match and no-match cases

Eval tests (sender-classification-hint.test.ts):
- Split sender (Amazon): receipt vs marketing with classification history; the AI correctly uses hints to distinguish
- Split sender (Google): calendar vs notification
- Correction signal: label removal steers away from wrong rule
- Strong consistent history reinforces correct classification
- Hint does NOT override clear email content (personal email stays Conversations despite notification history)
- Baseline: same email without hint still works correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 5 new eval cases for sender classification hints:
- Move-to-label pattern: remove+add from user moving email between folders (Notion changelog: Newsletter → Notification)
- Weak signal: single data point should not override clear content (Cal.com booking is Calendar despite 1 Newsletter classification)
- Contradictory history: user went back and forth, AI relies on content (Figma Config event → Marketing)
- Hint should NOT override content: real receipt from sender with mostly marketing history (Uber trip receipt)
- Split SaaS sender: Stripe sends receipts, notifications, and marketing (payout notification with mixed history)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The table stores feedback from user label actions, not classifications of senders. The new name better describes what the data is for (improving classification) rather than what it's about (a sender).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aiChooseRule now accepts ClassificationFeedbackItem[] and formats the prompt internally. Tests pass structured data instead of hardcoded prompt strings, so changing the prompt format doesn't break tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
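A sketch of the idea in this commit: callers hand over structured items and a single function owns the prompt wording. Only the `ClassificationFeedbackItem` name comes from the commit message; its fields, the `LABEL_ADDED` event name, the `<classification_feedback>` tag, and `formatFeedbackForPrompt` are assumptions made for illustration.

```typescript
// Field names are assumptions; only the type name appears in the commit message.
interface ClassificationFeedbackItem {
  ruleName: string;
  eventType: "LABEL_ADDED" | "LABEL_REMOVED";
  subject?: string; // absent when the message was deleted or the fetch failed
}

// Owns the prompt wording so tests can pass structured data instead of strings.
// Returns null when there is no history, so callers can omit the block entirely.
function formatFeedbackForPrompt(
  items: ClassificationFeedbackItem[],
): string | null {
  if (items.length === 0) return null;
  const lines = items.map((item) => {
    const verb = item.eventType === "LABEL_ADDED" ? "added" : "removed";
    const subject = item.subject ?? "unavailable";
    return `- User ${verb} the "${item.ruleName}" label (subject: ${subject})`;
  });
  return ["<classification_feedback>", ...lines, "</classification_feedback>"].join("\n");
}
```

Because tests construct `ClassificationFeedbackItem` values rather than prompt text, rewording the lines inside this function leaves behavior-level tests untouched.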
Summary
Capture user label add/remove events from Gmail webhooks as sender classification feedback, injected as context hints into the AI rule-selection prompt. When a user drags an email into a folder in their mail client, that classification signal improves future AI decisions for that sender.
TLDR: The webhook pipeline already receives label events but drops non-SPAM ones. This PR records them and shows them to the AI as advisory context (not hard rules), enabling the system to learn from user behavior — especially for split senders like Amazon that send both receipts and marketing.
- SenderClassification model stores individual events with threadId/messageId for auditability
- Conversation-tracking rules excluded via existing shouldLearn config
- Shared GMAIL_SYSTEM_LABELS constant and findRuleByLabelId helper to reduce duplication
- fetchSenderFromMessage shared helper deduplicates ~30 lines of identical error handling

Test plan
🤖 Generated with Claude Code