feat: learn from user label actions as AI classification hints #1993

Open
gentlemandev wants to merge 7 commits into main from
feat/sender-classification-feedback

@gentlemandev
Collaborator

Summary

Capture user label add/remove events from Gmail webhooks as sender classification feedback, injected as context hints into the AI rule-selection prompt. When a user drags an email into a folder in their mail client, that classification signal improves future AI decisions for that sender.

TLDR: The webhook pipeline already receives label events but drops non-SPAM ones. This PR records them and shows them to the AI as advisory context (not hard rules), enabling the system to learn from user behavior — especially for split senders like Amazon that send both receipts and marketing.

  • New SenderClassification model stores individual events with threadId/messageId for auditability
  • Subjects fetched at prompt time via batch Gmail API — no email content stored in DB (privacy)
  • Self-labeling filter: skips when Inbox Zero applied the same label, records user reclassifications
  • Conversation-tracking rules (To Reply, Awaiting Reply, FYI, Actioned) excluded via existing shouldLearn config
  • Unique constraint prevents duplicate rows from webhook retries
  • Extracted shared GMAIL_SYSTEM_LABELS constant and findRuleByLabelId helper to reduce duplication
  • fetchSenderFromMessage shared helper deduplicates ~30 lines of identical error handling
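
The deduplication and normalization described in the bullets above can be sketched with an in-memory stand-in for the table. This is a minimal illustration, not the PR's code: `InMemoryClassificationStore` and `ClassificationEventType` are hypothetical names, and `LABEL_ADDED` is an assumed event name (only `LABEL_REMOVED` appears in this PR's text). The key columns mirror the unique constraint quoted later in the commit log.

```typescript
// Sketch only: an in-memory stand-in for the SenderClassification table.
// "LABEL_ADDED" is an assumed name; only LABEL_REMOVED appears in the PR text.
type ClassificationEventType = "LABEL_ADDED" | "LABEL_REMOVED";

interface ClassificationEvent {
  emailAccountId: string;
  sender: string;
  ruleId: string;
  threadId: string;
  messageId: string;
  eventType: ClassificationEventType;
}

class InMemoryClassificationStore {
  private readonly rows = new Map<string, ClassificationEvent>();

  // Returns true when a new row is stored, false when the event is a duplicate.
  save(event: ClassificationEvent): boolean {
    // Lowercase the sender so "A@x.com" and "a@x.com" share one key.
    const normalized: ClassificationEvent = {
      ...event,
      sender: event.sender.toLowerCase(),
    };
    // Same columns as the unique constraint; threadId is stored but not part of the key.
    const key = [
      normalized.emailAccountId,
      normalized.sender,
      normalized.ruleId,
      normalized.messageId,
      normalized.eventType,
    ].join("\u0000");
    if (this.rows.has(key)) return false;
    this.rows.set(key, normalized);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

const demoStore = new InMemoryClassificationStore();
const sampleEvent: ClassificationEvent = {
  emailAccountId: "acct_1",
  sender: "orders@example.com",
  ruleId: "rule_receipts",
  threadId: "thread_1",
  messageId: "msg_1",
  eventType: "LABEL_ADDED",
};

const firstDelivery = demoStore.save(sampleEvent);
// A webhook retry, with the sender in different casing, maps to the same key.
const retriedDelivery = demoStore.save({ ...sampleEvent, sender: "Orders@Example.com" });
```

In the real table the database enforces the same property, so a retried webhook delivery cannot create a second row.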

Test plan

  • Verify label-add webhook for non-SPAM labels creates SenderClassification rows
  • Verify SPAM label-add still triggers cold email learning (unchanged behavior)
  • Verify self-labeling filter: system-applied labels are not recorded as classification feedback
  • Verify label-removal records LABEL_REMOVED classification alongside existing exclusion GroupItem
  • Verify AI prompt includes sender classifications when they exist for the sender
  • Verify AI prompt is unmodified when no classifications exist
  • Verify conversation-tracking rules (TO_REPLY, FYI, etc.) are excluded from feedback
  • Verify webhook retry deduplication via unique constraint

🤖 Generated with Claude Code

Capture Gmail label add/remove events as sender classification feedback
and inject them as context hints into the AI rule-selection prompt.
When a user drags an email to a folder in their mail client, that
classification signal is now recorded and used to improve future
AI decisions for that sender.

- New SenderClassification table stores individual events with
  threadId/messageId for auditability (not aggregated counts)
- Subjects fetched at prompt time via batch Gmail API (no email
  content stored in DB for privacy)
- Self-labeling filter prevents feedback loop from system-applied labels
- Conversation-tracking rules (To Reply, FYI, etc.) excluded via
  existing shouldLearn config
- Unique constraint prevents duplicate rows from webhook retries
- Shared GMAIL_SYSTEM_LABELS constant and findRuleByLabelId helper
  extracted to reduce duplication across webhook handlers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Mar 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
inbox-zero Ignored Ignored Preview Mar 23, 2026 10:15am

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 10 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/web/prisma/schema.prisma">

<violation number="1" location="apps/web/prisma/schema.prisma:690">
P3: Remove the redundant `@@index([emailAccountId, sender])` because it is already covered by the composite unique index.

(Based on your team's feedback about avoiding redundant indexes when a unique constraint already provides the same indexing.) [FEEDBACK_USED]</violation>
</file>

<file name="apps/web/utils/rule/sender-classification.ts">

<violation number="1" location="apps/web/utils/rule/sender-classification.ts:130">
P2: Escape subject and rule name before injecting them into the XML-like prompt block.</violation>

<violation number="2" location="apps/web/utils/rule/sender-classification.ts:164">
P1: Avoid ambiguous `findFirst` for label-to-rule mapping; handle multiple matches explicitly to prevent misclassification.</violation>
</file>

<file name="apps/web/app/api/google/webhook/process-label-added-event.ts">

<violation number="1" location="apps/web/app/api/google/webhook/process-label-added-event.ts:180">
P1: The self-labeling check is too broad: any past system-applied label on the message causes future user re-adds of that label to be skipped.</violation>
</file>

labelId: string;
emailAccountId: string;
}) {
return prisma.rule.findFirst({
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 22, 2026

P1: Avoid ambiguous findFirst for label-to-rule mapping; handle multiple matches explicitly to prevent misclassification.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/rule/sender-classification.ts, line 164:

<comment>Avoid ambiguous `findFirst` for label-to-rule mapping; handle multiple matches explicitly to prevent misclassification.</comment>

<file context>
@@ -0,0 +1,176 @@
+  labelId: string;
+  emailAccountId: string;
+}) {
+  return prisma.rule.findFirst({
+    where: {
+      emailAccountId,
</file context>

Collaborator Author

This is consistent with the existing codebase pattern — process-label-removed-event.ts uses the same findFirst query for the same label-to-rule lookup. In practice, a labelId maps to a single rule because each rule creates/owns its label. If multiple rules ever share a label, both the existing and new code would need updating.
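
For reference, the explicit handling the reviewer asks for could look roughly like the sketch below, should multiple rules ever share a label. It works on plain arrays rather than Prisma; `RuleRef`, `RuleLookup`, and `resolveRuleForLabel` are hypothetical names, not helpers from this PR.

```typescript
// Sketch only: RuleRef is a hypothetical, simplified view of a rule
// and the label IDs its actions apply.
interface RuleRef {
  id: string;
  name: string;
  labelIds: string[];
}

type RuleLookup =
  | { kind: "none" }
  | { kind: "unique"; rule: RuleRef }
  | { kind: "ambiguous"; rules: RuleRef[] };

// Returns every outcome explicitly instead of silently picking the first match.
function resolveRuleForLabel(rules: RuleRef[], labelId: string): RuleLookup {
  const matches = rules.filter((rule) => rule.labelIds.some((id) => id === labelId));
  const first = matches[0];
  if (first === undefined) return { kind: "none" };
  if (matches.length === 1) return { kind: "unique", rule: first };
  return { kind: "ambiguous", rules: matches };
}

const demoRules: RuleRef[] = [
  { id: "r1", name: "Receipts", labelIds: ["Label_1"] },
  { id: "r2", name: "Marketing", labelIds: ["Label_2"] },
  { id: "r3", name: "Promotions", labelIds: ["Label_2"] },
];

const uniqueLookup = resolveRuleForLabel(demoRules, "Label_1");
const ambiguousLookup = resolveRuleForLabel(demoRules, "Label_2");
const missingLookup = resolveRuleForLabel(demoRules, "Label_9");
```

A caller could then record feedback only for the `unique` case and log the `ambiguous` one, which keeps the one-label-per-rule assumption visible rather than implicit.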

Contributor

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

if (!isEligibleForClassificationFeedback(rule.systemType)) return;

// Self-labeling filter: skip if Inbox Zero already applied this label
const systemApplied = await wasLabelAppliedBySystem({
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 22, 2026

P1: The self-labeling check is too broad: any past system-applied label on the message causes future user re-adds of that label to be skipped.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/app/api/google/webhook/process-label-added-event.ts, line 180:

<comment>The self-labeling check is too broad: any past system-applied label on the message causes future user re-adds of that label to be skipped.</comment>

<file context>
@@ -152,3 +154,73 @@ export async function handleLabelAddedEvent(
+  if (!isEligibleForClassificationFeedback(rule.systemType)) return;
+
+  // Self-labeling filter: skip if Inbox Zero already applied this label
+  const systemApplied = await wasLabelAppliedBySystem({
+    messageId,
+    emailAccountId,
</file context>

Collaborator Author

This is an intentional trade-off. The scenario (system applies label → user removes → user re-adds same label) is rare, and the LABEL_REMOVED event from step 2 already captures the user's disagreement. Without the filter, every system-applied label would create a feedback row, and the AI's own decisions would inflate the classification counts in a self-reinforcing loop, which is much worse.
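
The trade-off can be traced with a small simulation. This is a sketch under stated assumptions, not the PR's handler: `WebhookLabelEvent`, `FeedbackRow`, and `collectFeedback` are hypothetical names, `LABEL_ADDED` is an assumed event name, and the set of system-applied labels stands in for whatever `wasLabelAppliedBySystem` looks up.

```typescript
// Sketch only: models label events for a single message.
interface WebhookLabelEvent {
  kind: "added" | "removed";
  labelId: string;
}

interface FeedbackRow {
  eventType: "LABEL_ADDED" | "LABEL_REMOVED";
  labelId: string;
}

function collectFeedback(
  events: WebhookLabelEvent[],
  systemAppliedLabelIds: ReadonlySet<string>,
): FeedbackRow[] {
  const rows: FeedbackRow[] = [];
  for (const labelEvent of events) {
    if (labelEvent.kind === "removed") {
      // Removals are always recorded: they signal user disagreement.
      rows.push({ eventType: "LABEL_REMOVED", labelId: labelEvent.labelId });
      continue;
    }
    // Self-labeling filter: skip adds of any label the system ever applied to this message.
    if (systemAppliedLabelIds.has(labelEvent.labelId)) continue;
    rows.push({ eventType: "LABEL_ADDED", labelId: labelEvent.labelId });
  }
  return rows;
}

const demoFeedback = collectFeedback(
  [
    { kind: "added", labelId: "Label_A" },   // echo of the system's own write: skipped
    { kind: "removed", labelId: "Label_A" }, // user disagrees: recorded
    { kind: "added", labelId: "Label_A" },   // user re-adds: skipped (the accepted trade-off)
    { kind: "added", labelId: "Label_B" },   // user picks another label: recorded
  ],
  new Set(["Label_A"]),
);
const demoFeedbackSummary = demoFeedback
  .map((row) => `${row.eventType}:${row.labelId}`)
  .join(",");
```

Only the third event is lost, while the first (the system's own write) is kept out of the feedback table.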

Contributor

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

const lines: string[] = [];

for (const classification of classifications) {
const ruleName = classification.rule?.name ?? "Unknown";
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 22, 2026

P2: Escape subject and rule name before injecting them into the XML-like prompt block.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/rule/sender-classification.ts, line 130:

<comment>Escape subject and rule name before injecting them into the XML-like prompt block.</comment>

<file context>
@@ -0,0 +1,176 @@
+  const lines: string[] = [];
+
+  for (const classification of classifications) {
+    const ruleName = classification.rule?.name ?? "Unknown";
+    const subject = subjects.get(classification.messageId);
+
</file context>

Collaborator Author

The existing codebase already injects raw email subjects and body content into LLM prompts without escaping (see stringifyEmail in ai-choose-rule.ts). The PROMPT_SECURITY_INSTRUCTIONS handle prompt injection concerns at the model level. Adding escaping here would be inconsistent with the rest of the prompt construction and wouldn't provide additional security since these XML-like tags are LLM prompt structure, not parsed by an XML processor.
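
For context, the escaping the reviewer suggested (and which this PR does not adopt, for the consistency reasons above) would amount to something like the following sketch. `escapeForPromptBlock` is a hypothetical helper, not part of the codebase.

```typescript
// Sketch only: neutralizes characters that could close or open
// XML-like tags inside a prompt block.
function escapeForPromptBlock(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const hostileSubject = "Invoice </subject><rule>Marketing</rule>";
const escapedSubject = escapeForPromptBlock(hostileSubject);
const escapedAmpersand = escapeForPromptBlock("Terms & Conditions");
```

Whether this is worth the inconsistency with `stringifyEmail` is the judgment call discussed in the thread; the sketch only shows what the change would involve.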

Contributor

Thanks for the feedback! I've updated an existing learning with this new information.

elie222 and others added 6 commits March 22, 2026 21:11
The @@unique([emailAccountId, sender, ruleId, messageId, eventType])
already serves as a prefix index for (emailAccountId, sender) queries
in PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
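
The prefix argument in the commit message above can be checked mechanically. A minimal sketch, assuming a hypothetical helper `isLeadingPrefix`: an index is redundant in this sense only when its columns are the leading columns of the unique constraint, in the same order.

```typescript
// Sketch only: true when indexColumns are the leading columns of uniqueColumns, in order.
function isLeadingPrefix(
  indexColumns: readonly string[],
  uniqueColumns: readonly string[],
): boolean {
  if (indexColumns.length > uniqueColumns.length) return false;
  return indexColumns.every((column, position) => uniqueColumns[position] === column);
}

const uniqueConstraintColumns = ["emailAccountId", "sender", "ruleId", "messageId", "eventType"];

// The removed @@index([emailAccountId, sender]) is a leading prefix.
const removedIndexIsCovered = isLeadingPrefix(["emailAccountId", "sender"], uniqueConstraintColumns);
// The same columns in the opposite order would not be covered.
const reorderedIndexIsCovered = isLeadingPrefix(["sender", "emailAccountId"], uniqueConstraintColumns);
```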
- Update mocks for GMAIL_SYSTEM_LABELS, fetchSenderFromMessage,
  findRuleByLabelId, and sender-classification dependencies
- Fix "should skip non-SPAM labels" test to reflect new behavior
  (system labels skip, non-system labels now record classification)
- Update label-removed tests to use findRuleByLabelId mock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unit tests (sender-classification.test.ts):
- saveSenderClassification: lowercase normalization, upsert dedup
- getSenderClassificationsForPrompt: null when empty, subject
  formatting, LABEL_REMOVED formatting, deleted messages, batch
  fetch failure
- findRuleByLabelId: match and no-match cases

Eval tests (sender-classification-hint.test.ts):
- Split sender (Amazon): receipt vs marketing with classification
  history — AI correctly uses hints to distinguish
- Split sender (Google): calendar vs notification
- Correction signal: label removal steers away from wrong rule
- Strong consistent history reinforces correct classification
- Hint does NOT override clear email content (personal email
  stays Conversations despite notification history)
- Baseline: same email without hint still works correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 5 new eval cases for sender classification hints:

- Move-to-label pattern: remove+add from user moving email between
  folders (Notion changelog: Newsletter → Notification)
- Weak signal: single data point should not override clear content
  (Cal.com booking is Calendar despite 1 Newsletter classification)
- Contradictory history: user went back and forth, AI relies on
  content (Figma Config event → Marketing)
- Hint should NOT override content: real receipt from sender with
  mostly marketing history (Uber trip receipt)
- Split SaaS sender: Stripe sends receipts, notifications, and
  marketing (payout notification with mixed history)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The table stores feedback from user label actions, not classifications
of senders. The new name better describes what the data is for
(improving classification) rather than what it's about (a sender).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aiChooseRule now accepts ClassificationFeedbackItem[] and formats the
prompt internally. Tests pass structured data instead of hardcoded
prompt strings, so changing the prompt format doesn't break tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
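
A rough sketch of what formatting structured feedback inside the prompt builder might look like. The fields of `ClassificationFeedbackItem`, the function name, the tag name, and the line wording are all assumptions for illustration; the PR text only confirms that the type exists, that the prompt is unmodified when there is no feedback, and that deleted messages (no subject) are handled.

```typescript
// Sketch only: field names and output format are assumed, not taken from the PR.
interface ClassificationFeedbackItem {
  ruleName: string;
  eventType: "LABEL_ADDED" | "LABEL_REMOVED";
  subject?: string; // missing when the message was deleted or the fetch failed
}

// Returns null when there is nothing to add, so the caller can leave the prompt unchanged.
function formatClassificationFeedback(items: ClassificationFeedbackItem[]): string | null {
  if (items.length === 0) return null;
  const lines = items.map((item) => {
    const action = item.eventType === "LABEL_REMOVED" ? "user removed label" : "user applied label";
    const subject = item.subject ?? "(subject unavailable)";
    return `- ${action} "${item.ruleName}": ${subject}`;
  });
  return ["<classification_feedback>", ...lines, "</classification_feedback>"].join("\n");
}

const emptyHint = formatClassificationFeedback([]);
const populatedHint = formatClassificationFeedback([
  { ruleName: "Newsletter", eventType: "LABEL_REMOVED", subject: "Weekly digest" },
  { ruleName: "Notification", eventType: "LABEL_ADDED" },
]);
```

Because tests would pass `ClassificationFeedbackItem[]` rather than prompt strings, the wording inside this function can change without touching them.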

2 participants