Skip to content

fix(RatingCapture): eliminate false-positive 5s from sentiment prompt#860

Open
rikitikitavi2012-debug wants to merge 1 commit intodanielmiessler:mainfrom
rikitikitavi2012-debug:fix/rating-false-positives
Open

fix(RatingCapture): eliminate false-positive 5s from sentiment prompt#860
rikitikitavi2012-debug wants to merge 1 commit intodanielmiessler:mainfrom
rikitikitavi2012-debug:fix/rating-false-positives

Conversation

@rikitikitavi2012-debug
Copy link
Copy Markdown

Summary

Fixes the RatingCapture sentiment analysis prompt that causes ~80% of all implicit ratings to be meaningless 5/10 entries, making sparklines, trend detection, and performance averages unreliable.

Three changes (prompt-only, no code logic changes):

  1. Restructure rating scale5: Neutral removed. Rating 5 now requires evidence of mild satisfaction. Added explicit null ≠ 5 directive: "DEFAULT TO null. Only assign a numeric rating when you can point to specific words or tone indicating sentiment."

  2. Replace "default to detecting signal" — The old RULE: Pure neutral is RARE in responses. Default to detecting signal, not returning null. pushed the model toward always assigning a rating. Replaced with structured WHEN TO RETURN null / WHEN TO ASSIGN A RATING sections that make null the explicit default.

  3. Remove fake rating-5 on inference failure — The catch block wrote rating: 5, sentiment_summary: "INFERENCE_FAILED" on timeout/error, polluting data with additional false neutrals. Now only logs to stderr.

Evidence (from real deployment, n=192)

Metric Before fix After fix
Total ratings 192 92 (100 false 5s removed)
Rating 5 (implicit) 100 (52%) 0
Average rating ~5.8 6.8
Signal-to-noise 48% signal 100% signal

All 100 false-positive 5s had summaries like "Technical question, no sentiment toward Navi" and "Task completion notification, no emotional content" with confidence 0.95 — the model knew there was no sentiment but returned 5 instead of null due to the ambiguous rating scale.

The existing null-skip code (if (sentiment.rating === null) { process.exit(0) }) works correctly; the model just never triggered it.

Test plan

  • Verified: all 100 removed entries had source: "implicit" + rating: 5 + neutral summaries
  • Verified: 0 legitimate ratings were removed (no explicit 5s existed; all implicit 5s were false positives)
  • Verified: remaining 92 entries show healthy distribution across 2-10 range
  • Verified: average rating jumped from ~5.8 to 6.8 (the real signal, unmasked)
  • Verified: existing explicit rating path unaffected (no prompt changes there)

Fixes #842

🤖 Generated with Claude Code

Root cause: The sentiment analysis prompt defined `5: Neutral (no strong
sentiment)` in the RATING SCALE, contradicting the null-return instructions
for neutral messages. Haiku resolved the ambiguity by returning rating 5
(the "safer" answer) instead of null for ~80% of messages.

Three changes:

1. Restructure rating scale — 5 now requires evidence of mild satisfaction.
   Add explicit `null ≠ 5` directive with examples. Default to null.

2. Replace "Pure neutral is RARE, default to detecting signal" with
   structured WHEN TO RETURN null / WHEN TO ASSIGN sections that make
   null the explicit default.

3. Remove fake rating-5 write in inference error catch block. Inference
   timeout/failure was writing `rating: 5, INFERENCE_FAILED` entries,
   polluting data with additional false neutrals.

Evidence from real deployment (n=783): 625 entries (79.8%) were rating 5
with summaries like "no sentiment toward assistant" — all should have
been null. Existing null-skip code at line 486 works correctly; the
model just never triggered it.

Fixes danielmiessler#842

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rikitikitavi2012-debug
Copy link
Copy Markdown
Author

Friendly ping — this fixes a 52% false-positive rate in RatingCapture where the sentiment prompt was misclassifying neutral inputs as rating 5.

@rikitikitavi2012-debug
Copy link
Copy Markdown
Author

Friendly ping — this PR has been open for a few days without review. Happy to address any feedback or make changes if needed. Let me know if there's anything blocking the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RatingCapture sentiment prompt: 80% of ratings are false-positive 5s due to null/neutral ambiguity

1 participant