fix(RatingCapture): eliminate false-positive 5s from sentiment prompt #860
Open
rikitikitavi2012-debug wants to merge 1 commit into danielmiessler:main from
Root cause: The sentiment analysis prompt defined `5: Neutral (no strong sentiment)` in the RATING SCALE, contradicting the null-return instructions for neutral messages. Haiku resolved the ambiguity by returning rating 5 (the "safer" answer) instead of null for ~80% of messages.

Three changes:

1. Restructure rating scale: 5 now requires evidence of mild satisfaction. Add explicit `null ≠ 5` directive with examples. Default to null.
2. Replace "Pure neutral is RARE, default to detecting signal" with structured WHEN TO RETURN null / WHEN TO ASSIGN sections that make null the explicit default.
3. Remove fake rating-5 write in inference error catch block. Inference timeout/failure was writing `rating: 5, INFERENCE_FAILED` entries, polluting data with additional false neutrals.

Evidence from real deployment (n=783): 625 entries (79.8%) were rating 5 with summaries like "no sentiment toward assistant"; all should have been null. Existing null-skip code at line 486 works correctly; the model just never triggered it.

Fixes danielmiessler#842

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
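The null-skip and failure handling described above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: the `SentimentResult`, `InferenceOutcome`, and `RatingEntry` types and the `entryToWrite` function are hypothetical names, and only the `rating`, `sentiment_summary`, and `source: "implicit"` fields come from the PR text.

```typescript
// Hypothetical shapes for illustration; the real hook's types may differ.
interface SentimentResult {
  rating: number | null; // null means "no sentiment detected", not "neutral 5"
  sentiment_summary: string;
}

interface RatingEntry {
  source: "implicit";
  rating: number;
  sentiment_summary: string;
}

// Assumed wrapper around the inference call: either a parsed result or an error.
type InferenceOutcome =
  | { ok: true; sentiment: SentimentResult }
  | { ok: false; error: string };

// Returns the entry to persist, or null when nothing should be written.
function entryToWrite(outcome: InferenceOutcome): RatingEntry | null {
  if (!outcome.ok) {
    // Timeout or failure: log to stderr only, never write a placeholder rating.
    console.error(`[RatingCapture] inference failed: ${outcome.error}`);
    return null;
  }
  const { rating, sentiment_summary } = outcome.sentiment;
  // A null rating is skipped entirely rather than being coerced to 5.
  if (rating === null) return null;
  return { source: "implicit", rating, sentiment_summary };
}
```

Under these assumptions a failed inference and a null rating take the same path: no entry is produced, so neither can show up later as a neutral 5.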
Author
Friendly ping: this fixes a 52% false-positive rate in RatingCapture where the sentiment prompt was misclassifying neutral inputs as rating 5.
Author
Friendly ping: this PR has been open for a few days without review. Happy to address any feedback or make changes if needed. Let me know if there's anything blocking the merge.
Summary
Fixes the RatingCapture sentiment analysis prompt that causes ~80% of all implicit ratings to be meaningless `5/10` entries, making sparklines, trend detection, and performance averages unreliable.

Three changes (prompt-only, no code logic changes):

1. Restructure rating scale: `5: Neutral` removed. Rating 5 now requires evidence of mild satisfaction. Added explicit `null ≠ 5` directive: "DEFAULT TO null. Only assign a numeric rating when you can point to specific words or tone indicating sentiment."
2. Replace "default to detecting signal": The old `RULE: Pure neutral is RARE in responses. Default to detecting signal, not returning null.` pushed the model toward always assigning a rating. Replaced with structured `WHEN TO RETURN null` / `WHEN TO ASSIGN A RATING` sections that make null the explicit default.
3. Remove fake rating-5 on inference failure: The catch block wrote `rating: 5, sentiment_summary: "INFERENCE_FAILED"` on timeout/error, polluting data with additional false neutrals. Now only logs to stderr.

Evidence (from real deployment, n=192)

All 100 false-positive 5s had summaries like "Technical question, no sentiment toward Navi" and "Task completion notification, no emotional content" with confidence 0.95. The model knew there was no sentiment but returned 5 instead of null due to the ambiguous rating scale.

The existing null-skip code (`if (sentiment.rating === null) { process.exit(0) }`) works correctly; the model just never triggered it.

Test plan

`source: "implicit"` + `rating: 5` + neutral summaries

Fixes #842
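The combination named in the test plan (`source: "implicit"`, `rating: 5`, and a neutral summary) could be checked against existing rating entries with something like the sketch below. The `LoggedRating` type, the `NEUTRAL_MARKERS` list, and both function names are assumptions made for illustration; the marker phrases are taken from the example summaries quoted in the evidence section, and a real audit would likely need a broader list.

```typescript
// Hypothetical entry shape; field names follow those quoted in this PR.
interface LoggedRating {
  source: string;
  rating: number | null;
  sentiment_summary: string;
}

// Assumed marker phrases, based on the example summaries quoted in the PR.
const NEUTRAL_MARKERS = ["no sentiment", "no emotional content"];

// True when an implicit rating of 5 carries a summary that admits there was no sentiment.
function isFalsePositiveFive(entry: LoggedRating): boolean {
  if (entry.source !== "implicit" || entry.rating !== 5) return false;
  const summary = entry.sentiment_summary.toLowerCase();
  return NEUTRAL_MARKERS.some((marker) => summary.includes(marker));
}

// Share of all entries that are false-positive 5s (0 for an empty list).
function falsePositiveShare(entries: LoggedRating[]): number {
  if (entries.length === 0) return 0;
  return entries.filter(isFalsePositiveFive).length / entries.length;
}
```

Running `falsePositiveShare` before and after the prompt change on comparable samples would give a rough measure of whether the model now returns null for neutral messages.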
🤖 Generated with Claude Code