
fix(hooks): prevent UTF-16 surrogate pair splitting in RatingCapture #882

Open
rikitikitavi2012-debug wants to merge 1 commit into danielmiessler:main from rikitikitavi2012-debug:fix/rating-capture-surrogate-pairs-7014975337557853207

Conversation

@rikitikitavi2012-debug

Summary

  • Adds a safeSlice() function that detects a high surrogate (0xD800–0xDBFF) at the cut boundary and drops the incomplete pair
  • Replaces all .slice(0, N) calls in RatingCapture.hook.ts across the v3.0 and v4.0.0 releases
  • Fixes corrupted ratings.jsonl entries when an emoji straddles the truncation boundary
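
As a rough illustration of the first bullet, a helper along these lines would implement the described check. This is a minimal sketch, not the code from this PR: the parameter names and the handling of non-positive limits are assumptions.

```typescript
// Hypothetical sketch of a surrogate-aware truncation helper.
// The safeSlice() in this PR may differ in signature and details.
function safeSlice(text: string, maxUnits: number): string {
  if (maxUnits <= 0) return "";
  if (text.length <= maxUnits) return text;

  // Inspect the last UTF-16 code unit that a plain slice would keep.
  const last = text.charCodeAt(maxUnits - 1);

  // A high surrogate here means its low half would be cut off,
  // so drop the high half too instead of leaving it orphaned.
  const endsOnHighSurrogate = last >= 0xd800 && last <= 0xdbff;
  return text.slice(0, endsOnHighSurrogate ? maxUnits - 1 : maxUnits);
}

const sample = "ab😀cd"; // 😀 (U+1F600) occupies code unit indices 2 and 3
console.log(safeSlice(sample, 3)); // "ab"
console.log(safeSlice(sample, 4)); // "ab😀"
```

The result can be one code unit shorter than the requested limit, which seems an acceptable trade for a truncation that only needs an upper bound.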

Fixes #874

Context

.slice(0, 500) counts UTF-16 code units, not characters. When the cut lands between the two halves of a surrogate pair (emoji, supplementary-plane CJK characters), it leaves an orphaned high surrogate → invalid JSON → the statusline LEARNING section breaks.
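
The failure mode can be reproduced in isolation. The snippet below is only an illustration with a constructed input, not data from ratings.jsonl; it places an emoji so that its two code units sit on either side of index 500.

```typescript
// Constructed input: 499 ASCII characters, then 😀 (U+1F600),
// whose two UTF-16 code units land at indices 499 and 500.
const text = "x".repeat(499) + "😀";

const cut = text.slice(0, 500); // keeps indices 0..499, so only the high half survives
const lastUnit = cut.charCodeAt(cut.length - 1);
const endsWithOrphan = lastUnit >= 0xd800 && lastUnit <= 0xdbff;

console.log(cut.length, lastUnit.toString(16), endsWithOrphan); // 500 d83d true
```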

Test plan

  • Verify safeSlice("test 🗣️".repeat(200), 500) truncates without orphaned surrogates
  • Verify ratings.jsonl remains valid JSON after truncation with emoji content
  • AI code review: severity LOW — Clean
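
One way to run the second check by hand is sketched below. It assumes each line of ratings.jsonl is a standalone JSON object; the `comment` field, the helper names, and the sample lines are hypothetical and not taken from the repository.

```typescript
// Returns true if s contains a high or low surrogate without its partner.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return true; // high half with no low half
      i++; // skip the low half of a complete pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low half with no preceding high half
    }
  }
  return false;
}

// Returns 1-based numbers of lines that fail to parse or that
// contain a lone surrogate in any key or string value.
function findBadJsonlLines(jsonl: string): number[] {
  const bad: number[] = [];
  jsonl.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    try {
      JSON.parse(line, (key: string, value: unknown) => {
        if (hasLoneSurrogate(key) || (typeof value === "string" && hasLoneSurrogate(value))) {
          throw new Error("lone surrogate");
        }
        return value;
      });
    } catch {
      bad.push(i + 1);
    }
  });
  return bad;
}

// Hypothetical sample: one clean record, one with an escaped orphan, one truncated.
const sampleLog = [
  JSON.stringify({ comment: "ok 😀" }),
  '{"comment":"cut \\ud83d"}',
  '{"comment":',
].join("\n");

console.log(findBadJsonlLines(sampleLog)); // [ 2, 3 ]
```

The check looks at parsed values rather than raw text because an orphaned surrogate may be stored as a `\ud83d` escape, which a plain text scan of the file would miss.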

🤖 Submitted by Navi, PAI agent of Ivan (@rikitikitavi2012-debug)
Generated with Claude Code + Jules

Fix crash where String.prototype.slice splits UTF-16 surrogate pairs (like emojis) causing corrupted JSON output. Added a `safeSlice` utility function that checks if the boundary lands on a high surrogate and gracefully drops the incomplete pair. Replaced all truncation slice calls with `safeSlice` across releases. Also added tests to verify correct emoji truncation logic.

Co-authored-by: rikitikitavi2012-debug <240362902+rikitikitavi2012-debug@users.noreply.github.com>
@rikitikitavi2012-debug
Author

This fixes the issue reported in #874 — adds a safeSlice() helper to prevent splitting UTF-16 surrogate pairs when truncating hook output.

@rikitikitavi2012-debug
Author

Friendly ping — this PR has been open for a few days without review. Happy to address any feedback or make changes if needed. Let me know if there's anything blocking the merge.

virtualian added a commit to virtualian/pai that referenced this pull request Mar 9, 2026
Add safeSlice() helper that checks for high surrogates at truncation
boundaries before slicing. Replaces 6 bare .slice(0, N) calls that
could produce orphaned surrogates and corrupt ratings.jsonl entries
when emoji land at exact cut points.

Upstream: danielmiessler/Personal_AI_Infrastructure#882

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
virtualian added a commit to virtualian/pai that referenced this pull request Mar 9, 2026
…ISC/effort alignment (#66)

* fix: prevent UTF-16 surrogate pair splitting in RatingCapture (#65)

Add safeSlice() helper that checks for high surrogates at truncation
boundaries before slicing. Replaces 6 bare .slice(0, N) calls that
could produce orphaned surrogates and corrupt ratings.jsonl entries
when emoji land at exact cut points.

Upstream: danielmiessler/Personal_AI_Infrastructure#882

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align SKILL.md ISC system-of-record with Algorithm PRD design (#65)

Change ISC system-of-record from Claude Code task system (TaskCreate)
to PRD.md checkboxes, matching how the Algorithm actually writes and
tracks criteria.

Upstream: danielmiessler/Personal_AI_Infrastructure#891

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: harmonize SKILL.md effort budgets and ISC minimums with Algorithm (#65)

Remove Instant and Fast tiers that don't align with Algorithm operation.
Consolidate to 5 tiers (Standard–Comprehensive) with realistic ISC
ranges instead of inflated minimums (was Deep=128, now 40-80).

Upstream: danielmiessler/Personal_AI_Infrastructure#890

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

RatingCapture: .slice(0, 500) can split UTF-16 surrogate pairs, breaking statusline LEARNING section
