fix: Sanitize hidden Unicode characters from user and tool inputs #2435

Merged
evanliu048 merged 7 commits into main from remove-unicode-tag
Aug 4, 2025
@evanliu048
Contributor

@evanliu048 evanliu048 commented Jul 30, 2025

Issue #, if available:

Description of changes:
This PR addresses a potential prompt injection issue by removing hidden/control Unicode characters from:

- User input (e.g. chat prompt)
- Tool outputs (e.g. fs_read, execute_bash responses)

Specifically, we strip characters in the following ranges:

- U+E0000–U+E007F (Unicode TAG)
- U+200B–U+200F, U+2028–U+202F, U+205F–U+206F, U+FFF0–U+FFFC, U+FFFF

These characters are rarely used in normal text but may be exploited to hide invisible instructions.
Unit tests are added to verify both detection and removal behavior.
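
For readers who want to see the shape of such a filter, here is a minimal sketch of detection and removal over the ranges listed in the description. It is not the code merged in this PR: the function names `is_hidden_unicode`, `contains_hidden_unicode`, and `sanitize_hidden_unicode` are hypothetical and chosen only for illustration.

```rust
/// Hypothetical helper: true if `c` is in one of the ranges named in the
/// PR description. This mirrors the listed ranges, not the merged code.
fn is_hidden_unicode(c: char) -> bool {
    matches!(
        c,
        '\u{E0000}'..='\u{E007F}'     // Unicode TAG characters
            | '\u{200B}'..='\u{200F}' // zero-width characters, LRM/RLM
            | '\u{2028}'..='\u{202F}' // line/paragraph separators, bidi controls
            | '\u{205F}'..='\u{206F}' // invisible operators, bidi isolates
            | '\u{FFF0}'..='\u{FFFC}' // Specials block (annotation controls etc.)
            | '\u{FFFF}'              // noncharacter
    )
}

/// Hypothetical detector: reports whether any listed character is present.
fn contains_hidden_unicode(input: &str) -> bool {
    input.chars().any(is_hidden_unicode)
}

/// Hypothetical sanitizer: returns a copy of `input` without the listed characters.
fn sanitize_hidden_unicode(input: &str) -> String {
    input.chars().filter(|c| !is_hidden_unicode(*c)).collect()
}
```

Under this sketch, the same function would be applied to the chat prompt and to each tool response before the text reaches the model. Filtering per `char` keeps ordinary non-ASCII text (accented letters, CJK, and so on) intact, since only the listed code points are dropped.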

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@evanliu048 evanliu048 marked this pull request as draft July 31, 2025 00:09
@evanliu048 evanliu048 marked this pull request as ready for review August 1, 2025 23:53
@dingfeli
Contributor

dingfeli commented Aug 4, 2025

LGTM but why is the windows CI failing?

@dingfeli
Contributor

dingfeli commented Aug 4, 2025

> LGTM but why is the windows CI failing?

Oh I see windows was never supported by Skim. Weird how this failure comes up now and not in any of the PRs before this.

edit: nvm I see that there is a change in the related file.

Member

@chaynabors chaynabors left a comment

Don't merge until I chat with appsec

@evanliu048 evanliu048 merged commit b4b4221 into main Aug 4, 2025
15 checks passed
@evanliu048 evanliu048 deleted the remove-unicode-tag branch August 4, 2025 19:14
@evanliu048
Contributor Author

> > LGTM but why is the windows CI failing?
>
> Oh I see windows was never supported by Skim. Weird how this failure comes up now and not in any of the PRs before this.
>
> edit: nvm I see that there is a change in the related file.

I re-imported the function in chat/mod.rs, and all CI checks pass now.

3 participants