fix: DATA-12841 Sanitize user input and prevent common prompt injections#74
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
65aacfd to
fd8af16
Compare
|
Hey @bc-donfran 👋 |
| ]); | ||
|
|
||
| if (injectionPattern) { | ||
| return new NextResponse('Unsafe prompt content detected', { status: 400 }); |
There was a problem hiding this comment.
I think we should not tell we detected malicious prompt but simply return some unrelated error
There was a problem hiding this comment.
Updated to a more generic error
src/lib/prompt-safety.ts
Outdated
| export const detectPromptInjection = ( | ||
| values: Array<string | undefined> | ||
| ): string | null => { |
There was a problem hiding this comment.
| export const detectPromptInjection = ( | |
| values: Array<string | undefined> | |
| ): string | null => { | |
| export const containsPromptInjectetion = ( | |
| values: Array<string | undefined> | |
| ): boolean => { |
|
Note detectPromptInjection logic is a simple lowercasing plus regex scan across two fields (customPrompt and instructions). It will catch obvious phrases (e.g., “ignore previous instructions”), but it is can be evaded with spelling variants, Unicode look‑alikes, inserted punctuation, or different phrasing |
fd8af16 to
d3bed39
Compare
d3bed39 to
034446f
Compare
Yes. But I don't see another solution how to catch all possible malicious prompts except for maybe validating the prompt with another prompt to the LLM. It's probably an overkill. For now I've added these initial patterns and we can add more as we detect them |
034446f to
b6095c9
Compare
b6095c9 to
1dc276c
Compare
1dc276c to
56ebe8a
Compare
56ebe8a to
09681cd
Compare
09681cd to
33ba167
Compare
33ba167 to
dc86295
Compare
Jira: DATA-12841
What/Why?
Rollout/Rollback
Merge/revert
Testing
@bigcommerce/team-data