fix: DATA-12841 Sanitize user input and prevent common prompt injections #74

Merged
bc-rmalyavc merged 1 commit into main from
codex/fix-prompt-injection-vulnerability
Dec 23, 2025
Conversation

@bc-rmalyavc
Contributor

@bc-rmalyavc bc-rmalyavc commented Dec 18, 2025

Jira: DATA-12841

What/Why?

  • The Security team reported a vulnerability: they were able to perform a basic attack that replaced the current product-description prompt with a malicious one, making Gemini enter a debug mode and disclose sensitive information. This change attempts to prevent that by checking user input against common injection patterns and responding with an error when one is detected. For now we add an initial list of injection patterns, which may be extended over time as we discover new ones.
  • Also updates the Ubuntu version to unblock the build workflow that previously failed on PRs.
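The pattern-matching approach described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the pattern list, the helper name, and the `string | null` return shape are assumptions based on the discussion below.

```typescript
// Illustrative sketch of regex-based prompt-injection screening.
// The real pattern list in the PR is not shown here; these examples
// are placeholders for the kinds of phrases being screened.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/,
  /enter (debug|developer) mode/,
  /reveal (your )?(system )?prompt/,
];

// Returns the source of the first matching pattern, or null when the
// inputs look safe. Undefined fields (optional request attributes)
// are skipped.
export const detectPromptInjection = (
  values: Array<string | undefined>
): string | null => {
  for (const value of values) {
    if (!value) continue;
    const normalized = value.toLowerCase();
    for (const pattern of INJECTION_PATTERNS) {
      if (pattern.test(normalized)) return pattern.source;
    }
  }
  return null;
};
```

A caller would pass the user-controlled fields (e.g. the custom prompt and instructions) and reject the request when the result is non-null.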

Rollout/Rollback

Merge/revert

Testing

  • Unit tests
  • Manually on the dev and preview environments. Attempted to execute the same request the Security team suggested and verified that the service responds to such requests with a 400. Tested with custom prompt mode, and also with a request suggested by the Security team in which a malicious prompt is injected as part of the attributes in the body. Verified that the service still responds with valid product descriptions for originally intended (valid) prompts.

@bigcommerce/team-data

@vercel

vercel bot commented Dec 18, 2025

The latest updates on your projects.

Project | Deployment | Review | Updated (UTC)
ai-app-foundation | Ready | Ready (Preview, Comment) | Dec 23, 2025 1:44pm

@bc-rmalyavc bc-rmalyavc requested a review from a team as a code owner December 18, 2025 14:46
@bc-rmalyavc bc-rmalyavc changed the title Broaden prompt injection pattern coverage Sanitize user input and prevent common injections Dec 18, 2025
@bc-rmalyavc bc-rmalyavc changed the title Sanitize user input and prevent common injections fix: DATA-12841 Sanitize user input and prevent common injections Dec 18, 2025
@bc-rmalyavc bc-rmalyavc changed the title fix: DATA-12841 Sanitize user input and prevent common injections fix: DATA-12841 Sanitize user input and prevent common prompt injections Dec 18, 2025
@bc-rmalyavc bc-rmalyavc force-pushed the codex/fix-prompt-injection-vulnerability branch from 65aacfd to fd8af16 Compare December 18, 2025 16:22
@bc-rmalyavc
Contributor Author

Hey @bc-donfran 👋
Could you please take a look at this PR? I'm trying to address the vulnerability you found and would like your opinion on this approach. It's probably not ideal, and there may still be ways to bypass it; so far I've added the patterns that initially came to mind, and we can add more injection patterns as we discover them.

]);

if (injectionPattern) {
  return new NextResponse('Unsafe prompt content detected', { status: 400 });
Collaborator

I think we should not reveal that we detected a malicious prompt, but simply return some unrelated error

Contributor Author

Updated to a more generic error

Comment on lines 9 to 11
export const detectPromptInjection = (
values: Array<string | undefined>
): string | null => {
Collaborator

Suggested change
export const detectPromptInjection = (
values: Array<string | undefined>
): string | null => {
export const containsPromptInjection = (
values: Array<string | undefined>
): boolean => {

Contributor Author

Done
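The accepted rename above swaps the `string | null` detection result for a plain boolean, since the caller only needs to know whether to reject the request. A hedged sketch of that variant, with an illustrative pattern list (not the PR's actual one):

```typescript
// Illustrative pattern list; the PR's real list is not reproduced here.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/,
  /enter (debug|developer) mode/,
];

// Boolean variant per the review suggestion: true when any supplied
// field matches a known injection pattern, false otherwise.
export const containsPromptInjection = (
  values: Array<string | undefined>
): boolean =>
  values.some(
    (value) =>
      !!value &&
      INJECTION_PATTERNS.some((pattern) => pattern.test(value.toLowerCase()))
  );
```

In the route handler this would gate a deliberately generic error, per the earlier review thread, along the lines of `if (containsPromptInjection([customPrompt, instructions])) return new NextResponse('Invalid request', { status: 400 })`.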

@solofeed
Collaborator

Note: the detectPromptInjection logic is a simple lowercasing plus regex scan across two fields (customPrompt and instructions). It will catch obvious phrases (e.g., “ignore previous instructions”), but it can be evaded with spelling variants, Unicode look-alikes, inserted punctuation, or different phrasing.

@bc-rmalyavc
Contributor Author

Note: the detectPromptInjection logic is a simple lowercasing plus regex scan across two fields (customPrompt and instructions). It will catch obvious phrases (e.g., “ignore previous instructions”), but it can be evaded with spelling variants, Unicode look-alikes, inserted punctuation, or different phrasing.

Yes. But I don't see another way to catch all possible malicious prompts, except perhaps validating the prompt with another LLM call, which is probably overkill. For now I've added these initial patterns, and we can add more as we detect them.
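One possible hardening against the evasion vectors mentioned above (Unicode look-alikes, inserted punctuation) would be to normalize input before the regex scan. This is not part of the PR, just an illustrative sketch of the idea:

```typescript
// Illustrative hardening (not in this PR): fold Unicode look-alikes
// and strip filler characters before scanning for injection phrases.
const normalizeForScan = (value: string): string =>
  value
    .normalize('NFKC') // fold compatibility characters (e.g. fullwidth letters)
    .toLowerCase()
    // Remove zero-width characters explicitly: replacing them with a
    // space would split words ("ig nore") and defeat phrase matching.
    .replace(/[\u200b-\u200d\ufeff]/g, '')
    .replace(/[^a-z0-9\s]/g, ' ') // turn punctuation into spaces
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim();
```

The existing pattern scan could then run on `normalizeForScan(value)` instead of `value.toLowerCase()`, so that "Ig​nore -- PREVIOUS... instructions!" still reduces to a matchable phrase.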

@bc-rmalyavc bc-rmalyavc force-pushed the codex/fix-prompt-injection-vulnerability branch from 33ba167 to dc86295 Compare December 23, 2025 13:43
@bc-rmalyavc bc-rmalyavc merged commit 42347ce into main Dec 23, 2025
3 checks passed
@bc-rmalyavc bc-rmalyavc deleted the codex/fix-prompt-injection-vulnerability branch December 23, 2025 13:48