Skip to content

Commit 5f6523d

Browse files
fix: typos in prompt injection
1 parent 850e793 commit 5f6523d

File tree

1 file changed

+12
-17
lines changed

1 file changed

+12
-17
lines changed

2_0_vulns/LLM01_PromptInjection.md

Lines changed: 12 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -2,13 +2,12 @@
22

33
### Description
44

5-
A Prompt Injection Vulnerability occurs when user prompts alter the LLM’s behavior or output in unintended ways. These inputs can affect the model even if they are imperceptible to humans, therefore prompt injections do not need to be human-visible/readable, as long as the content is parsed by the model.
5+
A Prompt Injection Vulnerability occurs when user prompts alter the LLM’s behavior or output in unintended ways. These inputs can affect the model even if they are imperceptible to humans, therefore prompt injections do not need to be human-visible/readable, as long as the content is parsed by the model.
66

77
Prompt Injections vulnerabilities exist in how models process prompts, and how input may force the model to incorrectly pass prompt data to other parts of the model, potentially causing them to violate guidelines, generate harmful content, enable unauthorized access, or influence critical decisions. While techniques like Retrieval Augmented Generation (RAG) and fine-tuning aim to make LLM outputs more relevant and accurate, research shows that they do not fully mitigate prompt injection vulnerabilities.
88

99
While prompt injection and jailbreaking are related concepts in LLM security, they are often used interchangeably. Prompt injection involves manipulating model responses through specific inputs to alter its behavior, which can include bypassing safety measures. Jailbreaking is a form of prompt injection where the attacker provides inputs that cause the model to disregard its safety protocols entirely. Developers can build safeguards into system prompts and input handling to help mitigate prompt injection attacks, but effective prevention of jailbreaking requires ongoing updates to the model's training and safety mechanisms.
1010

11-
1211
### Types of Prompt Injection Vulnerabilities
1312

1413
**Direct Prompt Injections** occur when a user's prompt input directly alters the behavior of the model in unintended or unexpected ways. The input can be either intentional (i.e., a malicious actor deliberately crafting a prompt to exploit the model) or unintentional (i.e., a user inadvertently providing input that triggers unexpected behavior).
@@ -22,7 +21,7 @@ The severity and nature of the impact of a successful prompt injection attack ca
2221
- Content manipulation leading to incorrect or biased outputs
2322
- Providing unauthorized access to functions available to the LLM
2423
- Executing arbitrary commands in connected systems
25-
- Manipulating critical decision-making processes
24+
- Manipulating critical decision-making processes
2625

2726
The rise of multimodal AI, which processes multiple data types simultaneously, introduces unique prompt injection risks. Malicious actors could exploit interactions between modalities, such as hiding instructions in images that accompany benign text. The complexity of these systems expands the attack surface. Multimodal models may also be susceptible to novel cross-modal attacks that are difficult to detect and mitigate with current techniques. Robust multimodal-specific defenses are an important area for further research and development.
2827

@@ -38,19 +37,17 @@ Prompt injection vulnerabilities are possible due to the nature of generative AI
3837
6. Segregate and identify external content: Separate and clearly denote untrusted content to limit its influence on user prompts.
3938
7. Conduct adversarial testing and attack simulations: Perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.
4039

41-
4240
### Example Attack Scenarios
4341

44-
1. Direct Injection: An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation.
45-
2. Indirect Injection: A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, exfiltrating the private conversation.
46-
3. Unintentional Injection: A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection.
47-
4. Intentional Model Influence: An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user's query returns the modified content, the malicious instructions alter the LLM's output, generating misleading results.
48-
5. Code Injection: Code Injection: An attacker exploits a vulnerability (CVE-2024-5184) in an LLM-powered email assistant to inject malicious prompts, allowing access to sensitive information and manipulation of email content.
49-
6. Payload Splitting: An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation despite the actual resume contents.
50-
7. Multimodal Injection: An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior, potentially leading to unauthorized actions or disclosure of sensitive information.
51-
8. Adversarial Suffix: An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM's output in a malicious way, bypassing safety measures.
52-
9. Multilingual/Obfuscated Attack: An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.
53-
42+
1. Direct Injection: An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation.
43+
2. Indirect Injection: A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, exfiltrating the private conversation.
44+
3. Unintentional Injection: A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection.
45+
4. Intentional Model Influence: An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user's query returns the modified content, the malicious instructions alter the LLM's output, generating misleading results.
46+
5. Code Injection: Code Injection: An attacker exploits a vulnerability (CVE-2024-5184) in an LLM-powered email assistant to inject malicious prompts, allowing access to sensitive information and manipulation of email content.
47+
6. Payload Splitting: An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation despite the actual resume contents.
48+
7. Multimodal Injection: An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior, potentially leading to unauthorized actions or disclosure of sensitive information.
49+
8. Adversarial Suffix: An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM's output in a malicious way, bypassing safety measures.
50+
9. Multilingual/Obfuscated Attack: An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.
5451

5552
### Reference Links
5653

@@ -70,12 +67,10 @@ Prompt injection vulnerabilities are possible due to the nature of generative AI
7067
14. [Universal and Transferable Adversarial Attacks on Aligned Language Models (arxiv.org)](https://arxiv.org/abs/2307.15043_
7168
15. [From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy (arxiv.org)](https://arxiv.org/abs/2307.00691)
7269

73-
7470
### Related Frameworks and Taxonomies
7571

7672
Refer to this section for comprehensive information, scenarios strategies relating to infrastructure deployment, applied environment controls and other best practices.
7773

7874
- [AML.T0051.000 - LLM Prompt Injection: Direct](https://atlas.mitre.org/techniques/AML.T0051.000) **MITRE ATLAS**
79-
- [AML.T0051.001 - LLM Prompt Injection: Direct](https://atlas.mitre.org/techniques/AML.T0051.001) **MITRE ATLAS**
75+
- [AML.T0051.001 - LLM Prompt Injection: Indirect](https://atlas.mitre.org/techniques/AML.T0051.001) **MITRE ATLAS**
8076
- [AML.T0054 - LLM Jailbreak Injection: Direct](https://atlas.mitre.org/techniques/AML.T0051.0054) **MITRE ATLAS**
81-
- [AML.T0051.000 - LLM Prompt Injection: Direct (Meta Prompt Extraction)](https://atlas.mitre.org/techniques/AML.T0051.000) **MITRE ATLAS**

0 commit comments

Comments
 (0)