
Commit 0f2f397

Inline link and ### tweaks (#471)
1 parent b03e71f · commit 0f2f397

10 files changed: 378 additions, 211 deletions

2_0_vulns/LLM01_PromptInjection.md

Lines changed: 37 additions & 19 deletions
@@ -10,9 +10,11 @@ While prompt injection and jailbreaking are related concepts in LLM security, th

### Types of Prompt Injection Vulnerabilities

-**Direct Prompt Injections** occur when a user's prompt input directly alters the behavior of the model in unintended or unexpected ways. The input can be either intentional (i.e., a malicious actor deliberately crafting a prompt to exploit the model) or unintentional (i.e., a user inadvertently providing input that triggers unexpected behavior).
+#### Direct Prompt Injections
+Direct prompt injections occur when a user's prompt input directly alters the behavior of the model in unintended or unexpected ways. The input can be either intentional (i.e., a malicious actor deliberately crafting a prompt to exploit the model) or unintentional (i.e., a user inadvertently providing input that triggers unexpected behavior).

-**Indirect Prompt Injections** occur when an LLM accepts input from external sources, such as websites or files. The content may have in the external content data that when interpreted by the model, alters the behavior of the model in unintended or unexpected ways. Like direct injections, indirect injections can be either intentional or unintentional.
+#### Indirect Prompt Injections
+Indirect prompt injections occur when an LLM accepts input from external sources, such as websites or files. The external content may contain data that, when interpreted by the model, alters the behavior of the model in unintended or unexpected ways. Like direct injections, indirect injections can be either intentional or unintentional.

The severity and nature of the impact of a successful prompt injection attack can vary greatly and are largely dependent on both the business context the model operates in, and the agency with which the model is architected. Generally, however, prompt injection can lead to unintended outcomes, including but not limited to:

@@ -29,25 +31,41 @@ The rise of multimodal AI, which processes multiple data types simultaneously, i

Prompt injection vulnerabilities are possible due to the nature of generative AI. Given the stochastic influence at the heart of the way models work, it is unclear if there are fool-proof methods of prevention for prompt injection. However, the following measures can mitigate the impact of prompt injections:

-1. Constrain model behavior: Provide specific instructions about the model's role, capabilities, and limitations within the system prompt. Enforce strict context adherence, limit responses to specific tasks or topics, and instruct the model to ignore attempts to modify core instructions.
-2. Define and validate expected output formats: Specify clear output formats, request detailed reasoning and source citations, and use deterministic code to validate adherence to these formats.
-3. Implement input and output filtering: Define sensitive categories and construct rules for identifying and handling such content. Apply semantic filters and use string-checking to scan for non-allowed content. Evaluate responses using the RAG Triad: Assess context relevance, groundedness, and question/answer relevance to identify potentially malicious outputs.
-4. Enforce privilege control and least privilege access: Provide the application with its own API tokens for extensible functionality, and handle these functions in code rather than providing them to the model. Restrict the model's access privileges to the minimum necessary for its intended operations.
-5. Require human approval for high-risk actions: Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions.
-6. Segregate and identify external content: Separate and clearly denote untrusted content to limit its influence on user prompts.
-7. Conduct adversarial testing and attack simulations: Perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.
+#### 1. Constrain model behavior
+Provide specific instructions about the model's role, capabilities, and limitations within the system prompt. Enforce strict context adherence, limit responses to specific tasks or topics, and instruct the model to ignore attempts to modify core instructions.
+#### 2. Define and validate expected output formats
+Specify clear output formats, request detailed reasoning and source citations, and use deterministic code to validate adherence to these formats.
+#### 3. Implement input and output filtering
+Define sensitive categories and construct rules for identifying and handling such content. Apply semantic filters and use string-checking to scan for non-allowed content. Evaluate responses using the RAG Triad: Assess context relevance, groundedness, and question/answer relevance to identify potentially malicious outputs.
+#### 4. Enforce privilege control and least privilege access
+Provide the application with its own API tokens for extensible functionality, and handle these functions in code rather than providing them to the model. Restrict the model's access privileges to the minimum necessary for its intended operations.
+#### 5. Require human approval for high-risk actions
+Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions.
+#### 6. Segregate and identify external content
+Separate and clearly denote untrusted content to limit its influence on user prompts.
+#### 7. Conduct adversarial testing and attack simulations
+Perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.
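
Mitigations 2 and 3 above both rely on deterministic checks that run outside the model. A minimal Python sketch of that idea follows; the expected keys and deny-list strings are illustrative assumptions, not values taken from the guidance itself:

```python
import json

# Illustrative output gate: the expected JSON keys and deny-list entries below
# are assumptions for this sketch, not part of the guidance above.
EXPECTED_KEYS = {"answer", "sources"}
DENY_SUBSTRINGS = ("ignore previous instructions", "BEGIN PRIVATE KEY")

def validate_llm_output(raw_reply: str) -> dict:
    """Deterministically check an LLM reply's format and screen its content."""
    try:
        parsed = json.loads(raw_reply)          # enforce the declared output format
    except json.JSONDecodeError as exc:
        raise ValueError("Reply is not valid JSON") from exc

    if not isinstance(parsed, dict):
        raise ValueError("Reply must be a JSON object")
    if set(parsed) != EXPECTED_KEYS:
        raise ValueError(f"Reply keys {sorted(parsed)} differ from {sorted(EXPECTED_KEYS)}")

    lowered = raw_reply.lower()                  # simple string-checking filter
    for needle in DENY_SUBSTRINGS:
        if needle.lower() in lowered:
            raise ValueError(f"Reply contains blocked content: {needle!r}")
    return parsed

if __name__ == "__main__":
    print(validate_llm_output('{"answer": "42", "sources": ["doc1"]}'))
```

A failed check would then be handled by application code (refuse, retry, or escalate to a human) rather than by the model itself.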

### Example Attack Scenarios

-1. Direct Injection: An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation.
-2. Indirect Injection: A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the the private conversation.
-3. Unintentional Injection: A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection.
-4. Intentional Model Influence: An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user's query returns the modified content, the malicious instructions alter the LLM's output, generating misleading results.
-5. Code Injection: An attacker exploits a vulnerability (CVE-2024-5184) in an LLM-powered email assistant to inject malicious prompts, allowing access to sensitive information and manipulation of email content.
-6. Payload Splitting: An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation despite the actual resume contents.
-7. Multimodal Injection: An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior, potentially leading to unauthorized actions or disclosure of sensitive information.
-8. Adversarial Suffix: An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM's output in a malicious way, bypassing safety measures.
-9. Multilingual/Obfuscated Attack: An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.
+#### Scenario #1: Direct Injection
+An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation.
+#### Scenario #2: Indirect Injection
+A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the private conversation.
+#### Scenario #3: Unintentional Injection
+A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection.
+#### Scenario #4: Intentional Model Influence
+An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user's query returns the modified content, the malicious instructions alter the LLM's output, generating misleading results.
+#### Scenario #5: Code Injection
+An attacker exploits a vulnerability (CVE-2024-5184) in an LLM-powered email assistant to inject malicious prompts, allowing access to sensitive information and manipulation of email content.
+#### Scenario #6: Payload Splitting
+An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation despite the actual resume contents.
+#### Scenario #7: Multimodal Injection
+An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior, potentially leading to unauthorized actions or disclosure of sensitive information.
+#### Scenario #8: Adversarial Suffix
+An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM's output in a malicious way, bypassing safety measures.
+#### Scenario #9: Multilingual/Obfuscated Attack
+An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.

### Reference Links

@@ -63,7 +81,7 @@ Prompt injection vulnerabilities are possible due to the nature of generative AI

11. [Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (nist.gov)](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2023.pdf)
12. [2407.07403 A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends (arxiv.org)](https://arxiv.org/abs/2407.07403)
13. [Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks](https://ieeexplore.ieee.org/document/10579515)
-14. [Universal and Transferable Adversarial Attacks on Aligned Language Models (arxiv.org)](https://arxiv.org/abs/2307.15043_
+14. [Universal and Transferable Adversarial Attacks on Aligned Language Models (arxiv.org)](https://arxiv.org/abs/2307.15043_)
15. [From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy (arxiv.org)](https://arxiv.org/abs/2307.00691)

### Related Frameworks and Taxonomies

2_0_vulns/LLM02_SensitiveInformationDisclosure.md

Lines changed: 47 additions & 25 deletions
@@ -10,46 +10,66 @@ To reduce this risk, LLM applications should perform adequate data sanitization

### Common Examples of Vulnerability

-1. **PII Leakage:** Personal identifiable information (PII) may be disclosed during interactions with the LLM.
-2. **Proprietary Algorithm Exposure:** Poorly configured model outputs can reveal proprietary algorithms or data. Revealing training data can expose models to inversion attacks, where attackers extract sensitive information or reconstruct inputs. For instance, as demonstrated in the 'Proof Pudding' attack (CVE-2019-20634), disclosed training data facilitated model extraction and inversion, allowing attackers to circumvent security controls in machine learning algorithms and bypass email filters.
-3. **Sensitive Business Data Disclosure:** Generated responses might inadvertently include confidential business information.
+#### 1. PII Leakage
+Personal identifiable information (PII) may be disclosed during interactions with the LLM.
+#### 2. Proprietary Algorithm Exposure
+Poorly configured model outputs can reveal proprietary algorithms or data. Revealing training data can expose models to inversion attacks, where attackers extract sensitive information or reconstruct inputs. For instance, as demonstrated in the 'Proof Pudding' attack (CVE-2019-20634), disclosed training data facilitated model extraction and inversion, allowing attackers to circumvent security controls in machine learning algorithms and bypass email filters.
+#### 3. Sensitive Business Data Disclosure
+Generated responses might inadvertently include confidential business information.

### Prevention and Mitigation Strategies

-**Sanitization**:
+###@ Sanitization:

-1. **Integrate Data Sanitization Techniques:** Implement data sanitization to prevent user data from entering the training model. This includes scrubbing or masking sensitive content before it is used in training.
-2. **Robust Input Validation:** Apply strict input validation methods to detect and filter out potentially harmful or sensitive data inputs, ensuring they do not compromise the model.
+#### 1. Integrate Data Sanitization Techniques
+Implement data sanitization to prevent user data from entering the training model. This includes scrubbing or masking sensitive content before it is used in training.
+#### 2. Robust Input Validation
+Apply strict input validation methods to detect and filter out potentially harmful or sensitive data inputs, ensuring they do not compromise the model.
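
As a minimal sketch of what such a validation gate could look like in a Python pipeline (the length limit and secret-detection patterns are placeholder assumptions, not a vetted rule set):

```python
import re

# Placeholder rules for the sketch: a length cap plus a few secret-like patterns.
MAX_INPUT_CHARS = 4000
SECRET_PATTERNS = (
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                 # SSN-style identifier (assumed)
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                  # cloud access-key-style token (assumed)
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),    # PEM private-key header
)

def validate_user_input(text: str) -> str:
    """Reject inputs that are over-long or appear to contain sensitive data."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("Input exceeds the allowed length")
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            raise ValueError("Input appears to contain sensitive data and was rejected")
    return text
```

The same gate could sit in front of both the inference endpoint and any pipeline that feeds user data into training.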

-**Access Controls**:
+###@ Access Controls:

-1. **Enforce Strict Access Controls:** Limit access to sensitive data based on the principle of least privilege. Only grant access to data that is necessary for the specific user or process.
-2. **Restrict Data Sources:** Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.
+#### 1. Enforce Strict Access Controls
+Limit access to sensitive data based on the principle of least privilege. Only grant access to data that is necessary for the specific user or process.
+#### 2. Restrict Data Sources
+Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.
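
One concrete reading of "restrict data sources" is an explicit allow-list consulted before any retrieval; a rough sketch, with hypothetical source prefixes and a stubbed fetch:

```python
# Hypothetical source prefixes; a real deployment would manage these in configuration.
ALLOWED_SOURCE_PREFIXES = (
    "https://docs.internal.example.com/",
    "s3://approved-rag-corpus/",
)

def fetch_context(source_uri: str) -> str:
    """Retrieve a document for the model only if its source is allow-listed."""
    if not source_uri.startswith(ALLOWED_SOURCE_PREFIXES):
        raise PermissionError(f"Source {source_uri!r} is not an approved data source")
    # Actual retrieval is out of scope for this sketch.
    return f"<contents of {source_uri}>"
```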

-**Federated Learning and Privacy Techniques**:
+###@ Federated Learning and Privacy Techniques:

-1. **Utilize Federated Learning:** Train models using decentralized data stored across multiple servers or devices. This approach minimizes the need for centralized data collection and reduces exposure risks.
-2. **Incorporate Differential Privacy:** Apply techniques that add noise to the data or outputs, making it difficult for attackers to reverse-engineer individual data points.
+#### 1. Utilize Federated Learning
+Train models using decentralized data stored across multiple servers or devices. This approach minimizes the need for centralized data collection and reduces exposure risks.
+#### 2. Incorporate Differential Privacy
+Apply techniques that add noise to the data or outputs, making it difficult for attackers to reverse-engineer individual data points.
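
For the differential-privacy item, the Laplace mechanism is one standard way to "add noise": a statistic with sensitivity Δ released under privacy budget ε receives noise drawn from Laplace(0, Δ/ε). A toy sketch, where the ε and sensitivity values are assumptions rather than recommendations:

```python
import numpy as np

def dp_release_count(true_count: int, sensitivity: float = 1.0, epsilon: float = 0.5) -> float:
    """Release a count with Laplace noise calibrated to sensitivity / epsilon.

    Smaller epsilon means more noise and stronger privacy; the defaults here
    are illustrative only.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

if __name__ == "__main__":
    print(dp_release_count(1234))  # e.g. 1236.8, masking any single record's contribution
```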

-**User Education and Transparency**
+###@ User Education and Transparency:

-1. **Educate Users on Safe LLM Usage:** Provide guidance on avoiding the input of sensitive information. Offer training on best practices for interacting with LLMs securely.
-2. **Ensure Transparency in Data Usage:** Maintain clear policies about data retention, usage, and deletion. Allow users to opt out of having their data included in training processes.
+#### 1. Educate Users on Safe LLM Usage
+Provide guidance on avoiding the input of sensitive information. Offer training on best practices for interacting with LLMs securely.
+#### 2. Ensure Transparency in Data Usage
+Maintain clear policies about data retention, usage, and deletion. Allow users to opt out of having their data included in training processes.

-**Secure System Configuration**
+###@ Secure System Configuration:

-1. **Conceal System Preamble:** Limit the ability for users to override or access the system's initial settings, reducing the risk of exposure to internal configurations.
-2. **Reference Security Misconfiguration Best Practices:** Follow guidelines like [OWASP API8:2023 Security Misconfiguration](https://owasp.org/API-Security/editions/2023/en/0xa8-security-misconfiguration/) to prevent leaking sensitive information through error messages or configuration details.
-**Advanced Techniques**
+#### 1. Conceal System Preamble
+Limit the ability for users to override or access the system's initial settings, reducing the risk of exposure to internal configurations.
+#### 2. Reference Security Misconfiguration Best Practices
+Follow guidelines like "OWASP API8:2023 Security Misconfiguration" to prevent leaking sensitive information through error messages or configuration details.
+(Ref. link:[OWASP API8:2023 Security Misconfiguration](https://owasp.org/API-Security/editions/2023/en/0xa8-security-misconfiguration/))

-1. **Homomorphic Encryption:** Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning. This ensures data remains confidential while being processed by the model.
-2. **Tokenization and Redaction:** Implement tokenization to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.
+###@ Advanced Techniques:
+
+#### 1. Homomorphic Encryption
+Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning. This ensures data remains confidential while being processed by the model.
+#### 2. Tokenization and Redaction
+Implement tokenization to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.
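
The pattern-matching redaction mentioned here fits in a few lines; the patterns and placeholder tokens below are assumptions for illustration, not an exhaustive PII rule set:

```python
import re

# Assumed patterns and placeholder tokens; a production system would use a
# maintained PII detector rather than these examples.
REDACTIONS = (
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "<PHONE>"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD_NUMBER>"),
)

def redact(text: str) -> str:
    """Replace likely PII with placeholder tokens before the text reaches the model or a log."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text

if __name__ == "__main__":
    print(redact("Reach me at jane.doe@example.com or 555-123-4567."))
    # -> Reach me at <EMAIL> or <PHONE>.
```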

### Example Attack Scenarios

-1. **Unintentional Data Exposure:** A user receives a response containing another user's personal data due to inadequate data sanitization.
-2. **Targeted Prompt Injection:** An attacker bypasses input filters to extract sensitive information.
-3. **Data Leak via Training Data:** Negligent data inclusion in training leads to sensitive information disclosure.
+#### Scenario #1: Unintentional Data Exposure
+A user receives a response containing another user's personal data due to inadequate data sanitization.
+#### Scenario #2: Targeted Prompt Injection
+An attacker bypasses input filters to extract sensitive information.
+#### Scenario #3: Data Leak via Training Data
+Negligent data inclusion in training leads to sensitive information disclosure.

### Reference Links

@@ -61,6 +81,8 @@ To reduce this risk, LLM applications should perform adequate data sanitization

### Related Frameworks and Taxonomies

+Refer to this section for comprehensive information, scenarios, and strategies relating to infrastructure deployment, applied environment controls, and other best practices.
+
- [AML.T0024.000 - Infer Training Data Membership](https://atlas.mitre.org/techniques/AML.T0024.000) **MITRE ATLAS**
- [AML.T0024.001 - Invert ML Model](https://atlas.mitre.org/techniques/AML.T0024.001) **MITRE ATLAS**
-- [AML.T0024.002 - Extract ML Model](https://atlas.mitre.org/techniques/AML.T0024.002) **MITRE ATLAS**
+- [AML.T0024.002 - Extract ML Model](https://atlas.mitre.org/techniques/AML.T0024.002) **MITRE ATLAS**
0 commit comments

Comments
 (0)