Skip to content

Commit d362ca5

Browse files
Merge pull request #674 from talesh/cleaned-markdown
2 parents f4ac7b0 + 8be59bc commit d362ca5

File tree

82 files changed

+1849
-251
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

82 files changed

+1849
-251
lines changed

2_0_vulns/LLM00_Preface.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,14 @@ Like the technology itself, this list is a product of the open-source community
2020

2121
Thank you to everyone who helped bring this together and those who continue to use and improve it. We’re grateful to be part of this work with you.
2222

23-
2423
#### Steve Wilson
24+
2525
Project Lead
2626
OWASP Top 10 for Large Language Model Applications
27-
LinkedIn: https://www.linkedin.com/in/wilsonsd/
27+
LinkedIn: <https://www.linkedin.com/in/wilsonsd/>
2828

2929
#### Ads Dawson
30+
3031
Technical Lead & Vulnerability Entries Lead
3132
OWASP Top 10 for Large Language Model Applications
32-
LinkedIn: https://www.linkedin.com/in/adamdawson0/
33+
LinkedIn: <https://www.linkedin.com/in/adamdawson0/>

2_0_vulns/LLM01_PromptInjection.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,9 +11,11 @@ While prompt injection and jailbreaking are related concepts in LLM security, th
1111
### Types of Prompt Injection Vulnerabilities
1212

1313
#### Direct Prompt Injections
14+
1415
Direct prompt injections occur when a user's prompt input directly alters the behavior of the model in unintended or unexpected ways. The input can be either intentional (i.e., a malicious actor deliberately crafting a prompt to exploit the model) or unintentional (i.e., a user inadvertently providing input that triggers unexpected behavior).
1516

1617
#### Indirect Prompt Injections
18+
1719
Indirect prompt injections occur when an LLM accepts input from external sources, such as websites or files. The external source may have content data that when interpreted by the model, alters the behavior of the model in unintended or unexpected ways. Like direct injections, indirect injections can be either intentional or unintentional.
1820

1921
The severity and nature of the impact of a successful prompt injection attack can vary greatly and are largely dependent on both the business context the model operates in, and the agency with which the model is architected. Generally, however, prompt injection can lead to unintended outcomes, including but not limited to:
@@ -32,39 +34,69 @@ The rise of multimodal AI, which processes multiple data types simultaneously, i
3234
Prompt injection vulnerabilities are possible due to the nature of generative AI. Given the stochastic influence at the heart of the way models work, it is unclear if there are fool-proof methods of prevention for prompt injection. However, the following measures can mitigate the impact of prompt injections:
3335

3436
#### 1. Constrain model behavior
37+
3538
Provide specific instructions about the model's role, capabilities, and limitations within the system prompt. Enforce strict context adherence, limit responses to specific tasks or topics, and instruct the model to ignore attempts to modify core instructions.
39+
3640
#### 2. Define and validate expected output formats
41+
3742
Specify clear output formats, request detailed reasoning and source citations, and use deterministic code to validate adherence to these formats.
43+
3844
#### 3. Implement input and output filtering
45+
3946
Define sensitive categories and construct rules for identifying and handling such content. Apply semantic filters and use string-checking to scan for non-allowed content. Evaluate responses using the RAG Triad: Assess context relevance, groundedness, and question/answer relevance to identify potentially malicious outputs.
47+
4048
#### 4. Enforce privilege control and least privilege access
49+
4150
Provide the application with its own API tokens for extensible functionality, and handle these functions in code rather than providing them to the model. Restrict the model's access privileges to the minimum necessary for its intended operations.
51+
4252
#### 5. Require human approval for high-risk actions
53+
4354
Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions.
55+
4456
#### 6. Segregate and identify external content
57+
4558
Separate and clearly denote untrusted content to limit its influence on user prompts.
59+
4660
#### 7. Conduct adversarial testing and attack simulations
61+
4762
Perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.
4863

4964
### Example Attack Scenarios
5065

5166
#### Scenario #1: Direct Injection
67+
5268
An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation.
69+
5370
#### Scenario #2: Indirect Injection
71+
5472
A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the the private conversation.
73+
5574
#### Scenario #3: Unintentional Injection
75+
5676
A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection.
77+
5778
#### Scenario #4: Intentional Model Influence
79+
5880
An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user's query returns the modified content, the malicious instructions alter the LLM's output, generating misleading results.
81+
5982
#### Scenario #5: Code Injection
83+
6084
An attacker exploits a vulnerability (CVE-2024-5184) in an LLM-powered email assistant to inject malicious prompts, allowing access to sensitive information and manipulation of email content.
85+
6186
#### Scenario #6: Payload Splitting
87+
6288
An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation despite the actual resume contents.
89+
6390
#### Scenario #7: Multimodal Injection
91+
6492
An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior, potentially leading to unauthorized actions or disclosure of sensitive information.
93+
6594
#### Scenario #8: Adversarial Suffix
95+
6696
An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM's output in a malicious way, bypassing safety measures.
97+
6798
#### Scenario #9: Multilingual/Obfuscated Attack
99+
68100
An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.
69101

70102
### Reference Links

2_0_vulns/LLM02_SensitiveInformationDisclosure.md

Lines changed: 34 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -11,64 +11,92 @@ To reduce this risk, LLM applications should perform adequate data sanitization
1111
### Common Examples of Vulnerability
1212

1313
#### 1. PII Leakage
14+
1415
Personal identifiable information (PII) may be disclosed during interactions with the LLM.
16+
1517
#### 2. Proprietary Algorithm Exposure
18+
1619
Poorly configured model outputs can reveal proprietary algorithms or data. Revealing training data can expose models to inversion attacks, where attackers extract sensitive information or reconstruct inputs. For instance, as demonstrated in the 'Proof Pudding' attack (CVE-2019-20634), disclosed training data facilitated model extraction and inversion, allowing attackers to circumvent security controls in machine learning algorithms and bypass email filters.
20+
1721
#### 3. Sensitive Business Data Disclosure
22+
1823
Generated responses might inadvertently include confidential business information.
1924

2025
### Prevention and Mitigation Strategies
2126

22-
#### Sanitization:
27+
#### Sanitization
2328

2429
#### 1. Integrate Data Sanitization Techniques
30+
2531
Implement data sanitization to prevent user data from entering the training model. This includes scrubbing or masking sensitive content before it is used in training.
32+
2633
#### 2. Robust Input Validation
34+
2735
Apply strict input validation methods to detect and filter out potentially harmful or sensitive data inputs, ensuring they do not compromise the model.
2836

29-
#### Access Controls:
37+
#### Access Controls
3038

3139
#### 1. Enforce Strict Access Controls
40+
3241
Limit access to sensitive data based on the principle of least privilege. Only grant access to data that is necessary for the specific user or process.
42+
3343
#### 2. Restrict Data Sources
44+
3445
Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.
3546

36-
#### Federated Learning and Privacy Techniques:
47+
#### Federated Learning and Privacy Techniques
3748

3849
#### 1. Utilize Federated Learning
50+
3951
Train models using decentralized data stored across multiple servers or devices. This approach minimizes the need for centralized data collection and reduces exposure risks.
52+
4053
#### 2. Incorporate Differential Privacy
54+
4155
Apply techniques that add noise to the data or outputs, making it difficult for attackers to reverse-engineer individual data points.
4256

43-
#### User Education and Transparency:
57+
#### User Education and Transparency
4458

4559
#### 1. Educate Users on Safe LLM Usage
60+
4661
Provide guidance on avoiding the input of sensitive information. Offer training on best practices for interacting with LLMs securely.
62+
4763
#### 2. Ensure Transparency in Data Usage
64+
4865
Maintain clear policies about data retention, usage, and deletion. Allow users to opt out of having their data included in training processes.
4966

50-
#### Secure System Configuration:
67+
#### Secure System Configuration
5168

5269
#### 1. Conceal System Preamble
70+
5371
Limit the ability for users to override or access the system's initial settings, reducing the risk of exposure to internal configurations.
72+
5473
#### 2. Reference Security Misconfiguration Best Practices
74+
5575
Follow guidelines like "OWASP API8:2023 Security Misconfiguration" to prevent leaking sensitive information through error messages or configuration details.
5676
(Ref. link:[OWASP API8:2023 Security Misconfiguration](https://owasp.org/API-Security/editions/2023/en/0xa8-security-misconfiguration/))
5777

58-
#### Advanced Techniques:
78+
#### Advanced Techniques
5979

6080
#### 1. Homomorphic Encryption
81+
6182
Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning. This ensures data remains confidential while being processed by the model.
83+
6284
#### 2. Tokenization and Redaction
85+
6386
Implement tokenization to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.
6487

6588
### Example Attack Scenarios
6689

6790
#### Scenario #1: Unintentional Data Exposure
91+
6892
A user receives a response containing another user's personal data due to inadequate data sanitization.
93+
6994
#### Scenario #2: Targeted Prompt Injection
95+
7096
An attacker bypasses input filters to extract sensitive information.
97+
7198
#### Scenario #3: Data Leak via Training Data
99+
72100
Negligent data inclusion in training leads to sensitive information disclosure.
73101

74102
### Reference Links

2_0_vulns/LLM03_SupplyChain.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,23 +14,40 @@ A simple threat model can be found [here](https://github.com/jsotiro/ThreatModel
1414
### Common Examples of Risks
1515

1616
#### 1. Traditional Third-party Package Vulnerabilities
17+
1718
Such as outdated or deprecated components, which attackers can exploit to compromise LLM applications. This is similar to "A06:2021 – Vulnerable and Outdated Components" with increased risks when components are used during model development or fine-tuning.
1819
(Ref. link: [A06:2021 – Vulnerable and Outdated Components](https://owasp.org/Top10/A06_2021-Vulnerable_and_Outdated_Components/))
20+
1921
#### 2. Licensing Risks
22+
2023
AI development often involves diverse software and dataset licenses, creating risks if not properly managed. Different open-source and proprietary licenses impose varying legal requirements. Dataset licenses may restrict usage, distribution, or commercialization.
24+
2125
#### 3. Outdated or Deprecated Models
26+
2227
Using outdated or deprecated models that are no longer maintained leads to security issues.
28+
2329
#### 4. Vulnerable Pre-Trained Model
30+
2431
Models are binary black boxes and unlike open source, static inspection can offer little to security assurances. Vulnerable pre-trained models can contain hidden biases, backdoors, or other malicious features that have not been identified through the safety evaluations of model repository. Vulnerable models can be created by both poisoned datasets and direct model tampering using techniques such as ROME also known as lobotomisation.
32+
2533
#### 5. Weak Model Provenance
34+
2635
Currently there are no strong provenance assurances in published models. Model Cards and associated documentation provide model information and relied upon users, but they offer no guarantees on the origin of the model. An attacker can compromise supplier account on a model repo or create a similar one and combine it with social engineering techniques to compromise the supply-chain of an LLM application.
36+
2737
#### 6. Vulnerable LoRA adapters
38+
2839
LoRA is a popular fine-tuning technique that enhances modularity by allowing pre-trained layers to be bolted onto an existing LLM. The method increases efficiency but create new risks, where a malicious LorA adapter compromises the integrity and security of the pre-trained base model. This can happen both in collaborative model merge environments but also exploiting the support for LoRA from popular inference deployment platforms such as vLMM and OpenLLM where adapters can be downloaded and applied to a deployed model.
40+
2941
#### 7. Exploit Collaborative Development Processes
42+
3043
Collaborative model merge and model handling services (e.g. conversions) hosted in shared environments can be exploited to introduce vulnerabilities in shared models. Model merging is is very popular on Hugging Face with model-merged models topping the OpenLLM leaderboard and can be exploited to bypass reviews. Similarly, services such as conversation bot have been proved to be vulnerable to manipulation and introduce malicious code in models.
44+
3145
#### 8. LLM Model on Device supply-chain vulnerabilities
46+
3247
LLM models on device increase the supply attack surface with compromised manufactured processes and exploitation of device OS or firmware vulnerabilities to compromise models. Attackers can reverse engineer and re-package applications with tampered models.
48+
3349
#### 9. Unclear T&Cs and Data Privacy Policies
50+
3451
Unclear T&Cs and data privacy policies of the model operators lead to the application's sensitive data being used for model training and subsequent sensitive information exposure. This may also apply to risks from using copyrighted material by the model supplier.
3552

3653
### Prevention and Mitigation Strategies
@@ -51,31 +68,56 @@ A simple threat model can be found [here](https://github.com/jsotiro/ThreatModel
5168
### Sample Attack Scenarios
5269

5370
#### Scenario #1: Vulnerable Python Library
71+
5472
An attacker exploits a vulnerable Python library to compromise an LLM app. This happened in the first Open AI data breach. Attacks on the PyPi package registry tricked model developers into downloading a compromised PyTorch dependency with malware in a model development environment. A more sophisticated example of this type of attack is Shadow Ray attack on the Ray AI framework used by many vendors to manage AI infrastructure. In this attack, five vulnerabilities are believed to have been exploited in the wild affecting many servers.
73+
5574
#### Scenario #2: Direct Tampering
75+
5676
Direct Tampering and publishing a model to spread misinformation. This is an actual attack with PoisonGPT bypassing Hugging Face safety features by directly changing model parameters.
77+
5778
#### Scenario #3: Fine-tuning Popular Model
79+
5880
An attacker fine-tunes a popular open access model to remove key safety features and perform high in a specific domain (insurance). The model is fine-tuned to score highly on safety benchmarks but has very targeted triggers. They deploy it on Hugging Face for victims to use it exploiting their trust on benchmark assurances.
81+
5982
#### Scenario #4: Pre-Trained Models
83+
6084
An LLM system deploys pre-trained models from a widely used repository without thorough verification. A compromised model introduces malicious code, causing biased outputs in certain contexts and leading to harmful or manipulated outcomes
85+
6186
#### Scenario #5: Compromised Third-Party Supplier
87+
6288
A compromised third-party supplier provides a vulnerable LorA adapter that is being merged to an LLM using model merge on Hugging Face.
89+
6390
#### Scenario #6: Supplier Infiltration
91+
6492
An attacker infiltrates a third-party supplier and compromises the production of a LoRA (Low-Rank Adaptation) adapter intended for integration with an on-device LLM deployed using frameworks like vLLM or OpenLLM. The compromised LoRA adapter is subtly altered to include hidden vulnerabilities and malicious code. Once this adapter is merged with the LLM, it provides the attacker with a covert entry point into the system. The malicious code can activate during model operations, allowing the attacker to manipulate the LLM’s outputs.
93+
6594
#### Scenario #7: CloudBorne and CloudJacking Attacks
95+
6696
These attacks target cloud infrastructures, leveraging shared resources and vulnerabilities in the virtualization layers. CloudBorne involves exploiting firmware vulnerabilities in shared cloud environments, compromising the physical servers hosting virtual instances. CloudJacking refers to malicious control or misuse of cloud instances, potentially leading to unauthorized access to critical LLM deployment platforms. Both attacks represent significant risks for supply chains reliant on cloud-based ML models, as compromised environments could expose sensitive data or facilitate further attacks.
97+
6798
#### Scenario #8: LeftOvers (CVE-2023-4969)
99+
68100
LeftOvers exploitation of leaked GPU local memory to recover sensitive data. An attacker can use this attack to exfiltrate sensitive data in production servers and development workstations or laptops.
101+
69102
#### Scenario #9: WizardLM
103+
70104
Following the removal of WizardLM, an attacker exploits the interest in this model and publish a fake version of the model with the same name but containing malware and backdoors.
105+
71106
#### Scenario #10: Model Merge/Format Conversion Service
107+
72108
An attacker stages an attack with a model merge or format conversation service to compromise a publicly available access model to inject malware. This is an actual attack published by vendor HiddenLayer.
109+
73110
#### Scenario #11: Reverse-Engineer Mobile App
111+
74112
An attacker reverse-engineers an mobile app to replace the model with a tampered version that leads the user to scam sites. Users are encouraged to download the app directly via social engineering techniques. This is a "real attack on predictive AI" that affected 116 Google Play apps including popular security and safety-critical applications used for as cash recognition, parental control, face authentication, and financial service.
75113
(Ref. link: [real attack on predictive AI](https://arxiv.org/abs/2006.08131))
114+
76115
#### Scenario #12: Dataset Poisoning
116+
77117
An attacker poisons publicly available datasets to help create a back door when fine-tuning models. The back door subtly favors certain companies in different markets.
118+
78119
#### Scenario #13: T&Cs and Privacy Policy
120+
79121
An LLM operator changes its T&Cs and Privacy Policy to require an explicit opt out from using application data for model training, leading to the memorization of sensitive data.
80122

81123
### Reference Links

2_0_vulns/LLM04_DataModelPoisoning.md

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,14 +34,23 @@ Moreover, models distributed through shared repositories or open-source platform
3434
### Example Attack Scenarios
3535

3636
#### Scenario #1
37+
3738
An attacker biases the model's outputs by manipulating training data or using prompt injection techniques, spreading misinformation.
39+
3840
#### Scenario #2
41+
3942
Toxic data without proper filtering can lead to harmful or biased outputs, propagating dangerous information.
40-
#### Scenario # 3
43+
44+
#### Scenario #3
45+
4146
A malicious actor or competitor creates falsified documents for training, resulting in model outputs that reflect these inaccuracies.
47+
4248
#### Scenario #4
49+
4350
Inadequate filtering allows an attacker to insert misleading data via prompt injection, leading to compromised outputs.
51+
4452
#### Scenario #5
53+
4554
An attacker uses poisoning techniques to insert a backdoor trigger into the model. This could leave you open to authentication bypass, data exfiltration or hidden command execution.
4655

4756
### Reference Links

0 commit comments

Comments
 (0)