
Commit 0c65f97

Merge branch 'main' into translations/el-GR
2 parents: 459f6d0 + f7e7fb6

File tree

164 files changed (+15523, -63 lines)


2_0_vulns/LLM01_PromptInjection.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ Prompt injection vulnerabilities are possible due to the nature of generative AI

 #### Scenario #2: Indirect Injection

-A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the the private conversation.
+A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the private conversation.

 #### Scenario #3: Unintentional Injection

2_0_vulns/LLM02_SensitiveInformationDisclosure.md

Lines changed: 44 additions & 45 deletions
@@ -12,105 +12,104 @@ To reduce this risk, LLM applications should perform adequate data sanitization

 #### 1. PII Leakage

-Personal identifiable information (PII) may be disclosed during interactions with the LLM.
+Personal identifiable information (PII) may be disclosed during interactions with the LLM.

 #### 2. Proprietary Algorithm Exposure

-Poorly configured model outputs can reveal proprietary algorithms or data. Revealing training data can expose models to inversion attacks, where attackers extract sensitive information or reconstruct inputs. For instance, as demonstrated in the 'Proof Pudding' attack (CVE-2019-20634), disclosed training data facilitated model extraction and inversion, allowing attackers to circumvent security controls in machine learning algorithms and bypass email filters.
+Poorly configured model outputs can reveal proprietary algorithms or data. Revealing training data can expose models to inversion attacks, where attackers extract sensitive information or reconstruct inputs. For instance, as demonstrated in the 'Proof Pudding' attack (CVE-2019-20634), disclosed training data facilitated model extraction and inversion, allowing attackers to circumvent security controls in machine learning algorithms and bypass email filters.

 #### 3. Sensitive Business Data Disclosure

-Generated responses might inadvertently include confidential business information.
+Generated responses might inadvertently include confidential business information.

 ### Prevention and Mitigation Strategies
-
 #### Sanitization

-#### 1. Integrate Data Sanitization Techniques
+##### 1. Integrate Data Sanitization Techniques

-Implement data sanitization to prevent user data from entering the training model. This includes scrubbing or masking sensitive content before it is used in training.
+Implement data sanitization to prevent user data from entering the training model. This includes scrubbing or masking sensitive content before it is used in training.

-#### 2. Robust Input Validation
+##### 2. Robust Input Validation

-Apply strict input validation methods to detect and filter out potentially harmful or sensitive data inputs, ensuring they do not compromise the model.
+Apply strict input validation methods to detect and filter out potentially harmful or sensitive data inputs, ensuring they do not compromise the model.

 #### Access Controls

-#### 1. Enforce Strict Access Controls
+##### 1. Enforce Strict Access Controls

-Limit access to sensitive data based on the principle of least privilege. Only grant access to data that is necessary for the specific user or process.
+Limit access to sensitive data based on the principle of least privilege. Only grant access to data that is necessary for the specific user or process.

-#### 2. Restrict Data Sources
+##### 2. Restrict Data Sources

-Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.
+Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.

 #### Federated Learning and Privacy Techniques

-#### 1. Utilize Federated Learning
+##### 1. Utilize Federated Learning

-Train models using decentralized data stored across multiple servers or devices. This approach minimizes the need for centralized data collection and reduces exposure risks.
+Train models using decentralized data stored across multiple servers or devices. This approach minimizes the need for centralized data collection and reduces exposure risks.

-#### 2. Incorporate Differential Privacy
+##### 2. Incorporate Differential Privacy

-Apply techniques that add noise to the data or outputs, making it difficult for attackers to reverse-engineer individual data points.
+Apply techniques that add noise to the data or outputs, making it difficult for attackers to reverse-engineer individual data points.

 #### User Education and Transparency

-#### 1. Educate Users on Safe LLM Usage
+##### 1. Educate Users on Safe LLM Usage

-Provide guidance on avoiding the input of sensitive information. Offer training on best practices for interacting with LLMs securely.
+Provide guidance on avoiding the input of sensitive information. Offer training on best practices for interacting with LLMs securely.

-#### 2. Ensure Transparency in Data Usage
+##### 2. Ensure Transparency in Data Usage

-Maintain clear policies about data retention, usage, and deletion. Allow users to opt out of having their data included in training processes.
+Maintain clear policies about data retention, usage, and deletion. Allow users to opt out of having their data included in training processes.

 #### Secure System Configuration

-#### 1. Conceal System Preamble
+##### 1. Conceal System Preamble

-Limit the ability for users to override or access the system's initial settings, reducing the risk of exposure to internal configurations.
+Limit the ability for users to override or access the system's initial settings, reducing the risk of exposure to internal configurations.

-#### 2. Reference Security Misconfiguration Best Practices
+##### 2. Reference Security Misconfiguration Best Practices

-Follow guidelines like "OWASP API8:2023 Security Misconfiguration" to prevent leaking sensitive information through error messages or configuration details.
-(Ref. link:[OWASP API8:2023 Security Misconfiguration](https://owasp.org/API-Security/editions/2023/en/0xa8-security-misconfiguration/))
+Follow guidelines like "OWASP API8:2023 Security Misconfiguration" to prevent leaking sensitive information through error messages or configuration details. (Ref. link: [OWASP API8:2023 Security Misconfiguration](https://owasp.org/API-Security/editions/2023/en/0xa8-security-misconfiguration/)).

 #### Advanced Techniques

-#### 1. Homomorphic Encryption
+##### 1. Homomorphic Encryption

-Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning. This ensures data remains confidential while being processed by the model.
+Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning. This ensures data remains confidential while being processed by the model.

-#### 2. Tokenization and Redaction
+##### 2. Tokenization and Redaction

-Implement tokenization to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.
+Implement tokenization to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.

 ### Example Attack Scenarios

-#### Scenario #1: Unintentional Data Exposure
+#### Scenario 1: Unintentional Data Exposure

-A user receives a response containing another user's personal data due to inadequate data sanitization.
+A user receives a response containing another user's personal data due to inadequate data sanitization.

-#### Scenario #2: Targeted Prompt Injection
+#### Scenario 2: Targeted Prompt Injection

-An attacker bypasses input filters to extract sensitive information.
+An attacker bypasses input filters to extract sensitive information.

-#### Scenario #3: Data Leak via Training Data
+#### Scenario 3: Data Leak via Training Data

-Negligent data inclusion in training leads to sensitive information disclosure.
+Negligent data inclusion in training leads to sensitive information disclosure.

-### Reference Links
+#### Reference Links

-1. [Lessons learned from ChatGPT’s Samsung leak](https://cybernews.com/security/chatgpt-samsung-leak-explained-lessons/): **Cybernews**
-2. [AI data leak crisis: New tool prevents company secrets from being fed to ChatGPT](https://www.foxbusiness.com/politics/ai-data-leak-crisis-prevent-company-secrets-chatgpt): **Fox Business**
-3. [ChatGPT Spit Out Sensitive Data When Told to Repeat ‘Poem’ Forever](https://www.wired.com/story/chatgpt-poem-forever-security-roundup/): **Wired**
-4. [Using Differential Privacy to Build Secure Models](https://neptune.ai/blog/using-differential-privacy-to-build-secure-models-tools-methods-best-practices): **Neptune Blog**
-5. [Proof Pudding (CVE-2019-20634)](https://avidml.org/database/avid-2023-v009/) **AVID** (`moohax` & `monoxgas`)
+- *Cybernews*: [Lessons learned from ChatGPT’s Samsung leak](https://cybernews.com/security/chatgpt-samsung-leak-explained-lessons/)
+- *Fox Business*: [AI data leak crisis: New tool prevents company secrets from being fed to ChatGPT](https://www.foxbusiness.com/politics/ai-data-leak-crisis-prevent-company-secrets-chatgpt)
+- *Wired*: [ChatGPT Spit Out Sensitive Data When Told to Repeat ‘Poem’ Forever](https://www.wired.com/story/chatgpt-poem-forever-security-roundup/)
+- *Neptune Blog*: [Using Differential Privacy to Build Secure Models](https://neptune.ai/blog/using-differential-privacy-to-build-secure-models-tools-methods-best-practices)
+- *AVID 2023-009*: [Proof Pudding (CVE-2019-20634)](https://avidml.org/database/avid-2023-v009/)
+- *Will Pearce \(Moo_hax\) & Nick Landers \(Monoxgas\)*: [Proof Pudding research](https://github.com/moohax/Proof-Pudding)

-### Related Frameworks and Taxonomies
+#### Related Frameworks and Taxonomies

 Refer to this section for comprehensive information, scenarios strategies relating to infrastructure deployment, applied environment controls and other best practices.

-- [AML.T0024.000 - Infer Training Data Membership](https://atlas.mitre.org/techniques/AML.T0024.000) **MITRE ATLAS**
-- [AML.T0024.001 - Invert ML Model](https://atlas.mitre.org/techniques/AML.T0024.001) **MITRE ATLAS**
-- [AML.T0024.002 - Extract ML Model](https://atlas.mitre.org/techniques/AML.T0024.002) **MITRE ATLAS**
+- *MITRE ATLAS*: [AML.T0024.000 - Infer Training Data Membership](https://atlas.mitre.org/techniques/AML.T0024.000)
+- *MITRE ATLAS*: [AML.T0024.001 - Invert ML Model](https://atlas.mitre.org/techniques/AML.T0024.001)
+- *MITRE ATLAS*: [AML.T0024.002 - Extract ML Model](https://atlas.mitre.org/techniques/AML.T0024.002)
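The "Integrate Data Sanitization Techniques" and "Tokenization and Redaction" mitigations in the diff above both come down to pattern matching that scrubs sensitive content before it reaches the model or a training pipeline. The following is a minimal Python sketch of that idea; the regexes, placeholder labels, and example prompt are illustrative assumptions rather than part of the OWASP text, and a production deployment would rely on a vetted PII/secret-detection library with locale-aware rules.

```python
import re

# Illustrative patterns only; real deployments would use a vetted
# PII/secret-detection library and locale-aware rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def redact(text: str) -> str:
    """Replace matches with typed placeholder tokens before the text
    reaches the model or a training corpus."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


if __name__ == "__main__":
    prompt = "Contact jane.doe@example.com, SSN 123-45-6789, about the invoice."
    print(redact(prompt))
    # Contact [REDACTED_EMAIL], SSN [REDACTED_SSN], about the invoice.
```

Typed placeholders, rather than outright deletion, keep the redacted prompt readable for the model while making the removed categories auditable in logs.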

2_0_vulns/LLM03_SupplyChain.md

Lines changed: 7 additions & 7 deletions
@@ -28,19 +28,19 @@ A simple threat model can be found [here](https://github.com/jsotiro/ThreatModel

 #### 4. Vulnerable Pre-Trained Model

-Models are binary black boxes and unlike open source, static inspection can offer little to security assurances. Vulnerable pre-trained models can contain hidden biases, backdoors, or other malicious features that have not been identified through the safety evaluations of model repository. Vulnerable models can be created by both poisoned datasets and direct model tampering using techniques such as ROME also known as lobotomisation.
+Models are binary black boxes and unlike open source, static inspection can offer little to no security assurances. Vulnerable pre-trained models can contain hidden biases, backdoors, or other malicious features that have not been identified through the safety evaluations of model repositories. Vulnerable models can be created by both poisoned datasets and direct model tampering using techniques such as ROME also known as lobotomisation.

 #### 5. Weak Model Provenance

-Currently there are no strong provenance assurances in published models. Model Cards and associated documentation provide model information and relied upon users, but they offer no guarantees on the origin of the model. An attacker can compromise supplier account on a model repo or create a similar one and combine it with social engineering techniques to compromise the supply-chain of an LLM application.
+Currently there are no strong provenance assurances in published models. Model Cards and associated documentation provide model information and relied upon users, but they offer no guarantees on the origin of the model. An attacker can compromise a supplier account on a model repo or create a similar one and combine it with social engineering techniques to compromise the supply-chain of an LLM application.

 #### 6. Vulnerable LoRA adapters

-LoRA is a popular fine-tuning technique that enhances modularity by allowing pre-trained layers to be bolted onto an existing LLM. The method increases efficiency but create new risks, where a malicious LorA adapter compromises the integrity and security of the pre-trained base model. This can happen both in collaborative model merge environments but also exploiting the support for LoRA from popular inference deployment platforms such as vLMM and OpenLLM where adapters can be downloaded and applied to a deployed model.
+LoRA is a popular fine-tuning technique that enhances modularity by allowing pre-trained layers to be bolted onto an existing LLM. The method increases efficiency but creates new risks, where a malicious LorA adapter compromises the integrity and security of the pre-trained base model. This can happen both in collaborative model merge environments but also exploiting the support for LoRA from popular inference deployment platforms such as vLMM and OpenLLM where adapters can be downloaded and applied to a deployed model.

 #### 7. Exploit Collaborative Development Processes

-Collaborative model merge and model handling services (e.g. conversions) hosted in shared environments can be exploited to introduce vulnerabilities in shared models. Model merging is is very popular on Hugging Face with model-merged models topping the OpenLLM leaderboard and can be exploited to bypass reviews. Similarly, services such as conversation bot have been proved to be vulnerable to manipulation and introduce malicious code in models.
+Collaborative model merge and model handling services (e.g. conversions) hosted in shared environments can be exploited to introduce vulnerabilities in shared models. Model merging is very popular on Hugging Face with model-merged models topping the OpenLLM leaderboard and can be exploited to bypass reviews. Similarly, services such as a conversation bot have been proved to be vulnerable to manipulation and introduce malicious code in models.

 #### 8. LLM Model on Device supply-chain vulnerabilities

@@ -55,7 +55,7 @@ A simple threat model can be found [here](https://github.com/jsotiro/ThreatModel
 1. Carefully vet data sources and suppliers, including T&Cs and their privacy policies, only using trusted suppliers. Regularly review and audit supplier Security and Access, ensuring no changes in their security posture or T&Cs.
 2. Understand and apply the mitigations found in the OWASP Top Ten's "A06:2021 – Vulnerable and Outdated Components." This includes vulnerability scanning, management, and patching components. For development environments with access to sensitive data, apply these controls in those environments, too.
 (Ref. link: [A06:2021 – Vulnerable and Outdated Components](https://owasp.org/Top10/A06_2021-Vulnerable_and_Outdated_Components/))
-3. Apply comprehensive AI Red Teaming and Evaluations when selecting a third party model. Decoding Trust is an example of a Trustworthy AI benchmark for LLMs but models can fine-tuned to by pass published benchmarks. Use extensive AI Red Teaming to evaluate the model, especially in the use cases you are planning to use the model for.
+3. Apply comprehensive AI Red Teaming and Evaluations when selecting a third party model. Decoding Trust is an example of a Trustworthy AI benchmark for LLMs but models can be fine-tuned to bypass published benchmarks. Use extensive AI Red Teaming to evaluate the model, especially in the use cases you are planning to use the model for.
 4. Maintain an up-to-date inventory of components using a Software Bill of Materials (SBOM) to ensure you have an up-to-date, accurate, and signed inventory, preventing tampering with deployed packages. SBOMs can be used to detect and alert for new, zero-date vulnerabilities quickly. AI BOMs and ML SBOMs are an emerging area and you should evaluate options starting with OWASP CycloneDX
 5. To mitigate AI licensing risks, create an inventory of all types of licenses involved using BOMs and conduct regular audits of all software, tools, and datasets, ensuring compliance and transparency through BOMs. Use automated license management tools for real-time monitoring and train teams on licensing models. Maintain detailed licensing documentation in BOMs and leverage tools such as [Dyana](https://github.com/dreadnode/dyana) to perform dynamic analysis of third-party software.
 6. Only use models from verifiable sources and use third-party model integrity checks with signing and file hashes to compensate for the lack of strong model provenance. Similarly, use code signing for externally supplied code.
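Mitigations 4 and 6 in the hunk above rely on a signed inventory plus file hashes to detect tampering with third-party models. Below is a minimal Python sketch of that check, not taken from the OWASP text; the manifest filename, its JSON layout, and the model directory are assumptions for illustration, and verifying the manifest's own signature (for example via your code-signing workflow) would sit on top of this.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file so large model weights never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(manifest_path: Path, model_dir: Path) -> bool:
    """Compare every artifact listed in a pinned manifest against what is
    actually on disk; report any missing or mismatching file."""
    manifest = json.loads(manifest_path.read_text())
    ok = True
    for entry in manifest["artifacts"]:  # e.g. {"file": "...", "sha256": "..."}
        file_path = model_dir / entry["file"]
        if not file_path.exists():
            print(f"MISSING  {entry['file']}")
            ok = False
            continue
        actual = sha256_of(file_path)
        if actual != entry["sha256"]:
            print(f"MISMATCH {entry['file']}: expected {entry['sha256'][:12]}, got {actual[:12]}")
            ok = False
    return ok


if __name__ == "__main__":
    # Hypothetical paths; the manifest would come from your signed inventory/SBOM.
    if not verify_artifacts(Path("model_manifest.json"), Path("./models/my-llm")):
        raise SystemExit("Model integrity check failed; aborting load.")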
@@ -101,15 +101,15 @@ A simple threat model can be found [here](https://github.com/jsotiro/ThreatModel
 #### Scenario #9: WizardLM

-Following the removal of WizardLM, an attacker exploits the interest in this model and publish a fake version of the model with the same name but containing malware and backdoors.
+Following the removal of WizardLM, an attacker exploits the interest in this model and publishes a fake version of the model with the same name but containing malware and backdoors.

 #### Scenario #10: Model Merge/Format Conversion Service

 An attacker stages an attack with a model merge or format conversation service to compromise a publicly available access model to inject malware. This is an actual attack published by vendor HiddenLayer.

 #### Scenario #11: Reverse-Engineer Mobile App

-An attacker reverse-engineers an mobile app to replace the model with a tampered version that leads the user to scam sites. Users are encouraged to download the app directly via social engineering techniques. This is a "real attack on predictive AI" that affected 116 Google Play apps including popular security and safety-critical applications used for as cash recognition, parental control, face authentication, and financial service.
+An attacker reverse-engineers an mobile app to replace the model with a tampered version that leads the user to scam sites. Users are encouraged to download the app directly via social engineering techniques. This is a "real attack on predictive AI" that affected 116 Google Play apps including popular security and safety-critical applications used for cash recognition, parental control, face authentication, and financial service.
 (Ref. link: [real attack on predictive AI](https://arxiv.org/abs/2006.08131))

 #### Scenario #12: Dataset Poisoning