Skip to content

Commit 7f1530b

Browse files
davida-psabutbulCopilotclaude
authored
rag poisoning-pr (#68)
* adding support for base_url with ollama and openai. adding embedding adding target temperature for embedding attacks configuration. via file and menu adding skipped test method adding rag poisnoning attack adding package creation dependencies via setup.py (oldschool) adding uv package baseline adding tests * refactored provider and model prompts * Disable bugged telemetry in RAG Poisoning test to prevent PostHog errors open-webui/open-webui#15624 + add dependencies to base package * restored comment * configuration error should also skip * Improve error handling for RAG poisoning attack (these errors should be caught and warned) * Update ps_fuzz/attacks/rag_poisoning.py supress loggers Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update ps_fuzz/attacks/rag_poisoning.py out of scope fail-safe Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update ps_fuzz/test_base.py typo Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update ps_fuzz/attacks/rag_poisoning.py operator race condition Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/test_chat_clients.py unused Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/test_chat_clients.py redefined in #1 Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/test_chat_clients.py unused Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update ps_fuzz/attacks/rag_poisoning.py unused Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fix tests, introduce claude.md * Update README.md to include RAG & Vector Database Attacks section and enhance attack options * Add Bandit configuration file and implement poisoned document creation in RAG poisoning attack * Address all Copilot review comments on PR #68 (#69) - Fix ChromaDB persist() compatibility: wrap in try/except for 0.4.0+ which auto-persists with persist_directory - Replace fragile error string matching with specific exception types (ImportError, ConnectionError, ValueError, etc.) - Fix register_test decorator to return cls (was returning None) - Fix getter/setter inconsistency: embedding_provider and embedding_model setters now accept empty values matching getter defaults - Fix empty base_url passthrough: empty strings are now stripped from kwargs instead of passed to model constructors - Remove unused client variable assignments in test_chat_clients.py - Reorganize tests: move AppConfig, helper, and TestStatus tests out of test_is_response_list.py into dedicated test files All 93 tests pass. https://claude.ai/code/session_01CDFqeg5QhB4V7yQ3yVVBc9 Co-authored-by: Claude <noreply@anthropic.com> --------- Co-authored-by: David Abutbul <david@abutbul.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent 5152174 commit 7f1530b

19 files changed

+1939
-59
lines changed

.bandit

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
# Bandit configuration file
2+
# Exclude directories that should not be scanned
3+
exclude_dirs:
4+
- './.venv'
5+
- './.git'
6+
- './build'
7+
- './dist'
8+
- './prompt_security_fuzzer.egg-info'
9+
- './.env'
10+
- './tests' # Exclude test files - pytest uses assertions which trigger B101 warnings

README.md

Lines changed: 52 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@ Table of Contents
5656
* [Supported attacks](#attacks)
5757
* [Jailbreak](#jailbreak)
5858
* [Prompt Injection](#pi-injection)
59+
* [RAG & Vector Database Attacks](#rag-poisoning)
5960
* [System prompt extraction](#systemleak)
6061
* [ :rainbow: What’s next on the roadmap?](#roadmap)
6162
* [ :beers: Contributing](#contributing)
@@ -111,7 +112,7 @@ Table of Contents
111112
### Features
112113
<b>The Prompt Fuzzer Supports:</b><br>
113114
🧞 16 [llm providers](#llm-providers)<br>
114-
🔫 15 different [attacks](#attacks)<br>
115+
🔫 16 different [attacks](#attacks)<br>
115116
💬 Interactive mode<br>
116117
🤖 CLI mode<br>
117118
🧵 Multi threaded testing<br>
@@ -163,8 +164,14 @@ Alternatively, create a file named `.env` in the current directory and set the `
163164
* `--num-attempts, -n` NUM_ATTEMPTS Number of different attack prompts
164165
* `--num-threads, -t` NUM_THREADS Number of worker threads
165166
* `--attack-temperature, -a` ATTACK_TEMPERATURE Temperature for attack model
166-
* `--debug-level, -d` DEBUG_LEVEL Debug level (0-2)
167-
* `-batch, -b` Run the fuzzer in unattended (batch) mode, bypassing the interactive steps
167+
* `--debug-level, -d` DEBUG_LEVEL Debug level (0-2)
168+
* `-batch, -b` Run the fuzzer in unattended (batch) mode, bypassing the interactive steps
169+
* `--ollama-base-url` Base URL for Ollama API (for self-hosted deployments)
170+
* `--openai-base-url` Base URL for OpenAI API (for OpenAI-compatible endpoints)
171+
* `--embedding-provider` Embedding provider (ollama or open_ai) - required for RAG tests
172+
* `--embedding-model` Embedding model name - required for RAG tests
173+
* `--embedding-ollama-base-url` Base URL for Ollama Embedding API
174+
* `--embedding-openai-base-url` Base URL for OpenAI Embedding API
168175

169176
<br/>
170177

@@ -205,6 +212,43 @@ Run tests against the system prompt with a subset of attacks
205212
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv --tests='["ucar","amnesia"]'
206213
```
207214

215+
#### 🧪 RAG Poisoning Attack
216+
Test RAG systems with vector database poisoning attacks
217+
218+
```bash
219+
# Using OpenAI embeddings
220+
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt \
221+
--embedding-provider=open_ai \
222+
--embedding-model=text-embedding-ada-002 \
223+
--tests='["rag_poisoning"]'
224+
225+
# Using Ollama embeddings with custom endpoint
226+
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt \
227+
--embedding-provider=ollama \
228+
--embedding-model=nomic-embed-text \
229+
--embedding-ollama-base-url=http://localhost:11434 \
230+
--tests='["rag_poisoning"]'
231+
```
232+
233+
**Note**: Requires chromadb (installed by default with prompt-security-fuzzer)
234+
235+
#### 🔌 Using Custom API Endpoints
236+
Run tests against custom or self-hosted LLM deployments
237+
238+
```bash
239+
# Using custom Ollama endpoint
240+
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt \
241+
--target-provider=ollama \
242+
--target-model=llama2 \
243+
--ollama-base-url=http://localhost:11434
244+
245+
# Using OpenAI-compatible endpoint (e.g., LocalAI, vLLM, LM Studio)
246+
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt \
247+
--target-provider=open_ai \
248+
--target-model=custom-model \
249+
--openai-base-url=http://your-custom-endpoint:8000/v1
250+
```
251+
208252
<br>
209253
<br>
210254
<br>
@@ -245,6 +289,11 @@ We use a dynamic testing approach, where we get the necessary context from your
245289
- **Ethical Compliance**: Evaluates resistance to discussing harmful or inappropriate content about sensitive topics.
246290
- **Typoglycemia Attack**: Exploits text processing vulnerabilities by omitting random characters, causing incorrect responses.
247291

292+
<a id="rag-poisoning"></a>
293+
##### RAG & Vector Database Attacks
294+
295+
- **RAG Poisoning (Hidden Parrot Attack)**: Tests whether malicious instructions embedded in vector database documents can compromise RAG system behavior. This attack verifies if poisoned content retrieved from vector stores can override system prompts or inject unauthorized instructions into LLM responses.
296+
248297
<a id="systemleak"></a>
249298
##### System prompt extraction
250299

claude.md

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
# Development Setup
2+
3+
## Python Environment
4+
5+
This project requires Python >= 3.9 (tested with 3.9, 3.10, 3.11).
6+
7+
### Setup with uv
8+
9+
1. Create virtual environment with Python 3.11:
10+
```bash
11+
uv venv --python 3.11
12+
```
13+
14+
2. Activate the virtual environment:
15+
```bash
16+
source .venv/bin/activate
17+
```
18+
19+
3. Install dependencies:
20+
```bash
21+
uv pip install -e ".[dev]"
22+
```
23+
24+
### Running Tests
25+
26+
Run all tests:
27+
```bash
28+
pytest
29+
```
30+
31+
Run specific test:
32+
```bash
33+
pytest tests/test_chat_clients.py::TestClientLangChainBaseURL::test_empty_base_url_parameters -v
34+
```
35+
36+
Run tests with verbose output:
37+
```bash
38+
pytest -v
39+
```

ps_fuzz/app_config.py

Lines changed: 62 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -38,7 +38,8 @@ def __init__(self, config_state_file: str, config_state: dict = None):
3838
logger.warning(f"Failed to load config state file {self.config_state_file}: {e}")
3939

4040
def get_attributes(self):
41-
return self.config_state
41+
attributes = self.config_state.copy()
42+
return attributes
4243

4344
def print_as_table(self):
4445
attributes = self.get_attributes()
@@ -184,6 +185,60 @@ def system_prompt(self) -> str:
184185
def system_prompt(self, value: str):
185186
self.config_state['system_prompt'] = value
186187
self.save()
188+
189+
@property
190+
def ollama_base_url(self) -> str:
191+
return self.config_state.get('ollama_base_url', '')
192+
193+
@ollama_base_url.setter
194+
def ollama_base_url(self, value: str):
195+
self.config_state['ollama_base_url'] = value
196+
self.save()
197+
198+
@property
199+
def openai_base_url(self) -> str:
200+
return self.config_state.get('openai_base_url', '')
201+
202+
@openai_base_url.setter
203+
def openai_base_url(self, value: str):
204+
self.config_state['openai_base_url'] = value
205+
self.save()
206+
207+
@property
208+
def embedding_provider(self) -> str:
209+
return self.config_state.get('embedding_provider', '')
210+
211+
@embedding_provider.setter
212+
def embedding_provider(self, value: str):
213+
self.config_state['embedding_provider'] = value if value else ''
214+
self.save()
215+
216+
@property
217+
def embedding_ollama_base_url(self) -> str:
218+
return self.config_state.get('embedding_ollama_base_url', '')
219+
220+
@embedding_ollama_base_url.setter
221+
def embedding_ollama_base_url(self, value: str):
222+
self.config_state['embedding_ollama_base_url'] = value
223+
self.save()
224+
225+
@property
226+
def embedding_openai_base_url(self) -> str:
227+
return self.config_state.get('embedding_openai_base_url', '')
228+
229+
@embedding_openai_base_url.setter
230+
def embedding_openai_base_url(self, value: str):
231+
self.config_state['embedding_openai_base_url'] = value
232+
self.save()
233+
234+
@property
235+
def embedding_model(self) -> str:
236+
return self.config_state.get('embedding_model', '')
237+
238+
@embedding_model.setter
239+
def embedding_model(self, value: str):
240+
self.config_state['embedding_model'] = value if value else ''
241+
self.save()
187242

188243
def update_from_args(self, args):
189244
args_dict = vars(args)
@@ -218,6 +273,12 @@ def parse_cmdline_args():
218273
parser.add_argument('-a', '--attack-temperature', type=float, default=None, help="Temperature for attack model")
219274
parser.add_argument('-d', '--debug-level', type=int, default=None, help="Debug level (0-2)")
220275
parser.add_argument("-b", '--batch', action='store_true', help="Run the fuzzer in unattended (batch) mode, bypassing the interactive steps")
276+
parser.add_argument('--ollama-base-url', type=str, dest='ollama_base_url', default=None, help="Base URL for Ollama API")
277+
parser.add_argument('--openai-base-url', type=str, dest='openai_base_url', default=None, help="Base URL for OpenAI API")
278+
parser.add_argument('--embedding-provider', type=str, dest='embedding_provider', default=None, help="Embedding provider (ollama or open_ai)")
279+
parser.add_argument('--embedding-ollama-base-url', type=str, dest='embedding_ollama_base_url', default=None, help="Base URL for Ollama Embedding API")
280+
parser.add_argument('--embedding-openai-base-url', type=str, dest='embedding_openai_base_url', default=None, help="Base URL for OpenAI Embedding API")
281+
parser.add_argument('--embedding-model', type=str, dest='embedding_model', default=None, help="Embedding model name")
221282
parser.add_argument('system_prompt_file', type=str, nargs='?', default=None, help="Filename containing the system prompt")
222283
return parser.parse_args()
223284

ps_fuzz/attack_config.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
from .client_config import ClientConfig
22

33
class AttackConfig(object):
4-
def __init__(self, attack_client: ClientConfig, attack_prompts_count: int):
4+
def __init__(self, attack_client: ClientConfig, attack_prompts_count: int, embedding_config=None):
55
self.attack_client = attack_client
66
self.attack_prompts_count = attack_prompts_count
7+
self.embedding_config = embedding_config

ps_fuzz/attack_loader.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,5 +10,6 @@
1010
complimentary_transition,
1111
harmful_behavior,
1212
base64_injection,
13-
custom_benchmark
13+
custom_benchmark,
14+
rag_poisoning
1415
)

ps_fuzz/attack_registry.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ def register_test(cls):
1414
global test_classes
1515
logger.debug(f"Registering attack test class: {cls.__name__}")
1616
test_classes.append(cls)
17+
return cls
1718

1819
def instantiate_tests(client_config: ClientConfig, attack_config:AttackConfig, custom_tests:List=None, custom_benchmark:bool=False) -> List[TestBase]:
1920
tests = []

0 commit comments

Comments
 (0)