
KeerthiNingegowda/AI_pentesting_RAG_chatbot


AI_pentesting_RAG_chatbot

This repo consists of exploratory work on AI pentesting using the open-source versions of garak and promptfoo.

Note: Please use this repo for exploration and learning purposes only, and keep ethics in mind. Neither the responses elicited during this exploration nor the probes used in it represent my beliefs.

The Medium article can be found at https://medium.com/@keerthi.ningegowda/one-prompt-to-break-it-all-automated-ai-red-teaming-with-garak-and-promptfoo-315331438fbf?postPublishedType=initial

The RAG workflow at https://github.com/KeerthiNingegowda/n8n_workflow was exposed via a webhook, and the associated API was pentested to mimic a production setup. This is also probably the only practical way to pentest GenAI features at scale.

Note: The n8n workflow uses Basic authentication, so encode the username and password (as username:password) using Base64:

echo -n "uname:pwd" | base64
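The same encoding in Python, as a minimal sketch that also builds the Authorization header a Basic-auth webhook expects: the word `Basic` followed by the Base64 of `username:password`. `basic_auth_header` is a hypothetical helper, and `uname`/`pwd` are the same placeholders as in the shell command.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header: Base64 of "username:password"."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; replace with your own (and never commit them).
headers = basic_auth_header("uname", "pwd")
print(headers)  # {'Authorization': 'Basic dW5hbWU6cHdk'}
```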

Burp Suite

To track the request-response cycle between these pentesting tools and the n8n workflow, you can use Burp Suite Community Edition. It intercepts the HTTP traffic between the CLI tools and the webhook via a proxy. Long story short, you get to:

  • See each prompt, its associated buffs, and the corresponding response from the AI model
  • Manipulate garak's requests on the fly, without setting up an entire orchestration
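To see what "intercepting via a proxy" means in practice, here is a minimal stdlib sketch that sends one chat request to the webhook through Burp. It assumes Burp's proxy listener is on its default address (127.0.0.1:8080) and that n8n runs on its default port 5678; the webhook path and the `chatInput` body field are hypothetical and must match your workflow. garak and promptfoo send equivalent HTTP requests, so once their traffic goes through the same proxy, each prompt and response shows up in Burp's HTTP history.

```python
import json
import urllib.request

# Assumption: Burp's proxy listener is at its default address.
BURP_PROXY = "http://127.0.0.1:8080"
# Hypothetical webhook URL (5678 is n8n's default port); use your workflow's URL.
WEBHOOK_URL = "http://localhost:5678/webhook/rag-chatbot"


def opener_via_burp(proxy: str = BURP_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through the Burp proxy."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )


def chat_request(prompt: str, basic_token: str, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build the POST to the webhook; the "chatInput" field name is hypothetical."""
    body = json.dumps({"chatInput": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {basic_token}",
        },
    )


def send_via_burp(prompt: str, basic_token: str) -> str:
    """Send one prompt through Burp; with Intercept on, it pauses there for editing."""
    with opener_via_burp().open(chat_request(prompt, basic_token), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With Burp running, calling `send_via_burp("What plans do you offer?", "<base64 token>")` should make the request appear in Burp, where you can edit it before forwarding it to n8n.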

AI Pentesting

Garak

Before you start pentesting, check out the original paper by the authors of garak: https://arxiv.org/pdf/2406.11036. A snapshot of their framework is below:

To list all available probes via the CLI:

garak --list_probes

In this example, the RAG workflow at https://github.com/KeerthiNingegowda/n8n_workflow was exposed via a webhook.

Example: running a DAN (Do Anything Now) probe (dan.AntiDAN)

python3 -m garak -m rest.RestGenerator -G ./garak_testing/telecom_config.json --probes dan.AntiDAN --report_prefix ~/Desktop/AI_pentesting_RAG_chatbot/garak_reports/antiDAN
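A minimal sketch of what a RestGenerator config such as telecom_config.json might contain, assuming the layout described in garak's REST generator documentation (`uri`, `method`, `headers`, `req_template_json_object` with the `$INPUT` placeholder, `response_json`, `response_json_field`); check it against your garak version. The webhook URL and the `chatInput`/`output` field names are hypothetical and must match your n8n workflow's request and response.

```python
import json
from pathlib import Path

# Hypothetical webhook URL; replace with your n8n webhook.
WEBHOOK_URL = "http://localhost:5678/webhook/rag-chatbot"


def garak_rest_config(uri: str, basic_token: str) -> dict:
    """Build a garak RestGenerator config for a Basic-auth JSON webhook."""
    return {
        "rest": {
            "RestGenerator": {
                "name": "n8n RAG chatbot",
                "uri": uri,
                "method": "post",
                "headers": {
                    "Authorization": f"Basic {basic_token}",
                    "Content-Type": "application/json",
                },
                # garak substitutes each probe prompt for $INPUT.
                "req_template_json_object": {"chatInput": "$INPUT"},
                # Read the model's answer from the "output" field of the JSON response.
                "response_json": True,
                "response_json_field": "output",
            }
        }
    }


def write_config(path: str, config: dict) -> None:
    """Write the config as pretty-printed JSON, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

For example, `write_config("./garak_testing/telecom_config.json", garak_rest_config(WEBHOOK_URL, "<base64 token>"))`. Because the file then holds your credentials, keep it out of version control.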

Pro-tip: Use the -g (--generations) option to control how many generations (responses) are requested for each prompt, and the --parallel_attempts option to send requests in parallel.
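A small sketch that assembles the garak command from the example above with both options added, which makes it easy to script several runs. `garak_command` is a hypothetical helper, and the default values of 5 generations and 4 parallel attempts are arbitrary placeholders, not recommendations.

```python
import shlex


def garak_command(
    config_path: str,
    probes: str,
    report_prefix: str,
    generations: int = 5,
    parallel_attempts: int = 4,
) -> list[str]:
    """Assemble a garak REST run with -g and --parallel_attempts set."""
    return [
        "python3", "-m", "garak",
        "-m", "rest.RestGenerator",
        "-G", config_path,
        "--probes", probes,
        "-g", str(generations),  # generations requested per prompt
        "--parallel_attempts", str(parallel_attempts),  # attempts sent concurrently
        "--report_prefix", report_prefix,
    ]


cmd = garak_command("./garak_testing/telecom_config.json", "dan.AntiDAN", "garak_reports/antiDAN")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the scan; building the argument list instead of a single shell string avoids quoting problems with paths.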

Sample result for a probe that hijacks an LLM's safety filters and makes it say bad things about humans

From this exploration, I think garak is better suited to testing at the model level than at the application level. Garak also has a limited set of probes for AI applications in other modalities, such as images.

Promptfoo

Promptfoo is quite similar to garak, except that it is more dynamic and developer-facing. In garak, probes are built from a database of known attacks and vulnerabilities, whereas promptfoo's probes are more context-specific: you need to describe the type of application you are testing before you can pentest it with promptfoo.
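To make "context-specific" concrete, here is a sketch of a promptfoo red-team config in which the `purpose` field carries the application description. The keys (`targets` with an HTTP provider, `redteam.purpose`, `numTests`, `plugins`, `strategies`) follow promptfoo's red-team documentation as I understand it, so verify them against your promptfoo version. The webhook URL, the `chatInput`/`output` fields, the plugin and strategy choices, and the telecom purpose text are hypothetical. Since JSON is valid YAML, the sketch serializes with `json` to avoid a third-party YAML dependency.

```python
import json

# Hypothetical n8n webhook; promptfoo calls it once per generated test case.
WEBHOOK_URL = "http://localhost:5678/webhook/rag-chatbot"


def promptfoo_redteam_config(purpose: str, num_tests: int = 5) -> dict:
    """Sketch of a promptfoo red-team config for an HTTP (webhook) target."""
    return {
        "targets": [
            {
                "id": "http",
                "config": {
                    "url": WEBHOOK_URL,
                    "method": "POST",
                    "headers": {
                        "Content-Type": "application/json",
                        "Authorization": "Basic <base64 token>",  # placeholder
                    },
                    "body": {"chatInput": "{{prompt}}"},  # hypothetical field name
                    "transformResponse": "json.output",  # hypothetical field name
                },
            }
        ],
        "redteam": {
            # The application context promptfoo uses to generate targeted attacks.
            "purpose": purpose,
            "numTests": num_tests,
            "plugins": ["pii", "harmful:hate"],
            "strategies": ["jailbreak", "prompt-injection"],
        },
    }


config = promptfoo_redteam_config(
    "Customer-support chatbot for a telecom company that answers questions "
    "from internal documents via RAG."
)
yaml_text = json.dumps(config, indent=2)  # JSON is a valid YAML document
```

Writing `yaml_text` to promptfooconfig.yaml gives `promptfoo redteam run` the target and the application context it needs.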

The differences between promptfoo and garak are discussed well at https://www.promptfoo.dev/blog/promptfoo-vs-garak/

A simple snapshot of their methodology

Arguably, promptfoo has a more comprehensive set of test suites and strategies than garak. It is also very easy to configure how many test cases to run, which is tricky to do in garak. Note that some of the basic testing strategies do not require an API key. The plugins you choose determine how long your tests take to run.
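A back-of-envelope sketch of why plugin and strategy choices drive run time. It assumes each plugin generates `numTests` base test cases, that each strategy adds one transformed variant per base case, and that every case costs one request to the target. These are simplifying assumptions, not promptfoo's documented behavior: multi-turn strategies and model-graded checks add more calls. The numbers in the example are made up.

```python
def estimated_test_cases(num_plugins: int, num_tests: int, num_strategies: int) -> int:
    """Rough count, ASSUMING num_tests base cases per plugin and one
    extra variant per base case for each strategy."""
    base = num_plugins * num_tests
    return base * (1 + num_strategies)


def estimated_minutes(cases: int, seconds_per_call: float, concurrency: int = 1) -> float:
    """Wall-clock estimate if each case is a single request to the target."""
    return cases * seconds_per_call / max(concurrency, 1) / 60


cases = estimated_test_cases(num_plugins=2, num_tests=5, num_strategies=2)
print(cases)  # 30
print(round(estimated_minutes(cases, seconds_per_call=6.0, concurrency=2), 1))  # 1.5
```

Under these assumptions, run time grows roughly linearly with each of plugins, `numTests`, and strategies, which is why trimming any one of them shortens a run against a slow RAG pipeline.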

To run a red-team test from the YAML config, run the following in the directory that contains promptfooconfig.yaml:

promptfoo redteam run

To view the report, run the following in the same directory:

promptfoo redteam report

Sample response from a successful document extraction attack on the RAG chatbot

Insights and Current Gaps in Open-source LLM Vulnerability Scanners

https://arxiv.org/html/2410.16527v2

Note: This paper does not include promptfoo in its comparison, but it has many interesting insights on other similar tools such as Giskard.

Secrets scanning

Use TruffleHog to check whether you left any secrets or API keys lying around, then double-check your code based on the results.
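As a toy illustration of what a secret scanner looks for, here is a sketch of a regex-based detector for Basic-auth credentials like the n8n ones used in this repo. This is NOT how TruffleHog or GitGuardian work internally. It only flags `Basic <token>` strings whose token is valid Base64 that decodes to `user:password`, and it reports the username but never the password. `find_basic_auth` is a hypothetical helper.

```python
import base64
import binascii
import re

# Toy detector: "Basic" followed by a Base64-looking token.
BASIC_AUTH_RE = re.compile(r"Basic\s+([A-Za-z0-9+/]{4,}={0,2})")


def find_basic_auth(text: str) -> list[tuple[int, str]]:
    """Return (line_number, username) for strings that look like Basic-auth credentials."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BASIC_AUTH_RE.finditer(line):
            try:
                decoded = base64.b64decode(match.group(1), validate=True).decode("utf-8")
            except (binascii.Error, UnicodeDecodeError):
                continue  # not valid Base64 text, so probably not a credential
            user, sep, _password = decoded.partition(":")
            if sep and user:
                hits.append((lineno, user))  # report the user, never the password
    return hits


sample = '{\n  "Authorization": "Basic dW5hbWU6cHdk"\n}'
print(find_basic_auth(sample))  # [(2, 'uname')]
```

The decode-and-check step is what keeps ordinary prose such as "Basic authentication" from being flagged, since "authentication" is not valid Base64.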

You can also rely on GitGuardian, which seemed more effective than TruffleHog in my testing.

PS: I intentionally left my n8n basic auth credentials in the repo 😂. GitGuardian caught them and TruffleHog didn't. As a reminder, the n8n workflow runs locally and is not even exposed within the local network.
