AI Red-Teaming

A framework for testing AI model safety and robustness against adversarial prompts. The project tests Ollama models.

Overview

This project is a systematic tool for evaluating language model vulnerability to adversarial attacks. It tests models (running via Ollama) with a curated library of harmful/misuse prompts to measure safety and identify potential weaknesses.

Core Components

Redteaming results

Attack Runner

Loads prompts from categorized text files and runs them against a specified model, collecting responses for analysis.

Scoring System

Evaluates model responses for signs of harmful output by detecting:

Refusal patterns (e.g., "I cannot", "I won't")
Step-by-step instructions or code
Detailed procedural information
Response length and structure

Dashboard

Interactive visualization for results:

Filters by model and prompt category
Metrics: total tests, average harm score, unique prompts
Displays risky responses ranked by harm score
Prompt library statistics

Logging System

Records results in:

CSV format for aggregated data
JSON format with timestamps for raw results

(Will be implementing more robust logging mechanism)

Workflow

Load adversarial prompts
Send each prompt to an Ollama model
Score responses for harmful content
Log results to CSV and JSON
View results in the Streamlit dashboard

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github/workflows		.github/workflows
app		app
logs		logs
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Red-Teaming

Overview

Core Components

Attack Runner

Scoring System

Dashboard

Logging System

Workflow

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Red-Teaming

Overview

Core Components

Attack Runner

Scoring System

Dashboard

Logging System

Workflow

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages