FEAT Add JailbreakV_28k dataset from HF #1098

AdrGav941 · 2025-09-22T22:09:48Z

Description

This PR adds support for the JailbreakV_28k dataset to PyRIT.
One notable departure from the existing multimodal dataset fetchers is that this one needs a local copy of the images, downloaded from a Google Drive link provided by the owners of the HF dataset. The share link to the zip file is in the function comments, and the function does not work unless the zip has been downloaded locally, because many of the images are missing on HF.
If the extracted files are not present at the provided path, the function unzips the archive. As of right now we do not use HF at all for image download because of the large number of missing images, so the zip directory is a mandatory parameter.
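For readers who want a picture of the "unzip only if needed" behavior described above, here is a minimal sketch. The function name `ensure_extracted` and its parameters are hypothetical and are not taken from the PR's implementation; it only assumes a local zip file and a target directory.

```python
import zipfile
from pathlib import Path


def ensure_extracted(zip_path: Path, extract_dir: Path) -> Path:
    """Extract zip_path into extract_dir unless extract_dir already has content.

    Hypothetical helper for illustration; the PR's actual code may differ.
    """
    # If the directory exists and is non-empty, assume a previous run extracted it.
    if extract_dir.is_dir() and any(extract_dir.iterdir()):
        return extract_dir
    # The archive is mandatory: there is no fallback download in this sketch.
    if not zip_path.is_file():
        raise FileNotFoundError(f"Image archive not found: {zip_path}")
    extract_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
    return extract_dir
```

A second call with the same `extract_dir` returns immediately, so the zip only has to be present for the first run.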

Addresses #1007

Changes Made:

Added integration for JailbreakV_28k
Normalizes the dataset's "policy" column and associates it with harm_category
Allows for filtering on harm categories (policy values)
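A rough sketch of what normalizing "policy" values and filtering on them could look like. The helper names and the normalization rule (lowercase, non-alphanumerics collapsed to underscores) are assumptions for illustration, as are the example policy labels in the usage note; the PR's actual mapping may differ.

```python
import re
from typing import Iterable, Optional


def normalize_policy(policy: str) -> str:
    """Lowercase a policy label and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", policy.strip().lower()).strip("_")


def filter_rows(rows: Iterable[dict], harm_categories: Optional[Iterable[str]] = None) -> list:
    """Attach a normalized harm_category to each row and keep only the requested ones.

    Passing harm_categories=None keeps every row.
    """
    wanted = None
    if harm_categories is not None:
        wanted = {normalize_policy(category) for category in harm_categories}
    result = []
    for row in rows:
        category = normalize_policy(row["policy"])
        if wanted is None or category in wanted:
            # Copy the row so the caller's data is not mutated.
            result.append({**row, "harm_category": category})
    return result
```

For example, `filter_rows(rows, ["Illegal Activity"])` would keep only rows whose "policy" normalizes to `illegal_activity`, regardless of how the label is cased or spaced in the CSV.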

Files Added/Modified:

pyrit/datasets/fetch_jailbreakv_28k_dataset.py - Main implementation
pyrit/datasets/__init__.py - Added exports for new functions
tests/unit/datasets/test_fetch_jailbreakv_28k_dataset.py - Unit tests
tests/integration/datasets/test_fetch_datasets.py - Integration tests added

Tests and Documentation

Parametrized pytest tests for filtering and for the choice of text field (the dataset has both jailbreak and red-teaming prompts)
Dataset mocking with both text fields and policy mapped to harm_category

romanlutz

Thanks for getting started on this!

The integration test for datasets is missing, but I suspect it will require a custom one as the dataset is meant to be multimodal (see other comment).

pyrit/datasets/fetch_jailbreakv_28k_dataset.py

pyrit/datasets/__init__.py

pyrit/datasets/fetch_jailbreakv_28k_dataset.py

tests/integration/datasets/test_fetch_datasets.py

pyrit/datasets/fetch_jailbreakv_28k_dataset.py

romanlutz

Great work! Two small adjustments and we're ready to merge.

tests/integration/datasets/test_fetch_datasets.py

pyrit/datasets/fetch_jailbreakv_28k_dataset.py

romanlutz · 2025-12-07T14:08:47Z

@AdrGav941 a lot changed in datasets over the last couple of weeks. We really should have tried to merge this before the changes but didn't quite get to it. Please let me know if you want to make the changes yourself or if we should make them.

AdrGav941 · 2025-12-07T14:23:30Z

@romanlutz I'm happy to make the changes. I'm on vacation until the 19th but can get it working again when I get back!

romanlutz · 2025-12-07T14:53:22Z

> @romanlutz I'm happy to make the changes. I'm on vacation until the 19th but can get it working again when I get back!

No hurry 🙂

pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py

tests/integration/datasets/test_seed_dataset_provider_integration.py

romanlutz

Couple minor comments, otherwise this looks good to me. Just need to try it out once to make sure it works.

romanlutz · 2026-01-04T23:58:29Z

Was just trying this out. Downloaded the zip file, put it in the home directory, and then ran it.

README.md: 7.27kB [00:00, 15.4MB/s]
mini_JailBreakV_28K.csv: 230kB [00:00, 3.45MB/s]
JailBreakV_28K/JailBreakV_28K.csv: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 23.2M/23.2M [00:02<00:00, 9.00MB/s]
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workspace/pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py", line 245, in fetch_dataset
    raise ValueError(
ValueError: JailBreakV-28K fetch failed: 100.0% of items are missing images (280 out of 280 items processed). Only 0 valid pairs were created. At least 50% of items must have valid images. Please ensure the ZIP file contains the full image set.

Have you seen this before? This is on Linux (devcontainer). On Windows it works for me.
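A failure that shows up on Linux but not on Windows is consistent with a path-casing mismatch, since typical Linux filesystems are case-sensitive (a later commit in this PR mentions "fixing casing issue impacting Linux"). The sketch below shows one way to make image lookup case-insensitive and to apply the 50% threshold from the error message. All function names are hypothetical, and this is not necessarily how the PR fixed it.

```python
from pathlib import Path
from typing import Optional


def build_case_insensitive_index(root: Path) -> dict[str, Path]:
    """Map lowercased POSIX-style relative paths to the real files under root."""
    index: dict[str, Path] = {}
    for path in root.rglob("*"):
        if path.is_file():
            index[path.relative_to(root).as_posix().lower()] = path
    return index


def resolve_image(index: dict[str, Path], relative_path: str) -> Optional[Path]:
    """Look up a CSV-provided relative path while ignoring case and slash style."""
    key = relative_path.replace("\\", "/").lower()
    return index.get(key)


def check_missing_ratio(total: int, missing: int, max_missing: float = 0.5) -> None:
    """Raise if more than max_missing of the processed items have no image."""
    if total and missing / total > max_missing:
        raise ValueError(
            f"{missing / total:.1%} of items are missing images "
            f"({missing} out of {total} items processed)."
        )
```

Building the index once up front avoids a filesystem scan per row, which matters with 28,000 rows.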

What confuses me, though, is that I got 280 pairs (560 total) with "mini" and 28000 (56000 total) with the full split, yet the zip file has the following folders for images:

query_related with 6001 items (which maps to 6k rows in the CSV)
llm_transfer_attack with 6002 items (which maps to 20k rows in the CSV; 5k of them are just using the blank image, about 2.8k are used more than once and up to 17 times, the remaining ~1k are used just once, and curiously there are also ~2.2k that are never used at all)
figstep with 4000 items (which maps to 2k rows; apparently none of the images with name "query_image_*" are used)
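Counts like the ones above can be reproduced with a small tally over the CSV rows. This is only a sketch: the column name `image_path` is an assumption about the CSV layout, and the function name is made up for illustration.

```python
from collections import Counter
from typing import Iterable


def summarize_image_usage(rows: Iterable[dict], image_column: str = "image_path") -> dict:
    """Count how often each image path is referenced across the rows."""
    counts = Counter(row[image_column] for row in rows)
    return {
        "rows": sum(counts.values()),
        "unique_images": len(counts),
        "used_once": sum(1 for n in counts.values() if n == 1),
        "used_multiple": sum(1 for n in counts.values() if n > 1),
        "max_reuse": max(counts.values(), default=0),
    }
```

Images that exist in the zip but are never referenced would not show up here; finding those requires comparing the tally's keys against the files on disk.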

I guess we can ignore the question of why they decided to put it together this way for this PR since it's not about "what to select from this" yet (that would be a follow-up task). I would, however, like to capture the metadata here:

policy is already captured via the harm categories, but the others... I imagine we'll do something in this direction in the not too distant future and being able to trace it back to the original dataset could prove helpful.

Somewhat concerning: ~~I've found that many repetitions of images have the same text prompts ("redteam_query") as well. The difference is only in the "jailbreak_query".~~ Figured it out! The paper explains this fairly well:

So here's what I'm thinking: jailbreak_query maps to what we call SeedPrompt (i.e., the text prompt being sent) and redteam_query maps to what we call SeedObjective (in other words: the goal behind what the text+image is trying to achieve)

This leaves us with a few options:

1. We provide the jailbreak_query as SeedPrompt and ignore redteam_query for this dataset. That means we give people exactly the things the dataset provides to send to a target.
2. We additionally provide the redteam_query as SeedObjective. This is preferable even if we don't send it to the target because it'll help in scoring. The scorer works a lot better when the objective is clearly spelled out, and some of the jailbreak_query contents are obfuscated (on purpose).
3. Additionally, provide a dataset of just the objectives. This would be enormously useful for AI-led attacks as they need good representative objectives. They reference RedTeam-2K a ton in this as the pre-step. I would love to provide that as a separate dataset. See this distribution by topic (nice!):
   There's a separate CSV file in the zip for this and it has 2K (as the name says) rows. I checked the number of unique redteam_query items and those are also 2k, so I'm willing to bet they match (I checked a few but not all).

I think we want to go with number 2 AND 3, but as separate fetchers.
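To make the proposed mapping concrete, here is a sketch that uses stand-in dataclasses rather than PyRIT's real SeedPrompt and SeedObjective classes, whose constructors are not shown in this thread. `ObjectiveSketch`, `PromptSketch`, both function names, and the `image_path` column are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveSketch:
    """Stand-in for an objective: the goal behind the text+image pair."""
    value: str


@dataclass(frozen=True)
class PromptSketch:
    """Stand-in for a prompt piece that would be sent to the target."""
    value: str
    data_type: str


def row_to_seeds(row: dict) -> tuple:
    """Option 2: jailbreak_query and the image are prompts, redteam_query is the objective."""
    objective = ObjectiveSketch(row["redteam_query"])
    prompts = [
        PromptSketch(row["jailbreak_query"], "text"),
        PromptSketch(row["image_path"], "image_path"),
    ]
    return objective, prompts


def unique_objectives(rows: list) -> list:
    """Option 3: a dataset of just the objectives, deduplicated in first-seen order."""
    seen = dict.fromkeys(row["redteam_query"] for row in rows)
    return [ObjectiveSketch(value) for value in seen]
```

Keeping the two as separate fetchers means the objectives-only dataset can be used for AI-led attacks without requiring the image zip.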

Separate note: we don't have an attack where there's an objective and the adversarial target generates both the text AND the image for a multi-modal attack on an objective_target. I've wanted that for a while and this should happen sometime soon 😆

pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py

romanlutz reviewed Sep 23, 2025

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

hannahwestra25 reviewed Sep 23, 2025

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

romanlutz reviewed Sep 26, 2025

romanlutz self-assigned this Sep 28, 2025

romanlutz mentioned this pull request Oct 7, 2025

FEAT: add support for multimodal data from HarmBench #1110

Merged

romanlutz reviewed Oct 22, 2025

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

romanlutz reviewed Oct 22, 2025

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

romanlutz reviewed Oct 28, 2025

tests/integration/datasets/test_fetch_datasets.py (outdated, resolved)

pyrit/datasets/fetch_jailbreakv_28k_dataset.py (outdated, resolved)

AdrGav941 requested a review from romanlutz November 12, 2025 18:52

Restructuring JailbreakV dataset to work with overall dataset refactor

f117a58

AdrGav941 force-pushed the add__HF_jailbreakV_28K_dataset branch from 8c52bc6 to f117a58 Compare December 29, 2025 22:46

AdrGav941 and others added 4 commits December 29, 2025 17:47

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

f93fbc7

Pre-commit hooks

2e5d6cb

Merge remote-tracking branch 'refs/remotes/adrgav941/add__HF_jailbrea…

6298c64

…kV_28K_dataset' into add__HF_jailbreakV_28K_dataset

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

94e6727

romanlutz reviewed Jan 3, 2026

pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py (outdated, resolved)

romanlutz reviewed Jan 3, 2026

tests/integration/datasets/test_seed_dataset_provider_integration.py (outdated, resolved)

romanlutz approved these changes Jan 3, 2026

Comment clarity and making category enum private

ea87e90

romanlutz reviewed Jan 6, 2026

pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py (outdated, resolved)

Adrian Gavrila and others added 3 commits January 7, 2026 11:25

Removing text field specification, adding redteam query as ObjectiveS…

48ec348

…eed, fixing casing issue impacting Linux

Adding RedTeam_2K subset of JailbreakV as separate fetcher

cce6ed7

Merge branch 'main' into add__HF_jailbreakV_28K_dataset

e60297d
