@AdrGav941 AdrGav941 commented Sep 22, 2025

Description

This PR adds support for the JailbreakV_28k dataset to PyRIT.
Unlike the other multimodal dataset fetchers, this one requires a local copy of the images, downloaded from a Google Drive link provided by the owners of the HF dataset. The share link to the zip file is in the function comments, and the function does not work without this local download because of the number of images missing on HF.
If the extracted directory is not present at the provided path, the function unzips the archive. As of right now we do not use HF at all for image download due to the large number of missing images, so the zip directory is a mandatory parameter.
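
For readers skimming the description, the "unzip only if the extracted directory is missing" behaviour can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `ensure_images_extracted` and the choice of extracting into a sibling directory named after the ZIP file are assumptions made for the example.

```python
import zipfile
from pathlib import Path


def ensure_images_extracted(zip_path):
    """Return the directory holding the extracted images, unzipping only if needed.

    Assumption: the archive is extracted into a sibling directory named after
    the ZIP file (images.zip -> images/). The actual PR may use another layout.
    """
    zip_path = Path(zip_path)
    target_dir = zip_path.with_suffix("")  # hypothetical extraction location
    if target_dir.is_dir():
        return target_dir  # already extracted, nothing to do
    if not zip_path.is_file():
        raise FileNotFoundError(f"ZIP file not found: {zip_path}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir
```

Calling the function a second time returns the same directory without touching the archive again.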

Addresses #1007

Changes Made:

  • Added integration for JailbreakV_28k
  • Normalizes the dataset's "policy" column and associates it with harm-category
  • Allows for filtering on harm categories (policy values)
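
As a rough illustration of the last two bullets, the policy-to-harm-category mapping and the filter could look like the sketch below. The normalization rule (lowercase, whitespace collapsed to underscores) and both function names are assumptions for the example; the PR may normalize differently.

```python
def normalize_policy(policy):
    """Lowercase a policy label and collapse whitespace into underscores.

    Assumption: this exact normalization scheme is illustrative only.
    """
    return "_".join(policy.strip().lower().split())


def filter_by_harm_category(rows, harm_categories=None):
    """Attach a normalized harm_category to each row and optionally filter on it."""
    wanted = None
    if harm_categories is not None:
        wanted = {normalize_policy(c) for c in harm_categories}
    result = []
    for row in rows:
        category = normalize_policy(row["policy"])
        if wanted is None or category in wanted:
            # Copy the row so the caller's data is left untouched.
            result.append({**row, "harm_category": category})
    return result
```

Passing `harm_categories=None` keeps every row and only adds the normalized `harm_category` field.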

Files Added/Modified:

  • pyrit/datasets/fetch_jailbreakv_28k_dataset.py - Main implementation
  • pyrit/datasets/__init__.py - Added exports for new functions
  • tests/unit/datasets/test_fetch_jailbreakv_28k_dataset.py - Unit tests
  • tests/integration/datasets/test_fetch_datasets.py - Integration tests added

Tests and Documentation

  • PyTest parametrized testing for filtering and choice of text field (dataset has jailbreak and redteaming prompts)
  • Dataset mocking with both text fields and policy mapped to harm_category


@romanlutz romanlutz left a comment

Thanks for getting started on this!

The integration test for datasets is missing, but I suspect it will require a custom one as the dataset is meant to be multimodal (see other comment).


@romanlutz romanlutz left a comment

Great work! Two small adjustments and we're ready to merge.

@AdrGav941 AdrGav941 requested a review from romanlutz November 12, 2025 18:52
@romanlutz

@AdrGav941 a lot has changed in datasets over the last couple of weeks. We really should have tried to merge this before those changes but didn't quite get to it. Please let me know if you want to make the changes yourself or if we should make them.

@AdrGav941

@romanlutz I'm happy to make the changes. I'm on vacation until the 19th but can get it working again when I get back!

@romanlutz

> @romanlutz I'm happy to make the changes. I'm on vacation until the 19th but can get it working again when I get back!

No hurry 🙂

@AdrGav941 AdrGav941 force-pushed the add__HF_jailbreakV_28K_dataset branch from 8c52bc6 to f117a58 Compare December 29, 2025 22:46

@romanlutz romanlutz left a comment

Couple minor comments, otherwise this looks good to me. Just need to try it out once to make sure it works.

romanlutz commented Jan 4, 2026

Was just trying this out. Downloaded the zip file, put it in the home directory, and then ran it.

README.md: 7.27kB [00:00, 15.4MB/s]
mini_JailBreakV_28K.csv: 230kB [00:00, 3.45MB/s]
JailBreakV_28K/JailBreakV_28K.csv: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 23.2M/23.2M [00:02<00:00, 9.00MB/s]
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workspace/pyrit/datasets/seed_datasets/remote/jailbreakv_28k_dataset.py", line 245, in fetch_dataset
    raise ValueError(
ValueError: JailBreakV-28K fetch failed: 100.0% of items are missing images (280 out of 280 items processed). Only 0 valid pairs were created. At least 50% of items must have valid images. Please ensure the ZIP file contains the full image set.

Have you seen this before? This is on Linux (devcontainer). On Windows it works for me.
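
To make the failing check easier to reason about, here is a minimal sketch of a "too many missing images" guard like the one the error message describes. It is not the code at line 245 of the fetcher: the function name `check_missing_images`, its parameters, and the idea of echoing a few unresolved paths in the message are assumptions for illustration.

```python
from pathlib import Path


def check_missing_images(image_paths, image_root, max_missing_ratio=0.5):
    """Return the relative paths that resolve to files; raise if too many are missing.

    Assumption: paths in the CSV are relative to image_root. The 0.5 default
    mirrors the "at least 50%" wording in the error message above.
    """
    image_root = Path(image_root)
    found = [p for p in image_paths if (image_root / p).is_file()]
    missing = [p for p in image_paths if not (image_root / p).is_file()]
    total = len(image_paths)
    if total and len(missing) / total > max_missing_ratio:
        sample = ", ".join(str(image_root / p) for p in missing[:3])
        raise ValueError(
            f"{len(missing) / total:.1%} of items are missing images "
            f"({len(missing)} out of {total}). First unresolved paths: {sample}"
        )
    return found
```

Including a few fully resolved paths in the message is a cheap way to compare what the Linux and Windows runs are actually looking for on disk.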

What confuses me, though, is that I got 280 pairs (560 total) with "mini" and 28000 (56000 total) with the full split, yet the zip file has the following image folders:

  • query_related with 6001 items (which maps to 6k rows in the CSV)
  • llm_transfer_attack with 6002 items (which maps to 20k rows in the CSV: 5k of the rows just use the blank image, about 2.8k of the images are used more than once and up to 17 times, the remaining ~1k are used just once, and curiously there are also ~2.2k that are never used at all)
  • figstep with 4000 items (which maps to 2k rows; apparently none of the images named "query_image_*" are used)
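
The per-folder usage numbers above can be reproduced with a small tally along these lines. This is only a sketch: `summarize_image_usage` is a hypothetical helper, and it assumes the caller has already collected the image paths referenced by the CSV rows and the image paths present in the extracted folder, in the same relative form.

```python
from collections import Counter


def summarize_image_usage(referenced_paths, available_paths):
    """Count how often each available image is referenced by the CSV rows.

    referenced_paths: one entry per CSV row (repeats expected).
    available_paths: the image files found in the extracted folder.
    """
    usage = Counter(referenced_paths)
    available = set(available_paths)
    # Counter returns 0 for unseen keys without inserting them.
    return {
        "used_once": sum(1 for p in available if usage[p] == 1),
        "used_multiple": sum(1 for p in available if usage[p] > 1),
        "never_used": sum(1 for p in available if usage[p] == 0),
        "max_uses": max(usage.values(), default=0),
    }
```

Running this once per folder (query_related, llm_transfer_attack, figstep) would give the same kind of breakdown as the list above.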

I guess we can ignore the question of why they decided to put it together this way for this PR, since it's not about "what to select from this" yet (that would be a follow-up task). I would, however, like to capture the metadata here:
[image]
policy is already captured via the harm categories, but the others are not. I imagine we'll do something in this direction in the not too distant future, and being able to trace it back to the original dataset could prove helpful.

Somewhat concerning: ~~I've found that many repetitions of images have the same text prompts ("redteam_query") as well. The difference is only in the "jailbreak_query".~~ Figured it out! The paper explains this fairly well:
[image]
So here's what I'm thinking: jailbreak_query maps to what we call SeedPrompt (i.e., the text prompt being sent) and redteam_query maps to what we call SeedObjective (in other words, the goal behind what the text+image is trying to achieve).

This leaves us with a few options:

  1. We provide the jailbreak_query as SeedPrompt and ignore redteam_query for this dataset. That means we give people exactly the things the dataset provides to send to a target.
  2. We additionally provide the redteam_query as SeedObjective. This is preferable even if we don't send it to the target because it'll help in scoring. The scorer works a lot better when the objective is clearly spelled out, and some of the jailbreak_query contents are obfuscated (on purpose).
  3. Additionally, we provide a dataset of just the objectives. This would be enormously useful for AI-led attacks, as they need good representative objectives. They reference RedTeam-2K a ton in this as the pre-step. I would love to provide that additionally as a separate dataset. See this distribution by topic (nice!):
    [image]
    There's a separate CSV file in the zip for this and it has 2K (as the name says) rows. I checked the number of unique redteam_query items and those are also 2k, so I'm willing to bet they match (I checked a few but not all).

I think we want to go with options 2 and 3, but as separate fetchers.
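
To make the two proposed fetchers concrete, here is a rough sketch of the mapping. The dataclasses `PromptRecord` and `ObjectiveRecord` are hypothetical stand-ins and deliberately not PyRIT's SeedPrompt and SeedObjective classes, whose constructors are not shown in this thread; the function names are likewise made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveRecord:  # hypothetical stand-in for a SeedObjective
    value: str


@dataclass(frozen=True)
class PromptRecord:  # hypothetical stand-in for a SeedPrompt
    value: str
    objective: ObjectiveRecord


def build_prompt_objective_pairs(rows):
    """Option 2: jailbreak_query becomes the prompt, redteam_query its objective."""
    return [
        PromptRecord(
            value=row["jailbreak_query"],
            objective=ObjectiveRecord(row["redteam_query"]),
        )
        for row in rows
    ]


def unique_objectives(rows):
    """Option 3: the distinct redteam_query values, in first-seen order."""
    seen = dict.fromkeys(row["redteam_query"] for row in rows)
    return [ObjectiveRecord(query) for query in seen]
```

Comparing `len(unique_objectives(rows))` against the row count of the separate RedTeam-2K CSV would be one way to confirm the "2k unique objectives" observation for all rows rather than a sample.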

Separate note: we don't have an attack where there's an objective and the adversarial target generates both the text AND the image for a multi-modal attack on an objective_target. I've wanted that for a while and this should happen sometime soon 😆
