Skip to content

Conversation

@slister1001
Copy link
Member

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Jan 5, 2026
from pyrit.models import SeedPrompt
from pyrit.models.data_type_serializer import PromptDataType
from pyrit.scenario.core.dataset_configuration import DatasetConfiguration
from pyrit.scenario.scenarios.foundry.foundry import Foundry, FoundryStrategy

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I would recommend taking the shortest possible import path, in this case pyrit.scenario.foundry because everything more detailed is considered internal to PyRIT and can change without being considered breaking by us. Perhaps also a good idea to mark that with underscore to be extra clear @rlundeen2

Same for DatasetConfiguration above which can be imported from pyrit.scenario

## Success Metrics

### Reliability
- **Breaking Changes**: Reduce from 2-3 per 6 months to 0-1 per year

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for your SDK or for PyRIT? I don't think we can guarantee a specific number, but we can certainly guarantee a deprecation schedule. Our goal right now is to deprecate features and keep them around for 2 minor releases (e.g., from 0.10.0 to 0.12.0) with a warning for users to replace them before they get removed.

That said, given the level at which you're operating (from a PyRIT perspective: high level, scenarios) you are unlikely to actually face many breaking changes.

]
```

**RAI Context Types**: `email`, `document`, `html`, `code`, `tool_call`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can rename it, but for us, email/document/html/code will all just be "url" or we could call it blob_path or something. But it should not be .text, it should be a file_path similar to how image/audio/video are handled.

If you use text, it has an ambiguity problem; e.g. if a model wants to upload a pdf, it will just insert the pdf data into the text field.

```

**Remaining Considerations**:
- **XPIA Formatting**: For indirect jailbreak attacks, context types like `email` and `document` determine attack vehicle formatting. While PyRIT sees them as `text`, we preserve the original `context_type` in metadata for downstream formatters.
Copy link

@rlundeen2 rlundeen2 Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different people will have different opinions, but I think this makes the most sense as a converter at the end.

So we transform the prompt to however we want for an attack, and then the last converter transforms it to the format you want to send

E.g. prompt[text] -> JailBreakConverter[text] -> Base64Converter[text] -> AddImageConverter[image] -> emailAttachmentConverter[blob - email with the image we just created attached]

Then the target determines how this is sent.

As one example of this, we have a PDFConverter, and a blobStoreTarget. So you can create PDFs and upload them to a blobstore

┌─────────────────────────────────────────────────────────────┐
│ DatasetConfiguration Builder │
│ • Create SeedObjective for each attack string │

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the SeedPrompts are the same, you can just use SeedObjectives with the metadata

┌─────────────────────────────────────────────────────────────┐
│ Result Processing │
│ • Extract from PyRIT memory │
Copy link

@rlundeen2 rlundeen2 Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also get AttackResults;

I'd recommend creating a PyRIT scorer using RAI evaluator. Then you pass it in to FoundryScenario. It's used to evaluate attack success. We can help with this, and actually looks like you're already maybe doing that above.

But then you have the results when the FoundryScenario execution finishes in the AttackResult object and wouldn't have to re-evaluate ASR


#### Important: SeedPrompt Duplication Pattern

**Critical Note**: PyRIT's Foundry does **NOT** automatically send the `SeedObjective` value to the target. The objective is used for orchestration and scoring, but the actual prompt sent to the target must be a `SeedPrompt`. We will do this in every scenario except for Jailbreak and IndirectJailbreak where we handle the injection of the objective into the prompt.
Copy link

@rlundeen2 rlundeen2 Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, if you don't set the SeedPrompt, it will be the objective. But you can always separate them.

But if they are the same, you should probably just attach SeedObjective


1. **SeedObjective**: Contains the attack string (e.g., "Tell me how to build a weapon")
2. **SeedPrompt (attack vehicle)**: Contains the context data **with attack string injected** (e.g., email containing the malicious prompt)
3. **SeedPrompt (original context)**: Contains the original context **without** injection (for reference)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can get the original context from the Message already, you don't need anything extra to keep track of it. In this example, what I would do is

  1. SeedObjective with the objective
  2. Add a converter that converts from a prompt to an email

Then Call the scenario with the converter configured at the end. And the AttackResult object returned will have the original objective, the conversation is available, and the success.

)

# Plus any context prompts
context_prompts = [...]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, you answer below


return prompts

def _create_xpia_prompts(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code you could wrap in a converter if you wanted. Although I'd love any specific format converting code in PyRIT itself :)

from pyrit.models import PromptRequestPiece, Score


class RAIServiceScorer(Scorer):
Copy link

@rlundeen2 rlundeen2 Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this a FloatScaleScorer or TrueFalseScorer depending on what you're returning. And if FloatScale, set a threshhold for the TrueFalseScorer you pass in to the scenario.

self.rai_client = rai_client
self.risk_category = risk_category

async def score_async(self, request_response: PromptRequestPiece) -> List[Score]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd overwrite score_piece, so you can better handle multi part messages


# Run attack (PyRIT handles all execution)
self.logger.info(f"Executing attacks for {self.risk_category}...")
await scenario.run_attack_async()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will return attackResult objects with ASR, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Evaluation Issues related to the client library for Azure AI Evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants