Release v3.3.0 #157

Merged
RomiconEZ merged 429 commits into release from release-3-3-0 on Jul 27, 2025

@RomiconEZ (Member) commented Jul 15, 2025

Changelog v3.3.0

  1. Redesigned the output of testing parameter presets. Added the following presets: all, owasp:llm01, owasp:llm07, owasp:llm09, llm, vlm, eng, rus.
  2. Added a new Linguistic Sandwich attack. An adversarial prompt in a low-resource language is sandwiched between benign prompts in other languages.
  3. In the System Prompt Leakage attack, the heuristic evaluation has been replaced with LLM-as-a-judge. The judge checks how similar the system's output is to the intended prompt, based on the system description.
  4. The static Past Tense attack has become the dynamic Time Machine attack. The attacking model now alters the temporal context of the adversarial prompt.
  5. Added a new tag, model: llm / vlm.
  6. Updated the README with the Enterprise Version announcement.
  7. Other minor fixes and improvements.
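
As a rough illustration of item 2, the sandwich can be assembled by numbering a list of questions with the adversarial one in the middle. The function name `build_sandwich_prompt` and the placeholder strings are hypothetical and are not taken from the LLAMATOR code base; the real attack may structure its prompt differently.

```python
from typing import Sequence


def build_sandwich_prompt(
    benign_before: Sequence[str],
    adversarial: str,
    benign_after: Sequence[str],
) -> str:
    """Place one adversarial question between benign ones, numbered in order."""
    questions = [*benign_before, adversarial, *benign_after]
    return "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))


# Placeholders only: in the attack described above, each benign question would be
# in a different language and the middle one in a low-resource language.
prompt = build_sandwich_prompt(
    ["<benign question, language A>", "<benign question, language B>"],
    "<adversarial question, low-resource language>",
    ["<benign question, language C>"],
)
```

Keeping the builder free of any model calls makes it easy to check where the adversarial question lands before sending the prompt to a tested system.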

nizamovtimur and others added 30 commits December 10, 2024 15:14
* Set dependency - httpx == 0.27.2

* Release v1.1.0

* Delete deprecated img and add chroma-data to gitignore
Union history
Rewrite all example notebooks in English
* Implement class "MultiStageInteractionSession" for multistage attack. Add new functionality for ChatSession class.

* Add multistage to the sycophancy and logical tests

---------

Co-authored-by: Roman <roman.nieronov@mail.ru>
Refactor sycophancy and logical_inconsistencies and linguistic
Add refine_attack_prompt func to MultiStageInteractionSession.
@RomiconEZ RomiconEZ self-assigned this Jul 15, 2025
@nizamovtimur (Member) left a comment

Changelog v3.2.0..v3.3.0

  1. Redesigned the output of testing parameter presets. Added the following presets: all, owasp:llm01, owasp:llm07, owasp:llm09, llm, vlm, eng, rus.
  2. Added a new Linguistic Sandwich attack. An adversarial prompt in a low-resource language is sandwiched between benign prompts in other languages.
  3. In the System Prompt Leakage attack, the heuristic evaluation has been replaced with LLM-as-a-judge. The judge checks how similar the system's output is to the intended prompt, based on the system description.
  4. The static Past Tense attack has become the dynamic Time Machine attack. The attacking model now alters the temporal context of the adversarial prompt.
  5. Other minor fixes and improvements.

@NickoJo (Member) commented Jul 15, 2025

  1. Completely

Better to remove this, it's a funny word.

Add NoneType checking for Judge Model responses fix AutoDAN-Turbo
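
The NoneType check named in this commit could look roughly like the sketch below. It assumes the judge model answers with a reply starting with "yes" or "no" and that a failed call yields `None`; both the reply format and the helper name `is_leak` are assumptions made for illustration, not the project's actual code.

```python
from typing import Optional


def is_leak(judge_reply: Optional[str]) -> bool:
    """Interpret a judge verdict; a missing reply counts as 'no leak detected'."""
    if judge_reply is None:
        # The judge call failed or returned nothing, so avoid calling str methods on None.
        return False
    return judge_reply.strip().lower().startswith("yes")
```

Treating a missing reply as "no leak" is only one possible policy; a stricter variant could record the case as an error instead.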
@nizamovtimur nizamovtimur self-requested a review July 21, 2025 11:03
@nizamovtimur (Member) left a comment

The fix from #158 needs to be added to this release.
The release description can stay the same as in my previous review.

@RomiconEZ RomiconEZ requested a review from Copilot July 27, 2025 12:48
Copilot AI left a comment

Pull Request Overview

This is a major version release (v3.3.0) that introduces significant improvements to attack preset handling, adds new attack methods, and enhances the overall framework functionality.

  • Replaced parameter-based configuration with a dynamic preset system supporting multiple categories and OWASP classifications
  • Added new attack modules, including the Time Machine and Linguistic Sandwich attacks
  • Enhanced existing attack modules with improved error handling and model compatibility tags

Reviewed Changes

Copilot reviewed 54 out of 56 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/llamator/__version__.py | Version bump to 3.3.0 |
| src/llamator/utils/test_presets.py | Complete rewrite of preset system with dynamic generation based on attack tags |
| src/llamator/utils/attack_params.py | Refactored to support new preset system and improved parameter handling |
| src/llamator/attacks/time_machine.py | New attack module for temporal framing vulnerabilities |
| src/llamator/attacks/linguistic_sandwich.py | New attack exploiting attention blink in low-resource languages |
| tests/print_test_preset_test.py | New utility script for displaying preset configurations |
| Multiple attack files | Added model compatibility tags and improved descriptions |
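
To make "dynamic generation based on attack tags" concrete, here is a minimal sketch of how presets might be derived from per-attack tags. The registry `ATTACK_TAGS`, the tag assignments, and `build_presets` are hypothetical and do not reproduce the contents of test_presets.py; only the preset and attack names come from this pull request.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical registry: attack names appear in this PR, the tag sets are illustrative guesses.
ATTACK_TAGS: Dict[str, Set[str]] = {
    "time_machine": {"owasp:llm01", "llm", "eng"},
    "linguistic_sandwich": {"owasp:llm01", "llm", "eng", "rus"},
    "system_prompt_leakage": {"owasp:llm07", "llm", "eng"},
}


def build_presets(attack_tags: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group attack names by tag and add an 'all' preset containing every attack."""
    presets: Dict[str, List[str]] = defaultdict(list)
    for name, tags in sorted(attack_tags.items()):
        presets["all"].append(name)
        for tag in sorted(tags):
            presets[tag].append(name)
    return dict(presets)


presets = build_presets(ATTACK_TAGS)
```

With this layout, adding a tag to an attack is enough for it to appear in the matching preset, so no preset list has to be maintained by hand.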

@nizamovtimur nizamovtimur self-requested a review July 27, 2025 15:38
@RomiconEZ RomiconEZ merged commit 105e215 into release Jul 27, 2025
4 of 5 checks passed
@RomiconEZ RomiconEZ deleted the release-3-3-0 branch September 7, 2025 15:06

6 participants
