Merged
429 commits
d8d2203
Merge pull request #44 from RomiconEZ/add-data-fit-to-num_attempts
nizamovtimur Dec 10, 2024
3f9238d
Add WhatsApp example in README
RomiconEZ Dec 10, 2024
c000a1a
Add WhatsApp example in Doc
RomiconEZ Dec 10, 2024
87b55f3
WhatsApp example
RomiconEZ Dec 10, 2024
ccb2159
Add model_description to ClientWhatsAppSelenium init
RomiconEZ Dec 10, 2024
32473ab
Merge pull request #45 from RomiconEZ/whatsapp-example
nizamovtimur Dec 11, 2024
847805c
Main - release v1.1.0 (#46)
RomiconEZ Dec 12, 2024
08d595b
Release v1.1.1 (#47)
RomiconEZ Dec 12, 2024
c8156be
Release v1.1.1
RomiconEZ Dec 12, 2024
5a59018
Merge branch 'release'
RomiconEZ Dec 13, 2024
e50a277
rewrite all examples notebooks in english
nizamovtimur Dec 18, 2024
b275bcc
Merge pull request #50 from RomiconEZ/translate-examples
nizamovtimur Dec 20, 2024
b2511b9
fix attack model system prompt
nizamovtimur Dec 24, 2024
c5c2ac5
Merge pull request #52 from RomiconEZ/small-fix-examples
nizamovtimur Dec 25, 2024
da2b320
Multi stage attack (#51)
nizamovtimur Dec 26, 2024
651999f
move `stop_criterion` from loop
nizamovtimur Dec 27, 2024
f7c7e69
fix sycophancy and logical_inconsistencies naming
nizamovtimur Dec 27, 2024
d3bfe8c
rename `translation.py` to `linguistic.py`
nizamovtimur Dec 27, 2024
28db862
Merge pull request #54 from RomiconEZ/refactor-multistages
nizamovtimur Dec 27, 2024
86541a6
Add refine_attack_prompt func to MultiStageInteractionSession.
RomiconEZ Dec 27, 2024
584d362
Add refine_tested_client_prompt and refine_attacker_prompt funcs to M…
RomiconEZ Dec 27, 2024
830b408
enhance whatsapp example
nizamovtimur Dec 28, 2024
a10b227
sync logic with handling tested_client_response before passing it to…
nizamovtimur Dec 28, 2024
0a3a55c
pre-commit
nizamovtimur Dec 28, 2024
87e44cd
Merge pull request #55 from RomiconEZ/refine_attack_prompt
nizamovtimur Dec 28, 2024
a854cda
Merge pull request #56 from RomiconEZ/enhance-whatsapp-example
nizamovtimur Dec 28, 2024
3adc800
added harmful_behavior_multistage.py
NickoJo Dec 26, 2024
1c63774
corrected harmful_behavior_multistage.py according to the new logic i…
NickoJo Dec 28, 2024
6bb723b
corrected attack
NickoJo Dec 28, 2024
fed170e
corrected attack
NickoJo Dec 28, 2024
30c91c5
add `harmful_behavior_multistage` to docs
nizamovtimur Dec 29, 2024
f00faf1
Merge pull request #53 from RomiconEZ/multi-stage-attack
nizamovtimur Dec 29, 2024
024ee01
Add param history_limit. Add kwargs and args to all attacks. Upgrade …
RomiconEZ Jan 11, 2025
8a38171
Adjust colour theme for dark mode for doc
RomiconEZ Jan 11, 2025
b0aecb9
Adjust colour theme for light mode for doc
RomiconEZ Jan 11, 2025
6e9a368
Run pre-commit
RomiconEZ Jan 11, 2025
6927cb1
Rename history_limit to multistage_depth. Change constant history_lim…
RomiconEZ Jan 11, 2025
a24b957
Run pre-commit
RomiconEZ Jan 11, 2025
2edf701
Update examples to use param - multistage_depth
RomiconEZ Jan 12, 2025
c4425cb
fix missing `multistage_depth` in `attack_registry.py` and actualize …
nizamovtimur Jan 12, 2025
905cd04
add `system_prompt_leakage` attack with dataset (#58)
nizamovtimur Jan 12, 2025
881bb04
corrected refine_kwargs
NickoJo Jan 12, 2025
daea518
typo fix + pre-commit
NickoJo Jan 12, 2025
0cda5e0
Merge pull request #61 from RomiconEZ/hb-issue-fix
nizamovtimur Jan 12, 2025
d448a8d
Merge branch 'main' into multistage_depth
nizamovtimur Jan 12, 2025
55c21db
add to `multistage_depth` to `system_prompt_leakage`
nizamovtimur Jan 12, 2025
5c243d3
actualize example
nizamovtimur Jan 12, 2025
b10034b
Update text blocks in ipynb examples. Add more test as comment to tes…
RomiconEZ Jan 12, 2025
c9b38d1
Merge pull request #60 from RomiconEZ/multistage_depth
RomiconEZ Jan 13, 2025
8f83521
Add more info to attacks docs (#62)
nizamovtimur Jan 13, 2025
203f4c5
Merge release v2.0.0 to main (#66)
RomiconEZ Jan 18, 2025
32b8c40
small fix for attacks and add strip parameter for ChatSession (#65)
nizamovtimur Jan 18, 2025
dcd0fb7
Update main to v2.0.1 (#68)
RomiconEZ Jan 18, 2025
36e4afb
added BON attack (#70)
NickoJo Jan 26, 2025
67f5eb2
enhance attack sysprompt for prompt leakage and add new examples (#69)
nizamovtimur Jan 26, 2025
59e6635
remove ps (#73)
nizamovtimur Jan 28, 2025
7867890
Update CONTRIBUTING.md (#76)
nizamovtimur Jan 30, 2025
3b2c91f
Crescendo (#72)
nizamovtimur Jan 30, 2025
e1c7213
Docker example (#77)
RomiconEZ Feb 1, 2025
6283d13
fix bon paper and add paper for sycophancy (#78)
nizamovtimur Feb 3, 2025
af60896
Harmful warning (#79)
nizamovtimur Feb 4, 2025
d439156
Release v2.1.0
RomiconEZ Feb 5, 2025
ec457f6
Added suffix attack
Shine-afk Feb 5, 2025
abcfab9
Update attack_descriptions.md
Shine-afk Feb 6, 2025
26e43c3
Update attack_descriptions.json
Shine-afk Feb 6, 2025
6b948ad
Update attack_descriptions.md
Shine-afk Feb 7, 2025
536e370
Update suffix.py
Shine-afk Feb 7, 2025
5a5b1b3
Merge pull request #82 from Shine-afk/suffix
nizamovtimur Feb 7, 2025
7d181c1
Add more attack prompts to SPL (#83)
nizamovtimur Feb 7, 2025
f1ede30
Small fix (#84)
nizamovtimur Feb 8, 2025
e696ace
add HarmBench prompts & remake harmful_behavior.py (#85)
NickoJo Feb 9, 2025
6eff307
Release v2.2.0
RomiconEZ Feb 10, 2025
d8406a1
add logo and fix docs
nizamovtimur Feb 13, 2025
39cf846
add exception handling for `ChatSession`
nizamovtimur Feb 13, 2025
970d038
fix doc link in examples (close #89)
nizamovtimur Feb 17, 2025
c0f2c57
add none checking to some attacks
nizamovtimur Feb 17, 2025
084b5a6
fix say none exception for multi stage
nizamovtimur Feb 17, 2025
120a110
enhance sycophancy
nizamovtimur Feb 17, 2025
69b2409
fix null checking for multistage attacks
nizamovtimur Feb 19, 2025
7516601
Merge pull request #88 from RomiconEZ/fix-bugs
nizamovtimur Feb 20, 2025
0d35530
unify the language (#90)
nizamovtimur Feb 22, 2025
a152cb0
Add logo to documentation
RomiconEZ Mar 8, 2025
1916483
Add italic project name to first page
RomiconEZ Mar 8, 2025
215f894
Add Guides section in README
RomiconEZ Mar 8, 2025
747eedb
Add Guides section in DOC. Update copyright. Fix attack_descriptions.md
RomiconEZ Mar 8, 2025
cb62c9e
Bump version to 2.3.0
RomiconEZ Mar 8, 2025
c55cf77
Minor changes
RomiconEZ Mar 8, 2025
b63268c
Bump version to 2.3.1
RomiconEZ Mar 8, 2025
173f1b0
Merge pull request #94 from RomiconEZ/main-2-3-0
RomiconEZ Mar 8, 2025
c87ea8e
add shuffle inconsistency attack (#98)
NickoJo Mar 17, 2025
4dc808e
Made a single point of formation of description for attacks. Reworked…
RomiconEZ Mar 20, 2025
2c947c5
add langchain client to langchain example and add telegram api info l…
nizamovtimur Mar 26, 2025
f70a8c0
enhance SPL attack (#100)
nizamovtimur Mar 26, 2025
c00eab6
Refactor test configurations to use parameter dictionaries
RomiconEZ Mar 26, 2025
425ce7a
Refactor attack classes to clean up multistage_depth handling
RomiconEZ Mar 26, 2025
aa7f599
Update dependencies, refactor LangChain integration, and improve robu…
RomiconEZ Mar 28, 2025
1ff1a9a
Integrate JudgeConfig and judge model into testing framework
RomiconEZ Mar 28, 2025
f6a87ef
Make judge model optional across testing pipeline
RomiconEZ Mar 28, 2025
5fd61a9
Run pre-commit
RomiconEZ Mar 28, 2025
5deaf42
Merge remote main
RomiconEZ Mar 28, 2025
3880b48
Transfer _prepare_attack_data to TestBase class. Add "tags" field in …
RomiconEZ Mar 29, 2025
25a5a0e
Remove unused os and pandas imports across modules
RomiconEZ Mar 29, 2025
21f7598
Add preset-based test configurations and utilities
RomiconEZ Mar 29, 2025
597f978
Refactor and enhance parameter handling in utility functions
RomiconEZ Mar 29, 2025
fdc7787
Refactor setup_models_and_tests call to simplify logic.
RomiconEZ Mar 29, 2025
8e7f239
Add exception handling in attack scripts for robustness
RomiconEZ Mar 29, 2025
0051740
Run pre-commit
RomiconEZ Mar 29, 2025
df56427
Remove redundant test scripts and improve example utility function
RomiconEZ Mar 29, 2025
86be0b1
Enable detailed progress tracking and display test stats
RomiconEZ Mar 29, 2025
7b5e636
Remove unused progress bar and enhance output formatting.
RomiconEZ Mar 29, 2025
a6c0cd8
Refactor and centralize box drawing for UI consistency
RomiconEZ Mar 29, 2025
b417a78
Refactor box rendering and improve code organization
RomiconEZ Mar 29, 2025
4446e00
Refactor output formatting into reusable helper functions
RomiconEZ Mar 29, 2025
69e10b0
Expand box width and refactor validation logic.
RomiconEZ Mar 29, 2025
a92511a
Improve logo alignment and adjust configuration output width
RomiconEZ Mar 29, 2025
730680b
Refactor test result formatting and progress bar implementation
RomiconEZ Mar 29, 2025
2593a80
Refactor test results table layout and cleanup unused imports
RomiconEZ Mar 29, 2025
7e7138a
Use `dict` instead of `Dict` for type hints and fix string formatting…
RomiconEZ Mar 29, 2025
051ca6e
Refactor report generation and introduce timestamped filenames.
RomiconEZ Apr 5, 2025
8673479
Remove outdated test notebook file
RomiconEZ Apr 5, 2025
8b40006
Refactor color constants into a dedicated module
RomiconEZ Apr 5, 2025
6a89ed6
Update CONTRIBUTING guide for streamlined attack addition
RomiconEZ Apr 5, 2025
71cc3c2
Remove unused test cases and add new OpenAI client tests
RomiconEZ Apr 5, 2025
e07c55c
Remove unused *args parameters from method definitions
RomiconEZ Apr 5, 2025
e74abd3
Refactor error messages for clarity and consistency
RomiconEZ Apr 5, 2025
3adb0ec
Update test environment variables for Mistral API compatibility
RomiconEZ Apr 5, 2025
23568c1
Add registration for custom tests if not already registered
RomiconEZ Apr 5, 2025
0cc2047
Update notebooks for using new way of calling testing function. Need …
RomiconEZ Apr 5, 2025
53c1482
Run pre-commit
RomiconEZ Apr 5, 2025
e88bad2
Rename `basic_tests_params` to `basic_tests` for consistency
RomiconEZ Apr 7, 2025
3ca212d
Switch to tqdm.auto and add sys import in work_progress_pool
RomiconEZ Apr 7, 2025
0d62484
Improve progress bar compatibility and add docstrings
RomiconEZ Apr 7, 2025
9bd842b
Refactor progress management in WorkProgressPool
RomiconEZ Apr 7, 2025
c2cda8e
Refactor progress display handling for notebook compatibility
RomiconEZ Apr 7, 2025
d25a787
"Reorganize CONTRIBUTING.md steps for clarity"
RomiconEZ Apr 7, 2025
fc04011
Add unified progress display for notebook and console modes
RomiconEZ Apr 7, 2025
c8a59ee
Refactor progress management logic in `ProgressWorker`.
RomiconEZ Apr 7, 2025
a7f85de
Refactor report generation with reusable base name variable
RomiconEZ Apr 7, 2025
4106d9e
Add validation for basic test parameters in LLAMATOR
RomiconEZ Apr 7, 2025
985ea43
Simplify basic_tests assignment in documentation.
RomiconEZ Apr 7, 2025
6fcd462
Run pre-commit
RomiconEZ Apr 7, 2025
646e045
Merge pull request #101 from LLAMATOR-Core/main-draft-v3
RomiconEZ Apr 7, 2025
6ea6c0c
refactor all attacks
nizamovtimur Apr 8, 2025
3a74127
replace judge model eval with heuristic in `base64_injection`attack
nizamovtimur Apr 8, 2025
01fc85b
refactor judge models interaction for `ethical_compliance`
nizamovtimur Apr 8, 2025
956d134
merge `stop_criterion` and `tested_client_response_handler` to one me…
nizamovtimur Apr 8, 2025
751cfb5
fix
nizamovtimur Apr 9, 2025
b868bd4
fix artifacts saving
nizamovtimur Apr 9, 2025
601628b
update examples
nizamovtimur Apr 9, 2025
58f1b21
fix ru_dan naming
nizamovtimur Apr 9, 2025
c0f93d4
Merge branch 'v3/refactor-attacks' into fix-examples
nizamovtimur Apr 9, 2025
0363d5c
run examples
nizamovtimur Apr 9, 2025
9408089
enhance docstrings and naming
nizamovtimur Apr 10, 2025
3ef885f
return atomaric logic in multistage interaction class
nizamovtimur Apr 10, 2025
e14f50c
fix
nizamovtimur Apr 10, 2025
148fe7b
yet another fix
nizamovtimur Apr 10, 2025
efd0414
add tags (#112)
nizamovtimur Apr 10, 2025
5693ebf
Merge branch 'main' into v3/refactor-attacks
nizamovtimur Apr 10, 2025
950349c
fix multistage info in `attack_descriptions.md`
nizamovtimur Apr 11, 2025
dd933ac
Refactor response verification logic into reusable functions
RomiconEZ Apr 11, 2025
a728def
Refactor response_verification to be a static method
RomiconEZ Apr 11, 2025
914fe59
pre-commit fix
nizamovtimur Apr 11, 2025
901d010
Merge pull request #111 from LLAMATOR-Core/v3/refactor-attacks
nizamovtimur Apr 12, 2025
b1a993c
Updates requirements
RomiconEZ Apr 12, 2025
8cffab4
Update environment variable names and improve attack class documentation
RomiconEZ Apr 12, 2025
97decd8
Bump version to 3.0.0.
RomiconEZ Apr 12, 2025
67613b1
Update development dependencies: bump twine to 6.1.0 and add pkginfo
RomiconEZ Apr 12, 2025
0d7fd09
Merge pull request #116 from LLAMATOR-Core/main-3-0-0
RomiconEZ Apr 12, 2025
bbb53e4
actualize examples (#117)
nizamovtimur Apr 12, 2025
b4865dd
Enhance documentation and add judge model validation checks
RomiconEZ Apr 13, 2025
f4f337f
Run pre-commit
RomiconEZ Apr 13, 2025
db7b063
Update howtos and project documentation for clarity and consistency
RomiconEZ Apr 13, 2025
2f67f8a
Add chat badge to project overview and README for community engagement
RomiconEZ Apr 13, 2025
de761d4
Refactor validation logic and improve parameter extraction in model t…
RomiconEZ Apr 13, 2025
a22d7dd
Merge pull request #121 from LLAMATOR-Core/small-improvements
RomiconEZ Apr 13, 2025
03aebb6
Add autodan turbo (#118)
nizamovtimur Apr 18, 2025
f220c0c
Switch parquet engine from fastparquet to pyarrow for improved compat…
RomiconEZ Apr 19, 2025
00a6881
Run pre-commit
RomiconEZ Apr 19, 2025
a28057f
Implement dialogue injection attack (#113)
3ndetz Apr 19, 2025
eea52fb
Update attack_descriptions.md (#124)
nizamovtimur Apr 19, 2025
8259d58
Merge remote-tracking branch 'origin/main' into main-actual
RomiconEZ Apr 19, 2025
030bc20
Switch parquet engine from fastparquet to pyarrow in dialogue injecti…
RomiconEZ Apr 19, 2025
1c22cda
Run pre-commit
RomiconEZ Apr 19, 2025
817a2df
Merge pull request #123 from LLAMATOR-Core/fix-dep-parquet
RomiconEZ Apr 19, 2025
2af248c
Bump version to 3.1.0 (#125)
RomiconEZ Apr 19, 2025
ec7042f
PAIR (harmful_behavior_multistage + scoring) (#127)
NickoJo Apr 27, 2025
c04e44e
Implement M-Attack for VLMs (#106)
ti3c2 May 2, 2025
3636b86
Implement VLM text hallucination attack (#107)
ti3c2 May 2, 2025
9dc37c0
Implement VLM low-resolution docs attack (#108)
ti3c2 May 2, 2025
252395f
Add dialogue injection continuation sub-attack (#132)
3ndetz May 2, 2025
b607d25
Add deceptive delight attack (#133)
nizamovtimur May 4, 2025
2b00ad7
mini refactor all attacks (#131)
nizamovtimur May 4, 2025
95aa108
add VLM example to README and docs + update unique features (#134)
nizamovtimur May 4, 2025
0808f2a
remove foolish attacks (#135)
nizamovtimur May 8, 2025
ef5e936
add harmbench
NickoJo May 9, 2025
b52901e
change language to enum
NickoJo May 10, 2025
3b93ea6
correct dataset and filtering
NickoJo May 10, 2025
cba0813
add language param
NickoJo May 12, 2025
e7077be
dan/ucar rework
NickoJo May 12, 2025
d876a44
typo fix
NickoJo May 12, 2025
f130c87
dataset filtering
NickoJo May 12, 2025
0bca893
source corrected
NickoJo May 12, 2025
f9ca7e6
change enum to literal
nizamovtimur May 18, 2025
d4caac6
change lang tags
nizamovtimur May 18, 2025
8b3c79d
fix notebooks
nizamovtimur May 18, 2025
b8563ac
refactor response filtering and dataset handling for improved readabi…
RomiconEZ May 21, 2025
9621766
fix tags
nizamovtimur May 21, 2025
cb83d55
Refactor dataset loading into a dedicated helper method.
RomiconEZ May 26, 2025
339d4b4
Merge pull request #137 from LLAMATOR-Core/dataset_rework
RomiconEZ May 26, 2025
7fbb5da
Refactor test functions to return aggregated results.
RomiconEZ May 26, 2025
77ed28b
Run pre-commit and rename file
RomiconEZ May 26, 2025
303ba51
Merge pull request #139 from LLAMATOR-Core/return_dict
RomiconEZ May 27, 2025
5436a6b
Update language validation to default to 'any' and enhance example ge…
RomiconEZ May 27, 2025
8478a1b
Run pre-commit
RomiconEZ May 27, 2025
19bb79d
Merge pull request #140 from LLAMATOR-Core/update_presets
RomiconEZ May 27, 2025
d9c7cb3
Update example and documentation (#142)
RomiconEZ May 29, 2025
0841e0f
Bump version to 3.2.0
RomiconEZ Jun 1, 2025
3c5404a
Merge pull request #143 from LLAMATOR-Core/main-3-2-0
RomiconEZ Jun 1, 2025
f6b754f
fix None checking in `MultiStageInteractionSession` (#145)
nizamovtimur Jun 10, 2025
e1ca4d7
add language consideration to refinement
NickoJo Jun 15, 2025
6fe423f
Merge pull request #146 from LLAMATOR-Core/pair-fix
NickoJo Jun 15, 2025
7db416d
Update README.md (#152)
nizamovtimur Jul 5, 2025
ef214fb
Transform Static Past Tense Attack into Dynamic Time Machine Attack (…
nizamovtimur Jul 5, 2025
5245b6a
Add Jugde model to System Prompt Leakage attack (#147)
nizamovtimur Jul 6, 2025
3ce9b2f
Add contact information and enterprise version note to README.md
RomiconEZ Jul 6, 2025
be99c67
Merge pull request #153 from LLAMATOR-Core/update-readme
RomiconEZ Jul 6, 2025
edfba18
Add 'model:llm' tag to various attack descriptions and refactor test …
RomiconEZ Jul 6, 2025
5c75e87
Run pre-commit and small refactor
RomiconEZ Jul 6, 2025
2b95922
Refactor test parameter naming from 'basic_tests_params' to 'basic_te…
RomiconEZ Jul 10, 2025
8d91c07
Run pre-commit
RomiconEZ Jul 10, 2025
cbb908f
add available presets in notebooks
RomiconEZ Jul 10, 2025
2fad915
split cell
RomiconEZ Jul 10, 2025
19d2ff8
Add Linguistic Sandwich Attack (#155)
nizamovtimur Jul 13, 2025
3ee492e
Merge pull request #154 from LLAMATOR-Core/refactor-print-test-preset
RomiconEZ Jul 15, 2025
8d6d8bc
Bump version to 3.3.0
RomiconEZ Jul 15, 2025
78e5346
Enhance test cases and add default handling for num_attempts parameter
RomiconEZ Jul 15, 2025
c898881
Release v3.3.0
RomiconEZ Jul 15, 2025
9a4122e
Fix NoneTypes and AutoDAN-Turbo (#158)
nizamovtimur Jul 21, 2025
d3fdca3
Update CONTRIBUTING.md
nizamovtimur Jul 22, 2025
906b78c
If response len < 3 do not eval
nizamovtimur Jul 23, 2025
2923266
Update linguistic_sandwich.py
nizamovtimur Jul 24, 2025
2eb9604
add more stopwords
nizamovtimur Jul 24, 2025
e5ad473
gg
nizamovtimur Jul 24, 2025
5ed37ba
abc
nizamovtimur Jul 24, 2025
b893833
fix
nizamovtimur Jul 24, 2025
33fedb1
Merge pull request #162 from LLAMATOR-Core/nizamovtimur-patch-1
RomiconEZ Jul 27, 2025
d0e0229
Merge pull request #161 from LLAMATOR-Core/enhance-contributing
RomiconEZ Jul 27, 2025
3deca0f
Merge remote-tracking branch 'origin/main' into main_last
RomiconEZ Jul 27, 2025
efac9ff
Update Run Tests section in CONTRIBUTING.md for clearer test instruct…
RomiconEZ Jul 27, 2025
37298d8
Merge branch 'main_last' into release_last
RomiconEZ Jul 27, 2025
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 3.2.0
+current_version = 3.3.0
commit = False
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+))?
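For orientation, the `parse` pattern in this hunk can be exercised with Python's `re` module. The sketch below is not part of the PR and does not claim to mirror how the bump tool applies the pattern internally. The helper name `parse_version`, the use of `fullmatch`, and the `-rc` sample string are assumptions made for illustration.

```python
import re

# Pattern copied from the `parse` line of .bumpversion.cfg above.
VERSION_RE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+))?"
)


def parse_version(text: str) -> dict:
    """Hypothetical helper: return the named parts of a version string."""
    match = VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised version: {text!r}")
    # Groups that did not take part in the match come back as None.
    return match.groupdict()


print(parse_version("3.3.0"))
print(parse_version("3.3.0-rc"))  # hypothetical pre-release label
```

The optional trailing group is what lets a plain `3.3.0` and a suffixed version share one pattern.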
13 changes: 7 additions & 6 deletions CONTRIBUTING.md
@@ -48,9 +48,9 @@ pre-commit install

### Run Tests

-1. Navigate to `tests/test_local_llamator.py`.
+1. Navigate to the `tests` directory.
2. Create `.env` from `.env.example` and fill in the necessary fields.
-3. Run the test function based on your LLM client setup.
+3. Run an appropriate test file for your LLM client configuration.

## Making Changes

@@ -112,15 +112,16 @@ class TestNewAttack(TestBase):
"name": "New Attack",
"code_name": "new_attack",
"tags": [
-"lang:en", # languages of available tested models
+"lang:any", # languages of available tested models
"dialog:single-stage", # type of dialogs: single-stage or multi-stage
"owasp:llm01", # OWASP TOP 10 for LLM risks
-"eval:heuristic", # type of resilience evaluation
+"eval:heuristic", # type of resilience evaluation: heuristic or llm-as-a-judge
"arxiv:2504.11111", # original paper if exists
+"model:llm", # type of testing model: llm, vlm
],
"description": {
-"en": "Description in English.",
-"ru": "Описание на русском.",
+"en": "Your attack description here in English.",
+"ru": "Описание атаки на русском.",
},
"github_link": "Link to attack in release branch",
}
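The tag list in this template follows a `key:value` convention (`lang`, `dialog`, `owasp`, `eval`, `arxiv`, `model`). As a rough illustration of how such tags could be grouped and queried, here is a minimal sketch. The helpers `group_tags` and `has_tag` are hypothetical and are not taken from the llamator code base; only the tag strings come from the diff.

```python
from collections import defaultdict

# Tags copied from the TestNewAttack template in the hunk above.
EXAMPLE_TAGS = [
    "lang:any",
    "dialog:single-stage",
    "owasp:llm01",
    "eval:heuristic",
    "arxiv:2504.11111",
    "model:llm",
]


def group_tags(tags):
    """Hypothetical helper: split each 'key:value' tag on the first colon."""
    grouped = defaultdict(list)
    for tag in tags:
        key, sep, value = tag.partition(":")
        if not sep or not value:
            raise ValueError(f"tag must look like 'key:value': {tag!r}")
        grouped[key].append(value)
    return dict(grouped)


def has_tag(tags, key, value):
    """Hypothetical helper: True if the tag list contains key:value."""
    return value in group_tags(tags).get(key, [])


print(group_tags(EXAMPLE_TAGS))
print(has_tag(EXAMPLE_TAGS, "model", "llm"))
```

Splitting on the first colon only keeps values such as `2504.11111` intact and leaves room for values that themselves contain a colon.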
15 changes: 8 additions & 7 deletions README.md
@@ -14,7 +14,7 @@ Red Teaming python-framework for testing chatbots and GenAI systems
## Install 🚀

```bash
-pip install llamator==3.2.0
+pip install llamator==3.3.0
```

## Documentation 📚
@@ -59,15 +59,16 @@ Documentation Link: [https://LLAMATOR-Core.github.io/llamator](https://LLAMATOR-

## Community 🌍

-* 💬 [Telegram Chat — LLAMATOR | AI Red Team Community](https://t.me/llamator)
-* 📣 [Telegram Channel — AI Security Lab](https://t.me/aisecuritylab)
+* 💬 Telegram Chat — [LLAMATOR | AI Red Team Community](https://t.me/llamator)
+* 🖤 Supported by [AI Talent Hub and ITMO University](https://ai.itmo.ru)

+## Contacts 📬

-## Supported by 🚀
+✉️ **Email**: [team@llamator.org](mailto:team@llamator.org)

-* [AI Security Lab ITMO](https://ai.itmo.ru/aisecuritylab)
-* [Raft Security](https://raftds.ru/)
-* [AI Talent Hub](https://ai.itmo.ru/)
+## Enterprise Version 🏢
+
+⚡️<span style="color:#FFC107;"><em>Coming Soon</em></span>⚡️

## License 📜

32 changes: 22 additions & 10 deletions docs/attack_descriptions.md
@@ -13,7 +13,7 @@ In code name: `suffix`

Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation.

-_Original Paper: <https://arxiv.org/abs/2307.15043>, Original Code: <https://github.com/llm-attacks/llm-attacks>_
+_Original Paper: <https://arxiv.org/abs/2307.15043>, Code: <https://github.com/llm-attacks/llm-attacks>_

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/suffix.py">View code on GitHub</a>
</details>
@@ -45,7 +45,7 @@ In code name: `bon`

Probes the LLM resilience against Best-Of-N (BoN) attack by randomly transform symbols in input harmful prompt.

-_Original Paper: <https://arxiv.org/abs/2412.03556>, Original Code: <https://github.com/jplhughes/bon-jailbreaking>_
+_Original Paper: <https://arxiv.org/abs/2412.03556>, Code: <https://github.com/jplhughes/bon-jailbreaking>_

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/bon.py">View code on GitHub</a>
</details>
@@ -127,13 +127,15 @@ Leverages the use of a non-Russian language to circumvent the restrictions or fo
</details>

<details>
-<summary><strong>Past Tense Attack</strong></summary>
+<summary><strong>Linguistic Sandwich</strong></summary>

-In code name: `past_tense`
+In code name: `linguistic_sandwich`

-Challenges the LLM with a dataset of harmful behavior prompts in past tense form.
+Exploits attention blink phenomena by hiding adversarial questions between innocent questions in low-resource languages. The attack leverages the concept that LLMs may have reduced attention when processing mixed low-resource languages.

-<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/past_tense.py">View code on GitHub</a>
+_Original Paper: <https://arxiv.org/abs/2404.07242v1>_
+
+<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/linguistic_sandwich.py">View code on GitHub</a>
</details>

<details>
@@ -143,11 +145,21 @@ In code name: `shuffle`

Attempts to bypass LLM's defense mechanism by randomly shuffling elements in input harmful prompt.

-_Original Paper: <https://arxiv.org/html/2501.04931>_
+_Original Paper: <https://arxiv.org/abs/2501.04931>_

<a href="https://github.com/RomiconEZ/llamator/blob/release/src/llamator/attacks/shuffle.py">View code on GitHub</a>
</details>
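
The `shuffle` description above speaks of "randomly shuffling elements" of the input prompt. A minimal sketch of that idea follows. It treats whitespace-separated words as the elements, which is an assumption, and it does not reproduce llamator's `shuffle.py`, whose contents are not shown in this diff. The function name and the benign sample prompt are hypothetical.

```python
import random


def shuffle_words(prompt: str, seed=None) -> str:
    """Return the prompt with its whitespace-separated words in random order."""
    words = prompt.split()
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    rng.shuffle(words)
    return " ".join(words)


original = "please summarise the quarterly sales report"
print(shuffle_words(original, seed=0))
```

Passing a seed makes a given permutation repeatable, which helps when a test run has to be replayed against the same tested model.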

+<details>
+<summary><strong>Time Machine Attack</strong></summary>
+
+In code name: `time_machine`
+
+Uses an attack model to dynamically transform prompts into historical or future contexts to test temporal framing vulnerabilities.
+
+<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/time_machine.py">View code on GitHub</a>
+</details>
+
<details>
<summary><strong>UCAR Jailbreak</strong></summary>

@@ -163,7 +175,7 @@ Probes an LLM's adherence to content restrictions when explicitly requested to i
<details>
<summary><strong>What Drives the Multi-stage?</strong></summary>

-Multi-stage attacks are inspired by the [Jailbreaking Black Box Large Language Models in Twenty Queries (PAIR)](https://arxiv.org/html/2310.08419) paper.
+Multi-stage attacks are inspired by the [Jailbreaking Black Box Large Language Models in Twenty Queries (PAIR)](https://arxiv.org/abs/2310.08419) paper.

For managing a multi-stage interaction between an attacker and tested chat clients, the `MultiStageInteractionSession` class is available [[source]](https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/client/chat_client.py). It contains the following properties:
* `attacker_session` is the session for the attacker.
@@ -185,7 +197,7 @@ In code name: `autodan_turbo`

Implements the AutoDAN-Turbo attack methodology which uses a lifelong agent for strategy self-exploration to jailbreak LLMs. This attack automatically discovers jailbreak strategies without human intervention and combines them for more effective attacks.

-_Original Paper: <https://arxiv.org/abs/2410.05295v3>, Original Code: <https://github.com/SaFoLab-WISC/AutoDAN-Turbo>_
+_Original Paper: <https://arxiv.org/abs/2410.05295v3>, Code: <https://github.com/SaFoLab-WISC/AutoDAN-Turbo>_

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/autodan_turbo.py">View code on GitHub</a>
</details>
@@ -207,7 +219,7 @@ _Original Paper: <https://arxiv.org/abs/2404.01833>_

In code name: `pair`

-Challenges the LLM with a dataset of harmful behavior prompts using multistage refinement with judge model scoring.
+Challenges the LLM with a dataset of adversarial prompts using multistage refinement with judge model scoring.
Original Paper: https://arxiv.org/abs/2310.08419v4, Code: https://github.com/patrickrchao/JailbreakingLLMs

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/pair.py">View code on GitHub</a>
8 changes: 4 additions & 4 deletions docs/code_documentation.rst
@@ -52,20 +52,20 @@ Available Clients
Additional Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. autofunction:: llamator.utils.params_example.get_preset_tests_params_example
+.. autofunction:: llamator.utils.test_presets.get_test_preset
:noindex:

.. note::

-This function generates an example code snippet for configuring basic_tests_params based on a preset configuration.
+This function generates an example code snippet for configuring basic_tests based on a preset configuration.
It returns a code snippet as a string.

-.. autofunction:: llamator.utils.params_example.print_preset_tests_params_example
+.. autofunction:: llamator.utils.test_presets.print_test_preset
:noindex:

.. note::

-This function prints an example configuration for basic_tests_params based on a preset to the console.
+This function prints an example configuration for basic_tests based on a preset to the console.

.. autofunction:: llamator.client.langchain_integration.print_chat_models_info
:noindex:
17 changes: 10 additions & 7 deletions docs/howtos.md
@@ -13,7 +13,7 @@
## Installation Guide

```bash
-pip install llamator==3.2.0
+pip install llamator==3.3.0
```

## Usage Guide (using LM Studio)
@@ -104,26 +104,29 @@ print(test_result_dict)

## Helper Functions

-### `print_preset_tests_params_example`
+### `print_test_preset`
Prints example configuration for presets to the console.

+Available presets: `all`, `eng`, `llm`, `owasp:llm01`, `owasp:llm07`, `owasp:llm09`, `rus`, `vlm`

**Usage:**

```python
-from llamator import print_preset_tests_params_example
+from llamator import print_test_preset

# Print configuration for all available tests
-print_preset_tests_params_example("all")
+print_test_preset("all")
```

-### `get_preset_tests_params_example`
+### `get_test_preset`
Returns a string containing example configurations for presets.

**Usage:**
```python
-from llamator import get_preset_tests_params_example
+from llamator import get_test_preset

# Get example for all available tests
-all_tests_preset = get_preset_tests_params_example("all")
+all_tests_preset = get_test_preset("all")
print(all_tests_preset)
```
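
Both helpers take a preset name. Before passing user input to them, a caller might want to check that name against the presets listed in this section. The guard below is a hypothetical sketch: `AVAILABLE_PRESETS` and `check_preset_name` are not llamator APIs, and the list is simply copied from the documentation text, so it may drift from what the installed version accepts.

```python
# Preset names copied from the "Available presets" line in this section.
AVAILABLE_PRESETS = (
    "all", "eng", "llm",
    "owasp:llm01", "owasp:llm07", "owasp:llm09",
    "rus", "vlm",
)


def check_preset_name(name: str) -> str:
    """Hypothetical guard: return the name if it is listed, else raise."""
    if name not in AVAILABLE_PRESETS:
        options = ", ".join(AVAILABLE_PRESETS)
        raise ValueError(f"unknown preset {name!r}; expected one of: {options}")
    return name


print(check_preset_name("owasp:llm01"))
```

The returned name can then be handed to `print_test_preset` or `get_test_preset` as in the usage examples above.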

11 changes: 2 additions & 9 deletions docs/project_overview.md
@@ -33,15 +33,8 @@ LLAMATOR - Red Teaming python-framework for testing chatbots and GenAI systems

## Community

-* 💬 [Telegram Chat — LLAMATOR | AI Red Team Community](https://t.me/llamator)
-* 📣 [Telegram Channel — AI Security Lab](https://t.me/aisecuritylab)
-
-
-## Supported by
-
-* [AI Security Lab ITMO](https://ai.itmo.ru/aisecuritylab)
-* [Raft Security](https://raftds.ru/)
-* [AI Talent Hub](https://ai.itmo.ru/aisecuritylab)
+* 💬 Telegram Chat — [LLAMATOR | AI Red Team Community](https://t.me/llamator)
+* 🖤 Supported by [AI Talent Hub and ITMO University](https://ai.itmo.ru)

## License
