SaFoLab: Security and Safe Foundation Model Systems

20 repositories

A2ASecBench
Public
Official code repository for "A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems" at ICLR 2026.
JavaScript
•
MIT License
•0•0•0•0•Updated Feb 26, 2026
dVLM-AD
Public
Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”
Python
•0•5•0•0•Updated Feb 22, 2026
armor
Public
Python
•
MIT License
•0•5•0•0•Updated Feb 17, 2026
DRIFT
Public
[NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents".
Python
•2•39•1•0•Updated Feb 14, 2026
AdaShield
Public
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."
Python
•4•71•5•0•Updated Feb 9, 2026
ReasoningBomb
Public
The official implementation of our preprint paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoni…
Python
•
Other
•1•7•0•0•Updated Feb 9, 2026
DoxBench
Public
[ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
Jupyter Notebook
•
Apache License 2.0
•2•23•0•0•Updated Feb 7, 2026
SaFo-Lab.github.io
Public
The homepage of SaFo Lab
HTML
•
MIT License
•0•2•0•0•Updated Jan 28, 2026
MetaAgent
Public
Official repository of the MetaAgent program
Python
•6•41•4•0•Updated Dec 2, 2025
AutoDAN-Reasoning
Public
A further improvement to AutoDAN-Turbo through test-time scaling.
Python
•
MIT License
•3•12•0•0•Updated Oct 21, 2025
AutoDAN-Turbo
Public
[ICLR 2025 Spotlight] The official implementation of our ICLR 2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs".
Python
•
MIT License
•60•348•5•0•Updated Oct 8, 2025
PRISM
Public
PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
safety vlm vlm-reasoning
Python
•
MIT License
•1•5•0•0•Updated Sep 12, 2025
AGrail4Agent
Public
[ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
Python
•1•34•0•0•Updated Aug 4, 2025
llm-armor
Public
JavaScript
•0•0•0•0•Updated Jul 23, 2025
JailBreakV_28K
Public
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess the robustn…
jailbreakv-28k
Python
•10•88•2•0•Updated May 9, 2025
OET
Public
Python
•
MIT License
•1•11•0•0•Updated May 5, 2025
FIUBench
Public
A Task of Fictitious Unlearning for VLMs
Jupyter Notebook
•2•28•7•0•Updated Apr 6, 2025
Dolphins
Public
[ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"
Python
•
MIT License
•14•88•6•0•Updated Feb 10, 2025
Awesome-T2I-safety-Papers
Public
A list of T2I safety papers, updated daily. Feel free to join the conversation in Discussions.
MIT License
•1•67•0•0•Updated Aug 12, 2024
.github
Public
Open-source code from SaFoLab at the University of Wisconsin–Madison
•0•1•0•0•Updated Jul 3, 2024