    Repositories list

    • Official code repository for "A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems" at ICLR 2026.
      JavaScript
      MIT License
      0000 Updated Feb 26, 2026
    • dVLM-AD

      Public
      Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”
      Python
      0500 Updated Feb 22, 2026
    • armor

      Public
      Python
      MIT License
      0500 Updated Feb 17, 2026
    • DRIFT

      Public
      [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents".
      Python
      23910 Updated Feb 14, 2026
    • AdaShield

      Public
      [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."
      Python
      47150 Updated Feb 9, 2026
    • The official implementation of our preprint paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoni…
      Python
      Other
      1700 Updated Feb 9, 2026
    • DoxBench

      Public
      [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
      Jupyter Notebook
      Apache License 2.0
      22300 Updated Feb 7, 2026
    • The homepage of SaFo Lab
      HTML
      MIT License
      0200 Updated Jan 28, 2026
    • MetaAgent

      Public
      Official Repository of MetaAgent Program
      Python
      64140 Updated Dec 2, 2025
    • A further improvement for the AutoDAN-Turbo through test-time scaling.
      Python
      MIT License
      31200 Updated Oct 21, 2025
    • [ICLR 2025 Spotlight] The official implementation of our ICLR2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs".
      Python
      MIT License
      6034850 Updated Oct 8, 2025
    • PRISM

      Public
      PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
      Python
      MIT License
      1500 Updated Sep 12, 2025
    • [ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
      Python
      13400 Updated Aug 4, 2025
    • llm-armor

      Public
      JavaScript
      0000 Updated Jul 23, 2025
    • [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess the robustn…
      Python
      108820 Updated May 9, 2025
    • OET

      Public
      Python
      MIT License
      11100 Updated May 5, 2025
    • FIUBench

      Public
      A Task of Fictitious Unlearning for VLMs
      Jupyter Notebook
      22870 Updated Apr 6, 2025
    • Dolphins

      Public
      [ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"
      Python
      MIT License
      148860 Updated Feb 10, 2025
    • A list of T2I safety papers, updated daily. Comments are welcome in Discussions.
      MIT License
      16700 Updated Aug 12, 2024
    • .github

      Public
      Open-source code from SaFoLab at University of Wisconsin–Madison
      0100 Updated Jul 3, 2024