diff --git a/List_of_papers_about_transfer_based_attacks.md b/List_of_papers_about_transfer_based_attacks.md index 95fc3d6..c844d62 100644 --- a/List_of_papers_about_transfer_based_attacks.md +++ b/List_of_papers_about_transfer_based_attacks.md @@ -101,7 +101,7 @@ We also provide a complete list of papers about adversarial examples [here](htt Zhengwei Fang, Rui Wang, Tao Huang, Liping Jing -+ [Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling](https://arxiv.org//abs/2405.16181) (arXiv preprint arXiv:2405.16181, 2024) ++ [Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling](https://arxiv.org/abs/2405.16181) (arXiv preprint arXiv:2405.16181, 2024) Chunlin Qiu, Yiheng Duan, Lingchen Zhao, Qian Wang @@ -178,7 +178,7 @@ We also provide a complete list of papers about adversarial examples [here](htt Shangbo Wu, Yu-an Tan, Yajie Wang, Ruinan Ma, Wencong Ma, Yuanzhang Li -+ [The Ultimate Combo: Boosting Adversarial Example Transferability by Composing Data Augmentations](https://arxiv.org//abs/2312.11309) (arXiv preprint arXiv:2312.11309, 2023) ++ [The Ultimate Combo: Boosting Adversarial Example Transferability by Composing Data Augmentations](https://arxiv.org/abs/2312.11309) (arXiv preprint arXiv:2312.11309, 2023) Zebin Yun, Achi-Or Weingarten, Eyal Ronen, Mahmood Sharif @@ -188,7 +188,7 @@ We also provide a complete list of papers about adversarial examples [here](htt Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, Ee-Chien Chang -+ [ Boost Adversarial Transferability by Uniform Scale and Mix Mask Method](https://arxiv.org//abs/2311.12051) (arXiv preprint arXiv:2311.12051, 2023) ++ [ Boost Adversarial Transferability by Uniform Scale and Mix Mask Method](https://arxiv.org/abs/2311.12051) (arXiv preprint arXiv:2311.12051, 2023) Tao Wang, Zijian Ying, Qianmu Li, zhichao Lian @@ -198,12 +198,12 @@ We also provide a complete list of papers about adversarial examples [here](htt Kunyu Wang, Xuanran He, Wenxuan Wang, Xiaosen Wang -+ [ Boosting the Transferability of Adversarial Examples via Local Mixup and Adaptive Step Size](https://arxiv.org//abs/2401.13205) (arXiv preprint arXiv:2401.13205, 2024) ++ [ Boosting the Transferability of Adversarial Examples via Local Mixup and Adaptive Step Size](https://arxiv.org/abs/2401.13205) (arXiv preprint arXiv:2401.13205, 2024) Junlin Liu, Xinchen Lyu -+ [ Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping](https://arxiv.org//abs/2402.03951) (AAAI 2024) ++ [ Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping](https://arxiv.org/abs/2402.03951) (AAAI 2024) Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song @@ -270,7 +270,7 @@ We also provide a complete list of papers about adversarial examples [here](htt Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen -+ [DANAA: Towards transferable attacks with double adversarial neuron attribution](https://arxiv.org//abs/2310.10427) (ADMA 2023) ++ [DANAA: Towards transferable attacks with double adversarial neuron attribution](https://arxiv.org/abs/2310.10427) (ADMA 2023) Zhibo Jin, Zhiyu Zhu, Xinyi Wang, Jiayu Zhang, Jun Shen, Huaming Chen @@ -367,7 +367,7 @@ We also provide a complete list of papers about adversarial examples [here](htt Zhuoer Xu, Zhangxuan Gu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang -+ [Improving Adversarial Transferability via Model Alignment](https://arxiv.org//abs/2311.18495) (arXiv 
preprint arXiv:2311.18495, 2023) ++ [Improving Adversarial Transferability via Model Alignment](https://arxiv.org/abs/2311.18495) (arXiv preprint arXiv:2311.18495, 2023) Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu @@ -382,7 +382,7 @@ We also provide a complete list of papers about adversarial examples [here](htt Jianping Zhang, Yizhan Huang, Zhuoer Xu, Weibin Wu, Michael R. Lyu -+ [ Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability](https://arxiv.org//abs/2405.03193) (arXiv preprint arXiv:2405.03193, 2024) ++ [ Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability](https://arxiv.org/abs/2405.03193) (arXiv preprint arXiv:2405.03193, 2024) Juanjuan Weng, Zhiming Luo, Shaozi Li @@ -578,22 +578,22 @@ We also provide a complete list of papers about adversarial examples [here](htt ## Survey & Benchmark -+ [Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly](https://arxiv.org//abs/2311.01323) (NeurIPS 2023) ++ [Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly](https://arxiv.org/abs/2311.01323) (NeurIPS 2023) Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen -+ [A Survey on Transferability of Adversarial Examples across Deep Neural Networks](https://arxiv.org//abs/2310.17626) (arXiv preprint arXiv: 2310.17626 2023) ++ [A Survey on Transferability of Adversarial Examples across Deep Neural Networks](https://arxiv.org/abs/2310.17626) (arXiv preprint arXiv: 2310.17626 2023) Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr -+ [Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights](https://arxiv.org//abs/2310.11850) (arXiv preprint arXiv: 2310.11850 2023) ++ [Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights](https://arxiv.org/abs/2310.11850) (arXiv preprint arXiv: 2310.11850 2023) Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Chao Shen -+ [ Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems ](https://arxiv.org//abs/2311.11796) (arXiv preprint arXiv: 2311.11796 2023) ++ [ Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems ](https://arxiv.org/abs/2311.11796) (arXiv preprint arXiv: 2311.11796 2023) Guangjing Wang, Ce Zhou, Yuanda Wang, Bocheng Chen, Hanqing Guo, Qiben Yan @@ -605,4 +605,4 @@ We also provide a complete list of papers about adversarial examples [here](htt + [Short: Benchmarking Transferable Adversarial Attacks](https://arxiv.org/abs/2402.00418) (NDSS Workshop, 2024) - Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Huaming Chen \ No newline at end of file + Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Huaming Chen diff --git a/README.md b/README.md index 0c12fab..29a9a73 100644 --- a/README.md +++ b/README.md @@ -4,772 +4,772 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html) has been experiencing crashes over the past few days. In the absence of this valuable resource, staying up-to-date with the latest research papers in this field has become challenging. Consequently, I created a repository aimed at aggregating and maintaining the most current papers in this domain. 
While this repository may not encompass every paper, I did try. If you find any papers we have missed, just drop me an [email](mailto:xswanghuster@gmail.com). We have included the [data](./nicholas.md) from [List of All Adversarial Example Papers](https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html) till 2023-09-01. We also provide a list of papers about transfer-based attacks [here](https://xiaosenwang.com/transfer_based_attack_papers.html). # 2025-10-24 -+ [NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge](https://arxiv.org//abs/2510.21144) ++ [NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge](https://arxiv.org/abs/2510.21144) Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao -+ [When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails](https://arxiv.org//abs/2510.21285) ++ [When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails](https://arxiv.org/abs/2510.21285) Yingzhi Mao (1 and 2), Chunkang Zhang (1 and 2), Junxiang Wang (1), Xinyan Guan (1 and 2), Boxi Cao (1), Yaojie Lu (1), Hongyu Lin (1), Xianpei Han (1 and 2), Le Sun (1 and 2) ((1) Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, (2) University of Chinese Academy of Sciences) -+ [Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference](https://arxiv.org//abs/2510.21184) ++ [Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference](https://arxiv.org/abs/2510.21184) Stephen Zhao, Aidan Li, Rob Brekelmans, Roger Grosse -+ [SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation](https://arxiv.org//abs/2510.21120) ++ [SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation](https://arxiv.org/abs/2510.21120) Alec Helbling, Shruti Palaskar, Kundan Krishna, Polo Chau, Leon Gatys, Joseph Yitan Cheng -+ [DictPFL: Efficient and Private Federated Learning on Encrypted Gradients](https://arxiv.org//abs/2510.21086) ++ [DictPFL: Efficient and Private Federated Learning on Encrypted Gradients](https://arxiv.org/abs/2510.21086) Jiaqi Xue, Mayank Kumar, Yuzhang Shang, Shangqian Gao, Rui Ning, Mengxin Zheng, Xiaoqian Jiang, Qian Lou -+ [How Hard is it to Confuse a World Model?](https://arxiv.org//abs/2510.21232) ++ [How Hard is it to Confuse a World Model?](https://arxiv.org/abs/2510.21232) Waris Radji (Scool, CRIStAL), Odalric-Ambrym Maillard (Scool, CRIStAL) -+ [PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling](https://arxiv.org//abs/2510.21262) ++ [PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling](https://arxiv.org/abs/2510.21262) Andrea Bonfanti, Ismael Medina, Roman List, Björn Staeves, Roberto Santana, Marco Ellero -+ [Probe-based Fine-tuning for Reducing Toxicity](https://arxiv.org//abs/2510.21531) ++ [Probe-based Fine-tuning for Reducing Toxicity](https://arxiv.org/abs/2510.21531) Jan Wehner, Mario Fritz -+ [FrameShield: Adversarially Robust Video Anomaly Detection](https://arxiv.org//abs/2510.21532) ++ [FrameShield: Adversarially Robust Video Anomaly Detection](https://arxiv.org/abs/2510.21532) Mojtaba Nafez, Mobina Poulaei, Nikan Vasei, 
Bardia Soltani Moakhar, Mohammad Sabokrou, MohammadHossein Rohban -+ [Soft Instruction De-escalation Defense](https://arxiv.org//abs/2510.21057) ++ [Soft Instruction De-escalation Defense](https://arxiv.org/abs/2510.21057) Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes, David Stutz, Ilia Shumailov -+ [Doubly-Regressing Approach for Subgroup Fairness](https://arxiv.org//abs/2510.21091) ++ [Doubly-Regressing Approach for Subgroup Fairness](https://arxiv.org/abs/2510.21091) Kyungseon Lee, Kunwoong Kim, Jihu Lee, Dongyoon Yang, Yongdai Kim -+ [QAE-BAC: Achieving Quantifiable Anonymity and Efficiency in Blockchain-Based Access Control with Attribute](https://arxiv.org//abs/2510.21124) ++ [QAE-BAC: Achieving Quantifiable Anonymity and Efficiency in Blockchain-Based Access Control with Attribute](https://arxiv.org/abs/2510.21124) Jie Zhang, Xiaohong Li, Mengke Zhang, Ruitao Feng, Shanshan Xu, Zhe Hou, Guangdong Bai -+ [Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency](https://arxiv.org//abs/2510.21189) ++ [Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency](https://arxiv.org/abs/2510.21189) Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang -+ [The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning](https://arxiv.org//abs/2510.21190) ++ [The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning](https://arxiv.org/abs/2510.21190) Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok Yan Lam -+ [Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses](https://arxiv.org//abs/2510.21214) ++ [Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses](https://arxiv.org/abs/2510.21214) Xingwei Zhong, Kar Wai Fok, Vrizlynn L.L. 
Thing # 2025-10-23 -+ [SAID: Empowering Large Language Models with Self-Activating Internal Defense](https://arxiv.org//abs/2510.20129) ++ [SAID: Empowering Large Language Models with Self-Activating Internal Defense](https://arxiv.org/abs/2510.20129) Yulong Chen, Yadong Liu, Jiawen Zhang, Mu Li, Chao Huang, Jie Wen -+ [Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses](https://arxiv.org//abs/2510.20314) ++ [Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses](https://arxiv.org/abs/2510.20314) Wu Yichao, Wang Yirui, Ding Panpan, Wang Hailong, Zhu Bingqian, Liu Chun -+ [GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?](https://arxiv.org//abs/2510.20333) ++ [GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?](https://arxiv.org/abs/2510.20333) Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang -+ [Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models](https://arxiv.org//abs/2510.20468) ++ [Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models](https://arxiv.org/abs/2510.20468) Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko -+ [Steering Evaluation-Aware Language Models To Act Like They Are Deployed](https://arxiv.org//abs/2510.20487) ++ [Steering Evaluation-Aware Language Models To Act Like They Are Deployed](https://arxiv.org/abs/2510.20487) Tim Tian Hua, Andrew Qin, Samuel Marks, Neel Nanda -+ [AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN](https://arxiv.org//abs/2510.20566) ++ [AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN](https://arxiv.org/abs/2510.20566) Wei Shao, Yuhao Wang, Rongguang He, Muhammad Ejaz Ahmed, Seyit Camtepe -+ [RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines](https://arxiv.org//abs/2510.20768) ++ [RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines](https://arxiv.org/abs/2510.20768) Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu -+ [Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)](https://arxiv.org//abs/2510.20358) ++ [Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)](https://arxiv.org/abs/2510.20358) Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß -+ [BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation](https://arxiv.org//abs/2510.20792) ++ [BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation](https://arxiv.org/abs/2510.20792) Liang Ye, Shengqin Chen, Jiazhu Dai -+ [Causal Debiasing for Visual Commonsense Reasoning](https://arxiv.org//abs/2510.20281) ++ [Causal Debiasing for Visual Commonsense Reasoning](https://arxiv.org/abs/2510.20281) Jiayi Zou, Gengyun Jia, Bing-Kun Bao -+ [Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking](https://arxiv.org//abs/2510.20335) ++ [Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking](https://arxiv.org/abs/2510.20335) Zixuan Wu, Hengyuan Zhang, 
Ting-Hsuan Chen, Yuliang Guo, David Paz, Xinyu Huang, Liu Ren -+ [MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs](https://arxiv.org//abs/2510.20762) ++ [MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs](https://arxiv.org/abs/2510.20762) Jan Sobotka, Luca Baroni, Ján Antolík -+ [H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition](https://arxiv.org//abs/2510.20627) ++ [H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition](https://arxiv.org/abs/2510.20627) Lukas Miklautz, Chengzhi Shi, Andrii Shkabrii, Theodoros Thirimachos Davarakis, Prudence Lam, Claudia Plant, Jennifer Dy, Stratis Ioannidis -+ [Adversary-Aware Private Inference over Wireless Channels](https://arxiv.org//abs/2510.20518) ++ [Adversary-Aware Private Inference over Wireless Channels](https://arxiv.org/abs/2510.20518) Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith, H. Vincent Poor -+ [Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations](https://arxiv.org//abs/2510.20223) ++ [Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations](https://arxiv.org/abs/2510.20223) Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi -+ [HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge](https://arxiv.org//abs/2510.20243) ++ [HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge](https://arxiv.org/abs/2510.20243) Yu Hin Chan, Hao Yang, Shiyu Shen, Xingyu Fan, Shengzhe Lyu, Patrick S. Y. Hung, Ray C. C. Cheung -+ [NeuPerm: Disrupting Malware Hidden in Neural Network Parameters by Leveraging Permutation Symmetry](https://arxiv.org//abs/2510.20367) ++ [NeuPerm: Disrupting Malware Hidden in Neural Network Parameters by Leveraging Permutation Symmetry](https://arxiv.org/abs/2510.20367) Daniel Gilkarov, Ran Dubin -+ [An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing](https://arxiv.org//abs/2510.20932) ++ [An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing](https://arxiv.org/abs/2510.20932) Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Mahmoud Nabil Mahmoud, Parham Kebria, Abdollah Homaifar, Mehrdad Saif -+ [Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference](https://arxiv.org//abs/2510.21017) ++ [Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference](https://arxiv.org/abs/2510.21017) Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw A. Grabowicz -+ [Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization](https://arxiv.org//abs/2510.20883) ++ [Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization](https://arxiv.org/abs/2510.20883) Antônio H. Ribeiro, David Vävinggren, Dave Zachariah, Thomas B. 
Schön, Francis Bach -+ [Can Current Detectors Catch Face-to-Voice Deepfake Attacks?](https://arxiv.org//abs/2510.21004) ++ [Can Current Detectors Catch Face-to-Voice Deepfake Attacks?](https://arxiv.org/abs/2510.21004) Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore, Tingming Wu -+ [A new measure for dynamic leakage based on quantitative information flow](https://arxiv.org//abs/2510.20922) ++ [A new measure for dynamic leakage based on quantitative information flow](https://arxiv.org/abs/2510.20922) Luigi D. C. Soares, Mário S. Alvim, Natasha Fernandes -+ [A Reinforcement Learning Framework for Robust and Secure LLM Watermarking](https://arxiv.org//abs/2510.21053) ++ [A Reinforcement Learning Framework for Robust and Secure LLM Watermarking](https://arxiv.org/abs/2510.21053) Li An, Yujian Liu, Yepeng Liu, Yuheng Bu, Yang Zhang, Shiyu Chang # 2025-10-22 -+ [LAPRAD: LLM-Assisted PRotocol Attack Discovery](https://arxiv.org//abs/2510.19264) ++ [LAPRAD: LLM-Assisted PRotocol Attack Discovery](https://arxiv.org/abs/2510.19264) R.Can Aygun (UCLA), Yehuda Afek (Tel-Aviv University), Anat Bremler-Barr (Tel-Aviv University), Leonard Kleinrock (UCLA) -+ [Collaborative penetration testing suite for emerging generative AI algorithms](https://arxiv.org//abs/2510.19303) ++ [Collaborative penetration testing suite for emerging generative AI algorithms](https://arxiv.org/abs/2510.19303) Petar Radanliev -+ [A New Type of Adversarial Examples](https://arxiv.org//abs/2510.19347) ++ [A New Type of Adversarial Examples](https://arxiv.org/abs/2510.19347) Xingyang Nie, Guojie Xiao, Su Pan, Biao Wang, Huilin Ge, Tao Fang -+ [Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation](https://arxiv.org//abs/2510.19420) ++ [Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation](https://arxiv.org/abs/2510.19420) Chengcan Wu, Zhixin Zhang, Mingqian Xu, Zeming Wei, Meng Sun -+ [Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent](https://arxiv.org//abs/2510.19641) ++ [Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent](https://arxiv.org/abs/2510.19641) Yangshijie Zhang, Xinda Wang, Jialin Liu, Wenqiang Wang, Zhicong Ma, Xingxing Jia -+ [Machine Text Detectors are Membership Inference Attacks](https://arxiv.org//abs/2510.19492) ++ [Machine Text Detectors are Membership Inference Attacks](https://arxiv.org/abs/2510.19492) Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki -+ [Hubble: a Model Suite to Advance the Study of LLM Memorization](https://arxiv.org//abs/2510.19811) ++ [Hubble: a Model Suite to Advance the Study of LLM Memorization](https://arxiv.org/abs/2510.19811) Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. 
Gummadi, Willie Neiswanger, Robin Jia -+ [OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform](https://arxiv.org//abs/2510.19169) ++ [OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform](https://arxiv.org/abs/2510.19169) Thomas Wang, Haowen Li -+ [LLM Unlearning with LLM Beliefs](https://arxiv.org//abs/2510.19422) ++ [LLM Unlearning with LLM Beliefs](https://arxiv.org/abs/2510.19422) Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou -+ [Blackbox Model Provenance via Palimpsestic Membership Inference](https://arxiv.org//abs/2510.19796) ++ [Blackbox Model Provenance via Palimpsestic Membership Inference](https://arxiv.org/abs/2510.19796) Rohith Kuditipudi, Jing Huang, Sally Zhu, Diyi Yang, Christopher Potts, Percy Liang -+ [AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields](https://arxiv.org//abs/2510.19371) ++ [AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields](https://arxiv.org/abs/2510.19371) Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-eui Yoon -+ [Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection](https://arxiv.org//abs/2510.19574) ++ [Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection](https://arxiv.org/abs/2510.19574) Ariana Yi, Ce Zhou, Liyang Xiao, Qiben Yan -+ [Subliminal Corruption: Mechanisms, Thresholds, and Interpretability](https://arxiv.org//abs/2510.19152) ++ [Subliminal Corruption: Mechanisms, Thresholds, and Interpretability](https://arxiv.org/abs/2510.19152) Reya Vir, Sarvesh Bhatnagar -+ [ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation](https://arxiv.org//abs/2510.19352) ++ [ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation](https://arxiv.org/abs/2510.19352) Omer Tariq, Muhammad Bilal, Muneeb Ul Hassan, Dongsoo Han, Jon Crowcroft -+ [Revisiting the Relation Between Robustness and Universality](https://arxiv.org//abs/2510.19427) ++ [Revisiting the Relation Between Robustness and Universality](https://arxiv.org/abs/2510.19427) M. Klabunde, L. Caspari, F. 
Lemmerich -+ [The Tail Tells All: Estimating Model-Level Membership Inference Vulnerability Without Reference Models](https://arxiv.org//abs/2510.19773) ++ [The Tail Tells All: Estimating Model-Level Membership Inference Vulnerability Without Reference Models](https://arxiv.org/abs/2510.19773) Euodia Dodd, Nataša Krčo, Igor Shilov, Yves-Alexandre de Montjoye -+ [HAMLOCK: HArdware-Model LOgically Combined attacK](https://arxiv.org//abs/2510.19145) ++ [HAMLOCK: HArdware-Model LOgically Combined attacK](https://arxiv.org/abs/2510.19145) Sanskar Amgain, Daniel Lobo, Atri Chatterjee, Swarup Bhunia, Fnu Suya -+ [Exploring the Effect of DNN Depth on Adversarial Attacks in Network Intrusion Detection Systems](https://arxiv.org//abs/2510.19761) ++ [Exploring the Effect of DNN Depth on Adversarial Attacks in Network Intrusion Detection Systems](https://arxiv.org/abs/2510.19761) Mohamed ElShehaby, Ashraf Matrawy -+ [Defending Against Prompt Injection with DataFilter](https://arxiv.org//abs/2510.19207) ++ [Defending Against Prompt Injection with DataFilter](https://arxiv.org/abs/2510.19207) Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner -+ [AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices](https://arxiv.org//abs/2510.19462) ++ [AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices](https://arxiv.org/abs/2510.19462) Zhonghao Zhan, Amir Al Sadi, Krinos Li, Hamed Haddadi -+ [Privacy-Preserving Spiking Neural Networks: A Deep Dive into Encryption Parameter Optimisation](https://arxiv.org//abs/2510.19537) ++ [Privacy-Preserving Spiking Neural Networks: A Deep Dive into Encryption Parameter Optimisation](https://arxiv.org/abs/2510.19537) Mahitha Pulivathi -+ [CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage](https://arxiv.org//abs/2510.19676) ++ [CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage](https://arxiv.org/abs/2510.19676) Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar -+ [LLMs can hide text in other text of the same length.ipynb](https://arxiv.org//abs/2510.20075) ++ [LLMs can hide text in other text of the same length.ipynb](https://arxiv.org/abs/2510.20075) Antonio Norelli, Michael Bronstein -+ [Ask What Your Country Can Do For You: Towards a Public Red Teaming Model](https://arxiv.org//abs/2510.20061) ++ [Ask What Your Country Can Do For You: Towards a Public Red Teaming Model](https://arxiv.org/abs/2510.20061) Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury -+ [Mitigating Privacy-Utility Trade-off in Decentralized Federated Learning via $f$-Differential Privacy](https://arxiv.org//abs/2510.19934) ++ [Mitigating Privacy-Utility Trade-off in Decentralized Federated Learning via $f$-Differential Privacy](https://arxiv.org/abs/2510.19934) Xiang Li, Buxin Su, Chendi Wang, Qi Long, Weijie J. 
Su -+ [Towards Strong Certified Defense with Universal Asymmetric Randomization](https://arxiv.org//abs/2510.19977) ++ [Towards Strong Certified Defense with Universal Asymmetric Randomization](https://arxiv.org/abs/2510.19977) Hanbin Hong, Ashish Kundu, Ali Payani, Binghui Wang, Yuan Hong -+ [SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment](https://arxiv.org//abs/2510.19979) ++ [SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment](https://arxiv.org/abs/2510.19979) Tushar Nayan (1), Ziqi Zhang (2), Ruimin Sun (1) ((1) Florida International University, (2) University of Illinois Urbana-Champaign) -+ [FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models](https://arxiv.org//abs/2510.20856) ++ [FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models](https://arxiv.org/abs/2510.20856) Jia Deng, Jin Li, Zhenhua Zhao, Shaowei Wang # 2025-10-21 -+ [Rectifying Shortcut Behaviors in Preference-based Reward Learning](https://arxiv.org//abs/2510.19050) ++ [Rectifying Shortcut Behaviors in Preference-based Reward Learning](https://arxiv.org/abs/2510.19050) Wenqian Ye, Guangtao Zheng, Aidong Zhang -+ [DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code](https://arxiv.org//abs/2510.18904) ++ [DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code](https://arxiv.org/abs/2510.18904) Shriyansh Agrawal, Aidan Lau, Sanyam Shah, Ahan M R, Kevin Zhu, Sunishchal Dev, Vasu Sharma -+ [FeatureFool: Zero-Query Fooling of Video Models via Feature Map](https://arxiv.org//abs/2510.18362) ++ [FeatureFool: Zero-Query Fooling of Video Models via Feature Map](https://arxiv.org/abs/2510.18362) Duoxun Tang, Xi Xiao, Guangwu Hu, Kangkang Sun, Xiao Yang, Dongyang Chen, Qing Li, Yongjie Yin, Jiyao Wang -+ [Towards Universal Solvers: Using PGD Attack in Active Learning to Increase Generalizability of Neural Operators as Knowledge Distillation from Numerical PDE Solvers](https://arxiv.org//abs/2510.18989) ++ [Towards Universal Solvers: Using PGD Attack in Active Learning to Increase Generalizability of Neural Operators as Knowledge Distillation from Numerical PDE Solvers](https://arxiv.org/abs/2510.18989) Yifei Sun -+ [POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2510.19056) ++ [POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2510.19056) Kuai Yu, Xiaoyu Wu, Peishen Yan, Qingqian Yang, Linshan Jiang, Hao Wang, Yang Hua, Tao Song, Haibing Guan -+ [The Black Tuesday Attack: how to crash the stock market with adversarial examples to financial forecasting models](https://arxiv.org//abs/2510.18990) ++ [The Black Tuesday Attack: how to crash the stock market with adversarial examples to financial forecasting models](https://arxiv.org/abs/2510.18990) Thomas Hofweber, Jefrey Bergl, Ian Reyes, Amir Sadovnik -+ [Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability](https://arxiv.org//abs/2510.19851) ++ [Can Reasoning Models Obfuscate Reasoning? 
Stress-Testing Chain-of-Thought Monitorability](https://arxiv.org/abs/2510.19851) Artur Zolkowski, Wen Xing, David Lindner, Florian Tramèr, Erik Jenner -+ [Extracting alignment data in open models](https://arxiv.org//abs/2510.18554) ++ [Extracting alignment data in open models](https://arxiv.org/abs/2510.18554) Federico Barbero, Xiangming Gu, Christopher A. Choquette-Choo, Chawin Sitawarin, Matthew Jagielski, Itay Yona, Petar Veličković, Ilia Shumailov, Jamie Hayes # 2025-10-20 -+ [PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits](https://arxiv.org//abs/2510.17947) ++ [PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits](https://arxiv.org/abs/2510.17947) Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar -+ [CourtGuard: A Local, Multiagent Prompt Injection Classifier](https://arxiv.org//abs/2510.19844) ++ [CourtGuard: A Local, Multiagent Prompt Injection Classifier](https://arxiv.org/abs/2510.19844) Isaac Wu, Michael Maslowski -+ [GUIDE: Enhancing Gradient Inversion Attacks in Federated Learning with Denoising Models](https://arxiv.org//abs/2510.17621) ++ [GUIDE: Enhancing Gradient Inversion Attacks in Federated Learning with Denoising Models](https://arxiv.org/abs/2510.17621) Vincenzo Carletti, Pasquale Foggia, Carlo Mazzocca, Giuseppe Parrella, Mario Vento # 2025-10-17 -+ [DRO-InstructZero: Distributionally Robust Prompt Optimization for Large Language Models](https://arxiv.org//abs/2510.15260) ++ [DRO-InstructZero: Distributionally Robust Prompt Optimization for Large Language Models](https://arxiv.org/abs/2510.15260) Yangyang Li -+ [DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing](https://arxiv.org//abs/2510.15303) ++ [DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing](https://arxiv.org/abs/2510.15303) Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li -+ [Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models](https://arxiv.org//abs/2510.15430) ++ [Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models](https://arxiv.org/abs/2510.15430) Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang -+ [SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models](https://arxiv.org//abs/2510.15476) ++ [SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models](https://arxiv.org/abs/2510.15476) Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong -+ [DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios](https://arxiv.org//abs/2510.15501) ++ [DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios](https://arxiv.org/abs/2510.15501) Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei -+ [Language Models are Injective and Hence Invertible](https://arxiv.org//abs/2510.15511) ++ [Language Models are Injective and Hence Invertible](https://arxiv.org/abs/2510.15511) Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodola' -+ [Unmasking Facial DeepFakes: A Robust Multiview Detection Framework for Natural Images](https://arxiv.org//abs/2510.15576) ++ [Unmasking Facial DeepFakes: A Robust Multiview Detection Framework for Natural Images](https://arxiv.org/abs/2510.15576) Sami 
Belguesmia, Mohand Saïd Allili, Assia Hamadene -+ [Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent](https://arxiv.org//abs/2510.15222) ++ [Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent](https://arxiv.org/abs/2510.15222) Gabriel Nixon Raj -+ [Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks](https://arxiv.org//abs/2510.15333) ++ [Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks](https://arxiv.org/abs/2510.15333) Yuyuan Feng, Bin Ma, Enyan Dai -+ [Adversary-Free Counterfactual Prediction via Information-Regularized Representations](https://arxiv.org//abs/2510.15479) ++ [Adversary-Free Counterfactual Prediction via Information-Regularized Representations](https://arxiv.org/abs/2510.15479) Shiqin Tang, Rong Feng, Shuxin Zhuang, Hongzong Li, Youzhi Zhang -+ [Constrained Adversarial Perturbation](https://arxiv.org//abs/2510.15699) ++ [Constrained Adversarial Perturbation](https://arxiv.org/abs/2510.15699) Virendra Nishad (IIT Kanpur, India), Bhaskar Mukhoty (IIT Delhi, India), Hilal AlQuabeh (MBZUAI, UAE), Sandeep K. Shukla (IIIT Hyderabad, India), Sayak Ray Chowdhury (IIT Kanpur, India) -+ [Blackwell's Approachability for Sequential Conformal Inference](https://arxiv.org//abs/2510.15824) ++ [Blackwell's Approachability for Sequential Conformal Inference](https://arxiv.org/abs/2510.15824) Guillaume Principato, Gilles Stoltz -+ [HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment](https://arxiv.org//abs/2510.15499) ++ [HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment](https://arxiv.org/abs/2510.15499) Yuexiao Liu, Lijun Li, Xingjun Wang, Jing Shao -+ [Towards Proactive Defense Against Cyber Cognitive Attacks](https://arxiv.org//abs/2510.15801) ++ [Towards Proactive Defense Against Cyber Cognitive Attacks](https://arxiv.org/abs/2510.15801) Bonnie Rushing, Mac-Rufus Umeokolo, Shouhuai Xu -+ [Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness](https://arxiv.org//abs/2510.16171) ++ [Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness](https://arxiv.org/abs/2510.16171) Longwei Wang, Ifrat Ikhtear Uddin, KC Santosh, Chaowei Zhang, Xiao Qin, Yang Zhou # 2025-10-16 -+ [Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks](https://arxiv.org//abs/2510.14207) ++ [Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks](https://arxiv.org/abs/2510.14207) Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu -+ [A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space](https://arxiv.org//abs/2510.14301) ++ [A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space](https://arxiv.org/abs/2510.14301) Bingjie Zhang, Yibo Yang, Renzhe, Dandan Guo, Jindong Gu, Philip Torr, Bernard Ghanem -+ [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org//abs/2510.14312) ++ [Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies](https://arxiv.org/abs/2510.14312) Mason Nakamura, Abhinav Kumar, Saaduddin Mahmud, Sahar Abdelnabi, Shlomo Zilberstein, Eugene Bagdasarian -+ [Policy Regularized Distributionally Robust Markov Decision 
Processes with Linear Function Approximation](https://arxiv.org//abs/2510.14246) ++ [Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation](https://arxiv.org/abs/2510.14246) Jingwen Gu, Yiting He, Zhishuai Liu, Pan Xu -+ [TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening](https://arxiv.org//abs/2510.14299) ++ [TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening](https://arxiv.org/abs/2510.14299) Nam Le, Leo Yu Zhang, Kewen Liao, Shirui Pan, Wei Luo -+ [BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection](https://arxiv.org//abs/2510.14344) ++ [BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection](https://arxiv.org/abs/2510.14344) Zichen Liu, Shao Yang, Xusheng Xiao -+ [Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers](https://arxiv.org//abs/2510.14381) ++ [Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers](https://arxiv.org/abs/2510.14381) Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang, Jack W. Stokes -+ [Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models](https://arxiv.org//abs/2510.14470) ++ [Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models](https://arxiv.org/abs/2510.14470) Xiaoyu Xue, Yuni Lai, Chenxi Huang, Yulin Zhu, Gaolei Li, Xiaoge Zhang, Kai Zhou -+ [Galaxy Morphology Classification with Counterfactual Explanation](https://arxiv.org//abs/2510.14655) ++ [Galaxy Morphology Classification with Counterfactual Explanation](https://arxiv.org/abs/2510.14655) Zhuo Cao, Lena Krieger, Hanno Scharr, Ira Assent -+ [On the Ability of LLMs to Handle Character-Level Perturbations: How Well and How?](https://arxiv.org//abs/2510.14365) ++ [On the Ability of LLMs to Handle Character-Level Perturbations: How Well and How?](https://arxiv.org/abs/2510.14365) Anyun Zhuo, Xuefei Ning, Ningyuan Li, Yu Wang, Pinyan Lu -+ [A Multi-domain Image Translative Diffusion StyleGAN for Iris Presentation Attack Detection](https://arxiv.org//abs/2510.14314) ++ [A Multi-domain Image Translative Diffusion StyleGAN for Iris Presentation Attack Detection](https://arxiv.org/abs/2510.14314) Shivangi Yadav, Arun Ross -+ [Structured Universal Adversarial Attacks on Object Detection for Video Sequences](https://arxiv.org//abs/2510.14460) ++ [Structured Universal Adversarial Attacks on Object Detection for Video Sequences](https://arxiv.org/abs/2510.14460) Sven Jacob, Weijia Shao, Gjergji Kasneci -+ [Acquisition of interpretable domain information during brain MR image harmonization for content-based image retrieval](https://arxiv.org//abs/2510.14535) ++ [Acquisition of interpretable domain information during brain MR image harmonization for content-based image retrieval](https://arxiv.org/abs/2510.14535) Keima Abe, Hayato Muraki, Shuhei Tomoshige, Kenichi Oishi, Hitoshi Iyatomi -+ [SteeringTTA: Guiding Diffusion Trajectories for Robust Test-Time-Adaptation](https://arxiv.org//abs/2510.14634) ++ [SteeringTTA: Guiding Diffusion Trajectories for Robust Test-Time-Adaptation](https://arxiv.org/abs/2510.14634) Jihyun Yu, Yoojin Oh, Wonho Bae, Mingyu Kim, Junhyug Noh -+ [Backdoor Unlearning by Linear Task Decomposition](https://arxiv.org//abs/2510.14845) ++ [Backdoor Unlearning by Linear Task 
Decomposition](https://arxiv.org/abs/2510.14845) Amel Abdelraheem, Alessandro Favero, Gerome Bovet, Pascal Frossard -+ [When Flatness Does (Not) Guarantee Adversarial Robustness](https://arxiv.org//abs/2510.14231) ++ [When Flatness Does (Not) Guarantee Adversarial Robustness](https://arxiv.org/abs/2510.14231) Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp -+ [Towards geological inference with process-based and deep generative modeling, part 1: training on fluvial deposits](https://arxiv.org//abs/2510.14445) ++ [Towards geological inference with process-based and deep generative modeling, part 1: training on fluvial deposits](https://arxiv.org/abs/2510.14445) Guillaume Rongier, Luk Peeters -+ [Redundancy-Aware Test-Time Graph Out-of-Distribution Detection](https://arxiv.org//abs/2510.14562) ++ [Redundancy-Aware Test-Time Graph Out-of-Distribution Detection](https://arxiv.org/abs/2510.14562) Yue Hou, He Zhu, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu -+ [An Information Asymmetry Game for Trigger-based DNN Model Watermarking](https://arxiv.org//abs/2510.14218) ++ [An Information Asymmetry Game for Trigger-based DNN Model Watermarking](https://arxiv.org/abs/2510.14218) Chaoyue Huang, Gejian Zhao, Hanzhou Wu, Zhihua Xia, Asad Malik -+ [RHINO: Guided Reasoning for Mapping Network Logs to Adversarial Tactics and Techniques with Large Language Models](https://arxiv.org//abs/2510.14233) ++ [RHINO: Guided Reasoning for Mapping Network Logs to Adversarial Tactics and Techniques with Large Language Models](https://arxiv.org/abs/2510.14233) Fanchao Meng, Jiaping Gui, Yunbo Li, Yue Wu -+ [Certifying optimal MEV strategies with Lean](https://arxiv.org//abs/2510.14480) ++ [Certifying optimal MEV strategies with Lean](https://arxiv.org/abs/2510.14480) Massimo Bartoletti, Riccardo Marchesin, Roberto Zunino -+ [Lexo: Eliminating Stealthy Supply-Chain Attacks via LLM-Assisted Program Regeneration](https://arxiv.org//abs/2510.14522) ++ [Lexo: Eliminating Stealthy Supply-Chain Attacks via LLM-Assisted Program Regeneration](https://arxiv.org/abs/2510.14522) Evangelos Lamprou, Julian Dai, Grigoris Ntousakis, Martin C. 
Rinard, Nikos Vasilakis -+ [A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems](https://arxiv.org//abs/2510.14906) ++ [A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems](https://arxiv.org/abs/2510.14906) Zixuan Liu, Yi Zhao, Zhuotao Liu, Qi Li, Chuanpu Fu, Guangmeng Zhou, Ke Xu -+ [DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models](https://arxiv.org//abs/2510.15015) ++ [DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models](https://arxiv.org/abs/2510.15015) Mor Ventura, Michael Toker, Or Patashnik, Yonatan Belinkov, Roi Reichart -+ [Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks](https://arxiv.org//abs/2510.15017) ++ [Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks](https://arxiv.org/abs/2510.15017) ChenYu Wu, Yi Wang, Yang Liao -+ [Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling](https://arxiv.org//abs/2510.15068) ++ [Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling](https://arxiv.org/abs/2510.15068) Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, Xiangzheng Zhang -+ [Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks](https://arxiv.org//abs/2510.15109) ++ [Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks](https://arxiv.org/abs/2510.15109) Utku Demir, Tugba Erpek, Yalin E. Sagduyu, Sastry Kompella, Mengran Xue -+ [MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation](https://arxiv.org//abs/2510.15186) ++ [MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation](https://arxiv.org/abs/2510.15186) Gurusha Juneja, Jayanth Naga Sai Pasupulati, Alon Albalak, Wenyue Hua, William Yang Wang -+ [PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models](https://arxiv.org//abs/2510.15106) ++ [PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models](https://arxiv.org/abs/2510.15106) Issam Seddik, Sami Souihi, Mohamed Tamaazousti, Sara Tucci Piergiovanni -+ [SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling](https://arxiv.org//abs/2510.15083) ++ [SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling](https://arxiv.org/abs/2510.15083) Georgi Ganev, Reza Nazari, Rees Davison, Amir Dizche, Xinmin Wu, Ralph Abbey, Jorge Silva, Emiliano De Cristofaro # 2025-10-15 -+ [SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning](https://arxiv.org//abs/2510.13262) ++ [SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning](https://arxiv.org/abs/2510.13262) Weiqi Guo, Guanjun Liu, Ziyuan Zhou -+ [TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models](https://arxiv.org//abs/2510.13106) ++ [TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models](https://arxiv.org/abs/2510.13106) Ruoyu Sun, Da Song, Jiayang Song, Yuheng Huang, Lei Ma -+ [Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning](https://arxiv.org//abs/2510.13322) ++ [Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning](https://arxiv.org/abs/2510.13322) Baogang 
Song, Dongdong Zhao, Jianwen Xiang, Qiben Xu, Zizhuo Yu -+ [Personal Attribute Leakage in Federated Speech Models](https://arxiv.org//abs/2510.13357) ++ [Personal Attribute Leakage in Federated Speech Models](https://arxiv.org/abs/2510.13357) Hamdan Al-Ali, Ali Reza Ghavamipour, Tommaso Caselli, Fatih Turkmen, Zeerak Talat, Hanan Aldarmaki -+ [Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control](https://arxiv.org//abs/2510.13358) ++ [Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control](https://arxiv.org/abs/2510.13358) Shingo Ayabe, Hiroshi Kera, Kazuhiko Kawamoto -+ [Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training](https://arxiv.org//abs/2510.13361) ++ [Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training](https://arxiv.org/abs/2510.13361) Yisen Wang, Yichuan Mo, Hongjun Wang, Junyi Li, Zhouchen Lin -+ [In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers](https://arxiv.org//abs/2510.13543) ++ [In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers](https://arxiv.org/abs/2510.13543) Avihay Cohen -+ [Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach](https://arxiv.org//abs/2510.13792) ++ [Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach](https://arxiv.org/abs/2510.13792) Ziqing Lu, Lifeng Lai, Weiyu Xu -+ [SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs](https://arxiv.org//abs/2510.13190) ++ [SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs](https://arxiv.org/abs/2510.13190) Juan Ren, Mark Dras, Usman Naseem -+ [Taming the Fragility of KV Cache Eviction in LLM Inference](https://arxiv.org//abs/2510.13334) ++ [Taming the Fragility of KV Cache Eviction in LLM Inference](https://arxiv.org/abs/2510.13334) Yuan Feng, Haoyu Guo, JunLin Lv, S. 
Kevin Zhou, Xike Xie -+ [GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians](https://arxiv.org//abs/2510.13734) ++ [GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians](https://arxiv.org/abs/2510.13734) Xiuyuan Chen, Tao Sun, Dexin Su, Ailing Yu, Junwei Liu, Zhe Chen, Gangzeng Jin, Xin Wang, Jingnan Liu, Hansong Xiao, Hualei Zhou, Dongjie Tao, Chunxiao Guo, Minghui Yang, Yuan Xia, Jing Zhao, Qianrui Fan, Yanyun Wang, Shuai Zhen, Kezhong Chen, Jun Wang, Zewen Sun, Heng Zhao, Tian Guan, Shaodong Wang, Geyun Chang, Jiaming Deng, Hongchengcheng Chen, Kexin Feng, Ruzhen Li, Jiayi Geng, Changtai Zhao, Jun Wang, Guihu Lin, Peihao Li, Liqi Liu, Peng Wei, Jian Wang, Jinjie Gu, Ping Wang, Fan Yang -+ [LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models](https://arxiv.org//abs/2510.13626) ++ [LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models](https://arxiv.org/abs/2510.13626) Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong, Xipeng Qiu -+ [Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models](https://arxiv.org//abs/2510.13237) ++ [Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models](https://arxiv.org/abs/2510.13237) Haochuan Xu, Yun Sing Koh, Shuhuai Huang, Zirun Zhou, Di Wang, Jun Sakuma, Jingfeng Zhang -+ [Towards Adversarial Robustness and Uncertainty Quantification in DINOv2-based Few-Shot Anomaly Detection](https://arxiv.org//abs/2510.13643) ++ [Towards Adversarial Robustness and Uncertainty Quantification in DINOv2-based Few-Shot Anomaly Detection](https://arxiv.org/abs/2510.13643) Akib Mohammed Khan, Bartosz Krawczyk -+ [Risk-adaptive Activation Steering for Safe Multimodal Large Language Models](https://arxiv.org//abs/2510.13698) ++ [Risk-adaptive Activation Steering for Safe Multimodal Large Language Models](https://arxiv.org/abs/2510.13698) Jonghyun Park, Minhyuk Seo, Jonghyun Choi -+ [Selective Adversarial Attacks on LLM Benchmarks](https://arxiv.org//abs/2510.13570) ++ [Selective Adversarial Attacks on LLM Benchmarks](https://arxiv.org/abs/2510.13570) Ivan Dubrovsky, Anastasia Orlova, Illarion Iov, Nina Gubina, Irena Gureeva, Alexey Zaytsev -+ [Robust Minimax Boosting with Performance Guarantees](https://arxiv.org//abs/2510.13445) ++ [Robust Minimax Boosting with Performance Guarantees](https://arxiv.org/abs/2510.13445) Santiago Mazuelas, Veronica Alvarez -+ [From base cases to backdoors: An Empirical Study of Unnatural Crypto-API Misuse](https://arxiv.org//abs/2510.13102) ++ [From base cases to backdoors: An Empirical Study of Unnatural Crypto-API Misuse](https://arxiv.org/abs/2510.13102) Victor Olaiya, Adwait Nadkarni -+ [Privacy-Aware Framework of Robust Malware Detection in Indoor Robots: Hybrid Quantum Computing and Deep Neural Networks](https://arxiv.org//abs/2510.13136) ++ [Privacy-Aware Framework of Robust Malware Detection in Indoor Robots: Hybrid Quantum Computing and Deep Neural Networks](https://arxiv.org/abs/2510.13136) Tan Le, Van Le, Sachin Shetty -+ [Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts](https://arxiv.org//abs/2510.13451) ++ [Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts](https://arxiv.org/abs/2510.13451) Li Bai, Qingqing Ye, Xinwei Zhang, Sen Zhang, Zi Liang, Jianliang Xu, Haibo Hu -+ [Who Speaks for the Trigger? 
Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers](https://arxiv.org//abs/2510.13462) ++ [Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers](https://arxiv.org/abs/2510.13462) Xin Zhao, Xiaojun Chen, Bingshan Liu, Haoyu Gao, Zhendong Zhao, Yilong Chen -+ [Cyber-Resilient System Identification for Power Grid through Bayesian Integration](https://arxiv.org//abs/2510.14043) ++ [Cyber-Resilient System Identification for Power Grid through Bayesian Integration](https://arxiv.org/abs/2510.14043) Shimiao Li, Guannan Qu, Bryan Hooi, Vyas Sekar, Soummya Kar, Larry Pileggi -+ [Every Language Model Has a Forgery-Resistant Signature](https://arxiv.org//abs/2510.14086) ++ [Every Language Model Has a Forgery-Resistant Signature](https://arxiv.org/abs/2510.14086) Matthew Finlayson, Xiang Ren, Swabha Swayamdipta -+ [Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions](https://arxiv.org//abs/2510.13931) ++ [Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions](https://arxiv.org/abs/2510.13931) Siying Liu, Shisheng Zhang, Indu Bala -+ [NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations](https://arxiv.org//abs/2510.14025) ++ [NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations](https://arxiv.org/abs/2510.14025) Junjie Nan, Jianing Li, Wei Chen, Mingkun Zhang, Xueqi Cheng -+ [Signature in Code Backdoor Detection, how far are we?](https://arxiv.org//abs/2510.13992) ++ [Signature in Code Backdoor Detection, how far are we?](https://arxiv.org/abs/2510.13992) Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu -+ [PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features](https://arxiv.org//abs/2510.14005) ++ [PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features](https://arxiv.org/abs/2510.14005) Wei Zou, Yupei Liu, Yanting Wang, Ying Chen, Neil Gong, Jinyuan Jia # 2025-10-14 -+ [Towards Robust Artificial Intelligence: Self-Supervised Learning Approach for Out-of-Distribution Detection](https://arxiv.org//abs/2510.12713) ++ [Towards Robust Artificial Intelligence: Self-Supervised Learning Approach for Out-of-Distribution Detection](https://arxiv.org/abs/2510.12713) Wissam Salhab, Darine Ameyed, Hamid Mcheick, Fehmi Jaafar -+ [SafeMT: Multi-turn Safety for Multimodal Language Models](https://arxiv.org//abs/2510.12133) ++ [SafeMT: Multi-turn Safety for Multimodal Language Models](https://arxiv.org/abs/2510.12133) Han Zhu, Juntao Dai, Jiaming Ji, Haoran Li, Chengkun Cai, Pengcheng Wen, Chi-Min Chan, Boyuan Chen, Yaodong Yang, Sirui Han, Yike Guo -+ [PromptLocate: Localizing Prompt Injection Attacks](https://arxiv.org//abs/2510.12252) ++ [PromptLocate: Localizing Prompt Injection Attacks](https://arxiv.org/abs/2510.12252) Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong -+ [Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs](https://arxiv.org//abs/2510.12255) ++ [Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs](https://arxiv.org/abs/2510.12255) Blazej Manczak, Eric Lin, Francisco Eiras, James O' Neill, Vaikkunth Mugunthan -+ [LLM-REVal: Can We Trust LLM Reviewers Yet?](https://arxiv.org//abs/2510.12367) ++ [LLM-REVal: Can We Trust LLM Reviewers Yet?](https://arxiv.org/abs/2510.12367) Rui Li, Jia-Chen Gu, Po-Nien Kung, Heming Xia, Junfeng liu, Xiangwen Kong, Zhifang Sui, Nanyun Peng -+ [When 
Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection](https://arxiv.org//abs/2510.12476) ++ [When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection](https://arxiv.org/abs/2510.12476) Lang Gao, Xuhui Li, Chenxi Wang, Mingzhe Li, Wei Liu, Zirui Song, Jinghui Zhang, Rui Yan, Preslav Nakov, Xiuying Chen -+ [StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis](https://arxiv.org//abs/2510.12608) ++ [StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis](https://arxiv.org/abs/2510.12608) Siyuan Li, Aodu Wulianghai, Xi Lin, Guangyan Li, Xiang Chen, Jun Wu, Jianhua Li -+ [Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector](https://arxiv.org//abs/2510.12287) ++ [Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector](https://arxiv.org/abs/2510.12287) Sifan Li, Hongkai Chen, Yujun Cai, Qingwen Ye, Liyang Chen, Junsong Yuan, Yiwei Wang -+ [Content Anonymization for Privacy in Long-form Audio](https://arxiv.org//abs/2510.12780) ++ [Content Anonymization for Privacy in Long-form Audio](https://arxiv.org/abs/2510.12780) Cristina Aggazzotti, Ashi Garg, Zexin Cai, Nicholas Andrews -+ [ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation](https://arxiv.org//abs/2510.12119) ++ [ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation](https://arxiv.org/abs/2510.12119) Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung, Simon See, Renjie Wan -+ [MS-GAGA: Metric-Selective Guided Adversarial Generation Attack](https://arxiv.org//abs/2510.12468) ++ [MS-GAGA: Metric-Selective Guided Adversarial Generation Attack](https://arxiv.org/abs/2510.12468) Dion J. X. Ho, Gabriel Lee Jun Rong, Niharika Shrivastava, Harshavardhan Abichandani, Pai Chet Ng, Xiaoxiao Miao -+ [Fairness-Constrained Optimization Attack in Federated Learning](https://arxiv.org//abs/2510.12143) ++ [Fairness-Constrained Optimization Attack in Federated Learning](https://arxiv.org/abs/2510.12143) Harsh Kasyap, Minghong Fang, Zhuqing Liu, Carsten Maple, Somanath Tripathy -+ [Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs](https://arxiv.org//abs/2510.12233) ++ [Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs](https://arxiv.org/abs/2510.12233) Bowen Fan, Zhilin Guo, Xunkai Li, Yihan Zhou, Bing Zhou, Zhenjun Li, Rong-Hua Li, Guoren Wang -+ [Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers](https://arxiv.org//abs/2510.12672) ++ [Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers](https://arxiv.org/abs/2510.12672) Ruben Belo, Claudia Soares, Marta Guimaraes -+ [KoALA: KL-L0 Adversarial Detector via Label Agreement](https://arxiv.org//abs/2510.12752) ++ [KoALA: KL-L0 Adversarial Detector via Label Agreement](https://arxiv.org/abs/2510.12752) Siqi Li, Yasser Shoukry -+ [Sample-Efficient Omniprediction for Proper Losses](https://arxiv.org//abs/2510.12769) ++ [Sample-Efficient Omniprediction for Proper Losses](https://arxiv.org/abs/2510.12769) Isaac Gibbs, Ryan J. 
-+ [DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection](https://arxiv.org//abs/2510.12310)

++ [DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection](https://arxiv.org/abs/2510.12310)

Daniel Pulido-Cortázar, Daniel Gibert, Felip Manyà

-+ [Leaking Queries On Secure Stream Processing Systems](https://arxiv.org//abs/2510.12172)

++ [Leaking Queries On Secure Stream Processing Systems](https://arxiv.org/abs/2510.12172)

Hung Pham, Viet Vo, Tien Tuan Anh Dinh, Duc Tran, Shuhao Zhang

-+ [IP-Augmented Multi-Modal Malicious URL Detection Via Token-Contrastive Representation Enhancement and Multi-Granularity Fusion](https://arxiv.org//abs/2510.12395)

++ [IP-Augmented Multi-Modal Malicious URL Detection Via Token-Contrastive Representation Enhancement and Multi-Granularity Fusion](https://arxiv.org/abs/2510.12395)

Ye Tian, Yanqiu Yu, Liangliang Song, Zhiquan Liu, Yanbin Wang, Jianguo Sun

-+ [Targeted Pooled Latent-Space Steganalysis Applied to Generative Steganography, with a Fix](https://arxiv.org//abs/2510.12414)

++ [Targeted Pooled Latent-Space Steganalysis Applied to Generative Steganography, with a Fix](https://arxiv.org/abs/2510.12414)

Etienne Levecque (LIST3N), Aurélien Noirault (CRIStAL), Tomáš Pevný (CTU), Jan Butora (CRIStAL), Patrick Bas (CRIStAL), Rémi Cogranne (LIST3N)

-+ [Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering](https://arxiv.org//abs/2510.12925)

++ [Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering](https://arxiv.org/abs/2510.12925)

Nil-Jana Akpinar, Chia-Jung Lee, Vanessa Murdock, Pietro Perona

-+ [A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation](https://arxiv.org//abs/2510.12993)

++ [A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation](https://arxiv.org/abs/2510.12993)

João A. Leite, Arnav Arora, Silvia Gargova, João Luz, Gustavo Sampaio, Ian Roberts, Carolina Scarton, Kalina Bontcheva

-+ [Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning](https://arxiv.org//abs/2510.12939)

++ [Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning](https://arxiv.org/abs/2510.12939)

James Pedley, Benjamin Etheridge, Stephen J. Roberts, Francesco Quinzan

-+ [An Investigation of Memorization Risk in Healthcare Foundation Models](https://arxiv.org//abs/2510.12950)

++ [An Investigation of Memorization Risk in Healthcare Foundation Models](https://arxiv.org/abs/2510.12950)

Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, Marzyeh Ghassemi

-+ [Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check](https://arxiv.org//abs/2510.12981)

++ [Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check](https://arxiv.org/abs/2510.12981)

Sungjun Cho, Dasol Hwang, Frederic Sala, Sangheum Hwang, Kyunghyun Cho, Sungmin Cha

-+ [Simulation-Based Pretraining and Domain Adaptation for Astronomical Time Series with Minimal Labeled Data](https://arxiv.org//abs/2510.12958)

++ [Simulation-Based Pretraining and Domain Adaptation for Astronomical Time Series with Minimal Labeled Data](https://arxiv.org/abs/2510.12958)

Rithwik Gupta, Daniel Muthukrishna, Jeroen Audenaert

-+ [Local Differential Privacy for Federated Learning with Fixed Memory Usage and Per-Client Privacy](https://arxiv.org//abs/2510.12908)

++ [Local Differential Privacy for Federated Learning with Fixed Memory Usage and Per-Client Privacy](https://arxiv.org/abs/2510.12908)

Rouzbeh Behnia, Jeremiah Birrell, Arman Riasi, Reza Ebrahimi, Kaushik Dutta, Thang Hoang

-+ [Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection](https://arxiv.org//abs/2510.13893)

++ [Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection](https://arxiv.org/abs/2510.13893)

Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani, Daniele Nardi

-+ [RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs](https://arxiv.org//abs/2510.13901)

++ [RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs](https://arxiv.org/abs/2510.13901)

Tuan T. Nguyen, John Le, Thai T. Vu, Willy Susilo, Heath Cooper

# 2025-10-13

-+ [GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving](https://arxiv.org//abs/2510.11769)

++ [GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving](https://arxiv.org/abs/2510.11769)

Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang

-+ [PHANTOM RECALL: When Familiar Puzzles Fool Smart Models](https://arxiv.org//abs/2510.11812)

++ [PHANTOM RECALL: When Familiar Puzzles Fool Smart Models](https://arxiv.org/abs/2510.11812)

Souradeep Mukhopadhyay, Rishabh Baral, Nimeesh Mahajan, Samhitha Harish, Aswin RRV, Mihir Parmar, Mutsumi Nakamura, Chitta Baral

-+ [BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing](https://arxiv.org//abs/2510.11823)

++ [BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing](https://arxiv.org/abs/2510.11823)

Caelin Kaplan, Alexander Warnecke, Neil Archibald

-+ [Countermind: A Multi-Layered Security Architecture for Large Language Models](https://arxiv.org//abs/2510.11837)

++ [Countermind: A Multi-Layered Security Architecture for Large Language Models](https://arxiv.org/abs/2510.11837)

Dominik Schwarz

-+ [LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance](https://arxiv.org//abs/2510.11905)

++ [LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance](https://arxiv.org/abs/2510.11905)

Patrick Haller, Mark Ibrahim, Polina Kirichenko, Levent Sagun, Samuel J. Bell

-+ [Don't Walk the Line: Boundary Guidance for Filtered Generation](https://arxiv.org//abs/2510.11834)

++ [Don't Walk the Line: Boundary Guidance for Filtered Generation](https://arxiv.org/abs/2510.11834)

Sarah Ball, Andreas Haupt

-+ [Deep Research Brings Deeper Harm](https://arxiv.org//abs/2510.11851)

++ [Deep Research Brings Deeper Harm](https://arxiv.org/abs/2510.11851)

Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu

-+ [Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling](https://arxiv.org//abs/2510.11877)

++ [Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling](https://arxiv.org/abs/2510.11877)

Xiaohang Tang, Zhuowen Cheng, Satyabrat Kumar

-+ [High-Probability Bounds For Heterogeneous Local Differential Privacy](https://arxiv.org//abs/2510.11895)

++ [High-Probability Bounds For Heterogeneous Local Differential Privacy](https://arxiv.org/abs/2510.11895)

Maryam Aliakbarpour, Alireza Fallah, Swaha Roy, Ria Stevens

-+ [A Comprehensive Survey of Website Fingerprinting Attacks and Defenses in Tor: Advances and Open Challenges](https://arxiv.org//abs/2510.11804)

++ [A Comprehensive Survey of Website Fingerprinting Attacks and Defenses in Tor: Advances and Open Challenges](https://arxiv.org/abs/2510.11804)

Yuwen Cui, Guangjing Wang, Khanh Vu, Kai Wei, Kehan Shen, Zhengyuan Jiang, Xiao Han, Ning Wang, Zhuo Lu, Yao Liu

-+ [Robust ML-based Detection of Conventional, LLM-Generated, and Adversarial Phishing Emails Using Advanced Text Preprocessing](https://arxiv.org//abs/2510.11915)

++ [Robust ML-based Detection of Conventional, LLM-Generated, and Adversarial Phishing Emails Using Advanced Text Preprocessing](https://arxiv.org/abs/2510.11915)

Deeksha Hareesha Kulal, Chidozie Princewill Arannonu, Afsah Anwar, Nidhi Rastogi, Quamar Niyaz

-+ [LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings](https://arxiv.org//abs/2510.11584)

++ [LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings](https://arxiv.org/abs/2510.11584)

Ting Li, Yang Yang, Yipeng Yu, Liang Yao, Guoqing Chao, Ruifeng Xu

@@ -873,15 +873,15 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Pengyu Zhu, Lijun Li, Yaxing Lyu, Li Sun, Sen Su, Jing Shao

-+ [Joint Discriminative-Generative Modeling via Dual Adversarial Training](https://arxiv.org//abs/2510.13872)

++ [Joint Discriminative-Generative Modeling via Dual Adversarial Training](https://arxiv.org/abs/2510.13872)

Xuwang Yin, Claire Zhang, Julie Steele, Nir Shavit, Tony T. Wang

-+ [Exploring and Leveraging Class Vectors for Classifier Editing](https://arxiv.org//abs/2510.11268)

++ [Exploring and Leveraging Class Vectors for Classifier Editing](https://arxiv.org/abs/2510.11268)

Jaeik Kim, Jaeyoung Do

-+ [Bag of Tricks for Subverting Reasoning-based Safety Guardrails](https://arxiv.org//abs/2510.11570)

++ [Bag of Tricks for Subverting Reasoning-based Safety Guardrails](https://arxiv.org/abs/2510.11570)

Shuo Chen, Zhen Han, Haokun Chen, Bailan He, Shengyun Si, Jingpei Wu, Philip Torr, Volker Tresp, Jindong Gu

@@ -967,16 +967,16 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Anand D. Sarwate, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

-+ [Scheming Ability in LLM-to-LLM Strategic Interactions](https://arxiv.org//abs/2510.12826)

++ [Scheming Ability in LLM-to-LLM Strategic Interactions](https://arxiv.org/abs/2510.12826)

Thao Pham

-+ [ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking](https://arxiv.org//abs/2510.13842)

++ [ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking](https://arxiv.org/abs/2510.13842)

Yutao Wu, Xiao Liu, Yinghui Li, Yifeng Gao, Yifan Ding, Jiale Ding, Xiang Zheng, Xingjun Ma

# 2025-10-10

-+ [Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning](https://arxiv.org//abs/2510.09487)

++ [Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning](https://arxiv.org/abs/2510.09487)

Shangzhe Li, Dongruo Zhou, Weitong Zhang

@@ -1036,19 +1036,19 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Benedikt Franke, Florian Heinrich, Markus Lange, Arne Raulf

-+ [SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG](https://arxiv.org//abs/2510.09710)

++ [SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG](https://arxiv.org/abs/2510.09710)

Xiaonan Si, Meilin Zhu, Simeng Qin, Lijia Yu, Lijun Zhang, Shuaitong Liu, Xinfeng Li, Ranjie Duan, Yang Liu, Xiaojun Jia

-+ [Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation](https://arxiv.org//abs/2510.08979)

++ [Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation](https://arxiv.org/abs/2510.08979)

Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen

-+ [All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language](https://arxiv.org//abs/2510.09714)

++ [All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language](https://arxiv.org/abs/2510.09714)

Shiyuan Guo, Henry Sleight, Fabien Roger

-+ [On the Fairness of Privacy Protection: Measuring and Mitigating the Disparity of Group Privacy Risks for Differentially Private Machine Learning](https://arxiv.org//abs/2510.09114)

++ [On the Fairness of Privacy Protection: Measuring and Mitigating the Disparity of Group Privacy Risks for Differentially Private Machine Learning](https://arxiv.org/abs/2510.09114)

Zhi Yang, Changwu Huang, Ke Tang, Xin Yao

@@ -1238,15 +1238,15 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Muhammad Usman, Yugyung Lee

-+ [PEAR: Planner-Executor Agent Robustness Benchmark](https://arxiv.org//abs/2510.07505)

++ [PEAR: Planner-Executor Agent Robustness Benchmark](https://arxiv.org/abs/2510.07505)

Shen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, Yue Xing

-+ [A2AS: Agentic AI Runtime Security and Self-Defense](https://arxiv.org//abs/2510.13825)

++ [A2AS: Agentic AI Runtime Security and Self-Defense](https://arxiv.org/abs/2510.13825)

Eugene Neelou, Ivan Novikov, Max Moroz, Om Narayan, Tiffany Saade, Mika Ayenson, Ilya Kabanov, Jen Ozmen, Edward Lee, Vineeth Sai Narajala, Emmanuel Guilherme Junior, Ken Huang, Huseyin Gulsin, Jason Ross, Marat Vyshegorodtsev, Adelin Travers, Idan Habler, Rahul Jadav

-+ [Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race](https://arxiv.org//abs/2510.06544)

++ [Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race](https://arxiv.org/abs/2510.06544)

Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin

@@ -1295,11 +1295,11 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Raju Dhakal, Prashant Shekhar, Laxima Niure Kandel

-+ [RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases](https://arxiv.org//abs/2510.05764)

++ [RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases](https://arxiv.org/abs/2510.05764)

Lang Qin, Zijian Gan, Xu Cao, Pengcheng Jiang, Yankai Jiang, Jiawei Han, Kaishun Wu, Jintai Chen

-+ [The Role of Federated Learning in Improving Financial Security: A Survey](https://arxiv.org//abs/2510.14991)

++ [The Role of Federated Learning in Improving Financial Security: A Survey](https://arxiv.org/abs/2510.14991)

Cade Houston Kennedy, Amr Hilal, Morteza Momeni

@@ -1329,15 +1329,15 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang

-+ [SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models](https://arxiv.org//abs/2510.05173)

++ [SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models](https://arxiv.org/abs/2510.05173)

Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang

-+ [Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time](https://arxiv.org//abs/2510.04340)

++ [Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time](https://arxiv.org/abs/2510.04340)

Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor

-+ [Agentic Misalignment: How LLMs Could Be Insider Threats](https://arxiv.org//abs/2510.05179)

++ [Agentic Misalignment: How LLMs Could Be Insider Threats](https://arxiv.org/abs/2510.05179)

Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Soren Mindermann, Evan Hubinger, Ethan Perez, Kevin Troy

@@ -1355,7 +1355,7 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Sagar Lekhak, Emmett J. Ientilucci, Dimah Dera, Susmita Ghosh

-+ [Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain](https://arxiv.org//abs/2510.05159)

++ [Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain](https://arxiv.org/abs/2510.05159)

Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Nicolas Chapados, Quentin Cappart, Alexandre Lacoste, Krishnamurthy Dj Dvijotham, Alexandre Drouin

@@ -1363,12 +1363,12 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Zhixin Xie, Xurui Song, Jun Luo

-+ [Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs](https://arxiv.org//abs/2510.03567)

++ [Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs](https://arxiv.org/abs/2510.03567)

Fatmazohra Rezkellah, Ramzi Dakhmouche

# 2025-10-02

-+ [Dynamic Target Attack](https://arxiv.org//abs/2510.02422)

++ [Dynamic Target Attack](https://arxiv.org/abs/2510.02422)

Kedong Xiu, Churui Zeng, Tianhang Zheng, Xinzhe Huang, Xiaojun Jia, Di Wang, Puning Zhao, Zhan Qin, Kui Ren

@@ -1377,7 +1377,7 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Jing Wang, Wonho Bae, Jiahong Chen, Wenxu Wang, Junhyug Noh

-+ [Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers](https://arxiv.org//abs/2510.00915)

++ [Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers](https://arxiv.org/abs/2510.00915)

Xin-Qiang Cai, Wei Wang, Feng Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama

@@ -1386,245 +1386,245 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yusuf Ziya Isik, Rafał Łaganowski

-+ [A Generalized Information Bottleneck Theory of Deep Learning](https://arxiv.org//abs/2509.26327)

++ [A Generalized Information Bottleneck Theory of Deep Learning](https://arxiv.org/abs/2509.26327)

Charles Westphal, Stephen Hailes, Mirco Musolesi

# 2025-09-29

-+ [AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models](https://arxiv.org//abs/2509.24269)

++ [AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models](https://arxiv.org/abs/2509.24269)

Zihao Zhu, Xinyu Wu, Gehan Hu, Siwei Lyu, Ke Xu, Baoyuan Wu

-+ [Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention](https://arxiv.org//abs/2509.24393)

++ [Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention](https://arxiv.org/abs/2509.24393)

Yichi Zhang, Yue Ding, Jingwen Yang, Tianwei Luo, Dongbai Li, Ranjie Duan, Qiang Liu, Hang Su, Yinpeng Dong, Jun Zhu

-+ [UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following](https://arxiv.org//abs/2509.25148)

++ [UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following](https://arxiv.org/abs/2509.25148)

FaQiang Qian, WeiKun Zhang, Ziliang Wang, Kang An, Xuhui Zheng, Liangjian Wen, Mengya Gao, Yong Dai, Yichao Wu

-+ [Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs](https://arxiv.org//abs/2509.24166)

++ [Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs](https://arxiv.org/abs/2509.24166)

Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey

-+ [Metamorphic Testing for Audio Content Moderation Software](https://arxiv.org//abs/2509.24215)

++ [Metamorphic Testing for Audio Content Moderation Software](https://arxiv.org/abs/2509.24215)

Wenxuan Wang, Yongjiang Wu, Junyuan Zhang, Shuqing Li, Yun Peng, Wenting Chen, Shuai Wang, Michael R. Lyu

-+ [Adversarial Reinforcement Learning Framework for ESP Cheater Simulation](https://arxiv.org//abs/2509.24274)

++ [Adversarial Reinforcement Learning Framework for ESP Cheater Simulation](https://arxiv.org/abs/2509.24274)

Inkyu Park, Jeong-Gwan Lee, Taehwan Kwon, Juheon Choi, Seungku Kim, Junsu Kim, Kimin Lee

-+ [DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models](https://arxiv.org//abs/2509.24296)

++ [DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models](https://arxiv.org/abs/2509.24296)

Zherui Li, Zheng Nie, Zhenhong Zhou, Yufei Guo, Yue Liu, Yitong Zhang, Yu Cheng, Qingsong Wen, Kun Wang, Jiaheng Zhang

-+ [HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment](https://arxiv.org//abs/2509.24384)

++ [HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment](https://arxiv.org/abs/2509.24384)

Langqi Yang, Tianhang Zheng, Kedong Xiu, Yixuan Chen, Di Wang, Puning Zhao, Zhan Qin, Kui Ren

-+ [Community detection robustness of graph neural networks](https://arxiv.org//abs/2509.24662)

++ [Community detection robustness of graph neural networks](https://arxiv.org/abs/2509.24662)

Jaidev Goel, Pablo Moriano, Ramakrishnan Kannan, Yulia R. Gel

-+ [Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption](https://arxiv.org//abs/2509.24748)

++ [Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption](https://arxiv.org/abs/2509.24748)

Longxiang He, Deheng Ye, Junbo Tan, Xueqian Wang, Li Shen

-+ [Scalable GANs with Transformers](https://arxiv.org//abs/2509.24935)

++ [Scalable GANs with Transformers](https://arxiv.org/abs/2509.24935)

Sangeek Hyun, MinKyu Lee, Jae-Pil Heo

-+ [SecInfer: Preventing Prompt Injection via Inference-time Scaling](https://arxiv.org//abs/2509.24967)

++ [SecInfer: Preventing Prompt Injection via Inference-time Scaling](https://arxiv.org/abs/2509.24967)

Yupei Liu, Yanting Wang, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong

-+ [GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs](https://arxiv.org//abs/2509.25178)

++ [GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs](https://arxiv.org/abs/2509.25178)

Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh, Arshia Soltani Moakhar, Basim Azam, Soheil Feizi, Naveed Akhtar

-+ [Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models](https://arxiv.org//abs/2509.24488)

++ [Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models](https://arxiv.org/abs/2509.24488)

Wenjie Fu, Huandong Wang, Junyao Gao, Guoan Wan, Tao Jiang

-+ [SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems](https://arxiv.org//abs/2509.24961)

++ [SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems](https://arxiv.org/abs/2509.24961)

Kaihong Li, Huichi Zhou, Bin Ma, Fangjun Huang

-+ [DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense](https://arxiv.org//abs/2509.24359)

++ [DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense](https://arxiv.org/abs/2509.24359)

Amira Guesmi, Muhammad Shafique

-+ [TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models](https://arxiv.org//abs/2509.24566)

++ [TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models](https://arxiv.org/abs/2509.24566)

Zhifang Zhang, Qiqi Tao, Jiaqi Lv, Na Zhao, Lei Feng, Joey Tianyi Zhou

-+ [VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines](https://arxiv.org//abs/2509.24891)

++ [VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines](https://arxiv.org/abs/2509.24891)

Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma

-+ [MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification](https://arxiv.org//abs/2509.25082)

++ [MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification](https://arxiv.org/abs/2509.25082)

Xiaoyi Huang, Junwei Wu, Kejia Zhang, Carl Yang, Zhiming Luo

-+ [Score-based Membership Inference on Diffusion Models](https://arxiv.org//abs/2509.25003)

++ [Score-based Membership Inference on Diffusion Models](https://arxiv.org/abs/2509.25003)

Mingxing Rao, Bowen Qu, Daniel Moyer

-+ [H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning](https://arxiv.org//abs/2509.24330)

++ [H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning](https://arxiv.org/abs/2509.24330)

Shiyuan Zuo, Rongfei Fan, Cheng Zhan, Jie Xu, Puning Zhao, Han Hu

-+ [Distributionally Robust Federated Learning with Outlier Resilience](https://arxiv.org//abs/2509.24462)

++ [Distributionally Robust Federated Learning with Outlier Resilience](https://arxiv.org/abs/2509.24462)

Zifan Wang, Xinlei Yi, Xenia Konti, Michael M. Zavlanos, Karl H. Johansson

-+ [Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model](https://arxiv.org//abs/2509.24492)

++ [Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model](https://arxiv.org/abs/2509.24492)

Charmaine Barker, Daniel Bethell, Simos Gerasimou

-+ [Learning in an Echo Chamber: Online Learning with Replay Adversary](https://arxiv.org//abs/2509.25135)

++ [Learning in an Echo Chamber: Online Learning with Replay Adversary](https://arxiv.org/abs/2509.25135)

Daniil Dmitriev, Harald Eskelund Franck, Carolin Heinzler, Amartya Sanyal

-+ [FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems](https://arxiv.org//abs/2509.24408)

++ [FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems](https://arxiv.org/abs/2509.24408)

Yuzhen Long, Songze Li

-+ [Takedown: How It's Done in Modern Coding Agent Exploits](https://arxiv.org//abs/2509.24240)

++ [Takedown: How It's Done in Modern Coding Agent Exploits](https://arxiv.org/abs/2509.24240)

Eunkyu Lee, Donghyeon Kim, Wonyoung Kim, Insu Yun

-+ [When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation](https://arxiv.org//abs/2509.24272)

++ [When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation](https://arxiv.org/abs/2509.24272)

Weibo Zhao, Jiahao Liu, Bonan Ruan, Shaofei Li, Zhenkai Liang

-+ [GSPR: Aligning LLM Safeguards as Generalizable Safety Policy Reasoners](https://arxiv.org//abs/2509.24418)

++ [GSPR: Aligning LLM Safeguards as Generalizable Safety Policy Reasoners](https://arxiv.org/abs/2509.24418)

Haoran Li, Yulin Chen, Jingru Zeng, Hao Peng, Huihao Jing, Wenbin Hu, Xi Yang, Ziqian Zeng, Sirui Han, Yangqiu Song

-+ [PRIVMARK: Private Large Language Models Watermarking with MPC](https://arxiv.org//abs/2509.24624)

++ [PRIVMARK: Private Large Language Models Watermarking with MPC](https://arxiv.org/abs/2509.24624)

Thomas Fargues, Ye Dong, Tianwei Zhang, Jin-Song Dong

-+ [Secret Leader Election in Ethereum PoS: An Empirical Security Analysis of Whisk and Homomorphic Sortition under DoS on the Leader and Censorship Attacks](https://arxiv.org//abs/2509.24955)

++ [Secret Leader Election in Ethereum PoS: An Empirical Security Analysis of Whisk and Homomorphic Sortition under DoS on the Leader and Censorship Attacks](https://arxiv.org/abs/2509.24955)

Tereza Burianová, Martin Perešíni, Ivan Homoliak

# 2025-09-28

-+ [Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning](https://arxiv.org//abs/2509.23558)

++ [Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning](https://arxiv.org/abs/2509.23558)

Zhaoqi Wang, Daqing He, Zijian Zhang, Xin Li, Liehuang Zhu, Meng Li, Jiamou Liu

-+ [SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents](https://arxiv.org//abs/2509.23694)

++ [SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents](https://arxiv.org/abs/2509.23694)

Jianshuo Dong, Sheng Guo, Hao Wang, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

-+ [Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B](https://arxiv.org//abs/2509.23882)

++ [Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B](https://arxiv.org/abs/2509.23882)

Shuyi Lin, Tian Lu, Zikai Wang, Bo Wen, Yibo Zhao, Cheng Tan

-+ [Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence](https://arxiv.org//abs/2509.23573)

++ [Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence](https://arxiv.org/abs/2509.23573)

Yuqiao Meng, Luoxi Tang, Feiyang Yu, Jinyuan Jia, Guanhua Yan, Ping Yang, Zhaohan Xi

-+ [BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images](https://arxiv.org//abs/2509.23617)

++ [BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images](https://arxiv.org/abs/2509.23617)

Cheng Huang, Weizheng Xie, Fan Gao, Yutong Liu, Ruoling Wu, Zeyu Han, Jingxi Qiu, Xiangxiang Wang, Zhenglin Yang, Hao Wang, Yongbin Yu

-+ [Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment](https://arxiv.org//abs/2509.23618)

++ [Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment](https://arxiv.org/abs/2509.23618)

Pu Huang, Shouguang Wang, Siya Yao, Mengchu Zhou

-+ [Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail](https://arxiv.org//abs/2509.23762)

++ [Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail](https://arxiv.org/abs/2509.23762)

Nhan T. Luu

-+ [HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing](https://arxiv.org//abs/2509.23835)

++ [HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing](https://arxiv.org/abs/2509.23835)

Yukai Zhao, Menghan Wu, Xing Hu, Xin Xia

-+ [Adversarial Diffusion for Robust Reinforcement Learning](https://arxiv.org//abs/2509.23846)

++ [Adversarial Diffusion for Robust Reinforcement Learning](https://arxiv.org/abs/2509.23846)

Daniele Foffano, Alessio Russo, Alexandre Proutiere

-+ [Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack](https://arxiv.org//abs/2509.23871)

++ [Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack](https://arxiv.org/abs/2509.23871)

Yukun Chen, Boheng Li, Yu Yuan, Leyi Qi, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

-+ [Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer](https://arxiv.org//abs/2509.23886)

++ [Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer](https://arxiv.org/abs/2509.23886)

Simon Schrodi, Elias Kempf, Fazl Barez, Thomas Brox

-+ [Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios](https://arxiv.org//abs/2509.23895)

++ [Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios](https://arxiv.org/abs/2509.23895)

Jinghan Xu, Yuyang Zhang, Qixuan Cai, Jiancheng Chen, Keqiu Li

-+ [Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE](https://arxiv.org//abs/2509.24130)

++ [Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE](https://arxiv.org/abs/2509.24130)

Guancheng Wan, Lucheng Fu, Haoxin Liu, Yiqiao Jin, Hui Yi Leong, Eric Hanchen Jiang, Hejia Geng, Jinhe Bi, Yunpu Ma, Xiangru Tang, B. Aditya Prakash, Yizhou Sun, Wei Wang

-+ [Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models](https://arxiv.org//abs/2509.23626)

++ [Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models](https://arxiv.org/abs/2509.23626)

Beomseok Kang, Niluthpol Chowdhury Mithun, Mikhail Sizintsev, Han-Pang Chiu, Supun Samarasekera

-+ [Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models](https://arxiv.org//abs/2509.23827)

++ [Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models](https://arxiv.org/abs/2509.23827)

Efthymios Tsaprazlis, Tiantian Feng, Anil Ramakrishna, Rahul Gupta, Shrikanth Narayanan

-+ [FairViT-GAN: A Hybrid Vision Transformer with Adversarial Debiasing for Fair and Explainable Facial Beauty Prediction](https://arxiv.org//abs/2509.23859)

++ [FairViT-GAN: A Hybrid Vision Transformer with Adversarial Debiasing for Fair and Explainable Facial Beauty Prediction](https://arxiv.org/abs/2509.23859)

Djamel Eddine Boukhari

-+ [Adversarial Versus Federated: An Adversarial Learning based Multi-Modality Cross-Domain Federated Medical Segmentation](https://arxiv.org//abs/2509.23907)

++ [Adversarial Versus Federated: An Adversarial Learning based Multi-Modality Cross-Domain Federated Medical Segmentation](https://arxiv.org/abs/2509.23907)

You Zhou, Lijiang Chen, Shuchang Lyu, Guangxia Cui, Wenpei Bai, Zheng Zhou, Meng Li, Guangliang Cheng, Huiyu Zhou, Qi Zhao

-+ [Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives](https://arxiv.org//abs/2509.23917)

++ [Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives](https://arxiv.org/abs/2509.23917)

Kuanrong Liu, Siyuan Liang, Cheng Qian, Ming Zhang, Xiaochun Cao

-+ [StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data](https://arxiv.org//abs/2509.23594)

++ [StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data](https://arxiv.org/abs/2509.23594)

Yixu Wang, Yan Teng, Yingchun Wang, Xingjun Ma

-+ [FedDAPL: Toward Client-Private Generalization in Federated Learning](https://arxiv.org//abs/2509.23688)

++ [FedDAPL: Toward Client-Private Generalization in Federated Learning](https://arxiv.org/abs/2509.23688)

Soroosh Safari Loaliyan, Jose-Luis Ambite, Paul M. Thompson, Neda Jahanshad, Greg Ver Steeg

-+ [Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability](https://arxiv.org//abs/2509.23689)

++ [Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability](https://arxiv.org/abs/2509.23689)

Ankit Gangwal, Aaryan Ajay Sharma

-+ [Visual CoT Makes VLMs Smarter but More Fragile](https://arxiv.org//abs/2509.23789)

++ [Visual CoT Makes VLMs Smarter but More Fragile](https://arxiv.org/abs/2509.23789)

Chunxue Xu, Yiwei Wang, Yujun Cai, Bryan Hooi, Songze Li

-+ [Influence-Guided Concolic Testing of Transformer Robustness](https://arxiv.org//abs/2509.23806)

++ [Influence-Guided Concolic Testing of Transformer Robustness](https://arxiv.org/abs/2509.23806)

Chih-Duo Hong, Yu Wang, Yao-Chen Chang, Fang Yu

-+ [Learning-Based Testing for Deep Learning: Enhancing Model Robustness with Adversarial Input Prioritization](https://arxiv.org//abs/2509.23961)

++ [Learning-Based Testing for Deep Learning: Enhancing Model Robustness with Adversarial Input Prioritization](https://arxiv.org/abs/2509.23961)

Sheikh Md Mushfiqur Rahman, Nasir Eisty

-+ [AutoML in Cybersecurity: An Empirical Study](https://arxiv.org//abs/2509.23621)

++ [AutoML in Cybersecurity: An Empirical Study](https://arxiv.org/abs/2509.23621)

Sherif Saad, Kevin Shi, Mohammed Mamun, Hythem Elmiligi

-+ [A First Look at Privacy Risks of Android Task-executable Voice Assistant Applications](https://arxiv.org//abs/2509.23680)

++ [A First Look at Privacy Risks of Android Task-executable Voice Assistant Applications](https://arxiv.org/abs/2509.23680)

Shidong Pan, Yikai Ge, Xiaoyu Sun

-+ [GPM: The Gaussian Pancake Mechanism for Planting Undetectable Backdoors in Differential Privacy](https://arxiv.org//abs/2509.23834)

++ [GPM: The Gaussian Pancake Mechanism for Planting Undetectable Backdoors in Differential Privacy](https://arxiv.org/abs/2509.23834)

Haochen Sun, Xi He

-+ [Binary Diff Summarization using Large Language Models](https://arxiv.org//abs/2509.23970)

++ [Binary Diff Summarization using Large Language Models](https://arxiv.org/abs/2509.23970)

Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy, Prashant Anantharaman, Sean Carrick, Ramesh Karri, Farshad Khorrami

-+ [Analyzing and Evaluating Unbiased Language Model Watermark](https://arxiv.org//abs/2509.24048)

++ [Analyzing and Evaluating Unbiased Language Model Watermark](https://arxiv.org/abs/2509.24048)

Yihan Wu, Xuehao Cui, Ruibo Chen, Heng Huang

@@ -1633,256 +1633,256 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Vahid Negahdari, Shirin Samadi Bahrami, Seyed Reza Moghadasi, Mohammad Reza Razvan

# 2025-09-27

-+ [Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia](https://arxiv.org//abs/2509.23023)

++ [Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia](https://arxiv.org/abs/2509.23023)

Davi Bastos Costa, Renato Vicente

-+ [LLM Watermark Evasion via Bias Inversion](https://arxiv.org//abs/2509.23019)

++ [LLM Watermark Evasion via Bias Inversion](https://arxiv.org/abs/2509.23019)

Jeongyeon Hwang, Sangdon Park, Jungseul Ok

-+ [DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence](https://arxiv.org//abs/2509.23030)

++ [DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence](https://arxiv.org/abs/2509.23030)

Yang Lv, Jin Cao, Ben Niu, Zhe Sun, Fengwei Wang, Fenghua Li, Hui Li

-+ [Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data](https://arxiv.org//abs/2509.23041)

++ [Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data](https://arxiv.org/abs/2509.23041)

Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu

-+ [Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers](https://arxiv.org//abs/2509.23235)

++ [Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers](https://arxiv.org/abs/2509.23235)

Seongsoo Heo, Dong-Wan Choi

-+ [Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection](https://arxiv.org//abs/2509.23246)

++ [Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection](https://arxiv.org/abs/2509.23246)

Manjiang Yu, Priyanka Singh, Xue Li, Yang Cao

-+ [Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing](https://arxiv.org//abs/2509.23279)

++ [Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing](https://arxiv.org/abs/2509.23279)

Rohit Chowdhury, Aniruddha Bala, Rohan Jaiswal, Siddharth Roheda

-+ [A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models](https://arxiv.org//abs/2509.23286)

++ [A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models](https://arxiv.org/abs/2509.23286)

Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, Albert No

-+ [Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling](https://arxiv.org//abs/2509.23325)

++ [Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Adversarial Scheduling](https://arxiv.org/abs/2509.23325)

Jonas Ngnawé, Maxime Heuillet, Sabyasachi Sahoo, Yann Pequignot, Ola Ahmad, Audrey Durand, Frédéric Precioso, Christian Gagné

-+ [Dual-Space Smoothness for Robust and Balanced LLM Unlearning](https://arxiv.org//abs/2509.23362)

++ [Dual-Space Smoothness for Robust and Balanced LLM Unlearning](https://arxiv.org/abs/2509.23362)

Han Yan, Zheyuan Liu, Meng Jiang

-+ [Factor Decorrelation Enhanced Data Removal from Deep Predictive Models](https://arxiv.org//abs/2509.23443)

++ [Factor Decorrelation Enhanced Data Removal from Deep Predictive Models](https://arxiv.org/abs/2509.23443)

Wenhao Yang, Lin Li, Xiaohui Tao, Kaize Shi

-+ [ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search](https://arxiv.org//abs/2509.23519)

++ [ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search](https://arxiv.org/abs/2509.23519)

Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, Aleksandra Korolova

-+ [Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT](https://arxiv.org//abs/2509.23381)

++ [Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT](https://arxiv.org/abs/2509.23381)

Wonhyuk Lee, Youngchol Kim, Yunjin Park, Junhyung Moon, Dongyoung Jeong, Wanjin Park

-+ [MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction](https://arxiv.org//abs/2509.23459)

++ [MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction](https://arxiv.org/abs/2509.23459)

Sepideh Abedini (1,2), Shubhankar Mohapatra (1), D. B. Emerson (2), Masoumeh Shafieinejad (2), Jesse C. Cresswell (3), Xi He (1,2) ((1) University of Waterloo, (2) Vector Institute, (3) Layer 6 AI)

-+ [Desensitizing for Improving Corruption Robustness in Point Cloud Classification through Adversarial Training](https://arxiv.org//abs/2509.23010)

++ [Desensitizing for Improving Corruption Robustness in Point Cloud Classification through Adversarial Training](https://arxiv.org/abs/2509.23010)

Zhiqiang Tian, Weigang Li, Chunhua Deng, Junwei Hu, Yongqiang Wang, Wenping Liu

-+ [Real-World Transferable Adversarial Attack on Face-Recognition Systems](https://arxiv.org//abs/2509.23198)

++ [Real-World Transferable Adversarial Attack on Face-Recognition Systems](https://arxiv.org/abs/2509.23198)

Andrey Kaznacheev, Matvey Mikhalchuk, Andrey Kuznetsov, Aleksandr Petiushko, Anton Razzhigaev

-+ [Robust Multi-Modal Face Anti-Spoofing with Domain Adaptation: Tackling Missing Modalities, Noisy Pseudo-Labels, and Model Degradation](https://arxiv.org//abs/2509.23475)

++ [Robust Multi-Modal Face Anti-Spoofing with Domain Adaptation: Tackling Missing Modalities, Noisy Pseudo-Labels, and Model Degradation](https://arxiv.org/abs/2509.23475)

Ming-Tsung Hsu, Fang-Yu Hsu, Yi-Ting Lin, Kai-Heng Chien, Jun-Ren Chen, Cheng-Hsiang Su, Yi-Chen Ou, Chiou-Ting Hsu, Pei-Kai Huang

-+ [Targeted perturbations reveal brain-like local coding axes in robustified, but not standard, ANN-based brain models](https://arxiv.org//abs/2509.23333)

++ [Targeted perturbations reveal brain-like local coding axes in robustified, but not standard, ANN-based brain models](https://arxiv.org/abs/2509.23333)

Nikolas McNeal, N. Apurva Ratan Murty

-+ [GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models](https://arxiv.org//abs/2509.23037)

++ [GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models](https://arxiv.org/abs/2509.23037)

Javad Forough, Mohammad Maheri, Hamed Haddadi

-+ [CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy](https://arxiv.org//abs/2509.23190)

++ [CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy](https://arxiv.org/abs/2509.23190)

Zhanhong Xie, Meifan Zhang, Lihua Yin

-+ [NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning](https://arxiv.org//abs/2509.23252)

++ [NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning](https://arxiv.org/abs/2509.23252)

Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price

-+ [FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design](https://arxiv.org//abs/2509.23091)

++ [FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design](https://arxiv.org/abs/2509.23091)

Xiangchen Meng, Yangdi Lyu

-+ [Noisy Networks, Nosy Neighbors: Inferring Privacy Invasive Information from Encrypted Wireless Traffic](https://arxiv.org//abs/2510.13822)

++ [Noisy Networks, Nosy Neighbors: Inferring Privacy Invasive Information from Encrypted Wireless Traffic](https://arxiv.org/abs/2510.13822)

Bartosz Burgiel

# 2025-09-26

-+ [Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety](https://arxiv.org//abs/2509.21782)

++ [Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety](https://arxiv.org/abs/2509.21782)

Junliang Liu, Jingyu Xiao, Wenxin Tang, Wenxuan Wang, Zhixian Wang, Minrui Zhang, Shuanghe Yu

-+ [Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models](https://arxiv.org//abs/2509.21761)

++ [Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models](https://arxiv.org/abs/2509.21761)

Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen

-+ [You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors](https://arxiv.org//abs/2509.21884)

++ [You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors](https://arxiv.org/abs/2509.21884)

Bochuan Cao, Changjiang Li, Yuanpu Cao, Yameng Ge, Ting Wang, Jinghui Chen

-+ [Active Attacks: Red-teaming LLMs via Adaptive Environments](https://arxiv.org//abs/2509.21947)

++ [Active Attacks: Red-teaming LLMs via Adaptive Environments](https://arxiv.org/abs/2509.21947)

Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park, Yoshua Bengio, Minsu Kim

-+ [Benchmarking and Mitigate Psychological Sycophancy in Medical Vision-Language Models](https://arxiv.org//abs/2509.21979)

++ [Benchmarking and Mitigate Psychological Sycophancy in Medical Vision-Language Models](https://arxiv.org/abs/2509.21979)

Zikun Guo, Xinyue Xu, Pei Xiang, Shu Yang, Xin Han, Di Wang, Lijie Hu

-+ [Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks](https://arxiv.org//abs/2509.22060)

++ [Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks](https://arxiv.org/abs/2509.22060)

Aravindhan G, Yuvaraj Govindarajulu, Parin Shah

-+ [The Rogue Scalpel: Activation Steering Compromises LLM Safety](https://arxiv.org//abs/2509.22067)

++ [The Rogue Scalpel: Activation Steering Compromises LLM Safety](https://arxiv.org/abs/2509.22067)

Anton Korznikov, Andrey Galichin, Alexey Dontsov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina

-+ [Jailbreaking on Text-to-Video Models via Scene Splitting Strategy](https://arxiv.org//abs/2509.22292)

++ [Jailbreaking on Text-to-Video Models via Scene Splitting Strategy](https://arxiv.org/abs/2509.22292)

Wonjun Lee, Haon Park, Doehyeon Lee, Bumsub Ham, Suhyun Kim

-+ [Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning](https://arxiv.org//abs/2509.22472)

++ [Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning](https://arxiv.org/abs/2509.22472)

Antreas Ioannou, Andreas Shiamishis, Nora Hollenstein, Nezihe Merve Gürel

-+ [Mixture of Detectors: A Compact View of Machine-Generated Text Detection](https://arxiv.org//abs/2509.22147)

++ [Mixture of Detectors: A Compact View of Machine-Generated Text Detection](https://arxiv.org/abs/2509.22147)

Sai Teja Lekkala, Yadagiri Annepaka, Arun Kumar Challa, Samatha Reddy Machireddy, Partha Pakray, Chukhu Chunka

-+ [Context Parametrization with Compositional Adapters](https://arxiv.org//abs/2509.22158)

++ [Context Parametrization with Compositional Adapters](https://arxiv.org/abs/2509.22158)

Josip Jukić, Martin Tutek, Jan Šnajder

-+ [SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models](https://arxiv.org//abs/2509.21843)

++ [SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models](https://arxiv.org/abs/2509.21843)

Jingkai Guo, Chaitali Chakrabarti, Deliang Fan

-+ [Deepfakes: we need to re-think the concept of "real" images](https://arxiv.org//abs/2509.21864)

++ [Deepfakes: we need to re-think the concept of "real" images](https://arxiv.org/abs/2509.21864)

Janis Keuper, Margret Keuper

-+ [FailureAtlas:Mapping the Failure Landscape of T2I Models via Active Exploration](https://arxiv.org//abs/2509.21995)

++ [FailureAtlas:Mapping the Failure Landscape of T2I Models via Active Exploration](https://arxiv.org/abs/2509.21995)

Muxi Chen, Zhaohua Zhang, Chenchen Zhao, Mingyang Chen, Wenyu Jiang, Tianwen Jiang, Jianhuan Zhuo, Yu Tang, Qiuyong Xiao, Jihong Zhang, Qiang Xu

-+ [RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer](https://arxiv.org//abs/2509.22323)

++ [RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer](https://arxiv.org/abs/2509.22323)

Wangbo Zhao, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Pengfei Zhou, Kai Wang, Bohan Zhuang, Zhangyang Wang, Fan Wang, Yang You

-+ [Text Adversarial Attacks with Dynamic Outputs](https://arxiv.org//abs/2509.22393)

++ [Text Adversarial Attacks with Dynamic Outputs](https://arxiv.org/abs/2509.22393)

Wenqiang Wang, Siyuan Liang, Xiao Yan, Xiaochun Cao

-+ [Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models](https://arxiv.org//abs/2509.22400)

++ [Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models](https://arxiv.org/abs/2509.22400)

Xinhao Zhong, Yimin Zhou, Zhiqi Zhang, Junhao Li, Yi Sun, Bin Chen, Shu-Tao Xia, Ke Xu

-+ [Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness](https://arxiv.org//abs/2509.21879)

++ [Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness](https://arxiv.org/abs/2509.21879)

Chaoyang Luo, Yan Zou, Nanjing Huang

-+ [Concept-SAE: Active Causal Probing of Visual Model Behavior](https://arxiv.org//abs/2509.22015)

++ [Concept-SAE: Active Causal Probing of Visual Model Behavior](https://arxiv.org/abs/2509.22015)

Jianrong Ding, Muxi Chen, Chenchen Zhao, Qiang Xu

-+ [Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning](https://arxiv.org//abs/2509.22082)

++ [Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning](https://arxiv.org/abs/2509.22082)

Li Xia, Zheng Liu, Sili Huang, Wei Tang, Xuan Liu

-+ [Countering adversarial evasion in regression analysis](https://arxiv.org//abs/2509.22113)

++ [Countering adversarial evasion in regression analysis](https://arxiv.org/abs/2509.22113)

David Benfield, Phan Tu Vuong, Alain Zemkoho

-+ [A Law of Data Reconstruction for Random Features (and Beyond)](https://arxiv.org//abs/2509.22214)

++ [A Law of Data Reconstruction for Random Features (and Beyond)](https://arxiv.org/abs/2509.22214)

Leonardo Iurada, Simone Bombari, Tatiana Tommasi, Marco Mondelli

-+ [Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning](https://arxiv.org//abs/2509.22263)

++ [Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning](https://arxiv.org/abs/2509.22263)

Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, Meeyoung Cha

-+ [Nonlinear Optimization with GPU-Accelerated Neural Network Constraints](https://arxiv.org//abs/2509.22462)

++ [Nonlinear Optimization with GPU-Accelerated Neural Network Constraints](https://arxiv.org/abs/2509.22462)

Robert Parker, Oscar Dowson, Nicole LoGiudice, Manuel Garcia, Russell Bent

-+ ["Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors](https://arxiv.org//abs/2509.22040)

++ ["Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors](https://arxiv.org/abs/2509.22040)

Yue Liu, Yanjie Zhao, Yunbo Lyu, Ting Zhang, Haoyu Wang, David Lo

-+ [Collusion-Driven Impersonation Attack on Channel-Resistant RF Fingerprinting](https://arxiv.org//abs/2509.22154)

++ [Collusion-Driven Impersonation Attack on Channel-Resistant RF Fingerprinting](https://arxiv.org/abs/2509.22154)

Zhou Xu, Guyue Li, Zhe Peng, Aiqun Hu

-+ [Privacy Mechanism Design based on Empirical Distributions](https://arxiv.org//abs/2509.22428)

++ [Privacy Mechanism Design based on Empirical Distributions](https://arxiv.org/abs/2509.22428)

Leonhard Grosse, Sara Saeidian, Mikael Skoglund, Tobias J. Oechtering

-+ [Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor Attacks](https://arxiv.org//abs/2509.22486)

++ [Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor Attacks](https://arxiv.org/abs/2509.22486)

Gaurav Bagwe, Saket S. Chaturvedi, Xiaolong Ma, Xiaoyong Yuan, Kuang-Ching Wang, Lan Zhang

-+ [Creative Adversarial Testing (CAT): A Novel Framework for Evaluating Goal-Oriented Agentic AI Systems](https://arxiv.org//abs/2509.23006)

++ [Creative Adversarial Testing (CAT): A Novel Framework for Evaluating Goal-Oriented Agentic AI Systems](https://arxiv.org/abs/2509.23006)

Hassen Dhrif

-+ [Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment](https://arxiv.org//abs/2509.22745)

++ [Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment](https://arxiv.org/abs/2509.22745)

Jaehan Kim, Minkyoo Song, Seungwon Shin, Sooel Son

-+ [Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN](https://arxiv.org//abs/2509.22836)

++ [Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN](https://arxiv.org/abs/2509.22836)

Roie Kazoom, Alon Goldberg, Hodaya Cohen, Ofer Hadar

-+ [Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data](https://arxiv.org//abs/2509.22850)

++ [Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data](https://arxiv.org/abs/2509.22850)

Roie Kazoom, Yuval Ratzabi, Etamar Rothstein, Ofer Hadar

-+ [Observation-Free Attacks on Online Learning to Rank](https://arxiv.org//abs/2509.22855)

++ [Observation-Free Attacks on Online Learning to Rank](https://arxiv.org/abs/2509.22855)

Sameep Chattopadhyay, Nikhil Karamchandani, Sharayu Mohair

-+ [Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings](https://arxiv.org//abs/2509.22925)

++ [Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings](https://arxiv.org/abs/2509.22925)

Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton

-+ [Unsupervised Speech Enhancement using Data-defined Priors](https://arxiv.org//abs/2509.22942)

++ [Unsupervised Speech Enhancement using Data-defined Priors](https://arxiv.org/abs/2509.22942)

Dominik Klement, Matthew Maciejewski, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget

-+ [ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents](https://arxiv.org//abs/2509.22830)

++ [ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents](https://arxiv.org/abs/2509.22830)

Hwan Chang, Yonghyun Jun, Hwanhee Lee

-+ [Concept activation vectors: a unifying view and adversarial attacks](https://arxiv.org//abs/2509.22755)

++ [Concept activation vectors: a unifying view and adversarial attacks](https://arxiv.org/abs/2509.22755)

Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek

-+ [Model Context Protocol for Vision Systems: Audit, Security, and Protocol Extensions](https://arxiv.org//abs/2509.22814)

++ [Model Context Protocol for Vision Systems: Audit, Security, and Protocol Extensions](https://arxiv.org/abs/2509.22814)

Aditi Tiwari, Akshit Bhalla, Darshan Prasad

-+ [PAPER: Privacy-Preserving ResNet Models using Low-Degree Polynomial Approximations and Structural Optimizations on Leveled FHE](https://arxiv.org//abs/2509.22857)

++ [PAPER: Privacy-Preserving ResNet Models using Low-Degree Polynomial Approximations and Structural Optimizations on Leveled FHE](https://arxiv.org/abs/2509.22857)

Eduardo Chielle, Manaar Alam, Jinting Liu, Jovan Kascelan, Michail Maniatakos

-+ [AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning](https://arxiv.org//abs/2509.22873)

++ [AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning](https://arxiv.org/abs/2509.22873)

Aashnan Rahman, Abid Hasan, Sherajul Arifin, Faisal Haque Bappy, Tahrim Hossain, Tariqul Islam, Abu Raihan Mostofa Kamal, Md. Azam Hossain

@@ -1890,346 +1890,346 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Jaehan Kim, Minkyoo Song, Seungwon Shin, Sooel Son

-+ [On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations](https://arxiv.org//abs/2510.00037)

++ [On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations](https://arxiv.org/abs/2510.00037)

Jianing Guo, Zhenhong Wu, Chang Tu, Yiyao Ma, Xiangqi Kong, Zhiqian Liu, Jiaming Ji, Shuning Zhang, Yuanpei Chen, Kai Chen, Xianglong Liu, Qi Dou, Yaodong Yang, Huijie Zhao, Weifeng Lv, Simin Li

# 2025-09-25

-+ [SAGE: A Realistic Benchmark for Semantic Understanding](https://arxiv.org//abs/2509.21310)

++ [SAGE: A Realistic Benchmark for Semantic Understanding](https://arxiv.org/abs/2509.21310)

Samarth Goel, Reagan J. Lee, Kannan Ramchandran

-+ [A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks](https://arxiv.org//abs/2509.20639)

++ [A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks](https://arxiv.org/abs/2509.20639)

Adam Swanda, Amy Chang, Alexander Chen, Fraser Burch, Paul Kassianik, Konstantin Berlin

-+ [Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection](https://arxiv.org//abs/2509.20682)

++ [Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection](https://arxiv.org/abs/2509.20682)

Duc-Tuan Truong, Tianchi Liu, Junjie Li, Ruijie Tao, Kong Aik Lee, Eng Siong Chng

-+ [DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation](https://arxiv.org//abs/2509.20792)

++ [DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation](https://arxiv.org/abs/2509.20792)

Ved Umrajkar

-+ [Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions](https://arxiv.org//abs/2509.20830)

++ [Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions](https://arxiv.org/abs/2509.20830)

Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu

-+ [Security-aware Semantic-driven ISAC via Paired Adversarial Residual Networks](https://arxiv.org//abs/2509.20835)

++ [Security-aware Semantic-driven ISAC via Paired Adversarial Residual Networks](https://arxiv.org/abs/2509.20835)

Yu Liu, Boxiang He, Fanggang Wang

-+ [Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools](https://arxiv.org//abs/2509.21011)

++ [Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools](https://arxiv.org/abs/2509.21011)

Ping He, Changjiang Li, Binbin Zhao, Tianyu Du, Shouling Ji

-+ [The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems](https://arxiv.org//abs/2509.21014)

++ [The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems](https://arxiv.org/abs/2509.21014)

Federico Nesti, Niko Salamini, Mauro Marinoni, Giorgio Maria Cicero, Gabriele Serra, Alessandro Biondi, Giorgio Buttazzo

-+ [Vision Transformers: the threat of realistic adversarial patches](https://arxiv.org//abs/2509.21084)

++ [Vision Transformers: the threat of realistic adversarial patches](https://arxiv.org/abs/2509.21084)

Kasper Cools, Clara Maathuis, Alexander M. van Oers, Claudia S. Hübner, Nikos Deligiannis, Marijke Vandewal, Geert De Cubber

Hübner, Nikos Deligiannis, Marijke Vandewal, Geert De Cubber -+ [Evading Overlapping Community Detection via Proxy Node Injection](https://arxiv.org//abs/2509.21211) ++ [Evading Overlapping Community Detection via Proxy Node Injection](https://arxiv.org/abs/2509.21211) Dario Loi, Matteo Silvestri, Fabrizio Silvestri, Gabriele Tolomei -+ [No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks](https://arxiv.org//abs/2509.21296) ++ [No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks](https://arxiv.org/abs/2509.21296) Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum, Itay Safran -+ [RedHerring Attack: Testing the Reliability of Attack Detection](https://arxiv.org//abs/2509.20691) ++ [RedHerring Attack: Testing the Reliability of Attack Detection](https://arxiv.org/abs/2509.20691) Jonathan Rusert -+ [Overcoming Black-box Attack Inefficiency with Hybrid and Dynamic Select Algorithms](https://arxiv.org//abs/2509.20699) ++ [Overcoming Black-box Attack Inefficiency with Hybrid and Dynamic Select Algorithms](https://arxiv.org/abs/2509.20699) Abhinay Shankar Belde, Rohit Ramkumar, Jonathan Rusert -+ [Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search](https://arxiv.org//abs/2509.20838) ++ [Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search](https://arxiv.org/abs/2509.20838) Shuo Huang, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu -+ [Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization](https://arxiv.org//abs/2509.20900) ++ [Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization](https://arxiv.org/abs/2509.20900) Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch -+ [Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models](https://arxiv.org//abs/2509.21155) ++ [Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models](https://arxiv.org/abs/2509.21155) Chantal Shaib, Vinith M. Suriyakumar, Levent Sagun, Byron C. Wallace, Marzyeh Ghassemi -+ [GEP: A GCG-Based method for extracting personally identifiable information from chatbots built on small language models](https://arxiv.org//abs/2509.21192) ++ [GEP: A GCG-Based method for extracting personally identifiable information from chatbots built on small language models](https://arxiv.org/abs/2509.21192) Jieli Zhu, Vi Ngoc-Nha Tran -+ [Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond](https://arxiv.org//abs/2509.21284) ++ [Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond](https://arxiv.org/abs/2509.21284) Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng -+ [Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation](https://arxiv.org//abs/2509.20680) ++ [Can Federated Learning Safeguard Private Data in LLM Training? 
Vulnerabilities, Attacks, and Defense Evaluation](https://arxiv.org/abs/2509.20680) Wenkai Guo, Xuefeng Liu, Haolin Wang, Jianwei Niu, Shaojie Tang, Jing Yuan -+ [CLUE: Conflict-guided Localization for LLM Unlearning Framework](https://arxiv.org//abs/2509.20977) ++ [CLUE: Conflict-guided Localization for LLM Unlearning Framework](https://arxiv.org/abs/2509.20977) Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang -+ [Poisoning Prompt-Guided Sampling in Video Large Language Models](https://arxiv.org//abs/2509.20851) ++ [Poisoning Prompt-Guided Sampling in Video Large Language Models](https://arxiv.org/abs/2509.20851) Yuxin Cao, Wei Song, Jingling Xue, Jin Song Dong -+ [The Unanticipated Asymmetry Between Perceptual Optimization and Assessment](https://arxiv.org//abs/2509.20878) ++ [The Unanticipated Asymmetry Between Perceptual Optimization and Assessment](https://arxiv.org/abs/2509.20878) Jiabei Zhang, Qi Wang, Siyu Wu, Du Chen, Tianhe Wu -+ [A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models](https://arxiv.org//abs/2509.21008) ++ [A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models](https://arxiv.org/abs/2509.21008) Qinqin He, Jiaqi Weng, Jialing Tao, Hui Xue -+ [The Unwinnable Arms Race of AI Image Detection](https://arxiv.org//abs/2509.21135) ++ [The Unwinnable Arms Race of AI Image Detection](https://arxiv.org/abs/2509.21135) Till Aczel, Lorenzo Vettor, Andreas Plesner, Roger Wattenhofer -+ [FERD: Fairness-Enhanced Data-Free Robustness Distillation](https://arxiv.org//abs/2509.20793) ++ [FERD: Fairness-Enhanced Data-Free Robustness Distillation](https://arxiv.org/abs/2509.20793) Zhengxiao Li, Liming Lu, Xu Zheng, Siyuan Liang, Zhenghan Chen, Yongbin Zhou, Shuchao Pang -+ [Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers](https://arxiv.org//abs/2509.21130) ++ [Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers](https://arxiv.org/abs/2509.21130) Killian Steunou, Sigurd Saue, Théo Druilhe -+ [The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures](https://arxiv.org//abs/2509.20736) ++ [The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures](https://arxiv.org/abs/2509.20736) Zhenshan Zhang, Xueping Zhang, Yechen Wang, Liwei Jin, Ming Li -+ [FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction](https://arxiv.org//abs/2509.21029) ++ [FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction](https://arxiv.org/abs/2509.21029) Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Philip Torr, Adel Bibi, Tongliang Liu -+ [EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense](https://arxiv.org//abs/2509.21129) ++ [EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense](https://arxiv.org/abs/2509.21129) Wei Huang, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin -+ [Optimal Robust Recourse with $L^p$-Bounded Model Change](https://arxiv.org//abs/2509.21293) ++ [Optimal Robust Recourse with $L^p$-Bounded Model Change](https://arxiv.org/abs/2509.21293) Phone Kyaw, Kshitij Kayastha, Shahin Jabbari -+ [Cryptographic Backdoor for Neural Networks: Boon and Bane](https://arxiv.org//abs/2509.20714) ++ [Cryptographic Backdoor for Neural Networks: Boon and Bane](https://arxiv.org/abs/2509.20714) Anh Tu Ngo, Anupam Chattopadhyay, Subhamoy Maitra -+ [Are Modern Speech Enhancement Systems 
Vulnerable to Adversarial Attacks?](https://arxiv.org//abs/2509.21087) ++ [Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?](https://arxiv.org/abs/2509.21087) Rostislav Makarov, Lea Schönherr, Timo Gerkmann -+ [RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks](https://arxiv.org//abs/2509.20924) ++ [RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks](https://arxiv.org/abs/2509.20924) Hanbo Huang, Yiran Zhang, Hao Zheng, Xuan Gong, Yihan Li, Lin Liu, Shiyu Liang -+ [TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning](https://arxiv.org//abs/2509.21526) ++ [TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning](https://arxiv.org/abs/2509.21526) Hongyang He, Xinyuan Song, Yangfan He, Zeyu Zhang, Yanshu Li, Haochen You, Lifan Sun, Wenqiao Zhang -+ [Wav2Arrest 2.0: Long-Horizon Cardiac Arrest Prediction with Time-to-Event Modeling, Identity-Invariance, and Pseudo-Lab Alignment](https://arxiv.org//abs/2509.21695) ++ [Wav2Arrest 2.0: Long-Horizon Cardiac Arrest Prediction with Time-to-Event Modeling, Identity-Invariance, and Pseudo-Lab Alignment](https://arxiv.org/abs/2509.21695) Saurabh Kataria, Davood Fattahi, Minxiao Wang, Ran Xiao, Matthew Clark, Timothy Ruchti, Mark Mai, Xiao Hu -+ [Functional Encryption in Secure Neural Network Training: Data Leakage and Practical Mitigations](https://arxiv.org//abs/2509.21497) ++ [Functional Encryption in Secure Neural Network Training: Data Leakage and Practical Mitigations](https://arxiv.org/abs/2509.21497) Alexandru Ioniţă, Andreea Ioniţă -+ [HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech](https://arxiv.org//abs/2509.21676) ++ [HuLA: Prosody-Aware Anti-Spoofing with Multi-Task Learning for Expressive and Emotional Synthetic Speech](https://arxiv.org/abs/2509.21676) Aurosweta Mahapatra, Ismail Rasim Ulgen, Berrak Sisman -+ [Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks](https://arxiv.org//abs/2509.22732) ++ [Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks](https://arxiv.org/abs/2509.22732) Haibo Tong, Dongcheng Zhao, Guobin Shen, Xiang He, Dachuan Lin, Feifei Zhao, Yi Zeng -+ [Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models](https://arxiv.org//abs/2509.22723) ++ [Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models](https://arxiv.org/abs/2509.22723) Kang Wei, Xin Yuan, Fushuo Huo, Chuan Ma, Long Yuan, Songze Li, Ming Ding, Dacheng Tao # 2025-09-24 -+ [LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation](https://arxiv.org//abs/2509.19839) ++ [LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation](https://arxiv.org/abs/2509.19839) Huizhen Shu, Xuying Li, Zhuo Li -+ [CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain](https://arxiv.org//abs/2509.19925) ++ [CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain](https://arxiv.org/abs/2509.19925) Ajeet Kumar Singh, Rajsabi Surya, Anurag Tripathi, Santanu Choudhury, Sudhir Bisane -+ [Steerable Adversarial Scenario Generation through Test-Time Preference Alignment](https://arxiv.org//abs/2509.20102) ++ [Steerable Adversarial Scenario Generation through Test-Time Preference 
Alignment](https://arxiv.org/abs/2509.20102) Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun -+ [bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs](https://arxiv.org//abs/2509.19775) ++ [bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs](https://arxiv.org/abs/2509.19775) Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang Wang, Xiangnan He -+ [A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers](https://arxiv.org//abs/2509.19947) ++ [A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers](https://arxiv.org/abs/2509.19947) Zhixiao Wu, Yao Lu, Jie Wen, Hao Sun, Qi Zhou, Guangming Lu -+ [Generative Adversarial Networks Applied for Privacy Preservation in Biometric-Based Authentication and Identification](https://arxiv.org//abs/2509.20024) ++ [Generative Adversarial Networks Applied for Privacy Preservation in Biometric-Based Authentication and Identification](https://arxiv.org/abs/2509.20024) Lubos Mjachky, Ivan Homoliak -+ [Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization](https://arxiv.org//abs/2509.20230) ++ [Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization](https://arxiv.org/abs/2509.20230) Wenhan Wu, Zheyuan Liu, Chongyang Gao, Ren Wang, Kaize Ding -+ [RAG Security and Privacy: Formalizing the Threat Model and Attack Surface](https://arxiv.org//abs/2509.20324) ++ [RAG Security and Privacy: Formalizing the Threat Model and Attack Surface](https://arxiv.org/abs/2509.20324) Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi, Kaushik Dutta -+ [Benchmarking Gaslighting Attacks Against Speech Large Language Models](https://arxiv.org//abs/2509.19858) ++ [Benchmarking Gaslighting Attacks Against Speech Large Language Models](https://arxiv.org/abs/2509.19858) Jinyang Wu, Bin Zhu, Xiandong Zou, Qiquan Zhang, Xu Fang, Pan Zhou -+ [BiTAA: A Bi-Task Adversarial Attack for Object Detection and Depth Estimation via 3D Gaussian Splatting](https://arxiv.org//abs/2509.19793) ++ [BiTAA: A Bi-Task Adversarial Attack for Object Detection and Depth Estimation via 3D Gaussian Splatting](https://arxiv.org/abs/2509.19793) Yixun Zhang, Feng Zhou, Jianqin Yin -+ [FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models](https://arxiv.org//abs/2509.19870) ++ [FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models](https://arxiv.org/abs/2509.19870) Xin Wang, Jie Li, Zejia Weng, Yixu Wang, Yifeng Gao, Tianyu Pang, Chao Du, Yan Teng, Yingchun Wang, Zuxuan Wu, Xingjun Ma, Yu-Gang Jiang -+ [Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models](https://arxiv.org//abs/2509.19994) ++ [Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models](https://arxiv.org/abs/2509.19994) Zhifang Zhang, Jiahan Zhang, Shengjie Zhou, Qi Wei, Shuo He, Feng Liu, Lei Feng -+ [Does the Manipulation Process Matter? RITA: Reasoning Composite Image Manipulations via Reversely-Ordered Incremental-Transition Autoregression](https://arxiv.org//abs/2509.20006) ++ [Does the Manipulation Process Matter? 
RITA: Reasoning Composite Image Manipulations via Reversely-Ordered Incremental-Transition Autoregression](https://arxiv.org/abs/2509.20006) Xuekang Zhu, Ji-Zhe Zhou, Kaiwen Feng, Chenfan Qu, Yunfei Wang, Liting Zhou, Jian liu -+ [Smaller is Better: Enhancing Transparency in Vehicle AI Systems via Pruning](https://arxiv.org//abs/2509.20148) ++ [Smaller is Better: Enhancing Transparency in Vehicle AI Systems via Pruning](https://arxiv.org/abs/2509.20148) Sanish Suwal, Shaurya Garg, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi -+ [Universal Camouflage Attack on Vision-Language Models for Autonomous Driving](https://arxiv.org//abs/2509.20196) ++ [Universal Camouflage Attack on Vision-Language Models for Autonomous Driving](https://arxiv.org/abs/2509.20196) Dehong Kong, Sifan Yu, Siyuan Liang, Jiawei Liang, Jianhou Gan, Aishan Liu, Wenqi Ren -+ [Consistent Estimation of Numerical Distributions under Local Differential Privacy by Wavelet Expansion](https://arxiv.org//abs/2509.19661) ++ [Consistent Estimation of Numerical Distributions under Local Differential Privacy by Wavelet Expansion](https://arxiv.org/abs/2509.19661) Puning Zhao, Zhikun Zhang, Bo Sun, Li Shen, Liang Zhang, Shaowei Wang, Zhe Liu -+ [On the Fragility of Contribution Score Computation in Federated Learning](https://arxiv.org//abs/2509.19921) ++ [On the Fragility of Contribution Score Computation in Federated Learning](https://arxiv.org/abs/2509.19921) Balazs Pejo, Marcell Frank, Krisztian Varga, Peter Veliczky -+ [Generative Model Inversion Through the Lens of the Manifold Hypothesis](https://arxiv.org//abs/2509.20177) ++ [Generative Model Inversion Through the Lens of the Manifold Hypothesis](https://arxiv.org/abs/2509.20177) Xiong Peng, Bo Han, Fengfei Yu, Tongliang Liu, Feng Liu, Mingyuan Zhou -+ [Staying on the Manifold: Geometry-Aware Noise Injection](https://arxiv.org//abs/2509.20201) ++ [Staying on the Manifold: Geometry-Aware Noise Injection](https://arxiv.org/abs/2509.20201) Albert Kjøller Jacobsen, Johanna Marie Gegenfurtner, Georgios Arvanitidis -+ [Monitoring Violations of Differential Privacy over Time](https://arxiv.org//abs/2509.20283) ++ [Monitoring Violations of Differential Privacy over Time](https://arxiv.org/abs/2509.20283) Önder Askin, Tim Kutta, Holger Dette -+ [FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems](https://arxiv.org//abs/2509.20362) ++ [FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems](https://arxiv.org/abs/2509.20362) Shaoyuan Xie, Mohamad Habib Fakih, Junchi Lu, Fayzah Alshammari, Ningfei Wang, Takami Sato, Halima Bouzidi, Mohammad Abdullah Al Faruque, Qi Alfred Chen -+ [Are Neural Networks Collision Resistant?](https://arxiv.org//abs/2509.20262) ++ [Are Neural Networks Collision Resistant?](https://arxiv.org/abs/2509.20262) Marco Benedetti, Andrej Bogdanov, Enrico M. Malatesta, Marc Mézard, Gianmarco Perrupato, Alon Rosen, Nikolaj I. 
Schwartzbach, Riccardo Zecchina -+ [Adversarial Defense in Cybersecurity: A Systematic Review of GANs for Threat Detection and Mitigation](https://arxiv.org//abs/2509.20411) ++ [Adversarial Defense in Cybersecurity: A Systematic Review of GANs for Threat Detection and Mitigation](https://arxiv.org/abs/2509.20411) Tharcisse Ndayipfukamiye, Jianguo Ding, Doreen Sebastian Sarwatt, Adamu Gaston Philipo, Huansheng Ning -+ [Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits](https://arxiv.org//abs/2509.20549) ++ [Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits](https://arxiv.org/abs/2509.20549) Weixin Chen, Han Zhao -+ [Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation](https://arxiv.org//abs/2509.20553) ++ [Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation](https://arxiv.org/abs/2509.20553) Yiren Liu, Viraj Shah, Sangho Suh, Pao Siangliulue, Tal August, Yun Huang -+ [Every Character Counts: From Vulnerability to Defense in Phishing Detection](https://arxiv.org//abs/2509.20589) ++ [Every Character Counts: From Vulnerability to Defense in Phishing Detection](https://arxiv.org/abs/2509.20589) Maria Chiper, Radu Tudor Ionescu -+ [Bridging Privacy and Utility: Synthesizing anonymized EEG with constraining utility functions](https://arxiv.org//abs/2509.20454) ++ [Bridging Privacy and Utility: Synthesizing anonymized EEG with constraining utility functions](https://arxiv.org/abs/2509.20454) Kay Fuhrmeister, Arne Pelzer, Fabian Radke, Julia Lechinger, Mahzad Gharleghi, Thomas Köllmer, Insa Wolf -+ [Efficiently Attacking Memorization Scores](https://arxiv.org//abs/2509.20463) ++ [Efficiently Attacking Memorization Scores](https://arxiv.org/abs/2509.20463) Tue Do, Varun Chandrasekaran, Daniel Alabi -+ [Differential Privacy of Network Parameters from a System Identification Perspective](https://arxiv.org//abs/2509.20460) ++ [Differential Privacy of Network Parameters from a System Identification Perspective](https://arxiv.org/abs/2509.20460) Andrew Campbell, Anna Scaglione, Hang Liu, Victor Elvira, Sean Peisert, Daniel Arnold -+ [Advancing Practical Homomorphic Encryption for Federated Learning: Theoretical Guarantees and Efficiency Optimizations](https://arxiv.org//abs/2509.20476) ++ [Advancing Practical Homomorphic Encryption for Federated Learning: Theoretical Guarantees and Efficiency Optimizations](https://arxiv.org/abs/2509.20476) Ren-Yi Huang, Dumindu Samaraweera, Prashant Shekhar, J. Morris Chang -+ [JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation](https://arxiv.org//abs/2509.21401) ++ [JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation](https://arxiv.org/abs/2509.21401) Md Jueal Mia, M. 
Hadi Amini -+ [Dynamic Dual-level Defense Routing for Continual Adversarial Training](https://arxiv.org//abs/2509.21392) ++ [Dynamic Dual-level Defense Routing for Continual Adversarial Training](https://arxiv.org/abs/2509.21392) Wenxuan Wang, Chenglei Wang, Xuelin Qian -+ [SafeSteer: Adaptive Subspace Steering for Efficient Jailbreak Defense in Vision-Language Models](https://arxiv.org//abs/2509.21400) ++ [SafeSteer: Adaptive Subspace Steering for Efficient Jailbreak Defense in Vision-Language Models](https://arxiv.org/abs/2509.21400) Xiyu Zeng, Siyuan Liang, Liming Lu, Haotian Zhu, Enguang Liu, Jisheng Dang, Yongbin Zhou, Shuchao Pang -+ [Large Language Models for Real-World IoT Device Identification](https://arxiv.org//abs/2510.13817) ++ [Large Language Models for Real-World IoT Device Identification](https://arxiv.org/abs/2510.13817) Rameen Mahmood, Tousif Ahmed, Sai Teja Peddinti, Danny Yuxing Huang # 2025-09-23 -+ [TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation](https://arxiv.org//abs/2509.19638) ++ [TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation](https://arxiv.org/abs/2509.19638) MohammadReza EskandariNasab, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi -+ [The Pareto Frontier of Resilient Jet Tagging](https://arxiv.org//abs/2509.19431) ++ [The Pareto Frontier of Resilient Jet Tagging](https://arxiv.org/abs/2509.19431) Rikab Gambhir, Matt LeBlanc, Yuanchen Zhou -+ [Stochastic Path Planning in Correlated Obstacle Fields](https://arxiv.org//abs/2509.19559) ++ [Stochastic Path Planning in Correlated Obstacle Fields](https://arxiv.org/abs/2509.19559) Li Zhou, Elvan Ceyhan -+ [Improving Credit Card Fraud Detection through Transformer-Enhanced GAN Oversampling](https://arxiv.org//abs/2509.19032) ++ [Improving Credit Card Fraud Detection through Transformer-Enhanced GAN Oversampling](https://arxiv.org/abs/2509.19032) Kashaf Ul Emaan -+ [The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind](https://arxiv.org//abs/2509.20393) ++ [The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind](https://arxiv.org/abs/2509.20393) Caleb DeLeeuw, Gaurav Chawla, Aniket Sharma, Vanessa Dietze -+ [Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry](https://arxiv.org//abs/2509.20399) ++ [Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry](https://arxiv.org/abs/2509.20399) Birk Torpmann-Hagen, Michael A. 
Riegler, Pål Halvorsen, Dag Johansen -+ [Why Speech Deepfake Detectors Won't Generalize: The Limits of Detection in an Open World](https://arxiv.org//abs/2509.20405) ++ [Why Speech Deepfake Detectors Won't Generalize: The Limits of Detection in an Open World](https://arxiv.org/abs/2509.20405) Visar Berisha, Prad Kadambi, Isabella Lenz -+ [SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders](https://arxiv.org//abs/2509.21379) ++ [SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders](https://arxiv.org/abs/2509.21379) Enrico Cassano, Riccardo Renzulli, Marco Nurisso, Mirko Zaffaroni, Alan Perotti, Marco Grangetto -+ [Localizing Adversarial Attacks To Produces More Imperceptible Noise](https://arxiv.org//abs/2509.22710) ++ [Localizing Adversarial Attacks To Produces More Imperceptible Noise](https://arxiv.org/abs/2509.22710) Pavan Reddy, Aditya Sanjay Gujral -+ [Diversity Boosts AI-Generated Text Detection](https://arxiv.org//abs/2509.18880) ++ [Diversity Boosts AI-Generated Text Detection](https://arxiv.org/abs/2509.18880) Advik Raj Basani, Pin-Yu Chen -+ [Uncovering Privacy Vulnerabilities through Analytical Gradient Inversion Attacks](https://arxiv.org//abs/2509.18871) ++ [Uncovering Privacy Vulnerabilities through Analytical Gradient Inversion Attacks](https://arxiv.org/abs/2509.18871) Tamer Ahmed Eltaras, Qutaibah Malluhi, Alessandro Savino, Stefano Di Carlo, Adnan Qayyum @@ -2237,121 +2237,121 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Joachim Diederich -+ [DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces](https://arxiv.org//abs/2509.19230) ++ [DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces](https://arxiv.org/abs/2509.19230) Tianshuo Zhang, Li Gao, Siran Peng, Xiangyu Zhu, Zhen Lei # 2025-09-22 -+ [Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem](https://arxiv.org//abs/2509.17550) ++ [Is It Certainly a Deepfake? 
Reliability Analysis in Detection & Generation Ecosystem](https://arxiv.org/abs/2509.17550) Neslihan Kose, Anthony Rhodes, Umur Aybars Ciftci, Ilke Demir -+ [Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR](https://arxiv.org//abs/2509.17413) ++ [Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR](https://arxiv.org/abs/2509.17413) Masako Kishida -+ [Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents](https://arxiv.org//abs/2509.17488) ++ [Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents](https://arxiv.org/abs/2509.17488) Shouju Wang, Fenglin Yu, Xirui Liu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan -+ [Hybrid Reputation Aggregation: A Robust Defense Mechanism for Adversarial Federated Learning in 5G and Edge Network Environments](https://arxiv.org//abs/2509.18044) ++ [Hybrid Reputation Aggregation: A Robust Defense Mechanism for Adversarial Federated Learning in 5G and Edge Network Environments](https://arxiv.org/abs/2509.18044) Saeid Sheikhi, Panos Kostakos, Lauri Loven -+ [Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM](https://arxiv.org//abs/2509.18058) ++ [Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM](https://arxiv.org/abs/2509.18058) Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping -+ [D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models](https://arxiv.org//abs/2509.17938) ++ [D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models](https://arxiv.org/abs/2509.17938) Satyapriya Krishna, Andy Zou, Rahul Gupta, Eliot Krzysztof Jones, Nick Winter, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson, Spyros Matsoukas -+ [An Unlearning Framework for Continual Learning](https://arxiv.org//abs/2509.17530) ++ [An Unlearning Framework for Continual Learning](https://arxiv.org/abs/2509.17530) Sayanta Adhikari, Vishnuprasadh Kumaravelu, P. K. 
Srijith -+ [Budgeted Adversarial Attack against Graph-Based Anomaly Detection in Sensor Networks](https://arxiv.org//abs/2509.17987) ++ [Budgeted Adversarial Attack against Graph-Based Anomaly Detection in Sensor Networks](https://arxiv.org/abs/2509.17987) Sanju Xaviar, Omid Ardakanian -+ [SilentStriker:Toward Stealthy Bit-Flip Attacks on Large Language Models](https://arxiv.org//abs/2509.17371) ++ [SilentStriker:Toward Stealthy Bit-Flip Attacks on Large Language Models](https://arxiv.org/abs/2509.17371) Haotian Xu, Qingsong Peng, Jie Shi, Huadi Zheng, Yu Li, Cheng Zhuo -+ [Lipschitz-Based Robustness Certification for Recurrent Neural Networks via Convex Relaxation](https://arxiv.org//abs/2509.17898) ++ [Lipschitz-Based Robustness Certification for Recurrent Neural Networks via Convex Relaxation](https://arxiv.org/abs/2509.17898) Paul Hamelbeck, Johannes Schiffer -+ [Shilling Recommender Systems by Generating Side-feature-aware Fake User Profiles](https://arxiv.org//abs/2509.17918) ++ [Shilling Recommender Systems by Generating Side-feature-aware Fake User Profiles](https://arxiv.org/abs/2509.17918) Yuanrong Wang, Yingpeng Du -+ [TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion](https://arxiv.org//abs/2509.17302) ++ [TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion](https://arxiv.org/abs/2509.17302) Duoxun Tang, Xinhang Jiang, Jiajun Niu -+ [B-Privacy: Defining and Enforcing Privacy in Weighted Voting](https://arxiv.org//abs/2509.17871) ++ [B-Privacy: Defining and Enforcing Privacy in Weighted Voting](https://arxiv.org/abs/2509.17871) Samuel Breckenridge, Dani Vilardell, Andrés Fábrega, Amy Zhao, Patrick McCorry, Rafael Solari, Ari Juels -+ [Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis](https://arxiv.org//abs/2509.18014) ++ [Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis](https://arxiv.org/abs/2509.18014) Joshua Ward, Xiaofeng Lin, Chi-Hua Wang, Guang Cheng -+ [Quickest Change Detection in Continuous-Time in Presence of a Covert Adversary](https://arxiv.org//abs/2509.17778) ++ [Quickest Change Detection in Continuous-Time in Presence of a Covert Adversary](https://arxiv.org/abs/2509.17778) Amir Reza Ramtin, Philippe Nain, Don Towsley -+ [Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks -- A Case Study of Hsinchu, Taiwan](https://arxiv.org//abs/2509.21367) ++ [Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks -- A Case Study of Hsinchu, Taiwan](https://arxiv.org/abs/2509.21367) Yu-Kai Shih, You-Kai Kang # 2025-09-21 -+ [Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B](https://arxiv.org//abs/2509.17259) ++ [Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B](https://arxiv.org/abs/2509.17259) Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, Philip Treleaven -+ [AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software](https://arxiv.org//abs/2509.16861) ++ [AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software](https://arxiv.org/abs/2509.16861) Rui Yang, Michael Fu, Chakkrit Tantithamthavorn, Chetan Arora, Gunel Gulmammadova, Joey Chua -+ [Learning from Gene Names, Expression Values and Images: Contrastive 
Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning](https://arxiv.org//abs/2509.16892) ++ [Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning](https://arxiv.org/abs/2509.16892) Jiahe Qian, Yaoyu Fang, Ziqiao Weng, Xinkun Wang, Lee A. Cooper, Bo Zhou -+ [Localizing Malicious Outputs from CodeLLM](https://arxiv.org//abs/2509.17070) ++ [Localizing Malicious Outputs from CodeLLM](https://arxiv.org/abs/2509.17070) Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan, Sudipta Chattopadhyay -+ [SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions](https://arxiv.org//abs/2509.17091) ++ [SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions](https://arxiv.org/abs/2509.17091) Massa Baali, Sarthak Bisht, Francisco Teixeira, Kateryna Shapovalenko, Rita Singh, Bhiksha Raj -+ [TraceHiding: Scalable Machine Unlearning for Mobility Data](https://arxiv.org//abs/2509.17241) ++ [TraceHiding: Scalable Machine Unlearning for Mobility Data](https://arxiv.org/abs/2509.17241) Ali Faraji, Manos Papagelis -+ [Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving](https://arxiv.org//abs/2509.16950) ++ [Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving](https://arxiv.org/abs/2509.16950) Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang -+ [Seeing is Deceiving: Mirror-Based LiDAR Spoofing for Autonomous Vehicle Deception](https://arxiv.org//abs/2509.17253) ++ [Seeing is Deceiving: Mirror-Based LiDAR Spoofing for Autonomous Vehicle Deception](https://arxiv.org/abs/2509.17253) Selma Yahia, Ildi Alla, Girija Bangalore Mohan, Daniel Rau, Mridula Singh, Valeria Loscri -+ [DecipherGuard: Understanding and Deciphering Jailbreak Prompts for a Safer Deployment of Intelligent Software Systems](https://arxiv.org//abs/2509.16870) ++ [DecipherGuard: Understanding and Deciphering Jailbreak Prompts for a Safer Deployment of Intelligent Software Systems](https://arxiv.org/abs/2509.16870) Rui Yang, Michael Fu, Chakkrit Tantithamthavorn, Chetan Arora, Gunel Gulmammadova, Joey Chua -+ [Lightweight MobileNetV1+GRU for ECG Biometric Authentication: Federated and Adversarial Evaluation](https://arxiv.org//abs/2509.20382) ++ [Lightweight MobileNetV1+GRU for ECG Biometric Authentication: Federated and Adversarial Evaluation](https://arxiv.org/abs/2509.20382) Dilli Hang Rai, Sabin Kafley -+ [MARS: A Malignity-Aware Backdoor Defense in Federated Learning](https://arxiv.org//abs/2509.20383) ++ [MARS: A Malignity-Aware Backdoor Defense in Federated Learning](https://arxiv.org/abs/2509.20383) Wei Wan, Yuxuan Ning, Zhicong Huang, Cheng Hong, Shengshan Hu, Ziqi Zhou, Yechao Zhang, Tianqing Zhu, Wanlei Zhou, Leo Yu Zhang -+ [Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models](https://arxiv.org//abs/2509.21360) ++ [Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models](https://arxiv.org/abs/2509.21360) Xingkai Peng, Jun Jiang, Meng Tong, Shuai Li, Weiming Zhang, Nenghai Yu, Kejiang Chen @@ -2360,658 +2360,658 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang # 2025-09-20 -+ [Can an Individual Manipulate the 
Collective Decisions of Multi-Agents?](https://arxiv.org//abs/2509.16494) ++ [Can an Individual Manipulate the Collective Decisions of Multi-Agents?](https://arxiv.org/abs/2509.16494) Fengyuan Liu, Rui Zhao, Shuo Chen, Guohao Li, Philip Torr, Lei Han, Jindong Gu -+ [Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks](https://arxiv.org//abs/2509.16546) ++ [Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks](https://arxiv.org/abs/2509.16546) Ashley Kurian, Aydin Aysu -+ [V-CECE: Visual Counterfactual Explanations via Conceptual Edits](https://arxiv.org//abs/2509.16567) ++ [V-CECE: Visual Counterfactual Explanations via Conceptual Edits](https://arxiv.org/abs/2509.16567) Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou -+ [FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection](https://arxiv.org//abs/2509.16602) ++ [FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection](https://arxiv.org/abs/2509.16602) Minji Heo, Simon S. Woo -+ [MoRoVoc: A Large Dataset for Geographical Variation Identification of the Spoken Romanian Language](https://arxiv.org//abs/2509.16781) ++ [MoRoVoc: A Large Dataset for Geographical Variation Identification of the Spoken Romanian Language](https://arxiv.org/abs/2509.16781) Andrei-Marius Avram, Ema-Ioana Bănescu, Anda-Teodora Robea, Dumitru-Clementin Cercel, Mihaela-Claudia Cercel -+ [OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution](https://arxiv.org//abs/2509.16507) ++ [OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution](https://arxiv.org/abs/2509.16507) Hanting Li, Huaao Tang, Jianhong Han, Tianxiong Zhou, Jiulong Cui, Haizhen Xie, Yan Chen, Jie Hu -+ [A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis](https://arxiv.org//abs/2509.16582) ++ [A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis](https://arxiv.org/abs/2509.16582) Antonio Scardace, Lemuel Puglisi, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì -+ [ADVEDM:Fine-grained Adversarial Attack against VLM-based Embodied Agents](https://arxiv.org//abs/2509.16645) ++ [ADVEDM:Fine-grained Adversarial Attack against VLM-based Embodied Agents](https://arxiv.org/abs/2509.16645) Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, Leo Yu Zhang -+ [SOLAR: Switchable Output Layer for Accuracy and Robustness in Once-for-All Training](https://arxiv.org//abs/2509.16833) ++ [SOLAR: Switchable Output Layer for Accuracy and Robustness in Once-for-All Training](https://arxiv.org/abs/2509.16833) Shaharyar Ahmed Khan Tareen, Lei Fan, Xiaojing Yuan, Qin Lin, Bin Hu -+ [FairTune: A Bias-Aware Fine-Tuning Framework Towards Fair Heart Rate Prediction from PPG](https://arxiv.org//abs/2509.16491) ++ [FairTune: A Bias-Aware Fine-Tuning Framework Towards Fair Heart Rate Prediction from PPG](https://arxiv.org/abs/2509.16491) Lovely Yeswanth Panchumarthi, Saurabh Kataria, Yi Wu, Xiao Hu, Alex Fedorov, Hyunjung Gloria Kwak -+ [Delving into Cryptanalytic Extraction of PReLU Neural Networks](https://arxiv.org//abs/2509.16620) ++ [Delving into Cryptanalytic Extraction of PReLU Neural Networks](https://arxiv.org/abs/2509.16620) Yi Chen, Xiaoyang Dong, Ruijie Ma, Yantian Shen, Anyu Wang, Hongbo Yu, Xiaoyun Wang -+ ["Digital 
Camouflage": The LLVM Challenge in LLM-Based Malware Detection](https://arxiv.org//abs/2509.16671) ++ ["Digital Camouflage": The LLVM Challenge in LLM-Based Malware Detection](https://arxiv.org/abs/2509.16671) Ekin Böke, Simon Torka # 2025-09-19 -+ [Stress Testing Deliberative Alignment for Anti-Scheming Training](https://arxiv.org//abs/2509.15541) ++ [Stress Testing Deliberative Alignment for Anti-Scheming Training](https://arxiv.org/abs/2509.15541) Bronson Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, Alexander Meinke, Jason Wolfe, Teun van der Weij, Alex Lloyd, Nicholas Goldowsky-Dill, Angela Fan, Andrei Matveiakin, Rusheb Shah, Marcus Williams, Amelia Glaese, Boaz Barak, Wojciech Zaremba, Marius Hobbhahn -+ [Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers](https://arxiv.org//abs/2509.16058) ++ [Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers](https://arxiv.org/abs/2509.16058) Krati Saxena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb -+ [Reward Hacking Mitigation using Verifiable Composite Rewards](https://arxiv.org//abs/2509.15557) ++ [Reward Hacking Mitigation using Verifiable Composite Rewards](https://arxiv.org/abs/2509.15557) Mirza Farhan Bin Tarek, Rahmatollah Beheshti -+ [Robust Vision-Language Models via Tensor Decomposition: A Defense Against Adversarial Attacks](https://arxiv.org//abs/2509.16163) ++ [Robust Vision-Language Models via Tensor Decomposition: A Defense Against Adversarial Attacks](https://arxiv.org/abs/2509.16163) Het Patel, Muzammil Allie, Qian Zhang, Jia Chen, Evangelos E. Papalexakis -+ [DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm](https://arxiv.org//abs/2509.15550) ++ [DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm](https://arxiv.org/abs/2509.15550) Xiaowei Zhu, Yubing Ren, Fang Fang, Qingfeng Tan, Shi Wang, Yanan Cao -+ [Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models](https://arxiv.org//abs/2509.15631) ++ [Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models](https://arxiv.org/abs/2509.15631) Tomoya Yamashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara -+ [SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection](https://arxiv.org//abs/2509.16060) ++ [SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection](https://arxiv.org/abs/2509.16060) Maithili Joshi, Palash Nandi, Tanmoy Chakraborty -+ [Backdoor Mitigation via Invertible Pruning Masks](https://arxiv.org//abs/2509.15497) ++ [Backdoor Mitigation via Invertible Pruning Masks](https://arxiv.org/abs/2509.15497) Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak -+ [PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors](https://arxiv.org//abs/2509.15551) ++ [PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors](https://arxiv.org/abs/2509.15551) Sepehr Dehdashtian, Mashrur M. Morshed, Jacob H. 
Seidman, Gaurav Bharaj, Vishnu Naresh Boddeti -+ [Adversarial Graph Fusion for Incomplete Multi-view Semi-supervised Learning with Tensorial Imputation](https://arxiv.org//abs/2509.15955) ++ [Adversarial Graph Fusion for Incomplete Multi-view Semi-supervised Learning with Tensorial Imputation](https://arxiv.org/abs/2509.15955) Zhangqi Jiang, Tingjin Luo, Xu Yang, Xinyan Liang -+ [Randomized Smoothing Meets Vision-Language Models](https://arxiv.org//abs/2509.16088) ++ [Randomized Smoothing Meets Vision-Language Models](https://arxiv.org/abs/2509.16088) Emmanouil Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng -+ [Inverting Trojans in LLMs](https://arxiv.org//abs/2509.16203) ++ [Inverting Trojans in LLMs](https://arxiv.org/abs/2509.16203) Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis -+ [Adversarially Robust Assembly Language Model for Packed Executables Detection](https://arxiv.org//abs/2509.15499) ++ [Adversarially Robust Assembly Language Model for Packed Executables Detection](https://arxiv.org/abs/2509.15499) Shijia Li, Jiang Ming, Lanqing Liu, Longwei Yang, Ni Zhang, Chunfu Jia -+ [Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE](https://arxiv.org//abs/2509.15572) ++ [Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE](https://arxiv.org/abs/2509.15572) Xinpeng Liu, Junming Liu, Peiyu Liu, Han Zheng, Qinying Wang, Mathias Payer, Shouling Ji, Wenhai Wang -+ [Inference Attacks on Encrypted Online Voting via Traffic Analysis](https://arxiv.org//abs/2509.15694) ++ [Inference Attacks on Encrypted Online Voting via Traffic Analysis](https://arxiv.org/abs/2509.15694) Anastasiia Belousova, Francesco Marchiori, Mauro Conti -+ [An Adversarial Robust Behavior Sequence Anomaly Detection Approach Based on Critical Behavior Unit Learning](https://arxiv.org//abs/2509.15756) ++ [An Adversarial Robust Behavior Sequence Anomaly Detection Approach Based on Critical Behavior Unit Learning](https://arxiv.org/abs/2509.15756) Dongyang Zhan, Kai Tan, Lin Ye, Xiangzhan Yu, Hongli Zhang, Zheng He -+ [Secure Confidential Business Information When Sharing Machine Learning Models](https://arxiv.org//abs/2509.16352) ++ [Secure Confidential Business Information When Sharing Machine Learning Models](https://arxiv.org/abs/2509.16352) Yunfan Yang, Jiarong Xu, Hongzhe Zhang, Xiao Fang -+ [Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning](https://arxiv.org//abs/2509.16422) ++ [Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning](https://arxiv.org/abs/2509.16422) Tom Mackintosh, Harish Tayyar Madabushi, Claire Bonial -+ [Overfitting in Adaptive Robust Optimization](https://arxiv.org//abs/2509.16451) ++ [Overfitting in Adaptive Robust Optimization](https://arxiv.org/abs/2509.16451) Karl Zhu, Dimitris Bertsimas -+ [EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs](https://arxiv.org//abs/2509.15735) ++ [EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs](https://arxiv.org/abs/2509.15735) Davide Ettori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi # 2025-09-18 -+ [SynBench: A Benchmark for Differentially Private Text Generation](https://arxiv.org//abs/2509.14594) ++ [SynBench: A Benchmark for Differentially Private Text Generation](https://arxiv.org/abs/2509.14594) Yidan Sun, Viktor Schlegel, 
Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Yulong Wu, Hao Li, Jie Zhang, Warren Del-Pinto, Goran Nenadic, Siew Kei Lam, Anil Anthony Bharath -+ [Enhancing Retrieval Augmentation via Adversarial Collaboration](https://arxiv.org//abs/2509.14750) ++ [Enhancing Retrieval Augmentation via Adversarial Collaboration](https://arxiv.org/abs/2509.14750) Letian Zhang, Guanghao Meng, Xudong Ren, Yiming Wang, Shu-Tao Xia -+ [Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems](https://arxiv.org//abs/2509.14956) ++ [Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems](https://arxiv.org/abs/2509.14956) Diego Gosmar, Deborah A. Dahl -+ [LLM Jailbreak Detection for (Almost) Free!](https://arxiv.org//abs/2509.14558) ++ [LLM Jailbreak Detection for (Almost) Free!](https://arxiv.org/abs/2509.14558) Guorui Chen, Yifan Xia, Xiaojun Jia, Zhijiang Li, Philip Torr, Jindong Gu -+ [Enterprise AI Must Enforce Participant-Aware Access Control](https://arxiv.org//abs/2509.14608) ++ [Enterprise AI Must Enforce Participant-Aware Access Control](https://arxiv.org/abs/2509.14608) Shashank Shreedhar Bhatt, Tanmay Rajore, Khushboo Aggarwal, Ganesh Ananthanarayanan, Ranveer Chandra, Nishanth Chandran, Suyash Choudhury, Divya Gupta, Emre Kiciman, Sumit Kumar Pandey, Srinath Setty, Rahul Sharma, Teijia Zhao -+ [Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection](https://arxiv.org//abs/2509.14622) ++ [Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection](https://arxiv.org/abs/2509.14622) Yihao Guo, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu -+ [Reveal and Release: Iterative LLM Unlearning with Self-generated Data](https://arxiv.org//abs/2509.14624) ++ [Reveal and Release: Iterative LLM Unlearning with Self-generated Data](https://arxiv.org/abs/2509.14624) Linxi Xie, Xin Teng, Shichang Ke, Hongyi Wen, Shengjie Wang -+ [MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models](https://arxiv.org//abs/2509.14651) ++ [MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models](https://arxiv.org/abs/2509.14651) Siyu Yan, Long Zeng, Xuecheng Wu, Chengcheng Han, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo -+ [[Re] Improving Interpretation Faithfulness for Vision Transformers](https://arxiv.org//abs/2509.14846) ++ [[Re] Improving Interpretation Faithfulness for Vision Transformers](https://arxiv.org/abs/2509.14846) Izabela Kurek, Wojciech Trejter, Stipe Frkovic, Andro Erdelez -+ [Discrete optimal transport is a strong audio adversarial attack](https://arxiv.org//abs/2509.14959) ++ [Discrete optimal transport is a strong audio adversarial attack](https://arxiv.org/abs/2509.14959) Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan -+ [Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning](https://arxiv.org//abs/2509.15103) ++ [Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2509.15103) Simin Li, Zheng Yuwei, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu -+ [Watermarking and Anomaly Detection 
in Machine Learning Models for LORA RF Fingerprinting](https://arxiv.org//abs/2509.15170) ++ [Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting](https://arxiv.org/abs/2509.15170) Aarushi Mahajan, Wayne Burleson -+ [LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models](https://arxiv.org//abs/2509.15218) ++ [LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models](https://arxiv.org/abs/2509.15218) Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu -+ [AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt](https://arxiv.org//abs/2509.15159) ++ [AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt](https://arxiv.org/abs/2509.15159) Saket S. Chaturvedi, Gaurav Bagwe, Lan Zhang, Xiaoyong Yuan -+ [Edge-Aware Normalized Attention for Efficient and Detail-Preserving Single Image Super-Resolution](https://arxiv.org//abs/2509.14550) ++ [Edge-Aware Normalized Attention for Efficient and Detail-Preserving Single Image Super-Resolution](https://arxiv.org/abs/2509.14550) Penghao Rao, Tieyong Zeng -+ [Geometric Image Synchronization with Deep Watermarking](https://arxiv.org//abs/2509.15208) ++ [Geometric Image Synchronization with Deep Watermarking](https://arxiv.org/abs/2509.15208) Pierre Fernandez, Tomáš Souček, Nikola Jovanović, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko -+ [Towards Privacy-Preserving and Heterogeneity-aware Split Federated Learning via Probabilistic Masking](https://arxiv.org//abs/2509.14603) ++ [Towards Privacy-Preserving and Heterogeneity-aware Split Federated Learning via Probabilistic Masking](https://arxiv.org/abs/2509.14603) Xingchen Wang, Feijie Wu, Chenglin Miao, Tianchun Li, Haoyu Hu, Qiming Cao, Jing Gao, Lu Su -+ [CUFG: Curriculum Unlearning Guided by the Forgetting Gradient](https://arxiv.org//abs/2509.14633) ++ [CUFG: Curriculum Unlearning Guided by the Forgetting Gradient](https://arxiv.org/abs/2509.14633) Jiaxing Miao, Liang Hu, Qi Zhang, Lai Zhong Yuan, Usman Naseem -+ [STEP: Structured Training and Evaluation Platform for benchmarking trajectory prediction models](https://arxiv.org//abs/2509.14801) ++ [STEP: Structured Training and Evaluation Platform for benchmarking trajectory prediction models](https://arxiv.org/abs/2509.14801) Julian F. Schumann, Anna Mészáros, Jens Kober, Arkady Zgonnikov -+ [Benefits of Online Tilted Empirical Risk Minimization: A Case Study of Outlier Detection and Robust Regression](https://arxiv.org//abs/2509.15141) ++ [Benefits of Online Tilted Empirical Risk Minimization: A Case Study of Outlier Detection and Robust Regression](https://arxiv.org/abs/2509.15141) Yigit E. 
Yildirim, Samet Demir, Zafer Dogan -+ [Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction](https://arxiv.org//abs/2509.15202) ++ [Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction](https://arxiv.org/abs/2509.15202) Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu -+ [Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems](https://arxiv.org//abs/2509.15213) ++ [Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems](https://arxiv.org/abs/2509.15213) Yicheng Zhang, Zijian Huang, Sophie Chen, Erfan Shayegani, Jiasi Chen, Nael Abu-Ghazaleh -+ [Acoustic Simulation Framework for Multi-channel Replay Speech Detection](https://arxiv.org//abs/2509.14789) ++ [Acoustic Simulation Framework for Multi-channel Replay Speech Detection](https://arxiv.org/abs/2509.14789) Michael Neri, Tuomas Virtanen -+ [ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models](https://arxiv.org//abs/2509.15435) ++ [ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models](https://arxiv.org/abs/2509.15435) Chung-En Johnny Yu, Hsuan-Chih (Neil)Chen, Brian Jalaian, Nathaniel D. Bastian -+ [Impact of Phonetics on Speaker Identity in Adversarial Voice Attack](https://arxiv.org//abs/2509.15437) ++ [Impact of Phonetics on Speaker Identity in Adversarial Voice Attack](https://arxiv.org/abs/2509.15437) Daniyal Kabir Dar, Qiben Yan, Li Xiao, Arun Ross -+ [Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages](https://arxiv.org//abs/2509.15260) ++ [Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages](https://arxiv.org/abs/2509.15260) Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee -+ [Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models](https://arxiv.org//abs/2509.15478) ++ [Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models](https://arxiv.org/abs/2509.15478) Madison Van Doren, Casey Ford, Emily Dix -+ [Stochastic Sample Approximations of (Local) Moduli of Continuity](https://arxiv.org//abs/2509.15368) ++ [Stochastic Sample Approximations of (Local) Moduli of Continuity](https://arxiv.org/abs/2509.15368) Rodion Nazarov, Allen Gehret, Robert Shorten, Jakub Marecek -+ [Adversarial generalization of unfolding (model-based) networks](https://arxiv.org//abs/2509.15370) ++ [Adversarial generalization of unfolding (model-based) networks](https://arxiv.org/abs/2509.15370) Vicky Kouni -+ [Assessing metadata privacy in neuroimaging](https://arxiv.org//abs/2509.15278) ++ [Assessing metadata privacy in neuroimaging](https://arxiv.org/abs/2509.15278) Emilie Kibsgaard, Anita Sue Jwa, Christopher J Markiewicz, David Rodriguez Gonzalez, Judith Sainz Pardo, Russell A. Poldrack, Cyril R. 
Pernet

-+ [Benchmarking and Improving LLM Robustness for Personalized Generation](https://arxiv.org//abs/2509.19358)
++ [Benchmarking and Improving LLM Robustness for Personalized Generation](https://arxiv.org/abs/2509.19358)

Chimaobi Okite, Naihao Deng, Kiran Bodipati, Huaidian Hou, Joyce Chai, Rada Mihalcea

-+ [Semantic Representation Attack against Aligned Large Language Models](https://arxiv.org//abs/2509.19360)
++ [Semantic Representation Attack against Aligned Large Language Models](https://arxiv.org/abs/2509.19360)

Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau

# 2025-09-17

-+ [DSCC-HS: A Dynamic Self-Reinforcing Framework for Hallucination Suppression in Large Language Models](https://arxiv.org//abs/2509.13702)
++ [DSCC-HS: A Dynamic Self-Reinforcing Framework for Hallucination Suppression in Large Language Models](https://arxiv.org/abs/2509.13702)

Xiao Zheng

-+ [Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning](https://arxiv.org//abs/2509.13755)
++ [Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning](https://arxiv.org/abs/2509.13755)

Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo

-+ [Differential Privacy in Federated Learning: Mitigating Inference Attacks with Randomized Response](https://arxiv.org//abs/2509.13987)
++ [Differential Privacy in Federated Learning: Mitigating Inference Attacks with Randomized Response](https://arxiv.org/abs/2509.13987)

Ozer Ozturk, Busra Buyuktanir, Gozde Karatas Baydogmus, Kazim Yildiz

-+ [Privacy-Aware In-Context Learning for Large Language Models](https://arxiv.org//abs/2509.13625)
++ [Privacy-Aware In-Context Learning for Large Language Models](https://arxiv.org/abs/2509.13625)

Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha

-+ [StyleProtect: Safeguarding Artistic Identity in Fine-tuned Diffusion Models](https://arxiv.org//abs/2509.13711)
++ [StyleProtect: Safeguarding Artistic Identity in Fine-tuned Diffusion Models](https://arxiv.org/abs/2509.13711)

Qiuyu Tang, Joshua Krinsky, Aparna Bharati

-+ [Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification](https://arxiv.org//abs/2509.13922)
++ [Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification](https://arxiv.org/abs/2509.13922)

Wenkui Yang, Jie Cao, Junxian Duan, Ran He

-+ [Is GPT-4o mini Blinded by its Own Safety Filters? Exposing the Multimodal-to-Unimodal Bottleneck in Hate Speech Detection](https://arxiv.org//abs/2509.13608)
++ [Is GPT-4o mini Blinded by its Own Safety Filters? Exposing the Multimodal-to-Unimodal Bottleneck in Hate Speech Detection](https://arxiv.org/abs/2509.13608)

Niruthiha Selvanayagam, Ted Kurti

-+ [Secure UAV-assisted Federated Learning: A Digital Twin-Driven Approach with Zero-Knowledge Proofs](https://arxiv.org//abs/2509.13634)
++ [Secure UAV-assisted Federated Learning: A Digital Twin-Driven Approach with Zero-Knowledge Proofs](https://arxiv.org/abs/2509.13634)

Md Bokhtiar Al Zami, Md Raihan Uddin, Dinh C. Nguyen

-+ [ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning](https://arxiv.org//abs/2509.13739)
++ [ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning](https://arxiv.org/abs/2509.13739)

Zihou Wu (1), Yuecheng Li (1), Tianchi Liao (2), Jian Lou (2), Chuan Chen (1) ((1) School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China (2) School of Software Engineering, Sun Yat-sen University, Zhuhai, China)

-+ [Differentially private federated learning for localized control of infectious disease dynamics](https://arxiv.org//abs/2509.14024)
++ [Differentially private federated learning for localized control of infectious disease dynamics](https://arxiv.org/abs/2509.14024)

Raouf Kerkouche, Henrik Zunker, Mario Fritz, Martin J. Kühn

-+ [Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics](https://arxiv.org//abs/2509.14225)
++ [Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics](https://arxiv.org/abs/2509.14225)

Benjamin Sterling, Yousef El-Laham, Mónica F. Bugallo

-+ [Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds](https://arxiv.org//abs/2509.13628)
++ [Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds](https://arxiv.org/abs/2509.13628)

Mert Gürbüzbalaban, Yasa Syed, Necdet Serhat Aybat

-+ [Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation](https://arxiv.org//abs/2509.13772)
++ [Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation](https://arxiv.org/abs/2509.13772)

Baolei Zhang, Haoran Xin, Yuxi Chen, Zhuqing Liu, Biao Yi, Tong Li, Lihai Nie, Zheli Liu, Minghong Fang

-+ [Cybersecurity AI: Humanoid Robots as Attack Vectors](https://arxiv.org//abs/2509.14139)
++ [Cybersecurity AI: Humanoid Robots as Attack Vectors](https://arxiv.org/abs/2509.14139)

Víctor Mayoral-Vilches

-+ [VCBench: Benchmarking LLMs in Venture Capital](https://arxiv.org//abs/2509.14448)
++ [VCBench: Benchmarking LLMs in Venture Capital](https://arxiv.org/abs/2509.14448)

Rick Chen, Joseph Ternasky, Afriyie Samuel Kwesi, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Xianling Mu, Fuat Alican, Yigit Ihlamur

-+ [A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness](https://arxiv.org//abs/2509.14297)
++ [A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness](https://arxiv.org/abs/2509.14297)

Xuan Luo, Yue Wang, Zefeng He, Geng Tu, Jing Li, Ruifeng Xu

-+ [RLBind: Adversarial-Invariant Cross-Modal Alignment for Unified Robust Embeddings](https://arxiv.org//abs/2509.14383)
++ [RLBind: Adversarial-Invariant Cross-Modal Alignment for Unified Robust Embeddings](https://arxiv.org/abs/2509.14383)

Yuhong Lu

# 2025-09-16

-+ [Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning](https://arxiv.org//abs/2509.12958)
++ [Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning](https://arxiv.org/abs/2509.12958)

Bihao Zhan, Jie Zhou, Junsong Li, Yutao Yang, Shilian Chen, Qianjun Pan, Xin Li, Wen Wu, Xingjiao Wu, Qin Chen, Hang Yan, Liang He

-+ [RepIt: Representing Isolated Targets to Steer Language Models](https://arxiv.org//abs/2509.13281)
++ [RepIt: Representing Isolated Targets to Steer Language Models](https://arxiv.org/abs/2509.13281)

Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang

-+ [DisorientLiDAR: Physical Attacks on LiDAR-based Localization](https://arxiv.org//abs/2509.12595)
++ [DisorientLiDAR: Physical Attacks on LiDAR-based Localization](https://arxiv.org/abs/2509.12595)

Yizhen Lao, Yu Zhang, Ziting Wang, Chengbo Wang, Yifei Xue, Wanpeng Shao

-+ [CIARD: Cyclic Iterative Adversarial Robustness Distillation](https://arxiv.org//abs/2509.12633)
++ [CIARD: Cyclic Iterative Adversarial Robustness Distillation](https://arxiv.org/abs/2509.12633)

Liming Lu, Shuchao Pang, Xu Zheng, Xiang Gu, Anan Du, Yunhuai Liu, Yongbin Zhou

-+ [A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs](https://arxiv.org//abs/2509.12649)
++ [A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs](https://arxiv.org/abs/2509.12649)

Kiho Lee, Jungkon Kim, Doowon Kim, Hyoungshick Kim

-+ [Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations](https://arxiv.org//abs/2509.12653)
++ [Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations](https://arxiv.org/abs/2509.12653)

Jinjie Shen, Yaxiong Wang, Lechao Cheng, Nan Pu, Zhun Zhong

-+ [Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models](https://arxiv.org//abs/2509.12724)
++ [Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models](https://arxiv.org/abs/2509.12724)

Yunhan Zhao, Xiang Zheng, Xingjun Ma

-+ [Jailbreaking Large Language Models Through Content Concretization](https://arxiv.org//abs/2509.12937)
++ [Jailbreaking Large Language Models Through Content Concretization](https://arxiv.org/abs/2509.12937)

Johan Wahréus, Ahmed Hussain, Panos Papadimitratos

-+ [Sy-FAR: Symmetry-based Fair Adversarial Robustness](https://arxiv.org//abs/2509.12939)
++ [Sy-FAR: Symmetry-based Fair Adversarial Robustness](https://arxiv.org/abs/2509.12939)

Haneen Najjar, Eyal Ronen, Mahmood Sharif

-+ [MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data](https://arxiv.org//abs/2509.13046)
++ [MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data](https://arxiv.org/abs/2509.13046)

Eyal German, Daniel Samira, Yuval Elovici, Asaf Shabtai

-+ [JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks](https://arxiv.org//abs/2509.13266)
++ [JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks](https://arxiv.org/abs/2509.13266)

Jiahao Zhang, Xiaobing Pei, Zhaokun Zhong, Wenqiang Hao, Zhenghao Tang

-+ [Towards Inclusive Toxic Content Moderation: Addressing Vulnerabilities to Adversarial Attacks in Toxicity Classifiers Tackling LLM-generated Content](https://arxiv.org//abs/2509.12672)
++ [Towards Inclusive Toxic Content Moderation: Addressing Vulnerabilities to Adversarial Attacks in Toxicity Classifiers Tackling LLM-generated Content](https://arxiv.org/abs/2509.12672)

Shaz Furniturewala, Arkaitz Zubiaga

-+ [Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning](https://arxiv.org//abs/2509.13127)
++ [Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning](https://arxiv.org/abs/2509.13127)

Sijia Cui, Shuai Xu, Aiyao He, Yanna Wang, Bo Xu

-+ [Do Natural Language Descriptions of Model Activations Convey Privileged Information?](https://arxiv.org//abs/2509.13316)
++ [Do Natural Language Descriptions of Model Activations Convey Privileged Information?](https://arxiv.org/abs/2509.13316)

Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C. Wallace

-+ [When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning](https://arxiv.org//abs/2509.13079)
++ [When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning](https://arxiv.org/abs/2509.13079)

Mengyi Deng, Xin Li, Tingyu Zhu, Zhicheng Yang, Zhijiang Guo, Wei Wang

-+ [Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection](https://arxiv.org//abs/2509.12546)
++ [Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection](https://arxiv.org/abs/2509.12546)

Yingxin Lai, Zitong Yu, Jun Wang, Linlin Shen, Yong Xu, Xiaochun Cao

-+ [End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection](https://arxiv.org//abs/2509.13214)
++ [End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection](https://arxiv.org/abs/2509.13214)

Fei Wang, Xuecheng Wu, Zheng Zhang, Danlei Huang, Yuheng Huang, BoWang

-+ [Selective Risk Certification for LLM Outputs via Information-Lift Statistics: PAC-Bayes, Robustness, and Skeleton Design](https://arxiv.org//abs/2509.12527)
++ [Selective Risk Certification for LLM Outputs via Information-Lift Statistics: PAC-Bayes, Robustness, and Skeleton Design](https://arxiv.org/abs/2509.12527)

Sanjeda Akter, Ibne Farabi Shihab, Anuj Sharma

-+ [BAPFL: Exploring Backdoor Attacks Against Prototype-based Federated Learning](https://arxiv.org//abs/2509.12964)
++ [BAPFL: Exploring Backdoor Attacks Against Prototype-based Federated Learning](https://arxiv.org/abs/2509.12964)

Honghong Zeng, Jiong Lou, Zhe Wang, Hefeng Zhou, Chentao Wu, Wei Zhao, Jie Li

-+ [On the Out-of-Distribution Backdoor Attack for Federated Learning](https://arxiv.org//abs/2509.13219)
++ [On the Out-of-Distribution Backdoor Attack for Federated Learning](https://arxiv.org/abs/2509.13219)

Jiahao Xu, Zikai Zhang, Rui Hu

-+ [EByFTVeS: Efficient Byzantine Fault Tolerant-based Verifiable Secret-sharing in Distributed Privacy-preserving Machine Learning](https://arxiv.org//abs/2509.12899)
++ [EByFTVeS: Efficient Byzantine Fault Tolerant-based Verifiable Secret-sharing in Distributed Privacy-preserving Machine Learning](https://arxiv.org/abs/2509.12899)

Zhen Li, Zijian Zhang, Wenjin Yang, Pengbo Wang, Zhaoqi Wang, Meng Li, Yan Wu, Xuyang Liu, Jing Sun, Liehuang Zhu

-+ [Bridging Threat Models and Detections: Formal Verification via CADP](https://arxiv.org//abs/2509.13035)
++ [Bridging Threat Models and Detections: Formal Verification via CADP](https://arxiv.org/abs/2509.13035)

Dumitru-Bogdan Prelipcean (Bitdefender, Iaşi, Romania, Alexandru Ioan Cuza University, Iasi, Romania, LACL, Universite Paris-Est Creteil, France), Cătălin Dima (LACL, Université Paris-Est Crétéil, France)

-+ [Adversarial Appearance Learning in Augmented Cityscapes for Pedestrian Recognition in Autonomous Driving](https://arxiv.org//abs/2509.13507)
++ [Adversarial Appearance Learning in Augmented Cityscapes for Pedestrian Recognition in Autonomous Driving](https://arxiv.org/abs/2509.13507)

Artem Savkin, Thomas Lapotre, Kevin Strauss, Uzair Akbar, Federico Tombari

-+ [Valuation of Exotic Options and Counterparty Games Based on Conditional Diffusion](https://arxiv.org//abs/2509.13374)
++ [Valuation of Exotic Options and Counterparty Games Based on Conditional Diffusion](https://arxiv.org/abs/2509.13374)

Helin Zhao, Junchi Shen

-+ [AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering](https://arxiv.org//abs/2509.13514)
++ [AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering](https://arxiv.org/abs/2509.13514)

Onat Gungor, Roshan Sood, Harold Wang, Tajana Rosing

-+ [FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health](https://arxiv.org//abs/2509.14275)
++ [FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health](https://arxiv.org/abs/2509.14275)

Nobin Sarwar, Shubhashis Roy Dipta

-+ [Beyond Data Privacy: New Privacy Risks for Large Language Models](https://arxiv.org//abs/2509.14278)
++ [Beyond Data Privacy: New Privacy Risks for Large Language Models](https://arxiv.org/abs/2509.14278)

Yuntao Du, Zitao Li, Ninghui Li, Bolin Ding

-+ [The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration](https://arxiv.org//abs/2509.14284)
++ [The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration](https://arxiv.org/abs/2509.14284)

Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal

-+ [A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks](https://arxiv.org//abs/2509.14285)
++ [A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks](https://arxiv.org/abs/2509.14285)

S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin

-+ [Towards mitigating information leakage when evaluating safety monitors](https://arxiv.org//abs/2509.21344)
++ [Towards mitigating information leakage when evaluating safety monitors](https://arxiv.org/abs/2509.21344)

Gerard Boxo, Aman Neelappa, Shivam Raval

-+ [SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs](https://arxiv.org//abs/2509.13450)
++ [SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs](https://arxiv.org/abs/2509.13450)

Vincent Siu, Nicholas Crispino, David Park, Nathan W. Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang

# 2025-09-15

-+ [When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models](https://arxiv.org//abs/2509.12060)
++ [When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models](https://arxiv.org/abs/2509.12060)

Wei Cai, Shujuan Liu, Jian Zhao, Ziyan Shi, Yusheng Zhao, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li

-+ [Inducing Uncertainty for Test-Time Privacy](https://arxiv.org//abs/2509.11625)
++ [Inducing Uncertainty for Test-Time Privacy](https://arxiv.org/abs/2509.11625)

Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng, Grigoris G. Chrysos

-+ [Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check](https://arxiv.org//abs/2509.11629)
++ [Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check](https://arxiv.org/abs/2509.11629)

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li

-+ [Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning](https://arxiv.org//abs/2509.11816)
++ [Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning](https://arxiv.org/abs/2509.11816)

Filip Sondej, Yushi Yang

-+ [Probabilistic Robustness Analysis in High Dimensional Space: Application to Semantic Segmentation Network](https://arxiv.org//abs/2509.11838)
++ [Probabilistic Robustness Analysis in High Dimensional Space: Application to Semantic Segmentation Network](https://arxiv.org/abs/2509.11838)

Navid Hashemi, Samuel Sasaki, Diego Manzanas Lopez, Ipek Oguz, Meiyi Ma, Taylor T. Johnson

-+ [Time-Constrained Intelligent Adversaries for Automation Vulnerability Testing: A Multi-Robot Patrol Case Study](https://arxiv.org//abs/2509.11971)
++ [Time-Constrained Intelligent Adversaries for Automation Vulnerability Testing: A Multi-Robot Patrol Case Study](https://arxiv.org/abs/2509.11971)

James C. Ward, Alex Bott, Connor York, Edmund R. Hunt

-+ [Poison to Detect: Detection of Targeted Overfitting in Federated Learning](https://arxiv.org//abs/2509.11974)
++ [Poison to Detect: Detection of Targeted Overfitting in Federated Learning](https://arxiv.org/abs/2509.11974)

Soumia Zohra El Mestari, Maciej Krzysztof Zuziak, Gabriele Lenzini

-+ [Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference](https://arxiv.org//abs/2509.12152)
++ [Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference](https://arxiv.org/abs/2509.12152)

Synthia Wang, Sai Teja Peddinti, Nina Taft, Nick Feamster

-+ [A Controllable 3D Deepfake Generation Framework with Gaussian Splatting](https://arxiv.org//abs/2509.11624)
++ [A Controllable 3D Deepfake Generation Framework with Gaussian Splatting](https://arxiv.org/abs/2509.11624)

Wending Liu, Siyun Liang, Huy H. Nguyen, Isao Echizen

-+ [Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and Robustness](https://arxiv.org//abs/2509.12024)
++ [Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and Robustness](https://arxiv.org/abs/2509.12024)

Zixuan Fu, Yan Ren, Finn Carter, Chenyue Wen, Le Ku, Daheng Yu, Emily Davis, Bo Zhang

-+ [DRAG: Data Reconstruction Attack using Guided Diffusion](https://arxiv.org//abs/2509.11724)
++ [DRAG: Data Reconstruction Attack using Guided Diffusion](https://arxiv.org/abs/2509.11724)

Wa-Kin Lei, Jun-Cheng Chen, Shang-Tse Chen

-+ [DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks](https://arxiv.org//abs/2509.11525)
++ [DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks](https://arxiv.org/abs/2509.11525)

Jing Zou, Shungeng Zhang, Meikang Qiu, Chong Li

-+ [From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning](https://arxiv.org//abs/2509.12176)
++ [From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning](https://arxiv.org/abs/2509.12176)

Collin Guo

-+ [Removal Attack and Defense on AI-generated Content Latent-based Watermarking](https://arxiv.org//abs/2509.11745)
++ [Removal Attack and Defense on AI-generated Content Latent-based Watermarking](https://arxiv.org/abs/2509.11745)

De Zhang Lee, Han Fang, Hanyi Wang, Ee-Chien Chang

-+ [A Practical Adversarial Attack against Sequence-based Deep Learning Malware Classifiers](https://arxiv.org//abs/2509.11836)
++ [A Practical Adversarial Attack against Sequence-based Deep Learning Malware Classifiers](https://arxiv.org/abs/2509.11836)

Kai Tan, Dongyang Zhan, Lin Ye, Hongli Zhang, Binxing Fang

-+ [NeuroStrike: Neuron-Level Attacks on Aligned LLMs](https://arxiv.org//abs/2509.11864)
++ [NeuroStrike: Neuron-Level Attacks on Aligned LLMs](https://arxiv.org/abs/2509.11864)

Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, Stjepan Picek, Ahmad-Reza Sadeghi

-+ [Efficient Byzantine-Robust Privacy-Preserving Federated Learning via Dimension Compression](https://arxiv.org//abs/2509.11870)
++ [Efficient Byzantine-Robust Privacy-Preserving Federated Learning via Dimension Compression](https://arxiv.org/abs/2509.11870)

Xian Qin, Xue Yang, Xiaohu Tang

-+ [MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables](https://arxiv.org//abs/2509.12371)
++ [MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables](https://arxiv.org/abs/2509.12371)

Matteo Marcuzzo, Alessandro Zangari, Andrea Albarelli, Jose Camacho-Collados, Mohammad Taher Pilehvar

-+ [Geometric Red-Teaming for Robotic Manipulation](https://arxiv.org//abs/2509.12379)
++ [Geometric Red-Teaming for Robotic Manipulation](https://arxiv.org/abs/2509.12379)

Divyam Goel, Yufei Wang, Tiancheng Wu, Guixiu Qiao, Pavel Piliptchak, David Held, Zackory Erickson

-+ [Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks](https://arxiv.org//abs/2509.12386)
++ [Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks](https://arxiv.org/abs/2509.12386)

Asim Waheed, Vasisht Duddu, Rui Zhang, Sebastian Szyller, N. Asokan

-+ [Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time](https://arxiv.org//abs/2509.12521)
++ [Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time](https://arxiv.org/abs/2509.12521)

Yifan Lan, Yuanpu Cao, Weitong Zhang, Lu Lin, Jinghui Chen

-+ [Secure Human Oversight of AI: Exploring the Attack Surface of Human Oversight](https://arxiv.org//abs/2509.12290)
++ [Secure Human Oversight of AI: Exploring the Attack Surface of Human Oversight](https://arxiv.org/abs/2509.12290)

Jonas C. Ditz, Veronika Lazar, Elmar Lichtmeß, Carola Plesch, Matthias Heck, Kevin Baum, Markus Langer

-+ [Redefining Website Fingerprinting Attacks With Multiagent LLMs](https://arxiv.org//abs/2509.12462)
++ [Redefining Website Fingerprinting Attacks With Multiagent LLMs](https://arxiv.org/abs/2509.12462)

Chuxu Song, Dheekshith Dev Manohar Mekala, Hao Wang, Richard Martin

-+ [Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models](https://arxiv.org//abs/2509.14271)
++ [Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models](https://arxiv.org/abs/2509.14271)

Gustavo Sandoval, Denys Fenchenko, Junyao Chen

# 2025-09-14

-+ [Free-MAD: Consensus-Free Multi-Agent Debate](https://arxiv.org//abs/2509.11035)
++ [Free-MAD: Consensus-Free Multi-Agent Debate](https://arxiv.org/abs/2509.11035)

Yu Cui, Hang Fu, Haibin Zhang, Licheng Wang, Cong Zuo

-+ [Membership Inference Attacks on Recommender System: A Survey](https://arxiv.org//abs/2509.11080)
++ [Membership Inference Attacks on Recommender System: A Survey](https://arxiv.org/abs/2509.11080)

Jiajie He, Yuechun Gu, Keke Chen, Xintong Chen

-+ [ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs](https://arxiv.org//abs/2509.11128)
++ [ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs](https://arxiv.org/abs/2509.11128)

Yibo Zhang, Liang Lin

-+ [Feature Space Topology Control via Hopkins Loss](https://arxiv.org//abs/2509.11154)
++ [Feature Space Topology Control via Hopkins Loss](https://arxiv.org/abs/2509.11154)

Einari Vaaras, Manu Airaksinen

-+ [Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers](https://arxiv.org//abs/2509.11173)
++ [Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers](https://arxiv.org/abs/2509.11173)

Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, Baishakhi Ray

-+ [From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming](https://arxiv.org//abs/2509.11398)
++ [From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming](https://arxiv.org/abs/2509.11398)

Anusha Sinha, Keltin Grimes, James Lucassen, Michael Feffer, Nathan VanHoudnos, Zhiwei Steven Wu, Hoda Heidari

-+ [When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity](https://arxiv.org//abs/2509.11141)
++ [When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity](https://arxiv.org/abs/2509.11141)

Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang

-+ [RanAT4BIE: Random Adversarial Training for Biomedical Information Extraction](https://arxiv.org//abs/2509.11191)
++ [RanAT4BIE: Random Adversarial Training for Biomedical Information Extraction](https://arxiv.org/abs/2509.11191)

Jian Chen, Shengyi Lv, Leilei Su

-+ [Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation](https://arxiv.org//abs/2509.11213)
++ [Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation](https://arxiv.org/abs/2509.11213)

Yufei Tang, Daiheng Gao, Pingyu Wu, Wenbo Zhou, Bang Zhang, Weiming Zhang

-+ [ANROT-HELANet: Adverserially and Naturally Robust Attention-Based Aggregation Network via The Hellinger Distance for Few-Shot Classification](https://arxiv.org//abs/2509.11220)
++ [ANROT-HELANet: Adverserially and Naturally Robust Attention-Based Aggregation Network via The Hellinger Distance for Few-Shot Classification](https://arxiv.org/abs/2509.11220)

Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N.Duong

-+ [Realistic Environmental Injection Attacks on GUI Agents](https://arxiv.org//abs/2509.11250)
++ [Realistic Environmental Injection Attacks on GUI Agents](https://arxiv.org/abs/2509.11250)

Yitong Zhang, Ximo Li, Liyi Cai, Jia Li

-+ [Stabilizing Data-Free Model Extraction](https://arxiv.org//abs/2509.11159)
++ [Stabilizing Data-Free Model Extraction](https://arxiv.org/abs/2509.11159)

Dat-Thinh Nguyen, Kim-Hung Le, Nhien-An Le-Khac

-+ [On the Escaping Efficiency of Distributed Adversarial Training Algorithms](https://arxiv.org//abs/2509.11337)
++ [On the Escaping Efficiency of Distributed Adversarial Training Algorithms](https://arxiv.org/abs/2509.11337)

Ying Cao, Kun Yuan, Ali H. Sayed

-+ [SoK: How Sensor Attacks Disrupt Autonomous Vehicles: An End-to-end Analysis, Challenges, and Missed Threats](https://arxiv.org//abs/2509.11120)
++ [SoK: How Sensor Attacks Disrupt Autonomous Vehicles: An End-to-end Analysis, Challenges, and Missed Threats](https://arxiv.org/abs/2509.11120)

Qingzhao Zhang, Shaocheng Luo, Z. Morley Mao, Miroslav Pajic, Michael K. Reiter

-+ [DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations](https://arxiv.org//abs/2509.11187)
++ [DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations](https://arxiv.org/abs/2509.11187)

Doan Minh Trung, Tien Duc Anh Hao, Luong Hoang Minh, Nghi Hoang Khoa, Nguyen Tan Cam, Van-Hau Pham, Phan The Duy

-+ [Make Identity Unextractable yet Perceptible: Synthesis-Based Privacy Protection for Subject Faces in Photos](https://arxiv.org//abs/2509.11249)
++ [Make Identity Unextractable yet Perceptible: Synthesis-Based Privacy Protection for Subject Faces in Photos](https://arxiv.org/abs/2509.11249)

Tao Wang, Yushu Zhang, Xiangli Xiao, Kun Xu, Lin Yuan, Wenying Wen, Yuming Fang

-+ [MAUI: Reconstructing Private Client Data in Federated Transfer Learning](https://arxiv.org//abs/2509.11451)
++ [MAUI: Reconstructing Private Client Data in Federated Transfer Learning](https://arxiv.org/abs/2509.11451)

Ahaan Dabholkar, Atul Sharma, Z. Berkay Celik, Saurabh Bagchi

-+ [Pulse-to-Circuit Characterization of Stealthy Crosstalk Attack on Multi-Tenant Superconducting Quantum Hardware](https://arxiv.org//abs/2509.11407)
++ [Pulse-to-Circuit Characterization of Stealthy Crosstalk Attack on Multi-Tenant Superconducting Quantum Hardware](https://arxiv.org/abs/2509.11407)

Syed Emad Uddin Shubha, Tasnuva Farheen

-+ [Hybrid Quantum-Classical Model for Image Classification](https://arxiv.org//abs/2509.13353)
++ [Hybrid Quantum-Classical Model for Image Classification](https://arxiv.org/abs/2509.13353)

Muhammad Adnan Shahzad

-+ [Self-Evolving LLMs via Continual Instruction Tuning](https://arxiv.org//abs/2509.18133)
++ [Self-Evolving LLMs via Continual Instruction Tuning](https://arxiv.org/abs/2509.18133)

Jiazheng Kang, Le Huang, Cheng Hou, Zhe Zhao, Zhenxiang Yan, Chuan Shi, Ting Bai

-+ [Pathological Truth Bias in Vision-Language Models](https://arxiv.org//abs/2509.22674)
++ [Pathological Truth Bias in Vision-Language Models](https://arxiv.org/abs/2509.22674)

Yash Thube

# 2025-09-13

-+ [Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding](https://arxiv.org//abs/2509.10931)
++ [Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding](https://arxiv.org/abs/2509.10931)

Seongho Joo, Hyukhun Koh, Kyomin Jung

-+ [Public Data Assisted Differentially Private In-Context Learning](https://arxiv.org//abs/2509.10932)
++ [Public Data Assisted Differentially Private In-Context Learning](https://arxiv.org/abs/2509.10932)

Seongho Joo, Hyukhun Koh, Kyomin Jung

-+ [Simulating Sinogram-Domain Motion and Correcting Image-Domain Artifacts Using Deep Learning in HR-pQCT Bone Imaging](https://arxiv.org//abs/2509.10961)
++ [Simulating Sinogram-Domain Motion and Correcting Image-Domain Artifacts Using Deep Learning in HR-pQCT Bone Imaging](https://arxiv.org/abs/2509.10961)

Farhan Sadik, Christopher L. Newman, Stuart J. Warden, Rachel K. Surowiec

-+ [Robustifying Diffusion-Denoised Smoothing Against Covariate Shift](https://arxiv.org//abs/2509.10913)
++ [Robustifying Diffusion-Denoised Smoothing Against Covariate Shift](https://arxiv.org/abs/2509.10913)

Ali Hedayatnia, Mostafa Tavassolipour, Babak Nadjar Araabi, Abdol-Hossein Vahabie

-+ [A Modern Look at Simplicity Bias in Image Classification Tasks](https://arxiv.org//abs/2509.12265)
++ [A Modern Look at Simplicity Bias in Image Classification Tasks](https://arxiv.org/abs/2509.12265)

Xiaoguang Chang, Teng Wang, Changyin Sun

@@ -3020,173 +3020,173 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Meiyin Meng, Zaixi Zhang

# 2025-09-12

-+ [GAMA: A General Anonymizing Multi-Agent System for Privacy Preservation Enhanced by Domain Rules and Disproof Method](https://arxiv.org//abs/2509.10018)
++ [GAMA: A General Anonymizing Multi-Agent System for Privacy Preservation Enhanced by Domain Rules and Disproof Method](https://arxiv.org/abs/2509.10018)

Hailong Yang, Renhuo Zhao, Guanjin Wang, Zhaohong Deng

-+ [Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge](https://arxiv.org//abs/2509.09955)
++ [Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge](https://arxiv.org/abs/2509.09955)

Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat

-+ [Adversarial robustness through Lipschitz-Guided Stochastic Depth in Neural Networks](https://arxiv.org//abs/2509.10298)
++ [Adversarial robustness through Lipschitz-Guided Stochastic Depth in Neural Networks](https://arxiv.org/abs/2509.10298)

Laith Nayal, Mahmoud Mousatat, Bader Rasheed

-+ [Immunizing Images from Text to Image Editing via Adversarial Cross-Attention](https://arxiv.org//abs/2509.10359)
++ [Immunizing Images from Text to Image Editing via Adversarial Cross-Attention](https://arxiv.org/abs/2509.10359)

Matteo Trippodo, Federico Becattini, Lorenzo Seidenari

-+ [FedRP: A Communication-Efficient Approach for Differentially Private Federated Learning Using Random Projection](https://arxiv.org//abs/2509.10041)
++ [FedRP: A Communication-Efficient Approach for Differentially Private Federated Learning Using Random Projection](https://arxiv.org/abs/2509.10041)

Mohammad Hasan Narimani, Mostafa Tavassolipour

-+ [Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications](https://arxiv.org//abs/2509.10248)
++ [Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications](https://arxiv.org/abs/2509.10248)

Janis Keuper

-+ [When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review](https://arxiv.org//abs/2509.09912)
++ [When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review](https://arxiv.org/abs/2509.09912)

Changjia Zhu, Junjie Xiong, Renkai Ma, Zhicong Lu, Yao Liu, Lingyao Li

-+ [Machine Unlearning for Responsible and Adaptive AI in Education](https://arxiv.org//abs/2509.10590)
++ [Machine Unlearning for Responsible and Adaptive AI in Education](https://arxiv.org/abs/2509.10590)

Betty Mayeku, Sandra Hummel, Parisa Memarmoshrefi

-+ [LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems](https://arxiv.org//abs/2509.10682)
++ [LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems](https://arxiv.org/abs/2509.10682)

Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist

-+ [Privacy-Preserving Decentralized Federated Learning via Explainable Adaptive Differential Privacy](https://arxiv.org//abs/2509.10691)
++ [Privacy-Preserving Decentralized Federated Learning via Explainable Adaptive Differential Privacy](https://arxiv.org/abs/2509.10691)

Fardin Jalil Piran, Zhiling Chen, Yang Zhang, Qianyu Zhou, Jiong Tang, Farhad Imani

-+ [Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight](https://arxiv.org//abs/2509.10723)
++ [Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight](https://arxiv.org/abs/2509.10723)

Jingyu Tang, Chaoran Chen, Jiawen Li, Zhiping Zhang, Bingcan Guo, Ibrahim Khalilov, Simret Araya Gebreegziabher, Bingsheng Yao, Dakuo Wang, Yanfang Ye, Tianshi Li, Ziang Xiao, Yaxing Yao, Toby Jia-Jun Li

-+ [Safety and Security Analysis of Large Language Models: Risk Profile and Harm Potential](https://arxiv.org//abs/2509.10655)
++ [Safety and Security Analysis of Large Language Models: Risk Profile and Harm Potential](https://arxiv.org/abs/2509.10655)

Charankumar Akiri, Harrison Simpson, Kshitiz Aryal, Aarav Khanna, Maanak Gupta

-+ [Side-channel Inference of User Activities in AR/VR Using GPU Profiling](https://arxiv.org//abs/2509.10703)
++ [Side-channel Inference of User Activities in AR/VR Using GPU Profiling](https://arxiv.org/abs/2509.10703)

Seonghun Son, Chandrika Mukherjee, Reham Mohamed Aburas, Berk Gulmezoglu, Z. Berkay Celik

-+ [JU-NLP at Touché: Covert Advertisement in Conversational AI-Generation and Detection Strategies](https://arxiv.org//abs/2509.14256)
++ [JU-NLP at Touché: Covert Advertisement in Conversational AI-Generation and Detection Strategies](https://arxiv.org/abs/2509.14256)

Arka Dutta, Agrik Majumdar, Sombrata Biswas, Dipankar Das, Sivaji Bandyopadhyay

# 2025-09-11

-+ [Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions](https://arxiv.org//abs/2509.09215)
++ [Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions](https://arxiv.org/abs/2509.09215)

Qinnan Hu, Yuntao Wang, Yuan Gao, Zhou Su, Linkang Du

-+ [Towards Confidential and Efficient LLM Inference with Dual Privacy Protection](https://arxiv.org//abs/2509.09091)
++ [Towards Confidential and Efficient LLM Inference with Dual Privacy Protection](https://arxiv.org/abs/2509.09091)

Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu

-+ [Character-Level Perturbations Disrupt LLM Watermarks](https://arxiv.org//abs/2509.09112)
++ [Character-Level Perturbations Disrupt LLM Watermarks](https://arxiv.org/abs/2509.09112)

Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, He Zhang, Shirui Pan, Bo Liu, Asif Qumer Gill, Leo Yu Zhang

-+ [Prompt Pirates Need a Map: Stealing Seeds helps Stealing Prompts](https://arxiv.org//abs/2509.09488)
++ [Prompt Pirates Need a Map: Stealing Seeds helps Stealing Prompts](https://arxiv.org/abs/2509.09488)

Felix Mächtle, Ashwath Shetty, Jonas Sander, Nils Loose, Sören Pirk, Thomas Eisenbarth

-+ [OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection](https://arxiv.org//abs/2509.09495)
++ [OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection](https://arxiv.org/abs/2509.09495)

Victor Livernoche, Akshatha Arodi, Andreea Musulan, Zachary Yang, Adam Salvail, Gaétan Marceau Caron, Jean-François Godbout, Reihaneh Rabbany

-+ [Steering MoE LLMs via Expert (De)Activation](https://arxiv.org//abs/2509.09660)
++ [Steering MoE LLMs via Expert (De)Activation](https://arxiv.org/abs/2509.09660)

Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng

-+ [Balancing Utility and Privacy: Dynamically Private SGD with Random Projection](https://arxiv.org//abs/2509.09485)
++ [Balancing Utility and Privacy: Dynamically Private SGD with Random Projection](https://arxiv.org/abs/2509.09485)

Zhanhong Jiang, Md Zahid Hasan, Nastaran Saadati, Aditya Balu, Chao Liu, Soumik Sarkar

-+ [ProDiGy: Proximity- and Dissimilarity-Based Byzantine-Robust Federated Learning](https://arxiv.org//abs/2509.09534)
++ [ProDiGy: Proximity- and Dissimilarity-Based Byzantine-Robust Federated Learning](https://arxiv.org/abs/2509.09534)

Sena Ergisi, Luis Maßny, Rawad Bitar

-+ [Representation-Aware Distributionally Robust Optimization: A Knowledge Transfer Framework](https://arxiv.org//abs/2509.09371)
++ [Representation-Aware Distributionally Robust Optimization: A Knowledge Transfer Framework](https://arxiv.org/abs/2509.09371)

Zitao Wang, Nian Si, Molei Liu

-+ [ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)](https://arxiv.org//abs/2509.09787)
++ [ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)](https://arxiv.org/abs/2509.09787)

Nojan Sheybani, Alessandro Pegoraro, Jonathan Knauer, Phillip Rieger, Elissa Mollakuqe, Farinaz Koushanfar, Ahmad-Reza Sadeghi

-+ [Images in Motion?: A First Look into Video Leakage in Collaborative Deep Learning](https://arxiv.org//abs/2509.09742)
++ [Images in Motion?: A First Look into Video Leakage in Collaborative Deep Learning](https://arxiv.org/abs/2509.09742)

Md Fazle Rasul, Alanood Alqobaisi, Bruhadeshwar Bezawada, Indrakshi Ray

-+ [Privacy-Preserving Automated Rosacea Detection Based on Medically Inspired Region of Interest Selection](https://arxiv.org//abs/2509.09844)
++ [Privacy-Preserving Automated Rosacea Detection Based on Medically Inspired Region of Interest Selection](https://arxiv.org/abs/2509.09844)

Chengyu Yang, Rishik Reddy Yesgari, Chengjun Liu

-+ [Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework](https://arxiv.org//abs/2509.18127)
++ [Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework](https://arxiv.org/abs/2509.18127)

Jiaqi Weng, Han Zheng, Hanyu Zhang, Qinqin He, Jialing Tao, Hui Xue, Zhixuan Chu, Xiting Wang

# 2025-09-10

-+ [Symmetry-Guided Multi-Agent Inverse Reinforcement Learnin](https://arxiv.org//abs/2509.08257)
++ [Symmetry-Guided Multi-Agent Inverse Reinforcement Learnin](https://arxiv.org/abs/2509.08257)

Yongkai Tian, Yirong Qi, Xin Yu, Wenjun Wu, Jie Luo

-+ [Adversarial Attacks Against Automated Fact-Checking: A Survey](https://arxiv.org//abs/2509.08463)
++ [Adversarial Attacks Against Automated Fact-Checking: A Survey](https://arxiv.org/abs/2509.08463)

Fanzhen Liu, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Jia Wu, Jian Yang, Quan Z. Sheng

-+ [Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations](https://arxiv.org//abs/2509.08646)
++ [Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations](https://arxiv.org/abs/2509.08646)

Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt

-+ [X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates](https://arxiv.org//abs/2509.08729)
++ [X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates](https://arxiv.org/abs/2509.08729)

Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park

-+ [Learning Turbulent Flows with Generative Models: Super-resolution, Forecasting, and Sparse Flow Reconstruction](https://arxiv.org//abs/2509.08752)
++ [Learning Turbulent Flows with Generative Models: Super-resolution, Forecasting, and Sparse Flow Reconstruction](https://arxiv.org/abs/2509.08752)

Vivek Oommen, Siavash Khodakarami, Aniruddha Bora, Zhicheng Wang, George Em Karniadakis

-+ [GTA-Crime: A Synthetic Dataset and Generation Framework for Fatal Violence Detection with Adversarial Snippet-Level Domain Adaptation](https://arxiv.org//abs/2509.08232)
++ [GTA-Crime: A Synthetic Dataset and Generation Framework for Fatal Violence Detection with Adversarial Snippet-Level Domain Adaptation](https://arxiv.org/abs/2509.08232)

Seongho Kim, Sejong Ryu, Hyoukjun You, Je Hyeong Hong

-+ [Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition](https://arxiv.org//abs/2509.08225)
++ [Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition](https://arxiv.org/abs/2509.08225)

Matthew Nolan, Lina Yao, Robert Davidson

-+ [Perfectly-Private Analog Secure Aggregation in Federated Learning](https://arxiv.org//abs/2509.08683)
++ [Perfectly-Private Analog Secure Aggregation in Federated Learning](https://arxiv.org/abs/2509.08683)

Delio Jaramillo-Velez, Charul Rajput, Ragnar Freij-Hollanti, Camilla Hollanti, Alexandre Graell i Amat

-+ [Securing Private Federated Learning in a Malicious Setting: A Scalable TEE-Based Approach with Client Auditing](https://arxiv.org//abs/2509.08709)
++ [Securing Private Federated Learning in a Malicious Setting: A Scalable TEE-Based Approach with Client Auditing](https://arxiv.org/abs/2509.08709)

Shun Takagi, Satoshi Hasegawa

-+ [Tight Privacy Audit in One Run](https://arxiv.org//abs/2509.08704)
++ [Tight Privacy Audit in One Run](https://arxiv.org/abs/2509.08704)

Zihang Xiang, Tianhao Wang, Hanshen Xiao, Yuan Tian, Di Wang

-+ [Approximate Algorithms for Verifying Differential Privacy with Gaussian Distributions](https://arxiv.org//abs/2509.08804)
++ [Approximate Algorithms for Verifying Differential Privacy with Gaussian Distributions](https://arxiv.org/abs/2509.08804)

Bishnu Bhusal, Rohit Chadha, A. Prasad Sistla, Mahesh Viswanathan

-+ [Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates](https://arxiv.org//abs/2509.08933)
++ [Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates](https://arxiv.org/abs/2509.08933)

Sreejeet Maity, Aritra Mitra

-+ [Group Distributionally Robust Machine Learning under Group Level Distributional Uncertainty](https://arxiv.org//abs/2509.08942)
++ [Group Distributionally Robust Machine Learning under Group Level Distributional Uncertainty](https://arxiv.org/abs/2509.08942)

Xenia Konti, Yi Shen, Zifan Wang, Karl Henrik Johansson, Michael J. Pencina, Nicoleta J. Economou-Zavlanos, Michael M. Zavlanos

-+ [Quantum Error Correction in Adversarial Regimes](https://arxiv.org//abs/2509.08943)
++ [Quantum Error Correction in Adversarial Regimes](https://arxiv.org/abs/2509.08943)

Rahul Arvind, Nikhil Bansal, Dax Enshan Koh, Tobias Haug, Kishor Bharti

-+ [AVEC: Bootstrapping Privacy for Local LLMs](https://arxiv.org//abs/2509.10561)
++ [AVEC: Bootstrapping Privacy for Local LLMs](https://arxiv.org/abs/2509.10561)

Madhava Gaikwad

@@ -3195,148 +3195,148 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park

# 2025-09-09

-+ [How Far Are We from True Unlearnability?](https://arxiv.org//abs/2509.08058)
++ [How Far Are We from True Unlearnability?](https://arxiv.org/abs/2509.08058)

Kai Ye, Liangcai Su, Chenxiong Qian

-+ [Nearest Neighbor Projection Removal Adversarial Training](https://arxiv.org//abs/2509.07673)
++ [Nearest Neighbor Projection Removal Adversarial Training](https://arxiv.org/abs/2509.07673)

Himanshu Singh, A. V. Subramanyam, Shivank Rajput, Mohan Kankanhalli

-+ [Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning](https://arxiv.org//abs/2509.08089)
++ [Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning](https://arxiv.org/abs/2509.08089)

Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum

-+ [Sketched Gaussian Mechanism for Private Federated Learning](https://arxiv.org//abs/2509.08195)
++ [Sketched Gaussian Mechanism for Private Federated Learning](https://arxiv.org/abs/2509.08195)

Qiaobo Li, Zhijie Chen, Arindam Banerjee

-+ [SAGE: Sample-Aware Guarding Engine for Robust Intrusion Detection Against Adversarial Attacks](https://arxiv.org//abs/2509.08091)
++ [SAGE: Sample-Aware Guarding Engine for Robust Intrusion Detection Against Adversarial Attacks](https://arxiv.org/abs/2509.08091)

Jing Chen, Onat Gungor, Zhengli Shang, Tajana Rosing

-+ [Asynchronous Gossip Algorithms for Rank-Based Statistical Methods](https://arxiv.org//abs/2509.07543)
++ [Asynchronous Gossip Algorithms for Rank-Based Statistical Methods](https://arxiv.org/abs/2509.07543)

Anna Van Elst, Igor Colin, Stephan Clémençon

# 2025-09-08

-+ [Towards Trustworthy Agentic IoEV: AI Agents for Explainable Cyberthreat Mitigation and State Analytics](https://arxiv.org//abs/2509.12233)
++ [Towards Trustworthy Agentic IoEV: AI Agents for Explainable Cyberthreat Mitigation and State Analytics](https://arxiv.org/abs/2509.12233)

Meryem Malak Dif, Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane

-+ [When Secure Isn't: Assessing the Security of Machine Learning Model Sharing](https://arxiv.org//abs/2509.06703)
++ [When Secure Isn't: Assessing the Security of Machine Learning Model Sharing](https://arxiv.org/abs/2509.06703)

Gabriele Digregorio, Marco Di Gennaro, Stefano Zanero, Stefano Longari, Michele Carminati

# 2025-09-07

-+ [RetinaGuard: Obfuscating Retinal Age in Fundus Images for Biometric Privacy Preserving](https://arxiv.org//abs/2509.06142)
++ [RetinaGuard: Obfuscating Retinal Age in Fundus Images for Biometric Privacy Preserving](https://arxiv.org/abs/2509.06142)

Zhengquan Luo, Chi Liu, Dongfu Xiao, Zhen Yu, Yueye Wang, Tianqing Zhu

-+ [Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal](https://arxiv.org//abs/2509.09708)
++ [Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal](https://arxiv.org/abs/2509.09708)

Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah, Ranjan Satapathy, Erik Cambria, Roy Ka Wei Lee

-+ [Robust DDoS-Attack Classification with 3D CNNs Against Adversarial Methods](https://arxiv.org//abs/2509.10543)
++ [Robust DDoS-Attack Classification with 3D CNNs Against Adversarial Methods](https://arxiv.org/abs/2509.10543)

Landon Bragg, Nathan Dorsey, Josh Prior, John Ajit, Ben Kim, Nate Willis, Pablo Rivas

-+ [Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment](https://arxiv.org//abs/2509.10546)
++ [Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment](https://arxiv.org/abs/2509.10546)

Gang Cheng, Haibo Jin, Wenbin Zhang, Haohan Wang, Jun Zhuang

# 2025-09-06

-+ [AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs](https://arxiv.org//abs/2509.08000)
++ [AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs](https://arxiv.org/abs/2509.08000)

Debdeep Sanyal, Manodeep Ray, Murari Mandal

-+ [EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System](https://arxiv.org//abs/2509.10540)
++ [EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System](https://arxiv.org/abs/2509.10540)

Pavan Reddy, Aditya Sanjay Gujral

-+ [Exploit Tool Invocation Prompt for Tool Behavior Hijacking in LLM-Based Agentic System](https://arxiv.org//abs/2509.05755)
++ [Exploit Tool Invocation Prompt for Tool Behavior Hijacking in LLM-Based Agentic System](https://arxiv.org/abs/2509.05755)

Yu Liu, Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She

# 2025-09-05

-+ [CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor](https://arxiv.org//abs/2509.09703)
++ [CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor](https://arxiv.org/abs/2509.09703)

Zhenhua Xu, Xixiang Zhao, Xubin Yue, Shengwei Tian, Changting Lin, Meng Han

-+ [Differential Robustness in Transformer Language Models: Empirical Evaluation Under Adversarial Text Attacks](https://arxiv.org//abs/2509.09706)
++ [Differential Robustness in Transformer Language Models: Empirical Evaluation Under Adversarial Text Attacks](https://arxiv.org/abs/2509.09706)

Taniya Gidatkar, Oluwaseun Ajao, Matthew Shardlow

-+ [Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines](https://arxiv.org//abs/2509.10536)
++ [Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines](https://arxiv.org/abs/2509.10536)

Jean-Pierre Magnot

-+ [Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning](https://arxiv.org//abs/2509.15230)
++ [Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning](https://arxiv.org/abs/2509.15230)

Rutger Hendrix, Giovanni Patanè, Leonardo G. Russo, Simone Carnemolla, Giovanni Bellitto, Federica Proietto Salanitri, Concetto Spampinato, Matteo Pennisi

# 2025-09-04

-+ [A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models](https://arxiv.org//abs/2509.03871)
++ [A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models](https://arxiv.org/abs/2509.03871)

Yanbo Wang, Yongcan Yu, Jian Liang, Ran He

-+ [NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models](https://arxiv.org//abs/2509.03985)
++ [NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models](https://arxiv.org/abs/2509.03985)

Chuhan Zhang, Ye Zhang, Bowen Shi, Yuyou Gan, Tianyu Du, Shouling Ji, Dazhan Deng, Yingcai Wu

-+ [Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding](https://arxiv.org//abs/2509.04009)
++ [Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding](https://arxiv.org/abs/2509.04009)

Solha Kang, Esla Timothy Anzaku, Wesley De Neve, Arnout Van Messem, Joris Vankerschaver, Francois Rameau, Utku Ozbulak

-+ [False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize](https://arxiv.org//abs/2509.03888)
++ [False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize](https://arxiv.org/abs/2509.03888)

Cheng Wang, Zeming Wei, Qin Liu, Muhao Chen

-+ [Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?](https://arxiv.org//abs/2509.04292)
++ [Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?](https://arxiv.org/abs/2509.04292)

Qinyan Zhang, Xinping Lei, Ruijie Miao, Yu Fu, Haojie Fan, Le Chang, Jiafan Hou, Dingling Zhang, Zhongfei Hou, Ziqiang Yang, Changxin Pu, Fei Hu, Jingkai Liu, Mengyun Liu, Yang Liu, Xiang Gao, Jiaheng Liu, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang

-+ [Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain](https://arxiv.org//abs/2509.03787)
++ [Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain](https://arxiv.org/abs/2509.03787)

Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke, Amira Ghenai

-+ [Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios](https://arxiv.org//abs/2509.04403)
++ [Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios](https://arxiv.org/abs/2509.04403)

Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, Jing Shao

-+ [Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case](https://arxiv.org//abs/2509.03948)
++ [Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case](https://arxiv.org/abs/2509.03948)

Delphine Longuet, Amira Elouazzani, Alejandro Penacho Riveiros, Nicola Bastianello

-+ [Privacy Risks in Time Series Forecasting: User- and Record-Level Membership Inference](https://arxiv.org//abs/2509.04169)
++ [Privacy Risks in Time Series Forecasting: User- and Record-Level Membership Inference](https://arxiv.org/abs/2509.04169)

Nicolas Johansson (1), Tobias Olsson (1), Daniel Nilsson (2), Johan Östman (2), Fazeleh Hoseini (2) ((1) Chalmers University of Technology, (2) AI Sweden)

-+ [Rethinking Layer-wise Gaussian Noise Injection: Bridging Implicit Objectives and Privacy Budget Allocation](https://arxiv.org//abs/2509.04232)
++ [Rethinking Layer-wise Gaussian Noise Injection: Bridging Implicit Objectives and Privacy Budget Allocation](https://arxiv.org/abs/2509.04232)

Qifeng Tan, Shusen Yang, Xuebin Ren, Yikai Zhang (Xi'an Jiaotong University)

-+ [Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations](https://arxiv.org//abs/2509.03806)
++ [Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations](https://arxiv.org/abs/2509.03806)

Hao Nie, Wei Wang, Peng Xu, Wei Chen, Laurence T. Yang, Mauro Conti, Kaitai Liang

-+ [An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline](https://arxiv.org//abs/2509.04214)
++ [An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline](https://arxiv.org/abs/2509.04214)

Tyler Shumaker, Jessica Carpenter, David Saranchak, Nathaniel D. Bastian

-+ [Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs](https://arxiv.org//abs/2509.05367)
++ [Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs](https://arxiv.org/abs/2509.05367)

Shei Pern Chua, Zhen Leng Thai, Teh Kai Jun, Xiao Li, Xiaolin Hu

-+ [Variational Gaussian Mixture Manifold Models for Client-Specific Federated Personalization](https://arxiv.org//abs/2509.10521)
++ [Variational Gaussian Mixture Manifold Models for Client-Specific Federated Personalization](https://arxiv.org/abs/2509.10521)

Sai Puppala, Ismail Hossain, Md Jahangir Alam, Sajedul Talukder

-+ [MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors](https://arxiv.org//abs/2509.12221)
++ [MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors](https://arxiv.org/abs/2509.12221)

Xin Tong, Zhi Lin, Jingya Wang, Meng Han, Bo Jin

-+ [Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems](https://arxiv.org//abs/2509.06996)
++ [Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems](https://arxiv.org/abs/2509.06996)

Jie Zhang, Ting Xu, Gelei Deng, Runyi Hu, Han Qiu, Tianwei Zhang, Qing Guo, Ivor Tsang

@@ -3345,87 +3345,87 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yunbo Long, Liming Xu, Lukas Beckenbauer, Yuhan Liu, Alexandra Brintrup

# 2025-09-03

-+ [ANNIE: Be Careful of Your Robots](https://arxiv.org//abs/2509.03383)
++ [ANNIE: Be Careful of Your Robots](https://arxiv.org/abs/2509.03383)

Yiyang Huang, Zixuan Wang, Zishen Wan, Yapeng Tian, Haobo Xu, Yinhe Han, Yiming Gan

-+ [AutoDetect: Designing an Autoencoder-based Detection Method for Poisoning Attacks on Object Detection Applications in the Military Domain](https://arxiv.org//abs/2509.03179)
++ [AutoDetect: Designing an Autoencoder-based Detection Method for Poisoning Attacks on Object Detection Applications in the Military Domain](https://arxiv.org/abs/2509.03179)

Alma M. Liezenga, Stefan Wijnja, Puck de Haan, Niels W. T. Brink, Jip J. van Stijn, Yori Kamphuis, Klamer Schutte

-+ [On the MIA Vulnerability Gap Between Private GANs and Diffusion Models](https://arxiv.org//abs/2509.03341)
++ [On the MIA Vulnerability Gap Between Private GANs and Diffusion Models](https://arxiv.org/abs/2509.03341)

Ilana Sebag, Jean-Yves Franceschi, Alain Rakotomamonjy, Alexandre Allauzen, Jamal Atif

-+ [DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling](https://arxiv.org//abs/2509.03472)
++ [DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling](https://arxiv.org/abs/2509.03472)

Yubo Gao, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar

-+ [SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models](https://arxiv.org//abs/2509.03487)
++ [SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models](https://arxiv.org/abs/2509.03487)

Jigang Fan, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang

-+ [Enhancing Robustness in Post-Processing Watermarking: An Ensemble Attack Network Using CNNs and Transformers](https://arxiv.org//abs/2509.03006)
++ [Enhancing Robustness in Post-Processing Watermarking: An Ensemble Attack Network Using CNNs and Transformers](https://arxiv.org/abs/2509.03006)

Tzuhsuan Huang, Cheng Yu Yeo, Tsai-Ling Huang, Hong-Han Shuai, Wen-Huang Cheng, Jun-Cheng Chen

-+ [Background Matters Too: A Language-Enhanced Adversarial Framework for Person Re-Identification](https://arxiv.org//abs/2509.03032)
++ [Background Matters Too: A Language-Enhanced Adversarial Framework for Person Re-Identification](https://arxiv.org/abs/2509.03032)

Kaicong Huang, Talha Azfar, Jack M. Reilly, Thomas Guggisberg, Ruimin Ke

-+ [High Cursive Complex Character Recognition using GAN External Classifier](https://arxiv.org//abs/2509.03062)
++ [High Cursive Complex Character Recognition using GAN External Classifier](https://arxiv.org/abs/2509.03062)

S M Rafiuddin

-+ [Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods](https://arxiv.org//abs/2509.03108)
++ [Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods](https://arxiv.org/abs/2509.03108)

Shota Iwamatsu, Koichi Ito, Takafumi Aoki

-+ [Prompt-Guided Patch UNet-VAE with Adversarial Supervision for Adrenal Gland Segmentation in Computed Tomography Medical Images](https://arxiv.org//abs/2509.03188)
++ [Prompt-Guided Patch UNet-VAE with Adversarial Supervision for Adrenal Gland Segmentation in Computed Tomography Medical Images](https://arxiv.org/abs/2509.03188)

Hania Ghouse, Muzammil Behzad

-+ [Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation](https://arxiv.org//abs/2509.02970)
++ [Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation](https://arxiv.org/abs/2509.02970)

Kaoru Otsuka, Yuki Takezawa, Makoto Yamada

-+ [LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization](https://arxiv.org//abs/2509.03110)
++ [LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization](https://arxiv.org/abs/2509.03110)

Yunfei Teng, Sixin Zhang

-+ [Can LLMs Lie? Investigation beyond Hallucination](https://arxiv.org//abs/2509.03518)
++ [Can LLMs Lie? Investigation beyond Hallucination](https://arxiv.org/abs/2509.03518)

Haoran Huan, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak

-+ [EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint](https://arxiv.org//abs/2509.03058)
++ [EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint](https://arxiv.org/abs/2509.03058)

Zhenhua Xu, Meng Han, Wenpeng Xing

-+ [Exposing Privacy Risks in Anonymizing Clinical Data: Combinatorial Refinement Attacks on k-Anonymity Without Auxiliary Information](https://arxiv.org//abs/2509.03350)
++ [Exposing Privacy Risks in Anonymizing Clinical Data: Combinatorial Refinement Attacks on k-Anonymity Without Auxiliary Information](https://arxiv.org/abs/2509.03350)

Somiya Chhillar, Mary K. Righi, Rebecca E. Sutter, Evgenios M. Kornaropoulos

-+ [Federated Learning: An approach with Hybrid Homomorphic Encryption](https://arxiv.org//abs/2509.03427)
++ [Federated Learning: An approach with Hybrid Homomorphic Encryption](https://arxiv.org/abs/2509.03427)

Pedro Correia, Ivan Silva, Ivone Amorim, Eva Maia, Isabel Praça

-+ [PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming](https://arxiv.org//abs/2509.03728)
++ [PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming](https://arxiv.org/abs/2509.03728)

Wesley Hanwen Deng, Sunnie S. Y. Kim, Akshita Jha, Ken Holstein, Motahhare Eslami, Lauren Wilcox, Leon A Gatys

-+ [Learning an Adversarial World Model for Automated Curriculum Generation in MARL](https://arxiv.org//abs/2509.03771)
++ [Learning an Adversarial World Model for Automated Curriculum Generation in MARL](https://arxiv.org/abs/2509.03771)

Brennen Hill

-+ [Stealth by Conformity: Evading Robust Aggregation through Adaptive Poisoning](https://arxiv.org//abs/2509.08746)
++ [Stealth by Conformity: Evading Robust Aggregation through Adaptive Poisoning](https://arxiv.org/abs/2509.08746)

Ryan McGaughey, Jesus Martinez del Rincon, Ihsen Alouani

-+ [Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity](https://arxiv.org//abs/2509.08747)
++ [Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity](https://arxiv.org/abs/2509.08747)

Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio

-+ [Prototype-Guided Robust Learning against Backdoor Attacks](https://arxiv.org//abs/2509.08748)
++ [Prototype-Guided Robust Learning against Backdoor Attacks](https://arxiv.org/abs/2509.08748)

Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio

@@ -3438,46 +3438,46 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Jigang Fan, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang

# 2025-09-02

-+ [A-SEA3L-QA: A Fully Automated Self-Evolving, Adversarial Workflow for Arabic Long-Context Question-Answer Generation](https://arxiv.org//abs/2509.02864)
++ [A-SEA3L-QA: A Fully Automated Self-Evolving, Adversarial Workflow for Arabic Long-Context Question-Answer Generation](https://arxiv.org/abs/2509.02864)

Kesen Wang, Daulet Toibazar, Pedro J. Moreno

-+ [Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models](https://arxiv.org//abs/2509.02859)
++ [Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models](https://arxiv.org/abs/2509.02859)

Sandipana Dowerah, Atharva Kulkarni, Ajinkya Kulkarni, Hoan My Tran, Joonas Kalda, Artem Fedorchenko, Benoit Fauve, Damien Lolive, Tanel Alumäe, Matthew Magimai Doss

-+ [See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems](https://arxiv.org//abs/2509.02028)
++ [See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems](https://arxiv.org/abs/2509.02028)

Halima Bouzidi, Haoyu Liu, Mohammad Abdullah Al Faruque

-+ [The Anti-Ouroboros Effect: Emergent Resilience in Large Language Models from Recursive Selective Feedback](https://arxiv.org//abs/2509.10509)
++ [The Anti-Ouroboros Effect: Emergent Resilience in Large Language Models from Recursive Selective Feedback](https://arxiv.org/abs/2509.10509)

Sai Teja Reddy Adapala

-+ [Abex-rat: Synergizing Abstractive Augmentation and Adversarial Training for Classification of Occupational Accident Reports](https://arxiv.org//abs/2509.02072)
++ [Abex-rat: Synergizing Abstractive Augmentation and Adversarial Training for Classification of Occupational Accident Reports](https://arxiv.org/abs/2509.02072)

Jian Chen, Jiabao Dou, Jinbao Tian, Yunqi Yang, Zhou Li

# 2025-09-01

-+ [PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation](https://arxiv.org//abs/2509.21325)
++ [PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation](https://arxiv.org/abs/2509.21325)

Baiqiang Wang, Qian Lou, Mengxin Zheng, Dongfang Zhao

# 2025-08-31

-+ [Distributed Gossip-GAN for Low-overhead CSI Feedback Training in FDD mMIMO-OFDM Systems](https://arxiv.org//abs/2509.10490)
++ [Distributed Gossip-GAN for Low-overhead CSI Feedback Training in FDD mMIMO-OFDM Systems](https://arxiv.org/abs/2509.10490)

Yuwen Cao, Guijun Liu, Tomoaki Ohtsuki, Howard H. Yang, Tony Q. S. Quek

# 2025-08-30

-+ [Deep opacity and AI: A threat to XAI and to privacy protection mechanisms](https://arxiv.org//abs/2509.08835)
++ [Deep opacity and AI: A threat to XAI and to privacy protection mechanisms](https://arxiv.org/abs/2509.08835)

Vincent C. Müller

-+ [Partially Functional Dynamic Backdoor Diffusion-based Causal Model](https://arxiv.org//abs/2509.00472)
++ [Partially Functional Dynamic Backdoor Diffusion-based Causal Model](https://arxiv.org/abs/2509.00472)

Xinwen Liu, Lei Qian, Song Xi Chen, Niansheng Tang

-+ [When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment](https://arxiv.org//abs/2509.00544)
++ [When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment](https://arxiv.org/abs/2509.00544)

Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He

@@ -3486,11 +3486,11 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He

# 2025-08-28

-+ [GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability](https://arxiv.org//abs/2508.21197)
++ [GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability](https://arxiv.org/abs/2508.21197)

Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang

-+ [PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance](https://arxiv.org//abs/2508.20890)
++ [PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance](https://arxiv.org/abs/2508.20890)

Mengxiao Wang, Yuxuan Zhang, Guofei Gu

@@ -3503,27 +3503,27 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Xiangtao Meng, Yingkai Dong, Ning Yu, Li Wang, Zheng Li, Shanqing Guo

# 2025-08-27

-+ [Network-Level Prompt and Trait Leakage in Local Research Agents](https://arxiv.org//abs/2508.20282)
++ [Network-Level Prompt and Trait Leakage in Local Research Agents](https://arxiv.org/abs/2508.20282)

Hyejun Jeong, Mohammadreza Teymoorianfard, Abhinav Kumar, Amir Houmansadr, Eugene Bagdasarian

-+ [Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs](https://arxiv.org//abs/2509.00096)
++ [Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs](https://arxiv.org/abs/2509.00096)

Yao Fu, Runchao Li, Xianxuan Long, Haotian Yu, Xiaotian Han, Yu Yin, Pan Li

-+ [Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents](https://arxiv.org//abs/2508.19493)
++ [Mind the Third Eye!
Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents](https://arxiv.org/abs/2508.19493) Zhixin Lin, Jungang Li, Shidong Pan, Yibo Shi, Yue Yao, Dongliang Xu -+ [AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models](https://arxiv.org//abs/2509.03537) ++ [AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models](https://arxiv.org/abs/2509.03537) Cheng-Kai Yeh, Hsing-Wang Lee, Chung-Hung Kuo, Hen-Hsen Huang -+ [Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks](https://arxiv.org//abs/2508.20038) ++ [Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks](https://arxiv.org/abs/2508.20038) Sheng Liu, Qiang Sheng, Danding Wang, Yang Li, Guang Yang, Juan Cao -+ [Language Models Identify Ambiguities and Exploit Loopholes](https://arxiv.org//abs/2508.19546) ++ [Language Models Identify Ambiguities and Exploit Loopholes](https://arxiv.org/abs/2508.19546) Jio Choi, Mohit Bansal, Elias Stengel-Eskin @@ -3531,84 +3531,84 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Ting-Chun Liu, Ching-Yu Hsu, Kuan-Yi Lee, Chi-An Fu, Hung-yi Lee -+ [Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID](https://arxiv.org//abs/2508.20228) ++ [Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID](https://arxiv.org/abs/2508.20228) Xia Han, Qi Li, Jianbing Ni, Mohammad Zulkernine # 2025-08-26 -+ [PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality](https://arxiv.org//abs/2508.18649) ++ [PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality](https://arxiv.org/abs/2508.18649) Nanxi Li, Zhengyue Zhao, Chaowei Xiao -+ [Membership Inference Attacks on LLM-based Recommender Systems](https://arxiv.org//abs/2508.18665) ++ [Membership Inference Attacks on LLM-based Recommender Systems](https://arxiv.org/abs/2508.18665) Jiajie He, Yuechun Gu, Min-Chun Chen, Keke Chen -+ [Auditing Approximate Machine Unlearning for Differentially Private Models](https://arxiv.org//abs/2508.18671) ++ [Auditing Approximate Machine Unlearning for Differentially Private Models](https://arxiv.org/abs/2508.18671) Yuechun Gu, Jiajie He, Keke Chen -+ [FLAegis: A Two-Layer Defense Framework for Federated Learning Against Poisoning Attacks](https://arxiv.org//abs/2508.18737) ++ [FLAegis: A Two-Layer Defense Framework for Federated Learning Against Poisoning Attacks](https://arxiv.org/abs/2508.18737) Enrique Mármol Campos, Aurora González Vidal, José Luis Hernández Ramos, Antonio Skarmeta -+ [SegReConcat: A Data Augmentation Method for Voice Anonymization Attack](https://arxiv.org//abs/2508.18907) ++ [SegReConcat: A Data Augmentation Method for Voice Anonymization Attack](https://arxiv.org/abs/2508.18907) Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See -+ [Enhancing Model Privacy in Federated Learning with Random Masking and Quantization](https://arxiv.org//abs/2508.18911) ++ [Enhancing Model Privacy in Federated Learning with Random Masking and Quantization](https://arxiv.org/abs/2508.18911) Zhibo Xu, Jianhao Zhu, Jingwen Xu, Changze Lv, Zisu Huang, Xiaohua Wang, Muling Wu, Qi Qian, Xiaoqing Zheng, Xuanjing Huang -+ [Tackling Federated Unlearning as a Parameter Estimation Problem](https://arxiv.org//abs/2508.19065) ++ [Tackling Federated Unlearning as a 
Parameter Estimation Problem](https://arxiv.org/abs/2508.19065) Antonio Balordi, Lorenzo Manini, Fabio Stella, Alessio Merlo -+ [Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection](https://arxiv.org//abs/2508.19072) ++ [Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection](https://arxiv.org/abs/2508.19072) Sidahmed Benabderrahmane, Talal Rahwan -+ [SecureV2X: An Efficient and Privacy-Preserving System for Vehicle-to-Everything (V2X) Applications](https://arxiv.org//abs/2508.19115) ++ [SecureV2X: An Efficient and Privacy-Preserving System for Vehicle-to-Everything (V2X) Applications](https://arxiv.org/abs/2508.19115) Joshua Lee, Ali Arastehfard, Weiran Liu, Xuegang Ban, Yuan Hong -+ [UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation](https://arxiv.org//abs/2508.18652) ++ [UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation](https://arxiv.org/abs/2508.18652) Runpeng Geng, Yanting Wang, Ying Chen, Jinyuan Jia -+ [The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization](https://arxiv.org//abs/2508.18976) ++ [The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization](https://arxiv.org/abs/2508.18976) Stephen Meisenbacher, Alexandra Klymenko, Andreea-Elena Bodea, Florian Matthes -+ [Flatness-aware Curriculum Learning via Adversarial Difficulty](https://arxiv.org//abs/2508.18726) ++ [Flatness-aware Curriculum Learning via Adversarial Difficulty](https://arxiv.org/abs/2508.18726) Hiroaki Aizawa, Yoshikazu Hayashi -+ [A Closer Look at Edema Area Segmentation in SD-OCT Images Using Adversarial Framework](https://arxiv.org//abs/2508.18790) ++ [A Closer Look at Edema Area Segmentation in SD-OCT Images Using Adversarial Framework](https://arxiv.org/abs/2508.18790) Yuhui Tao, Yizhe Zhang, Qiang Chen -+ [Can we make NeRF-based visual localization privacy-preserving?](https://arxiv.org//abs/2508.18971) ++ [Can we make NeRF-based visual localization privacy-preserving?](https://arxiv.org/abs/2508.18971) Maxime Pietrantoni, Martin Humenberger, Torsten Sattler, Gabriela Csurka -+ [Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models](https://arxiv.org//abs/2508.18805) ++ [Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models](https://arxiv.org/abs/2508.18805) Rui Zhang, Zihan Wang, Tianli Yang, Hongwei Li, Wenbo Jiang, Qingchuan Zhao, Yang Liu, Guowen Xu -+ [Saddle Hierarchy in Dense Associative Memory](https://arxiv.org//abs/2508.19151) ++ [Saddle Hierarchy in Dense Associative Memory](https://arxiv.org/abs/2508.19151) Robin Thériault, Daniele Tantari -+ [Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness](https://arxiv.org//abs/2508.19183) ++ [Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness](https://arxiv.org/abs/2508.19183) Wenchuan Mu, Kwan Hui Lim -+ [A Tight Context-aware Privacy Bound for Histogram Publication](https://arxiv.org//abs/2508.18832) ++ [A Tight Context-aware Privacy Bound for Histogram Publication](https://arxiv.org/abs/2508.18832) Sara Saeidian (1 and 2), Ata Yavuzyılmaz, Leonhard Grosse (1), Georg Schuppe (3), Tobias J. 
Oechtering (1) ((1) KTH Royal Institute of Technology, (2) Inria Saclay, (3) SEBx) -+ [Memorization in Graph Neural Networks](https://arxiv.org//abs/2508.19352) ++ [Memorization in Graph Neural Networks](https://arxiv.org/abs/2508.19352) Adarsh Jamadandi, Jing Xu, Adam Dziedzic, Franziska Boenisch @@ -3621,517 +3621,517 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Haozhe Jiang, Nika Haghtalab # 2025-08-25 -+ [Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models](https://arxiv.org//abs/2508.17674) ++ [Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models](https://arxiv.org/abs/2508.17674) Qiming Guo, Jinwen Tang, Xingran Huang -+ [Robustness Feature Adapter for Efficient Adversarial Training](https://arxiv.org//abs/2508.17680) ++ [Robustness Feature Adapter for Efficient Adversarial Training](https://arxiv.org/abs/2508.17680) Quanwei Wu, Jun Guo, Wei Wang, Yi Wang -+ [Speculative Safety-Aware Decoding](https://arxiv.org//abs/2508.17739) ++ [Speculative Safety-Aware Decoding](https://arxiv.org/abs/2508.17739) Xuekang Wang, Shengyu Zhu, Xueqi Cheng -+ [FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation](https://arxiv.org//abs/2508.17868) ++ [FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation](https://arxiv.org/abs/2508.17868) Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo -+ [Vocoder-Projected Feature Discriminator](https://arxiv.org//abs/2508.17874) ++ [Vocoder-Projected Feature Discriminator](https://arxiv.org/abs/2508.17874) Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo -+ [Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation](https://arxiv.org//abs/2508.18148) ++ [Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation](https://arxiv.org/abs/2508.18148) Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie -+ [ISACL: Internal State Analyzer for Copyrighted Training Data Leakage](https://arxiv.org//abs/2508.17767) ++ [ISACL: Internal State Analyzer for Copyrighted Training Data Leakage](https://arxiv.org/abs/2508.17767) Guangwei Zhang, Qisheng Su, Jiateng Liu, Cheng Qian, Yanzhou Pan, Yanjie Fu, Denghui Zhang -+ [CATformer: Contrastive Adversarial Transformer for Image Super-Resolution](https://arxiv.org//abs/2508.17708) ++ [CATformer: Contrastive Adversarial Transformer for Image Super-Resolution](https://arxiv.org/abs/2508.17708) Qinyi Tian, Spence Cox, Laura E. Dalton -+ [SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection](https://arxiv.org//abs/2508.17843) ++ [SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection](https://arxiv.org/abs/2508.17843) Weiqi Yan, Lvhai Chen, Shengchuan Zhang, Yan Zhang, Liujuan Cao -+ [Does simple trump complex? Comparing strategies for adversarial robustness in DNNs](https://arxiv.org//abs/2508.18019) ++ [Does simple trump complex? Comparing strategies for adversarial robustness in DNNs](https://arxiv.org/abs/2508.18019) William Brooks, Marelie H. 
Davel, Coenraad Mouton -+ [FedGreed: A Byzantine-Robust Loss-Based Aggregation Method for Federated Learning](https://arxiv.org//abs/2508.18060) ++ [FedGreed: A Byzantine-Robust Loss-Based Aggregation Method for Federated Learning](https://arxiv.org/abs/2508.18060) Emmanouil Kritharakis, Antonios Makris, Dusan Jakovetic, Konstantinos Tserpes -+ [Quantum-Classical Hybrid Framework for Zero-Day Time-Push GNSS Spoofing Detection](https://arxiv.org//abs/2508.18085) ++ [Quantum-Classical Hybrid Framework for Zero-Day Time-Push GNSS Spoofing Detection](https://arxiv.org/abs/2508.18085) Abyad Enan, Mashrur Chowdhury, Sagar Dasgupta, Mizanur Rahman -+ [PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents](https://arxiv.org//abs/2508.17884) ++ [PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents](https://arxiv.org/abs/2508.17884) Toby Murray -+ [ClearMask: Noise-Free and Naturalness-Preserving Protection Against Voice Deepfake Attacks](https://arxiv.org//abs/2508.17660) ++ [ClearMask: Noise-Free and Naturalness-Preserving Protection Against Voice Deepfake Attacks](https://arxiv.org/abs/2508.17660) Yuanda Wang, Bocheng Chen, Hanqing Guo, Guangjing Wang, Weikang Ding, Qiben Yan -+ [Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails](https://arxiv.org//abs/2508.18384) ++ [Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails](https://arxiv.org/abs/2508.18384) Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, Guang-Jie Ren -+ [Analise de Desaprendizado de Maquina em Modelos de Classificacao de Imagens Medicas](https://arxiv.org//abs/2508.18509) ++ [Analise de Desaprendizado de Maquina em Modelos de Classificacao de Imagens Medicas](https://arxiv.org/abs/2508.18509) Andreza M. C. Falcao, Filipe R. Cordeiro -+ [Training Language Model Agents to Find Vulnerabilities with CTF-Dojo](https://arxiv.org//abs/2508.18370) ++ [Training Language Model Agents to Find Vulnerabilities with CTF-Dojo](https://arxiv.org/abs/2508.18370) Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang -+ [Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication](https://arxiv.org//abs/2508.18453) ++ [Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication](https://arxiv.org/abs/2508.18453) Yaser Baseri, Abdelhakim Senhaji Hafid, Dimitrios Makrakis, Hamidreza Fereidouni -+ [A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code](https://arxiv.org//abs/2508.18106) ++ [A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code](https://arxiv.org/abs/2508.18106) Keke Lian, Bin Wang, Lei Zhang, Libo Chen, Junjie Wang, Ziming Zhao, Yujiu Yang, Miaoqian Lin, Haotong Duan, Haoran Zhao, Shuang Liao, Mingda Guo, Jiazheng Quan, Yilu Zhong, Chenhao He, Zichuan Chen, Jie Wu, Haoling Li, Zhaoxuan Li, Jiongchi Yu, Hui Li, Dong Zhang # 2025-08-24 -+ [School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs](https://arxiv.org//abs/2508.17511) ++ [School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs](https://arxiv.org/abs/2508.17511) Mia Taylor, James Chua, Jan Betley, Johannes Treutlein, Owain Evans -+ [How to make Medical AI Systems safer? Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System](https://arxiv.org//abs/2508.17215) ++ [How to make Medical AI Systems safer? 
Simulating Vulnerabilities, and Threats in Multimodal Medical RAG System](https://arxiv.org/abs/2508.17215) Kaiwen Zuo, Zelin Liu, Raman Dutt, Ziyang Wang, Zhongtian Sun, Yeming Wang, Fan Mo, Pietro Liò -+ [Exposing Privacy Risks in Graph Retrieval-Augmented Generation](https://arxiv.org//abs/2508.17222) ++ [Exposing Privacy Risks in Graph Retrieval-Augmented Generation](https://arxiv.org/abs/2508.17222) Jiale Liu, Jiahao Zhang, Suhang Wang -+ [Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents](https://arxiv.org//abs/2508.17393) ++ [Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents](https://arxiv.org/abs/2508.17393) Sameer Komoravolu, Khalil Mrini -+ [Activation Transport Operators](https://arxiv.org//abs/2508.17540) ++ [Activation Transport Operators](https://arxiv.org/abs/2508.17540) Andrzej Szablewski, Marek Masiak -+ [Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD](https://arxiv.org//abs/2508.17450) ++ [Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD](https://arxiv.org/abs/2508.17450) Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee -+ [Advancing Weakly-Supervised Change Detection in Satellite Images via Adversarial Class Prompting](https://arxiv.org//abs/2508.17186) ++ [Advancing Weakly-Supervised Change Detection in Satellite Images via Adversarial Class Prompting](https://arxiv.org/abs/2508.17186) Zhenghui Zhao, Chen Wu, Di Wang, Hongruixuan Chen, Cuiqun Chen, Zhuo Zheng, Bo Du, Liangpei Zhang -+ [Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics](https://arxiv.org//abs/2508.17247) ++ [Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics](https://arxiv.org/abs/2508.17247) Lixin Jia, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Dan Ma, Gaobo Yang -+ [AdaGAT: Adaptive Guidance Adversarial Training for the Robustness of Deep Neural Networks](https://arxiv.org//abs/2508.17265) ++ [AdaGAT: Adaptive Guidance Adversarial Training for the Robustness of Deep Neural Networks](https://arxiv.org/abs/2508.17265) Zhenyu Liu, Huizhi Liang, Xinrun Li, Vaclav Snasel, Varun Ojha -+ [Defending Deepfake via Texture Feature Perturbation](https://arxiv.org//abs/2508.17315) ++ [Defending Deepfake via Texture Feature Perturbation](https://arxiv.org/abs/2508.17315) Xiao Zhang, Changfang Chen, Tianyi Wang -+ [Sharpness-Aware Geometric Defense for Robust Out-Of-Distribution Detection](https://arxiv.org//abs/2508.17174) ++ [Sharpness-Aware Geometric Defense for Robust Out-Of-Distribution Detection](https://arxiv.org/abs/2508.17174) Jeng-Lin Li, Ming-Ching Chang, Wei-Chao Chen -+ [MetaFed: Advancing Privacy, Performance, and Sustainability in Federated Metaverse Systems](https://arxiv.org//abs/2508.17341) ++ [MetaFed: Advancing Privacy, Performance, and Sustainability in Federated Metaverse Systems](https://arxiv.org/abs/2508.17341) Muhammet Anil Yagiz, Zeynep Sude Cengiz, Polat Goktas -+ [Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias](https://arxiv.org//abs/2508.17361) ++ [Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias](https://arxiv.org/abs/2508.17361) Shir Bernstein, David Beste, Daniel Ayzenshteyn, Lea Schonherr, Yisroel Mirsky -+ [FRAME : Comprehensive Risk Assessment Framework for Adversarial Machine Learning 
Threats](https://arxiv.org//abs/2508.17405) ++ [FRAME : Comprehensive Risk Assessment Framework for Adversarial Machine Learning Threats](https://arxiv.org/abs/2508.17405) Avishag Shapira, Simon Shigol, Asaf Shabtai -+ [Adversarial Examples Are Not Bugs, They Are Superposition](https://arxiv.org//abs/2508.17456) ++ [Adversarial Examples Are Not Bugs, They Are Superposition](https://arxiv.org/abs/2508.17456) Liv Gorton, Owen Lewis -+ [Risk Assessment and Security Analysis of Large Language Models](https://arxiv.org//abs/2508.17329) ++ [Risk Assessment and Security Analysis of Large Language Models](https://arxiv.org/abs/2508.17329) Xiaoyan Zhang, Dongyang Lyu, Xiaoqi Li -+ [SoK: Cybersecurity Assessment of Humanoid Ecosystem](https://arxiv.org//abs/2508.17481) ++ [SoK: Cybersecurity Assessment of Humanoid Ecosystem](https://arxiv.org/abs/2508.17481) Priyanka Prakash Surve, Asaf Shabtai, Yuval Elovici -+ [LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions](https://arxiv.org//abs/2508.18321) ++ [LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions](https://arxiv.org/abs/2508.18321) Maojia Song, Tej Deep Pala, Weisheng Jin, Amir Zadeh, Chuan Li, Dorien Herremans, Soujanya Poria # 2025-08-23 -+ [WildSpoof Challenge Evaluation Plan](https://arxiv.org//abs/2508.16858) ++ [WildSpoof Challenge Evaluation Plan](https://arxiv.org/abs/2508.16858) Yihan Wu, Jee-weon Jung, Hye-jin Shim, Xin Cheng, Xin Wang -+ [ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks](https://arxiv.org//abs/2508.16889) ++ [ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks](https://arxiv.org/abs/2508.16889) Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park -+ [NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability](https://arxiv.org//abs/2508.16937) ++ [NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability](https://arxiv.org/abs/2508.16937) Krishna Kanth Nakka, Alexandre Alahi -+ [Unveiling the Latent Directions of Reflection in Large Language Models](https://arxiv.org//abs/2508.16989) ++ [Unveiling the Latent Directions of Reflection in Large Language Models](https://arxiv.org/abs/2508.16989) Fu-Chieh Chang, Yu-Ting Lee, Pei-Yuan Wu -+ [Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks](https://arxiv.org//abs/2508.17158) ++ [Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks](https://arxiv.org/abs/2508.17158) Jack Youstra, Mohammed Mahfoud, Yang Yan, Henry Sleight, Ethan Perez, Mrinank Sharma -+ [Rao Differential Privacy](https://arxiv.org//abs/2508.17135) ++ [Rao Differential Privacy](https://arxiv.org/abs/2508.17135) Carlos Soto -+ [SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds](https://arxiv.org//abs/2508.18306) ++ [SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds](https://arxiv.org/abs/2508.18306) Wuxinlin Cheng, Yupeng Cao, Jinwen Wu, Koduvayur Subbalakshmi, Tian Han, Zhuo Feng # 2025-08-22 -+ [STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach](https://arxiv.org//abs/2508.16161) ++ [STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach](https://arxiv.org/abs/2508.16161) Yujie Li, Zezhi Shao, Chengqing Yu, Tangwen Qian, Zhao Zhang, Yifan Du, Shaoming He, Fei Wang, Yongjun Xu -+ [An Investigation of Visual Foundation Models Robustness](https://arxiv.org//abs/2508.16225) 
++ [An Investigation of Visual Foundation Models Robustness](https://arxiv.org/abs/2508.16225) Sandeep Gupta, Roberto Passerone -+ [From Confidence to Collapse in LLM Factual Robustness](https://arxiv.org//abs/2508.16267) ++ [From Confidence to Collapse in LLM Factual Robustness](https://arxiv.org/abs/2508.16267) Alina Fastowski, Bardh Prenkaj, Gjergji Kasneci -+ [LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts](https://arxiv.org//abs/2508.16325) ++ [LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts](https://arxiv.org/abs/2508.16325) Darpan Aswal, Céline Hudelot -+ [Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs](https://arxiv.org//abs/2508.16347) ++ [Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs](https://arxiv.org/abs/2508.16347) Yu Yan, Sheng Sun, Zhe Wang, Yijun Lin, Zenghao Duan, zhifei zheng, Min Liu, Zhiyi yin, Jianping Zhang -+ [HAMSA: Hijacking Aligned Compact Models via Stealthy Automation](https://arxiv.org//abs/2508.16484) ++ [HAMSA: Hijacking Aligned Compact Models via Stealthy Automation](https://arxiv.org/abs/2508.16484) Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena V. Tutubalina, Oleg Y. Rogov -+ [Retrieval-Augmented Defense: Adaptive and Controllable Jailbreak Prevention for Large Language Models](https://arxiv.org//abs/2508.16406) ++ [Retrieval-Augmented Defense: Adaptive and Controllable Jailbreak Prevention for Large Language Models](https://arxiv.org/abs/2508.16406) Guangyu Yang, Jinghong Chen, Jingbiao Mei, Weizhe Lin, Bill Byrne -+ [Domain Adaptation via Feature Refinement](https://arxiv.org//abs/2508.16124) ++ [Domain Adaptation via Feature Refinement](https://arxiv.org/abs/2508.16124) Savvas Karatsiolis, Andreas Kamilaris -+ [PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting](https://arxiv.org//abs/2508.16217) ++ [PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting](https://arxiv.org/abs/2508.16217) Hohyun Na, Seunghoo Hong, Simon S. Woo -+ [Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms](https://arxiv.org//abs/2508.16481) ++ [Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms](https://arxiv.org/abs/2508.16481) Jonathan Nöther, Adish Singla, Goran Radanovic -+ [Quality control in sublinear time: a case study via random graphs](https://arxiv.org//abs/2508.16531) ++ [Quality control in sublinear time: a case study via random graphs](https://arxiv.org/abs/2508.16531) Cassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan -+ [Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks](https://arxiv.org//abs/2508.16150) ++ [Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks](https://arxiv.org/abs/2508.16150) Aristeidis Sidiropoulos, Christos Chrysanthos Nikolaidis, Theodoros Tsiolakis, Nikolaos Pavlidis, Vasilis Perifanis, Pavlos S. 
Efraimidis -+ [How to Beat Nakamoto in the Race](https://arxiv.org//abs/2508.16202) ++ [How to Beat Nakamoto in the Race](https://arxiv.org/abs/2508.16202) Shu-Jie Cao, Dongning Guo -+ [Guarding Your Conversations: Privacy Gatekeepers for Secure Interactions with Cloud-Based AI Models](https://arxiv.org//abs/2508.16765) ++ [Guarding Your Conversations: Privacy Gatekeepers for Secure Interactions with Cloud-Based AI Models](https://arxiv.org/abs/2508.16765) GodsGift Uzor, Hasan Al-Qudah, Ynes Ineza, Abdul Serwadda -+ [A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems](https://arxiv.org//abs/2508.16843) ++ [A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems](https://arxiv.org/abs/2508.16843) Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta, Sunil Aryal -+ [Aligning Distributionally Robust Optimization with Practical Deep Learning Needs](https://arxiv.org//abs/2508.16734) ++ [Aligning Distributionally Robust Optimization with Practical Deep Learning Needs](https://arxiv.org/abs/2508.16734) Dmitrii Feoktistov, Igor Ignashin, Andrey Veprikov, Nikita Borovko, Alexander Bogdanov, Savelii Chezhegov, Aleksandr Beznosikov -+ [Securing Heterogeneous Network (HetNet) Communications for Wildfire Management: Mitigating the Effects of Adversarial and Environmental Threats](https://arxiv.org//abs/2508.16761) ++ [Securing Heterogeneous Network (HetNet) Communications for Wildfire Management: Mitigating the Effects of Adversarial and Environmental Threats](https://arxiv.org/abs/2508.16761) Nesrine Benchoubane, Olfa Ben Yahia, William Ferguson, Gurkan Gur, Sumit Chakravarty, Gregory Falco, Gunes Karabulut Kurt # 2025-08-21 -+ [Conflict-Aware Soft Prompting for Retrieval-Augmented Generation](https://arxiv.org//abs/2508.15253) ++ [Conflict-Aware Soft Prompting for Retrieval-Augmented Generation](https://arxiv.org/abs/2508.15253) Eunseong Choi, June Park, Hyeri Lee, Jongwuk Lee -+ [IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents](https://arxiv.org//abs/2508.15310) ++ [IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents](https://arxiv.org/abs/2508.15310) Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji -+ [VideoEraser: Concept Erasure in Text-to-Video Diffusion Models](https://arxiv.org//abs/2508.15314) ++ [VideoEraser: Concept Erasure in Text-to-Video Diffusion Models](https://arxiv.org/abs/2508.15314) Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji -+ [Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation](https://arxiv.org//abs/2508.15370) ++ [Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation](https://arxiv.org/abs/2508.15370) Yichi Zhang, Yao Huang, Yifan Wang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu -+ [Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection](https://arxiv.org//abs/2508.15449) ++ [Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection](https://arxiv.org/abs/2508.15449) Chengcan Wu, Zeming Wei, Huanran Chen, Yinpeng Dong, Meng Sun -+ [Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance](https://arxiv.org//abs/2508.15650) ++ [Towards a 3D Transfer-based Black-box Attack via Critical Feature 
Guidance](https://arxiv.org/abs/2508.15650) Shuchao Pang, Zhenghan Chen, Shen Zhang, Liming Lu, Siyuan Liang, Anan Du, Yongbin Zhou -+ [A Study of Privacy-preserving Language Modeling Approaches](https://arxiv.org//abs/2508.15421) ++ [A Study of Privacy-preserving Language Modeling Approaches](https://arxiv.org/abs/2508.15421) Pritilata Saha, Abhirup Sinha -+ [SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking](https://arxiv.org//abs/2508.15526) ++ [SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking](https://arxiv.org/abs/2508.15526) Xiangyang Zhu, Yuan Tian, Chunyi Li, Kaiwei Zhang, Wei Sun, Guangtao Zhai -+ [SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models](https://arxiv.org//abs/2508.15648) ++ [SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models](https://arxiv.org/abs/2508.15648) Peng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming Wang, Jiajun Chen, Shujian Huang -+ [Retrieval-Augmented Review Generation for Poisoning Recommender Systems](https://arxiv.org//abs/2508.15252) ++ [Retrieval-Augmented Review Generation for Poisoning Recommender Systems](https://arxiv.org/abs/2508.15252) Shiyi Yang, Xinshu Li, Guanglin Zhou, Chen Wang, Xiwei Xu, Liming Zhu, Lina Yao -+ [Adversarial Attacks against Neural Ranking Models via In-Context Learning](https://arxiv.org//abs/2508.15283) ++ [Adversarial Attacks against Neural Ranking Models via In-Context Learning](https://arxiv.org/abs/2508.15283) Amin Bigdeli, Negar Arabzadeh, Ebrahim Bagheri, Charles L. A. Clarke -+ [Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning](https://arxiv.org//abs/2508.15207) ++ [Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning](https://arxiv.org/abs/2508.15207) Arjun Srinivasan, Anubhav Paras, Aniket Bera -+ [Fast globally optimal Truncated Least Squares point cloud registration with fixed rotation axis](https://arxiv.org//abs/2508.15613) ++ [Fast globally optimal Truncated Least Squares point cloud registration with fixed rotation axis](https://arxiv.org/abs/2508.15613) Ivo Ivanov, Carsten Markgraf -+ [DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation](https://arxiv.org//abs/2508.15452) ++ [DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation](https://arxiv.org/abs/2508.15452) Uğurcan Akyüz, Deniz Katircioglu-Öztürk, Emre K. Süslü, Burhan Keleş, Mete C. Kaya, Gamze Durhan, Meltem G. Akpınar, Figen B. Demirkazık, Gözde B. 
Akar -+ [SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks](https://arxiv.org//abs/2508.15182) ++ [SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks](https://arxiv.org/abs/2508.15182) Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu -+ [Mini-Batch Robustness Verification of Deep Neural Networks](https://arxiv.org//abs/2508.15454) ++ [Mini-Batch Robustness Verification of Deep Neural Networks](https://arxiv.org/abs/2508.15454) Saar Tzour-Shaday, Dana Drachsler Cohen -+ [Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space](https://arxiv.org//abs/2508.15764) ++ [Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space](https://arxiv.org/abs/2508.15764) Kiarash Kazari, Ezzeldin Shereen, György Dán -+ [BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning](https://arxiv.org//abs/2508.15541) ++ [BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning](https://arxiv.org/abs/2508.15541) Bingguang Lu, Hongsheng Hu, Yuantian Miao, Shaleeza Sohail, Chaoxiang He, Shuo Wang, Xiao Chen -+ [Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification](https://arxiv.org//abs/2508.15934) ++ [Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification](https://arxiv.org/abs/2508.15934) Onur Alp Kirci, M. Emre Gursoy # 2025-08-20 -+ [Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection](https://arxiv.org//abs/2508.14699) ++ [Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection](https://arxiv.org/abs/2508.14699) Jan Lum Fok, Qingwen Zeng, Shiping Chen, Oscar Fawkes, Huaming Chen -+ [Adversarial Generation and Collaborative Evolution of Safety-Critical Scenarios for Autonomous Vehicles](https://arxiv.org//abs/2508.14527) ++ [Adversarial Generation and Collaborative Evolution of Safety-Critical Scenarios for Autonomous Vehicles](https://arxiv.org/abs/2508.14527) Jiangfan Liu, Yongkang Guo, Fangzhi Zhong, Tianyuan Zhang, Zonglei Jing, Siyuan Liang, Jiakai Wang, Mingchuan Zhang, Aishan Liu, Xianglong Liu -+ [Adversarial Hospital-Invariant Feature Learning for WSI Patch Classification](https://arxiv.org//abs/2508.14779) ++ [Adversarial Hospital-Invariant Feature Learning for WSI Patch Classification](https://arxiv.org/abs/2508.14779) Mengliang Zhang, Jacob M. 
Luber -+ [Improving Fairness in Graph Neural Networks via Counterfactual Debiasing](https://arxiv.org//abs/2508.14683) ++ [Improving Fairness in Graph Neural Networks via Counterfactual Debiasing](https://arxiv.org/abs/2508.14683) Zengyi Wo, Chang Liu, Yumeng Wang, Minglai Shao, Wenjun Wang -+ [Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent](https://arxiv.org//abs/2508.14853) ++ [Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent](https://arxiv.org/abs/2508.14853) Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu -+ [Distributional Adversarial Attacks and Training in Deep Hedging](https://arxiv.org//abs/2508.14757) ++ [Distributional Adversarial Attacks and Training in Deep Hedging](https://arxiv.org/abs/2508.14757) Guangyi He, Tobias Sutter, Lukas Gonon -+ [DOPA: Stealthy and Generalizable Backdoor Attacks from a Single Client under Challenging Federated Constraints](https://arxiv.org//abs/2508.14530) ++ [DOPA: Stealthy and Generalizable Backdoor Attacks from a Single Client under Challenging Federated Constraints](https://arxiv.org/abs/2508.14530) Xuezheng Qin, Ruwei Huang, Xiaolong Tang, Feng Li -+ [A Lightweight Incentive-Based Privacy-Preserving Smart Metering Protocol for Value-Added Services](https://arxiv.org//abs/2508.14703) ++ [A Lightweight Incentive-Based Privacy-Preserving Smart Metering Protocol for Value-Added Services](https://arxiv.org/abs/2508.14703) Farid Zaredar, Morteza Amini -+ [A Lightweight Privacy-Preserving Smart Metering Billing Protocol with Dynamic Tariff Policy Adjustment](https://arxiv.org//abs/2508.14815) ++ [A Lightweight Privacy-Preserving Smart Metering Billing Protocol with Dynamic Tariff Policy Adjustment](https://arxiv.org/abs/2508.14815) Farid Zaredar, Morteza Amini -+ [TAIGen: Training-Free Adversarial Image Generation via Diffusion Models](https://arxiv.org//abs/2508.15020) ++ [TAIGen: Training-Free Adversarial Image Generation via Diffusion Models](https://arxiv.org/abs/2508.15020) Susim Roy, Anubhooti Jain, Mayank Vatsa, Richa Singh -+ [A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives](https://arxiv.org//abs/2508.15031) ++ [A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives](https://arxiv.org/abs/2508.15031) Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong -+ [MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs](https://arxiv.org//abs/2508.15036) ++ [MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs](https://arxiv.org/abs/2508.15036) Ruyi Ding, Tianhong Xu, Xinyi Shen, Aidong Adam Ding, Yunsi Fei -+ [Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack Detection](https://arxiv.org//abs/2508.14980) ++ [Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack Detection](https://arxiv.org/abs/2508.14980) Andrei Balykin, Anvar Ganiev, Denis Kondranin, Kirill Polevoda, Nikolai Liudkevich, Artem Petrov -+ [Side Effects of Erasing Concepts from Diffusion Models](https://arxiv.org//abs/2508.15124) ++ [Side Effects of Erasing Concepts from Diffusion Models](https://arxiv.org/abs/2508.15124) Shaswati Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale -+ [Aura-CAPTCHA: A Reinforcement Learning and GAN-Enhanced Multi-Modal CAPTCHA System](https://arxiv.org//abs/2508.14976) ++ 
[Aura-CAPTCHA: A Reinforcement Learning and GAN-Enhanced Multi-Modal CAPTCHA System](https://arxiv.org/abs/2508.14976) Joydeep Chandra, Prabal Manhas, Ramanjot Kaur, Rashi Sahay -+ [Robust Estimation Under Heterogeneous Corruption Rates](https://arxiv.org//abs/2508.15051) ++ [Robust Estimation Under Heterogeneous Corruption Rates](https://arxiv.org/abs/2508.15051) Syomantak Chaudhuri, Jerry Li, Thomas A. Courtade -+ [Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI](https://arxiv.org//abs/2508.14950) ++ [Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI](https://arxiv.org/abs/2508.14950) Oliver Welin Odeback, Arivazhagan Geetha Balasubramanian, Jonas Schollenberger, Edward Ferdiand, Alistair A. Young, C. Alberto Figueroa, Susanne Schnell, Outi Tammisola, Ricardo Vinuesa, Tobias Granberg, Alexander Fyrdahl, David Marlevi -+ [Self-Disguise Attack: Induce the LLM to disguise itself for AIGT detection evasion](https://arxiv.org//abs/2508.15848) ++ [Self-Disguise Attack: Induce the LLM to disguise itself for AIGT detection evasion](https://arxiv.org/abs/2508.15848) Yinghan Zhou, Juan Wen, Wanli Peng, Zhengxian Wu, Ziwei Zhang, Yiming Xue -+ [Linkage Attacks Expose Identity Risks in Public ECG Data Sharing](https://arxiv.org//abs/2508.15850) ++ [Linkage Attacks Expose Identity Risks in Public ECG Data Sharing](https://arxiv.org/abs/2508.15850) Ziyu Wang, Elahe Khatibi, Farshad Firouzi, Sanaz Rahimi Mousavi, Krishnendu Chakrabarty, Amir M. Rahmani -+ [Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation](https://arxiv.org//abs/2508.18235) ++ [Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation](https://arxiv.org/abs/2508.18235) Ashwath Vaithinathan Aravindan, Abha Jha, Matthew Salaway, Atharva Sandeep Bhide, Duygu Nur Yaldiz # 2025-08-19 -+ [The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats](https://arxiv.org//abs/2508.13700) ++ [The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats](https://arxiv.org/abs/2508.13700) Markov Grey, Charbel-Raphaël Segerie -+ [On the Security and Privacy of Federated Learning: A Survey with Attacks, Defenses, Frameworks, Applications, and Future Directions](https://arxiv.org//abs/2508.13730) ++ [On the Security and Privacy of Federated Learning: A Survey with Attacks, Defenses, Frameworks, Applications, and Future Directions](https://arxiv.org/abs/2508.13730) Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, Jose L. 
Hernandez-Ramos, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti -+ [Evaluating Identity Leakage in Speaker De-Identification Systems](https://arxiv.org//abs/2508.14012) ++ [Evaluating Identity Leakage in Speaker De-Identification Systems](https://arxiv.org/abs/2508.14012) Seungmin Seo, Oleg Aulov, Afzal Godil, Kevin Mangold -+ [Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation](https://arxiv.org//abs/2508.14031) ++ [Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation](https://arxiv.org/abs/2508.14031) Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee -+ [CRISP: Persistent Concept Unlearning via Sparse Autoencoders](https://arxiv.org//abs/2508.13650) ++ [CRISP: Persistent Concept Unlearning via Sparse Autoencoders](https://arxiv.org/abs/2508.13650) Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov -+ [Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA](https://arxiv.org//abs/2508.13743) ++ [Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA](https://arxiv.org/abs/2508.13743) Kaiwei Zhang, Qi Jia, Zijian Chen, Wei Sun, Xiangyang Zhu, Chunyi Li, Dandan Zhu, Guangtao Zhai -+ [Enhancing Robustness of Implicit Neural Representations Against Weight Perturbations](https://arxiv.org//abs/2508.13481) ++ [Enhancing Robustness of Implicit Neural Representations Against Weight Perturbations](https://arxiv.org/abs/2508.13481) Wenyong Zhou, Yuxin Cheng, Zhengwu Liu, Taiqiang Wu, Chen Zhang, Ngai Wong -+ [Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance](https://arxiv.org//abs/2508.13739) ++ [Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance](https://arxiv.org/abs/2508.13739) Yiming Cao, Yanjie Li, Kaisheng Liang, Yuni Lai, Bin Xiao -+ [Timestep-Compressed Attack on Spiking Neural Networks through Timestep-Level Backpropagation](https://arxiv.org//abs/2508.13812) ++ [Timestep-Compressed Attack on Spiking Neural Networks through Timestep-Level Backpropagation](https://arxiv.org/abs/2508.13812) Donghwa Kang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Hyeongboo Baek, Brent ByungHoon Kang -+ [Backdooring Self-Supervised Contrastive Learning by Noisy Alignment](https://arxiv.org//abs/2508.14015) ++ [Backdooring Self-Supervised Contrastive Learning by Noisy Alignment](https://arxiv.org/abs/2508.14015) Tuo Chen, Jie Gui, Minjing Dong, Ju Jia, Lanting Fang, Jian Liu -+ [Learning to See Through Flare](https://arxiv.org//abs/2508.13907) ++ [Learning to See Through Flare](https://arxiv.org/abs/2508.13907) Xiaopeng Peng, Heath Gemar, Erin Fleet, Kyle Novak, Abbie Watnik, Grover Swartzlander -+ [Text2Weight: Bridging Natural Language and Neural Network Weight Spaces](https://arxiv.org//abs/2508.13633) ++ [Text2Weight: Bridging Natural Language and Neural Network Weight Spaces](https://arxiv.org/abs/2508.13633) Bowen Tian, Wenshuo Chen, Zexi Li, Songning Lai, Jiemin Wu, Yutao Yue -+ [Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond](https://arxiv.org//abs/2508.13679) ++ [Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond](https://arxiv.org/abs/2508.13679) Canzhe Zhao, Shinji Ito, Shuai Li -+ [FedUP: Efficient Pruning-based Federated Unlearning for Model Poisoning Attacks](https://arxiv.org//abs/2508.13853) ++ [FedUP: Efficient 
Pruning-based Federated Unlearning for Model Poisoning Attacks](https://arxiv.org/abs/2508.13853) Nicolò Romandini, Cristian Borcea, Rebecca Montanari, Luca Foschini -+ [When Secure Aggregation Falls Short: Achieving Long-Term Privacy in Asynchronous Federated Learning for LEO Satellite Networks](https://arxiv.org//abs/2508.13425) ++ [When Secure Aggregation Falls Short: Achieving Long-Term Privacy in Asynchronous Federated Learning for LEO Satellite Networks](https://arxiv.org/abs/2508.13425) Mohamed Elmahallawy, Tie Luo -+ [Beneath the Mask: Can Contribution Data Unveil Malicious Personas in Open-Source Projects?](https://arxiv.org//abs/2508.13453) ++ [Beneath the Mask: Can Contribution Data Unveil Malicious Personas in Open-Source Projects?](https://arxiv.org/abs/2508.13453) Ruby Nealon -+ [Red Teaming Methodology for Design Obfuscation](https://arxiv.org//abs/2508.13965) ++ [Red Teaming Methodology for Design Obfuscation](https://arxiv.org/abs/2508.13965) Yuntao Liu, Abir Akib, Zelin Lu, Qian Xu, Ankur Srivastava, Gang Qu, David Kehlet, Nij Dorairaj -+ [CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection](https://arxiv.org//abs/2508.14128) ++ [CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection](https://arxiv.org/abs/2508.14128) Jiaming Hu, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis -+ [ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification](https://arxiv.org//abs/2508.14134) ++ [ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification](https://arxiv.org/abs/2508.14134) Xin Wu, Fei Teng, Ji Zhang, Xingwang Li, Yuxuan Liang -+ [Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS](https://arxiv.org//abs/2508.14313) ++ [Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS](https://arxiv.org/abs/2508.14313) Can Jin, Yang Zhou, Qixin Zhang, Hongwu Peng, Di Zhang, Marco Pavone, Ligong Han, Zhang-Wei Hong, Tong Che, Dimitris N. 
Metaxas -+ [MMReview: A Multidisciplinary and Multimodal Benchmark for LLM-Based Peer Review Automation](https://arxiv.org//abs/2508.14146) ++ [MMReview: A Multidisciplinary and Multimodal Benchmark for LLM-Based Peer Review Automation](https://arxiv.org/abs/2508.14146) Xian Gao, Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Ting Liu, Yuzhuo Fu -+ [Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text](https://arxiv.org//abs/2508.14190) ++ [Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text](https://arxiv.org/abs/2508.14190) Zixin Rao, Youssef Mohamed, Shang Liu, Zeyan Liu -+ [Noise Robust One-Class Intrusion Detection on Dynamic Graphs](https://arxiv.org//abs/2508.14192) ++ [Noise Robust One-Class Intrusion Detection on Dynamic Graphs](https://arxiv.org/abs/2508.14192) Aleksei Liuliakov, Alexander Schulz, Luca Hermes, Barbara Hammer -+ [MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers](https://arxiv.org//abs/2508.14925) ++ [MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers](https://arxiv.org/abs/2508.14925) Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, Xiangyang Li -+ [CIA+TA Risk Assessment for AI Reasoning Vulnerabilities](https://arxiv.org//abs/2508.15839) ++ [CIA+TA Risk Assessment for AI Reasoning Vulnerabilities](https://arxiv.org/abs/2508.15839) Yuksel Aydin -+ [Mechanistic Exploration of Backdoored Large Language Model Attention Patterns](https://arxiv.org//abs/2508.15847) ++ [Mechanistic Exploration of Backdoored Large Language Model Attention Patterns](https://arxiv.org/abs/2508.15847) Mohammed Abu Baker, Lakshmi Babu-Saheer -+ [Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution](https://arxiv.org//abs/2508.15840) ++ [Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution](https://arxiv.org/abs/2508.15840) Robert Dilworth @@ -4140,96 +4140,96 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Xian Gao, Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Ting Liu, Yuzhuo Fu # 2025-08-18 -+ [Systematic Analysis of MCP Security](https://arxiv.org//abs/2508.12538) ++ [Systematic Analysis of MCP Security](https://arxiv.org/abs/2508.12538) Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, Sheng Wen -+ [Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering](https://arxiv.org//abs/2508.12672) ++ [Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering](https://arxiv.org/abs/2508.12672) Emmanouil Kritharakis, Dusan Jakovetic, Antonios Makris, Konstantinos Tserpes -+ [RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns](https://arxiv.org//abs/2508.13152) ++ [RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns](https://arxiv.org/abs/2508.13152) Xin Chen, Junchao Wu, Shu Yang, Runzhe Zhan, Zeyu Wu, Ziyang Luo, Di Wang, Min Yang, Lidia S. Chao, Derek F. 
Wong -+ [Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection](https://arxiv.org//abs/2508.12711) ++ [Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection](https://arxiv.org/abs/2508.12711) Fanxiao Li, Jiaying Wu, Tingchao Fu, Yunyun Dong, Bingbing Song, Wei Zhou -+ [Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods](https://arxiv.org//abs/2508.12730) ++ [Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods](https://arxiv.org/abs/2508.12730) Jaeung Lee, Suhyeon Yu, Yurim Jang, Simon S. Woo, Jaemin Jo -+ [Efficient and Verifiable Privacy-Preserving Convolutional Computation for CNN Inference with Untrusted Clouds](https://arxiv.org//abs/2508.12832) ++ [Efficient and Verifiable Privacy-Preserving Convolutional Computation for CNN Inference with Untrusted Clouds](https://arxiv.org/abs/2508.12832) Jinyu Lu, Xinrong Sun, Yunting Tao, Tong Ji, Fanyu Kong, Guoqiang Yang -+ [The Hidden Cost of Correlation: Rethinking Privacy Leakage in Local Differential Privacy](https://arxiv.org//abs/2508.12539) ++ [The Hidden Cost of Correlation: Rethinking Privacy Leakage in Local Differential Privacy](https://arxiv.org/abs/2508.12539) Sandaru Jayawardana, Sennur Ulukus, Ming Ding, Kanchana Thilakarathna -+ [MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies](https://arxiv.org//abs/2508.13048) ++ [MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies](https://arxiv.org/abs/2508.13048) Weiwei Qi, Shuo Shao, Wei Gu, Tianhang Zheng, Puning Zhao, Zhan Qin, Kui Ren -+ [Involuntary Jailbreak](https://arxiv.org//abs/2508.13246) ++ [Involuntary Jailbreak](https://arxiv.org/abs/2508.13246) Yangyang Guo, Yangyan Li, Mohan Kankanhalli -+ [DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples](https://arxiv.org//abs/2508.13309) ++ [DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples](https://arxiv.org/abs/2508.13309) Abdullah Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider, Tanzim Mahfuz, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty -+ [Efficient Constraint-Aware Flow Matching via Randomized Exploration](https://arxiv.org//abs/2508.13316) ++ [Efficient Constraint-Aware Flow Matching via Randomized Exploration](https://arxiv.org/abs/2508.13316) Zhengyan Huan, Jacob Boerma, Li-Ping Liu, Shuchin Aeron -+ [DAIQ: Auditing Demographic Attribute Inference from Question in LLMs](https://arxiv.org//abs/2508.15830) ++ [DAIQ: Auditing Demographic Attribute Inference from Question in LLMs](https://arxiv.org/abs/2508.15830) Srikant Panda, Hitesh Laxmichand Patel, Shahad Al-Khalifa, Amit Agarwal, Hend Al-Khalifa, Sharefah Al-Ghamdi # 2025-08-17 -+ [Distribution Matching via Generalized Consistency Models](https://arxiv.org//abs/2508.12222) ++ [Distribution Matching via Generalized Consistency Models](https://arxiv.org/abs/2508.12222) Sagar Shrestha, Rajesh Shrestha, Tri Nguyen, Subash Timilsina -+ [CRoC: Context Refactoring Contrast for Graph Anomaly Detection with Limited Supervision](https://arxiv.org//abs/2508.12278) ++ [CRoC: Context Refactoring Contrast for Graph Anomaly Detection with Limited Supervision](https://arxiv.org/abs/2508.12278) Siyue Xie, Da Sun Handason Tam, Wing Cheong Lau -+ [Where to Start Alignment? 
Diffusion Large Language Model May Demand a Distinct Position](https://arxiv.org//abs/2508.12398) ++ [Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position](https://arxiv.org/abs/2508.12398) Zhixin Xie, Xurui Song, Jun Luo -+ [Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations](https://arxiv.org//abs/2508.12430) ++ [Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations](https://arxiv.org/abs/2508.12430) Yahsin Yeh, Yilun Wu, Bokai Ruan, Honghan Shuai -+ [EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization](https://arxiv.org//abs/2508.12479) ++ [EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization](https://arxiv.org/abs/2508.12479) Chinmay Maheshwari, Chinmay Pimpalkhare, Debasish Chatterjee -+ [Rethinking Safety in LLM Fine-tuning: An Optimization Perspective](https://arxiv.org//abs/2508.12531) ++ [Rethinking Safety in LLM Fine-tuning: An Optimization Perspective](https://arxiv.org/abs/2508.12531) Minseon Kim, Jin Myung Kwak, Lama Alssum, Bernard Ghanem, Philip Torr, David Krueger, Fazl Barez, Adel Bibi -+ [ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers](https://arxiv.org//abs/2508.12384) ++ [ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers](https://arxiv.org/abs/2508.12384) Hanwen Cao, Haobo Lu, Xiaosen Wang, Kun He -+ [CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning](https://arxiv.org//abs/2508.12264) ++ [CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2508.12264) Saisai Xia, Wenhao Wang, Zihao Wang, Yuhui Zhang, Yier Jin, Dan Meng, Rui Hou -+ [Adjustable AprilTags For Identity Secured Tasks](https://arxiv.org//abs/2508.12304) ++ [Adjustable AprilTags For Identity Secured Tasks](https://arxiv.org/abs/2508.12304) Hao Li -+ [MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols](https://arxiv.org//abs/2508.13220) ++ [MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols](https://arxiv.org/abs/2508.13220) Yixuan Yang, Daoyuan Wu, Yufan Chen -+ [Passive Hack-Back Strategies for Cyber Attribution: Covert Vectors in Denied Environment](https://arxiv.org//abs/2508.16637) ++ [Passive Hack-Back Strategies for Cyber Attribution: Covert Vectors in Denied Environment](https://arxiv.org/abs/2508.16637) Abraham Itzhak Weinberg @@ -4238,157 +4238,157 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yixuan Yang, Daoyuan Wu, Yufan Chen # 2025-08-16 -+ [Rigorous Feature Importance Scores based on Shapley Value and Banzhaf Index](https://arxiv.org//abs/2508.11959) ++ [Rigorous Feature Importance Scores based on Shapley Value and Banzhaf Index](https://arxiv.org/abs/2508.11959) Xuanxiang Huang, Olivier Létoffé, Joao Marques-Silva -+ [Deciphering the Interplay between Attack and Protection Complexity in Privacy-Preserving Federated Learning](https://arxiv.org//abs/2508.11907) ++ [Deciphering the Interplay between Attack and Protection Complexity in Privacy-Preserving Federated Learning](https://arxiv.org/abs/2508.11907) Xiaojin Zhang, Mingcong Xu, Yiming Li, Wei Chen, Qiang Yang -+ [CAMF: Collaborative Adversarial Multi-agent Framework for Machine 
Generated Text Detection](https://arxiv.org//abs/2508.11933) ++ [CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection](https://arxiv.org/abs/2508.11933) Yue Wang, Liesheng Wei, Yuxiang Wang -+ [Mitigating Jailbreaks with Intent-Aware LLMs](https://arxiv.org//abs/2508.12072) ++ [Mitigating Jailbreaks with Intent-Aware LLMs](https://arxiv.org/abs/2508.12072) Wei Jie Yeo, Ranjan Satapathy, Erik Cambria -+ [ComplicitSplat: Downstream Models are Vulnerable to Blackbox Attacks by 3D Gaussian Splat Camouflages](https://arxiv.org//abs/2508.11854) ++ [ComplicitSplat: Downstream Models are Vulnerable to Blackbox Attacks by 3D Gaussian Splat Camouflages](https://arxiv.org/abs/2508.11854) Matthew Hull, Haoyang Yang, Pratham Mehta, Mansi Phute, Aeree Cho, Haorang Wang, Matthew Lau, Wenke Lee, Wilian Lunardi, Martin Andreoni, Polo Chau -+ [TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks](https://arxiv.org//abs/2508.12132) ++ [TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks](https://arxiv.org/abs/2508.12132) Amira Guesmi, Bassem Ouni, Muhammad Shafique -+ [An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction](https://arxiv.org//abs/2508.11931) ++ [An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction](https://arxiv.org/abs/2508.11931) Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei -+ [Adversarial Robustness in Distributed Quantum Machine Learning](https://arxiv.org//abs/2508.11848) ++ [Adversarial Robustness in Distributed Quantum Machine Learning](https://arxiv.org/abs/2508.11848) Pouya Kananian, Hans-Arno Jacobsen -+ [Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous](https://arxiv.org//abs/2508.12175) ++ [Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous](https://arxiv.org/abs/2508.12175) Ben Nassi, Stav Cohen, Or Yair -+ [Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions](https://arxiv.org//abs/2508.13214) ++ [Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions](https://arxiv.org/abs/2508.13214) Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang # 2025-08-15 -+ [When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs](https://arxiv.org//abs/2508.11383) ++ [When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs](https://arxiv.org/abs/2508.11383) Mikhail Seleznyov, Mikhail Chaichuk, Gleb Ershov, Alexander Panchenko, Elena Tutubalina, Oleg Somov -+ [Noise Matters: Optimizing Matching Noise for Diffusion Classifiers](https://arxiv.org//abs/2508.11330) ++ [Noise Matters: Optimizing Matching Noise for Diffusion Classifiers](https://arxiv.org/abs/2508.11330) Yanghao Wang, Long Chen -+ [Semantically Guided Adversarial Testing of Vision Models Using Language Models](https://arxiv.org//abs/2508.11341) ++ [Semantically Guided Adversarial Testing of Vision Models Using Language Models](https://arxiv.org/abs/2508.11341) Katarzyna Filus, Jorge M. 
Cruz-Duarte -+ [Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting](https://arxiv.org//abs/2508.11431) ++ [Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting](https://arxiv.org/abs/2508.11431) Simona Kocour, Assia Benbihi, Torsten Sattler -+ [Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble](https://arxiv.org//abs/2508.11279) ++ [Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble](https://arxiv.org/abs/2508.11279) Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng -+ [Robust Convolution Neural ODEs via Contractivity-promoting regularization](https://arxiv.org//abs/2508.11432) ++ [Robust Convolution Neural ODEs via Contractivity-promoting regularization](https://arxiv.org/abs/2508.11432) Muhammad Zakwan, Liang Xu, Giancarlo Ferrari-Trecate -+ [SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication](https://arxiv.org//abs/2508.11733) ++ [SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication](https://arxiv.org/abs/2508.11733) Ruijia Zhang, Xinyan Zhao, Ruixiang Wang, Sigen Chen, Guibin Zhang, An Zhang, Kun Wang, Qingsong Wen -+ [Limitation Learning: Catching Adverse Dialog with GAIL](https://arxiv.org//abs/2508.11767) ++ [Limitation Learning: Catching Adverse Dialog with GAIL](https://arxiv.org/abs/2508.11767) Noah Kasmanoff, Rahul Zalkikar -+ [Assessing User Privacy Leakage in Synthetic Packet Traces: An Attack-Grounded Approach](https://arxiv.org//abs/2508.11742) ++ [Assessing User Privacy Leakage in Synthetic Packet Traces: An Attack-Grounded Approach](https://arxiv.org/abs/2508.11742) Minhao Jin, Hongyu He, Maria Apostolaki # 2025-08-14 -+ [A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2508.10315) ++ [A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2508.10315) Keke Gai, Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu -+ [Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation](https://arxiv.org//abs/2508.10404) ++ [Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation](https://arxiv.org/abs/2508.10404) Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li -+ [Contrastive ECOC: Learning Output Codes for Adversarial Defense](https://arxiv.org//abs/2508.10491) ++ [Contrastive ECOC: Learning Output Codes for Adversarial Defense](https://arxiv.org/abs/2508.10491) Che-Yu Chou, Hung-Hsuan Chen -+ [Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation](https://arxiv.org//abs/2508.10672) ++ [Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation](https://arxiv.org/abs/2508.10672) Feiran Li, Qianqian Xu, Shilong Bao, Boyu Han, Zhiyong Yang, Qingming Huang -+ [Enhancing Fairness in Autoencoders for Node-Level Graph Anomaly Detection](https://arxiv.org//abs/2508.10785) ++ [Enhancing Fairness in Autoencoders for Node-Level Graph Anomaly Detection](https://arxiv.org/abs/2508.10785) Shouju Wang, Yuchen Song, Sheng'en Li, Dongmian Zou -+ [Searching for Privacy Risks in LLM Agents via Simulation](https://arxiv.org//abs/2508.10880) ++ [Searching for Privacy Risks in LLM Agents via Simulation](https://arxiv.org/abs/2508.10880) Yanzhe Zhang, Diyi Yang -+ 
[Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts](https://arxiv.org//abs/2508.10390) ++ [Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts](https://arxiv.org/abs/2508.10390) Chiyu Zhang, Lu Zhou, Xiaogang Xu, Jiafei Wu, Liming Fang, Zhe Liu -+ [Towards Powerful and Practical Patch Attacks for 2D Object Detection in Autonomous Driving](https://arxiv.org//abs/2508.10600) ++ [Towards Powerful and Practical Patch Attacks for 2D Object Detection in Autonomous Driving](https://arxiv.org/abs/2508.10600) Yuxin Cao, Yedi Zhang, Wentao He, Yifan Liao, Yan Xiao, Chang Li, Zhiyong Huang, Jin Song Dong -+ [Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models](https://arxiv.org//abs/2508.10243) ++ [Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models](https://arxiv.org/abs/2508.10243) Taibiao Zhao, Mingxuan Sun, Hao Wang, Xiaobing Chen, Xiangwei Zhou -+ [Oops!... They Stole it Again: Attacks on Split Learning](https://arxiv.org//abs/2508.10598) ++ [Oops!... They Stole it Again: Attacks on Split Learning](https://arxiv.org/abs/2508.10598) Tanveer Khan, Antonis Michalas -+ [BERTector: Intrusion Detection Based on Joint-Dataset Learning](https://arxiv.org//abs/2508.10327) ++ [BERTector: Intrusion Detection Based on Joint-Dataset Learning](https://arxiv.org/abs/2508.10327) Haoyang Hu, Xun Huang, Chenyu Wu, Shiwen Liu, Zhichao Lian, Shuangquan Zhang -+ [MirGuard: Towards a Robust Provenance-based Intrusion Detection System Against Graph Manipulation Attacks](https://arxiv.org//abs/2508.10639) ++ [MirGuard: Towards a Robust Provenance-based Intrusion Detection System Against Graph Manipulation Attacks](https://arxiv.org/abs/2508.10639) Anyuan Sang, Lu Zhou, Li Yang, Junbo Jia, Huipeng Yang, Pengbin Feng, Jianfeng Ma -+ [Bistochastically private release of longitudinal data](https://arxiv.org//abs/2508.10606) ++ [Bistochastically private release of longitudinal data](https://arxiv.org/abs/2508.10606) Nicolas Ruiz -+ [MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications](https://arxiv.org//abs/2508.10991) ++ [MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications](https://arxiv.org/abs/2508.10991) Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, Meng Han -+ [SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth](https://arxiv.org//abs/2508.11009) ++ [SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth](https://arxiv.org/abs/2508.11009) Wenpeng Xing, Lanyi Wei, Haixiao Hu, Rongchang Li, Mohan Li, Changting Lin, Meng Han -+ [Failures to Surface Harmful Contents in Video Large Language Models](https://arxiv.org//abs/2508.10974) ++ [Failures to Surface Harmful Contents in Video Large Language Models](https://arxiv.org/abs/2508.10974) Yuxin Cao, Wei Song, Derui Wang, Jingling Xue, Jin Song Dong -+ [SHLIME: Foiling adversarial attacks fooling SHAP and LIME](https://arxiv.org//abs/2508.11053) ++ [SHLIME: Foiling adversarial attacks fooling SHAP and LIME](https://arxiv.org/abs/2508.11053) Sam Chauhan, Estelle Duguet, Karthik Ramakrishnan, Hugh Van Deventer, Jack Kruger, Ranjan Subbaraman -+ [Privacy-Aware Detection of Fake Identity Documents: Methodology, Benchmark, and Improved Detection Methods (FakeIDet2)](https://arxiv.org//abs/2508.11716) ++ [Privacy-Aware Detection of Fake Identity Documents: 
Methodology, Benchmark, and Improved Detection Methods (FakeIDet2)](https://arxiv.org/abs/2508.11716) Javier Muñoz-Haro, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez -+ [Contrast Sensitivity in Multimodal Large Language Models: A Psychophysics-Inspired Evaluation](https://arxiv.org//abs/2508.10367) ++ [Contrast Sensitivity in Multimodal Large Language Models: A Psychophysics-Inspired Evaluation](https://arxiv.org/abs/2508.10367) Pablo Hernández-Cámara, Alexandra Gomez-Villa, Jose Manuel Jaén-Lorites, Jorge Vila-Tomás, Valero Laparra, Jesus Malo @@ -4401,1066 +4401,1066 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Chiyu Zhang, Lu Zhou, Xiaogang Xu, Jiafei Wu, Liming Fang, Zhe Liu # 2025-08-13 -+ [Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference](https://arxiv.org//abs/2508.09442) ++ [Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference](https://arxiv.org/abs/2508.09442) Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin -+ [NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs](https://arxiv.org//abs/2508.09473) ++ [NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs](https://arxiv.org/abs/2508.09473) Birong Pan, Mayi Xu, Qiankun Pi, Jianhao Chen, Yuanyuan Zhu, Ming Zhong, Tieyun Qian -+ [Generation of Indian Sign Language Letters, Numbers, and Words](https://arxiv.org//abs/2508.09522) ++ [Generation of Indian Sign Language Letters, Numbers, and Words](https://arxiv.org/abs/2508.09522) Ajeet Kumar Yadav, Nishant Kumar, Rathna G N -+ [Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection](https://arxiv.org//abs/2508.09652) ++ [Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection](https://arxiv.org/abs/2508.09652) Andrea Ponte, Luca Demetrio, Luca Oneto, Ivan Tesfai Ogbu, Battista Biggio, Fabio Roli -+ [The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage](https://arxiv.org//abs/2508.09603) ++ [The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage](https://arxiv.org/abs/2508.09603) Skyler Hallinan, Jaehun Jung, Melanie Sclar, Ximing Lu, Abhilasha Ravichander, Sahana Ramnath, Yejin Choi, Sai Praneeth Karimireddy, Niloofar Mireshghallah, Xiang Ren -+ [Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation](https://arxiv.org//abs/2508.09666) ++ [Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation](https://arxiv.org/abs/2508.09666) Ziyang Ma, Qingyue Yuan, Linhai Zhang, Deyu Zhou -+ [The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models](https://arxiv.org//abs/2508.09716) ++ [The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models](https://arxiv.org/abs/2508.09716) Ridwan Mahbub, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mizanur Rahman, Mir Tafseer Nayeem, Enamul Hoque -+ [IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding](https://arxiv.org//abs/2508.09456) ++ [IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding](https://arxiv.org/abs/2508.09456) Junxian Li, Beining Xu, Di Zhang -+ [CLIP-Flow: A Universal Discriminator for AI-Generated Images Inspired by Anomaly Detection](https://arxiv.org//abs/2508.09477) ++ [CLIP-Flow: A Universal Discriminator for AI-Generated Images Inspired by 
Anomaly Detection](https://arxiv.org/abs/2508.09477) Zhipeng Yuan, Kai Wang, Weize Quan, Dong-Ming Yan, Tieru Wu -+ [Improving the Speaker Anonymization Evaluation's Robustness to Target Speakers with Adversarial Learning](https://arxiv.org//abs/2508.09803) ++ [Improving the Speaker Anonymization Evaluation's Robustness to Target Speakers with Adversarial Learning](https://arxiv.org/abs/2508.09803) Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller -+ [Security Analysis of ChatGPT: Threats and Privacy Risks](https://arxiv.org//abs/2508.09426) ++ [Security Analysis of ChatGPT: Threats and Privacy Risks](https://arxiv.org/abs/2508.09426) Yushan Xiang, Zhongwen Li, Xiaoqi Li -+ [Extending the OWASP Multi-Agentic System Threat Modeling Guide: Insights from Multi-Agent Security Research](https://arxiv.org//abs/2508.09815) ++ [Extending the OWASP Multi-Agentic System Threat Modeling Guide: Insights from Multi-Agent Security Research](https://arxiv.org/abs/2508.09815) Klaudia Krawiecka, Christian Schroeder de Witt -+ [Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development](https://arxiv.org//abs/2508.10108) ++ [Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development](https://arxiv.org/abs/2508.10108) Sattvik Sahai, Prasoon Goyal, Michael Johnston, Anna Gottardi, Yao Lu, Lucy Hu, Luke Dai, Shaohua Liu, Samyuth Sagi, Hangjie Shi, Desheng Zhang, Lavina Vaz, Leslie Ball, Maureen Murray, Rahul Gupta, Shankar Ananthakrishna -+ [Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model](https://arxiv.org//abs/2508.10110) ++ [Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model](https://arxiv.org/abs/2508.10110) Sushrut Patwardhan, Raghavendra Ramachandra, Sushma Venkatesh -+ [Detecting Untargeted Attacks and Mitigating Unreliable Updates in Federated Learning for Underground Mining Operations](https://arxiv.org//abs/2508.10212) ++ [Detecting Untargeted Attacks and Mitigating Unreliable Updates in Federated Learning for Underground Mining Operations](https://arxiv.org/abs/2508.10212) Md Sazedur Rahman, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong -+ [IPG: Incremental Patch Generation for Generalized Adversarial Patch Training](https://arxiv.org//abs/2508.10946) ++ [IPG: Incremental Patch Generation for Generalized Adversarial Patch Training](https://arxiv.org/abs/2508.10946) Wonho Lee, Hyunsik Na, Jisu Lee, Daeseon Choi -+ [Do Language Models Agree with Human Perceptions of Suspense in Stories?](https://arxiv.org//abs/2508.15794) ++ [Do Language Models Agree with Human Perceptions of Suspense in Stories?](https://arxiv.org/abs/2508.15794) Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza, Diana M. Popescu, Joni Isbell, Chandreyi Chakraborty, Mark Riedl # 2025-08-12 -+ [Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models](https://arxiv.org//abs/2508.08926) ++ [Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models](https://arxiv.org/abs/2508.08926) Wei Cai, Jian Zhao, Yuchu Jiang, Tianle Zhang, Xuelong Li -+ [SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling](https://arxiv.org//abs/2508.09105) ++ [SMA: Who Said That? 
Auditing Membership Leakage in Semi-Black-box RAG Controlling](https://arxiv.org/abs/2508.09105) Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao -+ [AI Security Map: Holistic Organization of AI Security Technologies and Impacts on Stakeholders](https://arxiv.org//abs/2508.08583) ++ [AI Security Map: Holistic Organization of AI Security Technologies and Impacts on Stakeholders](https://arxiv.org/abs/2508.08583) Hiroya Kato, Kentaro Kita, Kento Hasegawa, Seira Hidano -+ [Generative AI for Critical Infrastructure in Smart Grids: A Unified Framework for Synthetic Data Generation and Anomaly Detection](https://arxiv.org//abs/2508.08593) ++ [Generative AI for Critical Infrastructure in Smart Grids: A Unified Framework for Synthetic Data Generation and Anomaly Detection](https://arxiv.org/abs/2508.08593) Aydin Zaboli, Junho Hong -+ [Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment](https://arxiv.org//abs/2508.08629) ++ [Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment](https://arxiv.org/abs/2508.08629) Farzana Zahid, Anjalika Sewwandi, Lee Brandon, Vimal Kumar, Roopak Sinha -+ [SafeFix: Targeted Model Repair via Controlled Image Generation](https://arxiv.org//abs/2508.08701) ++ [SafeFix: Targeted Model Repair via Controlled Image Generation](https://arxiv.org/abs/2508.08701) Ouyang Xu, Baoming Zhang, Ruiyu Mao, Yunhui Guo -+ [EditMF: Drawing an Invisible Fingerprint for Your Large Language Models](https://arxiv.org//abs/2508.08836) ++ [EditMF: Drawing an Invisible Fingerprint for Your Large Language Models](https://arxiv.org/abs/2508.08836) Jiaxuan Wu, Yinghan Zhou, Wanli Peng, Yiming Xue, Juan Wen, Ping Zhong -+ [Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models](https://arxiv.org//abs/2508.08875) ++ [Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models](https://arxiv.org/abs/2508.08875) Fuyao Zhang, Xinyu Yan, Tiantong Wu, Wenjie Li, Tianxiang Chen, Yang Cao, Ran Yan, Longtao Huang, Wei Yang Bryan Lim, Qiang Yang -+ [Attacks and Defenses Against LLM Fingerprinting](https://arxiv.org//abs/2508.09021) ++ [Attacks and Defenses Against LLM Fingerprinting](https://arxiv.org/abs/2508.09021) Kevin Kurian, Ethan Holland, Sean Oesch -+ [When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges](https://arxiv.org//abs/2508.09022) ++ [When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges](https://arxiv.org/abs/2508.09022) Zhiqiang Yang, Renshuai Tao, Xiaolong Zheng, Guodong Yang, Chunjie Zhang -+ [Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering](https://arxiv.org//abs/2508.08785) ++ [Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering](https://arxiv.org/abs/2508.08785) Yunfeng Ning, Mayi Xu, Jintao Wen, Qiankun Pi, Yuanyuan Zhu, Ming Zhong, Jiawei Jiang, Tieyun Qian -+ [MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation](https://arxiv.org//abs/2508.08939) ++ [MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation](https://arxiv.org/abs/2508.08939) Eduarda Caldeira, Fadi Boutros, Naser Damer -+ [Deep Learning Models for Robust Facial Liveness Detection](https://arxiv.org//abs/2508.09094) ++ [Deep Learning Models for Robust Facial Liveness 
Detection](https://arxiv.org/abs/2508.09094) Oleksandr Kuznetsov, Emanuele Frontoni, Luca Romeo, Riccardo Rosati, Andrea Maranesi, Alessandro Muscatello -+ [Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning](https://arxiv.org//abs/2508.08920) ++ [Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning](https://arxiv.org/abs/2508.08920) Jungwoo Kim, Jong-Seok Lee -+ [Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss](https://arxiv.org//abs/2508.08955) ++ [Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss](https://arxiv.org/abs/2508.08955) Naifu Feng, Lixing Chen, Junhua Tang, Hua Ding, Jianhua Li, Yang Bai -+ [Multi-Target Backdoor Attacks Against Speaker Recognition](https://arxiv.org//abs/2508.08559) ++ [Multi-Target Backdoor Attacks Against Speaker Recognition](https://arxiv.org/abs/2508.08559) Alexandrine Fortier, Sonal Joshi, Thomas Thebaud, Jesus Villalba Lopez, Najim Dehak, Patrick Cardinal -+ [Image selective encryption analysis using mutual information in CNN based embedding space](https://arxiv.org//abs/2508.08832) ++ [Image selective encryption analysis using mutual information in CNN based embedding space](https://arxiv.org/abs/2508.08832) Ikram Messadi, Giulia Cervia, Vincent Itier -+ [Evasive Ransomware Attacks Using Low-level Behavioral Adversarial Examples](https://arxiv.org//abs/2508.08656) ++ [Evasive Ransomware Attacks Using Low-level Behavioral Adversarial Examples](https://arxiv.org/abs/2508.08656) Manabu Hirano, Ryotaro Kobayashi -+ [Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance](https://arxiv.org//abs/2508.08789) ++ [Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance](https://arxiv.org/abs/2508.08789) Yuchu Jiang, Jian Zhao, Yuchen Yuan, Tianle Zhang, Yao Huang, Yanghao Zhang, Yan Wang, Yanshu Li, Xizhong Guo, Yusheng Zhao, Jun Zhang, Zhi Zhang, Xiaojian Lin, Yixiu Zou, Haoxuan Ma, Yuhu Shang, Yuzhi Hu, Keshu Cai, Ruochen Zhang, Boyuan Chen, Yilan Gao, Ziheng Jiao, Yi Qin, Shuangjun Du, Xiao Tong, Zhekun Liu, Yu Chen, Xuankun Rong, Rui Wang, Yejie Zheng, Zhaoxin Fan, Hongyuan Zhang, Pan Zhou, Lei Jin, Hao Zhao, Xu Yang, Jiaojiao Zhao, Jianshu Li, Joey Tianyi Zhou, Zhi-Qi Cheng, Longtao Huang, Zhiyi Liu, Zheng Zhu, Jianan Li, Gang Wang, Qi Li, Xu-Yao Zhang, Yaodong Yang, Mang Ye, Wenqi Ren, Zhaofeng He, Hang Su, Rongrong Ni, Liping Jing, Xingxing Wei, Junliang Xing, Massimo Alioto, Shengmei Shen, Petia Radeva, Dacheng Tao, Ya-Qin Zhang, Shuicheng Yan, Chi Zhang, Zhongjiang He, Xuelong Li -+ [Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems](https://arxiv.org//abs/2508.09230) ++ [Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems](https://arxiv.org/abs/2508.09230) Yutong Wu, Jie Zhang, Yiming Li, Chao Zhang, Qing Guo, Nils Lukas, Tianwei Zhang -+ [Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs](https://arxiv.org//abs/2508.09288) ++ [Can AI Keep a Secret? 
Contextual Integrity Verification: A Provable Security Architecture for LLMs](https://arxiv.org/abs/2508.09288) Aayush Gupta -+ [Exact Verification of Graph Neural Networks with Incremental Constraint Solving](https://arxiv.org//abs/2508.09320) ++ [Exact Verification of Graph Neural Networks with Incremental Constraint Solving](https://arxiv.org/abs/2508.09320) Minghao Liu, Chia-Hsuan Lu, Marta Kwiatkowska -+ [Collective dynamics of strategic classification](https://arxiv.org//abs/2508.09340) ++ [Collective dynamics of strategic classification](https://arxiv.org/abs/2508.09340) Marta C. Couto, Flavia Barsotti, Fernando P. Santos -+ [Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users](https://arxiv.org//abs/2508.09245) ++ [Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users](https://arxiv.org/abs/2508.09245) Jeffri Murrugarra-LLerena, Haoran Niu, K. Suzanne Barber, Hal Daumé III, Yang Trista Cao, Paola Cascante-Bonilla -+ [Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning](https://arxiv.org//abs/2508.09275) ++ [Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2508.09275) Amine Andam, Jamal Bentahar, Mustapha Hedabou -+ [Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System](https://arxiv.org//abs/2508.10043) ++ [Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System](https://arxiv.org/abs/2508.10043) Pallavi Zambare, Venkata Nikhil Thanikella, Ying Liu -+ [Search-Time Data Contamination](https://arxiv.org//abs/2508.13180) ++ [Search-Time Data Contamination](https://arxiv.org/abs/2508.13180) Ziwen Han, Meher Mankikar, Julian Michael, Zifan Wang -+ [Special-Character Adversarial Attacks on Open-Source Language Model](https://arxiv.org//abs/2508.14070) ++ [Special-Character Adversarial Attacks on Open-Source Language Model](https://arxiv.org/abs/2508.14070) Ephraiem Sarabamoun -+ [A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy](https://arxiv.org//abs/2508.14079) ++ [A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy](https://arxiv.org/abs/2508.14079) Maxime Heuillet, Rishika Bhagwatkar, Jonas Ngnawé, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand -+ [Privacy Preserving Inference of Personalized Content for Out of Matrix Users](https://arxiv.org//abs/2508.14905) ++ [Privacy Preserving Inference of Personalized Content for Out of Matrix Users](https://arxiv.org/abs/2508.14905) Michael Sun, Tai Vu, Andrew Wang # 2025-08-11 -+ [Optimization of Private Semantic Communication Performance: An Uncooperative Covert Communication Method](https://arxiv.org//abs/2508.07586) ++ [Optimization of Private Semantic Communication Performance: An Uncooperative Covert Communication Method](https://arxiv.org/abs/2508.07586) Wenjing Zhang, Ye Hu, Tao Luo, Zhilong Zhang, Mingzhe Chen -+ [1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning](https://arxiv.org//abs/2508.07667) ++ [1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning](https://arxiv.org/abs/2508.07667) Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap -+ [Best-Effort Policies for Robust Markov Decision Processes](https://arxiv.org//abs/2508.07790) ++ [Best-Effort Policies 
for Robust Markov Decision Processes](https://arxiv.org/abs/2508.07790) Alessandro Abate, Thom Badings, Giuseppe De Giacomo, Francesco Fabiano -+ [BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks](https://arxiv.org//abs/2508.08127) ++ [BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks](https://arxiv.org/abs/2508.08127) Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang -+ [Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning](https://arxiv.org//abs/2508.07556) ++ [Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning](https://arxiv.org/abs/2508.07556) Stephan Rabanser -+ [BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models](https://arxiv.org//abs/2508.08040) ++ [BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models](https://arxiv.org/abs/2508.08040) Maozhen Zhang, Mengnan Zhao, Bo Wang -+ [Can You Trick the Grader? Adversarial Persuasion of LLM Judges](https://arxiv.org//abs/2508.07805) ++ [Can You Trick the Grader? Adversarial Persuasion of LLM Judges](https://arxiv.org/abs/2508.07805) Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung -+ [Jinx: Unlimited LLMs for Probing Alignment Failures](https://arxiv.org//abs/2508.08243) ++ [Jinx: Unlimited LLMs for Probing Alignment Failures](https://arxiv.org/abs/2508.08243) Jiahao Zhao, Liwei Dong -+ [Anatomy-Aware Low-Dose CT Denoising via Pretrained Vision Models and Semantic-Guided Contrastive Learning](https://arxiv.org//abs/2508.07788) ++ [Anatomy-Aware Low-Dose CT Denoising via Pretrained Vision Models and Semantic-Guided Contrastive Learning](https://arxiv.org/abs/2508.07788) Runze Wang, Zeli Chen, Zhiyun Song, Wei Fang, Jiajin Zhang, Danyang Tu, Yuxing Tang, Minfeng Xu, Xianghua Ye, Le Lu, Dakai Jin -+ [Boosting Active Defense Persistence: A Two-Stage Defense Framework Combining Interruption and Poisoning Against Deepfake](https://arxiv.org//abs/2508.07795) ++ [Boosting Active Defense Persistence: A Two-Stage Defense Framework Combining Interruption and Poisoning Against Deepfake](https://arxiv.org/abs/2508.07795) Hongrui Zheng, Yuezun Li, Liejun Wang, Yunfeng Diao, Zhiqing Guo -+ [MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization](https://arxiv.org//abs/2508.07833) ++ [MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization](https://arxiv.org/abs/2508.07833) Animesh Jain, Alexandros Stergiou -+ [VOIDFace: A Privacy-Preserving Multi-Network Face Recognition With Enhanced Security](https://arxiv.org//abs/2508.07960) ++ [VOIDFace: A Privacy-Preserving Multi-Network Face Recognition With Enhanced Security](https://arxiv.org/abs/2508.07960) Ajnas Muhammed, Iurii Medvedev, Nuno Gonçalves -+ [Mitigating Biases in Surgical Operating Rooms with Geometry](https://arxiv.org//abs/2508.08028) ++ [Mitigating Biases in Surgical Operating Rooms with Geometry](https://arxiv.org/abs/2508.08028) Tony Danjun Wang, Tobias Czempiel, Nassir Navab, Lennart Bastian -+ [Pindrop it!
Audio and Visual Deepfake Countermeasures for Robust Detection and Fine Grained-Localization](https://arxiv.org/abs/2508.08141) Nicholas Klein, Hemlata Tak, James Fullwood, Krishna Regmi, Leonidas Spinoulas, Ganesh Sivaraman, Tianxiang Chen, Elie Khoury -+ [Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning](https://arxiv.org//abs/2508.08165) ++ [Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning](https://arxiv.org/abs/2508.08165) Yan Wang, Da-Wei Zhou, Han-Jia Ye -+ [IPBA: Imperceptible Perturbation Backdoor Attack in Federated Self-Supervised Learning](https://arxiv.org//abs/2508.08031) ++ [IPBA: Imperceptible Perturbation Backdoor Attack in Federated Self-Supervised Learning](https://arxiv.org/abs/2508.08031) Jiayao Wang, Yang Song, Zhendong Zhao, Jiale Zhang, Qilin Wu, Junwu Zhu, Dongfang Zhao -+ [FairDRL-ST: Disentangled Representation Learning for Fair Spatio-Temporal Mobility Prediction](https://arxiv.org//abs/2508.07518) ++ [FairDRL-ST: Disentangled Representation Learning for Fair Spatio-Temporal Mobility Prediction](https://arxiv.org/abs/2508.07518) Sichen Zhao, Wei Shao, Jeffrey Chan, Ziqi Xu, Flora Salim -+ [Multi-Turn Jailbreaks Are Simpler Than They Seem](https://arxiv.org//abs/2508.07646) ++ [Multi-Turn Jailbreaks Are Simpler Than They Seem](https://arxiv.org/abs/2508.07646) Xiaoxue Yang, Jaeha Lee, Anna-Katharina Dick, Jasper Timm, Fei Xie, Diogo Cruz -+ [Multi-Hop Privacy Propagation for Differentially Private Federated Learning in Social Networks](https://arxiv.org//abs/2508.07676) ++ [Multi-Hop Privacy Propagation for Differentially Private Federated Learning in Social Networks](https://arxiv.org/abs/2508.07676) Chenchen Lin, Xuehe Wang -+ [EFU: Enforcing Federated Unlearning via Functional Encryption](https://arxiv.org//abs/2508.07873) ++ [EFU: Enforcing Federated Unlearning via Functional Encryption](https://arxiv.org/abs/2508.07873) Samaneh Mohammadi, Vasileios Tsouvalas, Iraklis Symeonidis, Ali Balador, Tanir Ozcelebi, Francesco Flammini, Nirvana Meratnia -+ [Robust Anomaly Detection in O-RAN: Leveraging LLMs against Data Manipulation Attacks](https://arxiv.org//abs/2508.08029) ++ [Robust Anomaly Detection in O-RAN: Leveraging LLMs against Data Manipulation Attacks](https://arxiv.org/abs/2508.08029) Thusitha Dayaratne, Ngoc Duy Pham, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph -+ [False Reality: Uncovering Sensor-induced Human-VR Interaction Vulnerability](https://arxiv.org//abs/2508.08043) ++ [False Reality: Uncovering Sensor-induced Human-VR Interaction Vulnerability](https://arxiv.org/abs/2508.08043) Yancheng Jiang, Yan Jiang, Ruochen Zhou, Yi-Chao Chen, Xiaoyu Ji, Wenyuan Xu -+ [Fully-Fluctuating Participation in Sleepy Consensus](https://arxiv.org//abs/2508.08068) ++ [Fully-Fluctuating Participation in Sleepy Consensus](https://arxiv.org/abs/2508.08068) Yuval Efron, Joachim Neu, Toniann Pitassi -+ [Processing of synthetic data in AI development for healthcare and the definition of personal data in EU law](https://arxiv.org//abs/2508.08353) ++ [Processing of synthetic data in AI development for healthcare and the definition of personal data in EU law](https://arxiv.org/abs/2508.08353) Vibeke Binz Vallevik, Anne Kjersti C. 
Befring, Severin Elvatun, Jan Franz Nygaard -+ [VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models](https://arxiv.org//abs/2508.08521) ++ [VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models](https://arxiv.org/abs/2508.08521) Mansi Phute (Georgia Tech), Ravikumar Balakrishnan (HiddenLayer) -+ [Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference](https://arxiv.org//abs/2508.08438) ++ [Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference](https://arxiv.org/abs/2508.08438) Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang -+ [Designing with Deception: ML- and Covert Gate-Enhanced Camouflaging to Thwart IC Reverse Engineering](https://arxiv.org//abs/2508.08462) ++ [Designing with Deception: ML- and Covert Gate-Enhanced Camouflaging to Thwart IC Reverse Engineering](https://arxiv.org/abs/2508.08462) Junling Fan, David Koblah, Domenic Forte -+ [Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity](https://arxiv.org//abs/2508.09218) ++ [Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity](https://arxiv.org/abs/2508.09218) Zuoou Li, Weitong Zhang, Jingyuan Wang, Shuyuan Zhang, Wenjia Bai, Bernhard Kainz, Mengyun Qiao -+ [FIDELIS: Blockchain-Enabled Protection Against Poisoning Attacks in Federated Learning](https://arxiv.org//abs/2508.10042) ++ [FIDELIS: Blockchain-Enabled Protection Against Poisoning Attacks in Federated Learning](https://arxiv.org/abs/2508.10042) Jane Carney, Kushal Upreti, Gaby G. Dagher, Tim Andersen # 2025-08-10 -+ [Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape](https://arxiv.org//abs/2508.07334) ++ [Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape](https://arxiv.org/abs/2508.07334) Quan Shi, Wang Xi, Zenghui Ding, Jianqing Gao, Xianjun Yang -+ [A Real-Time, Self-Tuning Moderator Framework for Adversarial Prompt Detection](https://arxiv.org//abs/2508.07139) ++ [A Real-Time, Self-Tuning Moderator Framework for Adversarial Prompt Detection](https://arxiv.org/abs/2508.07139) Ivan Zhang -+ [Representation Understanding via Activation Maximization](https://arxiv.org//abs/2508.07281) ++ [Representation Understanding via Activation Maximization](https://arxiv.org/abs/2508.07281) Hongbo Zhu, Angelo Cangelosi -+ [ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering](https://arxiv.org//abs/2508.07321) ++ [ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering](https://arxiv.org/abs/2508.07321) Shubhra Ghosh, Abhilekh Borah, Aditya Kumar Guru, Kripabandhu Ghosh -+ [A Spin Glass Characterization of Neural Networks](https://arxiv.org//abs/2508.07397) ++ [A Spin Glass Characterization of Neural Networks](https://arxiv.org/abs/2508.07397) Jun Li -+ [Gradient Surgery for Safe LLM Fine-Tuning](https://arxiv.org//abs/2508.07172) ++ [Gradient Surgery for Safe LLM Fine-Tuning](https://arxiv.org/abs/2508.07172) Biao Yi, Jiahao Li, Baolei Zhang, Lihai Nie, Tong Li, Tiansheng Huang, Zheli Liu -+ [HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation](https://arxiv.org//abs/2508.07225) ++ [HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation](https://arxiv.org/abs/2508.07225) Xuepeng Liu, Zheng Jiang, 
Pinan Zhu, Hanyu Liu, Chao Li -+ [ForensicsSAM: Toward Robust and Unified Image Forgery Detection and Localization Resisting to Adversarial Attack](https://arxiv.org//abs/2508.07402) ++ [ForensicsSAM: Toward Robust and Unified Image Forgery Detection and Localization Resisting to Adversarial Attack](https://arxiv.org/abs/2508.07402) Rongxuan Peng, Shunquan Tan, Chenqi Kong, Anwei Luo, Alex C. Kot, Jiwu Huang -+ [Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems](https://arxiv.org//abs/2508.07263) ++ [Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems](https://arxiv.org/abs/2508.07263) Qingyuan Zeng, Shu Jiang, Jiajing Lin, Zhenzhong Wang, Kay Chen Tan, Min Jiang -+ [Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten](https://arxiv.org//abs/2508.07458) ++ [Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten](https://arxiv.org/abs/2508.07458) Wei Qian, Chenxu Zhao, Yangyi Li, Wenqian Ye, Mengdi Huai -+ [Enhancing Privacy in Decentralized Min-Max Optimization: A Differentially Private Approach](https://arxiv.org//abs/2508.07505) ++ [Enhancing Privacy in Decentralized Min-Max Optimization: A Differentially Private Approach](https://arxiv.org/abs/2508.07505) Yueyang Quan, Chang Wang, Shengjie Zhai, Minghong Fang, Zhuqing Liu -+ [Certifiably robust malware detectors by design](https://arxiv.org//abs/2508.10038) ++ [Certifiably robust malware detectors by design](https://arxiv.org/abs/2508.10038) Pierre-Francois Gimenez, Sarath Sivaprasad, Mario Fritz -+ [Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries](https://arxiv.org//abs/2508.10039) ++ [Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries](https://arxiv.org/abs/2508.10039) Wenqiang Wang, Yan Xiao, Hao Lin, Yangshijie Zhang, Xiaochun Cao -+ [Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models](https://arxiv.org//abs/2508.14062) ++ [Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models](https://arxiv.org/abs/2508.14062) Badrinath Ramakrishnan, Akshaya Balaji # 2025-08-09 -+ [Many-Turn Jailbreaking](https://arxiv.org//abs/2508.06755) ++ [Many-Turn Jailbreaking](https://arxiv.org/abs/2508.06755) Xianjun Yang, Liqiang Xiao, Shiyang Li, Faisal Ladhak, Hyokun Yun, Linda Ruth Petzold, Yi Xu, William Yang Wang -+ [PROPS: Progressively Private Self-alignment of Large Language Models](https://arxiv.org//abs/2508.06783) ++ [PROPS: Progressively Private Self-alignment of Large Language Models](https://arxiv.org/abs/2508.06783) Noel Teku, Fengwei Tian, Payel Bhattacharjee, Souradip Chakraborty, Amrit Singh Bedi, Ravi Tandon -+ [Who's the Evil Twin? Differential Auditing for Undesired Behavior](https://arxiv.org//abs/2508.06827) ++ [Who's the Evil Twin? 
Differential Auditing for Undesired Behavior](https://arxiv.org/abs/2508.06827) Ishwar Balappanawar, Venkata Hasith Vattikuti, Greta Kintzley, Ronan Azimi-Mancel, Satvik Golechha -+ [Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption](https://arxiv.org//abs/2508.07044) ++ [Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption](https://arxiv.org/abs/2508.07044) William Zerong Wang, Dongfang Zhao -+ [Membership and Memorization in LLM Knowledge Distillation](https://arxiv.org//abs/2508.07054) ++ [Membership and Memorization in LLM Knowledge Distillation](https://arxiv.org/abs/2508.07054) Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi -+ [Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection](https://arxiv.org//abs/2508.06913) ++ [Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection](https://arxiv.org/abs/2508.06913) Siyuan Li, Xi Lin, Guangyan Li, Zehao Liu, Aodu Wulianghai, Li Ding, Jun Wu, Jianhua Li -+ [Adversarial Video Promotion Against Text-to-Video Retrieval](https://arxiv.org//abs/2508.06964) ++ [Adversarial Video Promotion Against Text-to-Video Retrieval](https://arxiv.org/abs/2508.06964) Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Shuai Liu, Chao Shen -+ [Membership Inference Attacks with False Discovery Rate Control](https://arxiv.org//abs/2508.07066) ++ [Membership Inference Attacks with False Discovery Rate Control](https://arxiv.org/abs/2508.07066) Chenxu Zhao, Wei Qian, Aobo Chen, Mengdi Huai -+ [Sensory robustness through top-down feedback and neural stochasticity in recurrent vision models](https://arxiv.org//abs/2508.07115) ++ [Sensory robustness through top-down feedback and neural stochasticity in recurrent vision models](https://arxiv.org/abs/2508.07115) Antonino Greco, Marco D'Alessandro, Karl J. Friston, Giovanni Pezzulo, Markus Siegel -+ [SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization](https://arxiv.org//abs/2508.07086) ++ [SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization](https://arxiv.org/abs/2508.07086) Beilong Tang, Xiaoxiao Miao, Xin Wang, Ming Li -+ [Label Inference Attacks against Federated Unlearning](https://arxiv.org//abs/2508.06789) ++ [Label Inference Attacks against Federated Unlearning](https://arxiv.org/abs/2508.06789) Wei Wang, Xiangyun Tang, Yajie Wang, Yijing Lin, Tao Zhang, Meng Shen, Dusit Niyato, Liehuang Zhu -+ [Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models](https://arxiv.org//abs/2508.06837) ++ [Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models](https://arxiv.org/abs/2508.06837) Shiqian Zhao, Chong Wang, Yiming Li, Yihao Huang, Wenjie Qu, Siew-Kei Lam, Yi Xie, Kangjie Chen, Jie Zhang, Tianwei Zhang -+ [Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs](https://arxiv.org//abs/2508.10031) ++ [Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs](https://arxiv.org/abs/2508.10031) Jinhwa Kim, Ian G. 
Harris -+ [The Cost of Thinking: Increased Jailbreak Risk in Large Language Models](https://arxiv.org//abs/2508.10032) ++ [The Cost of Thinking: Increased Jailbreak Risk in Large Language Models](https://arxiv.org/abs/2508.10032) Fan Yang # 2025-08-08 -+ [LLM Robustness Leaderboard v1 --Technical report](https://arxiv.org//abs/2508.06296) ++ [LLM Robustness Leaderboard v1 --Technical report](https://arxiv.org/abs/2508.06296) Pierre Peigné - Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe -+ [ETA: Energy-based Test-time Adaptation for Depth Completion](https://arxiv.org//abs/2508.05989) ++ [ETA: Energy-based Test-time Adaptation for Depth Completion](https://arxiv.org/abs/2508.05989) Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong -+ [Differentially Private Federated Clustering with Random Rebalancing](https://arxiv.org//abs/2508.06183) ++ [Differentially Private Federated Clustering with Random Rebalancing](https://arxiv.org/abs/2508.06183) Xiyuan Yang, Shengyuan Hu, Soyeon Kim, Tian Li -+ [Membership Inference Attack with Partial Features](https://arxiv.org//abs/2508.06244) ++ [Membership Inference Attack with Partial Features](https://arxiv.org/abs/2508.06244) Xurun Wang, Guangrui Liu, Xinjie Li, Haoyu He, Lin Yao, Weizhe Zhang -+ [In-Training Defenses against Emergent Misalignment in Language Models](https://arxiv.org//abs/2508.06249) ++ [In-Training Defenses against Emergent Misalignment in Language Models](https://arxiv.org/abs/2508.06249) David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Lucie Flek, Florian Mai -+ [FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields](https://arxiv.org//abs/2508.06301) ++ [FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields](https://arxiv.org/abs/2508.06301) Junhyeog Yun, Minui Hong, Gunhee Kim -+ [ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls](https://arxiv.org//abs/2508.06457) ++ [ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls](https://arxiv.org/abs/2508.06457) Sanket Badhe -+ [WGAST: Weakly-Supervised Generative Network for Daily 10 m Land Surface Temperature Estimation via Spatio-Temporal Fusion](https://arxiv.org//abs/2508.06485) ++ [WGAST: Weakly-Supervised Generative Network for Daily 10 m Land Surface Temperature Estimation via Spatio-Temporal Fusion](https://arxiv.org/abs/2508.06485) Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai -+ [Adversarial Topic-aware Prompt-tuning for Cross-topic Automated Essay Scoring](https://arxiv.org//abs/2508.05987) ++ [Adversarial Topic-aware Prompt-tuning for Cross-topic Automated Essay Scoring](https://arxiv.org/abs/2508.05987) Chunyun Zhang, Hongyan Zhao, Chaoran Cui, Qilong Song, Zhiqing Lu, Shuai Gong, Kailin Liu -+ [Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation](https://arxiv.org//abs/2508.06194) ++ [Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation](https://arxiv.org/abs/2508.06194) Lai Jiang, Yuekang Li, Xiaohan Zhang, Youtao Ding, Li Pan -+ [Quantifying Conversation Drift in MCP via Latent Polytope](https://arxiv.org//abs/2508.06418) ++ [Quantifying Conversation Drift in MCP via Latent Polytope](https://arxiv.org/abs/2508.06418) Haoran Shi, Hongwei Yao, Shuo Shao, Shaopeng Jiao, Ziqi Peng, Zhan Qin, Cong Wang -+ [Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System](https://arxiv.org//abs/2508.06059) ++ [Fact2Fiction: Targeted Poisoning 
Attack to Agentic Fact-checking System](https://arxiv.org/abs/2508.06059) Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, Francis C. M. Lau -+ [SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures](https://arxiv.org//abs/2508.06127) ++ [SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures](https://arxiv.org/abs/2508.06127) Yi Qin, Rui Wang, Tao Huang, Tong Xiao, Liping Jing -+ [SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models](https://arxiv.org//abs/2508.06142) ++ [SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models](https://arxiv.org/abs/2508.06142) Hanqing Wang, Yuan Tian, Mingyu Liu, Zhenhao Zhang, Xiangyang Zhu -+ [FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation](https://arxiv.org//abs/2508.06392) ++ [FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation](https://arxiv.org/abs/2508.06392) Wenbin Teng, Gonglin Chen, Haiwei Chen, Yajie Zhao -+ [Adaptive Backtracking for Privacy Protection in Large Language Models](https://arxiv.org//abs/2508.06087) ++ [Adaptive Backtracking for Privacy Protection in Large Language Models](https://arxiv.org/abs/2508.06087) Zhihao Yao, Yuxuan Gu, Xiachong Feng, Weitao Ma, Bo Li, Xiaocheng Feng -+ [ProvX: Generating Counterfactual-Driven Attack Explanations for Provenance-Based Detection](https://arxiv.org//abs/2508.06073) ++ [ProvX: Generating Counterfactual-Driven Attack Explanations for Provenance-Based Detection](https://arxiv.org/abs/2508.06073) Weiheng Wu, Wei Qiao, Teng Li, Yebo Feng, Zhuo Ma, Jianfeng Ma, Yang Liu -+ [SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs](https://arxiv.org//abs/2508.06153) ++ [SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs](https://arxiv.org/abs/2508.06153) Zhengxian Wu, Juan Wen, Wanli Peng, Haowei Chang, Yinghan Zhou, Yiming Xue -+ [When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation](https://arxiv.org//abs/2508.06394) ++ [When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation](https://arxiv.org/abs/2508.06394) Dario Pasquini, Evgenios M. 
Kornaropoulos, Giuseppe Ateniese, Omer Akgul, Athanasios Theocharis, Petros Efstathopoulos -+ [Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs](https://arxiv.org//abs/2508.06601) ++ [Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs](https://arxiv.org/abs/2508.06601) Kyle O'Brien, Stephen Casper, Quentin Anthony, Tomek Korbak, Robert Kirk, Xander Davies, Ishan Mishra, Geoffrey Irving, Yarin Gal, Stella Biderman -+ [Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models](https://arxiv.org//abs/2508.06621) ++ [Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models](https://arxiv.org/abs/2508.06621) Tomohiro Sawada, Kartik Goyal -+ [ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification](https://arxiv.org//abs/2508.06623) ++ [ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification](https://arxiv.org/abs/2508.06623) Sihan Ma, Qiming Wu, Ruotong Jiang, Frank Burns -+ [Learning to Forget with Information Divergence Reweighted Objectives for Noisy Labels](https://arxiv.org//abs/2508.06622) ++ [Learning to Forget with Information Divergence Reweighted Objectives for Noisy Labels](https://arxiv.org/abs/2508.06622) Jeremiah Birrell, Reza Ebrahimi -+ [Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN](https://arxiv.org//abs/2508.06647) ++ [Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN](https://arxiv.org/abs/2508.06647) Andrey Sidorenko, Paul Tiwald -+ [Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks](https://arxiv.org//abs/2508.09190) ++ [Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks](https://arxiv.org/abs/2508.09190) Bing Han, Feifei Zhao, Dongcheng Zhao, Guobin Shen, Ping Wu, Yu Shi, Yi Zeng -+ [Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs](https://arxiv.org//abs/2508.10029) ++ [Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs](https://arxiv.org/abs/2508.10029) Wenpeng Xing, Mohan Li, Chunqiang Hu, Haitao Xu, Ningyu Zhang, Bo Lin, Meng Han -+ [Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts](https://arxiv.org//abs/2508.06361) ++ [Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts](https://arxiv.org/abs/2508.06361) Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He # 2025-08-07 -+ [MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models](https://arxiv.org//abs/2508.05083) ++ [MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models](https://arxiv.org/abs/2508.05083) Dexuan Xu, Jieyi Wang, Zhongyan Chai, Yongzhi Cao, Hanpin Wang, Huamin Zhang, Yu Huang -+ [Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks](https://arxiv.org//abs/2508.05068) ++ [Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks](https://arxiv.org/abs/2508.05068) Ruiyu Li, Changyuan Qiu, Hangrui Cao, Qihan Ren, Yuqing Qiu -+ [JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering](https://arxiv.org//abs/2508.05087) ++ [JPS:
Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering](https://arxiv.org/abs/2508.05087) Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang -+ [Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models](https://arxiv.org//abs/2508.05237) ++ [Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models](https://arxiv.org/abs/2508.05237) Zane Xu, Jason Sun -+ [Building Effective Safety Guardrails in AI Education Tools](https://arxiv.org//abs/2508.05360) ++ [Building Effective Safety Guardrails in AI Education Tools](https://arxiv.org/abs/2508.05360) Hannah-Beth Clark, Laura Benton, Emma Searle, Margaux Dowland, Matthew Gregory, Will Gayne, John Roberts -+ [PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems](https://arxiv.org//abs/2508.05167) ++ [PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems](https://arxiv.org/abs/2508.05167) Qi Guo, Xiaojun Jia, Shanmin Pang, Simeng Qin, Lin Wang, Ju Jia, Yang Liu, Qing Guo -+ [From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization](https://arxiv.org//abs/2508.05409) ++ [From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization](https://arxiv.org/abs/2508.05409) Farah Wahida, M.A.P. Chamikara, Yashothara Shanmugarasa, Mohan Baruwal Chhetri, Thilina Ranbaduge, Ibrahim Khalil -+ [Physical Adversarial Camouflage through Gradient Calibration and Regularization](https://arxiv.org//abs/2508.05414) ++ [Physical Adversarial Camouflage through Gradient Calibration and Regularization](https://arxiv.org/abs/2508.05414) Jiawei Liang, Siyuan Liang, Jianjie Huang, Chenxi Si, Ming Zhang, Xiaochun Cao -+ [Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification](https://arxiv.org//abs/2508.05489) ++ [Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification](https://arxiv.org/abs/2508.05489) Samuel Räber, Till Aczel, Andreas Plesner, Roger Wattenhofer -+ [FS-IQA: Certified Feature Smoothing for Robust Image Quality Assessment](https://arxiv.org//abs/2508.05516) ++ [FS-IQA: Certified Feature Smoothing for Robust Image Quality Assessment](https://arxiv.org/abs/2508.05516) Ekaterina Shumitskaya, Dmitriy Vatolin, Anastasia Antsiferova -+ [Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning](https://arxiv.org//abs/2508.05224) ++ [Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning](https://arxiv.org/abs/2508.05224) Mirko Konstantin, Anirban Mukhopadhyay -+ [NT-ML: Backdoor Defense via Non-target Label Training and Mutual Learning](https://arxiv.org//abs/2508.05404) ++ [NT-ML: Backdoor Defense via Non-target Label Training and Mutual Learning](https://arxiv.org/abs/2508.05404) Wenjie Huo, Katinka Wolter -+ [Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes](https://arxiv.org//abs/2508.05469) ++ [Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes](https://arxiv.org/abs/2508.05469) Zachary Robertson, Sanmi Koyejo 
-+ [Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification](https://arxiv.org//abs/2508.05600) ++ [Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification](https://arxiv.org/abs/2508.05600) Thorsten Peinemann, Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Esfandiar Mohammadi -+ [Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas](https://arxiv.org//abs/2508.04964) ++ [Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas](https://arxiv.org/abs/2508.04964) Zhaowei Wang, Yunsong Huang, Weicheng Liu, Hui-Ming Wang -+ [Necessity of Block Designs for Optimal Locally Private Distribution Estimation](https://arxiv.org//abs/2508.05110) ++ [Necessity of Block Designs for Optimal Locally Private Distribution Estimation](https://arxiv.org/abs/2508.05110) Abigail Gentle -+ [Safety of Embodied Navigation: A Survey](https://arxiv.org//abs/2508.05855) ++ [Safety of Embodied Navigation: A Survey](https://arxiv.org/abs/2508.05855) Zixia Wang, Jia Hu, Ronghui Mu -+ [Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation](https://arxiv.org//abs/2508.05775) ++ [Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation](https://arxiv.org/abs/2508.05775) Chi Zhang, Changjia Zhu, Junjie Xiong, Xiaoran Xu, Lingyao Li, Yao Liu, Zhuo Lu -+ [System Security Framework for 5G Advanced /6G IoT Integrated Terrestrial Network-Non-Terrestrial Network (TN-NTN) with AI-Enabled Cloud Security](https://arxiv.org//abs/2508.05707) ++ [System Security Framework for 5G Advanced /6G IoT Integrated Terrestrial Network-Non-Terrestrial Network (TN-NTN) with AI-Enabled Cloud Security](https://arxiv.org/abs/2508.05707) Sasa Maric, Rasil Baidar, Robert Abbas, Sam Reisenfeld -+ [A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality](https://arxiv.org//abs/2508.09185) ++ [A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality](https://arxiv.org/abs/2508.09185) Rongqian Chen, Allison Andreyev, Yanming Xiu, Mahdi Imani, Bin Li, Maria Gorlatova, Gang Tan, Tian Lan -+ [RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System](https://arxiv.org//abs/2508.09186) ++ [RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System](https://arxiv.org/abs/2508.09186) Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast -+ [Robust Market Making: To Quote, or not To Quote](https://arxiv.org//abs/2508.16588) ++ [Robust Market Making: To Quote, or not To Quote](https://arxiv.org/abs/2508.16588) Ziyi Wang, Carmine Ventre, Maria Polukarov # 2025-08-06 -+ [Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)](https://arxiv.org//abs/2508.04894) ++ [Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)](https://arxiv.org/abs/2508.04894) Iyiola E. 
Olatunji, Franziska Boenisch, Jing Xu, Adam Dziedzic -+ [IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards](https://arxiv.org//abs/2508.04632) ++ [IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards](https://arxiv.org/abs/2508.04632) Xu Guo, Tianyi Liang, Tong Jian, Xiaogui Yang, Ling-I Wu, Chenhui Li, Zhihui Lu, Qipeng Guo, Kai Chen -+ [ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models](https://arxiv.org//abs/2508.04677) ++ [ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models](https://arxiv.org/abs/2508.04677) Yansheng Gao, Yufei Zheng, Jinghan Qu, Zixi Zhu, Yukuan Zhang, Shengsheng Wang -+ [Boosting Adversarial Transferability via Residual Perturbation Attack](https://arxiv.org//abs/2508.05689) ++ [Boosting Adversarial Transferability via Residual Perturbation Attack](https://arxiv.org/abs/2508.05689) Jinjia Peng, Zeze Tao, Huibing Wang, Meng Wang, Yang Wang -+ [AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers](https://arxiv.org//abs/2508.05691) ++ [AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers](https://arxiv.org/abs/2508.05691) Kai Yao, Marc Juarez -+ [Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation](https://arxiv.org//abs/2508.06557) ++ [Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation](https://arxiv.org/abs/2508.06557) Zihao Hu (1), Jia Yan (2), Ying-Jun Angela Zhang (1) ((1) The Chinese University of Hong Kong, (2) The Hong Kong University of Science and Technology (Guangzhou)) -+ [SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience](https://arxiv.org//abs/2508.04700) ++ [SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience](https://arxiv.org/abs/2508.04700) Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang -+ [A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models](https://arxiv.org//abs/2508.04276) ++ [A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models](https://arxiv.org/abs/2508.04276) Jiayi Wen, Tianxin Chen, Zhirun Zheng, Cheng Huang -+ [An Audit and Analysis of LLM-Assisted Health Misinformation Jailbreaks Against LLMs](https://arxiv.org//abs/2508.10010) ++ [An Audit and Analysis of LLM-Assisted Health Misinformation Jailbreaks Against LLMs](https://arxiv.org/abs/2508.10010) Ayana Hussain, Patrick Zhao, Nicholas Vincent -+ [Assessing Representation Stability for Transformer Models](https://arxiv.org//abs/2508.11667) ++ [Assessing Representation Stability for Transformer Models](https://arxiv.org/abs/2508.11667) Bryan E. Tuck, Rakesh M. 
Verma -+ [Unsupervised Pairwise Learning Optimization Framework for Cross-Corpus EEG-Based Emotion Recognition Based on Prototype Representation](https://arxiv.org//abs/2508.11663) ++ [Unsupervised Pairwise Learning Optimization Framework for Cross-Corpus EEG-Based Emotion Recognition Based on Prototype Representation](https://arxiv.org/abs/2508.11663) Guangli Li, Canbiao Wu, Zhen Liang -+ [Per-element Secure Aggregation against Data Reconstruction Attacks in Federated Learning](https://arxiv.org//abs/2508.04285) ++ [Per-element Secure Aggregation against Data Reconstruction Attacks in Federated Learning](https://arxiv.org/abs/2508.04285) Takumi Suimon, Yuki Koizumi, Junji Takemasa, Toru Hasegawa # 2025-08-05 -+ [Beyond Surface-Level Detection: Towards Cognitive-Driven Defense Against Jailbreak Attacks via Meta-Operations Reasoning](https://arxiv.org//abs/2508.03054) ++ [Beyond Surface-Level Detection: Towards Cognitive-Driven Defense Against Jailbreak Attacks via Meta-Operations Reasoning](https://arxiv.org/abs/2508.03054) Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang -+ [T2UE: Generating Unlearnable Examples from Text Descriptions](https://arxiv.org//abs/2508.03091) ++ [T2UE: Generating Unlearnable Examples from Text Descriptions](https://arxiv.org/abs/2508.03091) Xingjun Ma, Hanxun Huang, Tianwei Song, Ye Sun, Yifeng Gao, Yu-Gang Jiang -+ [Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis](https://arxiv.org//abs/2508.03396) ++ [Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis](https://arxiv.org/abs/2508.03396) Rui Zou, Mengqi Wei, Yutao Zhu, Jirong Wen, Xin Zhao, Jing Chen -+ [VCNet: Recreating High-Level Visual Cortex Principles for Robust Artificial Vision](https://arxiv.org//abs/2508.02995) ++ [VCNet: Recreating High-Level Visual Cortex Principles for Robust Artificial Vision](https://arxiv.org/abs/2508.02995) Brennen A. 
Hill, Zhang Xinyu, Timothy Putra Prasetio -+ [Untraceable DeepFakes via Traceable Fingerprint Elimination](https://arxiv.org//abs/2508.03067) ++ [Untraceable DeepFakes via Traceable Fingerprint Elimination](https://arxiv.org/abs/2508.03067) Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun, Xinming Wang, Yunhao Wang -+ [VFLAIR-LLM: A Comprehensive Framework and Benchmark for Split Learning of LLMs](https://arxiv.org//abs/2508.03097) ++ [VFLAIR-LLM: A Comprehensive Framework and Benchmark for Split Learning of LLMs](https://arxiv.org/abs/2508.03097) Zixuan Gu, Qiufeng Fan, Long Sun, Yang Liu, Xiaojun Ye -+ [Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS](https://arxiv.org//abs/2508.03125) ++ [Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS](https://arxiv.org/abs/2508.03125) Bingyu Yan, Ziyi Zhou, Xiaoming Zhang, Chaozhuo Li, Ruilin Zeng, Yirui Qi, Tianbo Wang, Litian Zhang -+ [GeoShield: Safeguarding Geolocation Privacy from Vision-Language Models via Adversarial Perturbations](https://arxiv.org//abs/2508.03209) ++ [GeoShield: Safeguarding Geolocation Privacy from Vision-Language Models via Adversarial Perturbations](https://arxiv.org/abs/2508.03209) Xinwei Liu, Xiaojun Jia, Yuan Xun, Simeng Qin, Xiaochun Cao -+ [The Power of Many: Synergistic Unification of Diverse Augmentations for Efficient Adversarial Robustness](https://arxiv.org//abs/2508.03213) ++ [The Power of Many: Synergistic Unification of Diverse Augmentations for Efficient Adversarial Robustness](https://arxiv.org/abs/2508.03213) Wang Yu-Hang, Shiwei Li, Jianxiang Liao, Li Bohan, Jian Liu, Wenfei Yin -+ [When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs](https://arxiv.org//abs/2508.03365) ++ [When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs](https://arxiv.org/abs/2508.03365) Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin -+ [VideoGuard: Protecting Video Content from Unauthorized Editing](https://arxiv.org//abs/2508.03480) ++ [VideoGuard: Protecting Video Content from Unauthorized Editing](https://arxiv.org/abs/2508.03480) Junjie Cao, Kaizhou Li, Xinchun Yu, Hongxiang Li, Xiaoping Zhang -+ [CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors](https://arxiv.org//abs/2508.02997) ++ [CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors](https://arxiv.org/abs/2508.02997) Sri Durga Sai Sowmya Kadali, Evangelos E. 
Papalexakis -+ [Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation](https://arxiv.org//abs/2508.03098) ++ [Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation](https://arxiv.org/abs/2508.03098) Haoran Wang, Xiongxiao Xu, Baixiang Huang, Kai Shu -+ [Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation](https://arxiv.org//abs/2508.03110) ++ [Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation](https://arxiv.org/abs/2508.03110) Zizhong Li, Haopeng Zhang, Jiawei Zhang -+ [Adversarial Attention Perturbations for Large Object Detection Transformers](https://arxiv.org//abs/2508.02987) ++ [Adversarial Attention Perturbations for Large Object Detection Transformers](https://arxiv.org/abs/2508.02987) Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, Ling Liu -+ [Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models](https://arxiv.org//abs/2508.03006) ++ [Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models](https://arxiv.org/abs/2508.03006) Fan Yang, Yihao Huang, Jiayi Zhu, Ling Shi, Geguang Pu, Jin Song Dong, Kailong Wang -+ [evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition](https://arxiv.org//abs/2508.03609) ++ [evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition](https://arxiv.org/abs/2508.03609) Rodrigo Verschae, Ignacio Bugueno-Cordova -+ [BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models](https://arxiv.org//abs/2508.03221) ++ [BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models](https://arxiv.org/abs/2508.03221) Yu Pan, Jiahao Chen, Lin Wang, Bingrong Dai, Yi Du -+ [Heterogeneity-Oblivious Robust Federated Learning](https://arxiv.org//abs/2508.03579) ++ [Heterogeneity-Oblivious Robust Federated Learning](https://arxiv.org/abs/2508.03579) Weiyao Zhang, Jinyang Li, Qi Song, Miao Wang, Chungang Lin, Haitong Luo, Xuying Meng, Yujun Zhang -+ [What If, But Privately: Private Counterfactual Retrieval](https://arxiv.org//abs/2508.03681) ++ [What If, But Privately: Private Counterfactual Retrieval](https://arxiv.org/abs/2508.03681) Shreya Meel, Mohamed Nomeir, Pasan Dissanayake, Sanghamitra Dutta, Sennur Ulukus -+ [BDFirewall: Towards Effective and Expeditiously Black-Box Backdoor Defense in MLaaS](https://arxiv.org//abs/2508.03307) ++ [BDFirewall: Towards Effective and Expeditiously Black-Box Backdoor Defense in MLaaS](https://arxiv.org/abs/2508.03307) Ye Li, Chengcheng Zhu, Yanchao Zhao, Jiale Zhang -+ [Probing and Enhancing the Robustness of GNN-based QEC Decoders with Reinforcement Learning](https://arxiv.org//abs/2508.03783) ++ [Probing and Enhancing the Robustness of GNN-based QEC Decoders with Reinforcement Learning](https://arxiv.org/abs/2508.03783) Ryota Ikeda -+ [Adversarial Attacks on Reinforcement Learning-based Medical Questionnaire Systems: Input-level Perturbation Strategies and Medical Constraint Validation](https://arxiv.org//abs/2508.05677) ++ [Adversarial Attacks on Reinforcement Learning-based Medical Questionnaire Systems: Input-level Perturbation Strategies and Medical Constraint Validation](https://arxiv.org/abs/2508.05677) Peizhuo Liu -+ [Selection-Based Vulnerabilities: Clean-Label 
Backdoor Attacks in Active Learning](https://arxiv.org//abs/2508.05681) ++ [Selection-Based Vulnerabilities: Clean-Label Backdoor Attacks in Active Learning](https://arxiv.org/abs/2508.05681) Yuhan Zhi, Longtian Wang, Xiaofei Xie, Chao Shen, Qiang Hu, Xiaohong Guan -+ [Anti-Tamper Protection for Unauthorized Individual Image Generation](https://arxiv.org//abs/2508.06325) ++ [Anti-Tamper Protection for Unauthorized Individual Image Generation](https://arxiv.org/abs/2508.06325) Zelin Li, Ruohan Zong, Yifan Liu, Ruichen Yao, Yaokun Liu, Yang Zhang, Dong Wang -+ [EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving](https://arxiv.org//abs/2508.09158) ++ [EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving](https://arxiv.org/abs/2508.09158) Siwen Jiao, Kangan Qian, Hao Ye, Yang Zhong, Ziang Luo, Sicong Jiang, Zilin Huang, Yangyi Fang, Jinyu Miao, Zheng Fu, Yunlong Wang, Kun Jiang, Diange Yang, Rui Fan, Baoyun Peng # 2025-08-04 -+ [Defend LLMs Through Self-Consciousness](https://arxiv.org//abs/2508.02961) ++ [Defend LLMs Through Self-Consciousness](https://arxiv.org/abs/2508.02961) Boshi Huang, Fabio Nonato de Paula -+ [Secure mmWave Beamforming with Proactive-ISAC Defense Against Beam-Stealing Attacks](https://arxiv.org//abs/2508.02856) ++ [Secure mmWave Beamforming with Proactive-ISAC Defense Against Beam-Stealing Attacks](https://arxiv.org/abs/2508.02856) Seyed Bagher Hashemi Natanzi, Hossein Mohammadi, Bo Tang, Vuk Marojevic -+ [Highlight & Summarize: RAG without the jailbreaks](https://arxiv.org//abs/2508.02872) ++ [Highlight & Summarize: RAG without the jailbreaks](https://arxiv.org/abs/2508.02872) Giovanni Cherubin, Andrew Paverd -+ [Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers](https://arxiv.org//abs/2508.02175) ++ [Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers](https://arxiv.org/abs/2508.02175) Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu -+ [Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation](https://arxiv.org//abs/2508.02835) ++ [Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation](https://arxiv.org/abs/2508.02835) Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim -+ [Online Robust Multi-Agent Reinforcement Learning under Model Uncertainties](https://arxiv.org//abs/2508.02948) ++ [Online Robust Multi-Agent Reinforcement Learning under Model Uncertainties](https://arxiv.org/abs/2508.02948) Zain Ulabedeen Farhat, Debamita Ghosh, George K. 
Atia, Yue Wang -+ [DINA: A Dual Defense Framework Against Internal Noise and External Attacks in Natural Language Processing](https://arxiv.org//abs/2508.05671) ++ [DINA: A Dual Defense Framework Against Internal Noise and External Attacks in Natural Language Processing](https://arxiv.org/abs/2508.05671) Ko-Wei Chuang, Hen-Hsen Huang, Tsai-Yen Li -+ [MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving](https://arxiv.org//abs/2508.06534) ++ [MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving](https://arxiv.org/abs/2508.06534) Aishan Liu, Jiakai Wang, Tianyuan Zhang, Hainan Li, Jiangfan Liu, Siyuan Liang, Yilong Ren, Xianglong Liu, Dacheng Tao -+ [Towards Stealthy and Effective Backdoor Attacks on Lane Detection: A Naturalistic Data Poisoning Approach](https://arxiv.org//abs/2508.15778) ++ [Towards Stealthy and Effective Backdoor Attacks on Lane Detection: A Naturalistic Data Poisoning Approach](https://arxiv.org/abs/2508.15778) Yifan Liao, Yuxin Cao, Yedi Zhang, Wentao He, Yan Xiao, Xianglong Du, Zhiyong Huang, Jin Song Dong -+ [Is Uncertainty Quantification a Viable Alternative to Learned Deferral?](https://arxiv.org//abs/2508.02319) ++ [Is Uncertainty Quantification a Viable Alternative to Learned Deferral?](https://arxiv.org/abs/2508.02319) Anna M. Wundram, Christian F. Baumgartner -+ [Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation](https://arxiv.org//abs/2508.02618) ++ [Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation](https://arxiv.org/abs/2508.02618) Jianxiang Zang, Meiling Ning, Shihan Dou, Jiazheng Zhang, Tao Gui, Qi Zhang, Xuanjing Huang # 2025-08-03 -+ [What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?](https://arxiv.org//abs/2508.06530) ++ [What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?](https://arxiv.org/abs/2508.06530) Ming-Kun Xie, Jia-Hao Xiao, Gang Niu, Lei Feng, Zhiqiang Kou, Min-Ling Zhang, Masashi Sugiyama -+ [IMU: Influence-guided Machine Unlearning](https://arxiv.org//abs/2508.01620) ++ [IMU: Influence-guided Machine Unlearning](https://arxiv.org/abs/2508.01620) Xindi Fan, Jing Wu, Mingyi Zhou, Pengwei Liang, Dinh Phung -+ [Pr$^2$R: Information-Fused and Style-Aware Privacy-Preserving Replay for Lifelong Person Re-Identification](https://arxiv.org//abs/2508.01587) ++ [Pr$^2$R: Information-Fused and Style-Aware Privacy-Preserving Replay for Lifelong Person Re-Identification](https://arxiv.org/abs/2508.01587) Mingyu Wang, Haojie Liu, Zhiyong Li, Wei Jiang # 2025-08-02 -+ [BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability](https://arxiv.org//abs/2508.01332) ++ [BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability](https://arxiv.org/abs/2508.01332) Zhenhua Zou, Zhuotao Liu, Lepeng Zhao, Qiuyang Zhan -+ [ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models](https://arxiv.org//abs/2508.01365) ++ [ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models](https://arxiv.org/abs/2508.01365) Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu -+ [PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation](https://arxiv.org//abs/2508.01272) ++ [PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation](https://arxiv.org/abs/2508.01272) Zonglei Jing, Xiao Yang, Xiaoqian Li, Siyuan Liang, 
Aishan Liu, Mingchuan Zhang, Xianglong Liu # 2025-08-01 -+ [R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge](https://arxiv.org//abs/2508.00324) ++ [R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge](https://arxiv.org/abs/2508.00324) Yeonjun In, Wonjoong Kim, Sangwu Park, Chanyoung Park -+ [Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking](https://arxiv.org//abs/2508.00500) ++ [Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking](https://arxiv.org/abs/2508.00500) Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei -+ [CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization](https://arxiv.org//abs/2508.00478) ++ [CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization](https://arxiv.org/abs/2508.00478) Yuning Jiang, Nay Oo, Qiaoran Meng, Lu Lin, Dusit Niyato, Zehui Xiong, Hoon Wei Lim, Biplab Sikdar -+ [Activation-Guided Local Editing for Jailbreaking Attacks](https://arxiv.org//abs/2508.00555) ++ [Activation-Guided Local Editing for Jailbreaking Attacks](https://arxiv.org/abs/2508.00555) Jiecong Wang, Haoran Li, Hao Peng, Ziqian Zeng, Zihao Wang, Haohua Du, Zhengtao Yu -+ [Wukong Framework for Not Safe For Work Detection in Text-to-Image systems](https://arxiv.org//abs/2508.00591) ++ [Wukong Framework for Not Safe For Work Detection in Text-to-Image systems](https://arxiv.org/abs/2508.00591) Mingrui Liu, Sixiao Zhang, Cheng Long -+ [LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks](https://arxiv.org//abs/2508.00602) ++ [LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks](https://arxiv.org/abs/2508.00602) Francesco Panebianco, Stefano Bonfanti, Francesco Trovò, Michele Carminati -+ [Backdoor Attacks on Deep Learning Face Detection](https://arxiv.org//abs/2508.00620) ++ [Backdoor Attacks on Deep Learning Face Detection](https://arxiv.org/abs/2508.00620) Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi -+ [Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos](https://arxiv.org//abs/2508.00748) ++ [Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos](https://arxiv.org/abs/2508.00748) Laura Pedrouzo-Rodriguez, Pedro Delgado-DeRobles, Luis F. 
Gomez, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez -+ [MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations](https://arxiv.org//abs/2508.00760) ++ [MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations](https://arxiv.org/abs/2508.00760) Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao -+ [DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models](https://arxiv.org//abs/2508.00619) ++ [DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models](https://arxiv.org/abs/2508.00619) Shantanu Thorat, Andrew Caines -+ [Privacy-Preserving Driver Drowsiness Detection with Spatial Self-Attention and Federated Learning](https://arxiv.org//abs/2508.00287) ++ [Privacy-Preserving Driver Drowsiness Detection with Spatial Self-Attention and Federated Learning](https://arxiv.org/abs/2508.00287) Tran Viet Khoa, Do Hai Son, Mohammad Abu Alsheikh, Yibeltal F Alem, Dinh Thai Hoang -+ [IN2OUT: Fine-Tuning Video Inpainting Model for Video Outpainting Using Hierarchical Discriminator](https://arxiv.org//abs/2508.00418) ++ [IN2OUT: Fine-Tuning Video Inpainting Model for Video Outpainting Using Hierarchical Discriminator](https://arxiv.org/abs/2508.00418) Sangwoo Youn, Minji Lee, Nokap Tony Park, Yeonggyoo Jeon, Taeyoung Na -+ [DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification](https://arxiv.org//abs/2508.00552) ++ [DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification](https://arxiv.org/abs/2508.00552) Chihan Huang, Belal Alsinglawi, Islam Al-qudah -+ [Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights](https://arxiv.org//abs/2508.00649) ++ [Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights](https://arxiv.org/abs/2508.00649) Junhao Zheng, Jiahao Sun, Chenhao Lin, Zhengyu Zhao, Chen Ma, Chong Zhang, Cong Wang, Qian Wang, Chao Shen -+ [STF: Shallow-Level Temporal Feedback to Enhance Spiking Transformers](https://arxiv.org//abs/2508.00387) ++ [STF: Shallow-Level Temporal Feedback to Enhance Spiking Transformers](https://arxiv.org/abs/2508.00387) Zeqi Zheng, Zizheng Zhu, Yingchao Yu, Yanchen Huang, Changze Lv, Junfeng Tang, Zhaofei Yu, Yaochu Jin -+ [Wind Power Scenario Generation based on the Generalized Dynamic Factor Model and Generative Adversarial Network](https://arxiv.org//abs/2508.00692) ++ [Wind Power Scenario Generation based on the Generalized Dynamic Factor Model and Generative Adversarial Network](https://arxiv.org/abs/2508.00692) Young-ho Cho, Hao Zhu, Duehee Lee, Ross Baldick -+ [FedGuard: A Diverse-Byzantine-Robust Mechanism for Federated Learning with Major Malicious Clients](https://arxiv.org//abs/2508.00636) ++ [FedGuard: A Diverse-Byzantine-Robust Mechanism for Federated Learning with Major Malicious Clients](https://arxiv.org/abs/2508.00636) Haocheng Jiang, Hua Shen, Jixin Zhang, Willy Susilo, Mingwu Zhang -+ [LeakyCLIP: Extracting Training Data from CLIP](https://arxiv.org//abs/2508.00756) ++ [LeakyCLIP: Extracting Training Data from CLIP](https://arxiv.org/abs/2508.00756) Yunhao Chen, Shujie Wang, Xin Wang, Xingjun Ma -+ [Random Walk Learning and the Pac-Man Attack](https://arxiv.org//abs/2508.05663) ++ [Random Walk Learning and the Pac-Man 
Attack](https://arxiv.org/abs/2508.05663) Xingran Chen, Parimal Parag, Rohit Bhagat, Zonghong Liu, Salim El Rouayheb -+ [Privacy Enhancement for Gaze Data Using a Noise-Infused Autoencoder](https://arxiv.org//abs/2508.10918) ++ [Privacy Enhancement for Gaze Data Using a Noise-Infused Autoencoder](https://arxiv.org/abs/2508.10918) Samantha Aziz, Oleg Komogortsev # 2025-07-31 -+ [Hyperproperty-Constrained Secure Reinforcement Learning](https://arxiv.org//abs/2508.00106) ++ [Hyperproperty-Constrained Secure Reinforcement Learning](https://arxiv.org/abs/2508.00106) Ernest Bonnah, Luan Viet Nguyen, Khaza Anuarul Hoque -+ [Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs](https://arxiv.org//abs/2508.00161) ++ [Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs](https://arxiv.org/abs/2508.00161) Ziqian Zhong, Aditi Raghunathan -+ [On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI](https://arxiv.org//abs/2508.00171) ++ [On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI](https://arxiv.org/abs/2508.00171) David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante -+ [Improved Robustness and Functional Localization in Topographic CNNs Through Weight Similarity](https://arxiv.org//abs/2508.00043) ++ [Improved Robustness and Functional Localization in Topographic CNNs Through Weight Similarity](https://arxiv.org/abs/2508.00043) Nhut Truong, Uri Hasson -+ [Data-driven global ocean model resolving ocean-atmosphere coupling dynamics](https://arxiv.org//abs/2508.10908) ++ [Data-driven global ocean model resolving ocean-atmosphere coupling dynamics](https://arxiv.org/abs/2508.10908) Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham -+ [Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization](https://arxiv.org//abs/2507.23569) ++ [Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization](https://arxiv.org/abs/2507.23569) Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler -+ [Foundations and Models in Modern Computer Vision: Key Building Blocks in Landmark Architectures](https://arxiv.org//abs/2507.23357) ++ [Foundations and Models in Modern Computer Vision: Key Building Blocks in Landmark Architectures](https://arxiv.org/abs/2507.23357) Radu-Andrei Bourceanu, Neil De La Fuente, Jan Grimm, Andrei Jardan, Andriy Manucharyan, Cornelius Weiss, Daniel Cremers, Roman Pflugfelder -+ [FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning](https://arxiv.org//abs/2507.23318) ++ [FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning](https://arxiv.org/abs/2507.23318) Jiajun Cao, Qizhe Zhang, Peidong Jia, Xuhui Zhao, Bo Lan, Xiaoan Zhang, Zhuo Li, Xiaobao Wei, Sixiang Chen, Liyun Li, Xianming Liu, Ming Lu, Yang Wang, Shanghang Zhang -+ [Measuring Harmfulness of Computer-Using Agents](https://arxiv.org//abs/2508.00935) ++ [Measuring Harmfulness of Computer-Using Agents](https://arxiv.org/abs/2508.00935) Aaron Xuxiang Tian, Ruofan Zhang, Janet Tang, Ji Wang, Tianyu Shi, Jiaxin Wen -+ [Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level](https://arxiv.org//abs/2507.23512) ++ [Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level](https://arxiv.org/abs/2507.23512) Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard 
Gorbunov # 2025-07-30 -+ [Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss](https://arxiv.org//abs/2507.22428) ++ [Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss](https://arxiv.org/abs/2507.22428) Yunrui Yu, Hang Su, Cheng-zhong Xu, Zhizhong Su, Jun Zhu -+ [RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function](https://arxiv.org//abs/2507.22446) ++ [RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function](https://arxiv.org/abs/2507.22446) Yunrui Yu, Kafeng Wang, Hang Su, Jun Zhu -+ [LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning](https://arxiv.org//abs/2507.22499) ++ [LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning](https://arxiv.org/abs/2507.22499) Xiang Li, Qianli Shen, Haonan Wang, Kenji Kawaguchi -+ [Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs](https://arxiv.org//abs/2507.22564) ++ [Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs](https://arxiv.org/abs/2507.22564) Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu -+ [Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning](https://arxiv.org//abs/2507.22565) ++ [Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning](https://arxiv.org/abs/2507.22565) Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen -+ [Metamorphic Testing of Deep Code Models: A Systematic Literature Review](https://arxiv.org//abs/2507.22610) ++ [Metamorphic Testing of Deep Code Models: A Systematic Literature Review](https://arxiv.org/abs/2507.22610) Ali Asgari, Milan de Koning, Pouria Derakhshanfar, Annibale Panichella -+ [Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision](https://arxiv.org//abs/2507.22760) ++ [Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision](https://arxiv.org/abs/2507.22760) Samuel Teuber, Debasmita Lohar, Bernhard Beckert -+ [CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models](https://arxiv.org//abs/2507.22828) ++ [CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models](https://arxiv.org/abs/2507.22828) Kedong Xiu, Saiqian Zhang -+ [On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations](https://arxiv.org//abs/2507.22398) ++ [On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations](https://arxiv.org/abs/2507.22398) Jordan Vice, Naveed Akhtar, Yansong Gao, Richard Hartley, Ajmal Mian -+ [Bridging the Gap in Missing Modalities: Leveraging Knowledge Distillation and Style Matching for Brain Tumor Segmentation](https://arxiv.org//abs/2507.22626) ++ [Bridging the Gap in Missing Modalities: Leveraging Knowledge Distillation and Style Matching for Brain Tumor Segmentation](https://arxiv.org/abs/2507.22626) Shenghao Zhu, Yifei Chen, Weihong Chen, Yuanhan Wang, Chang Liu, Shuo Jiang, Feiwei Qin, Changmiao Wang -+ [DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion](https://arxiv.org//abs/2507.22813) ++ [DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion](https://arxiv.org/abs/2507.22813) Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee, Masoud Hadi, Moein Madadi, Mackenzie W. 
Mathis -+ [LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content](https://arxiv.org//abs/2507.22873) ++ [LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content](https://arxiv.org/abs/2507.22873) Simon Pochinda, Momen K. Tageldeen, Mark Thompson, Tony Rinaldi, Troy Giorshev, Keith Lee, Jie Zhou, Frederick Walls -+ [Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions](https://arxiv.org//abs/2507.22617) ++ [Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions](https://arxiv.org/abs/2507.22617) Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang -+ [Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding](https://arxiv.org//abs/2507.22304) ++ [Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding](https://arxiv.org/abs/2507.22304) Chetan Pathade -+ [Benchmarking Fraud Detectors on Private Graph Data](https://arxiv.org//abs/2507.22347) ++ [Benchmarking Fraud Detectors on Private Graph Data](https://arxiv.org/abs/2507.22347) Alexander Goldberg, Giulia Fanti, Nihar Shah, Zhiwei Steven Wu -+ [Low-Communication Resilient Distributed Estimation Algorithm Based on Memory Mechanism](https://arxiv.org//abs/2508.02705) ++ [Low-Communication Resilient Distributed Estimation Algorithm Based on Memory Mechanism](https://arxiv.org/abs/2508.02705) Wei Li, Limei Hu, Feng Chen, Ye Yao -+ [Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards](https://arxiv.org//abs/2508.05658) ++ [Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards](https://arxiv.org/abs/2508.05658) Song Yan, Hui Wei, Jinlong Fei, Guoliang Yang, Zhengyu Zhao, Zheng Wang -+ [Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression](https://arxiv.org//abs/2508.09994) ++ [Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression](https://arxiv.org/abs/2508.09994) Zheng Jie Wong, Bingquan Shen # 2025-07-29 -+ [When Truthful Representations Flip Under Deceptive Instructions?](https://arxiv.org//abs/2507.22149) ++ [When Truthful Representations Flip Under Deceptive Instructions?](https://arxiv.org/abs/2507.22149) Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li -+ [Strategic Deflection: Defending LLMs from Logit Manipulation](https://arxiv.org//abs/2507.22160) ++ [Strategic Deflection: Defending LLMs from Logit Manipulation](https://arxiv.org/abs/2507.22160) Yassine Rachidy, Jihad Rbaiti, Youssef Hmamouche, Faissal Sehbaoui, Amal El Fallah Seghrouchni -+ [Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics](https://arxiv.org//abs/2507.22208) ++ [Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics](https://arxiv.org/abs/2507.22208) Shreyansh Pathak, Sonu Shreshtha, Richa Singh, Mayank Vatsa -+ [Prompt Optimization and Evaluation for LLM Automated Red Teaming](https://arxiv.org//abs/2507.22133) ++ [Prompt Optimization and Evaluation for LLM Automated Red Teaming](https://arxiv.org/abs/2507.22133) Michael Freenor, Lauren Alvarez, Milton Leal, Lily Smith, Joel Garrett, Yelyzaveta Husieva, Madeline Woodruff, Ryan Miller, Erich Kummerfeld, Rafael Medeiros, Sander Schulhoff -+ [Towards Privacy-preserving Photorealistic Self-avatars in Mixed Reality](https://arxiv.org//abs/2507.22153) ++ [Towards
Privacy-preserving Photorealistic Self-avatars in Mixed Reality](https://arxiv.org/abs/2507.22153) Ethan Wilson, Vincent Bindschaedler, Sophie Jörg, Sean Sheikholeslam, Kevin Butler, Eakta Jain -+ [Cascading and Proxy Membership Inference Attacks](https://arxiv.org//abs/2507.21412) ++ [Cascading and Proxy Membership Inference Attacks](https://arxiv.org/abs/2507.21412) Yuntao Du, Jiacheng Li, Yuetian Chen, Kaiyuan Zhang, Zhizhen Yuan, Hanshen Xiao, Bruno Ribeiro, Ninghui Li @@ -5468,16 +5468,16 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yang Wang, Chenghao Xiao, Yizhi Li, Stuart E. Middleton, Noura Al Moubayed, Chenghua Lin -+ [Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal](https://arxiv.org//abs/2507.21750) ++ [Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal](https://arxiv.org/abs/2507.21750) Yang Wang, Chenghao Xiao, Yizhi Li, Stuart E. Middleton, Noura Al Moubayed, Chenghua Lin # 2025-07-28 -+ [Enhancing Jailbreak Attacks on LLMs via Persona Prompts](https://arxiv.org//abs/2507.22171) ++ [Enhancing Jailbreak Attacks on LLMs via Persona Prompts](https://arxiv.org/abs/2507.22171) Zheng Zhang, Peilin Zhao, Deheng Ye, Hao Wang -+ [Harnessing Diffusion-Yielded Score Priors for Image Restoration](https://arxiv.org//abs/2507.20590) ++ [Harnessing Diffusion-Yielded Score Priors for Image Restoration](https://arxiv.org/abs/2507.20590) Xinqi Lin, Fanghua Yu, Jinfan Hu, Zhiyuan You, Wu Shi, Jimmy S. Ren, Jinjin Gu, Chao Dong @@ -5505,15 +5505,15 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Cappé -+ [Verification Cost Asymmetry in Cognitive Warfare: A Complexity-Theoretic Framework](https://arxiv.org//abs/2507.21258) ++ [Verification Cost Asymmetry in Cognitive Warfare: A Complexity-Theoretic Framework](https://arxiv.org/abs/2507.21258) Joshua Luberisse -+ [From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation](https://arxiv.org//abs/2507.20968) ++ [From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation](https://arxiv.org/abs/2507.20968) Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang -+ [Memorization in Fine-Tuned Large Language Models](https://arxiv.org//abs/2507.21009) ++ [Memorization in Fine-Tuned Large Language Models](https://arxiv.org/abs/2507.21009) Danil Savine @@ -5581,75 +5581,75 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Muntasir Wahed, Xiaona Zhou, Kiet A. 
Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, Ismini Lourentzou # 2025-07-24 -+ [ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks](https://arxiv.org//abs/2507.18031) ++ [ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks](https://arxiv.org/abs/2507.18031) Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan -+ [Safeguarding RAG Pipelines with GMTP: A Gradient-based Masked Token Probability Method for Poisoned Document Detection](https://arxiv.org//abs/2507.18202) ++ [Safeguarding RAG Pipelines with GMTP: A Gradient-based Masked Token Probability Method for Poisoned Document Detection](https://arxiv.org/abs/2507.18202) San Kim, Jonghwi Kim, Yejin Jeon, Gary Geunbae Lee -+ [LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models](https://arxiv.org//abs/2507.18302) ++ [LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models](https://arxiv.org/abs/2507.18302) Delong Ran, Xinlei He, Tianshuo Cong, Anyu Wang, Qi Li, Xiaoyun Wang -+ [Revisiting Physically Realizable Adversarial Object Attack against LiDAR-based Detection: Clarifying Problem Formulation and Experimental Protocols](https://arxiv.org//abs/2507.18457) ++ [Revisiting Physically Realizable Adversarial Object Attack against LiDAR-based Detection: Clarifying Problem Formulation and Experimental Protocols](https://arxiv.org/abs/2507.18457) Luo Cheng, Hanwei Zhang, Lijun Zhang, Holger Hermanns -+ [Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments](https://arxiv.org//abs/2507.18484) ++ [Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments](https://arxiv.org/abs/2507.18484) Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu -+ [Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs](https://arxiv.org//abs/2507.18055) ++ [Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs](https://arxiv.org/abs/2507.18055) Tevin Atwal, Chan Nam Tieu, Yefeng Yuan, Zhan Shi, Yuhong Liu, Liang Cheng -+ [Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation](https://arxiv.org//abs/2507.18203) ++ [Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation](https://arxiv.org/abs/2507.18203) Kyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim -+ [BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit](https://arxiv.org//abs/2507.18305) ++ [BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit](https://arxiv.org/abs/2507.18305) Biao Yi, Zekun Fei, Jianing Geng, Tong Li, Lihai Nie, Zheli Liu, Yiming Li -+ [RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models](https://arxiv.org//abs/2507.18053) ++ [RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models](https://arxiv.org/abs/2507.18053) Haoran Gao, Yuanhe Zhang, Zhenhong Zhou, Lei Jiang, Fanyu Meng, Yujia Xiao, Kun Wang, Yang Liu, Junlan Feng -+ [Facial Demorphing from a Single Morph Using a Latent Conditional GAN](https://arxiv.org//abs/2507.18566) ++ [Facial Demorphing from a Single Morph Using a Latent Conditional GAN](https://arxiv.org/abs/2507.18566) Nitish Shukla, Arun Ross 
-+ [Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis](https://arxiv.org//abs/2507.18569) ++ [Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis](https://arxiv.org/abs/2507.18569) Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, Jian-Huang Lai -+ [NWaaS: Nonintrusive Watermarking as a Service for X-to-Image DNN](https://arxiv.org//abs/2507.18036) ++ [NWaaS: Nonintrusive Watermarking as a Service for X-to-Image DNN](https://arxiv.org/abs/2507.18036) Haonan An, Guang Hua, Yu Guo, Hangcheng Cao, Susanto Rahardja, Yuguang Fang -+ [C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams](https://arxiv.org//abs/2507.18072) ++ [C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams](https://arxiv.org/abs/2507.18072) Ryusei Fujimoto, Yugo Nakamura, Yutaka Arakawa -+ [Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification](https://arxiv.org//abs/2507.18113) ++ [Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification](https://arxiv.org/abs/2507.18113) Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong -+ [On Reconstructing Training Data From Bayesian Posteriors and Trained Models](https://arxiv.org//abs/2507.18372) ++ [On Reconstructing Training Data From Bayesian Posteriors and Trained Models](https://arxiv.org/abs/2507.18372) George Wynne -+ [Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering](https://arxiv.org//abs/2507.18034) ++ [Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering](https://arxiv.org/abs/2507.18034) Haonan An, Guang Hua, Hangcheng Cao, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang -+ [Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment](https://arxiv.org//abs/2507.18631) ++ [Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment](https://arxiv.org/abs/2507.18631) Hao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha -+ [RecPS: Privacy Risk Scoring for Recommender Systems](https://arxiv.org//abs/2507.18365) ++ [RecPS: Privacy Risk Scoring for Recommender Systems](https://arxiv.org/abs/2507.18365) Jiajie He, Yuechun Gu, Keke Chen @@ -5665,113 +5665,113 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Ran Tong, Songtao Wei, Jiaqi Liu, Lanruo Wang -+ [Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content](https://arxiv.org//abs/2507.19551) ++ [Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content](https://arxiv.org/abs/2507.19551) Ran Tong, Songtao Wei, Jiaqi Liu, Lanruo Wang # 2025-07-23 -+ [Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning](https://arxiv.org//abs/2507.17418) ++ [Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning](https://arxiv.org/abs/2507.17418) Joobin Jin, Seokjun Hong, Gyeongseon Baek, Yeeun Kim, Byeongjoon Noh -+ [P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices](https://arxiv.org//abs/2507.17228) ++ [P3SL:
Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices](https://arxiv.org/abs/2507.17228) Wei Fan, JinYi Yoon, Xiaochang Li, Huajie Shao, Bo Ji -+ [Investigating Training Data Detection in AI Coders](https://arxiv.org//abs/2507.17389) ++ [Investigating Training Data Detection in AI Coders](https://arxiv.org/abs/2507.17389) Tianlin Li, Yunxiang Wei, Zhiming Li, Aishan Liu, Qing Guo, Xianglong Liu, Dongning Sun, Yang Liu -+ [On the Interaction of Compressibility and Adversarial Robustness](https://arxiv.org//abs/2507.17725) ++ [On the Interaction of Compressibility and Adversarial Robustness](https://arxiv.org/abs/2507.17725) Melih Barsbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal -+ [Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks](https://arxiv.org//abs/2507.17747) ++ [Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks](https://arxiv.org/abs/2507.17747) Linbo Cao, Jinman Zhao -+ [Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs](https://arxiv.org//abs/2507.17259) ++ [Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs](https://arxiv.org/abs/2507.17259) Eyal German, Sagiv Antebi, Daniel Samira, Asaf Shabtai, Yuval Elovici -+ [An h-space Based Adversarial Attack for Protection Against Few-shot Personalization](https://arxiv.org//abs/2507.17554) ++ [An h-space Based Adversarial Attack for Protection Against Few-shot Personalization](https://arxiv.org/abs/2507.17554) Xide Xu, Sandesh Kamath, Muhammad Atif Butt, Bogdan Raducanu -+ [Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors](https://arxiv.org//abs/2507.17577) ++ [Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors](https://arxiv.org/abs/2507.17577) Chen Ma, Xinjie Xu, Shuyu Cheng, Qi Xuan -+ [BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems](https://arxiv.org//abs/2507.17722) ++ [BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems](https://arxiv.org/abs/2507.17722) Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger -+ [A Comprehensive Evaluation Framework for the Study of the Effects of Facial Filters on Face Recognition Accuracy](https://arxiv.org//abs/2507.17729) ++ [A Comprehensive Evaluation Framework for the Study of the Effects of Facial Filters on Face Recognition Accuracy](https://arxiv.org/abs/2507.17729) Kagan Ozturk, Louisa Conwill, Jacob Gutierrez, Kevin Bowyer, Walter J. Scheirer -+ [Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees](https://arxiv.org//abs/2507.17453) ++ [Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees](https://arxiv.org/abs/2507.17453) Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. 
Dilum Bandara, Shiping Chen, Jianjun Zhao, Yulei Sui -+ [A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis](https://arxiv.org//abs/2507.17180) ++ [A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis](https://arxiv.org/abs/2507.17180) Hao Jiang, Quan Zhou, Dongdong Zhao, Shangshang Yang, Wenjian Luo, Xingyi Zhang -+ [Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAG](https://arxiv.org//abs/2507.17199) ++ [Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAG](https://arxiv.org/abs/2507.17199) Ruoyang Rykie Guo -+ [From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models](https://arxiv.org//abs/2507.17922) ++ [From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models](https://arxiv.org/abs/2507.17922) Jessica Quaye, Charvi Rastogi, Alicia Parrish, Oana Inel, Minsuk Kahng, Lora Aroyo, Vijay Janapa Reddi -+ [Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation](https://arxiv.org//abs/2507.17937) ++ [Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation](https://arxiv.org/abs/2507.17937) Jaechul Roh, Zachary Novack, Yuefeng Peng, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Amir Houmansadr -+ [Minimax Data Sanitization with Distortion Constraint and Adversarial Inference](https://arxiv.org//abs/2507.17942) ++ [Minimax Data Sanitization with Distortion Constraint and Adversarial Inference](https://arxiv.org/abs/2507.17942) Amirarsalan Moatazedian, Yauhen Yakimenka, Rémi A. Chou, Jörg Kliewer -+ [Evaluating the Performance of AI Text Detectors, Few-Shot and Chain-of-Thought Prompting Using DeepSeek Generated Text](https://arxiv.org//abs/2507.17944) ++ [Evaluating the Performance of AI Text Detectors, Few-Shot and Chain-of-Thought Prompting Using DeepSeek Generated Text](https://arxiv.org/abs/2507.17944) Hulayyil Alshammari, Praveen Rao -+ [Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism](https://arxiv.org//abs/2507.17798) ++ [Wasserstein GAN-Based Precipitation Downscaling with Optimal Transport for Enhancing Perceptual Realism](https://arxiv.org/abs/2507.17798) Kenta Shiraishi, Yuka Muto, Atsushi Okazaki, Shunji Kotsuki -+ [Lower Bounds for Public-Private Learning under Distribution Shift](https://arxiv.org//abs/2507.17895) ++ [Lower Bounds for Public-Private Learning under Distribution Shift](https://arxiv.org/abs/2507.17895) Amrith Setlur, Pratiksha Thaker, Jonathan Ullman # 2025-07-22 -+ [CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage](https://arxiv.org//abs/2507.16872) ++ [CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage](https://arxiv.org/abs/2507.16872) Na Li, Yansong Gao, Hongsheng Hu, Boyu Kuang, Anmin Fu -+ [Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed](https://arxiv.org//abs/2507.16880) ++ [Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed](https://arxiv.org/abs/2507.16880) Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch -+ [Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs](https://arxiv.org//abs/2507.17010) ++ [Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs](https://arxiv.org/abs/2507.17010) 
H M Mohaimanul Islam, Huynh Q. N. Vo, Aditya Rane -+ [Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach](https://arxiv.org//abs/2507.17070) ++ [Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach](https://arxiv.org/abs/2507.17070) Adithya Mohan, Dominik Rößle, Daniel Cremers, Torsten Schön -+ [Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation](https://arxiv.org//abs/2507.17066) ++ [Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation](https://arxiv.org/abs/2507.17066) Jessup Byun, Xiaofeng Lin, Joshua Ward, Guang Cheng -+ [GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI](https://arxiv.org//abs/2507.17033) ++ [GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI](https://arxiv.org/abs/2507.17033) Joshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz -+ [LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech](https://arxiv.org//abs/2507.16220) ++ [LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech](https://arxiv.org/abs/2507.16220) Xuechen Liu, Wanying Ge, Xin Wang, Junichi Yamagishi @@ -5779,154 +5779,154 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Muhammad Zaeem Shahzad, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique -+ [Argument Quality Annotation and Gender Bias Detection in Financial Communication through Large Language Models](https://arxiv.org//abs/2508.08262) ++ [Argument Quality Annotation and Gender Bias Detection in Financial Communication through Large Language Models](https://arxiv.org/abs/2508.08262) Alaa Alhamzeh, Mays Al Rebdawi -+ [The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation](https://arxiv.org//abs/2507.16345) ++ [The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation](https://arxiv.org/abs/2507.16345) Sara Ahmadian, Edith Cohen, Uri Stemmer -+ [Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency](https://arxiv.org//abs/2507.16242) ++ [Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency](https://arxiv.org/abs/2507.16242) Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng # 2025-07-21 -+ [Challenges of Trustworthy Federated Learning: What's Done, Current Trends and Remaining Work](https://arxiv.org//abs/2507.15796) ++ [Challenges of Trustworthy Federated Learning: What's Done, Current Trends and Remaining Work](https://arxiv.org/abs/2507.15796) Nuria Rodríguez-Barroso, Mario García-Márquez, M. 
Victoria Luzón, Francisco Herrera -+ [PromptArmor: Simple yet Effective Prompt Injection Defenses](https://arxiv.org//abs/2507.15219) ++ [PromptArmor: Simple yet Effective Prompt Injection Defenses](https://arxiv.org/abs/2507.15219) Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, Dawn Song -+ [Scaling Decentralized Learning with FLock](https://arxiv.org//abs/2507.15349) ++ [Scaling Decentralized Learning with FLock](https://arxiv.org/abs/2507.15349) Zehua Cheng, Rui Sun, Jiahao Sun, Yike Guo -+ [Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario](https://arxiv.org//abs/2507.15587) ++ [Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario](https://arxiv.org/abs/2507.15587) Yinsong Chen, Kaifeng Wang, Xiaoqiang Meng, Xueyuan Li, Zirui Li, Xin Gao -+ [Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems](https://arxiv.org//abs/2507.15613) ++ [Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems](https://arxiv.org/abs/2507.15613) Andrii Balashov, Olena Ponomarova, Xiaohua Zhai -+ [Missing value imputation with adversarial random forests -- MissARF](https://arxiv.org//abs/2507.15681) ++ [Missing value imputation with adversarial random forests -- MissARF](https://arxiv.org/abs/2507.15681) Pegah Golchian, Jan Kapar, David S. Watson, Marvin N. Wright -+ [Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection](https://arxiv.org//abs/2507.15286) ++ [Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection](https://arxiv.org/abs/2507.15286) Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee -+ [Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems](https://arxiv.org//abs/2507.15214) ++ [Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems](https://arxiv.org/abs/2507.15214) Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi -+ [In-context Learning of Vision Language Models for Detection of Physical and Digital Attacks against Face Recognition Systems](https://arxiv.org//abs/2507.15285) ++ [In-context Learning of Vision Language Models for Detection of Physical and Digital Attacks against Face Recognition Systems](https://arxiv.org/abs/2507.15285) Lazaro Janier Gonzalez-Soler, Maciej Salwowski, Christoph Busch -+ [Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond](https://arxiv.org//abs/2507.15401) ++ [Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond](https://arxiv.org/abs/2507.15401) Huiyu Zhai, Xingxing Yang, Yalan Ye, Chenyang Li, Bin Fan, Changze Li -+ [Optimizing Canaries for Privacy Auditing with Metagradient Descent](https://arxiv.org//abs/2507.15836) ++ [Optimizing Canaries for Privacy Auditing with Metagradient Descent](https://arxiv.org/abs/2507.15836) Matteo Boglioni, Terrance Liu, Andrew Ilyas, Zhiwei Steven Wu -+ [Robust and Differentially Private PCA for non-Gaussian data](https://arxiv.org//abs/2507.15232) ++ [Robust and Differentially Private PCA for non-Gaussian data](https://arxiv.org/abs/2507.15232) Minwoo Kim, Sungkyu Jung -+ [Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of LLMs](https://arxiv.org//abs/2507.16860) ++ [Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of LLMs](https://arxiv.org/abs/2507.16860) Apoorva Gulati, Rajesh Kumar, Vinti Agarwal, Aditya Sharma -+ 
[Security study based on the ChatGPT plugin system: Identifying Security Vulnerabilities](https://arxiv.org//abs/2507.21128) ++ [Security study based on the ChatGPT plugin system: Identifying Security Vulnerabilities](https://arxiv.org/abs/2507.21128) Ruomai Ren # 2025-07-20 -+ [DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection](https://arxiv.org//abs/2507.15042) ++ [DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection](https://arxiv.org/abs/2507.15042) Jerry Wang, Fang Yu -+ [Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree](https://arxiv.org//abs/2507.14799) ++ [Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree](https://arxiv.org/abs/2507.14799) Sam Johnson, Viet Pham, Thai Le -+ [Subliminal Learning: Language models transmit behavioral traits via hidden signals in data](https://arxiv.org//abs/2507.14805) ++ [Subliminal Learning: Language models transmit behavioral traits via hidden signals in data](https://arxiv.org/abs/2507.14805) Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans -+ [Byzantine-Robust Decentralized Coordination of LLM Agents](https://arxiv.org//abs/2507.14928) ++ [Byzantine-Robust Decentralized Coordination of LLM Agents](https://arxiv.org/abs/2507.14928) Yongrae Jo, Chanik Park -+ [Robust Control with Gradient Uncertainty](https://arxiv.org//abs/2507.15082) ++ [Robust Control with Gradient Uncertainty](https://arxiv.org/abs/2507.15082) Qian Qi -+ [Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding](https://arxiv.org//abs/2507.15028) ++ [Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding](https://arxiv.org/abs/2507.15028) Yuanhan Zhang, Yunice Chew, Yuhao Dong, Aria Leo, Bo Hu, Ziwei Liu -+ [Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data](https://arxiv.org//abs/2507.14999) ++ [Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data](https://arxiv.org/abs/2507.14999) Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang -+ [ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model](https://arxiv.org//abs/2507.15067) ++ [ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model](https://arxiv.org/abs/2507.15067) Bing He, Mustaque Ahamad, Srijan Kumar -+ [Distributional Unlearning: Forgetting Distributions, Not Just Samples](https://arxiv.org//abs/2507.15112) ++ [Distributional Unlearning: Forgetting Distributions, Not Just Samples](https://arxiv.org/abs/2507.15112) Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo -+ [Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts](https://arxiv.org//abs/2507.14835) ++ [Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts](https://arxiv.org/abs/2507.14835) Pan Peng, Hangyu Xu -+ [Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective](https://arxiv.org//abs/2508.03703) ++ [Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective](https://arxiv.org/abs/2508.03703) Yubo Wang, Min Tang, Nuo Shen, Shujie Cui, Weiqing Wang # 2025-07-19 -+ [Automated Safety Evaluations Across 20 Large Language Models: The Aymara LLM Risk and 
Responsibility Matrix](https://arxiv.org//abs/2507.14719) ++ [Automated Safety Evaluations Across 20 Large Language Models: The Aymara LLM Risk and Responsibility Matrix](https://arxiv.org/abs/2507.14719) Juan Manuel Contreras -+ [VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning](https://arxiv.org//abs/2507.14625) ++ [VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning](https://arxiv.org/abs/2507.14625) Juntao Tan, Anran Li, Quanchao Liu, Peng Ran, Lan Zhang -+ [VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking](https://arxiv.org//abs/2507.14629) ++ [VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking](https://arxiv.org/abs/2507.14629) Juntao Tan, Lan Zhang, Zhonghao Hu, Kai Yang, Peng Ran, Bo Li -+ [GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks](https://arxiv.org//abs/2507.14679) ++ [GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks](https://arxiv.org/abs/2507.14679) Zixin Xu, Zhijie Wang, Zhiyuan Pan -+ [Analyzing Internal Activity and Robustness of SNNs Across Neuron Parameter Space](https://arxiv.org//abs/2507.14757) ++ [Analyzing Internal Activity and Robustness of SNNs Across Neuron Parameter Space](https://arxiv.org/abs/2507.14757) Szymon Mazurek, Jakub Caputa, Maciej Wielgosz -+ [MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy](https://arxiv.org//abs/2507.14738) ++ [MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy](https://arxiv.org/abs/2507.14738) Jeannie She, Katie Spivakovsky -+ [Glitches in Decision Tree Ensemble Models](https://arxiv.org//abs/2507.14492) ++ [Glitches in Decision Tree Ensemble Models](https://arxiv.org/abs/2507.14492) Satyankar Chandra, Ashutosh Gupta, Kaushik Mallik, Krishna Shankaranarayanan, Namrita Varshney -+ [FORTA: Byzantine-Resilient FL Aggregation via DFT-Guided Krum](https://arxiv.org//abs/2507.14588) ++ [FORTA: Byzantine-Resilient FL Aggregation via DFT-Guided Krum](https://arxiv.org/abs/2507.14588) Usayd Shahul, J. Harshan -+ [Towards Urban Planning AI Agent in the Age of Agentic AI](https://arxiv.org//abs/2507.14730) ++ [Towards Urban Planning AI Agent in the Age of Agentic AI](https://arxiv.org/abs/2507.14730) Yanjie Fu, Dongjie Wang @@ -5935,698 +5935,698 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Rui Liu, Tao Zhe, Zhong-Ren Peng, Necati Catbas, Xinyue Ye, Dongjie Wang, Yanjie Fu # 2025-07-18 -+ [GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention](https://arxiv.org//abs/2507.13598) ++ [GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention](https://arxiv.org/abs/2507.13598) Amro Abdalla, Ismail Shaheen, Dan DeGenaro, Rupayan Mallick, Bogdan Raita, Sarah Adel Bargal -+ [Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques](https://arxiv.org//abs/2507.13629) ++ [Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques](https://arxiv.org/abs/2507.13629) Niveen O. 
Jaffal, Mohammed Alkhanafseh, David Mohaisen -+ [Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models](https://arxiv.org//abs/2507.13761) ++ [Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models](https://arxiv.org/abs/2507.13761) Palash Nandi, Maithili Joshi, Tanmoy Chakraborty -+ [Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model](https://arxiv.org//abs/2507.13599) ++ [Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model](https://arxiv.org/abs/2507.13599) Chengxu Liu, Lu Qi, Jinshan Pan, Xueming Qian, Ming-Hsuan Yang -+ [Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics](https://arxiv.org//abs/2507.13727) ++ [Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics](https://arxiv.org/abs/2507.13727) René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz -+ [Byzantine-resilient federated online learning for Gaussian process regression](https://arxiv.org//abs/2507.14021) ++ [Byzantine-resilient federated online learning for Gaussian process regression](https://arxiv.org/abs/2507.14021) Xu Zhang, Zhenyuan Yuan, Minghui Zhu -+ [FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning](https://arxiv.org//abs/2507.13591) ++ [FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning](https://arxiv.org/abs/2507.13591) Sahar Ghoflsaz Ghinani, Elaheh Sadredini -+ [An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting](https://arxiv.org//abs/2507.14109) ++ [An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting](https://arxiv.org/abs/2507.14109) Xinyu Cao, Bimal Adhikari, Shangqing Zhao, Jingxian Wu, Yanjun Pan -+ [TopicAttack: An Indirect Prompt Injection Attack via Topic Transition](https://arxiv.org//abs/2507.13686) ++ [TopicAttack: An Indirect Prompt Injection Attack via Topic Transition](https://arxiv.org/abs/2507.13686) Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi -+ [Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack](https://arxiv.org//abs/2507.14248) ++ [Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack](https://arxiv.org/abs/2507.14248) Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Hyoungshick Kim, Tamer Abuhmed -+ [Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution](https://arxiv.org//abs/2507.14367) ++ [Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution](https://arxiv.org/abs/2507.14367) Weiming Ren, Raghav Goyal, Zhiming Hu, Tristan Ty Aumentado-Armstrong, Iqbal Mohomed, Alex Levinshtein -+ [FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning](https://arxiv.org//abs/2507.14322) ++ [FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning](https://arxiv.org/abs/2507.14322) Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain # 2025-07-17 -+ [Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework](https://arxiv.org//abs/2507.12872) ++ [Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework](https://arxiv.org/abs/2507.12872) Rishane Dassanayake, Mario Demetroudi, James Walpole, Lindley Lentati, Jason R. 
Brown, Edward James Young -+ [A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints](https://arxiv.org//abs/2507.12979) ++ [A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints](https://arxiv.org/abs/2507.12979) Youssef Tawfilis, Hossam Amer, Minar El-Aasser, Tallal Elshabrawy -+ [Prompt Injection 2.0: Hybrid AI Threats](https://arxiv.org//abs/2507.13169) ++ [Prompt Injection 2.0: Hybrid AI Threats](https://arxiv.org/abs/2507.13169) Jeremy McHugh, Kristina Šekrst, Jon Cefalu -+ [SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks](https://arxiv.org//abs/2507.13170) ++ [SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks](https://arxiv.org/abs/2507.13170) Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik -+ [Automating Steering for Safe Multimodal Large Language Models](https://arxiv.org//abs/2507.13255) ++ [Automating Steering for Safe Multimodal Large Language Models](https://arxiv.org/abs/2507.13255) Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng -+ [DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation](https://arxiv.org//abs/2507.13292) ++ [DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation](https://arxiv.org/abs/2507.13292) Ekta Balkrishna Gavas, Chinmay Hegde, Nasir Memon, Sudipta Banerjee -+ [Taming Diffusion Transformer for Real-Time Mobile Video Generation](https://arxiv.org//abs/2507.13343) ++ [Taming Diffusion Transformer for Real-Time Mobile Video Generation](https://arxiv.org/abs/2507.13343) Yushu Wu, Yanyu Li, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ke Ma, Arpit Sahni, Ju Hu, Aliaksandr Siarohin, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov -+ [Training Transformers with Enforced Lipschitz Constants](https://arxiv.org//abs/2507.13338) ++ [Training Transformers with Enforced Lipschitz Constants](https://arxiv.org/abs/2507.13338) Laker Newhouse, R. 
Preston Hess, Franz Cesista, Andrii Zahorodnii, Jeremy Bernstein, Phillip Isola -+ [Architectural Backdoors in Deep Learning: A Survey of Vulnerabilities, Detection, and Defense](https://arxiv.org//abs/2507.12919) ++ [Architectural Backdoors in Deep Learning: A Survey of Vulnerabilities, Detection, and Defense](https://arxiv.org/abs/2507.12919) Victoria Childress, Josh Collyer, Jodie Knapp -+ [MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems](https://arxiv.org//abs/2507.13038) ++ [MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems](https://arxiv.org/abs/2507.13038) Yu Cui, Hongyang Du -+ [IConMark: Robust Interpretable Concept-Based Watermark For AI Images](https://arxiv.org//abs/2507.13407) ++ [IConMark: Robust Interpretable Concept-Based Watermark For AI Images](https://arxiv.org/abs/2507.13407) Vinu Sankar Sadasivan, Mehrdad Saberi, Soheil Feizi -+ [Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers](https://arxiv.org//abs/2507.13474) ++ [Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers](https://arxiv.org/abs/2507.13474) Liang Lin, Zhihao Xu, Xuehai Tang, Shi Liu, Biyu Zhou, Fuqing Zhu, Jizhong Han, Songlin Hu -+ [Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?](https://arxiv.org//abs/2507.13490) ++ [Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?](https://arxiv.org/abs/2507.13490) Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea -+ [Fake or Real: The Impostor Hunt in Texts for Space Operations](https://arxiv.org//abs/2507.13508) ++ [Fake or Real: The Impostor Hunt in Texts for Space Operations](https://arxiv.org/abs/2507.13508) Agata Kaczmarek (1), Dawid Płudowski (1), Piotr Wilczyński (1), Przemysław Biecek (1), Krzysztof Kotowski (2), Ramez Shendy (2), Jakub Nalepa (2 and 3), Artur Janicki (1), Evridiki Ntagiou (4) ((1) Warsaw University of Technology, (2) KP Labs, (3) Silesian University of Technology, (4) European Space Agency, European Space Operations Center) # 2025-07-16 -+ [Spatial Frequency Modulation for Semantic Segmentation](https://arxiv.org//abs/2507.11893) ++ [Spatial Frequency Modulation for Semantic Segmentation](https://arxiv.org/abs/2507.11893) Linwei Chen, Ying Fu, Lin Gu, Dezhi Zheng, Jifeng Dai -+ [Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers](https://arxiv.org//abs/2507.11991) ++ [Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers](https://arxiv.org/abs/2507.11991) Juanran Wang, Marc R. Schlichting, Mykel J. 
Kochenderfer -+ [InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing](https://arxiv.org//abs/2507.12060) ++ [InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing](https://arxiv.org/abs/2507.12060) Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng -+ [Non-Adaptive Adversarial Face Generation](https://arxiv.org//abs/2507.12107) ++ [Non-Adaptive Adversarial Face Generation](https://arxiv.org/abs/2507.12107) Sunpill Kim, Seunghun Paik, Chanwoo Hwang, Minsu Kim, Jae Hong Seo -+ [Thought Purity: Defense Paradigm For Chain-of-Thought Attack](https://arxiv.org//abs/2507.12314) ++ [Thought Purity: Defense Paradigm For Chain-of-Thought Attack](https://arxiv.org/abs/2507.12314) Zihao Xue, Zhen Bi, Long Ma, Zhenlin Hu, Yan Wang, Zhenfang Liu, Qing Sheng, Jie Xiao, Jungang Lou -+ [LLMs Encode Harmfulness and Refusal Separately](https://arxiv.org//abs/2507.11878) ++ [LLMs Encode Harmfulness and Refusal Separately](https://arxiv.org/abs/2507.11878) Jiachen Zhao, Jing Huang, Zhengxuan Wu, David Bau, Weiyan Shi -+ [Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators](https://arxiv.org//abs/2507.12143) ++ [Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators](https://arxiv.org/abs/2507.12143) Pavel Šindelář, Ondřej Bojar -+ [Nonlinear Concept Erasure: a Density Matching Approach](https://arxiv.org//abs/2507.12341) ++ [Nonlinear Concept Erasure: a Density Matching Approach](https://arxiv.org/abs/2507.12341) Antoine Saillenfest, Pirmin Lemberger -+ [Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation](https://arxiv.org//abs/2507.11968) ++ [Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation](https://arxiv.org/abs/2507.11968) Sahid Hossain Mustakim, S M Jishanul Islam, Ummay Maria Muna, Montasir Chowdhury, Mohammed Jawwadul Islam, Sadia Ahmmed, Tashfia Sikder, Syed Tasdid Azam Dhrubo, Swakkhar Shatabda -+ [FADE: Adversarial Concept Erasure in Flow Models](https://arxiv.org//abs/2507.12283) ++ [FADE: Adversarial Concept Erasure in Flow Models](https://arxiv.org/abs/2507.12283) Zixuan Fu, Yan Ren, Finn Carter, Chenyue Wang, Ze Niu, Dacheng Yu, Emily Davis, Bo Zhang -+ [Self-Adaptive and Robust Federated Spectrum Sensing without Benign Majority for Cellular Networks](https://arxiv.org//abs/2507.12127) ++ [Self-Adaptive and Robust Federated Spectrum Sensing without Benign Majority for Cellular Networks](https://arxiv.org/abs/2507.12127) Ngoc Duy Pham, Thusitha Dayaratne, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph -+ [Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries](https://arxiv.org//abs/2507.12384) ++ [Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries](https://arxiv.org/abs/2507.12384) Bo Wen, Guoyun Gao, Zhicheng Xu, Ruibin Mao, Xiaojuan Qi, X. Sharon Hu, Xunzhao Yin, Can Li -+ [A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning](https://arxiv.org//abs/2507.12439) ++ [A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning](https://arxiv.org/abs/2507.12439) Daniel Commey, Rebecca A. Sarpong, Griffith S. Klogo, Winful Bagyl-Bac, Garth V. 
Crosby -+ [A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy](https://arxiv.org//abs/2507.12098) ++ [A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy](https://arxiv.org/abs/2507.12098) Xiang Li, Yifan Lin, Yuanzhe Zhang -+ [Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks](https://arxiv.org//abs/2507.12185) ++ [Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks](https://arxiv.org/abs/2507.12185) Rina Mishra, Gaurav Varshney -+ [Benchmarking Deception Probes via Black-to-White Performance Boosts](https://arxiv.org//abs/2507.12691) ++ [Benchmarking Deception Probes via Black-to-White Performance Boosts](https://arxiv.org/abs/2507.12691) Avi Parrack, Carlo Leonardo Attubato, Stefan Heimersheim -+ [Safeguarding Federated Learning-based Road Condition Classification](https://arxiv.org//abs/2507.12568) ++ [Safeguarding Federated Learning-based Road Condition Classification](https://arxiv.org/abs/2507.12568) Sheng Liu, Panos Papadimitratos -+ [Minimalist Concept Erasure in Generative Models](https://arxiv.org//abs/2507.13386) ++ [Minimalist Concept Erasure in Generative Models](https://arxiv.org/abs/2507.13386) Yang Zhang, Er Jin, Yanfei Dong, Yixuan Wu, Philip Torr, Ashkan Khakzar, Johannes Stegmaier, Kenji Kawaguchi # 2025-07-15 -+ [How to Protect Models against Adversarial Unlearning?](https://arxiv.org//abs/2507.10886) ++ [How to Protect Models against Adversarial Unlearning?](https://arxiv.org/abs/2507.10886) Patryk Jasiorski, Marek Klonowski, Michał Woźniak -+ [Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data](https://arxiv.org//abs/2507.10998) ++ [Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data](https://arxiv.org/abs/2507.10998) Zhipeng He, Alexander Stevens, Chun Ouyang, Johannes De Smedt, Alistair Barros, Catarina Moreira -+ [The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs](https://arxiv.org//abs/2507.11097) ++ [The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs](https://arxiv.org/abs/2507.11097) Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang -+ [Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs](https://arxiv.org//abs/2507.11112) ++ [Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs](https://arxiv.org/abs/2507.11112) Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier -+ [What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests](https://arxiv.org//abs/2507.11128) ++ [What Should LLMs Forget? 
Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests](https://arxiv.org/abs/2507.11128) Dimitri Staufer -+ [Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction](https://arxiv.org//abs/2507.11173) ++ [Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction](https://arxiv.org/abs/2507.11173) Deepak Kumar Panda, Weisi Guo -+ [Striking the Perfect Balance: Preserving Privacy While Boosting Utility in Collaborative Medical Prediction Platforms](https://arxiv.org//abs/2507.11187) ++ [Striking the Perfect Balance: Preserving Privacy While Boosting Utility in Collaborative Medical Prediction Platforms](https://arxiv.org/abs/2507.11187) Shao-Bo Lin, Xiaotong Liu, Yao Wang -+ [Robust-Multi-Task Gradient Boosting](https://arxiv.org//abs/2507.11411) ++ [Robust-Multi-Task Gradient Boosting](https://arxiv.org/abs/2507.11411) Seyedsaman Emami, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato -+ [Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking](https://arxiv.org//abs/2507.11137) ++ [Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking](https://arxiv.org/abs/2507.11137) Yuan Yao, Jin Song, Jian Jin -+ [A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent](https://arxiv.org//abs/2507.11366) ++ [A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent](https://arxiv.org/abs/2507.11366) Taemin Kim, James P. Bailey -+ [Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility](https://arxiv.org//abs/2507.11630) ++ [Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility](https://arxiv.org/abs/2507.11630) Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Julius Broomfield, Adam Gleave, Kellin Pelrine -+ [Subgraph Generation for Generalizing on Out-of-Distribution Links](https://arxiv.org//abs/2507.11710) ++ [Subgraph Generation for Generalizing on Out-of-Distribution Links](https://arxiv.org/abs/2507.11710) Jay Revolinsky, Harry Shomer, Jiliang Tang -+ [Challenges in GenAI and Authentication: a scoping review](https://arxiv.org//abs/2507.11775) ++ [Challenges in GenAI and Authentication: a scoping review](https://arxiv.org/abs/2507.11775) Wesley dos Reis Bezerra, Lais Machado Bezerra, Carlos Becker Westphall -+ [ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs](https://arxiv.org//abs/2507.11649) ++ [ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs](https://arxiv.org/abs/2507.11649) Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby -+ [Evasion Under Blockchain Sanctions](https://arxiv.org//abs/2507.11721) ++ [Evasion Under Blockchain Sanctions](https://arxiv.org/abs/2507.11721) Endong Liu, Mark Ryan, Liyi Zhou, Pascal Berrang -+ [Differentially Private Conformal Prediction via Quantile Binary Search](https://arxiv.org//abs/2507.12497) ++ [Differentially Private Conformal Prediction via Quantile Binary Search](https://arxiv.org/abs/2507.12497) Ogonnaya M. 
Romanus, Roberto Molinari -+ [Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design](https://arxiv.org//abs/2507.14207) ++ [Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design](https://arxiv.org/abs/2507.14207) Richard M. Charles, James H. Curry, Richard B. Charles -+ [Secure Goal-Oriented Communication: Defending against Eavesdropping Timing Attacks](https://arxiv.org//abs/2507.14212) ++ [Secure Goal-Oriented Communication: Defending against Eavesdropping Timing Attacks](https://arxiv.org/abs/2507.14212) Federico Mason, Federico Chiariotti, Pietro Talli, Andrea Zanella # 2025-07-14 -+ [BlueGlass: A Framework for Composite AI Safety](https://arxiv.org//abs/2507.10106) ++ [BlueGlass: A Framework for Composite AI Safety](https://arxiv.org/abs/2507.10106) Harshal Nandigramwar, Syed Qutub, Kay-Ulrich Scholl -+ [Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix](https://arxiv.org//abs/2507.09990) ++ [Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix](https://arxiv.org/abs/2507.09990) Ming Wen, Jiaqi Zhu, Yuedong Xu, Yipeng Zhou, Dingding Han -+ [Learning Private Representations through Entropy-based Adversarial Training](https://arxiv.org//abs/2507.10194) ++ [Learning Private Representations through Entropy-based Adversarial Training](https://arxiv.org/abs/2507.10194) Tassilo Klein, Moin Nabi -+ [Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems](https://arxiv.org//abs/2507.10457) ++ [Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems](https://arxiv.org/abs/2507.10457) Hammad Atta, Ken Huang, Manish Bhatt, Kamal Ahmed, Muhammad Aziz Ul Haq, Yasir Mehmood -+ [Can You Detect the Difference?](https://arxiv.org//abs/2507.10475) ++ [Can You Detect the Difference?](https://arxiv.org/abs/2507.10475) İsmail Tarım, Aytuğ Onan -+ [Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach](https://arxiv.org//abs/2507.10330) ++ [Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach](https://arxiv.org/abs/2507.10330) Mohammed Bouri, Adnane Saoud -+ [Counterfactual Visual Explanation via Causally-Guided Adversarial Steering](https://arxiv.org//abs/2507.09881) ++ [Counterfactual Visual Explanation via Causally-Guided Adversarial Steering](https://arxiv.org/abs/2507.09881) Yiran Qiao, Disheng Liu, Yiren Lu, Yu Yin, Mengnan Du, Jing Ma -+ [3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving](https://arxiv.org//abs/2507.09993) ++ [3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving](https://arxiv.org/abs/2507.09993) Yixun Zhang, Lizhi Wang, Junjun Zhao, Wending Zhao, Feng Zhou, Yonghao Dang, Jianqin Yin -+ [Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?](https://arxiv.org//abs/2507.10236) ++ [Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?](https://arxiv.org/abs/2507.10236) Despina Konstantinidou, Dimitrios Karageorgiou, Christos Koutlis, Olga Papadopoulou, Emmanouil Schinas, Symeon Papadopoulos -+ [Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks](https://arxiv.org//abs/2507.10239) ++ [Transferring Styles for Reduced 
Texture Bias and Improved Robustness in Semantic Segmentation Networks](https://arxiv.org/abs/2507.10239) Ben Hamscher, Edgar Heinert, Annika Mütze, Kira Maag, Matthias Rottmann -+ [Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures](https://arxiv.org//abs/2507.10265) ++ [Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures](https://arxiv.org/abs/2507.10265) Xinlong Ding, Hongwei Yu, Jiawei Li, Feifan Li, Yu Shang, Bochao Zou, Huimin Ma, Jiansheng Chen -+ [Test-Time Canonicalization by Foundation Models for Robust Perception](https://arxiv.org//abs/2507.10375) ++ [Test-Time Canonicalization by Foundation Models for Robust Perception](https://arxiv.org/abs/2507.10375) Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash -+ [On the Efficiency of Training Robust Decision Trees](https://arxiv.org//abs/2507.10048) ++ [On the Efficiency of Training Robust Decision Trees](https://arxiv.org/abs/2507.10048) Benedict Gerlach, Marie Anastacio, Holger H. Hoos -+ [MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data](https://arxiv.org//abs/2507.10334) ++ [MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data](https://arxiv.org/abs/2507.10334) Mahmoud Bekhit, Ahmad Salah, Ahmed Salim Alrawahi, Tarek Attia, Ahmed Ali, Esraa Eldesokey, Ahmed Fathalla -+ [Split Happens: Combating Advanced Threats with Split Learning and Function Secret Sharing](https://arxiv.org//abs/2507.10494) ++ [Split Happens: Combating Advanced Threats with Split Learning and Function Secret Sharing](https://arxiv.org/abs/2507.10494) Tanveer Khan, Mindaugas Budzys, Antonis Michalas -+ [The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents](https://arxiv.org//abs/2507.10016) ++ [The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents](https://arxiv.org/abs/2507.10016) Lixu Wang, Kaixiang Yao, Xinfeng Li, Dong Yang, Haoyang Li, Xiaofeng Wang, Wei Dong -+ [HASSLE: A Self-Supervised Learning Enhanced Hijacking Attack on Vertical Federated Learning](https://arxiv.org//abs/2507.10162) ++ [HASSLE: A Self-Supervised Learning Enhanced Hijacking Attack on Vertical Federated Learning](https://arxiv.org/abs/2507.10162) Weiyang He, Chip-Hong Chang -+ [BURN: Backdoor Unlearning via Adversarial Boundary Analysis](https://arxiv.org//abs/2507.10491) ++ [BURN: Backdoor Unlearning via Adversarial Boundary Analysis](https://arxiv.org/abs/2507.10491) Yanghao Su, Jie Zhang, Yiming Li, Tianwei Zhang, Qing Guo, Weiming Zhang, Nenghai Yu, Nils Lukas, Wenbo Zhou -+ [AdvGrasp: Adversarial Attacks on Robotic Grasping from a Physical Perspective](https://arxiv.org//abs/2507.09857) ++ [AdvGrasp: Adversarial Attacks on Robotic Grasping from a Physical Perspective](https://arxiv.org/abs/2507.09857) Xiaofei Wang, Mingliang Han, Tianyu Hao, Cegang Li, Yunbo Zhao, Keke Tang -+ [Game Theory Meets LLM and Agentic AI: Reimagining Cybersecurity for the Age of Intelligent Threats](https://arxiv.org//abs/2507.10621) ++ [Game Theory Meets LLM and Agentic AI: Reimagining Cybersecurity for the Age of Intelligent Threats](https://arxiv.org/abs/2507.10621) Quanyan Zhu -+ [HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong](https://arxiv.org//abs/2507.11502) ++ [HKGAI-V1: Towards Regional Sovereign Large 
Language Model for Hong Kong](https://arxiv.org/abs/2507.11502) Sirui Han, Junqi Zhu, Ruiyuan Zhang, Yike Guo -+ [Distributionally Robust Optimization with Adversarial Data Contamination](https://arxiv.org//abs/2507.10718) ++ [Distributionally Robust Optimization with Adversarial Data Contamination](https://arxiv.org/abs/2507.10718) Shuyao Li, Ilias Diakonikolas, Jelena Diakonikolas -+ [Formal Verification of Variational Quantum Circuits](https://arxiv.org//abs/2507.10635) ++ [Formal Verification of Variational Quantum Circuits](https://arxiv.org/abs/2507.10635) Nicola Assolini, Luca Marzari, Isabella Mastroeni, Alessandra di Pierro -+ [3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models](https://arxiv.org//abs/2507.10733) ++ [3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models](https://arxiv.org/abs/2507.10733) Jianyao Yin, Luca Arnaboldi, Honglong Chen, Pascal Berrang -+ [REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack](https://arxiv.org//abs/2507.10836) ++ [REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack](https://arxiv.org/abs/2507.10836) Zhonghao Zhan, Huichi Zhou, Hamed Haddadi -+ [ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning](https://arxiv.org//abs/2507.11500) ++ [ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning](https://arxiv.org/abs/2507.11500) Zhengyue Zhao, Yingzi Ma, Somesh Jha, Marco Pavone, Chaowei Xiao -+ [Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap](https://arxiv.org//abs/2507.10746) ++ [Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap](https://arxiv.org/abs/2507.10746) Zhanyu Wang, Arin Chang, Jordan Awan -+ [PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training](https://arxiv.org//abs/2507.14202) ++ [PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training](https://arxiv.org/abs/2507.14202) Pengfei Du # 2025-07-13 -+ [DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences](https://arxiv.org//abs/2507.09602) ++ [DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences](https://arxiv.org/abs/2507.09602) Bocheng Ju, Junchao Fan, Jiaqi Liu, Xiaolin Chang -+ [Conformal Prediction for Privacy-Preserving Machine Learning](https://arxiv.org//abs/2507.09678) ++ [Conformal Prediction for Privacy-Preserving Machine Learning](https://arxiv.org/abs/2507.09678) Alexander David Balinsky, Dominik Krzeminski, Alexander Balinsky -+ [Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces](https://arxiv.org//abs/2507.09709) ++ [Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces](https://arxiv.org/abs/2507.09709) Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi -+ [Efficient Private Inference Based on Helper-Assisted Malicious Security Dishonest Majority MPC](https://arxiv.org//abs/2507.09607) ++ [Efficient Private Inference Based on Helper-Assisted Malicious Security Dishonest Majority MPC](https://arxiv.org/abs/2507.09607) Kaiwen Wang, Yuehan Dong, Junchao Fan, Xiaolin Chang -+ [LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents](https://arxiv.org//abs/2507.10610) ++ [LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI 
Agents](https://arxiv.org/abs/2507.10610) Zihe Yan, Zhuosheng Zhang # 2025-07-12 -+ [Hide-and-Shill: A Reinforcement Learning Framework for Market Manipulation Detection in Symphony-a Decentralized Multi-Agent System](https://arxiv.org//abs/2507.09179) ++ [Hide-and-Shill: A Reinforcement Learning Framework for Market Manipulation Detection in Symphony-a Decentralized Multi-Agent System](https://arxiv.org/abs/2507.09179) Ronghua Shi, Yiou Liu, Xinyu Ying, Yang Tan, Yuchun Feng, Lynn Ai, Bill Shi, Xuhui Wang, Zhuang Liu -+ [LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing](https://arxiv.org//abs/2507.09407) ++ [LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing](https://arxiv.org/abs/2507.09407) Quanyan Zhu -+ [Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers](https://arxiv.org//abs/2507.09406) ++ [Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers](https://arxiv.org/abs/2507.09406) Santhosh Kumar Ravindran -+ [ClaritySpeech: Dementia Obfuscation in Speech](https://arxiv.org//abs/2507.09282) ++ [ClaritySpeech: Dementia Obfuscation in Speech](https://arxiv.org/abs/2507.09282) Dominika Woszczyk, Ranya Aloufi, Soteris Demetriou -+ [On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving](https://arxiv.org//abs/2507.09095) ++ [On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving](https://arxiv.org/abs/2507.09095) Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou -+ [Digital Twin-Assisted Explainable AI for Robust Beam Prediction in mmWave MIMO Systems](https://arxiv.org//abs/2507.14180) ++ [Digital Twin-Assisted Explainable AI for Robust Beam Prediction in mmWave MIMO Systems](https://arxiv.org/abs/2507.14180) Nasir Khan, Asmaa Abdallah, Abdulkadir Celik, Ahmed M. Eltawil, Sinem Coleri # 2025-07-11 -+ [Agent Safety Alignment via Reinforcement Learning](https://arxiv.org//abs/2507.08270) ++ [Agent Safety Alignment via Reinforcement Learning](https://arxiv.org/abs/2507.08270) Zeyang Sha, Hanling Tian, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weiqiang Wang -+ [Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training](https://arxiv.org//abs/2507.08284) ++ [Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training](https://arxiv.org/abs/2507.08284) Aleksei Ilin, Gor Matevosyan, Xueying Ma, Vladimir Eremin, Suhaa Dada, Muqun Li, Riyaaz Shaik, Haluk Noyan Tokgozoglu -+ [Invariant-based Robust Weights Watermark for Large Language Models](https://arxiv.org//abs/2507.08288) ++ [Invariant-based Robust Weights Watermark for Large Language Models](https://arxiv.org/abs/2507.08288) Qingxiao Guo, Xinjie Zhu, Yilong Ma, Hui Jin, Yunhao Wang, Weifeng Zhang, Xiaobing Guo -+ [One Token to Fool LLM-as-a-Judge](https://arxiv.org//abs/2507.08794) ++ [One Token to Fool LLM-as-a-Judge](https://arxiv.org/abs/2507.08794) Yulai Zhao, Haolin Liu, Dian Yu, S.Y. 
Kung, Haitao Mi, Dong Yu -+ [Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation](https://arxiv.org//abs/2507.08343) ++ [Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation](https://arxiv.org/abs/2507.08343) Junxue Yang, Xin Liao, Weixuan Tang, Jianhua Yang, Zheng Qin -+ [SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations](https://arxiv.org//abs/2507.08707) ++ [SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations](https://arxiv.org/abs/2507.08707) Peter Crowley, Zachary Serlin, Tyler Paine, Makai Mann, Michael Benjamin, Calin Belta -+ [Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks](https://arxiv.org//abs/2507.08261) ++ [Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks](https://arxiv.org/abs/2507.08261) Sofia Ivolgina, P. Thomas Fletcher, Baba C. Vemuri -+ [Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security](https://arxiv.org//abs/2507.08623) ++ [Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security](https://arxiv.org/abs/2507.08623) Pascal Debus, Maximilian Wendlinger, Kilian Tscharke, Daniel Herr, Cedric Brügmann, Daniel Ohl de Mello, Juris Ulmanis, Alexander Erhard, Arthur Schmidt, Fabian Petsch -+ [Detecting Deepfake Talking Heads from Facial Biometric Anomalies](https://arxiv.org//abs/2507.08917) ++ [Detecting Deepfake Talking Heads from Facial Biometric Anomalies](https://arxiv.org/abs/2507.08917) Justin D. Norman, Hany Farid -+ [VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models](https://arxiv.org//abs/2507.08982) ++ [VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models](https://arxiv.org/abs/2507.08982) Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, Olivier Déforges -+ [Exploiting Leaderboards for Large-Scale Distribution of Malicious Models](https://arxiv.org//abs/2507.08983) ++ [Exploiting Leaderboards for Large-Scale Distribution of Malicious Models](https://arxiv.org/abs/2507.08983) Anshuman Suri, Harsh Chaudhari, Yuefeng Peng, Ali Naseh, Amir Houmansadr, Alina Oprea -+ [When and Where do Data Poisons Attack Textual Inversion?](https://arxiv.org//abs/2507.10578) ++ [When and Where do Data Poisons Attack Textual Inversion?](https://arxiv.org/abs/2507.10578) Jeremy Styborski, Mingzhi Lyu, Jiayou Lu, Nupur Kapur, Adams Kong -+ [$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection](https://arxiv.org//abs/2507.10583) ++ [$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection](https://arxiv.org/abs/2507.10583) Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov # 2025-07-10 -+ [May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks](https://arxiv.org//abs/2507.07417) ++ [May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks](https://arxiv.org/abs/2507.07417) Nishit V. 
Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes -+ [OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting](https://arxiv.org//abs/2507.07754) ++ [OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting](https://arxiv.org/abs/2507.07754) Jaeheun Jung, Bosung Jung, Suhyun Bae, Donghun Lee -+ [Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking](https://arxiv.org//abs/2507.07871) ++ [Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking](https://arxiv.org/abs/2507.07871) Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas -+ [Low Resource Reconstruction Attacks Through Benign Prompts](https://arxiv.org//abs/2507.07947) ++ [Low Resource Reconstruction Attacks Through Benign Prompts](https://arxiv.org/abs/2507.07947) Sol Yarkoni, Roi Livni -+ [Rethinking the Privacy of Text Embeddings: A Reproducibility Study of "Text Embeddings Reveal (Almost) As Much As Text"](https://arxiv.org//abs/2507.07700) ++ [Rethinking the Privacy of Text Embeddings: A Reproducibility Study of "Text Embeddings Reveal (Almost) As Much As Text"](https://arxiv.org/abs/2507.07700) Dominykas Seputis, Yongkang Li, Karsten Langerak, Serghei Mihailov -+ [GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing](https://arxiv.org//abs/2507.07735) ++ [GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing](https://arxiv.org/abs/2507.07735) Peiyan Zhang, Haibo Jin, Liying Kang, Haohan Wang -+ [Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking](https://arxiv.org//abs/2507.07483) ++ [Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking](https://arxiv.org/abs/2507.07483) Qiangqiang Wu, Yi Yu, Chenqi Kong, Ziquan Liu, Jia Wan, Haoliang Li, Alex C. Kot, Antoni B. Chan -+ [One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models](https://arxiv.org//abs/2507.07709) ++ [One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models](https://arxiv.org/abs/2507.07709) Jiale Zhao, Xinyang Jiang, Junyao Gao, Yuhao Xue, Cairong Zhao -+ [SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples](https://arxiv.org//abs/2507.07776) ++ [SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples](https://arxiv.org/abs/2507.07776) Dren Fazlija, Monty-Maximilian Zühlke, Johanna Schrader, Arkadij Orlov, Clara Stein, Iyiola E. 
Olatunji, Daniel Kudenko -+ [TRIX- Trading Adversarial Fairness via Mixed Adversarial Training](https://arxiv.org//abs/2507.07768) ++ [TRIX- Trading Adversarial Fairness via Mixed Adversarial Training](https://arxiv.org/abs/2507.07768) Tejaswini Medi, Steffen Jung, Margret Keuper -+ [Rainbow Artifacts from Electromagnetic Signal Injection Attacks on Image Sensors](https://arxiv.org//abs/2507.07773) ++ [Rainbow Artifacts from Electromagnetic Signal Injection Attacks on Image Sensors](https://arxiv.org/abs/2507.07773) Youqian Zhang, Xinyu Ji, Zhihao Wang, Qinhong Jiang -+ [Defending Against Prompt Injection With a Few DefensiveTokens](https://arxiv.org//abs/2507.07974) ++ [Defending Against Prompt Injection With a Few DefensiveTokens](https://arxiv.org/abs/2507.07974) Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner -+ [A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking](https://arxiv.org//abs/2507.08207) ++ [A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking](https://arxiv.org/abs/2507.08207) Zhengye Han, Quanyan Zhu -+ [An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis](https://arxiv.org//abs/2507.08050) ++ [An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis](https://arxiv.org/abs/2507.08050) Ming Wang, Zhaoyang Duan, Dong Xue, Fangzhou Liu, Zhongheng Zhang -+ [Quantum Properties Trojans (QuPTs) for Attacking Quantum Neural Networks](https://arxiv.org//abs/2507.08202) ++ [Quantum Properties Trojans (QuPTs) for Attacking Quantum Neural Networks](https://arxiv.org/abs/2507.08202) Sounak Bhowmik, Travis S. Humble, Himanshu Thapliyal -+ [Simple Mechanistic Explanations for Out-Of-Context Reasoning](https://arxiv.org//abs/2507.08218) ++ [Simple Mechanistic Explanations for Out-Of-Context Reasoning](https://arxiv.org/abs/2507.08218) Atticus Wang, Joshua Engels, Oliver Clive-Griffin -+ [Adaptive Diffusion Denoised Smoothing : Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion](https://arxiv.org//abs/2507.08163) ++ [Adaptive Diffusion Denoised Smoothing : Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion](https://arxiv.org/abs/2507.08163) Frederick Shpilevskiy, Saiyue Lyu, Krishnamurthy Dj Dvijotham, Mathias Lécuyer, Pierre-André Noël -+ [EvA: Evolutionary Attacks on Graphs](https://arxiv.org//abs/2507.08212) ++ [EvA: Evolutionary Attacks on Graphs](https://arxiv.org/abs/2507.08212) Mohammad Sadegh Akhondzadeh, Soroush H. 
Zargarbashi, Jimin Cao, Aleksandar Bojchevski -+ [Beyond the Worst Case: Extending Differential Privacy Guarantees to Realistic Adversaries](https://arxiv.org//abs/2507.08158) ++ [Beyond the Worst Case: Extending Differential Privacy Guarantees to Realistic Adversaries](https://arxiv.org/abs/2507.08158) Marika Swanberg, Meenatchi Sundaram Muthu Selva Annamalai, Jamie Hayes, Borja Balle, Adam Smith -+ [Towards Privacy-Preserving and Personalized Smart Homes via Tailored Small Language Models](https://arxiv.org//abs/2507.08878) ++ [Towards Privacy-Preserving and Personalized Smart Homes via Tailored Small Language Models](https://arxiv.org/abs/2507.08878) Xinyu Huang, Leming Shen, Zijing Ma, Yuanqing Zheng # 2025-07-09 -+ [Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning](https://arxiv.org//abs/2507.07259) ++ [Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning](https://arxiv.org/abs/2507.07259) Giulio Rossolini, Fabio Brau, Alessandro Biondi, Battista Biggio, Giorgio Buttazzo -+ [The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover](https://arxiv.org//abs/2507.06850) ++ [The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover](https://arxiv.org/abs/2507.06850) Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro -+ [Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning](https://arxiv.org//abs/2507.07139) ++ [Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning](https://arxiv.org/abs/2507.07139) Renyang Liu, Guanlin Li, Tianwei Zhang, See-Kiong Ng -+ [Concept Unlearning by Modeling Key Steps of Diffusion Process](https://arxiv.org//abs/2507.06526) ++ [Concept Unlearning by Modeling Key Steps of Diffusion Process](https://arxiv.org/abs/2507.06526) Chaoshuo Zhang, Chenhao Lin, Zhengyu Zhao, Le Yang, Qian Wang, Chao Shen -+ [An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs](https://arxiv.org//abs/2507.07146) ++ [An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs](https://arxiv.org/abs/2507.07146) Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao -+ [Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning](https://arxiv.org//abs/2507.07323) ++ [Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning](https://arxiv.org/abs/2507.07323) Dongyu Wei, Xiaoren Xu, Yuchen Liu, H. Vincent Poor, Mingzhe Chen -+ [On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment](https://arxiv.org//abs/2507.07341) ++ [On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment](https://arxiv.org/abs/2507.07341) Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, Guy N. 
Rothblum -+ [Privacy-Utility-Fairness: A Balanced Approach to Vehicular-Traffic Management System](https://arxiv.org//abs/2507.08864) ++ [Privacy-Utility-Fairness: A Balanced Approach to Vehicular-Traffic Management System](https://arxiv.org/abs/2507.08864) Poushali Sengupta, Sabita Maharjan, Frank Eliassen, Yan Zhang -+ [RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation](https://arxiv.org//abs/2507.08862) ++ [RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation](https://arxiv.org/abs/2507.08862) Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Haiping Zhu, Nan Hu, Jun Liu, Qika Lin -+ [VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation](https://arxiv.org//abs/2507.06899) ++ [VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation](https://arxiv.org/abs/2507.06899) Ziang Ye, Yang Zhang, Wentao Shi, Xiaoyu You, Fuli Feng, Tat-Seng Chua # 2025-07-08 -+ [DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective](https://arxiv.org//abs/2507.05622) ++ [DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective](https://arxiv.org/abs/2507.05622) Shuo Shao, Yiming Li, Mengren Zheng, Zhiyang Hu, Yukun Chen, Boheng Li, Yu He, Junfeng Guo, Tianwei Zhang, Dacheng Tao, Zhan Qin -+ [How Not to Detect Prompt Injections with an LLM](https://arxiv.org//abs/2507.05630) ++ [How Not to Detect Prompt Injections with an LLM](https://arxiv.org/abs/2507.05630) Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha -+ [TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data](https://arxiv.org//abs/2507.05660) ++ [TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data](https://arxiv.org/abs/2507.05660) Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath -+ [CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations](https://arxiv.org//abs/2507.06043) ++ [CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations](https://arxiv.org/abs/2507.06043) Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian -+ [RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages](https://arxiv.org//abs/2507.05980) ++ [RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages](https://arxiv.org/abs/2507.05980) Gabriel Chua, Leanne Tan, Ziyu Ge, Roy Ka-Wei Lee -+ [The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation](https://arxiv.org//abs/2507.05578) ++ [The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation](https://arxiv.org/abs/2507.05578) Alexander Xiong, Xuandong Zhao, Aneesh Pappu, Dawn Song -+ [ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models](https://arxiv.org//abs/2507.06078) ++ [ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models](https://arxiv.org/abs/2507.06078) Chihan Huang, Hao Tang -+ [On the Inherent Privacy of Zeroth Order Projected Gradient Descent](https://arxiv.org//abs/2507.05610) ++ [On the Inherent Privacy of Zeroth Order Projected Gradient Descent](https://arxiv.org/abs/2507.05610) Devansh Gupta, Meisam 
Razaviyayn, Vatsal Sharan -+ [Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset](https://arxiv.org//abs/2507.05728) ++ [Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset](https://arxiv.org/abs/2507.05728) Ruofei Wang, Peiqi Duan, Boxin Shi, Renjie Wan -+ [Circumventing Safety Alignment in Large Language Models Through Embedding Space Toxicity Attenuation](https://arxiv.org//abs/2507.08020) ++ [Circumventing Safety Alignment in Large Language Models Through Embedding Space Toxicity Attenuation](https://arxiv.org/abs/2507.08020) Zhibo Zhang, Yuxi Li, Kailong Wang, Shuai Yuan, Ling Shi, Haoyu Wang -+ [The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models](https://arxiv.org//abs/2507.11544) ++ [The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models](https://arxiv.org/abs/2507.11544) Ann-Kathrin Dombrowski, Dillon Bowen, Adam Gleave, Chris Cundy -+ [Hedge Funds on a Swamp: Analyzing Patterns, Vulnerabilities, and Defense Measures in Blockchain Bridges](https://arxiv.org//abs/2507.06156) ++ [Hedge Funds on a Swamp: Analyzing Patterns, Vulnerabilities, and Defense Measures in Blockchain Bridges](https://arxiv.org/abs/2507.06156) Poupak Azad, Jiahua Xu, Yebo Feng, Preston Strowbridge, Cuneyt Akcora # 2025-07-07 -+ [Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message](https://arxiv.org//abs/2507.04673) ++ [Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message](https://arxiv.org/abs/2507.04673) Wei Duan, Li Qian -+ [Losing Control: Data Poisoning Attack on Guided Diffusion via ControlNet](https://arxiv.org//abs/2507.04726) ++ [Losing Control: Data Poisoning Attack on Guided Diffusion via ControlNet](https://arxiv.org/abs/2507.04726) Raz Lapid, Almog Dubin -+ [Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning](https://arxiv.org//abs/2507.04883) ++ [Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning](https://arxiv.org/abs/2507.04883) Sanyam Vyas, Alberto Caron, Chris Hicks, Pete Burnap, Vasilios Mavroudis -+ [BackFed: An Efficient & Standardized Benchmark Suite for Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2507.04903) ++ [BackFed: An Efficient & Standardized Benchmark Suite for Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2507.04903) Thinh Dao, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong -+ [ICAS: Detecting Training Data from Autoregressive Image Generative Models](https://arxiv.org//abs/2507.05068) ++ [ICAS: Detecting Training Data from Autoregressive Image Generative Models](https://arxiv.org/abs/2507.05068) Hongyao Yu, Yixiang Qiu, Yiheng Yang, Hao Fang, Tianqu Zhuang, Jiaxin Hong, Bin Chen, Hao Wu, Shu-Tao Xia -+ [The Hidden Threat in Plain Text: Attacking RAG Data Loaders](https://arxiv.org//abs/2507.05093) ++ [The Hidden Threat in Plain Text: Attacking RAG Data Loaders](https://arxiv.org/abs/2507.05093) Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi -+ [Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models](https://arxiv.org//abs/2507.05248) ++ [Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models](https://arxiv.org/abs/2507.05248) Ziqi Miao, Lijun Li, Yuan Xiong, Zhenhua Liu, Pengyu Zhu, Jing Shao -+ [Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object 
Tracking](https://arxiv.org//abs/2507.04762) ++ [Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking](https://arxiv.org/abs/2507.04762) Maria Damanaki, Ioulia Kapsali, Nikos Piperigkos, Alexandros Gkillas, Aris S. Lalos -+ [FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift](https://arxiv.org//abs/2507.04781) ++ [FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift](https://arxiv.org/abs/2507.04781) Yong Zhang, Feng Liang, Guanghu Yuan, Min Yang, Chengming Li, Xiping Hu -+ [Cascade: Token-Sharded Private LLM Inference](https://arxiv.org//abs/2507.05228) ++ [Cascade: Token-Sharded Private LLM Inference](https://arxiv.org/abs/2507.05228) Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal -+ [CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation](https://arxiv.org//abs/2507.05113) ++ [CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation](https://arxiv.org/abs/2507.05113) Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang -+ [Red Teaming AI Red Teaming](https://arxiv.org//abs/2507.05538) ++ [Red Teaming AI Red Teaming](https://arxiv.org/abs/2507.05538) Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta -+ [Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences](https://arxiv.org//abs/2507.05391) ++ [Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences](https://arxiv.org/abs/2507.05391) Guillem Ramírez, Alexandra Birch, Ivan Titov -+ [Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack](https://arxiv.org//abs/2507.05441) ++ [Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack](https://arxiv.org/abs/2507.05441) Edward Raff, Karen Kukla, Michel Benaroch, Joseph Comprix -+ [Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search](https://arxiv.org//abs/2507.05531) ++ [Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search](https://arxiv.org/abs/2507.05531) Sanaz Kazemi Abharian, Sai Manoj Pudukotai Dinakarrao @@ -6635,326 +6635,326 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang # 2025-07-06 -+ [Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence](https://arxiv.org//abs/2507.04528) ++ [Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence](https://arxiv.org/abs/2507.04528) Sonal Allana, Rozita Dara, Xiaodong Lin, Pulei Xiong -+ [Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties](https://arxiv.org//abs/2507.04227) ++ [Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties](https://arxiv.org/abs/2507.04227) Guohong Liu, Jialei Ye, Jiacheng Liu, Yuanchun Li, Wei Liu, Pengzhi Gao, Jian Luan, Yunxin Liu -+ [Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs](https://arxiv.org//abs/2507.04365) ++ [Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs](https://arxiv.org/abs/2507.04365) Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho -+ [Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models](https://arxiv.org//abs/2507.04478) ++ [Model Inversion Attacks on Llama 
3: Extracting PII from Large Language Models](https://arxiv.org/abs/2507.04478) Sathesh P.Sivashanmugam -+ [DP-Fusion: Token-Level Differentially Private Inference for Large Language Models](https://arxiv.org//abs/2507.04531) ++ [DP-Fusion: Token-Level Differentially Private Inference for Large Language Models](https://arxiv.org/abs/2507.04531) Rushil Thareja, Preslav Nakov, Praneeth Vepakomma, Nils Lukas -+ [Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking](https://arxiv.org//abs/2507.04446) ++ [Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking](https://arxiv.org/abs/2507.04446) Tim Beyer, Yan Scholten, Stephan Günnemann, Leo Schwinn -+ [Mass-Scale Analysis of In-the-Wild Conversations Reveals Complexity Bounds on LLM Jailbreaking](https://arxiv.org//abs/2507.08014) ++ [Mass-Scale Analysis of In-the-Wild Conversations Reveals Complexity Bounds on LLM Jailbreaking](https://arxiv.org/abs/2507.08014) Aldan Creo, Raul Castro Fernandez, Manuel Cebrian # 2025-07-05 -+ [Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing](https://arxiv.org//abs/2507.04105) ++ [Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing](https://arxiv.org/abs/2507.04105) Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang -+ [Evaluating Adversarial Protections for Diffusion Personalization: A Comprehensive Study](https://arxiv.org//abs/2507.03953) ++ [Evaluating Adversarial Protections for Diffusion Personalization: A Comprehensive Study](https://arxiv.org/abs/2507.03953) Kai Ye, Tianyi Chen, Zhen Wang -+ [Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems](https://arxiv.org//abs/2507.04100) ++ [Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems](https://arxiv.org/abs/2507.04100) Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang -+ [Addressing The Devastating Effects Of Single-Task Data Poisoning In Exemplar-Free Continual Learning](https://arxiv.org//abs/2507.04106) ++ [Addressing The Devastating Effects Of Single-Task Data Poisoning In Exemplar-Free Continual Learning](https://arxiv.org/abs/2507.04106) Stanisław Pawlak (1), Bartłomiej Twardowski (2 and 3), Tomasz Trzciński (1 and 2), Joost van de Weijer (3) ((1) Warsaw University of Technology, Poland, (2) IDEAS Research Institute, Poland, (3) Computer Vision Center, Universitat Autonoma de Barcelona, Spain) -+ [When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need](https://arxiv.org//abs/2507.04119) ++ [When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need](https://arxiv.org/abs/2507.04119) Ziming Hong, Runnan Chen, Zengmao Wang, Bo Han, Bo Du, Tongliang Liu # 2025-07-04 -+ [On Jailbreaking Quantized Language Models Through Fault Injection Attacks](https://arxiv.org//abs/2507.03236) ++ [On Jailbreaking Quantized Language Models Through Fault Injection Attacks](https://arxiv.org/abs/2507.03236) Noureldin Zahran, Ahmad Tahmasivand, Ihsen Alouani, Khaled Khasawneh, Mohammed E. Fouda -+ [De-Fake: Style based Anomaly Deepfake Detection](https://arxiv.org//abs/2507.03334) ++ [De-Fake: Style based Anomaly Deepfake Detection](https://arxiv.org/abs/2507.03334) Sudev Kumar Padhi, Harshit Kumar, Umesh Kashyap, Sk. 
Subidh Ali -+ [Evaluating the Evaluators: Trust in Adversarial Robustness Tests](https://arxiv.org//abs/2507.03450) ++ [Evaluating the Evaluators: Trust in Adversarial Robustness Tests](https://arxiv.org/abs/2507.03450) Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli -+ [Beyond Weaponization: NLP Security for Medium and Lower-Resourced Languages in Their Own Right](https://arxiv.org//abs/2507.03473) ++ [Beyond Weaponization: NLP Security for Medium and Lower-Resourced Languages in Their Own Right](https://arxiv.org/abs/2507.03473) Heather Lent -+ [Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense](https://arxiv.org//abs/2507.03427) ++ [Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense](https://arxiv.org/abs/2507.03427) Lina Ma, Xiaowei Fu, Fuxiang Huang, Xinbo Gao, Lei Zhang -+ [SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts](https://arxiv.org//abs/2507.03636) ++ [SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts](https://arxiv.org/abs/2507.03636) Xiaodong Wu, Xiangman Li, Qi Li, Jianbing Ni, Rongxing Lu -+ [Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization](https://arxiv.org//abs/2507.03372) ++ [Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization](https://arxiv.org/abs/2507.03372) Buqing Nie, Yangqing Fu, Jingtian Ji, Yue Gao -+ [Blackbox Dataset Inference for LLM](https://arxiv.org//abs/2507.03619) ++ [Blackbox Dataset Inference for LLM](https://arxiv.org/abs/2507.03619) Ruikai Zhou, Kang Yang, Xun Chen, Wendy Hui Wang, Guanhong Tao, Jun Xu -+ [When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting](https://arxiv.org//abs/2507.03646) ++ [When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting](https://arxiv.org/abs/2507.03646) Xiaodong Wu, Tianyi Tang, Xiangman Li, Jianbing Ni, Yong Yu # 2025-07-03 -+ [De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks](https://arxiv.org//abs/2507.02606) ++ [De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks](https://arxiv.org/abs/2507.02606) Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu -+ [Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks](https://arxiv.org//abs/2507.02735) ++ [Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks](https://arxiv.org/abs/2507.02735) Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo -+ [Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models](https://arxiv.org//abs/2507.02799) ++ [Is Reasoning All You Need? 
Probing Bias in the Age of Reasoning Language Models](https://arxiv.org/abs/2507.02799) Riccardo Cantini, Nicola Gabriele, Alessio Orsino, Domenico Talia -+ [LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users](https://arxiv.org//abs/2507.02850) ++ [LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users](https://arxiv.org/abs/2507.02850) Almog Hilel, Idan Shenfeld, Leshem Choshen, Jacob Andreas -+ [Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection](https://arxiv.org//abs/2507.02844) ++ [Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection](https://arxiv.org/abs/2507.02844) Ziqi Miao, Yi Ding, Lijun Li, Jing Shao -+ [Fluid Democracy in Federated Data Aggregation](https://arxiv.org//abs/2507.02710) ++ [Fluid Democracy in Federated Data Aggregation](https://arxiv.org/abs/2507.02710) Aditya Vema Reddy Kesari, Krishna Reddy Kesari -+ [PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage](https://arxiv.org//abs/2507.02332) ++ [PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage](https://arxiv.org/abs/2507.02332) Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou -+ [On the Mathematical Impossibility of Safe Universal Approximators](https://arxiv.org//abs/2507.03031) ++ [On the Mathematical Impossibility of Safe Universal Approximators](https://arxiv.org/abs/2507.03031) Jasper Yao -+ [Adversarial Manipulation of Reasoning Models using Internal Representations](https://arxiv.org//abs/2507.03167) ++ [Adversarial Manipulation of Reasoning Models using Internal Representations](https://arxiv.org/abs/2507.03167) Kureha Yamaguchi, Benjamin Etheridge, Andy Arditi -+ [Adopting a human developmental visual diet yields robust, shape-based AI vision](https://arxiv.org//abs/2507.03168) ++ [Adopting a human developmental visual diet yields robust, shape-based AI vision](https://arxiv.org/abs/2507.03168) Zejin Lu, Sushrut Thorat, Radoslaw M Cichy, Tim C Kietzmann -+ [Rethinking Data Protection in the (Generative) Artificial Intelligence Era](https://arxiv.org//abs/2507.03034) ++ [Rethinking Data Protection in the (Generative) Artificial Intelligence Era](https://arxiv.org/abs/2507.03034) Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren -+ [CyberRAG: An Agentic RAG cyber attack classification and reporting tool](https://arxiv.org//abs/2507.02424) ++ [CyberRAG: An Agentic RAG cyber attack classification and reporting tool](https://arxiv.org/abs/2507.02424) Francesco Blefari, Cristian Cosentino, Francesco Aurelio Pironti, Angelo Furfaro, Fabrizio Marozzo # 2025-07-02 -+ [ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks](https://arxiv.org//abs/2507.01321) ++ [ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks](https://arxiv.org/abs/2507.01321) Zhiyao Ren, Siyuan Liang, Aishan Liu, Dacheng Tao -+ [Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems](https://arxiv.org//abs/2507.01607) ++ [Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems](https://arxiv.org/abs/2507.01607) Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi, Eric Bourbao -+ [GPT, But Backwards: Exactly Inverting Language Model Outputs](https://arxiv.org//abs/2507.01693) ++ [GPT, But Backwards: Exactly Inverting Language Model Outputs](https://arxiv.org/abs/2507.01693) 
Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas C. Cordeiro -+ [Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training](https://arxiv.org//abs/2507.01752) ++ [Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training](https://arxiv.org/abs/2507.01752) Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud -+ [Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging](https://arxiv.org//abs/2507.01788) ++ [Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging](https://arxiv.org/abs/2507.01788) Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu -+ [3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation](https://arxiv.org//abs/2507.01367) ++ [3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation](https://arxiv.org/abs/2507.01367) Tianrui Lou, Xiaojun Jia, Siyuan Liang, Jiawei Liang, Ming Zhang, Yanjun Xiao, Xiaochun Cao -+ [Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention](https://arxiv.org//abs/2507.01417) ++ [Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention](https://arxiv.org/abs/2507.01417) Jiawei Gu, Ziyue Qiao, Zechao Li -+ [Boosting Adversarial Transferability Against Defenses via Multi-Scale Transformation](https://arxiv.org//abs/2507.01791) ++ [Boosting Adversarial Transferability Against Defenses via Multi-Scale Transformation](https://arxiv.org/abs/2507.01791) Zihong Guo, Chen Wan, Yayin Zheng, Hailing Kuang, Xiaohai Lu -+ [SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism](https://arxiv.org//abs/2507.01513) ++ [SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism](https://arxiv.org/abs/2507.01513) Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen -+ [Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks](https://arxiv.org//abs/2507.01694) ++ [Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks](https://arxiv.org/abs/2507.01694) Hanlin Cai, Haofan Dong, Houtianfu Wang, Kai Li, Ozgur B. 
Akan -+ [Towards Better Attribute Inference Vulnerability Measures](https://arxiv.org//abs/2507.01710) ++ [Towards Better Attribute Inference Vulnerability Measures](https://arxiv.org/abs/2507.01710) Paul Francis, David Wagner -+ [Subversion via Focal Points: Investigating Collusion in LLM Monitoring](https://arxiv.org//abs/2507.03010) ++ [Subversion via Focal Points: Investigating Collusion in LLM Monitoring](https://arxiv.org/abs/2507.03010) Olli Järviniemi -+ [Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence](https://arxiv.org//abs/2507.01504) ++ [Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence](https://arxiv.org/abs/2507.01504) Robert Aufschläger, Youssef Shoeb, Azarm Nowzad, Michael Heigl, Fabian Bally, Martin Schramm # 2025-07-01 -+ [PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning](https://arxiv.org//abs/2507.00485) ++ [PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning](https://arxiv.org/abs/2507.00485) Weiran Guo, Guanjun Liu, Ziyuan Zhou, Ling Wang -+ [BadViM: Backdoor Attack against Vision Mamba](https://arxiv.org//abs/2507.00577) ++ [BadViM: Backdoor Attack against Vision Mamba](https://arxiv.org/abs/2507.00577) Yinghao Wu, Liyan Zhang -+ [CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs](https://arxiv.org//abs/2507.00817) ++ [CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs](https://arxiv.org/abs/2507.00817) Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim -+ [Reasoning as an Adaptive Defense for Safety](https://arxiv.org//abs/2507.00971) ++ [Reasoning as an Adaptive Defense for Safety](https://arxiv.org/abs/2507.00971) Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan, Aviral Kumar -+ [Cage-Based Deformation for Transferable and Undefendable Point Cloud Attack](https://arxiv.org//abs/2507.00690) ++ [Cage-Based Deformation for Transferable and Undefendable Point Cloud Attack](https://arxiv.org/abs/2507.00690) Keke Tang, Ziyong Du, Weilong Peng, Xiaofei Wang, Peican Zhu, Ligang Liu, Zhihong Tian -+ [Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning](https://arxiv.org//abs/2507.00423) ++ [Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning](https://arxiv.org/abs/2507.00423) Wenjin Mo, Zhiyuan Li, Minghong Fang, Mingwei Fang -+ [`For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts](https://arxiv.org//abs/2507.02990) ++ [`For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts](https://arxiv.org/abs/2507.02990) Annika M Schoene, Cansu Canca -+ [PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning](https://arxiv.org//abs/2507.01216) ++ [PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning](https://arxiv.org/abs/2507.01216) Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan -+ [Geological Everything Model 3D: A Physics-informed Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding](https://arxiv.org//abs/2507.00419) ++ [Geological Everything Model 3D: A Physics-informed Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding](https://arxiv.org/abs/2507.00419) Yimin Dou, Xinming Wu, Nathan L Bangs, Harpreet Singh Sethi, Jintao Li, Hang Gao, Zhixiang Guo # 
2025-06-30 -+ [Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models](https://arxiv.org//abs/2506.23576) ++ [Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models](https://arxiv.org/abs/2506.23576) Maria Carolina Cornelia Wit, Jun Pang -+ [PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection](https://arxiv.org//abs/2506.23581) ++ [PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection](https://arxiv.org/abs/2506.23581) Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu -+ [SoK: Semantic Privacy in Large Language Models](https://arxiv.org//abs/2506.23603) ++ [SoK: Semantic Privacy in Large Language Models](https://arxiv.org/abs/2506.23603) Baihe Ma, Yanna Jiang, Xu Wang, Guangshen Yu, Qin Wang, Caijun Sun, Chen Li, Xuelei Qi, Ying He, Wei Ni, Ren Ping Liu -+ [AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data](https://arxiv.org//abs/2506.23735) ++ [AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data](https://arxiv.org/abs/2506.23735) JiaRu Wu, Mingwei Liu -+ [STACK: Adversarial Attacks on LLM Safeguard Pipelines](https://arxiv.org//abs/2506.24068) ++ [STACK: Adversarial Attacks on LLM Safeguard Pipelines](https://arxiv.org/abs/2506.24068) Ian R. McKenzie, Oskar J. Hollinsworth, Tom Tseng, Xander Davies, Stephen Casper, Aaron D. Tucker, Robert Kirk, Adam Gleave -+ [SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks](https://arxiv.org//abs/2506.24081) ++ [SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks](https://arxiv.org/abs/2506.24081) Rahul Kumar, Wenqi Wei, Ying Mao, Junaid Farooq, Ying Wang, Juntao Chen -+ [Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack](https://arxiv.org//abs/2506.23661) ++ [Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack](https://arxiv.org/abs/2506.23661) Arnisa Fazla, Lucas Krauter, David Guzman Piedrahita, Andrianos Michail -+ [AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays](https://arxiv.org//abs/2506.23467) ++ [AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays](https://arxiv.org/abs/2506.23467) Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang -+ [A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement](https://arxiv.org//abs/2506.23676) ++ [A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement](https://arxiv.org/abs/2506.23676) Gaozheng Pei, Ke Ma, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu, Qingming Huang -+ [A Scalable Approach for Safe and Robust Learning via Lipschitz-Constrained Networks](https://arxiv.org//abs/2506.23977) ++ [A Scalable Approach for Safe and Robust Learning via Lipschitz-Constrained Networks](https://arxiv.org/abs/2506.23977) Zain ul Abdeen, Vassilis Kekatos, Ming Jin -+ [Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies](https://arxiv.org//abs/2506.24048) ++ [Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies](https://arxiv.org/abs/2506.24048) Tim Roith, Leon Bungert, Philipp Wacker 
-+ [Privacy-Preserving Federated Learning Scheme with Mitigating Model Poisoning Attacks: Vulnerabilities and Countermeasures](https://arxiv.org//abs/2506.23622)
++ [Privacy-Preserving Federated Learning Scheme with Mitigating Model Poisoning Attacks: Vulnerabilities and Countermeasures](https://arxiv.org/abs/2506.23622)

Jiahui Wu, Fucai Luo, Tiecheng Sun, Haiyan Wang, Weizhe Zhang

-+ [Poisoning Attacks to Local Differential Privacy for Ranking Estimation](https://arxiv.org//abs/2506.24033)
++ [Poisoning Attacks to Local Differential Privacy for Ranking Estimation](https://arxiv.org/abs/2506.24033)

Pei Zhan, Peng Tang, Yangzhuo Li, Puwen Wei, Shanqing Guo

-+ [Impact of Fine-Tuning Methods on Memorization in Large Language Models](https://arxiv.org//abs/2507.00258)
++ [Impact of Fine-Tuning Methods on Memorization in Large Language Models](https://arxiv.org/abs/2507.00258)

Jie Hou, Chuxiong Wu, Lannan Luo, Qiang Zeng

-+ [PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction](https://arxiv.org//abs/2507.00230)
++ [PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction](https://arxiv.org/abs/2507.00230)

Peilin He, James Joshi

-+ [Concept-based Adversarial Attack: a Probabilistic Perspective](https://arxiv.org//abs/2507.02965)
++ [Concept-based Adversarial Attack: a Probabilistic Perspective](https://arxiv.org/abs/2507.02965)

Andi Zhang, Xuan Ding, Steven McDonagh, Samuel Kaski

# 2025-06-29

-+ [From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows](https://arxiv.org//abs/2506.23260)
++ [From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows](https://arxiv.org/abs/2506.23260)

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Merouane Debbah

-+ [Securing AI Systems: A Guide to Known Attacks and Impacts](https://arxiv.org//abs/2506.23296)
++ [Securing AI Systems: A Guide to Known Attacks and Impacts](https://arxiv.org/abs/2506.23296)

Naoto Kiribuchi, Kengo Zenitani, Takayuki Semitsu

-+ [TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs](https://arxiv.org//abs/2506.23423)
++ [TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs](https://arxiv.org/abs/2506.23423)

Felipe Nuti, Tim Franzmeyer, João Henriques

-+ [Trident: Detecting Face Forgeries with Adversarial Triplet Learning](https://arxiv.org//abs/2506.23189)
++ [Trident: Detecting Face Forgeries with Adversarial Triplet Learning](https://arxiv.org/abs/2506.23189)

Mustafa Hakan Kara, Aysegul Dundar, Uğur Güdükbay

-+ [Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings](https://arxiv.org//abs/2506.23145)
++ [Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings](https://arxiv.org/abs/2506.23145)

Shahad Hardan, Darya Taratynova, Abdelmajid Essofi, Karthik Nandakumar, Mohammad Yaqub

-+ [A Practical and Secure Byzantine Robust Aggregator](https://arxiv.org//abs/2506.23183)
++ [A Practical and Secure Byzantine Robust Aggregator](https://arxiv.org/abs/2506.23183)

De Zhang Lee, Aashish Kolluri, Prateek Saxena, Ee-Chien Chang

-+ [A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks](https://arxiv.org//abs/2507.02956)
++ [A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks](https://arxiv.org/abs/2507.02956)

Blake Bullwinkel, Mark Russinovich, Ahmed Salem, Santiago Zanella-Beguelin, Daniel Jones, Giorgio Severi, Eugenia Kim, Keegan Hines, Amanda Minnich, Yonatan Zunger, Ram Shankar Siva Kumar

-+ [Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes](https://arxiv.org//abs/2506.23165)
++ [Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes](https://arxiv.org/abs/2506.23165)

David Bossens, Atsushi Nitanda

@@ -6963,178 +6963,178 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

De Zhang Lee, Aashish Kolluri, Prateek Saxena, Ee-Chien Chang

# 2025-06-28

-+ [Kill Two Birds with One Stone! Trajectory enabled Unified Online Detection of Adversarial Examples and Backdoor Attacks](https://arxiv.org//abs/2506.22722)
++ [Kill Two Birds with One Stone! Trajectory enabled Unified Online Detection of Adversarial Examples and Backdoor Attacks](https://arxiv.org/abs/2506.22722)

Anmin Fu, Fanyu Meng, Huaibing Peng, Hua Ma, Zhi Zhang, Yifeng Zheng, Willy Susilo, Yansong Gao

-+ [Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation](https://arxiv.org//abs/2506.22776)
++ [Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation](https://arxiv.org/abs/2506.22776)

Sen Fang, Weiyuan Ding, Antonio Mastropaolo, Bowen Xu

-+ [PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection](https://arxiv.org//abs/2506.22783)
++ [PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection](https://arxiv.org/abs/2506.22783)

Oguzhan Baser, Ahmet Ege Tanriverdi, Sriram Vishwanath, Sandeep P. Chinchali

-+ [WavShape: Information-Theoretic Speech Representation Learning for Fair and Privacy-Aware Audio Processing](https://arxiv.org//abs/2506.22789)
++ [WavShape: Information-Theoretic Speech Representation Learning for Fair and Privacy-Aware Audio Processing](https://arxiv.org/abs/2506.22789)

Oguzhan Baser, Ahmet Ege Tanriverdi, Kaan Kale, Sandeep P. Chinchali, Sriram Vishwanath

-+ [Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate](https://arxiv.org//abs/2506.22806)
++ [Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate](https://arxiv.org/abs/2506.22806)

Byung Hyun Lee, Sungjin Lim, Seunggyu Lee, Dong Un Kang, Se Young Chun

-+ [Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images](https://arxiv.org//abs/2506.22960)
++ [Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images](https://arxiv.org/abs/2506.22960)

Shreyas Dixit, Ashhar Aziz, Shashwat Bajpai, Vasu Sharma, Aman Chadha, Vinija Jain, Amitava Das

-+ [Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models](https://arxiv.org//abs/2506.22982)
++ [Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models](https://arxiv.org/abs/2506.22982)

Atharv Mittal, Agam Pandey, Amritanshu Tiwari, Sukrit Jindal, Swadesh Swain

-+ [Fragile, Robust, and Antifragile: A Perspective from Parameter Responses in Reinforcement Learning Under Stress](https://arxiv.org//abs/2506.23036)
++ [Fragile, Robust, and Antifragile: A Perspective from Parameter Responses in Reinforcement Learning Under Stress](https://arxiv.org/abs/2506.23036)

Zain ul Abdeen, Ming Jin

-+ [FreqDGT: Frequency-Adaptive Dynamic Graph Networks with Transformer for Cross-subject EEG Emotion Recognition](https://arxiv.org//abs/2506.22807)
++ [FreqDGT: Frequency-Adaptive Dynamic Graph Networks with Transformer for Cross-subject EEG Emotion Recognition](https://arxiv.org/abs/2506.22807)

Yueyang Li, Shengyu Gong, Weiming Zeng, Nizhuan Wang, Wai Ting Siok

# 2025-06-27

-+ [On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling](https://arxiv.org//abs/2506.21874)
++ [On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling](https://arxiv.org/abs/2506.21874)

Stanley Wu, Ronik Bhaskar, Anna Yoo Jeong Ha, Shawn Shan, Haitao Zheng, Ben Y. Zhao

-+ [Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses](https://arxiv.org//abs/2506.21972)
++ [Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses](https://arxiv.org/abs/2506.21972)

Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis

-+ [ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks](https://arxiv.org//abs/2506.22423)
++ [ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks](https://arxiv.org/abs/2506.22423)

Pritam Dash, Ethan Chan, Nathan P. Lawrence, Karthik Pattabiraman

-+ [Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses](https://arxiv.org//abs/2506.21842)
++ [Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses](https://arxiv.org/abs/2506.21842)

Archisman Ghosh, Satwik Kundu, Swaroop Ghosh

-+ [VERA: Variational Inference Framework for Jailbreaking Large Language Models](https://arxiv.org//abs/2506.22666)
++ [VERA: Variational Inference Framework for Jailbreaking Large Language Models](https://arxiv.org/abs/2506.22666)

Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang

-+ [Are Fast Methods Stable in Adversarially Robust Transfer Learning?](https://arxiv.org//abs/2506.22602)
++ [Are Fast Methods Stable in Adversarially Robust Transfer Learning?](https://arxiv.org/abs/2506.22602)

Joshua C. Zhao, Saurabh Bagchi

-+ [MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs](https://arxiv.org//abs/2506.22557)
++ [MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs](https://arxiv.org/abs/2506.22557)

Boyuan Chen, Minghao Shao, Abdul Basit, Siddharth Garg, Muhammad Shafique

# 2025-06-26

-+ [Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments](https://arxiv.org//abs/2506.21127)
++ [Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments](https://arxiv.org/abs/2506.21127)

Deepak Kumar Panda, Weisi Guo

-+ [Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks](https://arxiv.org//abs/2506.21129)
++ [Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks](https://arxiv.org/abs/2506.21129)

Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo

-+ [TITAN: Query-Token based Domain Adaptive Adversarial Learning](https://arxiv.org//abs/2506.21484)
++ [TITAN: Query-Token based Domain Adaptive Adversarial Learning](https://arxiv.org/abs/2506.21484)

Tajamul Ashraf, Janibul Bashir

-+ [Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features](https://arxiv.org//abs/2506.21046)
++ [Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features](https://arxiv.org/abs/2506.21046)

Shangbo Wu, Yu-an Tan, Ruinan Ma, Wencong Ma, Dehua Zhu, Yuanzhang Li

-+ [GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models](https://arxiv.org//abs/2506.21245)
++ [GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models](https://arxiv.org/abs/2506.21245)

Qifei Cui, Xinyu Lu

-+ [Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks](https://arxiv.org//abs/2506.21142)
++ [Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks](https://arxiv.org/abs/2506.21142)

Deepak Kumar Panda, Weisi Guo

-+ [CodeGuard: A Generalized and Stealthy Backdoor Watermarking for Generative Code Models](https://arxiv.org//abs/2506.20926)
++ [CodeGuard: A Generalized and Stealthy Backdoor Watermarking for Generative Code Models](https://arxiv.org/abs/2506.20926)

Haoxuan Li, Jiale Zhang, Xiaobing Sun, Xiapu Luo

-+ [SPA: Towards More Stealth and Persistent Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2506.20931)
++ [SPA: Towards More Stealth and Persistent Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2506.20931)

Chengcheng Zhu, Ye Li, Bosen Rao, Jiale Zhang, Yunlong Mao, Sheng Zhong

-+ [PrivacyGo: Privacy-Preserving Ad Measurement with Multidimensional Intersection](https://arxiv.org//abs/2506.20981)
++ [PrivacyGo: Privacy-Preserving Ad Measurement with Multidimensional Intersection](https://arxiv.org/abs/2506.20981)

Jian Du, Haohao Qian, Shikun Zhang, Wen-jie Lu, Donghang Lu, Yongchuan Niu, Bo Jiang, Yongjun Zhao, Qiang Yan

-+ [AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text](https://arxiv.org//abs/2506.22508)
++ [AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text](https://arxiv.org/abs/2506.22508)

Chenyang Shao, Tianxing Li, Chenhao Pu, Fengli Xu, Yong Li

-+ [A Survey on Model Extraction Attacks and Defenses for Large Language Models](https://arxiv.org//abs/2506.22521)
++ [A Survey on Model Extraction Attacks and Defenses for Large Language Models](https://arxiv.org/abs/2506.22521)

Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

-+ [Balancing Privacy and Utility in Correlated Data: A Study of Bayesian Differential Privacy](https://arxiv.org//abs/2506.21308)
++ [Balancing Privacy and Utility in Correlated Data: A Study of Bayesian Differential Privacy](https://arxiv.org/abs/2506.21308)

Martin Lange, Patricia Guerra-Balboa, Javier Parra-Arnau, Thorsten Strufe

# 2025-06-25

-+ [Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning](https://arxiv.org//abs/2506.20413)
++ [Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning](https://arxiv.org/abs/2506.20413)

Mohammad Mahdi Maheri, Denys Herasymuk, Hamed Haddadi

-+ [Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks](https://arxiv.org//abs/2506.20548)
++ [Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks](https://arxiv.org/abs/2506.20548)

Manyi Li, Renshuai Tao, Yufan Liu, Chuangchuang Tan, Haotong Qin, Bing Li, Yunchao Wei, Yao Zhao

-+ [Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS](https://arxiv.org//abs/2506.20576)
++ [Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS](https://arxiv.org/abs/2506.20576)

Sabrine Ennaji, Elhadj Benkhelifa, Luigi V. Mancini

-+ [InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking](https://arxiv.org//abs/2506.20370)
++ [InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking](https://arxiv.org/abs/2506.20370)

Abdullah All Tanvir, Xin Zhong

-+ [AdvMIM: Adversarial Masked Image Modeling for Semi-Supervised Medical Image Segmentation](https://arxiv.org//abs/2506.20563)
++ [AdvMIM: Adversarial Masked Image Modeling for Semi-Supervised Medical Image Segmentation](https://arxiv.org/abs/2506.20563)

Lei Zhu, Jun Zhou, Rick Siow Mong Goh, Yong Liu

-+ [Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning](https://arxiv.org//abs/2506.20651)
++ [Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning](https://arxiv.org/abs/2506.20651)

Fei Wang, Baochun Li

-+ [Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox](https://arxiv.org//abs/2506.20102)
++ [Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox](https://arxiv.org/abs/2506.20102)

Malikussaid, Sutiyo

-+ [Don't Hash Me Like That: Exposing and Mitigating Hash-Induced Unfairness in Local Differential Privacy](https://arxiv.org//abs/2506.20290)
++ [Don't Hash Me Like That: Exposing and Mitigating Hash-Induced Unfairness in Local Differential Privacy](https://arxiv.org/abs/2506.20290)

Berkay Kemal Balioglu, Alireza Khodaie, Mehmet Emre Gursoy

-+ [Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis](https://arxiv.org//abs/2506.20806)
++ [Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis](https://arxiv.org/abs/2506.20806)

Zhonghao Zhan, Huichi Zhou, Hamed Haddadi

-+ [Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA](https://arxiv.org//abs/2506.20856)
++ [Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA](https://arxiv.org/abs/2506.20856)

Fei Wang, Baochun Li

-+ [Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers](https://arxiv.org//abs/2506.20816)
++ [Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers](https://arxiv.org/abs/2506.20816)

Furkan Mumcu, Yasin Yilmaz

-+ [On the Necessity of Output Distribution Reweighting for Effective Class Unlearning](https://arxiv.org//abs/2506.20893)
++ [On the Necessity of Output Distribution Reweighting for Effective Class Unlearning](https://arxiv.org/abs/2506.20893)

Yian Wang, Ali Ebrahimpour-Boroojeny, Hari Sundaram

-+ [SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning](https://arxiv.org//abs/2506.22506)
++ [SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning](https://arxiv.org/abs/2506.22506)

Momin Ahmad Khan, Yasra Chandio, Fatima Muhammad Anwar

-+ [VSF-Med:A Vulnerability Scoring Framework for Medical Vision-Language Models](https://arxiv.org//abs/2507.00052)
++ [VSF-Med:A Vulnerability Scoring Framework for Medical Vision-Language Models](https://arxiv.org/abs/2507.00052)

Binesh Sadanandan, Vahid Behzadan

-+ [RedCoder: Automated Multi-Turn Red Teaming for Code LLMs](https://arxiv.org//abs/2507.22063)
++ [RedCoder: Automated Multi-Turn Red Teaming for Code LLMs](https://arxiv.org/abs/2507.22063)

Wenjie Jacky Mo, Qin Liu, Xiaofei Wen, Dongwon Jung, Hadi Askari, Wenxuan Zhou, Zhe Zhao, Muhao Chen

@@ -7143,223 +7143,223 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Kin Kwan Leung, Rasa Hosseinzadeh, Gabriel Loaiza-Ganem

# 2025-06-24

-+ [Automated Detection of Pre-training Text in Black-box LLMs](https://arxiv.org//abs/2506.19399)
++ [Automated Detection of Pre-training Text in Black-box LLMs](https://arxiv.org/abs/2506.19399)

Ruihan Hu, Yu-Ming Shang, Jiankun Peng, Wei Luo, Yazhe Wang, Xi Zhang

-+ [Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy](https://arxiv.org//abs/2506.19486)
++ [Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy](https://arxiv.org/abs/2506.19486)

Zhihao Sui, Liang Hu, Jian Cao, Dora D. Liu, Usman Naseem, Zhongyuan Lai, Qi Zhang

-+ [PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty](https://arxiv.org//abs/2506.19563)
++ [PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty](https://arxiv.org/abs/2506.19563)

Jinwen He, Yiyang Lu, Zijin Lin, Kai Chen, Yue Zhao

-+ [MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models](https://arxiv.org//abs/2506.19257)
++ [MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models](https://arxiv.org/abs/2506.19257)

Yinan Xia, Yilei Jiang, Yingshui Tan, Xiaoyong Zhu, Xiangyu Yue, Bo Zheng

-+ [Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation](https://arxiv.org//abs/2506.19267)
++ [Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation](https://arxiv.org/abs/2506.19267)

Weichen Zhang, Dong Xu, Wanli Ouyang, Wen Li

-+ [Assessing Risk of Stealing Proprietary Models for Medical Imaging Tasks](https://arxiv.org//abs/2506.19464)
++ [Assessing Risk of Stealing Proprietary Models for Medical Imaging Tasks](https://arxiv.org/abs/2506.19464)

Ankita Raj, Harsh Swaika, Deepankar Varma, Chetan Arora

-+ [Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks](https://arxiv.org//abs/2506.19533)
++ [Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks](https://arxiv.org/abs/2506.19533)

Ankita Raj, Ambar Pal, Chetan Arora

-+ [SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation](https://arxiv.org//abs/2506.19360)
++ [SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation](https://arxiv.org/abs/2506.19360)

Yunsung Chung, Yunbei Zhang, Nassir Marrouche, Jihun Hamm

-+ [Adversarial Attacks on Deep Learning-Based False Data Injection Detection in Differential Relays](https://arxiv.org//abs/2506.19302)
++ [Adversarial Attacks on Deep Learning-Based False Data Injection Detection in Differential Relays](https://arxiv.org/abs/2506.19302)

Ahmad Mohammad Saber, Aditi Maheshwari, Amr Youssef, Deepa Kundur

-+ [Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning](https://arxiv.org//abs/2506.19260)
++ [Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning](https://arxiv.org/abs/2506.19260)

Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya

-+ [KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs](https://arxiv.org//abs/2506.19802)
++ [KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs](https://arxiv.org/abs/2506.19802)

Xin Fan Guo, Albert Merono Penuela, Sergio Maffeis, Fabio Pierazzi

-+ [Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models](https://arxiv.org//abs/2506.19889)
++ [Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models](https://arxiv.org/abs/2506.19889)

Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen

-+ [RepuNet: A Reputation System for Mitigating Malicious Clients in DFL](https://arxiv.org//abs/2506.19892)
++ [RepuNet: A Reputation System for Mitigating Malicious Clients in DFL](https://arxiv.org/abs/2506.19892)

Isaac Marroqui Penalva, Enrique Tomás Martínez Beltrán, Manuel Gil Pérez, Alberto Huertas Celdrán

-+ [Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack](https://arxiv.org//abs/2506.19886)
++ [Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack](https://arxiv.org/abs/2506.19886)

Xuesong Wang, Mo Li, Xingyan Shi, Zhaoqian Liu, Shenghao Yang

-+ [Holmes: Towards Effective and Harmless Model Ownership Verification to Personalized Large Vision Models via Decoupling Common Features](https://arxiv.org//abs/2507.00724)
++ [Holmes: Towards Effective and Harmless Model Ownership Verification to Personalized Large Vision Models via Decoupling Common Features](https://arxiv.org/abs/2507.00724)

Linghui Zhu, Yiming Li, Haiqin Weng, Yan Liu, Tianwei Zhang, Shu-Tao Xia, Zhi Wang

-+ [Robust Behavior Cloning Via Global Lipschitz Regularization](https://arxiv.org//abs/2506.19250)
++ [Robust Behavior Cloning Via Global Lipschitz Regularization](https://arxiv.org/abs/2506.19250)

Shili Wu, Yizhao Jin, Puhua Niu, Aniruddha Datta, Sean B. Andersson

-+ [Model Guidance via Robust Feature Attribution](https://arxiv.org//abs/2506.19680)
++ [Model Guidance via Robust Feature Attribution](https://arxiv.org/abs/2506.19680)

Mihnea Ghitu, Vihari Piratla, Matthew Wicker

# 2025-06-23

-+ [Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability](https://arxiv.org//abs/2506.18248)
++ [Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability](https://arxiv.org/abs/2506.18248)

Jongoh Jeong, Hunmin Yang, Jaeseok Jeong, Kuk-Jin Yoon

-+ [Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies](https://arxiv.org//abs/2506.18304)
++ [Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies](https://arxiv.org/abs/2506.18304)

Junchao Fan, Xuyang Lei, Xiaolin Chang

-+ [Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks](https://arxiv.org//abs/2506.18543)
++ [Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks](https://arxiv.org/abs/2506.18543)

Xiaodong Wu, Xiangman Li, Jianbing Ni

-+ [SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds](https://arxiv.org//abs/2506.18591)
++ [SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds](https://arxiv.org/abs/2506.18591)

Mauricio Byrd Victorica, György Dán, Henrik Sandberg

-+ [A Multi-view Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction](https://arxiv.org//abs/2506.18797)
++ [A Multi-view Divergence-Convergence Feature Augmentation Framework for Drug-related Microbes Prediction](https://arxiv.org/abs/2506.18797)

Xin An, Ruijie Li, Qiao Ning, Shikai Guo, Hui Li, Qian Ma

-+ [Multi-Agent Online Control with Adversarial Disturbances](https://arxiv.org//abs/2506.18814)
++ [Multi-Agent Online Control with Adversarial Disturbances](https://arxiv.org/abs/2506.18814)

Anas Barakat, John Lazarsfeld, Georgios Piliouras, Antonios Varvitsiotis

-+ [DUMB and DUMBer: Is Adversarial Training Worth It in the Real World?](https://arxiv.org//abs/2506.18516)
++ [DUMB and DUMBer: Is Adversarial Training Worth It in the Real World?](https://arxiv.org/abs/2506.18516)

Francesco Marchiori, Marco Alecci, Luca Pajola, Mauro Conti

-+ [Amplifying Machine Learning Attacks Through Strategic Compositions](https://arxiv.org//abs/2506.18870)
++ [Amplifying Machine Learning Attacks Through Strategic Compositions](https://arxiv.org/abs/2506.18870)

Yugeng Liu, Zheng Li, Hai Huang, Michael Backes, Yang Zhang

-+ [Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems](https://arxiv.org//abs/2506.19109)
++ [Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems](https://arxiv.org/abs/2506.19109)

Valerii Gakh, Hayretdin Bahsi

-+ [NIC-RobustBench: A Comprehensive Open-Source Toolkit for Neural Image Compression and Robustness Analysis](https://arxiv.org//abs/2506.19051)
++ [NIC-RobustBench: A Comprehensive Open-Source Toolkit for Neural Image Compression and Robustness Analysis](https://arxiv.org/abs/2506.19051)

Georgii Bychkov, Khaled Abud, Egor Kovalev, Alexander Gushchin, Dmitriy Vatolin, Anastasia Antsiferova

-+ [Towards Provable (In)Secure Model Weight Release Schemes](https://arxiv.org//abs/2506.19874)
++ [Towards Provable (In)Secure Model Weight Release Schemes](https://arxiv.org/abs/2506.19874)

Xing Yang, Bingtao Wang, Yuhao Wang, Zimo Ji, Terry Jingchen Zhang, Wenyuan Jiang

# 2025-06-22

-+ [Multi-turn Jailbreaking via Global Refinement and Active Fabrication](https://arxiv.org//abs/2506.17881)
++ [Multi-turn Jailbreaking via Global Refinement and Active Fabrication](https://arxiv.org/abs/2506.17881)

Hua Tang, Lingyong Yan, Yukun Zhao, Shuaiqiang Wang, Jizhou Huang, Dawei Yin

-+ [Federated Learning-Based Data Collaboration Method for Enhancing Edge Cloud AI System Security Using Large Language Models](https://arxiv.org//abs/2506.18087)
++ [Federated Learning-Based Data Collaboration Method for Enhancing Edge Cloud AI System Security Using Large Language Models](https://arxiv.org/abs/2506.18087)

Huaiying Luo, Cheng Ji

-+ [$ϕ^{\infty}$: Clause Purification, Embedding Realignment, and the Total Suppression of the Em Dash in Autoregressive Language Models](https://arxiv.org//abs/2506.18129)
++ [$ϕ^{\infty}$: Clause Purification, Embedding Realignment, and the Total Suppression of the Em Dash in Autoregressive Language Models](https://arxiv.org/abs/2506.18129)

Bugra Kilictas, Faruk Alpay

-+ [Targeted False Positive Synthesis via Detector-guided Adversarial Diffusion Attacker for Robust Polyp Detection](https://arxiv.org//abs/2506.18134)
++ [Targeted False Positive Synthesis via Detector-guided Adversarial Diffusion Attacker for Robust Polyp Detection](https://arxiv.org/abs/2506.18134)

Quan Zhou, Gan Luo, Qiang Hu, Qingyong Zhang, Jinhua Zhang, Yinjiao Tian, Qiang Li, Zhiwei Wang

-+ [DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation](https://arxiv.org//abs/2506.17874)
++ [DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation](https://arxiv.org/abs/2506.17874)

Jiaming Hu, Debarghya Mukherjee, Ioannis Ch. Paschalidis

-+ [Generalization under Byzantine & Poisoning Attacks: Tight Stability Bounds in Robust Distributed Learning](https://arxiv.org//abs/2506.18020)
++ [Generalization under Byzantine & Poisoning Attacks: Tight Stability Bounds in Robust Distributed Learning](https://arxiv.org/abs/2506.18020)

Thomas Boudou, Batiste Le Bars, Nirupam Gupta, Aurélien Bellet

-+ [An Attack Method for Medical Insurance Claim Fraud Detection based on Generative Adversarial Network](https://arxiv.org//abs/2506.19871)
++ [An Attack Method for Medical Insurance Claim Fraud Detection based on Generative Adversarial Network](https://arxiv.org/abs/2506.19871)

Yining Pang, Chenghan Li

# 2025-06-21

-+ [Exploiting Efficiency Vulnerabilities in Dynamic Deep Learning Systems](https://arxiv.org//abs/2506.17621)
++ [Exploiting Efficiency Vulnerabilities in Dynamic Deep Learning Systems](https://arxiv.org/abs/2506.17621)

Ravishka Rathnasuriya, Wei Yang

-+ [Optimization-Free Patch Attack on Stereo Depth Estimation](https://arxiv.org//abs/2506.17632)
++ [Optimization-Free Patch Attack on Stereo Depth Estimation](https://arxiv.org/abs/2506.17632)

Hangcheng Liu, Xu Kuang, Xingshuo Han, Xingwan Wu, Haoran Ou, Shangwei Guo, Xingyi Huang, Tao Xiang, Tianwei Zhang

-+ [CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition](https://arxiv.org//abs/2506.17709)
++ [CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition](https://arxiv.org/abs/2506.17709)

Zebin Wang, Menghan Lin, Bolin Shen, Ken Anderson, Molei Liu, Tianxi Cai, Yushun Dong

-+ [AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator](https://arxiv.org//abs/2506.17805)
++ [AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator](https://arxiv.org/abs/2506.17805)

Md. Kamrul Hossain, Walid Aljoby, Anis Elgabli, Ahmed M. Abdelmoniem, Khaled A. Harras

-+ [LastingBench: Defend Benchmarks Against Knowledge Leakage](https://arxiv.org//abs/2506.21614)
++ [LastingBench: Defend Benchmarks Against Knowledge Leakage](https://arxiv.org/abs/2506.21614)

Yixiong Fang, Tianran Sun, Yuling Shi, Min Wang, Xiaodong Gu

# 2025-06-20

-+ [TriCon-SF: A Triple-Shuffle and Contribution-Aware Serial Federated Learning Framework for Heterogeneous Healthcare Data](https://arxiv.org//abs/2506.16723)
++ [TriCon-SF: A Triple-Shuffle and Contribution-Aware Serial Federated Learning Framework for Heterogeneous Healthcare Data](https://arxiv.org/abs/2506.16723)

Yuping Yan, Yizhi Wang, Yuanshuai Li, Yaochu Jin

-+ [Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation](https://arxiv.org//abs/2506.16753)
++ [Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation](https://arxiv.org/abs/2506.16753)

Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii

-+ [MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning](https://arxiv.org//abs/2506.16792)
++ [MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning](https://arxiv.org/abs/2506.16792)

Muyang Zheng, Yuanzhi Yao, Changting Lin, Rui Wang, Meng Han

-+ [Robust Training with Data Augmentation for Medical Imaging Classification](https://arxiv.org//abs/2506.17133)
++ [Robust Training with Data Augmentation for Medical Imaging Classification](https://arxiv.org/abs/2506.17133)

Josué Martínez-Martínez, Olivia Brown, Mostafa Karami, Sheida Nabavi

-+ [Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models](https://arxiv.org//abs/2506.16760)
++ [Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models](https://arxiv.org/abs/2506.16760)

Lei Jiang, Zixun Zhang, Zizhou Wang, Xiaobing Sun, Zhen Li, Liangli Zhen, Xiaohua Xu

-+ [Better Language Model Inversion by Compactly Representing Next-Token Distributions](https://arxiv.org//abs/2506.17090)
++ [Better Language Model Inversion by Compactly Representing Next-Token Distributions](https://arxiv.org/abs/2506.17090)

Murtaza Nazir, Matthew Finlayson, John X. Morris, Xiang Ren, Swabha Swayamdipta

-+ [DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches](https://arxiv.org//abs/2506.16690)
++ [DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches](https://arxiv.org/abs/2506.16690)

Yun Xing, Yue Cao, Nhat Chung, Jie Zhang, Ivor Tsang, Ming-Ming Cheng, Yang Liu, Lei Ma, Qing Guo

-+ [Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance](https://arxiv.org//abs/2506.17040)
++ [Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance](https://arxiv.org/abs/2506.17040)

Lorenzo Tausani, Paolo Muratore, Morgan B. Talbot, Giacomo Amerio, Gabriel Kreiman, Davide Zoccolan

-+ [Navigating the Deep: Signature Extraction on Deep Neural Networks](https://arxiv.org//abs/2506.17047)
++ [Navigating the Deep: Signature Extraction on Deep Neural Networks](https://arxiv.org/abs/2506.17047)

Haolin Liu, Adrien Siproudhis, Samuel Experton, Peter Lorenz, Christina Boura, Thomas Peyrin

-+ [Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model](https://arxiv.org//abs/2506.17162)
++ [Analyzing PDFs like Binaries: Adversarially Robust PDF Malware Analysis via Intermediate Representation and Language Model](https://arxiv.org/abs/2506.17162)

Side Liu, Jiang Ming, Guodong Zhou, Xinyi Liu, Jianming Fu, Guojun Peng

-+ [CUBA: Controlled Untargeted Backdoor Attack against Deep Neural Networks](https://arxiv.org//abs/2506.17350)
++ [CUBA: Controlled Untargeted Backdoor Attack against Deep Neural Networks](https://arxiv.org/abs/2506.17350)

Yinghao Wu, Liyan Zhang

-+ [Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs](https://arxiv.org//abs/2506.17353)
++ [Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs](https://arxiv.org/abs/2506.17353)

Zongjie Li, Daoyuan Wu, Shuai Wang, Zhendong Su

-+ [SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification](https://arxiv.org//abs/2506.17368)
++ [SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification](https://arxiv.org/abs/2506.17368)

Zhenglin Lai, Mengyao Liao, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li, Bingzhe Wu

-+ [A workflow for generating synthetic LiDAR datasets in simulation environments](https://arxiv.org//abs/2506.17378)
++ [A workflow for generating synthetic LiDAR datasets in simulation environments](https://arxiv.org/abs/2506.17378)

Abhishek Phadke, Shakib Mahmud Dipto, Pratip Rana

@@ -7368,120 +7368,120 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Zhenglin Lai, Mengyao Liao, Bingzhe Wu, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li

# 2025-06-19

-+ [Probing the Robustness of Large Language Models Safety to Latent Perturbations](https://arxiv.org//abs/2506.16078)
++ [Probing the Robustness of Large Language Models Safety to Latent Perturbations](https://arxiv.org/abs/2506.16078)

Tianle Gu, Kexin Huang, Zongqi Wang, Yixu Wang, Jie Li, Yuanqi Yao, Yang Yao, Yujiu Yang, Yan Teng, Yingchun Wang

-+ [Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks](https://arxiv.org//abs/2506.16407)
++ [Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks](https://arxiv.org/abs/2506.16407)

Dong Nguyen Tien, Dung D. Le

-+ [Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors](https://arxiv.org//abs/2506.16497)
++ [Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors](https://arxiv.org/abs/2506.16497)

Riccardo Ziglio, Cecilia Pasquini, Silvio Ranise

-+ [Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation](https://arxiv.org//abs/2506.16636)
++ [Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation](https://arxiv.org/abs/2506.16636)

Rex Shen, Lu Tian

-+ [PL-Guard: Benchmarking Language Model Safety for Polish](https://arxiv.org//abs/2506.16322)
++ [PL-Guard: Benchmarking Language Model Safety for Polish](https://arxiv.org/abs/2506.16322)

Aleksandra Krasnodębska, Karolina Seweryn, Szymon Łukasik, Wojciech Kusa

-+ [Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models](https://arxiv.org//abs/2506.16447)
++ [Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models](https://arxiv.org/abs/2506.16447)

Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li

-+ [Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation](https://arxiv.org//abs/2506.15988)
++ [Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation](https://arxiv.org/abs/2506.15988)

Connor Malone, Owen Claxton, Iman Shames, Michael Milford

-+ [MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models](https://arxiv.org//abs/2506.16157)
++ [MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models](https://arxiv.org/abs/2506.16157)

Xingbai Chen, Tingchao Fu, Renyang Liu, Wei Zhou, Chao Yi

-+ [Black-Box Privacy Attacks on Shared Representations in Multitask Learning](https://arxiv.org//abs/2506.16460)
++ [Black-Box Privacy Attacks on Shared Representations in Multitask Learning](https://arxiv.org/abs/2506.16460)

John Abascal, Nicolás Berrios, Alina Oprea, Jonathan Ullman, Adam Smith, Matthew Jagielski

-+ [Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU](https://arxiv.org//abs/2506.16548)
++ [Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU](https://arxiv.org/abs/2506.16548)

Arjun Dosajh, Mihika Sanghi

-+ [SecureFed: A Two-Phase Framework for Detecting Malicious Clients in Federated Learning](https://arxiv.org//abs/2506.16458)
++ [SecureFed: A Two-Phase Framework for Detecting Malicious Clients in Federated Learning](https://arxiv.org/abs/2506.16458)

Likhitha Annapurna Kavuri, Akshay Mhatre, Akarsh K Nair, Deepti Gupta

-+ [FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models](https://arxiv.org//abs/2506.16218)
++ [FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models](https://arxiv.org/abs/2506.16218)

Xinting Liao, Weiming Liu, Jiaming Qian, Pengyang Zhou, Jiahe Xu, Wenjie Wang, Chaochao Chen, Xiaolin Zheng, Tat-Seng Chua

-+ [From Teacher to Student: Tracking Memorization Through Model Distillation](https://arxiv.org//abs/2506.16170)
++ [From Teacher to Student: Tracking Memorization Through Model Distillation](https://arxiv.org/abs/2506.16170)

Simardeep Singh

-+ [PRISON: Unmasking the Criminal Potential of Large Language Models](https://arxiv.org//abs/2506.16150)
++ [PRISON: Unmasking the Criminal Potential of Large Language Models](https://arxiv.org/abs/2506.16150)

Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang

# 2025-06-18

-+ [RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments](https://arxiv.org//abs/2506.15253)
++ [RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments](https://arxiv.org/abs/2506.15253)

Yuchuan Fu, Xiaohan Yuan, Dongxia Wang

-+ [Pixel-level Certified Explanations via Randomized Smoothing](https://arxiv.org//abs/2506.15499)
++ [Pixel-level Certified Explanations via Randomized Smoothing](https://arxiv.org/abs/2506.15499)

Alaa Anani, Tobias Lorenz, Mario Fritz, Bernt Schiele

-+ [LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning](https://arxiv.org//abs/2506.15606)
++ [LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning](https://arxiv.org/abs/2506.15606)

Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong

-+ [Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers](https://arxiv.org//abs/2506.15674)
++ [Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers](https://arxiv.org/abs/2506.15674)

Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh

-+ [Approximating Language Model Training Data from Weights](https://arxiv.org//abs/2506.15553)
++ [Approximating Language Model Training Data from Weights](https://arxiv.org/abs/2506.15553)

John X. Morris, Junjie Oscar Yin, Woojeong Kim, Vitaly Shmatikov, Alexander M.
Rush -+ [Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models](https://arxiv.org//abs/2506.15201) ++ [Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models](https://arxiv.org/abs/2506.15201) Xuelin Shen, Jiayin Xu, Kangsheng Yin, Wenhan Yang -+ [ImprovDML: Improved Trade-off in Private Byzantine-Resilient Distributed Machine Learning](https://arxiv.org//abs/2506.15181) ++ [ImprovDML: Improved Trade-off in Private Byzantine-Resilient Distributed Machine Learning](https://arxiv.org/abs/2506.15181) Bing Liu, Chengcheng Zhao, Li Chai, Peng Cheng, Yaonan Wang -+ [Enhancing One-run Privacy Auditing with Quantile Regression-Based Membership Inference](https://arxiv.org//abs/2506.15349) ++ [Enhancing One-run Privacy Auditing with Quantile Regression-Based Membership Inference](https://arxiv.org/abs/2506.15349) Terrance Liu, Matteo Boglioni, Yiwei Fu, Shengyuan Hu, Pratiksha Thaker, Zhiwei Steven Wu -+ [Insights on Adversarial Attacks for Tabular Machine Learning via a Systematic Literature Review](https://arxiv.org//abs/2506.15506) ++ [Insights on Adversarial Attacks for Tabular Machine Learning via a Systematic Literature Review](https://arxiv.org/abs/2506.15506) Salijona Dyrmishi, Mohamed Djilani, Thibault Simonetto, Salah Ghamizi, Maxime Cordy -+ [PDLRecover: Privacy-preserving Decentralized Model Recovery with Machine Unlearning](https://arxiv.org//abs/2506.15112) ++ [PDLRecover: Privacy-preserving Decentralized Model Recovery with Machine Unlearning](https://arxiv.org/abs/2506.15112) Xiangman Li, Xiaodong Wu, Jianbing Ni, Mohamed Mahmoud, Maazen Alsabaan -+ [From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem](https://arxiv.org//abs/2506.15170) ++ [From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem](https://arxiv.org/abs/2506.15170) Yanxu Mao, Tiehan Cui, Peipei Liu, Datao You, Hongsong Zhu -+ [Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts](https://arxiv.org//abs/2506.15751) ++ [Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts](https://arxiv.org/abs/2506.15751) Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar -+ [VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service](https://arxiv.org//abs/2506.15755) ++ [VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service](https://arxiv.org/abs/2506.15755) Xiasi Wang, Tianliang Yao, Simin Chen, Runqi Wang, Lei YE, Kuofeng Gao, Yi Huang, Yuan Yao -+ [Context manipulation attacks : Web agents are susceptible to corrupted memory](https://arxiv.org//abs/2506.17318) ++ [Context manipulation attacks : Web agents are susceptible to corrupted memory](https://arxiv.org/abs/2506.17318) Atharv Singh Patlan, Ashwin Hebbar, Pramod Viswanath, Prateek Mittal -+ [PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset](https://arxiv.org//abs/2506.19054) ++ [PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset](https://arxiv.org/abs/2506.19054) Mintong Kang, Zhaorun Chen, Chejian Xu, Jiawei Zhang, Chengquan Guo, Minzhou Pan, Ivan Revilla, Yu Sun, Bo Li @@ -7490,104 +7490,104 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. 
T. Hirata, Zhangyang Wang, Junyuan Hong # 2025-06-17 -+ [Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models](https://arxiv.org//abs/2506.14919) ++ [Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models](https://arxiv.org/abs/2506.14919) Xinkai Zhao, Yuta Tokuoka, Junichiro Iwasawa, Keita Oda -+ [Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning](https://arxiv.org//abs/2506.14913) ++ [Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning](https://arxiv.org/abs/2506.14913) Wassim Bouaziz, Mathurin Videau, Nicolas Usunier, El-Mahdi El-Mhamdi -+ [RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?](https://arxiv.org//abs/2506.14261) ++ [RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?](https://arxiv.org/abs/2506.14261) Rohan Gupta, Erik Jenner -+ [LLM Jailbreak Oracle](https://arxiv.org//abs/2506.17299) ++ [LLM Jailbreak Oracle](https://arxiv.org/abs/2506.17299) Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan -+ [Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack](https://arxiv.org//abs/2506.14539) ++ [Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack](https://arxiv.org/abs/2506.14539) Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son -+ [ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models](https://arxiv.org//abs/2507.00026) ++ [ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models](https://arxiv.org/abs/2507.00026) Jiale Ding, Xiang Zheng, Cong Wang, Wei-Bin Lee, Xingjun Ma, Yu-Gang Jiang -+ [AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions](https://arxiv.org//abs/2506.14697) ++ [AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions](https://arxiv.org/abs/2506.14697) Aishan Liu, Zonghao Ying, Le Wang, Junjie Mu, Jinyang Guo, Jiakai Wang, Yuqing Ma, Siyuan Liang, Mingchuan Zhang, Xianglong Liu, Dacheng Tao -+ [Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques](https://arxiv.org//abs/2506.21584) ++ [Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques](https://arxiv.org/abs/2506.21584) J. 
Koorndijk -+ [HiLight: A Hierarchical Reinforcement Learning Framework with Global Adversarial Guidance for Large-Scale Traffic Signal Control](https://arxiv.org//abs/2506.14391) ++ [HiLight: A Hierarchical Reinforcement Learning Framework with Global Adversarial Guidance for Large-Scale Traffic Signal Control](https://arxiv.org/abs/2506.14391) Yaqiao Zhu, Hongkai Wen, Geyong Min, Man Luo # 2025-06-16 -+ [Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks](https://arxiv.org//abs/2506.13276) ++ [Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks](https://arxiv.org/abs/2506.13276) Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang -+ [Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models](https://arxiv.org//abs/2506.13726) ++ [Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models](https://arxiv.org/abs/2506.13726) Arjun Krishna, Aaditya Rastogi, Erick Galinkin -+ [CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction](https://arxiv.org//abs/2506.13160) ++ [CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction](https://arxiv.org/abs/2506.13160) Ting Qiao, Yiming Li, Jianbin Li, Yingjia Wang, Leyi Qi, Junfeng Guo, Ruili Feng, Dacheng Tao -+ [Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments](https://arxiv.org//abs/2506.13205) ++ [Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments](https://arxiv.org/abs/2506.13205) Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang -+ [Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models](https://arxiv.org//abs/2506.13206) ++ [Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models](https://arxiv.org/abs/2506.13206) James Chua, Jan Betley, Mia Taylor, Owain Evans -+ [LapDDPM: A Conditional Graph Diffusion Model for scRNA-seq Generation with Spectral Adversarial Perturbations](https://arxiv.org//abs/2506.13344) ++ [LapDDPM: A Conditional Graph Diffusion Model for scRNA-seq Generation with Spectral Adversarial Perturbations](https://arxiv.org/abs/2506.13344) Lorenzo Bini, Stephane Marchand-Maillet -+ [EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning](https://arxiv.org//abs/2506.13612) ++ [EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning](https://arxiv.org/abs/2506.13612) Zhiqiang Li, Haiyong Bao, Menghong Guan, Hao Pan, Cheng Huang, Hong-Ning Dai -+ [Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs](https://arxiv.org//abs/2506.13285) ++ [Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs](https://arxiv.org/abs/2506.13285) Houcheng Jiang, Zetong Zhao, Junfeng Fang, Haokai Ma, Ruipeng Wang, Yang Deng, Xiang Wang, Xiangnan He -+ [Perfect Privacy for Discriminator-Based Byzantine-Resilient Federated Learning](https://arxiv.org//abs/2506.13561) ++ [Perfect Privacy for Discriminator-Based Byzantine-Resilient Federated Learning](https://arxiv.org/abs/2506.13561) Yue Xia, Christoph Hofmeister, Maximilian Egger, Rawad Bitar -+ [Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack Perspective](https://arxiv.org//abs/2506.13009) ++ [Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack Perspective](https://arxiv.org/abs/2506.13009) Nima Naderloui, Shenao Yan, Binghui Wang, Jie Fu, Wendy Hui Wang, Weiran Liu, Yuan Hong -+ 
[Position: Certified Robustness Does Not (Yet) Imply Model Security](https://arxiv.org//abs/2506.13024) ++ [Position: Certified Robustness Does Not (Yet) Imply Model Security](https://arxiv.org/abs/2506.13024) Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I.P. Rubinstein -+ [From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs](https://arxiv.org//abs/2506.13434) ++ [From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs](https://arxiv.org/abs/2506.13434) Alsharif Abuadbba, Chris Hicks, Kristen Moore, Vasilios Mavroudis, Burak Hasircioglu, Diksha Goel, Piers Jennings -+ [Unlearning-Enhanced Website Fingerprinting Attack: Against Backdoor Poisoning in Anonymous Networks](https://arxiv.org//abs/2506.13563) ++ [Unlearning-Enhanced Website Fingerprinting Attack: Against Backdoor Poisoning in Anonymous Networks](https://arxiv.org/abs/2506.13563) Yali Yuan, Kai Xu, Ruolin Ma, Yuchen Zhang -+ [Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models](https://arxiv.org//abs/2506.17292) ++ [Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models](https://arxiv.org/abs/2506.17292) Quan Nguyen, Minh N. Vu, Truc Nguyen, My T. Thai -+ [Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble](https://arxiv.org//abs/2506.13972) ++ [Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble](https://arxiv.org/abs/2506.13972) Zhiqi Wang, Chengyu Zhang, Yuetian Chen, Nathalie Baracaldo, Swanand Kadhe, Lei Yu -+ [Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs](https://arxiv.org//abs/2506.14003) ++ [Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs](https://arxiv.org/abs/2506.14003) Yiwei Chen, Soumyadeep Pal, Yimeng Zhang, Qing Qu, Sijia Liu @@ -7595,60 +7595,60 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Minjae Lee, Yoonjae Jung, Sangdon Park -+ [Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder](https://arxiv.org//abs/2506.13658) ++ [Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder](https://arxiv.org/abs/2506.13658) Ioannis Christoforos Koune, Alice Cicirello # 2025-06-15 -+ [Constraint-Guided Prediction Refinement via Deterministic Diffusion Trajectories](https://arxiv.org//abs/2506.12911) ++ [Constraint-Guided Prediction Refinement via Deterministic Diffusion Trajectories](https://arxiv.org/abs/2506.12911) Pantelis Dogoulis, Fabien Bernier, Félix Fourreau, Karim Tit, Maxime Cordy -+ [Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity](https://arxiv.org//abs/2506.12685) ++ [Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity](https://arxiv.org/abs/2506.12685) Bilal Saleh Husain -+ [NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models](https://arxiv.org//abs/2506.12706) ++ [NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models](https://arxiv.org/abs/2506.12706) Jiaming Zhang, Xin Wang, Xingjun Ma, Lingyu Qiu, Yu-Gang Jiang, Jitao Sang -+ [Privacy-Preserving Federated Learning against Malicious Clients Based on Verifiable Functional Encryption](https://arxiv.org//abs/2506.12846) ++ [Privacy-Preserving Federated Learning against Malicious Clients Based on Verifiable Functional 
Encryption](https://arxiv.org/abs/2506.12846) Nina Cai, Jinguang Han -+ [Transforming Chatbot Text: A Sequence-to-Sequence Approach](https://arxiv.org//abs/2506.12843) ++ [Transforming Chatbot Text: A Sequence-to-Sequence Approach](https://arxiv.org/abs/2506.12843) Natesh Reddy, Mark Stamp -+ [SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression](https://arxiv.org//abs/2506.12707) ++ [SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression](https://arxiv.org/abs/2506.12707) Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang, Lili Qiu -+ [Active Adversarial Noise Suppression for Image Forgery Localization](https://arxiv.org//abs/2506.12871) ++ [Active Adversarial Noise Suppression for Image Forgery Localization](https://arxiv.org/abs/2506.12871) Rongxuan Peng, Shunquan Tan, Xianbo Mo, Alex C. Kot, Jiwu Huang -+ [Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs](https://arxiv.org//abs/2506.12875) ++ [Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs](https://arxiv.org/abs/2506.12875) Lu Chen, Han Yang, Hu Wang, Yuxin Cao, Shaofeng Li, Yuan Luo -+ [Free Privacy Protection for Wireless Federated Learning: Enjoy It or Suffer from It?](https://arxiv.org//abs/2506.12749) ++ [Free Privacy Protection for Wireless Federated Learning: Enjoy It or Suffer from It?](https://arxiv.org/abs/2506.12749) Weicai Li, Tiejun Lv, Xiyu Zhao, Xin Yuan, Wei Ni -+ [TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models](https://arxiv.org//abs/2506.12815) ++ [TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models](https://arxiv.org/abs/2506.12815) Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Xiaochun Cao, Shouling Ji, Jiaheng Zhang, Jincai Huang, Li Shen -+ [Jailbreak Strength and Model Similarity Predict Transferability](https://arxiv.org//abs/2506.12913) ++ [Jailbreak Strength and Model Similarity Predict Transferability](https://arxiv.org/abs/2506.12913) Rico Angell, Jannik Brinkmann, He He -+ [Universal Jailbreak Suffixes Are Strong Attention Hijackers](https://arxiv.org//abs/2506.12880) ++ [Universal Jailbreak Suffixes Are Strong Attention Hijackers](https://arxiv.org/abs/2506.12880) Matan Ben-Tov, Mor Geva, Mahmood Sharif -+ [The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models](https://arxiv.org//abs/2506.15734) ++ [The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models](https://arxiv.org/abs/2506.15734) Peiyuan Tang, Haojie Xin, Xiaodong Zhang, Jun Sun, Qin Xia, Zijiang Yang @@ -7657,117 +7657,117 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu # 2025-06-14 -+ [MEraser: An Effective Fingerprint Erasure Approach for Large Language Models](https://arxiv.org//abs/2506.12551) ++ [MEraser: An Effective Fingerprint Erasure Approach for Large Language Models](https://arxiv.org/abs/2506.12551) Jingxuan Zhang, Zhenhua Xu, Rui Hu, Wenpeng Xing, Xuhong Zhang, Meng Han -+ [Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models](https://arxiv.org//abs/2506.12340) ++ [Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models](https://arxiv.org/abs/2506.12340) Zongyu Wu, Minhua Lin, Zhiwei Zhang, Fali Wang, 
Xianren Zhang, Xiang Zhang, Suhang Wang -+ [Restoring Gaussian Blurred Face Images for Deanonymization Attacks](https://arxiv.org//abs/2506.12344) ++ [Restoring Gaussian Blurred Face Images for Deanonymization Attacks](https://arxiv.org/abs/2506.12344) Haoyu Zhai, Shuo Wang, Pirouz Naghavi, Qingying Hao, Gang Wang -+ [InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning](https://arxiv.org//abs/2506.12411) ++ [InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning](https://arxiv.org/abs/2506.12411) Mengyuan Sun, Yu Li, Yuchen Liu, Bo Du, Yunjie Ge -+ [Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025](https://arxiv.org//abs/2506.12430) ++ [Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025](https://arxiv.org/abs/2506.12430) Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, Xiangyu Yue, Zonglei Jing, Tianyuan Zhang, Zhilei Zhu, Aishan Liu, Jiakai Wang, Siyuan Liang, Xianglong Kong, Hainan Li, Junjie Mu, Haotong Qin, Yue Yu, Lei Chen, Felix Juefei-Xu, Qing Guo, Xinyun Chen, Yew Soon Ong, Xianglong Liu, Dawn Song, Alan Yuille, Philip Torr, Dacheng Tao -+ [Existence of Adversarial Examples for Random Convolutional Networks via Isoperimetric Inequalities on $\mathbb{so}(d)$](https://arxiv.org//abs/2506.12613) ++ [Existence of Adversarial Examples for Random Convolutional Networks via Isoperimetric Inequalities on $\mathbb{so}(d)$](https://arxiv.org/abs/2506.12613) Amit Daniely -+ [On the existence of consistent adversarial attacks in high-dimensional linear classification](https://arxiv.org//abs/2506.12454) ++ [On the existence of consistent adversarial attacks in high-dimensional linear classification](https://arxiv.org/abs/2506.12454) Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro -+ [Information-theoretic Estimation of the Risk of Privacy Leaks](https://arxiv.org//abs/2506.12328) ++ [Information-theoretic Estimation of the Risk of Privacy Leaks](https://arxiv.org/abs/2506.12328) Kenneth Odoh -+ [Exploiting AI for Attacks: On the Interplay between Adversarial AI and Offensive AI](https://arxiv.org//abs/2506.12519) ++ [Exploiting AI for Attacks: On the Interplay between Adversarial AI and Offensive AI](https://arxiv.org/abs/2506.12519) Saskia Laura Schröer, Luca Pajola, Alberto Castagnaro, Giovanni Apruzzese, Mauro Conti -+ [When Forgetting Triggers Backdoors: A Clean Unlearning Attack](https://arxiv.org//abs/2506.12522) ++ [When Forgetting Triggers Backdoors: A Clean Unlearning Attack](https://arxiv.org/abs/2506.12522) Marco Arazzi, Antonino Nocera, Vinod P -+ [Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models](https://arxiv.org//abs/2506.17279) ++ [Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models](https://arxiv.org/abs/2506.17279) Yash Sinha, Manit Baser, Murari Mandal, Dinil Mon Divakaran, Mohan Kankanhalli -+ [Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding](https://arxiv.org//abs/2506.12336) ++ [Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding](https://arxiv.org/abs/2506.12336) Youze Wang, Zijun Chen, Ruoyu Chen, 
Shishen Gu, Wenbo Hu, Jiayang Liu, Yinpeng Dong, Hang Su, Jun Zhu, Meng Wang, Richang Hong # 2025-06-13 -+ [LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model](https://arxiv.org//abs/2506.11402) ++ [LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model](https://arxiv.org/abs/2506.11402) Pradyut Sekhsaria, Marcel Mateos Salles, Hai Huang, Randall Balestriero -+ [Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models](https://arxiv.org//abs/2506.11521) ++ [Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models](https://arxiv.org/abs/2506.11521) Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li -+ [Differential Privacy in Machine Learning: From Symbolic AI to LLMs](https://arxiv.org//abs/2506.11687) ++ [Differential Privacy in Machine Learning: From Symbolic AI to LLMs](https://arxiv.org/abs/2506.11687) Francisco Aguilera-Martínez, Fernando Berzal -+ [TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks](https://arxiv.org//abs/2506.11844) ++ [TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks](https://arxiv.org/abs/2506.11844) Qihai Zhang, Xinyue Sheng, Yuanfu Sun, Qiaoyu Tan -+ [Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices](https://arxiv.org//abs/2506.11892) ++ [Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices](https://arxiv.org/abs/2506.11892) Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Guisheng Liao, Basil AsSadhan, Fabio Roli -+ [A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification](https://arxiv.org//abs/2506.11901) ++ [A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification](https://arxiv.org/abs/2506.11901) Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Fabio Roli -+ [Improving Large Language Model Safety with Contrastive Representation Learning](https://arxiv.org//abs/2506.11938) ++ [Improving Large Language Model Safety with Contrastive Representation Learning](https://arxiv.org/abs/2506.11938) Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin -+ [Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs](https://arxiv.org//abs/2506.11415) ++ [Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs](https://arxiv.org/abs/2506.11415) Linlin Wang, Tianqing Zhu, Laiqiao Qin, Longxiang Gao, Wanlei Zhou -+ [On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving](https://arxiv.org//abs/2506.11472) ++ [On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving](https://arxiv.org/abs/2506.11472) Pedram MohajerAnsari, Amir Salarpour, Michael Kühr, Siyu Huang, Mohammad Hamad, Sebastian Steinhorst, Habeeb Olufowobi, Mert D. 
Pesé -+ [Byzantine Outside, Curious Inside: Reconstructing Data Through Malicious Updates](https://arxiv.org//abs/2506.11413) ++ [Byzantine Outside, Curious Inside: Reconstructing Data Through Malicious Updates](https://arxiv.org/abs/2506.11413) Kai Yue, Richeng Jin, Chau-Wai Wong, Huaiyu Dai -+ [KCES: Training-Free Defense for Robust Graph Neural Networks via Kernel Complexity](https://arxiv.org//abs/2506.11611) ++ [KCES: Training-Free Defense for Robust Graph Neural Networks via Kernel Complexity](https://arxiv.org/abs/2506.11611) Yaning Jia, Shenyang Deng, Chiyu Ma, Yaoqing Yang, Soroush Vosoughi -+ [InfoFlood: Jailbreaking Large Language Models with Information Overload](https://arxiv.org//abs/2506.12274) ++ [InfoFlood: Jailbreaking Large Language Models with Information Overload](https://arxiv.org/abs/2506.12274) Advait Yadav, Haibo Jin, Man Luo, Jun Zhuang, Haohan Wang -+ [EgoPrivacy: What Your First-Person Camera Says About You?](https://arxiv.org//abs/2506.12258) ++ [EgoPrivacy: What Your First-Person Camera Says About You?](https://arxiv.org/abs/2506.12258) Yijiang Li, Genpei Zhang, Jiacheng Cheng, Yi Li, Xiaojun Shan, Dashan Gao, Jiancheng Lyu, Yuan Li, Ning Bi, Nuno Vasconcelos -+ [Vision Transformer with Adversarial Indicator Token against Adversarial Attacks in Radio Signal Classifications](https://arxiv.org//abs/2507.00015) ++ [Vision Transformer with Adversarial Indicator Token against Adversarial Attacks in Radio Signal Classifications](https://arxiv.org/abs/2507.00015) Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Guisheng Liao, Xuekang Liu, Fabio Roli, Carsten Maple -+ [DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents](https://arxiv.org//abs/2506.12104) ++ [DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents](https://arxiv.org/abs/2506.12104) Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, Chaowei Xiao # 2025-06-12 -+ [Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning](https://arxiv.org//abs/2506.11172) ++ [Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning](https://arxiv.org/abs/2506.11172) Xue Zhou, Dapeng Man, Chen Xu, Fanyi Zeng, Tao Liu, Huan Wang, Shucheng He, Chaoyang Gao, Wu Yang @@ -7827,104 +7827,104 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Xiaobei Yan, Han Qiu, Tianwei Zhang -+ [Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors](https://arxiv.org//abs/2506.10949) ++ [Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors](https://arxiv.org/abs/2506.10949) Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He -+ [Can We Infer Confidential Properties of Training Data from LLMs?](https://arxiv.org//abs/2506.10364) ++ [Can We Infer Confidential Properties of Training Data from LLMs?](https://arxiv.org/abs/2506.10364) Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri -+ [Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework](https://arxiv.org//abs/2506.10685) ++ [Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework](https://arxiv.org/abs/2506.10685) Xia Du, Xiaoyuan Liu, Jizhe Zhou, Zheng Lin, Chi-man Pun, Cong Wu, Tao Li, Zhe Chen, Wei Ni, Jun Luo -+ [Distributionally-Constrained Adversaries in Online Learning](https://arxiv.org//abs/2506.10293) ++ [Distributionally-Constrained Adversaries in Online 
Learning](https://arxiv.org/abs/2506.10293) Moïse Blanchard, Samory Kpotufe -+ [Hierarchical Adversarially-Resilient Multi-Agent Reinforcement Learning for Cyber-Physical Systems Security](https://arxiv.org//abs/2506.22445) ++ [Hierarchical Adversarially-Resilient Multi-Agent Reinforcement Learning for Cyber-Physical Systems Security](https://arxiv.org/abs/2506.22445) Saad Alqithami -+ [Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Weighted Intermediate Feature Divergence](https://arxiv.org//abs/2506.10459) ++ [Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Weighted Intermediate Feature Divergence](https://arxiv.org/abs/2506.10459) Chun Liu, Bingqian Zhu, Tao Xu, Zheng Zheng, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang -+ [SoK: Evaluating Jailbreak Guardrails for Large Language Models](https://arxiv.org//abs/2506.10597) ++ [SoK: Evaluating Jailbreak Guardrails for Large Language Models](https://arxiv.org/abs/2506.10597) Xunguang Wang, Zhenlan Ji, Wenxuan Wang, Zongjie Li, Daoyuan Wu, Shuai Wang # 2025-06-11 -+ [Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models](https://arxiv.org//abs/2506.09408) ++ [Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models](https://arxiv.org/abs/2506.09408) Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su -+ [Effective Red-Teaming of Policy-Adherent Agents](https://arxiv.org//abs/2506.09600) ++ [Effective Red-Teaming of Policy-Adherent Agents](https://arxiv.org/abs/2506.09600) Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor -+ [Reasoning Models Are More Easily Gaslighted Than You Think](https://arxiv.org//abs/2506.09677) ++ [Reasoning Models Are More Easily Gaslighted Than You Think](https://arxiv.org/abs/2506.09677) Bin Zhu, Hailong Yin, Jingjing Chen, Yu-Gang Jiang -+ [Inverting Black-Box Face Recognition Systems via Zero-Order Optimization in Eigenface Space](https://arxiv.org//abs/2506.09777) ++ [Inverting Black-Box Face Recognition Systems via Zero-Order Optimization in Eigenface Space](https://arxiv.org/abs/2506.09777) Anton Razzhigaev, Matvey Mikhalchuk, Klim Kireev, Igor Udovichenko, Andrey Kuznetsov, Aleksandr Petiushko -+ [LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge](https://arxiv.org//abs/2506.09956) ++ [LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge](https://arxiv.org/abs/2506.09956) Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin -+ [Memorization in Language Models through the Lens of Intrinsic Dimension](https://arxiv.org//abs/2506.09591) ++ [Memorization in Language Models through the Lens of Intrinsic Dimension](https://arxiv.org/abs/2506.09591) Stefan Arnold -+ [You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks](https://arxiv.org//abs/2506.09521) ++ [You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks](https://arxiv.org/abs/2506.09521) Ünal Ege Gaznepoglu, Anna 
Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters -+ [AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant T2I Adversarial Patches](https://arxiv.org//abs/2506.09538) ++ [AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant T2I Adversarial Patches](https://arxiv.org/abs/2506.09538) Wenjun Ji, Yuxiang Fu, Luyang Ying, Deng-Ping Fan, Yuyi Wang, Ming-Ming Cheng, Ivor Tsang, Qing Guo -+ [DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt](https://arxiv.org//abs/2506.09353) ++ [DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt](https://arxiv.org/abs/2506.09353) Yitong Zhang, Jia Li, Liyi Cai, Ge Li -+ [Canonical Latent Representations in Conditional Diffusion Models](https://arxiv.org//abs/2506.09955) ++ [Canonical Latent Representations in Conditional Diffusion Models](https://arxiv.org/abs/2506.09955) Yitao Xu, Tong Zhang, Ehsan Pajouheshgar, Sabine Süsstrunk -+ [Adversarial Surrogate Risk Bounds for Binary Classification](https://arxiv.org//abs/2506.09348) ++ [Adversarial Surrogate Risk Bounds for Binary Classification](https://arxiv.org/abs/2506.09348) Natalie S. Frank -+ [In-Context Bias Propagation in LLM-Based Tabular Data Generation](https://arxiv.org//abs/2506.09630) ++ [In-Context Bias Propagation in LLM-Based Tabular Data Generation](https://arxiv.org/abs/2506.09630) Pol G.Recasens, Alberto Gutierrez, Jordi Torres, Josep.Ll Berral, Anisa Halimi, Kieran Fraser -+ [Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols](https://arxiv.org//abs/2506.09803) ++ [Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols](https://arxiv.org/abs/2506.09803) Longzhu He, Chaozhuo Li, Peng Tang, Litian Zhang, Sen Su -+ [A look at adversarial attacks on radio waveforms from discrete latent space](https://arxiv.org//abs/2506.09896) ++ [A look at adversarial attacks on radio waveforms from discrete latent space](https://arxiv.org/abs/2506.09896) Attanasia Garuso, Silvija Kokalj-Filipovic, Yagna Kaasaragadda -+ [Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning](https://arxiv.org//abs/2506.09923) ++ [Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning](https://arxiv.org/abs/2506.09923) Liou Tang, James Joshi, Ashish Kundu -+ [TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning](https://arxiv.org//abs/2506.09562) ++ [TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning](https://arxiv.org/abs/2506.09562) Songze Li, Mingxuan Zhang, Oubo Ma, Kang Wei, Shouling Ji -+ [Evasion Attacks Against Bayesian Predictive Models](https://arxiv.org//abs/2506.09640) ++ [Evasion Attacks Against Bayesian Predictive Models](https://arxiv.org/abs/2506.09640) Pablo G. 
Arce, Roi Naveiro, David Ríos Insua -+ [LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge](https://arxiv.org//abs/2506.09443) ++ [LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge](https://arxiv.org/abs/2506.09443) Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji @@ -7944,15 +7944,15 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Songze Li, Mingxuan Zhang, Kang Wei, Shouling Ji -+ [Disclosure Audits for LLM Agents](https://arxiv.org//abs/2506.10171) ++ [Disclosure Audits for LLM Agents](https://arxiv.org/abs/2506.10171) Saswat Das, Jameson Sandler, Ferdinando Fioretto -+ [Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods](https://arxiv.org//abs/2506.10236) ++ [Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods](https://arxiv.org/abs/2506.10236) Yeonwoo Jang, Shariqah Hossain, Ashwin Sreevatsa, Diogo Cruz -+ [Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation](https://arxiv.org//abs/2506.09736) ++ [Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation](https://arxiv.org/abs/2506.09736) Yuting Li, Lai Wei, Kaipeng Zheng, Jingyuan Huang, Guilin Li, Bo Wang, Linghe Kong, Lichao Sun, Weiran Huang @@ -7961,67 +7961,67 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Natalie S. Frank # 2025-06-10 -+ [Single-Node Trigger Backdoor Attacks in Graph-Based Recommendation Systems](https://arxiv.org//abs/2506.08401) ++ [Single-Node Trigger Backdoor Attacks in Graph-Based Recommendation Systems](https://arxiv.org/abs/2506.08401) Runze Li, Di Jin, Xiaobao Wang, Dongxiao He, Bingdao Feng, Zhen Wang -+ [Your Agent Can Defend Itself against Backdoor Attacks](https://arxiv.org//abs/2506.08336) ++ [Your Agent Can Defend Itself against Backdoor Attacks](https://arxiv.org/abs/2506.08336) Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting -+ [SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models](https://arxiv.org//abs/2506.08346) ++ [SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models](https://arxiv.org/abs/2506.08346) Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen -+ [WGLE:Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networks](https://arxiv.org//abs/2506.08602) ++ [WGLE:Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networks](https://arxiv.org/abs/2506.08602) Tingzhi Li, Xuefeng Liu -+ [Towards Robust Deep Reinforcement Learning against Environmental State Perturbation](https://arxiv.org//abs/2506.08961) ++ [Towards Robust Deep Reinforcement Learning against Environmental State Perturbation](https://arxiv.org/abs/2506.08961) Chenxu Wang, Huaping Liu -+ [CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling](https://arxiv.org//abs/2506.08584) ++ [CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling](https://arxiv.org/abs/2506.08584) Yahan Li, Jifan Yao, John Bosco S. Bunyi, Adam C. 
Frank, Angel Hwang, Ruishan Liu -+ [AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)](https://arxiv.org//abs/2506.08885) ++ [AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)](https://arxiv.org/abs/2506.08885) Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das -+ [Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation](https://arxiv.org//abs/2506.08611) ++ [Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation](https://arxiv.org/abs/2506.08611) Shiji Zhao, Chi Chen, Ranjie Duan, Xizhe Wang, Xingxing Wei -+ [Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings](https://arxiv.org//abs/2506.08435) ++ [Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings](https://arxiv.org/abs/2506.08435) Mingyuan Fan, Fuyi Wang, Cen Chen, Jianying Zhou -+ [DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training](https://arxiv.org//abs/2506.08514) ++ [DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training](https://arxiv.org/abs/2506.08514) Jacob Piland, Chris Sweet, Adam Czakja -+ [Design Patterns for Securing LLM Agents against Prompt Injections](https://arxiv.org//abs/2506.08837) ++ [Design Patterns for Securing LLM Agents against Prompt Injections](https://arxiv.org/abs/2506.08837) Luca Beurer-Kellner, Beat Buesser Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn -+ [GPS Spoofing Attacks on AI-based Navigation Systems with Obstacle Avoidance in UAV](https://arxiv.org//abs/2506.08445) ++ [GPS Spoofing Attacks on AI-based Navigation Systems with Obstacle Avoidance in UAV](https://arxiv.org/abs/2506.08445) Ji Hyuk Jung, Mi Yeon Hong, Ji Won Yoon -+ [One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World](https://arxiv.org//abs/2506.08482) ++ [One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World](https://arxiv.org/abs/2506.08482) Xingshuo Han, Chen Ling, Shiyi Yao, Haozhao Wang, Hangcheng Liu, Yutong Wu, Shengmin Xu, Changhai Ou, Xinyi Huang, Tianwei Zhang -+ [Adversarial Text Generation with Dynamic Contextual Perturbation](https://arxiv.org//abs/2506.09148) ++ [Adversarial Text Generation with Dynamic Contextual Perturbation](https://arxiv.org/abs/2506.09148) Hetvi Waghela, Jaydip Sen, Sneha Rakshit, Subhasis Dasgupta -+ [PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies](https://arxiv.org//abs/2506.09237) ++ [PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies](https://arxiv.org/abs/2506.09237) Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban -+ [ASRJam: Human-Friendly AI Speech Jamming to Prevent Automated Phone Scams](https://arxiv.org//abs/2506.11125) ++ [ASRJam: Human-Friendly AI Speech Jamming to Prevent Automated Phone Scams](https://arxiv.org/abs/2506.11125) Freddie Grabovski, Gilad Gressel, Yisroel Mirsky @@ -8029,92 
+8029,92 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Rafaël Nouailles (GdR) -+ [Towards Cross-Subject EMG Pattern Recognition via Dual-Branch Adversarial Feature Disentanglement](https://arxiv.org//abs/2506.08555) ++ [Towards Cross-Subject EMG Pattern Recognition via Dual-Branch Adversarial Feature Disentanglement](https://arxiv.org/abs/2506.08555) Xinyue Niu, Akira Furui -+ [Does Multimodal Large Language Model Truly Unlearn? Stealthy MLLM Unlearning Attack](https://arxiv.org//abs/2506.17265) ++ [Does Multimodal Large Language Model Truly Unlearn? Stealthy MLLM Unlearning Attack](https://arxiv.org/abs/2506.17265) Xianren Zhang, Hui Liu, Delvin Ce Zhang, Xianfeng Tang, Qi He, Dongwon Lee, Suhang Wang # 2025-06-09 -+ [HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model](https://arxiv.org//abs/2506.07428) ++ [HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model](https://arxiv.org/abs/2506.07428) Yuling Wang, Zihui Chen, Pengfei Jiao, Xiao Wang -+ [RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards](https://arxiv.org//abs/2506.07736) ++ [RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards](https://arxiv.org/abs/2506.07736) Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua -+ [JavelinGuard: Low-Cost Transformer Architectures for LLM Security](https://arxiv.org//abs/2506.07330) ++ [JavelinGuard: Low-Cost Transformer Architectures for LLM Security](https://arxiv.org/abs/2506.07330) Yash Datta, Sharath Rajasekar -+ [MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems](https://arxiv.org//abs/2506.07399) ++ [MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems](https://arxiv.org/abs/2506.07399) Peiru Yang, Jinhua Yin, Haoran Zheng, Xueying Bai, Huili Wang, Yufei Sun, Xintian Li, Shangguang Wang, Yongfeng Huang, Tao Qi -+ [When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment](https://arxiv.org//abs/2506.07452) ++ [When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment](https://arxiv.org/abs/2506.07452) Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Suriyakumar, Marzyeh Ghassemi -+ [Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability](https://arxiv.org//abs/2506.07804) ++ [Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability](https://arxiv.org/abs/2506.07804) Jie Bao, Chuangyin Dang, Rui Luo, Hanwei Zhang, Zhixin Zhou -+ [Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models](https://arxiv.org//abs/2506.07645) ++ [Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models](https://arxiv.org/abs/2506.07645) Maciej Chrabąszcz, Katarzyna Lorenc, Karolina Seweryn -+ [Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures](https://arxiv.org//abs/2506.07402) ++ [Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures](https://arxiv.org/abs/2506.07402) Yukai Zhou, Sibei Yang, Wenjie Wang -+ [Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models](https://arxiv.org//abs/2506.07468) ++ [Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models](https://arxiv.org/abs/2506.07468) Mickel Liu, Liwei 
Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques -+ [Explore the vulnerability of black-box models via diffusion models](https://arxiv.org//abs/2506.07590) ++ [Explore the vulnerability of black-box models via diffusion models](https://arxiv.org/abs/2506.07590) Jiacheng Shi, Yanfu Zhang, Huajie Shao, Ashley Gao -+ [Circumventing Backdoor Space via Weight Symmetry](https://arxiv.org//abs/2506.07467) ++ [Circumventing Backdoor Space via Weight Symmetry](https://arxiv.org/abs/2506.07467) Jie Peng, Hongwei Yang, Jing Zhao, Hengji Dong, Hui He, Weizhe Zhang, Haoyu He -+ [TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts](https://arxiv.org//abs/2506.07596) ++ [TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts](https://arxiv.org/abs/2506.07596) Torsten Krauß, Hamid Dashtbani, Alexandra Dmitrienko -+ [ProARD: progressive adversarial robustness distillation: provide wide range of robust students](https://arxiv.org//abs/2506.07666) ++ [ProARD: progressive adversarial robustness distillation: provide wide range of robust students](https://arxiv.org/abs/2506.07666) Seyedhamidreza Mousavi, Seyedali Mousavi, Masoud Daneshtalab -+ [TokenBreak: Bypassing Text Classification Models Through Token Manipulation](https://arxiv.org//abs/2506.07948) ++ [TokenBreak: Bypassing Text Classification Models Through Token Manipulation](https://arxiv.org/abs/2506.07948) Kasimir Schulz, Kenneth Yeung, Kieran Evans -+ [TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems](https://arxiv.org//abs/2506.07605) ++ [TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems](https://arxiv.org/abs/2506.07605) Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati -+ [SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark](https://arxiv.org//abs/2506.07888) ++ [SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark](https://arxiv.org/abs/2506.07888) Rui Wen, Yiyong Liu, Michael Backes, Yang Zhang -+ [Secure Distributed Learning for CAVs: Defending Against Gradient Leakage with Leveled Homomorphic Encryption](https://arxiv.org//abs/2506.07894) ++ [Secure Distributed Learning for CAVs: Defending Against Gradient Leakage with Leveled Homomorphic Encryption](https://arxiv.org/abs/2506.07894) Muhammad Ali Najjar, Ren-Yi Huang, Dumindu Samaraweera, Prashant Shekhar -+ [Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework](https://arxiv.org//abs/2506.08185) ++ [Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework](https://arxiv.org/abs/2506.08185) Huixin Zhan, Jason H. 
Moore -+ [SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense](https://arxiv.org//abs/2506.08255) ++ [SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense](https://arxiv.org/abs/2506.08255) Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek -+ [GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors](https://arxiv.org//abs/2506.08188) ++ [GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors](https://arxiv.org/abs/2506.08188) Wenlong Meng, Shuguo Fan, Chengkun Wei, Min Chen, Yuwei Li, Yuanchao Zhang, Zhikun Zhang, Wenzhi Chen @@ -8130,11 +8130,11 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati -+ [QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA](https://arxiv.org//abs/2506.08123) ++ [QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA](https://arxiv.org/abs/2506.08123) Jacob Dineen (1), Aswin RRV (1), Qin Liu (2), Zhikun Xu (1), Xiao Ye (1), Ming Shen (1), Zhaonan Li (1), Shijie Lu (1), Chitta Baral (1), Muhao Chen (2), Ben Zhou (1) ((1) Arizona State University, (2) University of California Davis) -+ [InverseScope: Scalable Activation Inversion for Interpreting Large Language Models](https://arxiv.org//abs/2506.07406) ++ [InverseScope: Scalable Activation Inversion for Interpreting Large Language Models](https://arxiv.org/abs/2506.07406) Yifan Luo, Zhennan Zhou, Bin Dong @@ -8142,64 +8142,64 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Seokil Ham, Yubin Choi, Yujin Yang, Seungju Cho, Younghun Kim, Changick Kim -+ [TAI3: Testing Agent Integrity in Interpreting User Intent](https://arxiv.org//abs/2506.07524) ++ [TAI3: Testing Agent Integrity in Interpreting User Intent](https://arxiv.org/abs/2506.07524) Shiwei Feng, Xiangzhe Xu, Xuan Chen, Kaiyuan Zhang, Syed Yusuf Ahmed, Zian Su, Mingwei Zheng, Xiangyu Zhang # 2025-06-08 -+ [Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test](https://arxiv.org//abs/2506.06975) ++ [Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test](https://arxiv.org/abs/2506.06975) Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger -+ [AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint](https://arxiv.org//abs/2506.07022) ++ [AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint](https://arxiv.org/abs/2506.07022) Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua -+ [HauntAttack: When Attack Follows Reasoning as a Shadow](https://arxiv.org//abs/2506.07031) ++ [HauntAttack: When Attack Follows Reasoning as a Shadow](https://arxiv.org/abs/2506.07031) Jingyuan Ma, Rui Li, Zheng Li, Junfeng Liu, Lei Sha, Zhifang Sui -+ [Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models](https://arxiv.org//abs/2506.07121) ++ [Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models](https://arxiv.org/abs/2506.07121) Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian -+ [Break-The-Chain: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation](https://arxiv.org//abs/2506.06971) ++ [Break-The-Chain: Reasoning Failures in 
LLMs via Adversarial Prompting in Code Generation](https://arxiv.org/abs/2506.06971) Jaechul Roh, Varun Gandhi, Shivani Anilkumar, Arin Garg -+ [Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text](https://arxiv.org//abs/2506.07001) ++ [Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text](https://arxiv.org/abs/2506.07001) Yize Cheng, Vinu Sankar Sadasivan, Mehrdad Saberi, Shoumik Saha, Soheil Feizi -+ [Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization](https://arxiv.org//abs/2506.06992) ++ [Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization](https://arxiv.org/abs/2506.06992) Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao -+ [D2R: dual regularization loss with collaborative adversarial generation for model robustness](https://arxiv.org//abs/2506.07056) ++ [D2R: dual regularization loss with collaborative adversarial generation for model robustness](https://arxiv.org/abs/2506.07056) Zhenyu Liu, Huizhi Liang, Rajiv Ranjan, Zhanxing Zhu, Vaclav Snasel, Varun Ojha -+ [UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning](https://arxiv.org//abs/2506.07087) ++ [UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning](https://arxiv.org/abs/2506.07087) Weiqi Yan, Lvhai Chen, Huaijia Kou, Shengchuan Zhang, Yan Zhang, Liujuan Cao -+ [Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation](https://arxiv.org//abs/2506.07214) ++ [Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation](https://arxiv.org/abs/2506.07214) Zhiyuan Zhong, Zhen Sun, Yepang Liu, Xinlei He, Guanhong Tao -+ [PASS: Private Attributes Protection with Stochastic Data Substitution](https://arxiv.org//abs/2506.07308) ++ [PASS: Private Attributes Protection with Stochastic Data Substitution](https://arxiv.org/abs/2506.07308) Yizhuo Chen, Chun-Fu (Richard)Chen, Hsiang Hsu, Shaohan Hu, Tarek Abdelzaher -+ [Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations](https://arxiv.org//abs/2506.09067) ++ [Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations](https://arxiv.org/abs/2506.09067) Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani -+ [Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions](https://arxiv.org//abs/2506.11111) ++ [Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions](https://arxiv.org/abs/2506.11111) Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang -+ [Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks](https://arxiv.org//abs/2506.11113) ++ [Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks](https://arxiv.org/abs/2506.11113) Tzu-Ling Lin, Wei-Chih Chen, Teng-Fang Hsiao, Hou-I Liu, Ya-Hsin Yeh, Yu Kai Chan, Wen-Sheng Lien, Po-Yen Kuo, Philip S. 
++ [Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks](https://arxiv.org/abs/2506.11113)

Tzu-Ling Lin, Wei-Chih Chen, Teng-Fang Hsiao, Hou-I Liu, Ya-Hsin Yeh, Yu Kai Chan, Wen-Sheng Lien, Po-Yen Kuo, Philip S. Yu, Hong-Han Shuai

@@ -8207,7 +8207,7 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Jaechul Roh, Varun Gandhi, Shivani Anilkumar, Arin Garg

-+ [Towards Interpretable Adversarial Examples via Sparse Adversarial Attack](https://arxiv.org//abs/2506.17250)
++ [Towards Interpretable Adversarial Examples via Sparse Adversarial Attack](https://arxiv.org/abs/2506.17250)

Fudong Lin, Jiadong Lou, Hao Wang, Brian Jalaian, Xu Yuan

@@ -8220,35 +8220,35 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao

# 2025-06-07

-+ [Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry](https://arxiv.org//abs/2506.06933)
++ [Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry](https://arxiv.org/abs/2506.06933)

Mahdi Salmani, Alireza Abdollahpoorrostam, Seyed-Mohsen Moosavi-Dezfooli

-+ [KNN-Defense: Defense against 3D Adversarial Point Clouds using Nearest-Neighbor Search](https://arxiv.org//abs/2506.06906)
++ [KNN-Defense: Defense against 3D Adversarial Point Clouds using Nearest-Neighbor Search](https://arxiv.org/abs/2506.06906)

Nima Jamali, Matina Mahdizadeh Sani, Hanieh Naderi, Shohreh Kasaei

-+ [FREE: Fast and Robust Vision Language Models with Early Exits](https://arxiv.org//abs/2506.06884)
++ [FREE: Fast and Robust Vision Language Models with Early Exits](https://arxiv.org/abs/2506.06884)

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

-+ [Rescaled Influence Functions: Accurate Data Attribution in High Dimension](https://arxiv.org//abs/2506.06656)
++ [Rescaled Influence Functions: Accurate Data Attribution in High Dimension](https://arxiv.org/abs/2506.06656)

Ittai Rubinstein, Samuel B. Hopkins

-+ [Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?](https://arxiv.org//abs/2506.06891)
++ [Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?](https://arxiv.org/abs/2506.06891)

Paulius Sasnauskas, Yiğit Yalın, Goran Radanović

-+ [Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations](https://arxiv.org//abs/2506.06613)
++ [Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations](https://arxiv.org/abs/2506.06613)

Arefe Boushehrian, Amir Najafi

-+ [Stochastic Training for Side-Channel Resilient AI](https://arxiv.org//abs/2506.06597)
++ [Stochastic Training for Side-Channel Resilient AI](https://arxiv.org/abs/2506.06597)

Anuj Dubey, Aydin Aysu

-+ [LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning](https://arxiv.org//abs/2506.06742)
++ [LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning](https://arxiv.org/abs/2506.06742)

Zeyu Yan, Yifei Yao, Xuanbing Wen, Juli Zhang, Kai Fan

@@ -8261,196 +8261,196 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang

# 2025-06-06

-+ [To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt](https://arxiv.org//abs/2506.05739)
++ [To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt](https://arxiv.org/abs/2506.05739)

Zhilong Wang, Neha Nagaraja, Lan Zhang, Hayretdin Bahsi, Pawan Patil, Peng Liu

-+ [When Better Features Mean Greater Risks: The Performance-Privacy Trade-Off in Contrastive Learning](https://arxiv.org//abs/2506.05743)
++ [When Better Features Mean Greater Risks: The Performance-Privacy Trade-Off in Contrastive Learning](https://arxiv.org/abs/2506.05743)

Ruining Sun, Hongsheng Hu, Wei Luo, Zhaoxi Zhang, Yanjun Zhang, Haizhuan Yuan, Leo Yu Zhang

-+ [DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection](https://arxiv.org//abs/2506.05851)
++ [DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection](https://arxiv.org/abs/2506.05851)

Marcel Klemt, Carlotta Segna, Anna Rohrbach

-+ [Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution](https://arxiv.org//abs/2506.05937)
++ [Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution](https://arxiv.org/abs/2506.05937)

Charmaine Barker, Daniel Bethell, Simos Gerasimou

-+ [Hey, That's My Data! Label-Only Dataset Inference in Large Language Models](https://arxiv.org//abs/2506.06057)
++ [Hey, That's My Data! Label-Only Dataset Inference in Large Language Models](https://arxiv.org/abs/2506.06057)

Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado

-+ [Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models](https://arxiv.org//abs/2506.06060)
++ [Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models](https://arxiv.org/abs/2506.06060)

Yingqi Hu, Zhuo Zhang, Jingyuan Zhang, Lizhen Qu, Zenglin Xu

-+ [Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness](https://arxiv.org//abs/2506.06112)
++ [Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness](https://arxiv.org/abs/2506.06112)

Cheng-Long Wang, Qi Li, Zihang Xiang, Yinzhi Cao, Di Wang

-+ [Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems](https://arxiv.org//abs/2506.06151)
++ [Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems](https://arxiv.org/abs/2506.06151)

Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang

-+ [AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization](https://arxiv.org//abs/2506.06273)
++ [AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization](https://arxiv.org/abs/2506.06273)

Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown

-+ [MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks](https://arxiv.org//abs/2506.05982)
++ [MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks](https://arxiv.org/abs/2506.05982)

Zonglin Wu, Yule Xue, Xin Wei, Yiren Song

-+ [Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification](https://arxiv.org//abs/2506.06027)
++ [Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification](https://arxiv.org/abs/2506.06027)

Yuhao Sun, Jiacheng Zhang, Zesheng Ye, Chaowei Xiao, Feng Liu

-+ [What Really is a Member? Discrediting Membership Inference via Poisoning](https://arxiv.org//abs/2506.06003)
++ [What Really is a Member? Discrediting Membership Inference via Poisoning](https://arxiv.org/abs/2506.06003)

Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury

-+ [Synthetic Tabular Data: Methods, Attacks and Defenses](https://arxiv.org//abs/2506.06108)
++ [Synthetic Tabular Data: Methods, Attacks and Defenses](https://arxiv.org/abs/2506.06108)

Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

-+ [Stealix: Model Stealing via Prompt Evolution](https://arxiv.org//abs/2506.05867)
++ [Stealix: Model Stealing via Prompt Evolution](https://arxiv.org/abs/2506.05867)

Zhixiong Zhuang, Hui-Po Wang, Maria-Irina Nicolae, Mario Fritz

-+ [FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model](https://arxiv.org//abs/2506.05640)
++ [FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model](https://arxiv.org/abs/2506.05640)

Md Jueal Mia, M. Hadi Amini

-+ [SATversary: Adversarial Attacks on Satellite Fingerprinting](https://arxiv.org//abs/2506.06119)
++ [SATversary: Adversarial Attacks on Satellite Fingerprinting](https://arxiv.org/abs/2506.06119)

Joshua Smailes, Sebastian Köhler, Simon Birnbach, Martin Strohmeier, Ivan Martinovic

-+ [Benchmarking Misuse Mitigation Against Covert Adversaries](https://arxiv.org//abs/2506.06414)
++ [Benchmarking Misuse Mitigation Against Covert Adversaries](https://arxiv.org/abs/2506.06414)

Davis Brown, Mahdi Sabbaghi, Luze Sun, Alexander Robey, George J. Pappas, Eric Wong, Hamed Hassani

-+ [Securing Traffic Sign Recognition Systems in Autonomous Vehicles](https://arxiv.org//abs/2506.06563)
++ [Securing Traffic Sign Recognition Systems in Autonomous Vehicles](https://arxiv.org/abs/2506.06563)

Thushari Hapuarachchi, Long Dang, Kaiqi Xiong

-+ [Membership Inference Attacks for Unseen Classes](https://arxiv.org//abs/2506.06488)
++ [Membership Inference Attacks for Unseen Classes](https://arxiv.org/abs/2506.06488)

Pratiksha Thaker, Neil Kale, Zhiwei Steven Wu, Virginia Smith

-+ [SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks](https://arxiv.org//abs/2506.06556)
++ [SDN-Based False Data Detection With Its Mitigation and Machine Learning Robustness for In-Vehicle Networks](https://arxiv.org/abs/2506.06556)

Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Yi Li

-+ [A Systematic Review of Poisoning Attacks Against Large Language Models](https://arxiv.org//abs/2506.06518)
++ [A Systematic Review of Poisoning Attacks Against Large Language Models](https://arxiv.org/abs/2506.06518)

Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan Drenkow

-+ [Adapting Under Fire: Multi-Agent Reinforcement Learning for Adversarial Drift in Network Security](https://arxiv.org//abs/2506.06565)
++ [Adapting Under Fire: Multi-Agent Reinforcement Learning for Adversarial Drift in Network Security](https://arxiv.org/abs/2506.06565)

Emilia Rivas, Sabrina Saika, Ahtesham Bakht, Aritran Piplai, Nathaniel D. Bastian, Ankit Shah

-+ [A Certified Unlearning Approach without Access to Source Data](https://arxiv.org//abs/2506.06486)
++ [A Certified Unlearning Approach without Access to Source Data](https://arxiv.org/abs/2506.06486)

Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler

# 2025-06-05

-+ [Control Tax: The Price of Keeping AI in Check](https://arxiv.org//abs/2506.05296)
++ [Control Tax: The Price of Keeping AI in Check](https://arxiv.org/abs/2506.05296)

Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie

-+ [BESA: Boosting Encoder Stealing Attack with Perturbation Recovery](https://arxiv.org//abs/2506.04556)
++ [BESA: Boosting Encoder Stealing Attack with Perturbation Recovery](https://arxiv.org/abs/2506.04556)

Xuhao Ren, Haotian Liang, Yajie Wang, Chuan Zhang, Zehui Xiong, Liehuang Zhu

-+ [SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing](https://arxiv.org//abs/2506.04583)
++ [SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing](https://arxiv.org/abs/2506.04583)

Hongjun Liu, Yilun Zhao, Arman Cohan, Chen Zhao

-+ [Influence Functions for Edge Edits in Non-Convex Graph Neural Networks](https://arxiv.org//abs/2506.04694)
++ [Influence Functions for Edge Edits in Non-Convex Graph Neural Networks](https://arxiv.org/abs/2506.04694)

Jaeseung Heo, Kyeongheung Yun, Seokwon Yoon, MoonJeong Park, Jungseul Ok, Dongwoo Kim

-+ [Robustness as Architecture: Designing IQA Models to Withstand Adversarial Perturbations](https://arxiv.org//abs/2506.04951)
++ [Robustness as Architecture: Designing IQA Models to Withstand Adversarial Perturbations](https://arxiv.org/abs/2506.04951)

Igor Meleshin, Anna Chistyakova, Anastasia Antsiferova, Dmitriy Vatolin

-+ [Identifying and Understanding Cross-Class Features in Adversarial Training](https://arxiv.org//abs/2506.05032)
++ [Identifying and Understanding Cross-Class Features in Adversarial Training](https://arxiv.org/abs/2506.05032)

Zeming Wei, Yiwen Guo, Yisen Wang

-+ [Normative Conflicts and Shallow AI Alignment](https://arxiv.org//abs/2506.04679)
++ [Normative Conflicts and Shallow AI Alignment](https://arxiv.org/abs/2506.04679)

Raphaël Millière

-+ [Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies](https://arxiv.org//abs/2506.04887)
++ [Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies](https://arxiv.org/abs/2506.04887)

Wenxi Li

-+ [Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets](https://arxiv.org//abs/2506.05346)
++ [Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets](https://arxiv.org/abs/2506.05346)

Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang

-+ [SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs](https://arxiv.org//abs/2506.04743)
++ [SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs](https://arxiv.org/abs/2506.04743)

Shuhan Xu, Siyuan Liang, Hongling Zheng, Yong Luo, Aishan Liu, Dacheng Tao

-+ [Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors](https://arxiv.org//abs/2506.04823)
++ [Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors](https://arxiv.org/abs/2506.04823)

Svetlana Pavlitska, Jamie Robb, Nikolai Polley, Melih Yazgan, J. Marius Zöllner

-+ [Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking](https://arxiv.org//abs/2506.04879)
++ [Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking](https://arxiv.org/abs/2506.04879)

Yu-Feng Chen, Tzuhsuan Huang, Pin-Yen Chiu, Jun-Cheng Chen

-+ [Privacy Amplification Through Synthetic Data: Insights from Linear Regression](https://arxiv.org//abs/2506.05101)
++ [Privacy Amplification Through Synthetic Data: Insights from Linear Regression](https://arxiv.org/abs/2506.05101)

Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

-+ [Membership Inference Attacks on Sequence Models](https://arxiv.org//abs/2506.05126)
++ [Membership Inference Attacks on Sequence Models](https://arxiv.org/abs/2506.05126)

Lorenzo Rossi, Michael Aerni, Jie Zhang, Florian Tramèr

-+ [Coordinated Robustness Evaluation Framework for Vision-Language Models](https://arxiv.org//abs/2506.05429)
++ [Coordinated Robustness Evaluation Framework for Vision-Language Models](https://arxiv.org/abs/2506.05429)

Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar

-+ [Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models](https://arxiv.org//abs/2506.05430)
++ [Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models](https://arxiv.org/abs/2506.05430)

Mingjie Chen, Tiancheng Zhu, Mingxue Zhang, Yiling He, Minghao Lin, Penghui Li, Kui Ren

-+ [Robustness Evaluation for Video Models with Reinforcement Learning](https://arxiv.org//abs/2506.05431)
++ [Robustness Evaluation for Video Models with Reinforcement Learning](https://arxiv.org/abs/2506.05431)

Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar

-+ [Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks](https://arxiv.org//abs/2506.05434)
++ [Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks](https://arxiv.org/abs/2506.05434)

Thomas Massena, Léo andéol, Thibaut Boissin, Franck Mamalet, Corentin Friedrich, Mathieu Serrurier, Sébastien Gerchinovitz

-+ [Sentinel: SOTA model to protect against prompt injections](https://arxiv.org//abs/2506.05446)
++ [Sentinel: SOTA model to protect against prompt injections](https://arxiv.org/abs/2506.05446)

Dror Ivry, Oran Nahum

-+ [SoK: Are Watermarks in LLMs Ready for Deployment?](https://arxiv.org//abs/2506.05594)
++ [SoK: Are Watermarks in LLMs Ready for Deployment?](https://arxiv.org/abs/2506.05594)

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin, Abdallah Khreishah, My Thai

-+ [SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms](https://arxiv.org//abs/2506.05538)
++ [SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms](https://arxiv.org/abs/2506.05538)

Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta

-+ [TRIDENT -- A Three-Tier Privacy-Preserving Propaganda Detection Model in Mobile Networks using Transformers, Adversarial Learning, and Differential Privacy](https://arxiv.org//abs/2506.05421)
++ [TRIDENT -- A Three-Tier Privacy-Preserving Propaganda Detection Model in Mobile Networks using Transformers, Adversarial Learning, and Differential Privacy](https://arxiv.org/abs/2506.05421)

Al Nahian Bin Emran, Dhiman Goswami, Md Hasan Ullah Sadi, Sanchari Das

-+ [Breaking Anonymity at Scale: Re-identifying the Trajectories of 100K Real Users in Japan](https://arxiv.org//abs/2506.05611)
++ [Breaking Anonymity at Scale: Re-identifying the Trajectories of 100K Real Users in Japan](https://arxiv.org/abs/2506.05611)

Abhishek Kumar Mishra, Mathieu Cunche, Heber H. Arcolezi

-+ [Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering](https://arxiv.org//abs/2506.06384)
++ [Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering](https://arxiv.org/abs/2506.06384)

Yi Ji, Runzhi Li, Baolei Mao

-+ [Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images](https://arxiv.org//abs/2506.06389)
++ [Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images](https://arxiv.org/abs/2506.06389)

Rifat Sadik, Tanvir Rahman, Arpan Bhattacharjee, Bikash Chandra Halder, Ismail Hossain

@@ -8458,628 +8458,628 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Al Nahian Bin Emran, Dhiman Goswami, Md Hasan Ullah Sadi, Sanchari Das

-+ [RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation](https://arxiv.org//abs/2506.05070)
++ [RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation](https://arxiv.org/abs/2506.05070)

Tianjiao Li, Mengran Yu, Chenyu Shi, Yanjun Zhao, Xiaojing Liu, Qiang Zhang, Qi Zhang, Xuanjing Huang, Jiayin Wang

-+ [Towards Better Generalization via Distributional Input Projection Network](https://arxiv.org//abs/2506.04690)
++ [Towards Better Generalization via Distributional Input Projection Network](https://arxiv.org/abs/2506.04690)

Yifan Hao, Yanxin Lu, Hanning Zhang, Xinwei Shen, Tong Zhang

-+ [Beyond Per-Querier Budgets: Rigorous and Resilient Global Privacy Enforcement for the W3C Attribution API](https://arxiv.org//abs/2506.05290)
++ [Beyond Per-Querier Budgets: Rigorous and Resilient Global Privacy Enforcement for the W3C Attribution API](https://arxiv.org/abs/2506.05290)

Pierre Tholoniat, Alison Caulfield, Giorgio Cavicchioli, Mark Chen, Nikos Goutzoulias, Benjamin Case, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer, Martin Thomson

-+ [A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search](https://arxiv.org//abs/2506.05294)
++ [A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search](https://arxiv.org/abs/2506.05294)

Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy

# 2025-06-04

-+ [VLMs Can Aggregate Scattered Training Patches](https://arxiv.org//abs/2506.03614)
++ [VLMs Can Aggregate Scattered Training Patches](https://arxiv.org/abs/2506.03614)

Zhanhui Zhou, Lingjie Chen, Chao Yang, Chaochao Lu

-+ [Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks](https://arxiv.org//abs/2506.03627)
++ [Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks](https://arxiv.org/abs/2506.03627)

Lin Mu, Guowei Chu, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang

-+ [DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models](https://arxiv.org//abs/2506.03933)
++ [DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models](https://arxiv.org/abs/2506.03933)

Jia Fu, Yongtao Wu, Yihang Chen, Kunyu Peng, Xiao Zhang, Volkan Cevher, Sepideh Pashami, Anders Holst

-+ [Privacy and Security Threat for OpenAI GPTs](https://arxiv.org//abs/2506.04036)
++ [Privacy and Security Threat for OpenAI GPTs](https://arxiv.org/abs/2506.04036)

Wei Wenying, Zhao Kaifa, Xue Lei, Fan Ming

-+ [RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors](https://arxiv.org//abs/2506.03988)
++ [RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors](https://arxiv.org/abs/2506.03988)

Hicham Eddoubi, Jonas Ricker, Federico Cocchi, Lorenzo Baraldi, Angelo Sotgiu, Maura Pintor, Marcella Cornia, Lorenzo Baraldi, Asja Fischer, Rita Cucchiara, Battista Biggio

-+ [Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning](https://arxiv.org//abs/2506.03850)
++ [Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning](https://arxiv.org/abs/2506.03850)

Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong

-+ [Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets](https://arxiv.org//abs/2506.03870)
++ [Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets](https://arxiv.org/abs/2506.03870)

Mohd. Farhan Israk Soumik, Syed Mhamudul Hasan, Abdur R. Shahid

-+ [Prediction Inconsistency Helps Achieve Generalizable Detection of Adversarial Examples](https://arxiv.org//abs/2506.03765)
++ [Prediction Inconsistency Helps Achieve Generalizable Detection of Adversarial Examples](https://arxiv.org/abs/2506.03765)

Sicong Han, Chenhao Lin, Zhengyu Zhao, Xiyuan Wang, Xinlei He, Qian Li, Cong Wang, Qian Wang, Chao Shen

-+ [Through the Stealth Lens: Rethinking Attacks and Defenses in RAG](https://arxiv.org//abs/2506.04390)
++ [Through the Stealth Lens: Rethinking Attacks and Defenses in RAG](https://arxiv.org/abs/2506.04390)

Sarthak Choudhary, Nils Palumbo, Ashish Hooda, Krishnamurthy Dj Dvijotham, Somesh Jha

-+ [Is Perturbation-Based Image Protection Disruptive to Image Editing?](https://arxiv.org//abs/2506.04394)
++ [Is Perturbation-Based Image Protection Disruptive to Image Editing?](https://arxiv.org/abs/2506.04394)

Qiuyu Tang, Bonor Ayambem, Mooi Choo Chuah, Aparna Bharati

-+ [Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning](https://arxiv.org//abs/2506.04453)
++ [Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2506.04453)

Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler

-+ [Robust Anti-Backdoor Instruction Tuning in LVLMs](https://arxiv.org//abs/2506.05401)
++ [Robust Anti-Backdoor Instruction Tuning in LVLMs](https://arxiv.org/abs/2506.05401)

Yuan Xun, Siyuan Liang, Xiaojun Jia, Xinwei Liu, Xiaochun Cao

-+ [Sylva: Tailoring Personalized Adversarial Defense in Pre-trained Models via Collaborative Fine-tuning](https://arxiv.org//abs/2506.05402)
++ [Sylva: Tailoring Personalized Adversarial Defense in Pre-trained Models via Collaborative Fine-tuning](https://arxiv.org/abs/2506.05402)

Tianyu Qi, Lei Xue, Yufeng Zhan, Xiaobo Ma

-+ [Poisoning Behavioral-based Worker Selection in Mobile Crowdsensing using Generative Adversarial Networks](https://arxiv.org//abs/2506.05403)
++ [Poisoning Behavioral-based Worker Selection in Mobile Crowdsensing using Generative Adversarial Networks](https://arxiv.org/abs/2506.05403)

Ruba Nasser, Ahmed Alagha, Shakti Singh, Rabeb Mizouni, Hadi Otrok, Jamal Bentahar

-+ [RedDebate: Safer Responses through Multi-Agent Red Teaming Debates](https://arxiv.org//abs/2506.11083)
++ [RedDebate: Safer Responses through Multi-Agent Red Teaming Debates](https://arxiv.org/abs/2506.11083)

Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu

-+ [TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems](https://arxiv.org//abs/2506.04133)
++ [TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems](https://arxiv.org/abs/2506.04133)

Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis

-+ [macOSWorld: A Multilingual Interactive Benchmark for GUI Agents](https://arxiv.org//abs/2506.04135)
++ [macOSWorld: A Multilingual Interactive Benchmark for GUI Agents](https://arxiv.org/abs/2506.04135)

Pei Yang, Hai Ci, Mike Zheng Shou

# 2025-06-03

-+ [VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents](https://arxiv.org//abs/2506.02456)
++ [VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents](https://arxiv.org/abs/2506.02456)

Tri Cao, Bennett Lim, Yue Liu, Yuan Sui, Yuexin Li, Shumin Deng, Lin Lu, Nay Oo, Shuicheng Yan, Bryan Hooi

-+ [Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations](https://arxiv.org//abs/2506.02696)
++ [Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations](https://arxiv.org/abs/2506.02696)

Jinyuan Luo, Zhen Fang, Yixuan Li, Seongheon Park, Ling Chen

-+ [MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models](https://arxiv.org//abs/2506.02362)
++ [MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models](https://arxiv.org/abs/2506.02362)

Xueqi Cheng, Minxing Zheng, Shixiang Zhu, Yushun Dong

-+ [ATAG: AI-Agent Application Threat Assessment with Attack Graphs](https://arxiv.org//abs/2506.02859)
++ [ATAG: AI-Agent Application Threat Assessment with Attack Graphs](https://arxiv.org/abs/2506.02859)

Parth Atulbhai Gandhi, Akansha Shukla, David Tayouri, Beni Ifland, Yuval Elovici, Rami Puzis, Asaf Shabtai

-+ [How Explanations Leak the Decision Logic: Stealing Graph Neural Networks via Explanation Alignment](https://arxiv.org//abs/2506.03087)
++ [How Explanations Leak the Decision Logic: Stealing Graph Neural Networks via Explanation Alignment](https://arxiv.org/abs/2506.03087)

Bin Ma, Yuyuan Feng, Minhua Lin, Enyan Dai

-+ [Should LLM Safety Be More Than Refusing Harmful Instructions?](https://arxiv.org//abs/2506.02442)
++ [Should LLM Safety Be More Than Refusing Harmful Instructions?](https://arxiv.org/abs/2506.02442)

Utsav Maskey, Mark Dras, Usman Naseem

-+ [BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage](https://arxiv.org//abs/2506.02479)
++ [BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage](https://arxiv.org/abs/2506.02479)

Kalyan Nakka, Nitesh Saxena

-+ [Synthetic Iris Image Databases and Identity Leakage: Risks and Mitigation Strategies](https://arxiv.org//abs/2506.02626)
++ [Synthetic Iris Image Databases and Identity Leakage: Risks and Mitigation Strategies](https://arxiv.org/abs/2506.02626)

Ada Sawilska, Mateusz Trokielewicz

-+ [Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness](https://arxiv.org//abs/2506.03089)
++ [Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness](https://arxiv.org/abs/2506.03089)

Lucas Piper, Arlindo L. Oliveira, Tiago Marques

-+ [On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses](https://arxiv.org//abs/2506.02978)
++ [On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses](https://arxiv.org/abs/2506.02978)

Mohamed Djilani, Thibault Simonetto, Karim Tit, Florian Tambon, Paul Récamier, Salah Ghamizi, Maxime Cordy, Mike Papadakis

-+ [Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness](https://arxiv.org//abs/2506.03075)
++ [Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness](https://arxiv.org/abs/2506.03075)

Bogdan Chornomaz, Yonatan Koren, Shay Moran, Tom Waknine

-+ [On the Benefits of Accelerated Optimization in Robust and Private Estimation](https://arxiv.org//abs/2506.03044)
++ [On the Benefits of Accelerated Optimization in Robust and Private Estimation](https://arxiv.org/abs/2506.03044)

Laurentiu Andrei Marchis, Po-Ling Loh

-+ [Tarallo: Evading Behavioral Malware Detectors in the Problem Space](https://arxiv.org//abs/2506.02660)
++ [Tarallo: Evading Behavioral Malware Detectors in the Problem Space](https://arxiv.org/abs/2506.02660)

Gabriele Digregorio, Salvatore Maccarrone, Mario D'Onghia, Luigi Gallo, Michele Carminati, Mario Polino, Stefano Zanero

-+ [Poster: FedBlockParadox -- A Framework for Simulating and Securing Decentralized Federated Learning](https://arxiv.org//abs/2506.02679)
++ [Poster: FedBlockParadox -- A Framework for Simulating and Securing Decentralized Federated Learning](https://arxiv.org/abs/2506.02679)

Gabriele Digregorio, Francesco Bleggi, Federico Caroli, Michele Carminati, Stefano Zanero, Stefano Longari

-+ [Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack](https://arxiv.org//abs/2506.02711)
++ [Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack](https://arxiv.org/abs/2506.02711)

Jing Xue, Zhishen Sun, Haishan Ye, Luo Luo, Xiangyu Chang, Ivor Tsang, Guang Dai

-+ [Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows](https://arxiv.org//abs/2506.03332)
++ [Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows](https://arxiv.org/abs/2506.03332)

Yifei Ming, Zixuan Ke, Xuan-Phi Nguyen, Jiayu Wang, Shafiq Joty

-+ [BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF](https://arxiv.org//abs/2506.03234)
++ [BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF](https://arxiv.org/abs/2506.03234)

Kaiwen Duan, Hongwei Yao, Yufei Chen, Ziyun Li, Tong Qiao, Zhan Qin, Cong Wang

-+ [Adversarial Attacks on Robotic Vision Language Action Models](https://arxiv.org//abs/2506.03350)
++ [Adversarial Attacks on Robotic Vision Language Action Models](https://arxiv.org/abs/2506.03350)

Eliot Krzysztof Jones, Alexander Robey, Andy Zou, Zachary Ravichandran, George J. Pappas, Hamed Hassani, Matt Fredrikson, J. Zico Kolter

-+ [Robustness in Both Domains: CLIP Needs a Robust Text Encoder](https://arxiv.org//abs/2506.03355)
++ [Robustness in Both Domains: CLIP Needs a Robust Text Encoder](https://arxiv.org/abs/2506.03355)

Elias Abad Rocamora, Christian Schlarmann, Naman Deep Singh, Yongtao Wu, Matthias Hein, Volkan Cevher

-+ [Dynamic Epsilon Scheduling: A Multi-Factor Adaptive Perturbation Budget for Adversarial Training](https://arxiv.org//abs/2506.04263)
++ [Dynamic Epsilon Scheduling: A Multi-Factor Adaptive Perturbation Budget for Adversarial Training](https://arxiv.org/abs/2506.04263)

Alan Mitkiy, James Smith, Hana Satou, Hiroshi Tanaka, Emily Johnson, F Monkey

-+ [How stealthy is stealthy? Studying the Efficacy of Black-Box Adversarial Attacks in the Real World](https://arxiv.org//abs/2506.05382)
++ [How stealthy is stealthy? Studying the Efficacy of Black-Box Adversarial Attacks in the Real World](https://arxiv.org/abs/2506.05382)

Francesco Panebianco, Mario D'Onghia, Stefano Zanero, Michele Carminati

-+ [Attacking Attention of Foundation Models Disrupts Downstream Tasks](https://arxiv.org//abs/2506.05394)
++ [Attacking Attention of Foundation Models Disrupts Downstream Tasks](https://arxiv.org/abs/2506.05394)

Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari

# 2025-06-02

-+ [Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack](https://arxiv.org//abs/2506.01318)
++ [Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack](https://arxiv.org/abs/2506.01318)

SeungBum Ha, Saerom Park, Sung Whan Yoon

-+ [ReconXF: Graph Reconstruction Attack via Public Feature Explanations on Privatized Node Features and Labels](https://arxiv.org//abs/2506.02134)
++ [ReconXF: Graph Reconstruction Attack via Public Feature Explanations on Privatized Node Features and Labels](https://arxiv.org/abs/2506.02134)

Rishi Raj Sahoo, Rucha Bhalchandra Joshi, Subhankar Mishra

-+ [Mitigating Data Poisoning Attacks to Local Differential Privacy](https://arxiv.org//abs/2506.02156)
++ [Mitigating Data Poisoning Attacks to Local Differential Privacy](https://arxiv.org/abs/2506.02156)

Xiaolin Li, Ninghui Li, Boyang Wang, Wenhai Sun

-+ [Fingerprinting Deep Learning Models via Network Traffic Patterns in Federated Learning](https://arxiv.org//abs/2506.03207)
++ [Fingerprinting Deep Learning Models via Network Traffic Patterns in Federated Learning](https://arxiv.org/abs/2506.03207)

Md Nahid Hasan Shuvo, Moinul Hossain

-+ [Dirty and Clean-Label attack detection using GAN discriminators](https://arxiv.org//abs/2506.01224)
++ [Dirty and Clean-Label attack detection using GAN discriminators](https://arxiv.org/abs/2506.01224)

John W. Smutny

-+ [Comprehensive Vulnerability Analysis is Necessary for Trustworthy LLM-MAS](https://arxiv.org//abs/2506.01245)
++ [Comprehensive Vulnerability Analysis is Necessary for Trustworthy LLM-MAS](https://arxiv.org/abs/2506.01245)

Pengfei He, Yue Xing, Shen Dong, Juanhui Li, Zhenwei Dai, Xianfeng Tang, Hui Liu, Han Xu, Zhen Xiang, Charu C. Aggarwal, Hui Liu

-+ [Variance-Based Defense Against Blended Backdoor Attacks](https://arxiv.org//abs/2506.01444)
++ [Variance-Based Defense Against Blended Backdoor Attacks](https://arxiv.org/abs/2506.01444)

Sujeevan Aseervatham, Achraf Kerzazi, Younès Bennani

-+ [MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations](https://arxiv.org//abs/2506.01367)
++ [MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations](https://arxiv.org/abs/2506.01367)

Kensuke Mitsuzawa, Damien Garreau

-+ [Self-Refining Language Model Anonymizers via Adversarial Distillation](https://arxiv.org//abs/2506.01420)
++ [Self-Refining Language Model Anonymizers via Adversarial Distillation](https://arxiv.org/abs/2506.01420)

Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin

# 2025-06-01

-+ [CoP: Agentic Red-teaming for Large Language Models using Composition of Principles](https://arxiv.org//abs/2506.00781)
++ [CoP: Agentic Red-teaming for Large Language Models using Composition of Principles](https://arxiv.org/abs/2506.00781)

Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho

-+ [Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning](https://arxiv.org//abs/2506.00782)
++ [Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning](https://arxiv.org/abs/2506.00782)

Weiyang Guo, Zesheng Shi, Zhuo Li, Yequan Wang, Xuebo Liu, Wenya Wang, Fangming Liu, Min Zhang, Jing Li

-+ [Unlearning Inversion Attacks for Graph Neural Networks](https://arxiv.org//abs/2506.00808)
++ [Unlearning Inversion Attacks for Graph Neural Networks](https://arxiv.org/abs/2506.00808)

Jiahao Zhang, Yilong Wang, Zhiwei Zhang, Xiaorui Liu, Suhang Wang

-+ [SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models](https://arxiv.org//abs/2506.00821)
++ [SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models](https://arxiv.org/abs/2506.00821)

Huixin Zhan, Jason H. Moore

-+ [Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons](https://arxiv.org//abs/2506.00759)
++ [Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons](https://arxiv.org/abs/2506.00759)

Wenshuo Dong, Qingsong Yang, Shu Yang, Lijie Hu, Meng Ding, Wanyu Lin, Tianhang Zheng, Di Wang

-+ [CAPAA: Classifier-Agnostic Projector-Based Adversarial Attack](https://arxiv.org//abs/2506.00978)
++ [CAPAA: Classifier-Agnostic Projector-Based Adversarial Attack](https://arxiv.org/abs/2506.00978)

Zhan Li, Mingyu Zhao, Xin Dong, Haibin Ling, Bingyao Huang

-+ [Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs](https://arxiv.org//abs/2506.01064)
++ [Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs](https://arxiv.org/abs/2506.01064)

Yudong Zhang, Ruobing Xie, Yiqing Huang, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Di Wang, Yu Wang

# 2025-05-31

-+ [Monitoring Robustness and Individual Fairness](https://arxiv.org//abs/2506.00496)
++ [Monitoring Robustness and Individual Fairness](https://arxiv.org/abs/2506.00496)

Ashutosh Gupta, Thomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik, David Pape

-+ [The Security Threat of Compressed Projectors in Large Vision-Language Models](https://arxiv.org//abs/2506.00534)
++ [The Security Threat of Compressed Projectors in Large Vision-Language Models](https://arxiv.org/abs/2506.00534)

Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang

-+ [Bayesian Inference of Training Dataset Membership](https://arxiv.org//abs/2506.00701)
++ [Bayesian Inference of Training Dataset Membership](https://arxiv.org/abs/2506.00701)

Yongchao Huang

-+ [Spectral Insights into Data-Oblivious Critical Layers in Large Language Models](https://arxiv.org//abs/2506.00382)
++ [Spectral Insights into Data-Oblivious Critical Layers in Large Language Models](https://arxiv.org/abs/2506.00382)

Xuyuan Liu, Lei Hsiung, Yaoqing Yang, Yujun Yan

-+ [LoRA as a Flexible Framework for Securing Large Vision Systems](https://arxiv.org//abs/2506.00661)
++ [LoRA as a Flexible Framework for Securing Large Vision Systems](https://arxiv.org/abs/2506.00661)

Zander W. Blasingame, Richard E. Neddo, Chen Liu

-+ [Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol (MCP) Ecosystem](https://arxiv.org//abs/2506.02040)
++ [Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol (MCP) Ecosystem](https://arxiv.org/abs/2506.02040)

Hao Song, Yiming Shen, Wenxuan Luo, Leixin Guo, Ting Chen, Jiashui Wang, Beibei Li, Xiaosong Zhang, Jiachi Chen

# 2025-05-30

-+ [SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors](https://arxiv.org//abs/2505.24458)
++ [SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors](https://arxiv.org/abs/2505.24458)

Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi

-+ [The Butterfly Effect in Pathology: Exploring Security in Pathology Foundation Models](https://arxiv.org//abs/2505.24141)
++ [The Butterfly Effect in Pathology: Exploring Security in Pathology Foundation Models](https://arxiv.org/abs/2505.24141)

Jiashuai Liu, Yingjia Shang, Yingkang Zhan, Di Zhang, Yi Niu, Dong Wei, Xian Wu, Zeyu Gao, Chen Li, Yefeng Zheng

-+ [From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models](https://arxiv.org//abs/2505.24232)
++ [From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models](https://arxiv.org/abs/2505.24232)

Haibo Jin, Peiyan Zhang, Peiran Wang, Man Luo, Haohan Wang

-+ [An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring](https://arxiv.org//abs/2505.24239)
++ [An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring](https://arxiv.org/abs/2505.24239)

Sana Ebrahimi, Mohsen Dehghankar, Abolfazl Asudeh

-+ [Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings](https://arxiv.org//abs/2505.24341)
++ [Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings](https://arxiv.org/abs/2505.24341)

Shujian Yang, Shiyao Cui, Chuanrui Hu, Haicheng Wang, Tianwei Zhang, Minlie Huang, Jialiang Lu, Han Qiu

-+ [Adversarial Preference Learning for Robust LLM Alignment](https://arxiv.org//abs/2505.24369)
++ [Adversarial Preference Learning for Robust LLM Alignment](https://arxiv.org/abs/2505.24369)

Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, Yijun Niu, Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Mingchuan Yang

-+ [Breaking the Gold Standard: Extracting Forgotten Data under Exact Unlearning in Large Language Models](https://arxiv.org//abs/2505.24379)
++ [Breaking the Gold Standard: Extracting Forgotten Data under Exact Unlearning in Large Language Models](https://arxiv.org/abs/2505.24379)

Xiaoyu Wu, Yifei Pang, Terrance Liu, Zhiwei Steven Wu

-+ [Learning Safety Constraints for Large Language Models](https://arxiv.org//abs/2505.24445)
++ [Learning Safety Constraints for Large Language Models](https://arxiv.org/abs/2505.24445)

Xin Chen, Yarden As, Andreas Krause

-+ [Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors](https://arxiv.org//abs/2505.24523)
++ [Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors](https://arxiv.org/abs/2505.24523)

Andrea Pedrotti, Michele Papucci, Cristiano Ciaccio, Alessio Miaschi, Giovanni Puccetti, Felice Dell'Orletta, Andrea Esuli

-+ [A Flat Minima Perspective on Understanding Augmentations and Model Robustness](https://arxiv.org//abs/2505.24592)
++ [A Flat Minima Perspective on Understanding Augmentations and Model Robustness](https://arxiv.org/abs/2505.24592)

Weebum Yoo, Sung Whan Yoon

-+ [Model Unlearning via Sparse Autoencoder Subspace Guided Projections](https://arxiv.org//abs/2505.24428)
++ [Model Unlearning via Sparse Autoencoder Subspace Guided Projections](https://arxiv.org/abs/2505.24428)

Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou

-+ [AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders](https://arxiv.org//abs/2505.24519)
++ [AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders](https://arxiv.org/abs/2505.24519)

Yuqi Zhang, Yuchun Miao, Zuchao Li, Liang Ding

-+ [Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models](https://arxiv.org//abs/2505.24227)
++ [Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models](https://arxiv.org/abs/2505.24227)

Ying Yang, Jie Zhang, Xiao Lv, Di Lin, Tao Xiang, Qing Guo

-+ [Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing](https://arxiv.org//abs/2505.24402)
++ [Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing](https://arxiv.org/abs/2505.24402)

Mika Feng, Koichi Ito, Takafumi Aoki, Tetsushi Ohki, Masakatsu Nishigaki

-+ [Black-box Adversarial Attacks on CNN-based SLAM Algorithms](https://arxiv.org//abs/2505.24654)
++ [Black-box Adversarial Attacks on CNN-based SLAM Algorithms](https://arxiv.org/abs/2505.24654)

Maria Rafaela Gkeka, Bowen Sun, Evgenia Smirni, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas

-+ [PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches](https://arxiv.org//abs/2505.24703)
++ [PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches](https://arxiv.org/abs/2505.24703)

Dennis Jacob, Chong Xiang, Prateek Mittal

-+ [Practical Bayes-Optimal Membership Inference Attacks](https://arxiv.org//abs/2505.24089)
++ [Practical Bayes-Optimal Membership Inference Attacks](https://arxiv.org/abs/2505.24089)

Marcus Lassila, Johan Östman, Khac-Hoang Ngo, Alexandre Graell i Amat

-+ [Robust Federated Learning against Model Perturbation in Edge Networks](https://arxiv.org//abs/2505.24728)
++ [Robust Federated Learning against Model Perturbation in Edge Networks](https://arxiv.org/abs/2505.24728)

Dongzi Jin, Yong Xiao, Yingyu Li

-+ [ByzFL: Research Framework for Robust Federated Learning](https://arxiv.org//abs/2505.24802)
++ [ByzFL: Research Framework for Robust Federated Learning](https://arxiv.org/abs/2505.24802)

Marc González, Rachid Guerraoui, Rafael Pinot, Geovani Rizk, John Stephan, François Taïani

-+ [Cascading Adversarial Bias from Injection to Distillation in Language Models](https://arxiv.org//abs/2505.24842)
++ [Cascading Adversarial Bias from Injection to Distillation in Language Models](https://arxiv.org/abs/2505.24842)

Harsh Chaudhari, Jamie Hayes, Matthew Jagielski, Ilia Shumailov, Milad Nasr, Alina Oprea

-+ [COSMIC: Generalized Refusal Direction Identification in LLM Activations](https://arxiv.org//abs/2506.00085)
++ [COSMIC: Generalized Refusal Direction Identification in LLM Activations](https://arxiv.org/abs/2506.00085)

Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang

-+ [TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents](https://arxiv.org//abs/2506.00089)
++ [TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents](https://arxiv.org/abs/2506.00089)

Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han

-+ [Heterogeneous Graph Backdoor Attack](https://arxiv.org//abs/2506.00191)
++ [Heterogeneous Graph Backdoor Attack](https://arxiv.org/abs/2506.00191)

Jiawei Chen, Lusi Li, Daniel Takabi, Masha Sosonkina, Rui Ning

-+ [Adversarial Threat Vectors and Risk Mitigation for Retrieval-Augmented Generation Systems](https://arxiv.org//abs/2506.00281)
++ [Adversarial Threat Vectors and Risk Mitigation for Retrieval-Augmented Generation Systems](https://arxiv.org/abs/2506.00281)

Chris M. Ward, Josh Harguess

-+ [Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges](https://arxiv.org//abs/2506.02032)
++ [Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges](https://arxiv.org/abs/2506.02032)

Raj Patel, Himanshu Tripathi, Jasper Stone, Noorbakhsh Amiri Golilarz, Sudip Mittal, Shahram Rahimi, Vini Chaudhary

-+ [A Red Teaming Roadmap Towards System-Level Safety](https://arxiv.org//abs/2506.05376)
++ [A Red Teaming Roadmap Towards System-Level Safety](https://arxiv.org/abs/2506.05376)

Zifan Wang, Christina Q. Knight, Jeremy Kritz, Willow E. Primack, Julian Michael

-+ [An Independent Discriminant Network Towards Identification of Counterfeit Images and Videos](https://arxiv.org//abs/2506.05377)
++ [An Independent Discriminant Network Towards Identification of Counterfeit Images and Videos](https://arxiv.org/abs/2506.05377)

Shayantani Kar, B. Shresth Bhimrajka, Aditya Kumar, Sahil Gupta, Sourav Ghosh, Subhamita Mukherjee, Shauvik Paul

-+ [How much do language models memorize?](https://arxiv.org//abs/2505.24832)
++ [How much do language models memorize?](https://arxiv.org/abs/2505.24832)

John X. Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G. Edward Suh, Alexander M. Rush, Kamalika Chaudhuri, Saeed Mahloujifar

-+ [Shadow defense against gradient inversion attack in federated learning](https://arxiv.org//abs/2506.15711)
++ [Shadow defense against gradient inversion attack in federated learning](https://arxiv.org/abs/2506.15711)

Le Jiang, Liyan Ma, Guang Yang

# 2025-05-29

-+ [TRAP: Targeted Redirecting of Agentic Preferences](https://arxiv.org//abs/2505.23518)
++ [TRAP: Targeted Redirecting of Agentic Preferences](https://arxiv.org/abs/2505.23518)

Hangoo Kang, Jehyeok Yeon, Gagandeep Singh

-+ [SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents](https://arxiv.org//abs/2505.23559)
++ [SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents](https://arxiv.org/abs/2505.23559)

Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You

-+ [Fooling the Watchers: Breaking AIGC Detectors via Semantic Prompt Attacks](https://arxiv.org//abs/2505.23192)
++ [Fooling the Watchers: Breaking AIGC Detectors via Semantic Prompt Attacks](https://arxiv.org/abs/2505.23192)

Run Hao, Peng Ying

-+ [Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion](https://arxiv.org//abs/2505.23266)
++ [Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion](https://arxiv.org/abs/2505.23266)

Chunlong Xie, Jialing He, Shangwei Guo, Jiacheng Wang, Shudong Zhang, Tianwei Zhang, Tao Xiang

-+ [Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition](https://arxiv.org//abs/2505.23313)
++ [Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition](https://arxiv.org/abs/2505.23313)

Weizhe Kong, Xiao Wang, Ruichong Gao, Chenglong Li, Yu Zhang, Xing Yang, Yaowei Wang, Jin Tang

-+ [Keyed Chaotic Tensor Transformations for Secure And Attributable Neural Inference](https://arxiv.org//abs/2505.23655)
++ [Keyed Chaotic Tensor Transformations for Secure And Attributable Neural Inference](https://arxiv.org/abs/2505.23655)

Peter David Fagan

-+ [Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats](https://arxiv.org//abs/2505.23706)
++ [Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats](https://arxiv.org/abs/2505.23706)

Utku Demir, Yalin E. Sagduyu, Tugba Erpek, Hossein Jafari, Sastry Kompella, Mengran Xue

-+ [DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors](https://arxiv.org//abs/2505.23001)
++ [DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors](https://arxiv.org/abs/2505.23001)

Yize Cheng, Wenxiao Wang, Mazda Moayeri, Soheil Feizi

-+ [Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models](https://arxiv.org//abs/2505.23015)
++ [Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models](https://arxiv.org/abs/2505.23015)

Jinwen Chen, Hainan Zhang, Fei Sun, Qinnan Zhang, Sijia Wen, Ziwei Wang, Zhiming Zheng

-+ [Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models](https://arxiv.org//abs/2505.23404)
++ [Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models](https://arxiv.org/abs/2505.23404)

Mingyu Yu, Wei Wang, Yanjie Wei, Sujuan Qin

-+ [MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment](https://arxiv.org//abs/2505.23634)
++ [MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment](https://arxiv.org/abs/2505.23634)

John Halloran

-+ [Model Immunization from a Condition Number Perspective](https://arxiv.org//abs/2505.23760)
++ [Model Immunization from a Condition Number Perspective](https://arxiv.org/abs/2505.23760)

Amber Yijia Zheng, Cedar Site Bai, Brian Bullins, Raymond A. Yeh

-+ [Bayesian Perspective on Memorization and Reconstruction](https://arxiv.org//abs/2505.23658)
++ [Bayesian Perspective on Memorization and Reconstruction](https://arxiv.org/abs/2505.23658)

Haim Kaplan, Yishay Mansour, Kobbi Nissim, Uri Stemmer

-+ [Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models](https://arxiv.org//abs/2505.23561)
++ [Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models](https://arxiv.org/abs/2505.23561)

Zenghui Yuan, Yangming Xu, Jiawen Shi, Pan Zhou, Lichao Sun

-+ [Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention](https://arxiv.org//abs/2505.23968)
++ [Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention](https://arxiv.org/abs/2505.23968)

Stephan Rabanser, Ali Shahin Shamsabadi, Olive Franzese, Xiao Wang, Adrian Weller, Nicolas Papernot

-+ [LLM Agents Should Employ Security Principles](https://arxiv.org//abs/2505.24019)
++ [LLM Agents Should Employ Security Principles](https://arxiv.org/abs/2505.24019)

Kaiyuan Zhang, Zian Su, Pin-Yu Chen, Elisa Bertino, Xiangyu Zhang, Ninghui Li

-+ [Keyed Chaotic Masking: A Functional Privacy Framework for Neural Inference](https://arxiv.org//abs/2505.23655)
++ [Keyed Chaotic Masking: A Functional Privacy Framework for Neural Inference](https://arxiv.org/abs/2505.23655)

Peter David Fagan

-+ [NeuronTune: Towards Self-Guided Spurious Bias Mitigation](https://arxiv.org//abs/2505.24048)
++ [NeuronTune: Towards Self-Guided Spurious Bias Mitigation](https://arxiv.org/abs/2505.24048)

Guangtao Zheng, Wenqian Ye, Aidong Zhang

-+ [Can Emotion Fool Anti-spoofing?](https://arxiv.org//abs/2505.23962)
++ [Can Emotion Fool Anti-spoofing?](https://arxiv.org/abs/2505.23962)

Aurosweta Mahapatra, Ismail Rasim Ulgen, Abinay Reddy Naini, Carlos Busso, Berrak Sisman

-+ [Vid-SME: Membership Inference Attacks against Large Video Understanding Models](https://arxiv.org//abs/2506.03179)
++ [Vid-SME: Membership Inference Attacks against Large Video Understanding Models](https://arxiv.org/abs/2506.03179)

Qi Li, Runpeng Yu, Xinchao Wang

-+ [Securing AI Agents with Information-Flow Control](https://arxiv.org//abs/2505.23643)
++ [Securing AI Agents with Information-Flow Control](https://arxiv.org/abs/2505.23643)

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

-+ [Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert](https://arxiv.org//abs/2505.23868)
++ [Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert](https://arxiv.org/abs/2505.23868)

Zhaokun Wang, Jinyu Guo, Jingwen Pu, Lingfeng Chen, Hongli Pu, Jie Ou, Libo Qin, Wenhong Tian

-+ [Differential Gated Self-Attention](https://arxiv.org//abs/2505.24054)
++ [Differential Gated Self-Attention](https://arxiv.org/abs/2505.24054)

Elpiniki Maria Lygizou, Mónika Farsang, Radu Grosu

-+ [ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork](https://arxiv.org//abs/2505.23686)
++ [ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork](https://arxiv.org/abs/2505.23686)

Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone

# 2025-05-28

-+ [Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification](https://arxiv.org//abs/2505.21854)
++ [Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification](https://arxiv.org/abs/2505.21854)

Jun Chen, Xinke Li, Mingyue Xu, Tianrui Li, Chongshou Li

-+ [Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection](https://arxiv.org//abs/2505.21938)
++ [Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection](https://arxiv.org/abs/2505.21938)

Qirun Zeng, Eric He, Richard Hoffmann, Xuchuang Wang, Jinhang Zuo

-+ [Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models](https://arxiv.org//abs/2505.22271)
++ [Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models](https://arxiv.org/abs/2505.22271)

Yongcan Yu, Yanbo Wang, Ran He, Jian Liang

-+ [From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization](https://arxiv.org//abs/2505.22310)
++ [From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization](https://arxiv.org/abs/2505.22310)

Shoaib Ahmed Siddiqui, Adrian Weller, David Krueger, Gintare Karolina Dziugaite, Michael Curtis Mozer, Eleni Triantafillou

-+ [RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments](https://arxiv.org//abs/2505.21936)
++ [RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments](https://arxiv.org/abs/2505.21936)

Zeyi Liao, Jaylen Jones, Linxi Jiang, Eric Fosler-Lussier, Yu Su, Zhiqiang Lin, Huan Sun

-+ [Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack](https://arxiv.org//abs/2505.21967)
++ [Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack](https://arxiv.org/abs/2505.21967)

Juan Ren, Mark Dras, Usman Naseem

-+ [Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?](https://arxiv.org//abs/2505.22061)
++ [Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?](https://arxiv.org/abs/2505.22061)

Yujin Choi, Youngjoo Park, Junyoung Byun, Jaewook Lee, Jinseong Park

-+ [Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing](https://arxiv.org//abs/2505.22298)
++ [Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing](https://arxiv.org/abs/2505.22298)

Yifan Lu, Jing Li, Yigeng Zhou, Yihui Zhang, Wenya Wang, Xiucheng Li, Meishan Zhang, Fangming Liu, Jun Yu, Min Zhang

-+ [On the Transferability and Discriminability of Repersentation Learning in Unsupervised Domain Adaptation](https://arxiv.org//abs/2505.22099)
++ [On the Transferability and Discriminability of Repersentation Learning in Unsupervised Domain Adaptation](https://arxiv.org/abs/2505.22099)

Wenwen Qiang, Ziyin Gu, Lingyu Si, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong

-+ [IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth](https://arxiv.org//abs/2505.22305)
++ [IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth](https://arxiv.org/abs/2505.22305)

Md Touhidul Islam, Imran Kabir, Md Alimoor Reza, Syed Masum Billah

-+ [The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector](https://arxiv.org//abs/2505.22499)
++ [The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector](https://arxiv.org/abs/2505.22499)

Aixuan Li, Mochu Xiang, Jing Zhang, Yuchao Dai

-+ [Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective](https://arxiv.org//abs/2505.22604)
++ [Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective](https://arxiv.org/abs/2505.22604)

Ruixuan Zhang, He Wang, Zhengyu Zhao, Zhiqing Guo, Xun Yang, Yunfeng Diao, Meng Wang

-+ [MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network](https://arxiv.org//abs/2505.21874)
++ [MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network](https://arxiv.org/abs/2505.21874)

Ruiguo Yu, Yiyang Zhang, Yuan Tian, Yujie Diao, Di Jin, Witold Pedrycz

-+ [Understanding Adversarial Training with Energy-based Models](https://arxiv.org//abs/2505.22486)
++ [Understanding Adversarial Training with Energy-based Models](https://arxiv.org/abs/2505.22486)

Mujtaba Hussain Mirza, Maria Rosaria Briglia, Filippo Bartolucci, Senad Beadini, Giuseppe Lisanti, Iacopo Masi

-+ [A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective](https://arxiv.org//abs/2505.22322)
++ [A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective](https://arxiv.org/abs/2505.22322)

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Kaiyu Tang, Xiao Li, Jing Li

-+ [Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models](https://arxiv.org//abs/2505.22447)
++ [Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models](https://arxiv.org/abs/2505.22447)

Sizai Hou, Songze Li, Baturalp Buyukates

-+ [Efficient Preimage Approximation for Neural Network Certification](https://arxiv.org//abs/2505.22798)
++ [Efficient Preimage Approximation for Neural Network Certification](https://arxiv.org/abs/2505.22798)

Anton Björklund, Mykola Zaitsev, Marta Kwiatkowska

-+ [How Do Diffusion Models Improve Adversarial Robustness?](https://arxiv.org//abs/2505.22839)
++ [How Do Diffusion Models Improve Adversarial Robustness?](https://arxiv.org/abs/2505.22839)

Liu Yuezhang, Xue-Xin Wei

-+ [Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment](https://arxiv.org//abs/2505.22852)
++ [Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment](https://arxiv.org/abs/2505.22852)

Krti Tallam, Emma Miller

-+ [Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates](https://arxiv.org//abs/2505.22943)
++ [Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates](https://arxiv.org/abs/2505.22943)

Jaewoo Ahn, Heeseung Yun, Dayoon Ko, Gunhee Kim

-+ [Machine Learning Models Have a Supply Chain Problem](https://arxiv.org//abs/2505.22778)
++ [Machine Learning Models Have a Supply Chain Problem](https://arxiv.org/abs/2505.22778)

Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov

-+ [TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE](https://arxiv.org//abs/2505.22735)
++ [TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE](https://arxiv.org/abs/2505.22735)

Tong Sun, Bowen Jiang, Hailong Lin, Borui Li, Yixiao Teng, Yi Gao, Wei Dong

-+ [Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM](https://arxiv.org//abs/2505.23828)
++ [Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM](https://arxiv.org/abs/2505.23828)

Lei Yu, Yechao Zhang, Ziqi Zhou, Yang Wu, Wei Wan, Minghui Li, Shengshan Hu, Pei Xiaobing, Jing Wang

-+ [GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance](https://arxiv.org//abs/2505.23839)
++ [GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance](https://arxiv.org/abs/2505.23839)

Zaixi Zhang, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang

-+ [Are classical deep neural networks weakly adversarially robust?](https://arxiv.org//abs/2506.02016)
++ [Are classical deep neural networks weakly adversarially robust?](https://arxiv.org/abs/2506.02016)

Nuolin Sun, Linyuan Wang, Dongyang Li, Bin Yan, Lei Li

-+ [PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models](https://arxiv.org//abs/2506.03170)
++ [PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models](https://arxiv.org/abs/2506.03170)

Murthy L, Subarna Tripathi

-+ [Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems](https://arxiv.org//abs/2505.23847)
++ [Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems](https://arxiv.org/abs/2505.23847)

Ronny Ko, Jiseong Jeong, Shuyuan Zheng, Chuan Xiao, Tae-Wan Kim, Makoto Onizuka, Won-Yong Shin

# 2025-05-27

-+ [Preventing Adversarial AI Attacks Against Autonomous Situational Awareness: A Maritime Case Study](https://arxiv.org//abs/2505.21609)
++ [Preventing Adversarial AI Attacks Against Autonomous Situational Awareness: A Maritime Case Study](https://arxiv.org/abs/2505.21609)

Mathew J. Walter, Aaron Barrett, Kimberly Tam

-+ [VideoMarkBench: Benchmarking Robustness of Video Watermarking](https://arxiv.org//abs/2505.21620)
++ [VideoMarkBench: Benchmarking Robustness of Video Watermarking](https://arxiv.org/abs/2505.21620)

Zhengyuan Jiang, Moyang Guo, Kecen Li, Yuepeng Hu, Yupu Wang, Zhicong Huang, Cheng Hong, Neil Zhenqiang Gong

-+ [Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space](https://arxiv.org//abs/2505.21277)
++ [Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space](https://arxiv.org/abs/2505.21277)

Yao Huang, Yitong Sun, Shouwei Ruan, Yichi Zhang, Yinpeng Dong, Xingxing Wei

-+ [Calibrating LLM Confidence by Probing Perturbed Representation Stability](https://arxiv.org//abs/2505.21772)
++ [Calibrating LLM Confidence by Probing Perturbed Representation Stability](https://arxiv.org/abs/2505.21772)

Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese H. Smiley, Kundan Thind, Mohammad M. Ghassemi

-+ [What is Adversarial Training for Diffusion Models?](https://arxiv.org//abs/2505.21742)
++ [What is Adversarial Training for Diffusion Models?](https://arxiv.org/abs/2505.21742)

Briglia Maria Rosaria, Mujtaba Hussain Mirza, Giuseppe Lisanti, Iacopo Masi

-+ [Faster Rates for Private Adversarial Bandits](https://arxiv.org//abs/2505.21790)
++ [Faster Rates for Private Adversarial Bandits](https://arxiv.org/abs/2505.21790)

Hilal Asi, Vinod Raman, Kunal Talwar

-+ [System Prompt Extraction Attacks and Defenses in Large Language Models](https://arxiv.org//abs/2505.23817)
++ [System Prompt Extraction Attacks and Defenses in Large Language Models](https://arxiv.org/abs/2505.23817)

Badhan Chandra Das, M.
Hadi Amini, Yanzhao Wu @@ -9119,96 +9119,96 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, Qing Wang -+ [Adversarial bandit optimization for approximately linear functions](https://arxiv.org//abs/2505.20734) ++ [Adversarial bandit optimization for approximately linear functions](https://arxiv.org/abs/2505.20734) Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto -+ [Label Leakage in Federated Inertial-based Human Activity Recognition](https://arxiv.org//abs/2505.20924) ++ [Label Leakage in Federated Inertial-based Human Activity Recognition](https://arxiv.org/abs/2505.20924) Marius Bock, Maximilian Hopp, Kristof Van Laerhoven, Michael Moeller -+ [Automated Privacy Information Annotation in Large Language Model Interactions](https://arxiv.org//abs/2505.20910) ++ [Automated Privacy Information Annotation in Large Language Model Interactions](https://arxiv.org/abs/2505.20910) Hang Zeng, Xiangyu Liu, Yong Hu, Chaoyue Niu, Fan Wu, Shaojie Tang, Guihai Chen -+ [Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration](https://arxiv.org//abs/2505.21472) ++ [Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration](https://arxiv.org/abs/2505.21472) Mehrdad Fazli, Bowen Wei, Ahmet Sari, Ziwei Zhu -+ [Concealment of Intent: A Game-Theoretic Analysis](https://arxiv.org//abs/2505.20841) ++ [Concealment of Intent: A Game-Theoretic Analysis](https://arxiv.org/abs/2505.20841) Xinbo Wu, Abhishek Umrawal, Lav R. Varshney -+ [Learnable Kernel Density Estimation for Graphs](https://arxiv.org//abs/2505.21285) ++ [Learnable Kernel Density Estimation for Graphs](https://arxiv.org/abs/2505.21285) Xudong Wang, Ziheng Sun, Chris Ding, Jicong Fan -+ [PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing](https://arxiv.org//abs/2505.21184) ++ [PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing](https://arxiv.org/abs/2505.21184) Yu Yan, Sheng Sun, Zhifei Zheng, Ziji Hao, Teli Liu, Min Liu -+ [Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models](https://arxiv.org//abs/2505.20955) ++ [Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models](https://arxiv.org/abs/2505.20955) Puwei Lian, Yujun Cai, Songze Li, Bingkun Bao # 2025-05-26 -+ [Capability-Based Scaling Laws for LLM Red-Teaming](https://arxiv.org//abs/2505.20162) ++ [Capability-Based Scaling Laws for LLM Red-Teaming](https://arxiv.org/abs/2505.20162) Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping -+ [Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation](https://arxiv.org//abs/2505.19459) ++ [Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation](https://arxiv.org/abs/2505.19459) Kaichao Jiang, He Wang, Xiaoshuai Hao, Xiulong Yang, Ajian Liu, Qi Chu, Yunfeng Diao -+ [DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation](https://arxiv.org//abs/2505.19504) ++ [DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation](https://arxiv.org/abs/2505.19504) Pingzhi Li, Zhen Tan, Huaizhi Qu, Huan Liu, Tianlong Chen -+ [Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models](https://arxiv.org//abs/2505.19616) ++ [Diagnosing and Mitigating Modality Interference in 
Multimodal Large Language Models](https://arxiv.org/abs/2505.19616) Rui Cai, Bangzheng Li, Xiaofei Wen, Muhao Chen, Zhe Zhao -+ [LAPA-based Dynamic Privacy Optimization for Wireless Federated Learning in Heterogeneous Environments](https://arxiv.org//abs/2505.19823) ++ [LAPA-based Dynamic Privacy Optimization for Wireless Federated Learning in Heterogeneous Environments](https://arxiv.org/abs/2505.19823) Pengcheng Sun, Erwu Liu, Wei Ni, Rui Wang, Yuanzhe Geng, Lijuan Lai, Abbas Jamalipour -+ [Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy](https://arxiv.org//abs/2505.19951) ++ [Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy](https://arxiv.org/abs/2505.19951) Elvir Karimov, Alexander Varlamov, Danil Ivanov, Dmitrii Korzh, Oleg Y. Rogov -+ [Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Training Data from Gradient Leakage](https://arxiv.org//abs/2505.20026) ++ [Gradient Inversion Transcript: Leveraging Robust Generative Priors to Reconstruct Training Data from Gradient Leakage](https://arxiv.org/abs/2505.20026) Xinping Chen, Chen Liu -+ [Lifelong Safety Alignment for Language Models](https://arxiv.org//abs/2505.20259) ++ [Lifelong Safety Alignment for Language Models](https://arxiv.org/abs/2505.20259) Haoyu Wang, Zeyu Qin, Yifei Zhao, Chao Du, Min Lin, Xueqian Wang, Tianyu Pang -+ [Holes in Latent Space: Topological Signatures Under Adversarial Influence](https://arxiv.org//abs/2505.20435) ++ [Holes in Latent Space: Topological Signatures Under Adversarial Influence](https://arxiv.org/abs/2505.20435) Aideen Fay, Inés García-Redondo, Qiquan Wang, Haim Dubossarsky, Anthea Monod -+ [Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts](https://arxiv.org//abs/2505.21556) ++ [Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts](https://arxiv.org/abs/2505.21556) Hee-Seon Kim, Minbeom Kim, Wonjun Lee, Kihyun Kim, Changick Kim -+ [VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models](https://arxiv.org//abs/2505.19684) ++ [VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models](https://arxiv.org/abs/2505.19684) Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He -+ [Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression](https://arxiv.org//abs/2505.19398) ++ [Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression](https://arxiv.org/abs/2505.19398) Yiwei Xie, Ping Liu, Zheng Zhang -+ [Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things](https://arxiv.org//abs/2505.23792) ++ [Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things](https://arxiv.org/abs/2505.23792) Kai Li, Conggai Li, Xin Yuan, Shenghong Li, Sai Zou, Syed Sohail Ahmed, Wei Ni, Dusit Niyato, Abbas Jamalipour, Falko Dressler, Ozgur B. 
Akan -+ [MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection](https://arxiv.org//abs/2505.23803) ++ [MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection](https://arxiv.org/abs/2505.23803) Yinuo Xue, Eric Spero, Yun Sing Koh, Giovanni Russello -+ [JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models](https://arxiv.org//abs/2505.19610) ++ [JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models](https://arxiv.org/abs/2505.19610) Jiaxin Song, Yixu Wang, Jie Li, Rui Yu, Yan Teng, Xingjun Ma, Yingchun Wang @@ -9284,48 +9284,48 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee -+ [Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs](https://arxiv.org//abs/2506.17231) ++ [Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs](https://arxiv.org/abs/2506.17231) Xiang Li, Chong Zhang, Jia Wang, Fangyu Wu, Yushi Li, Xiaobo Jin -+ [Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach](https://arxiv.org//abs/2506.18756) ++ [Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach](https://arxiv.org/abs/2506.18756) Chong Zhang, Xiang Li, Jia Wang, Shan Liang, Haochen Xue, Xiaobo Jin -+ [One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP](https://arxiv.org//abs/2505.19840) ++ [One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP](https://arxiv.org/abs/2505.19840) Binyan Xu, Xilin Dai, Di Tang, Kehuan Zhang -+ [Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study](https://arxiv.org//abs/2505.19598) ++ [Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study](https://arxiv.org/abs/2505.19598) Guanyu Hou, Jiaming He, Yinhang Zhou, Ji Guo, Yitong Qiao, Rui Zhang, Wenbo Jiang -+ [TESSER: Transfer-Enhancing Adversarial Attacks from Vision Transformers via Spectral and Semantic Regularization](https://arxiv.org//abs/2505.19613) ++ [TESSER: Transfer-Enhancing Adversarial Attacks from Vision Transformers via Spectral and Semantic Regularization](https://arxiv.org/abs/2505.19613) Amira Guesmi, Bassem Ouni, Muhammad Shafique -+ [TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent](https://arxiv.org//abs/2505.20118) ++ [TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent](https://arxiv.org/abs/2505.20118) Dominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp # 2025-05-25 -+ [Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations](https://arxiv.org//abs/2505.18907) ++ [Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations](https://arxiv.org/abs/2505.18907) Sanjay Kariyappa, G. 
Edward Suh -+ [An Embarrassingly Simple Defense Against LLM Abliteration Attacks](https://arxiv.org//abs/2505.19056) ++ [An Embarrassingly Simple Defense Against LLM Abliteration Attacks](https://arxiv.org/abs/2505.19056) Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, Bernard Ghanem, George Turkiyyah -+ [CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning](https://arxiv.org//abs/2505.19119) ++ [CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning](https://arxiv.org/abs/2505.19119) Renyuan Li, Zhibo Liang, Haichuan Zhang, Tianyu Shi, Zhiyuan Cheng, Jia Shi, Carl Yang, Mingjie Tang -+ [Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation](https://arxiv.org//abs/2505.19194) ++ [Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation](https://arxiv.org/abs/2505.19194) Peiran Sun -+ [Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning](https://arxiv.org//abs/2505.23791) ++ [Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning](https://arxiv.org/abs/2505.23791) Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Marc Vucovich, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty @@ -9353,19 +9353,19 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Amit Chakraborty, Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty -+ [A Comprehensive Survey on the Risks and Limitations of Concept-based Models](https://arxiv.org//abs/2506.04237) ++ [A Comprehensive Survey on the Risks and Limitations of Concept-based Models](https://arxiv.org/abs/2506.04237) Sanchit Sinha, Aidong Zhang -+ [Ignition Phase : Standard Training for Fast Adversarial Robustness](https://arxiv.org//abs/2506.15685) ++ [Ignition Phase : Standard Training for Fast Adversarial Robustness](https://arxiv.org/abs/2506.15685) Wang Yu-Hang, Liu ying, Fang liang, Wang Xuelin, Junkang Guo, Shiwei Li, Lei Gao, Jian Liu, Wenfei Yin -+ [JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models](https://arxiv.org//abs/2505.19166) ++ [JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models](https://arxiv.org/abs/2505.19166) Eric Tillmann Bill, Enis Simsar, Thomas Hofmann -+ [ALRPHFS: Adversarially Learned Risk Patterns with Hierarchical Fast \& Slow Reasoning for Robust Agent Defense](https://arxiv.org//abs/2505.19260) ++ [ALRPHFS: Adversarially Learned Risk Patterns with Hierarchical Fast \& Slow Reasoning for Robust Agent Defense](https://arxiv.org/abs/2505.19260) Shiyu Xiang, Tong Zhang, Ronghao Chen @@ -9373,44 +9373,44 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Wang Yu-Hang, Liu ying, Fang liang, Wang Xuelin, Junkang Guo, Shiwei Li, Lei Gao, Jian Liu, Wenfei Yin -+ [GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling](https://arxiv.org//abs/2505.19234) ++ [GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling](https://arxiv.org/abs/2505.19234) Jialong Zhou, Lichao Wang, Xiao Yang -+ [AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science](https://arxiv.org//abs/2506.13992) ++ [AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data 
Science](https://arxiv.org/abs/2506.13992) An Luo, Xun Xian, Jin Du, Fangqiao Tian, Ganghua Wang, Ming Zhong, Shengchun Zhao, Xuan Bi, Zirui Liu, Jiawei Zhou, Jayanth Srinivasa, Ashish Kundu, Charles Fleming, Mingyi Hong, Jie Ding # 2025-05-24 -+ [EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks](https://arxiv.org//abs/2505.18457) ++ [EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks](https://arxiv.org/abs/2505.18457) Abir Ray -+ [Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation](https://arxiv.org//abs/2505.18556) ++ [Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation](https://arxiv.org/abs/2505.18556) Jun Zhuang, Haibo Jin, Ye Zhang, Zhengjian Kang, Wenbin Zhang, Gaby G. Dagher, Haohan Wang -+ [Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics](https://arxiv.org//abs/2505.18658) ++ [Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics](https://arxiv.org/abs/2505.18658) Pankaj Kumar, Subhankar Mishra -+ [StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations](https://arxiv.org//abs/2505.18766) ++ [StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations](https://arxiv.org/abs/2505.18766) Yanjie Li, Wenxuan Zhang, Xinqi Lyu, Yihao Liu, Bin Xiao -+ [Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models](https://arxiv.org//abs/2505.18773) ++ [Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models](https://arxiv.org/abs/2505.18773) Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper -+ [LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders](https://arxiv.org//abs/2505.18884) ++ [LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders](https://arxiv.org/abs/2505.18884) Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli -+ [Security Concerns for Large Language Models: A Survey](https://arxiv.org//abs/2505.18889) ++ [Security Concerns for Large Language Models: A Survey](https://arxiv.org/abs/2505.18889) Miles Q. Li, Benjamin C. M. 
Fung -+ [Mind the Gap: A Practical Attack on GGUF Quantization](https://arxiv.org//abs/2505.23786) ++ [Mind the Gap: A Practical Attack on GGUF Quantization](https://arxiv.org/abs/2505.23786) Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev @@ -9434,104 +9434,104 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Peijie Yu, Yifan Yang, Jinjian Li, Zelong Zhang, Haorui Wang, Xiao Feng, Feng Zhang -+ [$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking](https://arxiv.org//abs/2505.18746) ++ [$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking](https://arxiv.org/abs/2505.18746) Peijie Yu, Yifan Yang, Jinjian Li, Zelong Zhang, Haorui Wang, Xiao Feng, Feng Zhang -+ [Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models](https://arxiv.org//abs/2505.18596) ++ [Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models](https://arxiv.org/abs/2505.18596) Chen Han, Wenzhen Zheng, Xijin Tang -+ [How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark](https://arxiv.org//abs/2505.18761) ++ [How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark](https://arxiv.org/abs/2505.18761) Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Wang, Liangming Pan # 2025-05-23 -+ [Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness](https://arxiv.org//abs/2505.17406) ++ [Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness](https://arxiv.org/abs/2505.17406) Enyi Jiang, Changming Xu, Nischay Singh, Gagandeep Singh -+ [Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?](https://arxiv.org//abs/2505.17650) ++ [Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?](https://arxiv.org/abs/2505.17650) Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu -+ [RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition](https://arxiv.org//abs/2505.17501) ++ [RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition](https://arxiv.org/abs/2505.17501) Yuehan Jin, Xiaoqing Liu, Yiyuan Yang, Zhiwen Yu, Tong Zhang, Kaixiang Yang -+ [JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models](https://arxiv.org//abs/2505.17568) ++ [JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models](https://arxiv.org/abs/2505.17568) Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Zeren Luo, Jingyi Zheng, Wenhan Dong, Xinlei He, Xuechao Wang, Yingjie Xue, Shengmin Xu, Xinyi Huang -+ [Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models](https://arxiv.org//abs/2505.17601) ++ [Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models](https://arxiv.org/abs/2505.17601) Jiawei Kong, Hao Fang, Xiaochen Yang, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang -+ [What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection](https://arxiv.org//abs/2505.17513) ++ [What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection](https://arxiv.org/abs/2505.17513) Binh Nguyen, Shuji Shi, Ryan Ofman, Thai Le -+ [Chain-of-Lure: A Synthetic Narrative-Driven Approach to Compromise Large 
Language Models](https://arxiv.org//abs/2505.17519) ++ [Chain-of-Lure: A Synthetic Narrative-Driven Approach to Compromise Large Language Models](https://arxiv.org/abs/2505.17519) Wenhan Chang, Tianqing Zhu, Yu Zhao, Shuangyong Song, Ping Xiong, Wanlei Zhou, Yongxiang Li -+ [One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs](https://arxiv.org//abs/2505.17598) ++ [One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs](https://arxiv.org/abs/2505.17598) Linbao Li, Yannan Liu, Daojing He, Yu Li -+ [VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models](https://arxiv.org//abs/2505.17440) ++ [VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models](https://arxiv.org/abs/2505.17440) Hefei Mei, Zirui Wang, Shen You, Minjing Dong, Chang Xu -+ [The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts](https://arxiv.org//abs/2505.17476) ++ [The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts](https://arxiv.org/abs/2505.17476) Yuchen Zhang, Yaxiong Wang, Yujiao Wu, Lianwei Wu, Li Zhu -+ [Enhancing Adversarial Robustness of Vision Language Models via Adversarial Mixture Prompt Tuning](https://arxiv.org//abs/2505.17509) ++ [Enhancing Adversarial Robustness of Vision Language Models via Adversarial Mixture Prompt Tuning](https://arxiv.org/abs/2505.17509) Shiji Zhao, Qihui Zhu, Shukun Xiong, Shouwei Ruan, Yize Fan, Ranjie Duan, Qing Guo, Xingxing Wei -+ [Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition](https://arxiv.org//abs/2505.17807) ++ [Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition](https://arxiv.org/abs/2505.17807) Ping Li, Jianan Ni, Bo Pang -+ [SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification](https://arxiv.org//abs/2505.18015) ++ [SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification](https://arxiv.org/abs/2505.18015) Shashank Agnihotri, David Schader, Jonas Jakubassa, Nico Sharei, Simon Kral, Mehmet Ege Kaçar, Ruben Weber, Margret Keuper -+ [CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention](https://arxiv.org//abs/2505.18035) ++ [CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention](https://arxiv.org/abs/2505.18035) Naseem Khan, Tuan Nguyen, Amine Bermak, Issa Khalil -+ [Mahalanobis++: Improving OOD Detection via Feature Normalization](https://arxiv.org//abs/2505.18032) ++ [Mahalanobis++: Improving OOD Detection via Feature Normalization](https://arxiv.org/abs/2505.18032) Maximilian Mueller, Matthias Hein -+ [Towards more transferable adversarial attack in black-box manner](https://arxiv.org//abs/2505.18097) ++ [Towards more transferable adversarial attack in black-box manner](https://arxiv.org/abs/2505.18097) Chun Tong Lei, Zhongliang Guo, Hon Chung Lee, Minh Quoc Duong, Chun Pong Lau -+ [Adversarial Robustness of Nonparametric Regression](https://arxiv.org//abs/2505.17356) ++ [Adversarial Robustness of Nonparametric Regression](https://arxiv.org/abs/2505.17356) Parsa Moradi, Hanzaleh Akabrinodehi, Mohammad Ali Maddah-Ali -+ [Improved and Oracle-Efficient Online $\ell_1$-Multicalibration](https://arxiv.org//abs/2505.17365) ++ [Improved and Oracle-Efficient Online $\ell_1$-Multicalibration](https://arxiv.org/abs/2505.17365) Rohan Ghuge, Vidya Muthukumar, Sahil Singla 
-+ [Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation](https://arxiv.org//abs/2505.17579) ++ [Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation](https://arxiv.org/abs/2505.17579) Teruki Sano, Minoru Kuribayashi, Masao Sakai, Shuji Ishobe, Eisuke Koizumi -+ [Sec5GLoc: Securing 5G Indoor Localization via Adversary-Resilient Deep Learning Architecture](https://arxiv.org//abs/2505.17776) ++ [Sec5GLoc: Securing 5G Indoor Localization via Adversary-Resilient Deep Learning Architecture](https://arxiv.org/abs/2505.17776) Ildi Alla, Valeria Loscri -+ [Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation](https://arxiv.org//abs/2505.18323) ++ [Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation](https://arxiv.org/abs/2505.18323) Nicolas Küchler, Ivan Petrov, Conrad Grobler, Ilia Shumailov -+ [A Critical Evaluation of Defenses against Prompt Injection Attacks](https://arxiv.org//abs/2505.18333) ++ [A Critical Evaluation of Defenses against Prompt Injection Attacks](https://arxiv.org/abs/2505.18333) Yuqi Jia, Zedian Shao, Yupei Liu, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong @@ -9543,19 +9543,19 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Sandeep Pirbhulal, Habtamu Abie, Martin Jullum, Didrik Nielsen, Anders Løland -+ [EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications](https://arxiv.org//abs/2505.17654) ++ [EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications](https://arxiv.org/abs/2505.17654) Ancheng Xu, Zhihao Yang, Jingpeng Li, Guanghu Yuan, Longze Chen, Liang Yan, Jiehui Zhou, Zhen Qin, Hengyun Chang, Hamid Alinejad-Rokny, Bo Zheng, Min Yang -+ [How Can I Publish My LLM Benchmark Without Giving the True Answers Away?](https://arxiv.org//abs/2505.18102) ++ [How Can I Publish My LLM Benchmark Without Giving the True Answers Away?](https://arxiv.org/abs/2505.18102) Takashi Ishida, Thanawat Lodkaew, Ikko Yamane -+ [Reward Model Overoptimisation in Iterated RLHF](https://arxiv.org//abs/2505.18126) ++ [Reward Model Overoptimisation in Iterated RLHF](https://arxiv.org/abs/2505.18126) Lorenz Wolf, Robert Kirk, Mirco Musolesi -+ [T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models](https://arxiv.org//abs/2505.17550) ++ [T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models](https://arxiv.org/abs/2505.17550) Xiaoyu Ye, Songjie Cheng, Yongtao Wang, Yajiao Xiong, Yishen Li @@ -9564,135 +9564,135 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu # 2025-05-22 -+ [SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning](https://arxiv.org//abs/2505.16186) ++ [SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning](https://arxiv.org/abs/2505.16186) Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang -+ [Finetuning-Activated Backdoors in LLMs](https://arxiv.org//abs/2505.16567) ++ [Finetuning-Activated Backdoors in LLMs](https://arxiv.org/abs/2505.16567) Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev -+ [BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization](https://arxiv.org//abs/2505.16640) ++ [BadVLA: Towards Backdoor Attacks on 
Vision-Language-Action Models via Objective-Decoupled Optimization](https://arxiv.org/abs/2505.16640) Xueyang Zhou, Guiyao Tie, Guowen Zhang, Hechang Wang, Pan Zhou, Lichao Sun -+ [From Evaluation to Defense: Advancing Safety in Video Large Language Models](https://arxiv.org//abs/2505.16643) ++ [From Evaluation to Defense: Advancing Safety in Video Large Language Models](https://arxiv.org/abs/2505.16643) Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie -+ [BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models](https://arxiv.org//abs/2505.16670) ++ [BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models](https://arxiv.org/abs/2505.16670) Xiaobei Yan, Yiming Li, Zhaoxin Fan, Han Qiu, Tianwei Zhang -+ [Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization](https://arxiv.org//abs/2505.16737) ++ [Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization](https://arxiv.org/abs/2505.16737) Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun -+ [When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques](https://arxiv.org//abs/2505.16765) ++ [When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques](https://arxiv.org/abs/2505.16765) Jianing Geng, Biao Yi, Zekun Fei, Tongxi Wu, Lihai Nie, Zheli Liu -+ [CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models](https://arxiv.org//abs/2505.16785) ++ [CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models](https://arxiv.org/abs/2505.16785) Zhenzhen Ren, GuoBiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang -+ [Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability](https://arxiv.org//abs/2505.16789) ++ [Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability](https://arxiv.org/abs/2505.16789) Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin -+ [CAIN: Hijacking LLM-Humans Conversations via a Two-Stage Malicious System Prompt Generation and Refining Framework](https://arxiv.org//abs/2505.16888) ++ [CAIN: Hijacking LLM-Humans Conversations via a Two-Stage Malicious System Prompt Generation and Refining Framework](https://arxiv.org/abs/2505.16888) Viet Pham, Thai Le -+ [MixAT: Combining Continuous and Discrete Adversarial Training for LLMs](https://arxiv.org//abs/2505.16947) ++ [MixAT: Combining Continuous and Discrete Adversarial Training for LLMs](https://arxiv.org/abs/2505.16947) Csaba Dékány, Stefan Balauca, Robin Staab, Dimitar I. 
Dimitrov, Martin Vechev -+ [Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models](https://arxiv.org//abs/2505.16957) ++ [Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models](https://arxiv.org/abs/2505.16957) Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, Lingyao Li -+ [Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers](https://arxiv.org//abs/2505.16241) ++ [Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers](https://arxiv.org/abs/2505.16241) Viet-Anh Nguyen, Shiqian Zhao, Gia Dao, Runyi Hu, Yi Xie, Luu Anh Tuan -+ [All You Need is "Leet": Evading Hate-speech Detection AI](https://arxiv.org//abs/2505.16263) ++ [All You Need is "Leet": Evading Hate-speech Detection AI](https://arxiv.org/abs/2505.16263) Sampanna Yashwant Kahu, Naman Ahuja -+ [CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning](https://arxiv.org//abs/2505.16559) ++ [CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning](https://arxiv.org/abs/2505.16559) Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen -+ [BadDepth: Backdoor Attacks Against Monocular Depth Estimation in the Physical World](https://arxiv.org//abs/2505.16154) ++ [BadDepth: Backdoor Attacks Against Monocular Depth Estimation in the Physical World](https://arxiv.org/abs/2505.16154) Ji Guo, Long Zhou, Zhijin Wang, Jiaming He, Qiyang Song, Aiguo Chen, Wenbo Jiang -+ [TRAIL: Transferable Robust Adversarial Images via Latent diffusion](https://arxiv.org//abs/2505.16166) ++ [TRAIL: Transferable Robust Adversarial Images via Latent diffusion](https://arxiv.org/abs/2505.16166) Yuhao Xue, Zhifei Zhang, Xinyang Jiang, Yifei Shen, Junyao Gao, Wentao Gu, Jiale Zhao, Miaojing Shi, Cairong Zhao -+ [Accelerating Targeted Hard-Label Adversarial Attacks in Low-Query Black-Box Settings](https://arxiv.org//abs/2505.16313) ++ [Accelerating Targeted Hard-Label Adversarial Attacks in Low-Query Black-Box Settings](https://arxiv.org/abs/2505.16313) Arjhun Swaminathan, Mete Akgün -+ [SuperPure: Efficient Purification of Localized and Distributed Adversarial Patches via Super-Resolution GAN Models](https://arxiv.org//abs/2505.16318) ++ [SuperPure: Efficient Purification of Localized and Distributed Adversarial Patches via Super-Resolution GAN Models](https://arxiv.org/abs/2505.16318) Hossein Khalili, Seongbin Park, Venkat Bollapragada, Nader Sehatbakhsh -+ [AdvReal: Adversarial Patch Generation Framework with Application to Adversarial Safety Evaluation of Object Detection Systems](https://arxiv.org//abs/2505.16402) ++ [AdvReal: Adversarial Patch Generation Framework with Application to Adversarial Safety Evaluation of Object Detection Systems](https://arxiv.org/abs/2505.16402) Yuanhao Huang, Yilong Ren, Jinlei Wang, Lujia Huo, Xuesong Bai, Jinchuan Zhang, Haiyan Yu -+ [Backdoor Cleaning without External Guidance in MLLM Fine-tuning](https://arxiv.org//abs/2505.16916) ++ [Backdoor Cleaning without External Guidance in MLLM Fine-tuning](https://arxiv.org/abs/2505.16916) Xuankun Rong, Wenke Huang, Jian Liang, Jinhe Bi, Xun Xiao, Yiming Li, Bo Du, Mang Ye -+ [When Are Concepts Erased From Diffusion Models?](https://arxiv.org//abs/2505.17013) ++ [When Are Concepts Erased From Diffusion Models?](https://arxiv.org/abs/2505.17013) Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh 
Pham, David Bau, Chinmay Hegde, Niv Cohen -+ [Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach](https://arxiv.org//abs/2505.16403) ++ [Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach](https://arxiv.org/abs/2505.16403) Huazi Pan, Yanjun Zhang, Leo Yu Zhang, Scott Adams, Abbas Kouzani, Suiyang Khoo -+ [Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models](https://arxiv.org//abs/2505.16446) ++ [Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models](https://arxiv.org/abs/2505.16446) Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin -+ [Experimental robustness benchmark of quantum neural network on a superconducting quantum processor](https://arxiv.org//abs/2505.16714) ++ [Experimental robustness benchmark of quantum neural network on a superconducting quantum processor](https://arxiv.org/abs/2505.16714) Hai-Feng Zhang, Zhao-Yun Chen, Peng Wang, Liang-Liang Guo, Tian-Le Wang, Xiao-Yan Yang, Ren-Ze Zhao, Ze-An Zhao, Sheng Zhang, Lei Du, Hao-Ran Tao, Zhi-Long Jia, Wei-Cheng Kong, Huan-Yu Liu, Athanasios V. Vasilakos, Yang Yang, Yu-Chun Wu, Ji Guan, Peng Duan, Guo-Ping Guo -+ [Robust LLM Fingerprinting via Domain-Specific Watermarks](https://arxiv.org//abs/2505.16723) ++ [Robust LLM Fingerprinting via Domain-Specific Watermarks](https://arxiv.org/abs/2505.16723) Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev -+ [Privacy-Aware Cyberterrorism Network Analysis using Graph Neural Networks and Federated Learning](https://arxiv.org//abs/2505.16371) ++ [Privacy-Aware Cyberterrorism Network Analysis using Graph Neural Networks and Federated Learning](https://arxiv.org/abs/2505.16371) Anas Ali, Mubashar Husain, Peter Hans -+ [MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming](https://arxiv.org//abs/2505.17147) ++ [MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming](https://arxiv.org/abs/2505.17147) Weiyang Guo, Jing Li, Wenya Wang, YU LI, Daojing He, Jun Yu, Min Zhang -+ [Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting](https://arxiv.org//abs/2505.17160) ++ [Harry Potter is Still Here! 
Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting](https://arxiv.org/abs/2505.17160) Bang Trinh Tran To, Thai Le -+ [Robustifying Vision-Language Models via Dynamic Token Reweighting](https://arxiv.org//abs/2505.17132) ++ [Robustifying Vision-Language Models via Dynamic Token Reweighting](https://arxiv.org/abs/2505.17132) Tanqiu Jiang, Jiacheng Liang, Rongyi Zhu, Jiawei Zhou, Fenglong Ma, Ting Wang -+ [Secure and Private Federated Learning: Achieving Adversarial Resilience through Robust Aggregation](https://arxiv.org//abs/2505.17226) ++ [Secure and Private Federated Learning: Achieving Adversarial Resilience through Robust Aggregation](https://arxiv.org/abs/2505.17226) Kun Yang, Neena Imam -+ [Backdoors in DRL: Four Environments Focusing on In-distribution Triggers](https://arxiv.org//abs/2505.17248) ++ [Backdoors in DRL: Four Environments Focusing on In-distribution Triggers](https://arxiv.org/abs/2505.17248) Chace Ashcraft, Ted Staley, Josh Carney, Cameron Hickert, Derek Juba, Kiran Karra, Nathan Drenkow -+ [Towards medical AI misalignment: a preliminary study](https://arxiv.org//abs/2505.18212) ++ [Towards medical AI misalignment: a preliminary study](https://arxiv.org/abs/2505.18212) Barbara Puccio, Federico Castagna, Allan Tucker, Pierangelo Veltri @@ -9704,7 +9704,7 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen -+ [Training on Plausible Counterfactuals Removes Spurious Correlations](https://arxiv.org//abs/2505.16583) ++ [Training on Plausible Counterfactuals Removes Spurious Correlations](https://arxiv.org/abs/2505.16583) Shpresim Sadiku, Kartikeya Chitranshi, Hiroshi Kera, Sebastian Pokutta @@ -9712,11 +9712,11 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin -+ [Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models](https://arxiv.org//abs/2505.16538) ++ [Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models](https://arxiv.org/abs/2505.16538) Ercong Nie, Helmut Schmid, Hinrich Schütze -+ [Erased or Dormant? Rethinking Concept Erasure Through Reversibility](https://arxiv.org//abs/2505.16174) ++ [Erased or Dormant? Rethinking Concept Erasure Through Reversibility](https://arxiv.org/abs/2505.16174) Ping Liu, Chi Zhang @@ -9728,148 +9728,148 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev -+ [REOBench: Benchmarking Robustness of Earth Observation Foundation Models](https://arxiv.org//abs/2505.16793) ++ [REOBench: Benchmarking Robustness of Earth Observation Foundation Models](https://arxiv.org/abs/2505.16793) Xiang Li, Yong Tao, Siyuan Zhang, Siwei Liu, Zhitong Xiong, Chunbo Luo, Lu Liu, Mykola Pechenizkiy, Xiao Xiang Zhu, Tianjin Huang -+ [Shape it Up! Restoring LLM Safety during Finetuning](https://arxiv.org//abs/2505.17196) ++ [Shape it Up! 
Restoring LLM Safety during Finetuning](https://arxiv.org/abs/2505.17196) ShengYun Peng, Pin-Yu Chen, Jianfeng Chi, Seongmin Lee, Duen Horng Chau # 2025-05-21 -+ [Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs](https://arxiv.org//abs/2505.15265) ++ [Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs](https://arxiv.org/abs/2505.15265) Zihao Pan, Yu Tong, Weibin Wu, Jingyi Wang, Lifeng Chen, Zhe Zhao, Jiajia Wei, Yitong Qiao, Zibin Zheng -+ [BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution](https://arxiv.org//abs/2505.15308) ++ [BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution](https://arxiv.org/abs/2505.15308) Ji Guo, Xiaolei Wen, Wenbo Jiang, Cheng Huang, Jinjin Li, Hongwei Li -+ [Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors](https://arxiv.org//abs/2505.15337) ++ [Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors](https://arxiv.org/abs/2505.15337) Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang -+ [Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models](https://arxiv.org//abs/2505.15406) ++ [Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models](https://arxiv.org/abs/2505.15406) Zirui Song, Qian Jiang, Mingxuan Cui, Mingzhe Li, Lang Gao, Zeyu Zhang, Zixiang Xu, Yanbo Wang, Chenxi Wang, Guangxian Ouyang, Zhenhao Chen, Xiuying Chen -+ [Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries](https://arxiv.org//abs/2505.15420) ++ [Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries](https://arxiv.org/abs/2505.15420) Yuhao Wang, Wenjie Qu, Yanze Jiang, Zichen Liu, Yue Liu, Shengfang Zhai, Yinpeng Dong, Jiaheng Zhang -+ [Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off](https://arxiv.org//abs/2505.15594) ++ [Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off](https://arxiv.org/abs/2505.15594) Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy -+ [A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability](https://arxiv.org//abs/2505.15683) ++ [A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability](https://arxiv.org/abs/2505.15683) Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, Zhiming Zheng -+ [A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO](https://arxiv.org//abs/2505.15694) ++ [A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO](https://arxiv.org/abs/2505.15694) Xingyu Zhou, Yulian Wu, Francesco Orabona -+ [Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses](https://arxiv.org//abs/2505.15738) ++ [Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses](https://arxiv.org/abs/2505.15738) Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye -+ [Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval](https://arxiv.org//abs/2505.15753) ++ [Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval](https://arxiv.org/abs/2505.15753) Taiye 
Chen, Zeming Wei, Ang Li, Yisen Wang -+ [Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation](https://arxiv.org//abs/2505.15249) ++ [Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation](https://arxiv.org/abs/2505.15249) Yerin Hwang, Dongryeol Lee, Kyungmin Min, Taegwan Kang, Yong-il Kim, Kyomin Jung -+ [Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack](https://arxiv.org//abs/2505.15323) ++ [Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack](https://arxiv.org/abs/2505.15323) Silvia Cappelletti, Tobia Poppi, Samuele Poppi, Zheng-Xin Yong, Diego Garcia-Olano, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara -+ [Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study](https://arxiv.org//abs/2505.15389) ++ [Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study](https://arxiv.org/abs/2505.15389) DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu -+ [Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!](https://arxiv.org//abs/2505.15656) ++ [Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!](https://arxiv.org/abs/2505.15656) Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang -+ [Advancing LLM Safe Alignment with Safety Representation Ranking](https://arxiv.org//abs/2505.15710) ++ [Advancing LLM Safe Alignment with Safety Representation Ranking](https://arxiv.org/abs/2505.15710) Tianqi Du, Zeming Wei, Quan Chen, Chenheng Zhang, Yisen Wang -+ [Reverse Engineering Human Preferences with Reinforcement Learning](https://arxiv.org//abs/2505.15795) ++ [Reverse Engineering Human Preferences with Reinforcement Learning](https://arxiv.org/abs/2505.15795) Lisa Alazraki, Tan Yi-Chern, Jon Ander Campos, Maximilian Mozes, Marek Rei, Max Bartolo -+ [Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering](https://arxiv.org//abs/2505.15805) ++ [Keep Security! 
Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering](https://arxiv.org/abs/2505.15805) Hwan Chang, Yumin Kim, Yonghyun Jun, Hwanhee Lee -+ [Geometrically Regularized Transfer Learning with On-Manifold and Off-Manifold Perturbation](https://arxiv.org//abs/2505.15191) ++ [Geometrically Regularized Transfer Learning with On-Manifold and Off-Manifold Perturbation](https://arxiv.org/abs/2505.15191) Hana Satou, Alan Mitkiy, F Monkey -+ [GAMA: Geometry-Aware Manifold Alignment via Structured Adversarial Perturbations for Robust Domain Adaptation](https://arxiv.org//abs/2505.15194) ++ [GAMA: Geometry-Aware Manifold Alignment via Structured Adversarial Perturbations for Robust Domain Adaptation](https://arxiv.org/abs/2505.15194) Hana Satou, F Monkey -+ [My Face Is Mine, Not Yours: Facial Protection Against Diffusion Model Face Swapping](https://arxiv.org//abs/2505.15336) ++ [My Face Is Mine, Not Yours: Facial Protection Against Diffusion Model Face Swapping](https://arxiv.org/abs/2505.15336) Hon Ming Yam, Zhongliang Guo, Chun Pong Lau -+ [Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models](https://arxiv.org//abs/2505.15130) ++ [Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models](https://arxiv.org/abs/2505.15130) Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh, Ramtin Pedarsani -+ [EC-LDA : Label Distribution Inference Attack against Federated Graph Learning with Embedding Compression](https://arxiv.org//abs/2505.15140) ++ [EC-LDA : Label Distribution Inference Attack against Federated Graph Learning with Embedding Compression](https://arxiv.org/abs/2505.15140) Tong Cheng, Fu Jie, Xinpeng Ling, Huifa Li, Zhili Chen -+ [Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss](https://arxiv.org//abs/2505.15174) ++ [Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss](https://arxiv.org/abs/2505.15174) Bo-Han Lai, Pin-Han Huang, Bo-Han Kung, Shang-Tse Chen -+ [A Linear Approach to Data Poisoning](https://arxiv.org//abs/2505.15175) ++ [A Linear Approach to Data Poisoning](https://arxiv.org/abs/2505.15175) Diego Granziol, Donald Flynn -+ [EEG-Based Inter-Patient Epileptic Seizure Detection Combining Domain Adversarial Training with CNN-BiLSTM Network](https://arxiv.org//abs/2505.15203) ++ [EEG-Based Inter-Patient Epileptic Seizure Detection Combining Domain Adversarial Training with CNN-BiLSTM Network](https://arxiv.org/abs/2505.15203) Rina Tazaki, Tomoyuki Akiyama, Akira Furui -+ [A Survey On Secure Machine Learning](https://arxiv.org//abs/2505.15124) ++ [A Survey On Secure Machine Learning](https://arxiv.org/abs/2505.15124) Taobo Liao, Taoran Li, Prathamesh Nadkarni -+ [Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations](https://arxiv.org//abs/2505.16004) ++ [Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations](https://arxiv.org/abs/2505.16004) Aaron J. 
Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju

-+ [LAGO: Few-shot Crosslingual Embedding Inversion Attacks via Language Similarity-Aware Graph Optimization](https://arxiv.org//abs/2505.16008)
++ [LAGO: Few-shot Crosslingual Embedding Inversion Attacks via Language Similarity-Aware Graph Optimization](https://arxiv.org/abs/2505.16008)

Wenrui Yu, Yiyi Chen, Johannes Bjerva, Sokol Kosta, Qiongxiu Li

-+ [Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains](https://arxiv.org//abs/2505.16014)
++ [Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains](https://arxiv.org/abs/2505.16014)

Yash Saxena, Anpur Padia, Mandar S Chaudhary, Kalpa Gunaratna, Srinivasan Parthasarathy, Manas Gaur

-+ [Challenger: Affordable Adversarial Driving Video Generation](https://arxiv.org//abs/2505.15880)
++ [Challenger: Affordable Adversarial Driving Video Generation](https://arxiv.org/abs/2505.15880)

Zhiyuan Xu, Bohan Li, Huan-ang Gao, Mingju Gao, Yong Chen, Ming Liu, Chenxu Yan, Hang Zhao, Shuo Feng, Hao Zhao

-+ [RRTL: Red Teaming Reasoning Large Language Models in Tool Learning](https://arxiv.org//abs/2505.17106)
++ [RRTL: Red Teaming Reasoning Large Language Models in Tool Learning](https://arxiv.org/abs/2505.17106)

Yifei Liu, Yu Cui, Haibin Zhang

-+ [Covert Attacks on Machine Learning Training in Passively Secure MPC](https://arxiv.org//abs/2505.17092)
++ [Covert Attacks on Machine Learning Training in Passively Secure MPC](https://arxiv.org/abs/2505.17092)

Matthew Jagielski, Daniel Escudero, Rahul Rachuri, Peter Scholl

-+ [Neuromorphic Mimicry Attacks Exploiting Brain-Inspired Computing for Covert Cyber Intrusions](https://arxiv.org//abs/2505.17094)
++ [Neuromorphic Mimicry Attacks Exploiting Brain-Inspired Computing for Covert Cyber Intrusions](https://arxiv.org/abs/2505.17094)

Hemanth Ravipati

-+ [CrossRF: A Domain-Invariant Deep Learning Approach for RF Fingerprinting](https://arxiv.org//abs/2505.18200)
++ [CrossRF: A Domain-Invariant Deep Learning Approach for RF Fingerprinting](https://arxiv.org/abs/2505.18200)

Fahrettin Emin Tiras, Hayriye Serra Altinoluk

@@ -9885,128 +9885,128 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Zihao Pan, Yu Tong, Weibin Wu, Jingyi Wang, Lifeng Chen, Zhe Zhao, Jiajia Wei, Yitong Qiao, Zibin Zheng

-+ [MAPS: A Multilingual Benchmark for Global Agent Performance and Security](https://arxiv.org//abs/2505.15935)
++ [MAPS: A Multilingual Benchmark for Global Agent Performance and Security](https://arxiv.org/abs/2505.15935)

Omer Hofman, Jonathan Brokman, Oren Rachmil, Shamik Bose, Vikas Pahuja, Toshiya Shimizu, Trisha Starostina, Kelly Marchisio, Seraphina Goldfarb-Tarrant, Roman Vainshtein

-+ [RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection](https://arxiv.org//abs/2505.15386)
++ [RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection](https://arxiv.org/abs/2505.15386)

Yiming Huang, Junyan Zhang, Zihao Wang, Biquan Bie, Yunzhong Qiu, Yi R. Fung, Xinlei He

# 2025-05-20

-+ [EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection](https://arxiv.org//abs/2505.14289)
++ [EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection](https://arxiv.org/abs/2505.14289)

Yijie Lu, Tianjie Ju, Manman Zhao, Xinbei Ma, Yuan Guo, ZhuoSheng Zhang

-+ [SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors](https://arxiv.org//abs/2505.14300)
++ [SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors](https://arxiv.org/abs/2505.14300)

Maheep Chaudhary, Fazl Barez

-+ [Safety2Drive: Safety-Critical Scenario Benchmark for the Evaluation of Autonomous Driving](https://arxiv.org//abs/2505.13872)
++ [Safety2Drive: Safety-Critical Scenario Benchmark for the Evaluation of Autonomous Driving](https://arxiv.org/abs/2505.13872)

Jingzheng Li, Tiancheng Wang, Xingyu Peng, Jiacheng Chen, Zhijun Chen, Bing Li, Xianglong Liu

-+ [FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix](https://arxiv.org//abs/2505.14024)
++ [FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix](https://arxiv.org/abs/2505.14024)

Di Wu, Qian Li, Heng Yang, Yong Han

-+ [AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models](https://arxiv.org//abs/2505.14103)
++ [AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models](https://arxiv.org/abs/2505.14103)

Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang

-+ ["Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs](https://arxiv.org//abs/2505.14226)
++ ["Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs](https://arxiv.org/abs/2505.14226)

Darpan Aswal, Siddharth D Jaiswal

-+ [Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion](https://arxiv.org//abs/2505.14316)
++ [Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion](https://arxiv.org/abs/2505.14316)

Tiehan Cui, Yanxu Mao, Peipei Liu, Congying Liu, Datao You

-+ [Can Large Language Models Really Recognize Your Name?](https://arxiv.org//abs/2505.14549)
++ [Can Large Language Models Really Recognize Your Name?](https://arxiv.org/abs/2505.14549)

Dzung Pham, Peter Kairouz, Niloofar Mireshghallah, Eugene Bagdasarian, Chau Minh Pham, Amir Houmansadr

-+ [Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)](https://arxiv.org//abs/2505.14608)
++ [Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)](https://arxiv.org/abs/2505.14608)

Rafael Rivera Soto, Barry Chen, Nicholas Andrews

-+ [Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs](https://arxiv.org//abs/2505.14286)
++ [Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs](https://arxiv.org/abs/2505.14286)

Rao Ma, Mengjie Qian, Vyas Raina, Mark Gales, Kate Knill

-+ [Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents](https://arxiv.org//abs/2505.14418)
++ [Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents](https://arxiv.org/abs/2505.14418)

Pengzhou Cheng, Haowen Hu, Zheng Wu, Zongru Wu, Tianjie Ju, Daizong Ding, Zhuosheng Zhang, Gongshen Liu

-+ [PandaGuard: Systematic Evaluation of LLM Safety in the Era of Jailbreaking Attacks](https://arxiv.org//abs/2505.13862)
++ [PandaGuard: Systematic Evaluation of LLM Safety in the Era of Jailbreaking Attacks](https://arxiv.org/abs/2505.13862)

Guobin Shen, Dongcheng Zhao, Linghao Feng, Xiang He, Jihang Wang, Sicheng Shen, Haibo Tong, Yiting Dong, Jindong Li, Xiang Zheng, Yi Zeng

-+ [Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation](https://arxiv.org//abs/2505.13957)
++ [Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation](https://arxiv.org/abs/2505.13957)

Jiankun Zhang, Shenglai Zeng, Jie Ren, Tianqi Zheng, Hui Liu, Xianfeng Tang, Hui Liu, Yi Chang

-+ [Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs](https://arxiv.org//abs/2505.14368)
++ [Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs](https://arxiv.org/abs/2505.14368)

Jiawen Wang, Pritha Gupta, Ivan Habernal, Eyke Hüllermeier

-+ [Domain Adaptation for Multi-label Image Classification: a Discriminator-free Approach](https://arxiv.org//abs/2505.14333)
++ [Domain Adaptation for Multi-label Image Classification: a Discriminator-free Approach](https://arxiv.org/abs/2505.14333)

Inder Pal Singh, Enjie Ghorbel, Anis Kacem, Djamila Aouada

-+ [Adversarial Training from Mean Field Perspective](https://arxiv.org//abs/2505.14021)
++ [Adversarial Training from Mean Field Perspective](https://arxiv.org/abs/2505.14021)

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

-+ [Adversarially Pretrained Transformers may be Universally Robust In-Context Learners](https://arxiv.org//abs/2505.14042)
++ [Adversarially Pretrained Transformers may be Universally Robust In-Context Learners](https://arxiv.org/abs/2505.14042)

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

-+ [Fragments to Facts: Partial-Information Fragment Inference from LLMs](https://arxiv.org//abs/2505.13819)
++ [Fragments to Facts: Partial-Information Fragment Inference from LLMs](https://arxiv.org/abs/2505.13819)

Lucas Rosenblatt, Bin Han, Robert Wolfe, Bill Howe

-+ [ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models](https://arxiv.org//abs/2505.13910)
++ [ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models](https://arxiv.org/abs/2505.13910)

Guangtao Zheng, Wenqian Ye, Aidong Zhang

-+ [Adverseness vs. Equilibrium: Exploring Graph Adversarial Resilience through Dynamic Equilibrium](https://arxiv.org//abs/2505.14463)
++ [Adverseness vs. Equilibrium: Exploring Graph Adversarial Resilience through Dynamic Equilibrium](https://arxiv.org/abs/2505.14463)

Xinxin Fan, Wenxiong Chen, Mengfan Li, Wenqi Wei, Ling Liu

-+ [SifterNet: A Generalized and Model-Agnostic Trigger Purification Approach](https://arxiv.org//abs/2505.14531)
++ [SifterNet: A Generalized and Model-Agnostic Trigger Purification Approach](https://arxiv.org/abs/2505.14531)

Shaoye Luo, Xinxin Fan, Quanliang Jing, Chi Lin, Mengfan Li, Yunfeng Lu, Yongjun Xu

-+ [Vulnerability of Transfer-Learned Neural Networks to Data Reconstruction Attacks in Small-Data Regime](https://arxiv.org//abs/2505.14323)
++ [Vulnerability of Transfer-Learned Neural Networks to Data Reconstruction Attacks in Small-Data Regime](https://arxiv.org/abs/2505.14323)

Tomasz Maciążek, Robert Allison

-+ [Lessons from Defending Gemini Against Indirect Prompt Injections](https://arxiv.org//abs/2505.14534)
++ [Lessons from Defending Gemini Against Indirect Prompt Injections](https://arxiv.org/abs/2505.14534)

Chongyang Shi, Sharon Lin, Shuang Song, Jamie Hayes, Ilia Shumailov, Itay Yona, Juliette Pluto, Aneesh Pappu, Christopher A. Choquette-Choo, Milad Nasr, Chawin Sitawarin, Gena Gibson, Andreas Terzis, John "Four" Flynn

-+ [D4+: Emergent Adversarial Driving Maneuvers with Approximate Functional Optimization](https://arxiv.org//abs/2505.13942)
++ [D4+: Emergent Adversarial Driving Maneuvers with Approximate Functional Optimization](https://arxiv.org/abs/2505.13942)

Diego Ortiz Barbosa, Luis Burbano, Carlos Hernandez, Zengxiang Lei, Younghee Park, Satish Ukkusuri, Alvaro A Cardenas

-+ [Replay Attacks Against Audio Deepfake Detection](https://arxiv.org//abs/2505.14862)
++ [Replay Attacks Against Audio Deepfake Detection](https://arxiv.org/abs/2505.14862)

Nicolas Müller, Piotr Kawa, Wei-Herng Choong, Adriana Stan, Aditya Tirumala Bukkapatnam, Karla Pizzi, Alexander Wagner, Philip Sperl

-+ [Anomaly Detection Based on Critical Paths for Deep Neural Networks](https://arxiv.org//abs/2505.14967)
++ [Anomaly Detection Based on Critical Paths for Deep Neural Networks](https://arxiv.org/abs/2505.14967)

Fangzhen Zhao, Chenyi Zhang, Naipeng Dong, Ming Li, Jinxiao Shan

-+ [Efficient Privacy-Preserving Cross-Silo Federated Learning with Multi-Key Homomorphic Encryption](https://arxiv.org//abs/2505.14797)
++ [Efficient Privacy-Preserving Cross-Silo Federated Learning with Multi-Key Homomorphic Encryption](https://arxiv.org/abs/2505.14797)

Abdullah Al Omar, Xin Yang, Euijin Choo, Omid Ardakanian

-+ [GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace](https://arxiv.org//abs/2505.17078)
++ [GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace](https://arxiv.org/abs/2505.17078)

Zenghao Duan, Zhiyi Yin, Zhichao Shi, Liang Pang, Shaoling Jing, Jiayi Wu, Yu Yan, Huawei Shen, Xueqi Cheng

-+ [Trust Me, I Can Handle It: Self-Generated Adversarial Scenario Extrapolation for Robust Language Models](https://arxiv.org//abs/2505.17089)
++ [Trust Me, I Can Handle It: Self-Generated Adversarial Scenario Extrapolation for Robust Language Models](https://arxiv.org/abs/2505.17089)

Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Ye Wang, Gang Tan, Shagufta Mehnaz

@@ -10018,15 +10018,15 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang

-+ [Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable](https://arxiv.org//abs/2505.14359)
++ [Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable](https://arxiv.org/abs/2505.14359)

Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taiping Yao, Shouhong Ding

-+ [Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning](https://arxiv.org//abs/2505.14585)
++ [Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning](https://arxiv.org/abs/2505.14585)

Wenbin Hu, Haoran Li, Huihao Jing, Qi Hu, Ziqian Zeng, Sirui Han, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song

-+ [Causes and Consequences of Representational Similarity in Machine Learning Models](https://arxiv.org//abs/2505.13899)
++ [Causes and Consequences of Representational Similarity in Machine Learning Models](https://arxiv.org/abs/2505.13899)

Zeyu Michael Li, Hung Anh Vu, Damilola Awofisayo, Emily Wenger

@@ -10038,80 +10038,80 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Darpan Aswal, Siddharth D Jaiswal

-+ [SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment](https://arxiv.org//abs/2505.14667)
++ [SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment](https://arxiv.org/abs/2505.14667)

Wonje Jeung, Sangyeon Yoon, Minsuk Kahng, Albert No

# 2025-05-19

-+ [Bullying the Machine: How Personas Increase LLM Vulnerability](https://arxiv.org//abs/2505.12692)
++ [Bullying the Machine: How Personas Increase LLM Vulnerability](https://arxiv.org/abs/2505.12692)

Ziwei Xu, Udit Sanghi, Mohan Kankanhalli

-+ [Language Models That Walk the Talk: A Framework for Formal Fairness Certificates](https://arxiv.org//abs/2505.12767)
++ [Language Models That Walk the Talk: A Framework for Formal Fairness Certificates](https://arxiv.org/abs/2505.12767)

Danqing Chen, Tobias Ladner, Ahmed Rayen Mhadhbi, Matthias Althoff

-+ [Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities](https://arxiv.org//abs/2505.13195)
++ [Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities](https://arxiv.org/abs/2505.13195)

Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward

-+ [FLTG: Byzantine-Robust Federated Learning via Angle-Based Defense and Non-IID-Aware Weighting](https://arxiv.org//abs/2505.12851)
++ [FLTG: Byzantine-Robust Federated Learning via Angle-Based Defense and Non-IID-Aware Weighting](https://arxiv.org/abs/2505.12851)

Yanhua Wen, Lu Ai, Gang Liu, Chuang Li, Jianhao Wei

-+ [Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?](https://arxiv.org//abs/2505.12871)
++ [Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?](https://arxiv.org/abs/2505.12871)

Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Ronghua Li

-+ [From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents](https://arxiv.org//abs/2505.12981)
++ [From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents](https://arxiv.org/abs/2505.12981)

Liangxuan Wu, Chao Wang, Tianming Liu, Yanjie Zhao, Haoyu Wang

-+ [Anti-Inpainting: A Proactive Defense against Malicious Diffusion-based Inpainters under Unknown Conditions](https://arxiv.org//abs/2505.13023)
++ [Anti-Inpainting: A Proactive Defense against Malicious Diffusion-based Inpainters under Unknown Conditions](https://arxiv.org/abs/2505.13023)

Yimao Guo, Zuomin Qu, Wei Lu, Xiangyang Luo

-+ [Evaluatiing the efficacy of LLM Safety Solutions : The Palit Benchmark Dataset](https://arxiv.org//abs/2505.13028)
++ [Evaluatiing the efficacy of LLM Safety Solutions : The Palit Benchmark Dataset](https://arxiv.org/abs/2505.13028)

Sayon Palit, Daniel Woods

-+ [FlowPure: Continuous Normalizing Flows for Adversarial Purification](https://arxiv.org//abs/2505.13280)
++ [FlowPure: Continuous Normalizing Flows for Adversarial Purification](https://arxiv.org/abs/2505.13280)

Elias Collaert, Abel Rodríguez, Sander Joos, Lieven Desmet, Vera Rimmer

-+ [Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications](https://arxiv.org//abs/2505.13329)
++ [Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications](https://arxiv.org/abs/2505.13329)

Frédéric Berdoz, Dustin Brunner, Yann Vonlanthen, Roger Wattenhofer

-+ [Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning](https://arxiv.org//abs/2505.13709)
++ [Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning](https://arxiv.org/abs/2505.13709)

Jiayu Chen, Aravind Venugopal, Jeff Schneider

-+ [Robust learning of halfspaces under log-concave marginals](https://arxiv.org//abs/2505.13708)
++ [Robust learning of halfspaces under log-concave marginals](https://arxiv.org/abs/2505.13708)

Jane Lange, Arsen Vasilyan

-+ [A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection](https://arxiv.org//abs/2505.12586)
++ [A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection](https://arxiv.org/abs/2505.12586)

Sanggeon Yun, Ryozo Masukawa, Hyunwoo Oh, Nathaniel D. Bastian, Mohsen Imani

-+ [BeamClean: Language Aware Embedding Reconstruction](https://arxiv.org//abs/2505.13758)
++ [BeamClean: Language Aware Embedding Reconstruction](https://arxiv.org/abs/2505.13758)

Kaan Kale, Kyle Mylonakis, Jay Roberts, Sidhartha Roy

-+ [SVAFD: A Secure and Verifiable Co-Aggregation Protocol for Federated Distillation](https://arxiv.org//abs/2505.13319)
++ [SVAFD: A Secure and Verifiable Co-Aggregation Protocol for Federated Distillation](https://arxiv.org/abs/2505.13319)

Tian Wen, Sheng Sun, Yuwei Wang, Peiyan Chen, Zhiyuan Wu, Min Liu, Bo Gao

-+ [Safety Alignment Can Be Not Superficial With Explicit Safety Signals](https://arxiv.org//abs/2505.17072)
++ [Safety Alignment Can Be Not Superficial With Explicit Safety Signals](https://arxiv.org/abs/2505.17072)

Jianwei Li, Jung-Eng Kim

-+ [Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning](https://arxiv.org//abs/2505.13327)
++ [Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning](https://arxiv.org/abs/2505.13327)

Ajian Liu, Haocheng Yuan, Xiao Guo, Hui Ma, Wanyi Zhuang, Changtao Miao, Yan Hong, Chuanbiao Song, Jun Lan, Qi Chu, Tao Gong, Yanyan Liang, Weiqiang Wang, Jun Wan, Xiaoming Liu, Zhen Lei

-+ [RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection](https://arxiv.org//abs/2505.12861)
++ [RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection](https://arxiv.org/abs/2505.12861)

Jiaqi Tan, Xu Zheng, Yang Liu

@@ -10119,76 +10119,76 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Shristi Das Biswas, Arani Roy, Kaushik Roy

-+ [3D Visual Illusion Depth Estimation](https://arxiv.org//abs/2505.13061)
++ [3D Visual Illusion Depth Estimation](https://arxiv.org/abs/2505.13061)

Chengtang Yao, Zhidan Liu, Jiaxi Zeng, Lidong Yu, Yuwei Wu, Yunde Jia

-+ [Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations](https://arxiv.org//abs/2505.13763)
++ [Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations](https://arxiv.org/abs/2505.13763)

Li Ji-An, Hua-Dong Xiong, Robert C. Wilson, Marcelo G. Mattar, Marcus K. Benna

# 2025-05-18

-+ [Self-Destructive Language Model](https://arxiv.org//abs/2505.12186)
++ [Self-Destructive Language Model](https://arxiv.org/abs/2505.12186)

Yuhui Wang, Rongyi Zhu, Ting Wang

-+ [PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs](https://arxiv.org//abs/2505.12238)
++ [PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs](https://arxiv.org/abs/2505.12238)

Sriram Selvam, Anneswa Ghosh

-+ [The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models](https://arxiv.org//abs/2505.12287)
++ [The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models](https://arxiv.org/abs/2505.12287)

Linghan Huang, Haolin Jin, Zhaoge Bi, Pengyue Yang, Peizhou Zhao, Taozhao Chen, Xiongfei Wu, Lei Ma, Huaming Chen

-+ [Robust Planning for Autonomous Driving via Mixed Adversarial Diffusion Predictions](https://arxiv.org//abs/2505.12327)
++ [Robust Planning for Autonomous Driving via Mixed Adversarial Diffusion Predictions](https://arxiv.org/abs/2505.12327)

Albert Zhao, Stefano Soatto

-+ [VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning](https://arxiv.org//abs/2505.12332)
++ [VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning](https://arxiv.org/abs/2505.12332)

Qianyue Hu, Junyan Wu, Wei Lu, Xiangyang Luo

-+ [CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement](https://arxiv.org//abs/2505.12368)
++ [CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement](https://arxiv.org/abs/2505.12368)

Gauri Kholkar, Ratinder Ahuja

-+ [IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems](https://arxiv.org//abs/2505.12442)
++ [IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems](https://arxiv.org/abs/2505.12442)

Liwen Wang, Wenxuan Wang, Shuai Wang, Zongjie Li, Zhenlan Ji, Zongyi Lyu, Daoyuan Wu, Shing-Chi Cheung

-+ [A Survey of Attacks on Large Language Models](https://arxiv.org//abs/2505.12567)
++ [A Survey of Attacks on Large Language Models](https://arxiv.org/abs/2505.12567)

Wenrui Xu, Keshab K. Parhi

-+ [Extracting memorized pieces of (copyrighted) books from open-weight language models](https://arxiv.org//abs/2505.12546)
++ [Extracting memorized pieces of (copyrighted) books from open-weight language models](https://arxiv.org/abs/2505.12546)

A. Feder Cooper, Aaron Gokaslan, Amy B. Cyphert, Christopher De Sa, Mark A. Lemley, Daniel E. Ho, Percy Liang

-+ [Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression](https://arxiv.org//abs/2505.13527)
++ [Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression](https://arxiv.org/abs/2505.13527)

Jingyu Peng, Maolin Wang, Nan Wang, Xiangyu Zhao, Jiatong Li, Kai Zhang, Qi Liu

-+ [LLM-Based User Simulation for Low-Knowledge Shilling Attacks on Recommender Systems](https://arxiv.org//abs/2505.13528)
++ [LLM-Based User Simulation for Low-Knowledge Shilling Attacks on Recommender Systems](https://arxiv.org/abs/2505.13528)

Shengkang Gu, Jiahao Liu, Dongsheng Li, Guangping Zhang, Mingzhe Han, Hansu Gu, Peng Zhang, Ning Gu, Li Shang, Tun Lu

-+ [SPIRIT: Patching Speech Language Models against Jailbreak Attacks](https://arxiv.org//abs/2505.13541)
++ [SPIRIT: Patching Speech Language Models against Jailbreak Attacks](https://arxiv.org/abs/2505.13541)

Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas

-+ [Harnessing the Universal Geometry of Embeddings](https://arxiv.org//abs/2505.12540)
++ [Harnessing the Universal Geometry of Embeddings](https://arxiv.org/abs/2505.12540)

Rishi Jha, Collin Zhang, Vitaly Shmatikov, John X. Morris

-+ [Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration](https://arxiv.org//abs/2505.17066)
++ [Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration](https://arxiv.org/abs/2505.17066)

Tatia Tsmindashvili, Ana Kolkhidashvili, Dachi Kurtskhalia, Nino Maghlakelidze, Elene Mekvabishvili, Guram Dentoshvili, Orkhan Shamilov, Zaal Gachechiladze, Steven Saporta, David Dachi Choladze

-+ [EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective](https://arxiv.org//abs/2505.12185)
++ [EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective](https://arxiv.org/abs/2505.12185)

Sen Fang, Weiyuan Ding, Bowen Xu

-+ [Provably Sample-Efficient Robust Reinforcement Learning with Average Reward](https://arxiv.org//abs/2505.12462)
++ [Provably Sample-Efficient Robust Reinforcement Learning with Average Reward](https://arxiv.org/abs/2505.12462)

Zachary Roch, Chi Zhang, George Atia, Yue Wang

@@ -10197,104 +10197,104 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Jingyu Peng, Maolin Wang, Nan Wang, Jiatong Li, Yuchen Li, Yuyang Ye, Wanyu Wang, Pengyue Jia, Kai Zhang, Xiangyu Zhao

# 2025-05-17

-+ [CL-CaGAN: Capsule differential adversarial continuous learning for cross-domain hyperspectral anomaly detection](https://arxiv.org//abs/2505.11793)
++ [CL-CaGAN: Capsule differential adversarial continuous learning for cross-domain hyperspectral anomaly detection](https://arxiv.org/abs/2505.11793)

Jianing Wang, Siying Guo, Zheng Hua, Runhu Huang, Jinyu Hu, Maoguo Gong

-+ [Multilingual Collaborative Defense for Large Language Models](https://arxiv.org//abs/2505.11835)
++ [Multilingual Collaborative Defense for Large Language Models](https://arxiv.org/abs/2505.11835)

Hongliang Li, Jinan Xu, Gengping Cui, Changhao Guan, Fengran Mo, Kaiyu Huang

-+ [On Membership Inference Attacks in Knowledge Distillation](https://arxiv.org//abs/2505.11837)
++ [On Membership Inference Attacks in Knowledge Distillation](https://arxiv.org/abs/2505.11837)

Ziyao Cui, Minxing Zhang, Jian Pei

-+ [Improving Fairness in LLMs Through Testing-Time Adversaries](https://arxiv.org//abs/2505.12100)
++ [Improving Fairness in LLMs Through Testing-Time Adversaries](https://arxiv.org/abs/2505.12100)

Isabela Pereira Gregio, Ian Pons, Anna Helena Reali Costa, Artur Jordão

-+ [Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement](https://arxiv.org//abs/2505.12060)
++ [Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement](https://arxiv.org/abs/2505.12060)

Peng Ding, Jun Kuang, Zongyu Wang, Xuezhi Cao, Xunliang Cai, Jiajun Chen, Shujian Huang

-+ [JULI: Jailbreak Large Language Models by Self-Introspection](https://arxiv.org//abs/2505.11790)
++ [JULI: Jailbreak Large Language Models by Self-Introspection](https://arxiv.org/abs/2505.11790)

Jesson Wang, Zhanhao Hu, David Wagner

-+ [Coded Robust Aggregation for Distributed Learning under Byzantine Attacks](https://arxiv.org//abs/2506.01989)
++ [Coded Robust Aggregation for Distributed Learning under Byzantine Attacks](https://arxiv.org/abs/2506.01989)

Chengxi Li, Ming Xiao, Mikael Skoglund

-+ [TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text](https://arxiv.org//abs/2505.11988)
++ [TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text](https://arxiv.org/abs/2505.11988)

Ahmed Lekssays, Utsav Shukla, Husrev Taha Sencar, Md Rizwan Parvez

# 2025-05-16

-+ [LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios](https://arxiv.org//abs/2505.11247)
++ [LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios](https://arxiv.org/abs/2505.11247)

Mingxing Peng, Yuting Xie, Xusen Guo, Ruoyu Yao, Hai Yang, Jun Ma

-+ [On the Security Risks of ML-based Malware Detection Systems: A Survey](https://arxiv.org//abs/2505.10903)
++ [On the Security Risks of ML-based Malware Detection Systems: A Survey](https://arxiv.org/abs/2505.10903)

Ping He, Yuhao Mao, Changjiang Li, Lorenzo Cavallaro, Ting Wang, Shouling Ji

-+ [GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models](https://arxiv.org//abs/2505.10983)
++ [GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models](https://arxiv.org/abs/2505.10983)

Haozheng Luo, Chenghao Qiu, Yimin Wang, Shang Wu, Jiahao Yu, Han Liu, Binghui Wang, Yan Chen

-+ [CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs](https://arxiv.org//abs/2505.11413)
++ [CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs](https://arxiv.org/abs/2505.11413)

Sijia Chen, Xiaomin Li, Mengxue Zhang, Eric Hanchen Jiang, Qingcheng Zeng, Chen-Hsiang Yu

-+ [LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs](https://arxiv.org//abs/2505.10838)
++ [LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs](https://arxiv.org/abs/2505.10838)

Ran Li, Hao Wang, Chengzhi Mao

-+ [Random Client Selection on Contrastive Federated Learning for Tabular Data](https://arxiv.org//abs/2505.10759)
++ [Random Client Selection on Contrastive Federated Learning for Tabular Data](https://arxiv.org/abs/2505.10759)

Achmad Ginanjar, Xue Li, Priyanka Singh, Wen Hua

-+ [AutoRAN: Weak-to-Strong Jailbreaking of Large Reasoning Models](https://arxiv.org//abs/2505.10846)
++ [AutoRAN: Weak-to-Strong Jailbreaking of Large Reasoning Models](https://arxiv.org/abs/2505.10846)

Jiacheng Liang, Tanqiu Jiang, Yuhui Wang, Rongyi Zhu, Fenglong Ma, Ting Wang

-+ [Nosy Layers, Noisy Fixes: Tackling DRAs in Federated Learning Systems using Explainable AI](https://arxiv.org//abs/2505.10942)
++ [Nosy Layers, Noisy Fixes: Tackling DRAs in Federated Learning Systems using Explainable AI](https://arxiv.org/abs/2505.10942)

Meghali Nandi, Arash Shaghaghi, Nazatul Haque Sultan, Gustavo Batista, Raymond K. Zhao, Sanjay Jha

-+ [Verifiably Forgotten? Gradient Differences Still Enable Data Reconstruction in Federated Unlearning](https://arxiv.org//abs/2505.11097)
++ [Verifiably Forgotten? Gradient Differences Still Enable Data Reconstruction in Federated Unlearning](https://arxiv.org/abs/2505.11097)

Fuyao Zhang, Wenjie Li, Yurong Hao, Xinyu Yan, Yang Cao, Wei Yang Bryan Lim

-+ [ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks](https://arxiv.org//abs/2505.11459)
++ [ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks](https://arxiv.org/abs/2505.11459)

Zhixiong Zhuang, Maria-Irina Nicolae, Hui-Po Wang, Mario Fritz

-+ [Probing the Vulnerability of Large Language Models to Polysemantic Interventions](https://arxiv.org//abs/2505.11611)
++ [Probing the Vulnerability of Large Language Models to Polysemantic Interventions](https://arxiv.org/abs/2505.11611)

Bofan Gong, Shiyang Lai, Dawn Song

-+ [The Ripple Effect: On Unforeseen Complications of Backdoor Attacks](https://arxiv.org//abs/2505.11586)
++ [The Ripple Effect: On Unforeseen Complications of Backdoor Attacks](https://arxiv.org/abs/2505.11586)

Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang

-+ [PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning](https://arxiv.org//abs/2505.11642)
++ [PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning](https://arxiv.org/abs/2505.11642)

Falong Fan, Xi Li

-+ [EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents](https://arxiv.org//abs/2505.11717)
++ [EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents](https://arxiv.org/abs/2505.11717)

Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong

-+ [Noise Injection Systemically Degrades Large Language Model Safety Guardrails](https://arxiv.org//abs/2505.13500)
++ [Noise Injection Systemically Degrades Large Language Model Safety Guardrails](https://arxiv.org/abs/2505.13500)

Prithviraj Singh Shahani, Matthias Scheutz

-+ [EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation](https://arxiv.org//abs/2505.13506)
++ [EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation](https://arxiv.org/abs/2505.13506)

Ruobing Yao, Yifei Zhang, Shuang Song, Neng Gao, Chenyang Tu

-+ [Adversarially Robust Spiking Neural Networks with Sparse Connectivity](https://arxiv.org//abs/2505.15833)
++ [Adversarially Robust Spiking Neural Networks with Sparse Connectivity](https://arxiv.org/abs/2505.15833)

Mathias Schmolli, Maximilian Baronig, Robert Legenstein, Ozan Özdenizci

@@ -10319,84 +10319,84 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Wei Huang, Hanchen Wang, Dong Wen, Shaozhen Ma, Wenjie Zhang, Xuemin Lin

# 2025-05-15

-+ [Dark LLMs: The Growing Threat of Unaligned AI Models](https://arxiv.org//abs/2505.10066)
++ [Dark LLMs: The Growing Threat of Unaligned AI Models](https://arxiv.org/abs/2505.10066)

Michael Fire, Yitzhak Elbazis, Adi Wasenstein, Lior Rokach

-+ [Cutting Through Privacy: A Hyperplane-Based Data Reconstruction Attack in Federated Learning](https://arxiv.org//abs/2505.10264)
++ [Cutting Through Privacy: A Hyperplane-Based Data Reconstruction Attack in Federated Learning](https://arxiv.org/abs/2505.10264)

Francesco Diana, André Nusser, Chuan Xu, Giovanni Neglia

-+ [Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2505.10297)
++ [Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2505.10297)

Chibueze Peace Obioma, Youcheng Sun, Mustafa A. Mustafa

-+ [Learned Lightweight Smartphone ISP with Unpaired Data](https://arxiv.org//abs/2505.10420)
++ [Learned Lightweight Smartphone ISP with Unpaired Data](https://arxiv.org/abs/2505.10420)

Andrei Arhire, Radu Timofte

-+ [PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization](https://arxiv.org//abs/2505.09921)
++ [PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization](https://arxiv.org/abs/2505.09921)

Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng Lin, Binxing Fang

-+ [A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability](https://arxiv.org//abs/2505.10351)
++ [A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability](https://arxiv.org/abs/2505.10351)

Jie Zhu, Jirong Zha, Ding Li, Leye Wang

-+ [MorphGuard: Morph Specific Margin Loss for Enhancing Robustness to Face Morphing Attacks](https://arxiv.org//abs/2505.10497)
++ [MorphGuard: Morph Specific Margin Loss for Enhancing Robustness to Face Morphing Attacks](https://arxiv.org/abs/2505.10497)

Iurii Medvedev, Nuno Goncalves

-+ [Sybil-based Virtual Data Poisoning Attacks in Federated Learning](https://arxiv.org//abs/2505.09983)
++ [Sybil-based Virtual Data Poisoning Attacks in Federated Learning](https://arxiv.org/abs/2505.09983)

Changxun Zhu, Qilong Wu, Lingjuan Lyu, Shibei Xue

-+ [The Ephemeral Threat: Assessing the Security of Algorithmic Trading Systems powered by Deep Learning](https://arxiv.org//abs/2505.10430)
++ [The Ephemeral Threat: Assessing the Security of Algorithmic Trading Systems powered by Deep Learning](https://arxiv.org/abs/2505.10430)

Advije Rizvani, Giovanni Apruzzese, Pavel Laskov

-+ [One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems](https://arxiv.org//abs/2505.11548)
++ [One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems](https://arxiv.org/abs/2505.11548)

Zhiyuan Chang, Xiaojun Jia, Mingyang Li, Junjie Wang, Yuekai Huang, Qing Wang, Ziyou Jiang, Yang Liu

-+ [Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?](https://arxiv.org//abs/2505.10443)
++ [Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?](https://arxiv.org/abs/2505.10443)

Pedro Orvalho, Marta Kwiatkowska

-+ [Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data](https://arxiv.org//abs/2505.09974)
++ [Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data](https://arxiv.org/abs/2505.09974)

Adel ElZemity, Budi Arief, Shujun Li

# 2025-05-14

-+ [Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems](https://arxiv.org//abs/2505.09342)
++ [Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems](https://arxiv.org/abs/2505.09342)

Mostafa Jafari, Alireza Shameli-Sendi

-+ [FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization](https://arxiv.org//abs/2505.09385)
++ [FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization](https://arxiv.org/abs/2505.09385)

Xiaoyang Yu, Xiaoming Wu, Xin Wang, Dongrun Li, Ming Yang, Peng Cheng

-+ [Layered Unlearning for Adversarial Relearning](https://arxiv.org//abs/2505.09500)
++ [Layered Unlearning for Adversarial Relearning](https://arxiv.org/abs/2505.09500)

Timothy Qian, Vinith Suriyakumar, Ashia Wilson, Dylan Hadfield-Menell

-+ [Adversarial Suffix Filtering: a Defense Pipeline for LLMs](https://arxiv.org//abs/2505.09602)
++ [Adversarial Suffix Filtering: a Defense Pipeline for LLMs](https://arxiv.org/abs/2505.09602)

David Khachaturov, Robert Mullins

-+ [Toward Malicious Clients Detection in Federated Learning](https://arxiv.org//abs/2505.09110)
++ [Toward Malicious Clients Detection in Federated Learning](https://arxiv.org/abs/2505.09110)

Zhihao Dou, Jiaqi Wang, Wei Sun, Zhuqing Liu, Minghong Fang

-+ [Self-Consuming Generative Models with Adversarially Curated Data](https://arxiv.org//abs/2505.09768)
++ [Self-Consuming Generative Models with Adversarially Curated Data](https://arxiv.org/abs/2505.09768)

Xiukun Wei, Xueru Zhang

-+ [Adversarial Attack on Large Language Models using Exponentiated Gradient Descent](https://arxiv.org//abs/2505.09820)
++ [Adversarial Attack on Large Language Models using Exponentiated Gradient Descent](https://arxiv.org/abs/2505.09820)

Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu

-+ [Revisiting Adversarial Perception Attacks and Defense Methods on Autonomous Driving Systems](https://arxiv.org//abs/2505.11532)
++ [Revisiting Adversarial Perception Attacks and Defense Methods on Autonomous Driving Systems](https://arxiv.org/abs/2505.11532)

Cheng Chen, Yuhong Wang, Nafis S Munir, Xiangwei Zhou, Xugui Zhou

@@ -10409,92 +10409,92 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Zhihao Dou, Jiaqi Wang, Wei Sun, Zhuqing Liu, Minghong Fang

# 2025-05-13

-+ [Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning](https://arxiv.org//abs/2505.08138)
++ [Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning](https://arxiv.org/abs/2505.08138)

Brennon Brimhall, Philip Mathew, Neil Fendley, Yinzhi Cao, Matthew Green

-+ [A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem](https://arxiv.org//abs/2505.08148)
++ [A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem](https://arxiv.org/abs/2505.08148)

Sunday Oyinlola Ogundoyin, Muhammad Ikram, Hassan Jameel Asghar, Benjamin Zi Hao Zhao, Dali Kaafar

-+ [Removing Watermarks with Partial Regeneration using Semantic Information](https://arxiv.org//abs/2505.08234)
++ [Removing Watermarks with Partial Regeneration using Semantic Information](https://arxiv.org/abs/2505.08234)

Krti Tallam, John Kevin Cava, Caleb Geniesse, N. Benjamin Erichson, Michael W. Mahoney

-+ [SHAP-based Explanations are Sensitive to Feature Representation](https://arxiv.org//abs/2505.08345)
++ [SHAP-based Explanations are Sensitive to Feature Representation](https://arxiv.org/abs/2505.08345)

Hyunseung Hwang, Andrew Bell, Joao Fonseca, Venetia Pliatsika, Julia Stoyanovich, Steven Euijong Whang

-+ [DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art](https://arxiv.org//abs/2505.08552)
++ [DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art](https://arxiv.org/abs/2505.08552)

Haroon Wahab, Hassan Ugail, Irfan Mehmood

-+ [Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted](https://arxiv.org//abs/2505.08255)
++ [Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted](https://arxiv.org/abs/2505.08255)

Shuaiwei Yuan, Junyu Dong, Yuezun Li

-+ [SpecSphere: Dual-Pass Spectral-Spatial Graph Neural Networks with Certified Robustness](https://arxiv.org//abs/2505.08320)
++ [SpecSphere: Dual-Pass Spectral-Spatial Graph Neural Networks with Certified Robustness](https://arxiv.org/abs/2505.08320)

Yoonhyuk Choi, Chong-Kwon Kim

-+ [On the Account Security Risks Posed by Password Strength Meters](https://arxiv.org//abs/2505.08292)
++ [On the Account Security Risks Posed by Password Strength Meters](https://arxiv.org/abs/2505.08292)

Ming Xu, Weili Han, Jitao Yu, Jing Liu, Xinyi Zhang, Yun Lin, Jin Song Dong

-+ [Federated Large Language Models: Feasibility, Robustness, Security and Future Directions](https://arxiv.org//abs/2505.08830)
++ [Federated Large Language Models: Feasibility, Robustness, Security and Future Directions](https://arxiv.org/abs/2505.08830)

Wenhao Jiang, Yuchuan Luo, Guilin Deng, Silong Chen, Xu Yang, Shihong Wu, Xinwen Gao, Lin Liu, Shaojing Fu

-+ [Robustness Analysis against Adversarial Patch Attacks in Fully Unmanned Stores](https://arxiv.org//abs/2505.08835)
++ [Robustness Analysis against Adversarial Patch Attacks in Fully Unmanned Stores](https://arxiv.org/abs/2505.08835)

Hyunsik Na, Wonho Lee, Seungdeok Roh, Sohee Park, Daeseon Choi

-+ [On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model Extraction](https://arxiv.org//abs/2505.08847)
++ [On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model Extraction](https://arxiv.org/abs/2505.08847)

Fatima Ezzeddine, Rinad Akel, Ihab Sbeity, Silvia Giordano, Marc Langheinrich, Omran Ayoub

-+ [Towards Adaptive Meta-Gradient Adversarial Examples for Visual Tracking](https://arxiv.org//abs/2505.08999)
++ [Towards Adaptive Meta-Gradient Adversarial Examples for Visual Tracking](https://arxiv.org/abs/2505.08999)

Wei-Long Tian, Peng Gao, Xiao Liu, Long Xu, Hamido Fujita, Hanan Aljuai, Mao-Li Wang

-+ [Lower Bounds on the MMSE of Adversarially Inferring Sensitive Features](https://arxiv.org//abs/2505.09004)
++ [Lower Bounds on the MMSE of Adversarially Inferring Sensitive Features](https://arxiv.org/abs/2505.09004)

Monica Welfert, Nathan Stromberg, Mario Diaz, Lalitha Sankar

-+ [Inference Attacks for X-Vector Speaker Anonymization](https://arxiv.org//abs/2505.08978)
++ [Inference Attacks for X-Vector Speaker Anonymization](https://arxiv.org/abs/2505.08978)

Luke Bauer, Wenxuan Bao, Malvika Jadhav, Vincent Bindschaedler

-+ [DArFace: Deformation Aware Robustness for Low Quality Face Recognition](https://arxiv.org//abs/2505.08423)
++ [DArFace: Deformation Aware Robustness for Low Quality Face Recognition](https://arxiv.org/abs/2505.08423)

Sadaf Gulshad, Abdullah Aldahlawi Thakaa

# 2025-05-12

-+ [GRADA: Graph-based Reranker against Adversarial Documents Attack](https://arxiv.org//abs/2505.07546)
++ [GRADA: Graph-based Reranker against Adversarial Documents Attack](https://arxiv.org/abs/2505.07546)

Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu

-+ [Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks](https://arxiv.org//abs/2505.08022)
++ [Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks](https://arxiv.org/abs/2505.08022)

Steffen Schotthöfer, H. Lexie Yang, Stefan Schnake

-+ [MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges](https://arxiv.org//abs/2505.08809)
++ [MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges](https://arxiv.org/abs/2505.08809)

Shixi Qin, Zhiyong Yang, Shilong Bao, Shi Wang, Qianqian Xu, Qingming Huang

-+ [SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models](https://arxiv.org//abs/2505.07584)
++ [SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models](https://arxiv.org/abs/2505.07584)

Huining Cui, Wei Liu

-+ [Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy](https://arxiv.org//abs/2505.07614)
++ [Trial and Trust: Addressing Byzantine Attacks with Comprehensive Defense Strategy](https://arxiv.org/abs/2505.07614)

Gleb Molodtsov, Daniil Medyakov, Sergey Skorik, Nikolas Khachaturov, Shahane Tigranyan, Vladimir Aletov, Aram Avetisyan, Martin Takáč, Aleksandr Beznosikov

-+ [FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning](https://arxiv.org//abs/2505.08054)
++ [FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning](https://arxiv.org/abs/2505.08054)

Zhehao Zhang, Weijie Xu, Fanyou Wu, Chandan K. Reddy

-+ [No Query, No Access](https://arxiv.org//abs/2505.07258)
++ [No Query, No Access](https://arxiv.org/abs/2505.07258)

Wenqiang Wang, Siyuan Liang, Yangshijie Zhang, Xiaojun Jia, Hao Lin, Xiaochun Cao

@@ -10502,12 +10502,12 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Fariha Tanjim Shifat, Sayma Sarwar Ela, Mosarrat Jahan

-+ [Sharp Gaussian approximations for Decentralized Federated Learning](https://arxiv.org//abs/2505.08125)
++ [Sharp Gaussian approximations for Decentralized Federated Learning](https://arxiv.org/abs/2505.08125)

Soham Bonnerjee, Sayar Karmakar, Wei Biao Wu

# 2025-05-11

-+ [TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis](https://arxiv.org//abs/2505.08804)
++ [TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis](https://arxiv.org/abs/2505.08804)

Longtian Wang, Xiaofei Xie, Tianlin Li, Yuhan Zhi, Chao Shen

@@ -10516,105 +10516,105 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Zihan Guan, Mengxuan Hu, Ronghang Zhu, Sheng Li, Anil Vullikanti

# 2025-05-10

-+ [TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification](https://arxiv.org//abs/2505.06580)
++ [TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification](https://arxiv.org/abs/2505.06580)

Dongyoon Yang, Jihu Lee, Yongdai Kim

-+ [I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference](https://arxiv.org//abs/2505.06738)
++ [I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference](https://arxiv.org/abs/2505.06738)

Zibo Gao, Junjie Hu, Feng Guo, Yixin Zhang, Yinglong Han, Siyuan Liu, Haiyang Li, Zhiqiang Lv

-+ [PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks](https://arxiv.org//abs/2505.06520)
++ [PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks](https://arxiv.org/abs/2505.06520)

Xuran Li, Jingyi Wang, Xiaohan Yuan, Peixin Zhang

-+ [Boundary-Guided Trajectory Prediction for Road Aware and Physically Feasible Autonomous Driving](https://arxiv.org//abs/2505.06740)
++ [Boundary-Guided Trajectory Prediction for Road Aware and Physically Feasible Autonomous Driving](https://arxiv.org/abs/2505.06740)

Ahmed Abouelazm, Mianzhi Liu, Christian Hubschneider, Yin Wu, Daniel Slieter, J. Marius Zöllner

# 2025-05-09

-+ [AgentXploit: End-to-End Redteaming of Black-Box AI Agents](https://arxiv.org//abs/2505.05849)
++ [AgentXploit: End-to-End Redteaming of Black-Box AI Agents](https://arxiv.org/abs/2505.05849)

Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song

-+ [Realistic Adversarial Attacks for Robustness Evaluation of Trajectory Prediction Models via Future State Perturbation](https://arxiv.org//abs/2505.06134)
++ [Realistic Adversarial Attacks for Robustness Evaluation of Trajectory Prediction Models via Future State Perturbation](https://arxiv.org/abs/2505.06134)

Julian F. Schumann, Jeroen Hagenus, Frederik Baymler Mathiesen, Arkady Zgonnikov

-+ [A Taxonomy of Attacks and Defenses in Split Learning](https://arxiv.org//abs/2505.05872)
++ [A Taxonomy of Attacks and Defenses in Split Learning](https://arxiv.org/abs/2505.05872)

Aqsa Shabbir, Halil İbrahim Kanpak, Alptekin Küpçü, Sinem Sav

-+ [CAPE: Context-Aware Prompt Perturbation Mechanism with Differential Privacy](https://arxiv.org//abs/2505.05922)
++ [CAPE: Context-Aware Prompt Perturbation Mechanism with Differential Privacy](https://arxiv.org/abs/2505.05922)

Haoqi Wu, Wei Dai, Li Wang, Qiang Yan

-+ [LLM-Text Watermarking based on Lagrange Interpolation](https://arxiv.org//abs/2505.05712)
++ [LLM-Text Watermarking based on Lagrange Interpolation](https://arxiv.org/abs/2505.05712)

Jarosław Janas, Paweł Morawiecki, Josef Pieprzyk

-+ [Efficient Full-Stack Private Federated Deep Learning with Post-Quantum Security](https://arxiv.org//abs/2505.05751)
++ [Efficient Full-Stack Private Federated Deep Learning with Post-Quantum Security](https://arxiv.org/abs/2505.05751)

Yiwei Zhang, Rouzbeh Behnia, Attila A. Yavuz, Reza Ebrahimi, Elisa Bertino

-+ [Self-Supervised Federated GNSS Spoofing Detection with Opportunistic Data](https://arxiv.org//abs/2505.06171)
++ [Self-Supervised Federated GNSS Spoofing Detection with Opportunistic Data](https://arxiv.org/abs/2505.06171)

Wenjie Liu, Panos Papadimitratos

-+ [Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients](https://arxiv.org//abs/2505.06335)
++ [Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients](https://arxiv.org/abs/2505.06335)

Jinsheng Yuan, Yuhang Hao, Weisi Guo, Yun Wu, Chongyan Gu

# 2025-05-08

-+ [Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation](https://arxiv.org//abs/2505.05235)
++ [Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation](https://arxiv.org/abs/2505.05235)

Luca Marzari, Isabella Mastroeni, Alessandro Farinelli

-+ [ChainMarks: Securing DNN Watermark with Cryptographic Chain](https://arxiv.org//abs/2505.04977)
++ [ChainMarks: Securing DNN Watermark with Cryptographic Chain](https://arxiv.org/abs/2505.04977)

Brian Choi, Shu Wang, Isabelle Choi, Kun Sun

-+ [Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks](https://arxiv.org//abs/2505.05190)
++ [Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks](https://arxiv.org/abs/2505.05190)

Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal

-+ [DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions](https://arxiv.org//abs/2505.05091)
++ [DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions](https://arxiv.org/abs/2505.05091)

Shashank Agnihotri, Amaan Ansari, Annika Dackermann, Fabian Rösch, Margret Keuper

-+ [PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting](https://arxiv.org//abs/2505.05183)
++ [PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting](https://arxiv.org/abs/2505.05183)

Elad Feldman, Jacob Shams, Dudi Biton, Alfred Chen, Shaoyuan Xie, Satoru Koda, Yisroel Mirsky, Asaf Shabtai, Yuval Elovici, Ben Nassi

-+ [MTL-UE: Learning to Learn Nothing for Multi-Task Learning](https://arxiv.org//abs/2505.05279)
++ [MTL-UE: Learning to Learn Nothing for Multi-Task Learning](https://arxiv.org/abs/2505.05279)

Yi Yu, Song Xia, Siyuan Yang, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

-+ [FedRE: Robust and Effective Federated Learning with Privacy Preference](https://arxiv.org//abs/2505.04889)
++ [FedRE: Robust and Effective Federated Learning with Privacy Preference](https://arxiv.org/abs/2505.04889)

Tianzhe Xiao, Yichen Li, Yu Zhou, Yining Qi, Yi Liu, Wei Wang, Haozhao Wang, Yi Wang, Ruixuan Li

-+ [X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP](https://arxiv.org//abs/2505.05528)
++ [X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP](https://arxiv.org/abs/2505.05528)

Hanxun Huang, Sarah Erfani, Yige Li, Xingjun Ma, James Bailey

-+ [LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities](https://arxiv.org//abs/2505.05619)
++ [LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities](https://arxiv.org/abs/2505.05619)

Kalyan Nakka, Jimmy Dani, Ausmit Mondal, Nitesh Saxena

-+ [Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights](https://arxiv.org//abs/2505.07856)
++ [Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights](https://arxiv.org/abs/2505.07856)

Paweł Walkowiak, Marek Klonowski, Marcin Oleksy, Arkadiusz Janz

-+ [Graph-Based Adversarial Domain Generalization with Anatomical Correlation Knowledge for Cross-User Human Activity Recognition](https://arxiv.org//abs/2506.01962)
++ [Graph-Based Adversarial Domain Generalization with Anatomical Correlation Knowledge for Cross-User Human Activity Recognition](https://arxiv.org/abs/2506.01962)

Xiaozhou Ye, Kevin I-Kai Wang

-+ [Defending against Indirect Prompt Injection by Instruction Detection](https://arxiv.org//abs/2505.06311)
++ [Defending against Indirect Prompt Injection by Instruction Detection](https://arxiv.org/abs/2505.06311)

Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu

-+ [Timestamp Manipulation: Timestamp-based Nakamoto-style Blockchains are Vulnerable](https://arxiv.org//abs/2505.05328)
++ [Timestamp Manipulation: Timestamp-based Nakamoto-style Blockchains are Vulnerable](https://arxiv.org/abs/2505.05328)

Junjie Hu, Sisi Duan

@@ -10623,84 +10623,84 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Neeloy Chakraborty, John Pohovey, Melkior Ornik, Katherine Driggs-Campbell

# 2025-05-07

-+ [Izhikevich-Inspired Temporal Dynamics for Enhancing Privacy, Efficiency, and Transferability in Spiking Neural Networks](https://arxiv.org//abs/2505.04034)
++ [Izhikevich-Inspired Temporal Dynamics for Enhancing Privacy, Efficiency, and Transferability in Spiking Neural Networks](https://arxiv.org/abs/2505.04034)

Ayana Moshruba, Hamed Poursiami, Maryam Parsa

-+ [Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety](https://arxiv.org//abs/2505.04146)
++ [Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety](https://arxiv.org/abs/2505.04146)

Variath Madhupal Gautham Nair, Vishal Varma Dantuluri

-+ [Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization](https://arxiv.org//abs/2505.04578)
++ [Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization](https://arxiv.org/abs/2505.04578)

Wenjun Cao

-+ [Reliable Disentanglement Multi-view Learning Against View Adversarial Attacks](https://arxiv.org//abs/2505.04046)
++ [Reliable Disentanglement Multi-view Learning Against View Adversarial Attacks](https://arxiv.org/abs/2505.04046)

Xuyang Wang, Siyuan Duan, Qizhi Li, Guiduo Duan, Yuan Sun, Dezhong Peng

-+ [REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM](https://arxiv.org//abs/2505.04673)
++ [REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM](https://arxiv.org/abs/2505.04673)

Madhur Jindal, Saurabh Deshpande

-+ [A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models](https://arxiv.org//abs/2505.04784)
++ [A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models](https://arxiv.org/abs/2505.04784)

Pedro Pinacho-Davidson, Fernando Gutierrez, Pablo Zapata, Rodolfo Vergara, Pablo Aqueveque

-+ [Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs](https://arxiv.org//abs/2505.04806)
++ [Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs](https://arxiv.org/abs/2505.04806)

Chetan Pathade

-+ [Fast Fourier Transform-Based Spectral and Temporal Gradient Filtering for Differential Privacy](https://arxiv.org//abs/2505.04468)
++ [Fast Fourier Transform-Based Spectral and Temporal Gradient Filtering for Differential Privacy](https://arxiv.org/abs/2505.04468)

Hyeju Shin, Vincent-Daniel, Kyudan Jung, Seongwon Yun

# 2025-05-06

-+ [Automatic Calibration for Membership Inference Attack on Large Language Models](https://arxiv.org//abs/2505.03392)
++ [Automatic Calibration for Membership Inference Attack on Large Language Models](https://arxiv.org/abs/2505.03392)

Saleh Zare Zade, Yao Qiang, Xiangyu Zhou, Hui Zhu, Mohammad Amin Roshani, Prashant Khanduri, Dongxiao Zhu

-+ [Framework GNN-AID: Graph Neural Network Analysis Interpretation and Defense](https://arxiv.org//abs/2505.03424)
++ [Framework GNN-AID: Graph Neural Network Analysis Interpretation and Defense](https://arxiv.org/abs/2505.03424)

Kirill Lukyanov, Mikhail Drobyshevskiy, Georgii Sazonov, Mikhail Soloviov, Ilya Makarov

-+ [A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM)](https://arxiv.org//abs/2505.03490)
++ [A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM)](https://arxiv.org/abs/2505.03490)

Faiz Taleb, Ivan Gazeau, Maryline Laurent

-+ [ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders](https://arxiv.org//abs/2505.03646)
++ [ALMA: Aggregated Lipschitz Maximization Attack on Auto-encoders](https://arxiv.org/abs/2505.03646)

Chethan Krishnamurthy Ramanaik, Arjun Roy, Eirini Ntoutsi

-+ [BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models](https://arxiv.org//abs/2505.03501)
++ [BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models](https://arxiv.org/abs/2505.03501)

Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu

-+ [Attention-aggregated Attack for Boosting the Transferability of Facial Adversarial Examples](https://arxiv.org//abs/2505.03383)
++ [Attention-aggregated Attack for Boosting the Transferability of Facial Adversarial Examples](https://arxiv.org/abs/2505.03383)

Jian-Wei Li, Wen-Ze Shao

-+ [Robustness in AI-Generated Detection: Enhancing Resistance to Adversarial Attacks](https://arxiv.org//abs/2505.03435)
++ [Robustness in AI-Generated Detection: Enhancing Resistance to Adversarial Attacks](https://arxiv.org/abs/2505.03435)

Sun Haoxuan, Hong Yan, Zhan Jiahui, Chen Haoxing, Lan Jun, Zhu Huijia, Wang Weiqiang, Zhang Liqing, Zhang Jianfu

-+ [Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images](https://arxiv.org//abs/2505.03611)
++ [Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images](https://arxiv.org/abs/2505.03611)

Fangling Jiang, Qi Li, Weining Wang, Wei Shen, Bing Liu, Zhenan Sun

-+ [Data-Driven Falsification of Cyber-Physical Systems](https://arxiv.org//abs/2505.03863)
++ [Data-Driven Falsification of Cyber-Physical Systems](https://arxiv.org/abs/2505.03863)

Atanu Kundu, Sauvik Gon, Rajarshi Ray

-+ [MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models](https://arxiv.org//abs/2505.04015)
++ [MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models](https://arxiv.org/abs/2505.04015)

Soheil Zibakhsh Shabgahi, Yaman Jandali, Farinaz Koushanfar

-+ [Uncovering the Limitations of Model Inversion Evaluation -- Benchmarks and Connection to Type-I Adversarial Attacks](https://arxiv.org//abs/2505.03519)
++ [Uncovering the Limitations of Model Inversion Evaluation -- Benchmarks and Connection to Type-I Adversarial Attacks](https://arxiv.org/abs/2505.03519)

Sy-Tuyen Ho, Koh Jun Hao, Ngoc-Bao Nguyen, Alexander Binder, Ngai-Man Cheung

-+ [Adversarial Attacks in Multimodal Systems: A Practitioner's Survey](https://arxiv.org//abs/2505.03084)
++ [Adversarial Attacks in Multimodal Systems: A Practitioner's Survey](https://arxiv.org/abs/2505.03084)

Shashank Kapoor, Sanjay Surendranath Girija, Lakshit Arora, Dipen Pradhan, Ankit Shetgaonkar, Aman Raj

@@ -10709,477 +10709,477 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Sy-Tuyen Ho, Koh Jun Hao, Ngoc-Bao Nguyen, Alexander Binder, Ngai-Man Cheung

# 2025-05-05

-+ [Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless $l^p$ Norm Solution for Fast Adversarial Training](https://arxiv.org//abs/2505.02360)
++ [Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless $l^p$ Norm Solution for Fast Adversarial Training](https://arxiv.org/abs/2505.02360)

Fares B. Mehouachi, Saif Eddin Jabari

-+ [Robustness questions the interpretability of graph neural networks: what to do?](https://arxiv.org//abs/2505.02566)
++ [Robustness questions the interpretability of graph neural networks: what to do?](https://arxiv.org/abs/2505.02566)

Kirill Lukyanov, Georgii Sazonov, Serafim Boyarsky, Ilya Makarov

-+ [Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models](https://arxiv.org//abs/2505.02824)
++ [Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models](https://arxiv.org/abs/2505.02824)

Kuofeng Gao, Yufei Zhu, Yiming Li, Jiawang Bai, Yong Yang, Zhifeng Li, Shu-Tao Xia

-+ [Bayesian Robust Aggregation for Federated Learning](https://arxiv.org//abs/2505.02490)
++ [Bayesian Robust Aggregation for Federated Learning](https://arxiv.org/abs/2505.02490)

Aleksandr Karakulev, Usama Zafar, Salman Toor, Prashant Singh

-+ [Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation](https://arxiv.org//abs/2505.02971)
++ [Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation](https://arxiv.org/abs/2505.02971)

Anjila Budathoki, Manish Dhakal

-+ [Impact Analysis of Inference Time Attack of Perception Sensors on Autonomous Vehicles](https://arxiv.org//abs/2505.03850)
++ [Impact Analysis of Inference Time Attack of Perception Sensors on Autonomous Vehicles](https://arxiv.org/abs/2505.03850)

Hanlin Chen, Simin Chen, Wenyu Li, Wei Yang, Yiheng Feng

-+ [AKD : Adversarial Knowledge Distillation For Large Language Models Alignment on Coding tasks](https://arxiv.org//abs/2505.06267)
++ [AKD : Adversarial Knowledge Distillation For Large Language Models Alignment on Coding tasks](https://arxiv.org/abs/2505.06267)

Ilyas Oulkadda, Julien Perez

-+ [Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem](https://arxiv.org//abs/2505.02581)
++ [Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem](https://arxiv.org/abs/2505.02581)

Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil

-+ [Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review](https://arxiv.org//abs/2505.02828)
++ [Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review](https://arxiv.org/abs/2505.02828)

Sonal Allana, Mohan Kankanhalli, Rozita Dara

# 2025-05-04

-+ [Lightweight Defense Against Adversarial Attacks in Time Series Classification](https://arxiv.org//abs/2505.02073)
++ [Lightweight Defense Against Adversarial Attacks in Time Series Classification](https://arxiv.org/abs/2505.02073)

Yi Han

-+ [A Survey on Privacy Risks and Protection in Large Language Models](https://arxiv.org//abs/2505.01976)
++ [A Survey on Privacy Risks and Protection in Large Language Models](https://arxiv.org/abs/2505.01976)

Kang Chen, Xiuze Zhou, Yuanguo Lin, Shibo Feng, Li Shen, Pengcheng Wu

-+ [Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets](https://arxiv.org//abs/2505.02118)
++ [Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets](https://arxiv.org/abs/2505.02118)

Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, Ruixuan Li

-+ [A Comprehensive Analysis of Adversarial Attacks against Spam Filters](https://arxiv.org//abs/2505.03831)
++ [A Comprehensive Analysis of Adversarial Attacks against Spam Filters](https://arxiv.org/abs/2505.03831)

Esra Hotoğlu, Sevil Sen, Burcu Can

-+ [Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs](https://arxiv.org//abs/2505.02009)
++ [Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs](https://arxiv.org/abs/2505.02009)

Sai Krishna Mendu, Harish Yenala, Aditi Gulati, Shanu Kumar, Parag Agrawal

-+ [Demystifying optimized prompts in language models](https://arxiv.org//abs/2505.02273)
++ [Demystifying optimized prompts in language models](https://arxiv.org/abs/2505.02273)

Rimon Melamed, Lucas H. McCabe, H. Howie Huang

# 2025-05-03

-+ [Adversarial Robustness of Deep Learning Models for Inland Water Body Segmentation from SAR Images](https://arxiv.org//abs/2505.01884)
++ [Adversarial Robustness of Deep Learning Models for Inland Water Body Segmentation from SAR Images](https://arxiv.org/abs/2505.01884)

Siddharth Kothari, Srinivasan Murali, Sankalp Kothari, Ujjwal Verma, Jaya Sreevalsan-Nair

-+ [CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation](https://arxiv.org//abs/2505.01900)
++ [CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation](https://arxiv.org/abs/2505.01900)

Mazal Bethany, Nishant Vishwamitra, Cho-Yu Jason Chiang, Peyman Najafirad

-+ [Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement](https://arxiv.org//abs/2505.01766)
++ [Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement](https://arxiv.org/abs/2505.01766)

Long Bai, Boyi Ma, Ruohan Wang, Guankun Wang, Beilei Cui, Zhongliang Jiang, Mobarakol Islam, Zhe Min, Jiewen Lai, Nassir Navab, Hongliang Ren

-+ [Towards Trustworthy Federated Learning with Untrusted Participants](https://arxiv.org//abs/2505.01874)
++ [Towards Trustworthy Federated Learning with Untrusted Participants](https://arxiv.org/abs/2505.01874)

Youssef Allouah, Rachid Guerraoui, John Stephan

-+ [Rogue Cell: Adversarial Attack and Defense in Untrusted O-RAN Setup Exploiting the Traffic Steering xApp](https://arxiv.org//abs/2505.01816)
++ [Rogue Cell: Adversarial Attack and Defense in Untrusted O-RAN Setup Exploiting the Traffic Steering xApp](https://arxiv.org/abs/2505.01816)

Eran Aizikovich, Dudu Mimran, Edita Grolman, Yuval Elovici, Asaf Shabtai

-+ [Backdoor Attacks Against Patch-based Mixture of Experts](https://arxiv.org//abs/2505.01811)
++ [Backdoor Attacks Against Patch-based Mixture of Experts](https://arxiv.org/abs/2505.01811)

Cedric Chan, Jona te Lintelo, Stjepan Picek

-+ [Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs](https://arxiv.org//abs/2505.02862)
++ [Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs](https://arxiv.org/abs/2505.02862)

Haoming Yang, Ke Ma, Xiaojun Jia, Yingfei Sun, Qianqian Xu, Qingming Huang

-+ [Detecting Musical Deepfakes](https://arxiv.org//abs/2505.09633)
++ [Detecting Musical Deepfakes](https://arxiv.org/abs/2505.09633)

Nick Sunday

# 2025-05-02

-+ [Explainable AI Based Diagnosis of Poisoning Attacks in Evolutionary Swarms](https://arxiv.org//abs/2505.01181)
++ [Explainable AI Based Diagnosis of Poisoning Attacks in Evolutionary Swarms](https://arxiv.org/abs/2505.01181)

Mehrdad Asadi, Roxana Rădulescu, Ann Nowé

-+ [Attack and defense techniques in large language models: A survey and new perspectives](https://arxiv.org//abs/2505.00976)
++ [Attack and defense techniques in large language models: A survey and new perspectives](https://arxiv.org/abs/2505.00976)

Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu

-+ [Risk Analysis and Design Against Adversarial Actions](https://arxiv.org//abs/2505.01130)
++ [Risk Analysis and Design Against Adversarial Actions](https://arxiv.org/abs/2505.01130)

Marco C. Campi, Algo Carè, Luis G. Crespo, Simone Garatti, Federico A.
Ramponi -+ [Harmonizing Intra-coherence and Inter-divergence in Ensemble Attacks for Adversarial Transferability](https://arxiv.org//abs/2505.01168) ++ [Harmonizing Intra-coherence and Inter-divergence in Ensemble Attacks for Adversarial Transferability](https://arxiv.org/abs/2505.01168) Zhaoyang Ma, Zhihao Wu, Wang Lu, Xin Gao, Jinghang Yue, Taolin Zhang, Lipo Wang, Youfang Lin, Jing Wang -+ [LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures](https://arxiv.org//abs/2505.01177) ++ [LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures](https://arxiv.org/abs/2505.01177) Francisco Aguilera-Martínez, Fernando Berzal -+ [Secure Cluster-Based Hierarchical Federated Learning in Vehicular Networks](https://arxiv.org//abs/2505.01186) ++ [Secure Cluster-Based Hierarchical Federated Learning in Vehicular Networks](https://arxiv.org/abs/2505.01186) M. Saeid HaghighiFard, Sinem Coleri -+ [Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System](https://arxiv.org//abs/2505.01315) ++ [Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System](https://arxiv.org/abs/2505.01315) Sheikh Samit Muhaimin, Spyridon Mastorakis -+ [Constrained Network Adversarial Attacks: Validity, Robustness, and Transferability](https://arxiv.org//abs/2505.01328) ++ [Constrained Network Adversarial Attacks: Validity, Robustness, and Transferability](https://arxiv.org/abs/2505.01328) Anass Grini, Oumaima Taheri, Btissam El Khamlichi, Amal El Fallah-Seghrouchni -+ [Transferable Adversarial Attacks on Black-Box Vision-Language Models](https://arxiv.org//abs/2505.01050) ++ [Transferable Adversarial Attacks on Black-Box Vision-Language Models](https://arxiv.org/abs/2505.01050) Kai Hu, Weichen Yu, Li Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, Matt Fredrikson -+ [Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain](https://arxiv.org//abs/2505.01267) ++ [Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain](https://arxiv.org/abs/2505.01267) Gaozheng Pei, Ke Ma, Yingfei Sun, Qianqian Xu, Qingming Huang -+ [Quantum Support Vector Regression for Robust Anomaly Detection](https://arxiv.org//abs/2505.01012) ++ [Quantum Support Vector Regression for Robust Anomaly Detection](https://arxiv.org/abs/2505.01012) Kilian Tscharke, Maximilian Wendlinger, Sebastian Issel, Pascal Debus -+ [Fine-grained Manipulation Attacks to Local Differential Privacy Protocols for Data Streams](https://arxiv.org//abs/2505.01292) ++ [Fine-grained Manipulation Attacks to Local Differential Privacy Protocols for Data Streams](https://arxiv.org/abs/2505.01292) Xinyu Li, Xuebin Ren, Shusen Yang, Liang Shi, Chia-Mu Yu -+ [Watermark Overwriting Attack on StegaStamp algorithm](https://arxiv.org//abs/2505.01474) ++ [Watermark Overwriting Attack on StegaStamp algorithm](https://arxiv.org/abs/2505.01474) I.F.Serzhenko, L.A.Khaertdinova, M.A.Pautov, A.V.Antsiferova -+ [The DCR Delusion: Measuring the Privacy Risk of Synthetic Data](https://arxiv.org//abs/2505.01524) ++ [The DCR Delusion: Measuring the Privacy Risk of Synthetic Data](https://arxiv.org/abs/2505.01524) Zexi Yao, Nataša Krčo, Georgi Ganev, Yves-Alexandre de Montjoye -+ [LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps](https://arxiv.org//abs/2505.01484) ++ [LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps](https://arxiv.org/abs/2505.01484) Pedro Abdalla, Roman Vershynin -+ [Modeling 
Behavioral Preferences of Cyber Adversaries Using Inverse Reinforcement Learning](https://arxiv.org//abs/2505.03817) ++ [Modeling Behavioral Preferences of Cyber Adversaries Using Inverse Reinforcement Learning](https://arxiv.org/abs/2505.03817) Aditya Shinde, Prashant Doshi -+ [Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models](https://arxiv.org//abs/2505.00972) ++ [Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models](https://arxiv.org/abs/2505.00972) Yuewen Mei, Tong Nie, Jian Sun, Ye Tian -+ [Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content](https://arxiv.org//abs/2505.01008) ++ [Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content](https://arxiv.org/abs/2505.01008) Haoyue Bai, Yiyou Sun, Wei Cheng, Haifeng Chen # 2025-05-01 -+ [Red Teaming Large Language Models for Healthcare](https://arxiv.org//abs/2505.00467) ++ [Red Teaming Large Language Models for Healthcare](https://arxiv.org/abs/2505.00467) Vahid Balazadeh, Michael Cooper, David Pellow, Atousa Assadi, Jennifer Bell, Jim Fackler, Gabriel Funingana, Spencer Gable-Cook, Anirudh Gangadhar, Abhishek Jaiswal, Sumanth Kaja, Christopher Khoury, Randy Lin, Kaden McKeen, Sara Naimimohasses, Khashayar Namdar, Aviraj Newatia, Allan Pang, Anshul Pattoo, Sameer Peesapati, Diana Prepelita, Bogdana Rakova, Saba Sadatamin, Rafael Schulman, Ajay Shah, Syed Azhar Shah, Syed Ahmar Shah, Babak Taati, Balagopal Unnikrishnan, Stephanie Williams, Rahul G Krishnan -+ [Analysis of the vulnerability of machine learning regression models to adversarial attacks using data from 5G wireless networks](https://arxiv.org//abs/2505.00487) ++ [Analysis of the vulnerability of machine learning regression models to adversarial attacks using data from 5G wireless networks](https://arxiv.org/abs/2505.00487) Leonid Legashev, Artur Zhigalov, Denis Parfenov -+ [Safety-Critical Traffic Simulation with Guided Latent Diffusion Model](https://arxiv.org//abs/2505.00515) ++ [Safety-Critical Traffic Simulation with Guided Latent Diffusion Model](https://arxiv.org/abs/2505.00515) Mingxing Peng, Ruoyu Yao, Xusen Guo, Yuting Xie, Xianda Chen, Jun Ma -+ [Fast and Low-Cost Genomic Foundation Models via Outlier Removal](https://arxiv.org//abs/2505.00598) ++ [Fast and Low-Cost Genomic Foundation Models via Outlier Removal](https://arxiv.org/abs/2505.00598) Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu -+ [The Invisible Threat: Evaluating the Vulnerability of Cross-Spectral Face Recognition to Presentation Attacks](https://arxiv.org//abs/2505.00380) ++ [The Invisible Threat: Evaluating the Vulnerability of Cross-Spectral Face Recognition to Presentation Attacks](https://arxiv.org/abs/2505.00380) Anjith George, Sebastien Marcel -+ [Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models](https://arxiv.org//abs/2505.00817) ++ [Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models](https://arxiv.org/abs/2505.00817) Andrew Adiletta, Berk Sunar -+ [OET: Optimization-based prompt injection Evaluation Toolkit](https://arxiv.org//abs/2505.00843) ++ [OET: Optimization-based prompt injection Evaluation Toolkit](https://arxiv.org/abs/2505.00843) Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao -+ 
[Protocol-agnostic and Data-free Backdoor Attacks on Pre-trained Models in RF Fingerprinting](https://arxiv.org//abs/2505.00881) ++ [Protocol-agnostic and Data-free Backdoor Attacks on Pre-trained Models in RF Fingerprinting](https://arxiv.org/abs/2505.00881) Tianya Zhao, Ningning Wang, Junqing Zhang, Xuyu Wang -+ [Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation](https://arxiv.org//abs/2505.01456) ++ [Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation](https://arxiv.org/abs/2505.01456) Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Tianlong Chen, Mohit Bansal -+ [Development of an Adapter for Analyzing and Protecting Machine Learning Models from Competitive Activity in the Networks Services](https://arxiv.org//abs/2505.01460) ++ [Development of an Adapter for Analyzing and Protecting Machine Learning Models from Competitive Activity in the Networks Services](https://arxiv.org/abs/2505.01460) Denis Parfenov, Anton Parfenov # 2025-04-30 -+ [How to Backdoor the Knowledge Distillation](https://arxiv.org//abs/2504.21323) ++ [How to Backdoor the Knowledge Distillation](https://arxiv.org/abs/2504.21323) Chen Wu, Qian Ma, Prasenjit Mitra, Sencun Zhu -+ [XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs](https://arxiv.org//abs/2504.21700) ++ [XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs](https://arxiv.org/abs/2504.21700) Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Vinod P -+ [Cert-SSB: Toward Certified Sample-Specific Backdoor Defense](https://arxiv.org//abs/2504.21730) ++ [Cert-SSB: Toward Certified Sample-Specific Backdoor Defense](https://arxiv.org/abs/2504.21730) Ting Qiao, Yingjia Wang, Xing Liu, Sixing Wu, Jianbing Li, Yiming Li -+ [The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning](https://arxiv.org//abs/2504.21307) ++ [The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning](https://arxiv.org/abs/2504.21307) Siyi Chen, Yimeng Zhang, Sijia Liu, Qing Qu -+ [Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection](https://arxiv.org//abs/2504.21646) ++ [Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection](https://arxiv.org/abs/2504.21646) Liqin Wang, Qianyue Hu, Wei Lu, Xiangyang Luo -+ [Whispers of Data: Unveiling Label Distributions in Federated Learning Through Virtual Client Simulation](https://arxiv.org//abs/2504.21436) ++ [Whispers of Data: Unveiling Label Distributions in Federated Learning Through Virtual Client Simulation](https://arxiv.org/abs/2504.21436) Zhixuan Ma, Haichang Gao, Junxiang Huang, Ping Wang -+ [Traceback of Poisoning Attacks to Retrieval-Augmented Generation](https://arxiv.org//abs/2504.21668) ++ [Traceback of Poisoning Attacks to Retrieval-Augmented Generation](https://arxiv.org/abs/2504.21668) Baolei Zhang, Haoran Xin, Minghong Fang, Zhuqing Liu, Biao Yi, Tong Li, Zheli Liu -+ [Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation](https://arxiv.org//abs/2504.21574) ++ [Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation](https://arxiv.org/abs/2504.21574) Bikash Saha, Nanda Rani, Sandeep Kumar Shukla -+ [Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs](https://arxiv.org//abs/2504.21680) 
++ [Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs](https://arxiv.org/abs/2504.21680) Pan Suo, Yu-Ming Shang, San-Chuan Guo, Xi Zhang -+ [Enhancing Security and Strengthening Defenses in Automated Short-Answer Grading Systems](https://arxiv.org//abs/2505.00061) ++ [Enhancing Security and Strengthening Defenses in Automated Short-Answer Grading Systems](https://arxiv.org/abs/2505.00061) Sahar Yarmohammadtoosky, Yiyun Zhou, Victoria Yaneva, Peter Baldwin, Saed Rezayi, Brian Clauser, Polina Harikeo -+ [Stochastic Subspace Descent Accelerated via Bi-fidelity Line Search](https://arxiv.org//abs/2505.00162) ++ [Stochastic Subspace Descent Accelerated via Bi-fidelity Line Search](https://arxiv.org/abs/2505.00162) Nuojin Cheng, Alireza Doostan, Stephen Becker -+ [Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning](https://arxiv.org//abs/2505.01454) ++ [Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning](https://arxiv.org/abs/2505.01454) Zhiyong Jin, Runhua Xu, Chao Li, Yizhong Liu, Jianxin Li -+ [Combating Falsification of Speech Videos with Live Optical Signatures (Extended Version)](https://arxiv.org//abs/2504.21846) ++ [Combating Falsification of Speech Videos with Live Optical Signatures (Extended Version)](https://arxiv.org/abs/2504.21846) Hadleigh Schwartz, Xiaofeng Yan, Charles J. Carver, Xia Zhou # 2025-04-29 -+ [Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression](https://arxiv.org//abs/2504.20493) ++ [Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression](https://arxiv.org/abs/2504.20493) Yu Cui, Yujun Cai, Yiwei Wang -+ [Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption](https://arxiv.org//abs/2504.20769) ++ [Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption](https://arxiv.org/abs/2504.20769) Wenxiao Wang, Parsa Hosseini, Soheil Feizi -+ [GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion](https://arxiv.org//abs/2504.20829) ++ [GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion](https://arxiv.org/abs/2504.20829) Jiaxin Hong, Sixu Chen, Shuoyang Sun, Hongyao Yu, Hao Fang, Yuqi Tan, Bin Chen, Shuhan Qi, Jiawei Li -+ [Mitigating the Structural Bias in Graph Adversarial Defenses](https://arxiv.org//abs/2504.20848) ++ [Mitigating the Structural Bias in Graph Adversarial Defenses](https://arxiv.org/abs/2504.20848) Junyuan Fang, Huimin Liu, Han Yang, Jiajing Wu, Zibin Zheng, Chi K. Tse -+ [Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks](https://arxiv.org//abs/2504.20869) ++ [Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks](https://arxiv.org/abs/2504.20869) Junyuan Fang, Han Yang, Haixian Wen, Jiajing Wu, Zibin Zheng, Chi K. 
Tse -+ [Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems](https://arxiv.org//abs/2504.20376) ++ [Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems](https://arxiv.org/abs/2504.20376) Shiqian Zhao, Jiayang Liu, Yiming Li, Runyi Hu, Xiaojun Jia, Wenshu Fan, Xinfeng Li, Jie Zhang, Wei Dong, Tianwei Zhang, Luu Anh Tuan -+ [Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models](https://arxiv.org//abs/2504.20518) ++ [Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models](https://arxiv.org/abs/2504.20518) Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen -+ [AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security](https://arxiv.org//abs/2504.20965) ++ [AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security](https://arxiv.org/abs/2504.20965) Zikui Cai, Shayan Shabihi, Bang An, Zora Che, Brian R. Bartoldson, Bhavya Kailkhura, Tom Goldstein, Furong Huang -+ [Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation](https://arxiv.org//abs/2504.20414) ++ [Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation](https://arxiv.org/abs/2504.20414) Joshua Chiu, Partha Protim Paul, Zahin Wahab -+ [Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction](https://arxiv.org//abs/2504.20472) ++ [Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction](https://arxiv.org/abs/2504.20472) Yulin Chen, Haoran Li, Yuan Sui, Yue Liu, Yufei He, Yangqiu Song, Bryan Hooi -+ [ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models](https://arxiv.org//abs/2504.20570) ++ [ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models](https://arxiv.org/abs/2504.20570) Jin Xie, Ruishi He, Songze Li, Xiaojun Jia, Shouling Ji -+ [SFIBA: Spatial-based Full-target Invisible Backdoor Attacks](https://arxiv.org//abs/2504.21052) ++ [SFIBA: Spatial-based Full-target Invisible Backdoor Attacks](https://arxiv.org/abs/2504.21052) Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Zhishuai Li, Weifeng Liu -+ [NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models](https://arxiv.org//abs/2504.21053) ++ [NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models](https://arxiv.org/abs/2504.21053) Yi Zhou, Wenpeng Xing, Dezhang Kong, Changting Lin, Meng Han -+ [FFCBA: Feature-based Full-target Clean-label Backdoor Attacks](https://arxiv.org//abs/2504.21054) ++ [FFCBA: Feature-based Full-target Clean-label Backdoor Attacks](https://arxiv.org/abs/2504.21054) Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Liantao Wu, Zhe Li, Weifeng Liu -+ [Erased but Not Forgotten: How Backdoors Compromise Concept Erasure](https://arxiv.org//abs/2504.21072) ++ [Erased but Not Forgotten: How Backdoors Compromise Concept Erasure](https://arxiv.org/abs/2504.21072) Jonas Henry Grebe, Tobias Braun, Marcus Rohrbach, Anna Rohrbach -+ [CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks](https://arxiv.org//abs/2504.21228) ++ [CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks](https://arxiv.org/abs/2504.21228) Rui Wang, Junda Wu, Yu Xia, Tong Yu, Ruiyi Zhang, Ryan Rossi, Lina Yao, 
Julian McAuley -+ [Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks](https://arxiv.org//abs/2504.20869) ++ [Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks](https://arxiv.org/abs/2504.20869) Junyuan Fang, Han Yang, Haixian Wen, Jiajing Wu, Zibin Zheng, Chi K. Tse -+ [Generate-then-Verify: Reconstructing Data from Limited Published Statistics](https://arxiv.org//abs/2504.21199) ++ [Generate-then-Verify: Reconstructing Data from Limited Published Statistics](https://arxiv.org/abs/2504.21199) Terrance Liu, Eileen Xiao, Pratiksha Thaker, Adam Smith, Zhiwei Steven Wu -+ [AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security](https://arxiv.org//abs/2504.20965) ++ [AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security](https://arxiv.org/abs/2504.20965) Zikui Cai, Shayan Shabihi, Bang An, Zora Che, Brian R. Bartoldson, Bhavya Kailkhura, Tom Goldstein, Furong Huang -+ [ACE: A Security Architecture for LLM-Integrated App Systems](https://arxiv.org//abs/2504.20984) ++ [ACE: A Security Architecture for LLM-Integrated App Systems](https://arxiv.org/abs/2504.20984) Evan Li, Tushin Mallick, Evan Rose, William Robertson, Alina Oprea, Cristina Nita-Rotaru # 2025-04-28 -+ [Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM](https://arxiv.org//abs/2504.19654) ++ [Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM](https://arxiv.org/abs/2504.19654) Leon Davies, Baihua Li, Mohamad Saada, Simon Sølvsten, Qinggang Meng -+ [$\texttt{SAGE}$: A Generic Framework for LLM Safety Evaluation](https://arxiv.org//abs/2504.19674) ++ [$\texttt{SAGE}$: A Generic Framework for LLM Safety Evaluation](https://arxiv.org/abs/2504.19674) Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat -+ [Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge](https://arxiv.org//abs/2504.19730) ++ [Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge](https://arxiv.org/abs/2504.19730) Wenhan Mu, Ling Xu, Shuren Pei, Le Mi, Huichi Zhou -+ [Adversarial Shallow Watermarking](https://arxiv.org//abs/2504.19529) ++ [Adversarial Shallow Watermarking](https://arxiv.org/abs/2504.19529) Guobiao Li, Lei Tan, Yuliang Xue, Gaozhi Liu, Zhenxing Qian, Sheng Li, Xinpeng Zhang -+ [Hierarchical Uncertainty-Aware Graph Neural Network](https://arxiv.org//abs/2504.19820) ++ [Hierarchical Uncertainty-Aware Graph Neural Network](https://arxiv.org/abs/2504.19820) Yoonhyuk Choi, Chong-Kwon Kim -+ [JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift](https://arxiv.org//abs/2504.19440) ++ [JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift](https://arxiv.org/abs/2504.19440) Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David Wagner -+ [FCGHunter: Towards Evaluating Robustness of Graph-Based Android Malware Detection](https://arxiv.org//abs/2504.19456) ++ [FCGHunter: Towards Evaluating Robustness of Graph-Based Android Malware Detection](https://arxiv.org/abs/2504.19456) Shiwen Song, Xiaofei Xie, Ruitao Feng, Qi Guo, Sen Chen -+ [Security Steerability is All You Need](https://arxiv.org//abs/2504.19521) ++ [Security Steerability is All You Need](https://arxiv.org/abs/2504.19521) Itay Hazan, Idan Habler, Ron Bitton, Itsik Mantin -+ [Prompt Injection Attack 
to Tool Selection in LLM Agents](https://arxiv.org//abs/2504.19793) ++ [Prompt Injection Attack to Tool Selection in LLM Agents](https://arxiv.org/abs/2504.19793) Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun -+ [The Automation Advantage in AI Red Teaming](https://arxiv.org//abs/2504.19855) ++ [The Automation Advantage in AI Red Teaming](https://arxiv.org/abs/2504.19855) Rob Mulla, Will Pearce, Nick Landers, Brian Greunke, Brad Palm, Vincent Abruzzo, Ads Dawson -+ [The Dark Side of Digital Twins: Adversarial Attacks on AI-Driven Water Forecasting](https://arxiv.org//abs/2504.20295) ++ [The Dark Side of Digital Twins: Adversarial Attacks on AI-Driven Water Forecasting](https://arxiv.org/abs/2504.20295) Mohammadhossein Homaei, Victor Gonzalez Morales, Oscar Mogollon-Gutierrez, Andres Caro -+ [A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning](https://arxiv.org//abs/2504.20310) ++ [A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning](https://arxiv.org/abs/2504.20310) Greg Gluch, Shafi Goldwasser -+ [A Case Study on the Use of Representativeness Bias as a Defense Against Adversarial Cyber Threats](https://arxiv.org//abs/2504.20245) ++ [A Case Study on the Use of Representativeness Bias as a Defense Against Adversarial Cyber Threats](https://arxiv.org/abs/2504.20245) Briland Hitaj, Grit Denker, Laura Tinnel, Michael McAnally, Bruce DeBruhl, Nathan Bunting, Alex Fafard, Daniel Aaron, Richard D. Roberts, Joshua Lawson, Greg McCain, Dylan Starink -+ [Can Differentially Private Fine-tuning LLMs Protect Against Privacy Attacks?](https://arxiv.org//abs/2504.21036) ++ [Can Differentially Private Fine-tuning LLMs Protect Against Privacy Attacks?](https://arxiv.org/abs/2504.21036) Hao Du, Shang Liu, Yang Cao -+ [Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary](https://arxiv.org//abs/2504.21038) ++ [Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary](https://arxiv.org/abs/2504.21038) Yakai Li, Jiekang Hu, Weiduan Sang, Luping Ma, Jing Xie, Weijuan Zhang, Aimin Yu, Shijie Zhao, Qingjia Huang, Qihang Zhou -+ [What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift](https://arxiv.org//abs/2504.21042) ++ [What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift](https://arxiv.org/abs/2504.21042) Jiamin Chang, Haoyang Li, Hammond Pearce, Ruoxi Sun, Bo Li, Minhui Xue -+ [AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection](https://arxiv.org//abs/2504.21044) ++ [AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection](https://arxiv.org/abs/2504.21044) Jianbo Gao, Keke Gai, Jing Yu, Liehuang Zhu, Qi Wu -+ [A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage](https://arxiv.org//abs/2504.21035) ++ [A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage](https://arxiv.org/abs/2504.21035) Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh -+ [A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning](https://arxiv.org//abs/2504.20310) ++ [A Cryptographic Perspective on Mitigation vs. 
Detection in Machine Learning](https://arxiv.org/abs/2504.20310) Greg Gluch, Shafi Goldwasser # 2025-04-27 -+ [Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model](https://arxiv.org//abs/2504.19373) ++ [Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model](https://arxiv.org/abs/2504.19373) Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Yue Zhao, Zhen Xiang, Chaowei Xiao -+ [Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image](https://arxiv.org//abs/2504.20111) ++ [Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image](https://arxiv.org/abs/2504.20111) Anubhav Jain, Yuya Kobayashi, Naoki Murata, Yuhta Takida, Takashi Shibuya, Yuki Mitsufuji, Niv Cohen, Nasir Memon, Julian Togelius # 2025-04-26 -+ [Test It Before You Trust It: Applying Software Testing for Trustworthy In-context Learning](https://arxiv.org//abs/2504.18827) ++ [Test It Before You Trust It: Applying Software Testing for Trustworthy In-context Learning](https://arxiv.org/abs/2504.18827) Teeradaj Racharak, Chaiyong Ragkhitwetsagul, Chommakorn Sontesadisai, Thanwadee Sunetnanta -+ [Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs](https://arxiv.org//abs/2504.19019) ++ [Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs](https://arxiv.org/abs/2504.19019) Mohammad Akbar-Tajari, Mohammad Taher Pilehvar, Mohammad Mahmoody -+ [Latent Adversarial Training Improves the Representation of Refusal](https://arxiv.org//abs/2504.18872) ++ [Latent Adversarial Training Improves the Representation of Refusal](https://arxiv.org/abs/2504.18872) Alexandra Abbas, Nora Petrova, Helios Ael Lyons, Natalia Perez-Campanero -+ [Unveiling and Mitigating Adversarial Vulnerabilities in Iterative Optimizers](https://arxiv.org//abs/2504.19000) ++ [Unveiling and Mitigating Adversarial Vulnerabilities in Iterative Optimizers](https://arxiv.org/abs/2504.19000) Elad Sofer, Tomer Shaked, Caroline Chaux, Nir Shlezinger -+ [SONNI: Secure Oblivious Neural Network Inference](https://arxiv.org//abs/2504.18974) ++ [SONNI: Secure Oblivious Neural Network Inference](https://arxiv.org/abs/2504.18974) Luke Sperling, Sandeep S. 
Kulkarni -+ [Safety Interventions against Adversarial Patches in an Open-Source Driver Assistance System](https://arxiv.org//abs/2504.18990) ++ [Safety Interventions against Adversarial Patches in an Open-Source Driver Assistance System](https://arxiv.org/abs/2504.18990) Cheng Chen, Grant Xiao, Daehyun Lee, Lishan Yang, Evgenia Smirni, Homa Alemzadeh, Xugui Zhou -+ [PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight](https://arxiv.org//abs/2504.21029) ++ [PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight](https://arxiv.org/abs/2504.21029) Ben Goertzel, Paulos Yibelo -+ [Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent Decomposition](https://arxiv.org//abs/2504.20094) ++ [Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent Decomposition](https://arxiv.org/abs/2504.20094) Zheng Hui, Xiaokai Wei, Yexi Jiang, Kevin Gao, Chen Wang, Frank Ong, Se-eun Yoon, Rachit Pareek, Michelle Gong # 2025-04-25 -+ [DeSIA: Attribute Inference Attacks Against Limited Fixed Aggregate Statistics](https://arxiv.org//abs/2504.18497) ++ [DeSIA: Attribute Inference Attacks Against Limited Fixed Aggregate Statistics](https://arxiv.org/abs/2504.18497) Yifeng Mao, Bozhidar Stevanoski, Yves-Alexandre de Montjoye -+ [Edge-Based Learning for Improved Classification Under Adversarial Noise](https://arxiv.org//abs/2504.20077) ++ [Edge-Based Learning for Improved Classification Under Adversarial Noise](https://arxiv.org/abs/2504.20077) Manish Kansana, Keyan Alexander Rahimi, Elias Hossain, Iman Dehzangi, Noorbakhsh Amiri Golilarz -+ [Anti-adversarial Learning: Desensitizing Prompts for Large Language Models](https://arxiv.org//abs/2505.01273) ++ [Anti-adversarial Learning: Desensitizing Prompts for Large Language Models](https://arxiv.org/abs/2505.01273) Xuan Li, Zhe Yin, Xiaodong Gu, Beijun Shen @@ -11188,64 +11188,64 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Hanrui Wang, Shuo Wang, Chun-Shien Lu, Isao Echizen # 2025-04-24 -+ [AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models](https://arxiv.org//abs/2504.17179) ++ [AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models](https://arxiv.org/abs/2504.17179) Mohammad Zarei, Melanie A Jutras, Eliana Evans, Mike Tan, Omid Aaramoon -+ [Enhancing Variational Autoencoders with Smooth Robust Latent Encoding](https://arxiv.org//abs/2504.17219) ++ [Enhancing Variational Autoencoders with Smooth Robust Latent Encoding](https://arxiv.org/abs/2504.17219) Hyomin Lee, Minseon Kim, Sangwon Jang, Jongheon Jeong, Sung Ju Hwang -+ [Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation](https://arxiv.org//abs/2504.17480) ++ [Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation](https://arxiv.org/abs/2504.17480) Xin Yi, Shunfan Zhengc, Linlin Wanga, Xiaoling Wang, Liang He -+ [Safety in Large Reasoning Models: A Survey](https://arxiv.org//abs/2504.17704) ++ [Safety in Large Reasoning Models: A Survey](https://arxiv.org/abs/2504.17704) Cheng Wang, Yue Liu, Baolong Li, Duzhen Zhang, Zhongzhi Li, Junfeng Fang -+ [Unveiling Hidden Vulnerabilities in Digital Human Generation via Adversarial Attacks](https://arxiv.org//abs/2504.17457) ++ [Unveiling Hidden 
Vulnerabilities in Digital Human Generation via Adversarial Attacks](https://arxiv.org/abs/2504.17457) Zhiying Li, Yeying Jin, Fan Shen, Zhi Liu, Weibin Chen, Pengju Zhang, Xiaomei Zhang, Boyu Chen, Michael Shen, Kejian Wu, Zhaoxin Fan, Jin Dong -+ [The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes](https://arxiv.org//abs/2504.17300) ++ [The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes](https://arxiv.org/abs/2504.17300) Wencong You, Daniel Lowd -+ [Towards Robust LLMs: an Adversarial Robustness Measurement Framework](https://arxiv.org//abs/2504.17723) ++ [Towards Robust LLMs: an Adversarial Robustness Measurement Framework](https://arxiv.org/abs/2504.17723) Natan Levy, Adiel Ashrov, Guy Katz -+ [On the Generalization of Adversarially Trained Quantum Classifiers](https://arxiv.org//abs/2504.17690) ++ [On the Generalization of Adversarially Trained Quantum Classifiers](https://arxiv.org/abs/2504.17690) Petros Georgiou, Aaron Mark Thomas, Sharu Theresa Jose, Osvaldo Simeone -+ [Evaluating the Vulnerability of ML-Based Ethereum Phishing Detectors to Single-Feature Adversarial Perturbations](https://arxiv.org//abs/2504.17684) ++ [Evaluating the Vulnerability of ML-Based Ethereum Phishing Detectors to Single-Feature Adversarial Perturbations](https://arxiv.org/abs/2504.17684) Ahod Alghuried, Ali Alkinoon, Abdulaziz Alghamdi, Soohyeon Choi, Manar Mohaisen, David Mohaisen -+ [Fine-Tuning Adversarially-Robust Transformers for Single-Image Dehazing](https://arxiv.org//abs/2504.17829) ++ [Fine-Tuning Adversarially-Robust Transformers for Single-Image Dehazing](https://arxiv.org/abs/2504.17829) Vlad Vasilescu, Ana Neacsu, Daniela Faur -+ [A Simple DropConnect Approach to Transfer-based Targeted Attack](https://arxiv.org//abs/2504.18594) ++ [A Simple DropConnect Approach to Transfer-based Targeted Attack](https://arxiv.org/abs/2504.18594) Tongrui Su, Qingbin Li, Shengyu Zhu, Wei Chen, Xueqi Cheng -+ [BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts](https://arxiv.org//abs/2504.18598) ++ [BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts](https://arxiv.org/abs/2504.18598) Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu -+ [Beyond Public Access in LLM Pre-Training Data](https://arxiv.org//abs/2505.00020) ++ [Beyond Public Access in LLM Pre-Training Data](https://arxiv.org/abs/2505.00020) Sruly Rosenblat, Tim O'Reilly, Ilan Strauss @@ -11253,64 +11253,64 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Cheng Wang, Yue Liu, Baolong Bi, Duzhen Zhang, Zhong-Zhi Li, Yingwei Ma, Yufei He, Shengju Yu, Xinfeng Li, Junfeng Fang, Jiaheng Zhang, Bryan Hooi -+ [Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts](https://arxiv.org//abs/2504.17921) ++ [Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts](https://arxiv.org/abs/2504.17921) Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik -+ [Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence](https://arxiv.org//abs/2504.17703) ++ [Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence](https://arxiv.org/abs/2504.17703) Nusrat Jahan, Ratun Rahman, Michel Wang -+ [DCT-Shield: A Robust Frequency Domain Defense against Malicious Image 
Editing](https://arxiv.org//abs/2504.17894) ++ [DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing](https://arxiv.org/abs/2504.17894) Aniruddha Bala, Rohit Chowdhury, Rohan Jaiswal, Siddharth Roheda -+ [FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation](https://arxiv.org//abs/2504.17311) ++ [FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation](https://arxiv.org/abs/2504.17311) Yulia Otmakhova, Hung Thinh Truong, Rahmad Mahendra, Zenan Zhai, Rongxin Zhu, Daniel Beck, Jey Han Lau # 2025-04-23 -+ [Statistical Guarantees in Synthetic Data through Conformal Adversarial Generation](https://arxiv.org//abs/2504.17058) ++ [Statistical Guarantees in Synthetic Data through Conformal Adversarial Generation](https://arxiv.org/abs/2504.17058) Rahul Vishwakarma -+ [Robo-Troj: Attacking LLM-based Task Planners](https://arxiv.org//abs/2504.17070) ++ [Robo-Troj: Attacking LLM-based Task Planners](https://arxiv.org/abs/2504.17070) Mohaiminul Al Nahian, Zainab Altaweel, David Reitano, Sabbir Ahmed, Saumitra Lohokare, Shiqi Zhang, Adnan Siraj Rakin -+ [Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate](https://arxiv.org//abs/2504.16489) ++ [Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate](https://arxiv.org/abs/2504.16489) Senmao Qi, Yifei Zou, Peng Li, Ziyi Lin, Xiuzhen Cheng, Dongxiao Yu -+ [BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation](https://arxiv.org//abs/2504.16907) ++ [BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation](https://arxiv.org/abs/2504.16907) Ruotong Wang, Mingli Zhu, Jiarong Ou, Rui Chen, Xin Tao, Pengfei Wan, Baoyuan Wu -+ [Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks](https://arxiv.org//abs/2504.16557) ++ [Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks](https://arxiv.org/abs/2504.16557) Murat Bilgehan Ertan, Ronak Sahu, Phuong Ha Nguyen, Kaleel Mahmood, Marten van Dijk -+ [MCMC for Bayesian estimation of Differential Privacy from Membership Inference Attacks](https://arxiv.org//abs/2504.16683) ++ [MCMC for Bayesian estimation of Differential Privacy from Membership Inference Attacks](https://arxiv.org/abs/2504.16683) Ceren Yildirim, Kamer Kaya, Sinan Yildirim, Erkay Savas -+ [Property-Preserving Hashing for $\ell_1$-Distance Predicates: Applications to Countering Adversarial Input Attacks](https://arxiv.org//abs/2504.16355) ++ [Property-Preserving Hashing for $\ell_1$-Distance Predicates: Applications to Countering Adversarial Input Attacks](https://arxiv.org/abs/2504.16355) Hassan Asghar, Chenhan Zhang, Dali Kaafar -+ [Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation](https://arxiv.org//abs/2504.16474) ++ [Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation](https://arxiv.org/abs/2504.16474) Meixi Zheng, Kehan Wu, Yanbo Fan, Rui Huang, Baoyuan Wu -+ [Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection](https://arxiv.org//abs/2504.16429) ++ [Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection](https://arxiv.org/abs/2504.16429) Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, Xiaoguang Mao @@ -11319,37 +11319,37 @@ It appears that the [List 
of All Adversarial Example Papers](https://nicholas.ca Ruotong Wang, Mingli Zhu, Jiarong Ou, Rui Chen, Xin Tao, Pengfei Wan, Baoyuan Wu -+ [Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control](https://arxiv.org//abs/2504.17130) ++ [Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control](https://arxiv.org/abs/2504.17130) Hannah Cyberey, David Evans -+ [POPri: Private Federated Learning using Preference-Optimized Synthetic Data](https://arxiv.org//abs/2504.16438) ++ [POPri: Private Federated Learning using Preference-Optimized Synthetic Data](https://arxiv.org/abs/2504.16438) Charlie Hou, Mei-Yu Wang, Yige Zhu, Daniel Lazar, Giulia Fanti -+ [Safety Pretraining: Toward the Next Generation of Safe AI](https://arxiv.org//abs/2504.16980) ++ [Safety Pretraining: Toward the Next Generation of Safe AI](https://arxiv.org/abs/2504.16980) Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Matt Fredrikson, Zacharcy C. Lipton, J. Zico Kolter # 2025-04-22 -+ [A Geometric Approach to Problems in Optimization and Data Science](https://arxiv.org//abs/2504.16270) ++ [A Geometric Approach to Problems in Optimization and Data Science](https://arxiv.org/abs/2504.16270) Naren Sarayu Manoj -+ [WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks](https://arxiv.org//abs/2504.18575) ++ [WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks](https://arxiv.org/abs/2504.18575) Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri -+ [Residual-Evasive Attacks on ADMM in Distributed Optimization](https://arxiv.org//abs/2504.18570) ++ [Residual-Evasive Attacks on ADMM in Distributed Optimization](https://arxiv.org/abs/2504.18570) Sabrina Bruckmeier, Huadong Mo, James Qin -+ [Defending Against Intelligent Attackers at Large Scales](https://arxiv.org//abs/2504.18577) ++ [Defending Against Intelligent Attackers at Large Scales](https://arxiv.org/abs/2504.18577) Andrew J. 
Lohn -+ [Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations](https://arxiv.org//abs/2504.21019) ++ [Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations](https://arxiv.org/abs/2504.21019) Yinghan Zhou, Juan Wen, Wanli Peng, Yiming Xue, Ziwei Zhang, Zhengxian Wu @@ -11358,94 +11358,94 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yinghan Zhou, Juan Wen, Wanli Peng, Yiming Xue, Ziwei Zhang, Zhengxian Wu # 2025-04-21 -+ [Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos](https://arxiv.org//abs/2504.14921) ++ [Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos](https://arxiv.org/abs/2504.14921) Songping Wang, Hanqing Liu, Yueming Lyu, Xiantao Hu, Ziwen He, Wei Wang, Caifeng Shan, Liang Wang -+ [aiXamine: LLM Safety and Security Simplified](https://arxiv.org//abs/2504.14985) ++ [aiXamine: LLM Safety and Security Simplified](https://arxiv.org/abs/2504.14985) Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, Minhaj Ahmad, Sanjay Chawla, Issa Khalil -+ [RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search](https://arxiv.org//abs/2504.15047) ++ [RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search](https://arxiv.org/abs/2504.15047) Quy-Anh Dang, Chris Ngo, Truong-Son Hy -+ [Verifying Robust Unlearning: Probing Residual Knowledge in Unlearned Models](https://arxiv.org//abs/2504.14798) ++ [Verifying Robust Unlearning: Probing Residual Knowledge in Unlearned Models](https://arxiv.org/abs/2504.14798) Hao Xuan, Xingyu Li -+ [Backdoor Defense in Diffusion Models via Spatial Attention Unlearning](https://arxiv.org//abs/2504.18563) ++ [Backdoor Defense in Diffusion Models via Spatial Attention Unlearning](https://arxiv.org/abs/2504.18563) Abha Jha, Ashwath Vaithinathan Aravindan, Matthew Salaway, Atharva Sandeep Bhide, Duygu Nur Yaldiz -+ [DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization](https://arxiv.org//abs/2504.18564) ++ [DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization](https://arxiv.org/abs/2504.18564) Xinzhe Huang, Kedong Xiu, Tianhang Zheng, Churui Zeng, Wangze Ni, Zhan Qiin, Kui Ren, Chun Chen -+ [Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation](https://arxiv.org//abs/2504.18566) ++ [Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation](https://arxiv.org/abs/2504.18566) Harsh Patel -+ [MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety](https://arxiv.org//abs/2504.15241) ++ [MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety](https://arxiv.org/abs/2504.15241) Yahan Yang, Soham Dan, Shuo Li, Dan Roth, Insup Lee -+ [Improving Human-AI Coordination through Online Adversarial Training and Generative Models](https://arxiv.org//abs/2504.15457) ++ [Improving Human-AI Coordination through Online Adversarial Training and Generative Models](https://arxiv.org/abs/2504.15457) Paresh Chaudhary, Yancheng Liang, Daphne Chen, Simon S. 
Du, Natasha Jaques # 2025-04-20 -+ [Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection](https://arxiv.org//abs/2504.16125) ++ [Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection](https://arxiv.org/abs/2504.16125) Xiangyu Chang, Guang Dai, Hao Di, Haishan Ye -+ [Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation](https://arxiv.org//abs/2504.14541) ++ [Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation](https://arxiv.org/abs/2504.14541) Yi Yu, Song Xia, Xun Lin, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot -+ [REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models](https://arxiv.org//abs/2504.14554) ++ [REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models](https://arxiv.org/abs/2504.14554) Chongye Guo, Jinhu Fu, Junfeng Fang, Kun Wang, Guorui Feng -+ [LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks](https://arxiv.org//abs/2504.14556) ++ [LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks](https://arxiv.org/abs/2504.14556) Yousef Emami, Hao Zhou, SeyedSina Nabavirazani, Luis Almeida # 2025-04-19 -+ [CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations](https://arxiv.org//abs/2504.14119) ++ [CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations](https://arxiv.org/abs/2504.14119) Man Ho Lam, Chaozheng Wang, Jen-tse Huang, Michael R. Lyu -+ [Hydra: An Agentic Reasoning Approach for Enhancing Adversarial Robustness and Mitigating Hallucinations in Vision-Language Models](https://arxiv.org//abs/2504.14395) ++ [Hydra: An Agentic Reasoning Approach for Enhancing Adversarial Robustness and Mitigating Hallucinations in Vision-Language Models](https://arxiv.org/abs/2504.14395) Chung-En (Johnny)Yu, Hsuan-Chih (Neil)Chen, Brian Jalaian, Nathaniel D. 
Bastian -+ [Adversarial Attack for RGB-Event based Visual Object Tracking](https://arxiv.org//abs/2504.14423) ++ [Adversarial Attack for RGB-Event based Visual Object Tracking](https://arxiv.org/abs/2504.14423) Qiang Chen, Xiao Wang, Haowen Wang, Bo Jiang, Lin Zhu, Dawei Zhang, Yonghong Tian, Jin Tang -+ [The First VoicePrivacy Attacker Challenge](https://arxiv.org//abs/2504.14183) ++ [The First VoicePrivacy Attacker Challenge](https://arxiv.org/abs/2504.14183) Natalia Tomashenko, Xiaoxiao Miao, Emmanuel Vincent, Junichi Yamagishi -+ [Rethinking Target Label Conditioning in Adversarial Attacks: A 2D Tensor-Guided Generative Approach](https://arxiv.org//abs/2504.14137) ++ [Rethinking Target Label Conditioning in Adversarial Attacks: A 2D Tensor-Guided Generative Approach](https://arxiv.org/abs/2504.14137) Hangyu Liu, Bo Peng, Pengxiang Ding, Donglin Wang -+ [Manipulating Multimodal Agents via Cross-Modal Prompt Injection](https://arxiv.org//abs/2504.14348) ++ [Manipulating Multimodal Agents via Cross-Modal Prompt Injection](https://arxiv.org/abs/2504.14348) Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, Xianglong Liu @@ -11456,98 +11456,98 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca # 2025-04-18 -+ [Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes](https://arxiv.org//abs/2504.16117) ++ [Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes](https://arxiv.org/abs/2504.16117) Sridevi Polavaram, Xin Zhou, Meenu Ravi, Mohammad Zarei, Anmol Srivastava -+ [Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation](https://arxiv.org//abs/2504.13551) ++ [Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation](https://arxiv.org/abs/2504.13551) CheolWon Na, YunSeok Choi, Jee-Hyong Lee -+ [DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification](https://arxiv.org//abs/2504.13562) ++ [DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification](https://arxiv.org/abs/2504.13562) Yu Li, Han Jiang, Zhihua Wei -+ [BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models](https://arxiv.org//abs/2504.13775) ++ [BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models](https://arxiv.org/abs/2504.13775) Zhengxian Wu, Juan Wen, Wanli Peng, Ziwei Zhang, Yinghan Zhou, Yiming Xue -+ [STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings](https://arxiv.org//abs/2504.13416) ++ [STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings](https://arxiv.org/abs/2504.13416) Saksham Rastogi, Pratyush Maini, Danish Pruthi -+ [Fairness and Robustness in Machine Unlearning](https://arxiv.org//abs/2504.13610) ++ [Fairness and Robustness in Machine Unlearning](https://arxiv.org/abs/2504.13610) Khoa Tran, Simon S. Woo -+ [On the Relationship Between Robustness and Expressivity of Graph Neural Networks](https://arxiv.org//abs/2504.13786) ++ [On the Relationship Between Robustness and Expressivity of Graph Neural Networks](https://arxiv.org/abs/2504.13786) Lorenz Kummer, Wilfried N. Gansterer, Nils M. 
Kriege -+ [DoomArena: A framework for Testing AI Agents Against Evolving Security Threats](https://arxiv.org//abs/2504.14064) ++ [DoomArena: A framework for Testing AI Agents Against Evolving Security Threats](https://arxiv.org/abs/2504.14064) Leo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dvijotham -+ [AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models](https://arxiv.org//abs/2507.01020) ++ [AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models](https://arxiv.org/abs/2507.01020) Aashray Reddy, Andrew Zagula, Nicholas Saban # 2025-04-17 -+ [Security-First AI: Foundations for Robust and Trustworthy Systems](https://arxiv.org//abs/2504.16110) ++ [Security-First AI: Foundations for Robust and Trustworthy Systems](https://arxiv.org/abs/2504.16110) Krti Tallam -+ [Antidistillation Sampling](https://arxiv.org//abs/2504.13146) ++ [Antidistillation Sampling](https://arxiv.org/abs/2504.13146) Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter -+ [Quantum Computing Supported Adversarial Attack-Resilient Autonomous Vehicle Perception Module for Traffic Sign Classification](https://arxiv.org//abs/2504.12644) ++ [Quantum Computing Supported Adversarial Attack-Resilient Autonomous Vehicle Perception Module for Traffic Sign Classification](https://arxiv.org/abs/2504.12644) Reek Majumder, Mashrur Chowdhury, Sakib Mahmud Khan, Zadid Khan, Fahim Ahmad, Frank Ngeni, Gurcan Comert, Judith Mwakalonge, Dimitra Michalaka -+ [A Numerical Gradient Inversion Attack in Variational Quantum Neural-Networks](https://arxiv.org//abs/2504.12806) ++ [A Numerical Gradient Inversion Attack in Variational Quantum Neural-Networks](https://arxiv.org/abs/2504.12806) Georgios Papadopoulos, Shaltiel Eloul, Yash Satsangi, Jamie Heredge, Niraj Kumar, Chun-Fu Chen, Marco Pistoia -+ [Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints](https://arxiv.org//abs/2504.12747) ++ [Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints](https://arxiv.org/abs/2504.12747) Guanyu Wang, Kailong Wang, Yihao Huang, Mingyi Zhou, Zhang Qing cnwatcher, Geguang Pu, Li Li -+ [A Client-level Assessment of Collaborative Backdoor Poisoning in Non-IID Federated Learning](https://arxiv.org//abs/2504.12875) ++ [A Client-level Assessment of Collaborative Backdoor Poisoning in Non-IID Federated Learning](https://arxiv.org/abs/2504.12875) Phung Lai, Guanxiong Liu, Hai Phan, Issa Khalil, Abdallah Khreishah, Xintao Wu -+ [GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms](https://arxiv.org//abs/2504.13052) ++ [GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms](https://arxiv.org/abs/2504.13052) Sinan He, An Wang -+ [On the Definition of Robustness and Resilience of AI Agents for Real-time Congestion Management](https://arxiv.org//abs/2504.13314) ++ [On the Definition of Robustness and Resilience of AI Agents for Real-time Congestion Management](https://arxiv.org/abs/2504.13314) Timothy Tjhay, Ricardo J. 
Bessa, Jose Paulos -+ [DYNAMITE: Dynamic Defense Selection for Enhancing Machine Learning-based Intrusion Detection Against Adversarial Attacks](https://arxiv.org//abs/2504.13301) ++ [DYNAMITE: Dynamic Defense Selection for Enhancing Machine Learning-based Intrusion Detection Against Adversarial Attacks](https://arxiv.org/abs/2504.13301) Jing Chen, Onat Gungor, Zhengli Shang, Elvin Li, Tajana Rosing -+ [Recursive Deep Inverse Reinforcement Learning](https://arxiv.org//abs/2504.13241) ++ [Recursive Deep Inverse Reinforcement Learning](https://arxiv.org/abs/2504.13241) Paul Ghanem, Michael Potter, Owen Howell, Pau Closas, Alireza Ramezani, Deniz Erdogmus, Tales Imbiriba @@ -11556,196 +11556,196 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Iris Ma, Ian Domingo, Alberto Krone-Martins, Pierre Baldi, Cristina V. Lopes # 2025-04-16 -+ [Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis](https://arxiv.org//abs/2504.12129) ++ [Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis](https://arxiv.org/abs/2504.12129) Songping Wang, Yueming Lyu, Shiqi Liu, Ning Li, Tong Tong, Hao Sun, Caifeng Shan -+ [Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset](https://arxiv.org//abs/2504.11707) ++ [Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset](https://arxiv.org/abs/2504.11707) Muhammad Shahid Muneer, Simon S. Woo -+ [Robust and Fine-Grained Detection of AI Generated Texts](https://arxiv.org//abs/2504.11952) ++ [Robust and Fine-Grained Detection of AI Generated Texts](https://arxiv.org/abs/2504.11952) Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Hamza Farooq -+ [ACE: Attentional Concept Erasure in Diffusion Models](https://arxiv.org//abs/2504.11850) ++ [ACE: Attentional Concept Erasure in Diffusion Models](https://arxiv.org/abs/2504.11850) Finn Carter -+ [RDI: An adversarial robustness evaluation metric for deep neural networks based on sample clustering features](https://arxiv.org//abs/2504.18556) ++ [RDI: An adversarial robustness evaluation metric for deep neural networks based on sample clustering features](https://arxiv.org/abs/2504.18556) Jialei Song, Xingquan Zuo, Feiyang Wang, Hai Huang, Tianle Zhang -+ [Support is All You Need for Certified VAE Training](https://arxiv.org//abs/2504.11831) ++ [Support is All You Need for Certified VAE Training](https://arxiv.org/abs/2504.11831) Changming Xu, Debangshu Banerjee, Deepak Vasisht, Gagandeep Singh -+ [InjectLab: A Tactical Framework for Adversarial Threat Modeling Against Large Language Models](https://arxiv.org//abs/2505.18156) ++ [InjectLab: A Tactical Framework for Adversarial Threat Modeling Against Large Language Models](https://arxiv.org/abs/2505.18156) Austin Howard -+ [Exploring Video-Based Driver Activity Recognition under Noisy Labels](https://arxiv.org//abs/2504.11966) ++ [Exploring Video-Based Driver Activity Recognition under Noisy Labels](https://arxiv.org/abs/2504.11966) Linjuan Fan, Di Wen, Kunyu Peng, Kailun Yang, Jiaming Zhang, Ruiping Liu, Yufan Chen, Junwei Zheng, Jiamin Wu, Xudong Han, Rainer Stiefelhagen -+ [AttentionDrop: A Novel Regularization Method for Transformer Models](https://arxiv.org//abs/2504.12088) ++ 
[AttentionDrop: A Novel Regularization Method for Transformer Models](https://arxiv.org/abs/2504.12088) Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah, Muhammad Omer Khan # 2025-04-15 -+ [RAID: An In-Training Defense against Attribute Inference Attacks in Recommender Systems](https://arxiv.org//abs/2504.11510) ++ [RAID: An In-Training Defense against Attribute Inference Attacks in Recommender Systems](https://arxiv.org/abs/2504.11510) Xiaohua Feng, Yuyuan Li, Fengyuan Yu, Ke Xiong, Junjie Fang, Li Zhang, Tianyu Du, Chaochao Chen -+ [Propaganda via AI? A Study on Semantic Backdoors in Large Language Models](https://arxiv.org//abs/2504.12344) ++ [Propaganda via AI? A Study on Semantic Backdoors in Large Language Models](https://arxiv.org/abs/2504.12344) Nay Myat Min, Long H. Pham, Yige Li, Jun Sun -+ [PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage](https://arxiv.org//abs/2504.11509) ++ [PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage](https://arxiv.org/abs/2504.11509) Wenyi Zhang, Ju Jia, Xiaojun Jia, Yihao Huang, Xinfeng Li, Cong Wu, Lina Wang -+ [X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents](https://arxiv.org//abs/2504.13203) ++ [X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents](https://arxiv.org/abs/2504.13203) Salman Rahman, Liwei Jiang, James Shiffer, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, Hamid Palangi, Kai-Wei Chang, Yejin Choi, Saadia Gabriel -+ [Concept Enhancement Engineering: A Lightweight and Efficient Robust Defense Against Jailbreak Attacks in Embodied AI](https://arxiv.org//abs/2504.13201) ++ [Concept Enhancement Engineering: A Lightweight and Efficient Robust Defense Against Jailbreak Attacks in Embodied AI](https://arxiv.org/abs/2504.13201) Jirui Yang, Zheyu Lin, Shuhan Yang, Zhihui Lu, Xin Du -+ [Bias Beyond English: Evaluating Social Bias and Debiasing Methods in a Low-Resource Setting](https://arxiv.org//abs/2504.11183) ++ [Bias Beyond English: Evaluating Social Bias and Debiasing Methods in a Low-Resource Setting](https://arxiv.org/abs/2504.11183) Ej Zhou, Weiming Lu # 2025-04-14 -+ [Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks](https://arxiv.org//abs/2504.10598) ++ [Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks](https://arxiv.org/abs/2504.10598) Omar Montasser, Abhishek Shetty, Nikita Zhivotovskiy -+ [You've Changed: Detecting Modification of Black-Box Large Language Models](https://arxiv.org//abs/2504.12335) ++ [You've Changed: Detecting Modification of Black-Box Large Language Models](https://arxiv.org/abs/2504.12335) Alden Dima, James Foulds, Shimei Pan, Philip Feldman -+ [Investigating cybersecurity incidents using large language models in latest-generation wireless networks](https://arxiv.org//abs/2504.13196) ++ [Investigating cybersecurity incidents using large language models in latest-generation wireless networks](https://arxiv.org/abs/2504.13196) Leonid Legashev, Arthur Zhigalov # 2025-04-13 -+ [ControlNET: A Firewall for RAG-based LLM System](https://arxiv.org//abs/2504.09593) ++ [ControlNET: A Firewall for RAG-based LLM System](https://arxiv.org/abs/2504.09593) Hongwei Yao, Haoran Shi, Yidou Chen, Yixin Jiang, Cong Wang, Zhan Qin -+ [CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent](https://arxiv.org//abs/2504.13192) ++ [CheatAgent: Attacking LLM-Empowered 
Recommender Systems via LLM Agent](https://arxiv.org/abs/2504.13192) Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, Feiran Huang -+ [Mitigating Many-Shot Jailbreaking](https://arxiv.org//abs/2504.09604) ++ [Mitigating Many-Shot Jailbreaking](https://arxiv.org/abs/2504.09604) Christopher M. Ackerman, Nina Panickssery -+ [AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender](https://arxiv.org//abs/2504.09466) ++ [AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender](https://arxiv.org/abs/2504.09466) Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu # 2025-04-12 -+ [RT-DATR: Real-time Unsupervised Domain Adaptive Detection Transformer with Adversarial Feature Alignment](https://arxiv.org//abs/2504.09196) ++ [RT-DATR: Real-time Unsupervised Domain Adaptive Detection Transformer with Adversarial Feature Alignment](https://arxiv.org/abs/2504.09196) Feng Lv, Guoqing Li, Jin Li, Chunlong Xia # 2025-04-11 -+ [A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation](https://arxiv.org//abs/2504.08411) ++ [A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation](https://arxiv.org/abs/2504.08411) Dawei Zhou, Suzhi Gang, Decheng Liu, Tongliang Liu, Nannan Wang, Xinbo Gao -+ [EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models](https://arxiv.org//abs/2504.08205) ++ [EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models](https://arxiv.org/abs/2504.08205) Minjae Seo, Myoungsung You, Junhee Lee, Jaehan Kim, Hwanjo Heo, Jintae Oh, Jinwoo Kim -+ [Adversarial Examples in Environment Perception for Automated Driving (Review)](https://arxiv.org//abs/2504.08414) ++ [Adversarial Examples in Environment Perception for Automated Driving (Review)](https://arxiv.org/abs/2504.08414) Jun Yan, Huilin Yin -+ [Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition](https://arxiv.org//abs/2504.08616) ++ [Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition](https://arxiv.org/abs/2504.08616) Lei Kang, Xuanshuo Fu, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas -+ [Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy](https://arxiv.org//abs/2504.08254) ++ [Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy](https://arxiv.org/abs/2504.08254) Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro -+ [To See or Not to See -- Fingerprinting Devices in Adversarial Environments Amid Advanced Machine Learning](https://arxiv.org//abs/2504.08264) ++ [To See or Not to See -- Fingerprinting Devices in Adversarial Environments Amid Advanced Machine Learning](https://arxiv.org/abs/2504.08264) Justin Feng, Nader Sehatbakhsh -+ [Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability](https://arxiv.org//abs/2504.08480) ++ [Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability](https://arxiv.org/abs/2504.08480) Sabrine Ennaji, Elhadj Benkhelifa, Luigi Vincenzo Mancini -+ [An Early Experience with Confidential Computing Architecture for On-Device Model Protection](https://arxiv.org//abs/2504.08508) ++ [An Early Experience with Confidential Computing Architecture for On-Device Model Protection](https://arxiv.org/abs/2504.08508) Sina Abdollahi, Mohammad Maheri, Sandra 
Siby, Marios Kogias, Hamed Haddadi -+ [Palmprint De-Identification Using Diffusion Model for High-Quality and Diverse Synthesis](https://arxiv.org//abs/2504.08272) ++ [Palmprint De-Identification Using Diffusion Model for High-Quality and Diverse Synthesis](https://arxiv.org/abs/2504.08272) Licheng Yan, Bob Zhang, Andrew Beng Jin Teoh, Lu Leng, Shuyi Li, Yuqi Wang, Ziyuan Yang # 2025-04-10 -+ [Geneshift: Impact of different scenario shift on Jailbreaking LLM](https://arxiv.org//abs/2504.08104) ++ [Geneshift: Impact of different scenario shift on Jailbreaking LLM](https://arxiv.org/abs/2504.08104) Tianyi Wu, Zhiwei Xue, Yue Liu, Jiaheng Zhang, Bryan Hooi, See-Kiong Ng -+ [PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization](https://arxiv.org//abs/2504.07717) ++ [PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization](https://arxiv.org/abs/2504.07717) Yang Jiao, Xiaodong Wang, Kai Yang -+ [Decomposition-Based Optimal Bounds for Privacy Amplification via Shuffling](https://arxiv.org//abs/2504.07414) ++ [Decomposition-Based Optimal Bounds for Privacy Amplification via Shuffling](https://arxiv.org/abs/2504.07414) Pengcheng Su, Haibo Cheng, Ping Wang -+ [FakeIDet: Exploring Patches for Privacy-Preserving Fake ID Detection](https://arxiv.org//abs/2504.07761) ++ [FakeIDet: Exploring Patches for Privacy-Preserving Fake ID Detection](https://arxiv.org/abs/2504.07761) Javier Muñoz-Haro, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez -+ [Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge](https://arxiv.org//abs/2504.07887) ++ [Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge](https://arxiv.org/abs/2504.07887) Riccardo Cantini, Alessio Orsino, Massimo Ruggiero, Domenico Talia # 2025-04-09 -+ [The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data](https://arxiv.org//abs/2504.06923) ++ [The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data](https://arxiv.org/abs/2504.06923) Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro -+ [DeCoMa: Detecting and Purifying Code Dataset Watermarks through Dual Channel Code Abstraction](https://arxiv.org//abs/2504.07002) ++ [DeCoMa: Detecting and Purifying Code Dataset Watermarks through Dual Channel Code Abstraction](https://arxiv.org/abs/2504.07002) Yuan Xiao, Yuchen Chen, Shiqing Ma, Haocheng Huang, Chunrong Fang, Yanwei Chen, Weisong Sun, Yunfeng Zhu, Xiaofang Zhang, Zhenyu Chen # 2025-04-08 -+ [StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization](https://arxiv.org//abs/2504.05804) ++ [StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization](https://arxiv.org/abs/2504.05804) Yiming Tang, Yi Fan, Chenxiao Yu, Tiankai Yang, Yue Zhao, Xiyang Hu @@ -11753,27 +11753,27 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yu-Hang Wu, Yu-Jie Xiong, Hao Zhang, Jia-Chen Zhang, Zheng Zhou -+ [Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models](https://arxiv.org//abs/2504.05815) ++ [Parasite: A Steganography-based Backdoor Attack Framework for Diffusion 
Models](https://arxiv.org/abs/2504.05815) Jiahao Chen, Yu Pan, Yi Du, Chunkai Wu, Lin Wang -+ [Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking](https://arxiv.org//abs/2504.05652) ++ [Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking](https://arxiv.org/abs/2504.05652) Yu-Hang Wu, Yu-Jie Xiong, Hao Zhang, Jia-Chen Zhang, Zheng Zhou # 2025-04-07 -+ [SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models](https://arxiv.org//abs/2504.04893) ++ [SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models](https://arxiv.org/abs/2504.04893) Justus Westerhoff, Erblina Purelku, Jakob Hackstein, Leo Pinetzki, Lorenz Hufe -+ [Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models](https://arxiv.org//abs/2504.05050) ++ [Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models](https://arxiv.org/abs/2504.05050) Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau -+ [Adversarial KA](https://arxiv.org//abs/2504.05255) ++ [Adversarial KA](https://arxiv.org/abs/2504.05255) Sviatoslav Dzhenzher, Michael H. Freedman @@ -11781,23 +11781,23 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar -+ [SUEDE:Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement](https://arxiv.org//abs/2504.04818) ++ [SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement](https://arxiv.org/abs/2504.04818) Zuying Xie, Changtao Miao, Ajian Liu, Jiabao Guo, Feng Li, Dan Guo, Yunfeng Diao -+ [Don't Lag, RAG: Training-Free Adversarial Detection Using RAG](https://arxiv.org//abs/2504.04858) ++ [Don't Lag, RAG: Training-Free Adversarial Detection Using RAG](https://arxiv.org/abs/2504.04858) Roie Kazoom, Raz Lapid, Moshe Sipper, Ofer Hadar -+ [Pr$εε$mpt: Sanitizing Sensitive Prompts for LLMs](https://arxiv.org//abs/2504.05147) ++ [Pr$εε$mpt: Sanitizing Sensitive Prompts for LLMs](https://arxiv.org/abs/2504.05147) Amrita Roy Chowdhury, David Glukhov, Divyam Anshumaan, Prasad Chalasani, Nicolas Papernot, Somesh Jha, Mihir Bellare -+ [Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial approaches](https://arxiv.org//abs/2504.04751) ++ [Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial approaches](https://arxiv.org/abs/2504.04751) Eloi Moliner, Michal Švento, Alec Wright, Lauri Juvela, Pavel Rajmic, Vesa Välimäki -+ [Are You Getting What You Pay For? 
Auditing Model Substitution in LLM APIs](https://arxiv.org/abs/2504.04715) Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song @@ -11806,107 +11806,107 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yujin Potter, Wenbo Guo, Zhun Wang, Tianneng Shi, Andy Zhang, Patrick Gage Kelley, Kurt Thomas, Dawn Song # 2025-04-06 -+ [Systematic Literature Review on Vehicular Collaborative Perception - A Computer Vision Perspective](https://arxiv.org//abs/2504.04631) ++ [Systematic Literature Review on Vehicular Collaborative Perception - A Computer Vision Perspective](https://arxiv.org/abs/2504.04631) Lei Wan, Jianxin Zhao, Andreas Wiedholz, Manuel Bied, Mateus Martinez de Lucena, Abhishek Dinkar Jagtap, Andreas Festag, Antônio Augusto Fröhlich, Hannan Ejaz Keen, Alexey Vinel # 2025-04-05 -+ [Task-based Loss Functions in Computer Vision: A Comprehensive Review](https://arxiv.org//abs/2504.04242) ++ [Task-based Loss Functions in Computer Vision: A Comprehensive Review](https://arxiv.org/abs/2504.04242) Omar Elharrouss, Yasir Mahmood, Yassine Bechqito, Mohamed Adel Serhani, Elarbi Badidi, Jamal Riffi, Hamid Tairi # 2025-04-04 -+ [Multi-lingual Multi-turn Automated Red Teaming for LLMs](https://arxiv.org//abs/2504.03174) ++ [Multi-lingual Multi-turn Automated Red Teaming for LLMs](https://arxiv.org/abs/2504.03174) Abhishek Singhania, Christophe Dupuy, Shivam Mangale, Amani Namboori -+ [AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing](https://arxiv.org//abs/2504.03587) ++ [AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing](https://arxiv.org/abs/2504.03587) Niu Lian, Jun Li, Jinpeng Wang, Ruisheng Luo, Yaowei Wang, Shu-Tao Xia, Bin Chen -+ [PPFPL: Cross-silo Privacy-preserving Federated Prototype Learning Against Data Poisoning Attacks on Non-IID Data](https://arxiv.org//abs/2504.03173) ++ [PPFPL: Cross-silo Privacy-preserving Federated Prototype Learning Against Data Poisoning Attacks on Non-IID Data](https://arxiv.org/abs/2504.03173) Hongliang Zhang, Jiguo Yu, Fenghua Xu, Chunqiang Hu, Yongzhao Zhang, Xiaofen Wang, Zhongyuan Yu, Xiaosong Zhang # 2025-04-03 -+ [ESC: Erasing Space Concept for Knowledge Deletion](https://arxiv.org//abs/2504.02199) ++ [ESC: Erasing Space Concept for Knowledge Deletion](https://arxiv.org/abs/2504.02199) Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, Gyeong-Moon Park -+ [Retrieval-Augmented Purifier for Robust LLM-Empowered 
Recommendation](https://arxiv.org//abs/2504.02458) ++ [Retrieval-Augmented Purifier for Robust LLM-Empowered Recommendation](https://arxiv.org/abs/2504.02458) Liangbo Ning, Wenqi Fan, Qing Li -+ [Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study](https://arxiv.org//abs/2504.02733) ++ [Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study](https://arxiv.org/abs/2504.02733) Aryan Agrawal, Lisa Alazraki, Shahin Honarvar, Marek Rei -+ [Evaluating and Enhancing Segmentation Model Robustness with Metamorphic Testing](https://arxiv.org//abs/2504.02335) ++ [Evaluating and Enhancing Segmentation Model Robustness with Metamorphic Testing](https://arxiv.org/abs/2504.02335) Seif Mzoughi, Mohamed Elshafeia, Foutse Khomh -+ [Secure Generalization through Stochastic Bidirectional Parameter Updates Using Dual-Gradient Mechanism](https://arxiv.org//abs/2504.02213) ++ [Secure Generalization through Stochastic Bidirectional Parameter Updates Using Dual-Gradient Mechanism](https://arxiv.org/abs/2504.02213) Shourya Goel, Himanshi Tibrewal, Anant Jain, Anshul Pundhir, Pravendra Singh -+ [CRC-SGAD: Conformal Risk Control for Supervised Graph Anomaly Detection](https://arxiv.org//abs/2504.02248) ++ [CRC-SGAD: Conformal Risk Control for Supervised Graph Anomaly Detection](https://arxiv.org/abs/2504.02248) Songran Bai, Xiaolong Zheng, Daniel Dajun Zeng -+ [Bridging the Theoretical Gap in Randomized Smoothing](https://arxiv.org//abs/2504.02412) ++ [Bridging the Theoretical Gap in Randomized Smoothing](https://arxiv.org/abs/2504.02412) Blaise Delattre, Paul Caillon, Quentin Barthélemy, Erwan Fagnou, Alexandre Allauzen -+ [Integrating Identity-Based Identification against Adaptive Adversaries in Federated Learning](https://arxiv.org//abs/2504.03077) ++ [Integrating Identity-Based Identification against Adaptive Adversaries in Federated Learning](https://arxiv.org/abs/2504.03077) Jakub Kacper Szelag, Ji-Jian Chin, Lauren Ansell, Sook-Chin Yip -+ [SLACK: Attacking LiDAR-based SLAM with Adversarial Point Injections](https://arxiv.org//abs/2504.03089) ++ [SLACK: Attacking LiDAR-based SLAM with Adversarial Point Injections](https://arxiv.org/abs/2504.03089) Prashant Kumar, Dheeraj Vattikonda, Kshitij Madhav Bhat, Kunal Dargan, Prem Kalra -+ [JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model](https://arxiv.org//abs/2504.03770) ++ [JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model](https://arxiv.org/abs/2504.03770) Yi Nian, Shenzhe Zhu, Yuehan Qin, Li Li, Ziyi Wang, Chaowei Xiao, Yue Zhao -+ [More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment](https://arxiv.org//abs/2504.02193) ++ [More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment](https://arxiv.org/abs/2504.02193) Yifan Wang, Runjin Chen, Bolian Li, David Cho, Yihe Deng, Ruqi Zhang, 
Tianlong Chen, Zhangyang Wang, Ananth Grama, Junyuan Hong @@ -11914,53 +11914,53 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yifan Wang, Runjin Chen, Bolian Li, David Cho, Yihe Deng, Ruqi Zhang, Tianlong Chen, Zhangyang Wang, Ananth Grama, Junyuan Hong -+ [Deep Positive-Negative Prototypes for Adversarially Robust Discriminative Prototypical Learning](https://arxiv.org//abs/2504.03782) ++ [Deep Positive-Negative Prototypes for Adversarially Robust Discriminative Prototypical Learning](https://arxiv.org/abs/2504.03782) Ramin Zarei Sabzevar, Hamed Mohammadzadeh, Tahmineh Tavakoli, Ahad Harati # 2025-04-02 -+ [Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses](https://arxiv.org//abs/2504.02080) ++ [Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses](https://arxiv.org/abs/2504.02080) Zhengchun Shang, Wenlan Wei -+ [On Model Protection in Federated Learning against Eavesdropping Attacks](https://arxiv.org//abs/2504.02114) ++ [On Model Protection in Federated Learning against Eavesdropping Attacks](https://arxiv.org/abs/2504.02114) Dipankar Maity, Kushal Chakrabarti -+ [One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image](https://arxiv.org//abs/2504.02132) ++ [One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image](https://arxiv.org/abs/2504.02132) Ezzeldin Shereen, Dan Ristea, Burak Hasircioglu, Shae McFadden, Vasilios Mavroudis, Chris Hicks -+ [Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks](https://arxiv.org//abs/2504.01659) ++ [Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks](https://arxiv.org/abs/2504.01659) Haosheng Li, Junjie Chen, Yuecong Xu, Kemi Ding -+ [Like Oil and Water: Group Robustness Methods and Poisoning Defenses May Be at Odds](https://arxiv.org//abs/2504.02142) ++ [Like Oil and Water: Group Robustness Methods and Poisoning Defenses May Be at Odds](https://arxiv.org/abs/2504.02142) Michael-Andrei Panaitescu-Liess, Yigitcan Kaya, Sicheng Zhu, Furong Huang, Tudor Dumitras -+ [All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning](https://arxiv.org//abs/2504.01396) ++ [All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning](https://arxiv.org/abs/2504.01396) Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, Xi Li -+ [Representation Bending for Large Language Model Safety](https://arxiv.org//abs/2504.01550) ++ [Representation Bending for Large Language Model Safety](https://arxiv.org/abs/2504.01550) Ashkan Yousefpour, Taeheon Kim, Ryan S. 
Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi -+ [PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization](https://arxiv.org//abs/2504.01444) ++ [PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization](https://arxiv.org/abs/2504.01444) Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang -+ [Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation](https://arxiv.org//abs/2504.01668) ++ [Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation](https://arxiv.org/abs/2504.01668) Junjie Chen, Yuecong Xu, Haosheng Li, Kemi Ding -+ [A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content](https://arxiv.org//abs/2504.02898) ++ [A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content](https://arxiv.org/abs/2504.02898) Lele Cao @@ -11969,89 +11969,89 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Aofan Liu, Lulu Tang, Ting Pan, Yuguo Yin, Bin Wang, Ao Yang # 2025-04-01 -+ [Unleashing the Power of Pre-trained Encoders for Universal Adversarial Attack Detection](https://arxiv.org//abs/2504.00429) ++ [Unleashing the Power of Pre-trained Encoders for Universal Adversarial Attack Detection](https://arxiv.org/abs/2504.00429) Yinghe Zhang, Chi Liu, Shuai Zhou, Sheng Shen, Peng Gui -+ [FA^{3}-CLIP: Frequency-Aware Cues Fusion and Attack-Agnostic Prompt Learning for Unified Face Attack Detection](https://arxiv.org//abs/2504.00454) ++ [FA^{3}-CLIP: Frequency-Aware Cues Fusion and Attack-Agnostic Prompt Learning for Unified Face Attack Detection](https://arxiv.org/abs/2504.00454) Yongze Li, Ning Li, Ajian Liu, Hui Ma, Liying Yang, Xihong Chen, Zhiyao Liang, Yanyan Liang, Jun Wan, Zhen Lei -+ [Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection](https://arxiv.org//abs/2504.00458) ++ [Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection](https://arxiv.org/abs/2504.00458) Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei -+ [Alleviating Performance Disparity in Adversarial Spatiotemporal Graph Learning Under Zero-Inflated Distribution](https://arxiv.org//abs/2504.00721) ++ [Alleviating Performance Disparity in Adversarial Spatiotemporal Graph Learning Under Zero-Inflated Distribution](https://arxiv.org/abs/2504.00721) Songran Bai, Yuheng Ji, Yue Liu, Xingwei Zhang, Xiaolong Zheng, Daniel Dajun Zeng -+ [TAMIS: Tailored Membership Inference Attacks on Synthetic Data](https://arxiv.org//abs/2504.00758) ++ [TAMIS: Tailored Membership Inference Attacks on Synthetic Data](https://arxiv.org/abs/2504.00758) Paul Andrey, Batiste Le Bars, Marc Tommasi -+ [CopyQNN: Quantum Neural Network Extraction Attack under Varying Quantum Noise](https://arxiv.org//abs/2504.00366) ++ [CopyQNN: Quantum Neural Network Extraction Attack under Varying Quantum Noise](https://arxiv.org/abs/2504.00366) Zhenxiao Fu, Leyi Zhao, Xuhong Zhang, Yilun Xu, Gang Huang, Fan Chen -+ [Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems](https://arxiv.org//abs/2504.00858) ++ [Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition 
Systems](https://arxiv.org/abs/2504.00858) Weifei Jin, Yuxin Cao, Junjie Su, Derui Wang, Yedi Zhang, Minhui Xue, Jie Hao, Jin Song Dong, Yixian Yang -+ [Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics](https://arxiv.org//abs/2504.00446) ++ [Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics](https://arxiv.org/abs/2504.00446) Shide Zhou, Kailong Wang, Ling Shi, Haoyu Wang -+ [The Illusionist's Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances](https://arxiv.org//abs/2504.02865) ++ [The Illusionist's Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances](https://arxiv.org/abs/2504.02865) Yining Wang, Yuquan Wang, Xi Li, Mi Zhang, Geng Hong, Min Yang # 2025-03-31 -+ [Pay More Attention to the Robustness of Prompt for Instruction Data Mining](https://arxiv.org//abs/2503.24028) ++ [Pay More Attention to the Robustness of Prompt for Instruction Data Mining](https://arxiv.org/abs/2503.24028) Qiang Wang, Dawei Feng, Xu Zhang, Ao Shen, Yang Xu, Bo Ding, Huaimin Wang -+ [Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios](https://arxiv.org//abs/2503.23708) ++ [Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios](https://arxiv.org/abs/2503.23708) Jingzheng Li, Xianglong Liu, Shikui Wei, Zhijun Chen, Bing Li, Qing Guo, Xianqi Yang, Yanjun Pu, Jiakai Wang -+ [Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms](https://arxiv.org//abs/2503.24191) ++ [Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms](https://arxiv.org/abs/2503.24191) Shuoming Zhang, Jiacheng Zhao, Ruiyuan Xu, Xiaobing Feng, Huimin Cui -+ [Value of Information-based Deceptive Path Planning Under Adversarial Interventions](https://arxiv.org//abs/2503.24284) ++ [Value of Information-based Deceptive Path Planning Under Adversarial Interventions](https://arxiv.org/abs/2503.24284) Wesley A. Suttle, Jesse Milzman, Mustafa O. Karabag, Brian M. 
Sadler, Ufuk Topcu -+ [Get the Agents Drunk: Memory Perturbations in Autonomous Agent-based Recommender Systems](https://arxiv.org//abs/2503.23804) ++ [Get the Agents Drunk: Memory Perturbations in Autonomous Agent-based Recommender Systems](https://arxiv.org/abs/2503.23804) Shiyi Yang, Zhibo Hu, Chen Wang, Tong Yu, Xiwei Xu, Liming Zhu, Lina Yao -+ [Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation](https://arxiv.org//abs/2503.23869) ++ [Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation](https://arxiv.org/abs/2503.23869) Yongle Li, Bo Liu, Sheng Huang, ZHeng ZHang, Xiaotong Yuan, Richang Hong -+ [A Channel-Triggered Backdoor Attack on Wireless Semantic Image Reconstruction](https://arxiv.org//abs/2503.23866) ++ [A Channel-Triggered Backdoor Attack on Wireless Semantic Image Reconstruction](https://arxiv.org/abs/2503.23866) Jialin Wan, Nan Cheng, Jinglong Shen -+ [TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection](https://arxiv.org//abs/2503.24115) ++ [TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection](https://arxiv.org/abs/2503.24115) Zhiming Ma, Peidong Wang, Minhua Huang, Jingpeng Wang, Kai Wu, Xiangzhao Lv, Yachun Pang, Yin Yang, Wenjie Tang, Yuchen Kang @@ -12060,512 +12060,512 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Rana Muhammad Shahroz Khan, Zhen Tan, Sukwon Yun, Charles Fleming, Tianlong Chen # 2025-03-30 -+ [Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better](https://arxiv.org//abs/2504.00038) ++ [Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better](https://arxiv.org/abs/2504.00038) MingWei Zhou, Xiaobing Pei # 2025-03-29 -+ [AuditVotes: A Framework Towards More Deployable Certified Robustness for Graph Neural Networks](https://arxiv.org//abs/2503.22998) ++ [AuditVotes: A Framework Towards More Deployable Certified Robustness for Graph Neural Networks](https://arxiv.org/abs/2503.22998) Yuni Lai, Yulin Zhu, Yixuan Sun, Yulun Wu, Bin Xiao, Gaolei Li, Jianhua Li, Kai Zhou -+ [Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions](https://arxiv.org//abs/2503.23250) ++ [Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions](https://arxiv.org/abs/2503.23250) Shih-Han Chan # 2025-03-28 -+ [Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories](https://arxiv.org//abs/2503.22115) ++ [Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories](https://arxiv.org/abs/2503.22115) Yazhou Zhang, Qimeng Liu, Qiuchi Li, Peng Zhang, Jing Qin -+ [Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models](https://arxiv.org//abs/2503.22205) ++ [Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models](https://arxiv.org/abs/2503.22205) YangTian Yan, Jinyu Tian -+ [Imperceptible but Forgeable: Practical Invisible Watermark Forgery via Diffusion Models](https://arxiv.org//abs/2503.22330) ++ [Imperceptible but Forgeable: Practical Invisible Watermark Forgery via Diffusion Models](https://arxiv.org/abs/2503.22330) Ziping Dong, Chao Shuai, Zhongjie Ba, Peng Cheng, Zhan Qin, Qinglong Wang, Kui Ren -+ [T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in 
Class-Incremental Learning](https://arxiv.org//abs/2503.22163) ++ [T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning](https://arxiv.org/abs/2503.22163) Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang -+ [Tropical Bisectors and Carlini-Wagner Attacks](https://arxiv.org//abs/2503.22653) ++ [Tropical Bisectors and Carlini-Wagner Attacks](https://arxiv.org/abs/2503.22653) Gillian Grindstaff, Julia Lindberg, Daniela Schkoda, Miruna-Stefana Sorea, Ruriko Yoshida -+ [Instance-Level Data-Use Auditing of Visual ML Models](https://arxiv.org//abs/2503.22413) ++ [Instance-Level Data-Use Auditing of Visual ML Models](https://arxiv.org/abs/2503.22413) Zonghao Huang, Neil Zhenqiang Gong, Michael K. Reiter -+ [Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models](https://arxiv.org//abs/2504.03714) ++ [Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models](https://arxiv.org/abs/2504.03714) Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu # 2025-03-27 -+ [Adversarial Wear and Tear: Exploiting Natural Damage for Generating Physical-World Adversarial Examples](https://arxiv.org//abs/2503.21164) ++ [Adversarial Wear and Tear: Exploiting Natural Damage for Generating Physical-World Adversarial Examples](https://arxiv.org/abs/2503.21164) Samra Irshad, Seungkyu Lee, Nassir Navab, Hong Joo Lee, Seong Tae Kim -+ [DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data](https://arxiv.org//abs/2503.21305) ++ [DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data](https://arxiv.org/abs/2503.21305) Dorde Popovic, Amin Sadeghi, Ting Yu, Sanjay Chawla, Issa Khalil -+ [Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection](https://arxiv.org//abs/2503.21464) ++ [Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection](https://arxiv.org/abs/2503.21464) Ryan Marinelli, Josef Pichlmeier, Tamas Bisztray -+ [Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing](https://arxiv.org//abs/2503.21598) ++ [Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing](https://arxiv.org/abs/2503.21598) Johan Wahréus, Ahmed Hussain, Panos Papadimitratos -+ [AMA-SAM: Adversarial Multi-Domain Alignment of Segment Anything Model for High-Fidelity Histology Nuclei Segmentation](https://arxiv.org//abs/2503.21695) ++ [AMA-SAM: Adversarial Multi-Domain Alignment of Segment Anything Model for High-Fidelity Histology Nuclei Segmentation](https://arxiv.org/abs/2503.21695) Jiahe Qian, Yaoyu Fang, Jinkui Hao, Bo Zhou -+ [Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing](https://arxiv.org//abs/2503.21236) ++ [Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing](https://arxiv.org/abs/2503.21236) Shuai Li, Jie Zhang, Yuang Qi, Kejiang Chen, Tianwei Zhang, Weiming Zhang, Nenghai Yu -+ [Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack](https://arxiv.org//abs/2503.21315) ++ [Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack](https://arxiv.org/abs/2503.21315) Cheng Wang, Yiwei Wang, Yujun Cai, Bryan Hooi -+ [AdvSGM: Differentially Private Graph Learning via Adversarial Skip-gram Model](https://arxiv.org//abs/2503.21426) ++ [AdvSGM: Differentially 
Private Graph Learning via Adversarial Skip-gram Model](https://arxiv.org/abs/2503.21426) Sen Zhang, Qingqing Ye, Haibo Hu, Jianliang Xu -+ [EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery](https://arxiv.org//abs/2503.21080) ++ [EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery](https://arxiv.org/abs/2503.21080) Yunbo Long, Yuhan Liu, Liming Xu, Alexandra Brintrup # 2025-03-26 -+ [sudo rm -rf agentic_security](https://arxiv.org//abs/2503.20279) ++ [sudo rm -rf agentic_security](https://arxiv.org/abs/2503.20279) Sejin Lee, Jian Kim, Haon Park, Ashkan Yousefpour, Sangyoon Yu, Min Song -+ [Are We There Yet? Unraveling the State-of-the-Art Graph Network Intrusion Detection Systems](https://arxiv.org//abs/2503.20281) ++ [Are We There Yet? Unraveling the State-of-the-Art Graph Network Intrusion Detection Systems](https://arxiv.org/abs/2503.20281) Chenglong Wang, Pujia Zheng, Jiaping Gui, Cunqing Hua, Wajih Ul Hassan -+ [Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation](https://arxiv.org//abs/2503.20285) ++ [Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation](https://arxiv.org/abs/2503.20285) Hongye Cao, Fan Feng, Jing Huo, Shangdong Yang, Meng Fang, Tianpei Yang, Yang Gao -+ [Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models](https://arxiv.org//abs/2503.20320) ++ [Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models](https://arxiv.org/abs/2503.20320) Shih-Wen Ke, Guan-Yu Lai, Guo-Lin Fang, Hsi-Yuan Kao -+ [State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning](https://arxiv.org//abs/2503.20613) ++ [State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning](https://arxiv.org/abs/2503.20613) Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui -+ [$β$-GNN: A Robust Ensemble Approach Against Graph Structure Perturbation](https://arxiv.org//abs/2503.20630) ++ [$β$-GNN: A Robust Ensemble Approach Against Graph Structure Perturbation](https://arxiv.org/abs/2503.20630) Haci Ismail Aslan, Philipp Wiesner, Ping Xiong, Odej Kao -+ [Enabling Heterogeneous Adversarial Transferability via Feature Permutation Attacks](https://arxiv.org//abs/2503.20310) ++ [Enabling Heterogeneous Adversarial Transferability via Feature Permutation Attacks](https://arxiv.org/abs/2503.20310) Tao Wu, Tie Luo -+ [Lipschitz Constant Meets Condition Number: Learning Robust and Compact Deep Neural Networks](https://arxiv.org//abs/2503.20454) ++ [Lipschitz Constant Meets Condition Number: Learning Robust and Compact Deep Neural Networks](https://arxiv.org/abs/2503.20454) Yangqi Feng, Shing-Ho J. Lin, Baoyuan Gao, Xian Wei -+ [Feature Statistics with Uncertainty Help Adversarial Robustness](https://arxiv.org//abs/2503.20583) ++ [Feature Statistics with Uncertainty Help Adversarial Robustness](https://arxiv.org/abs/2503.20583) Ran Wang, Xinlei Zhou, Rihao Li, Meng Hu, Wenhui Wu, Yuheng Jia -+ [DR-PETS: Learning-Based Control With Planning in Adversarial Environments](https://arxiv.org//abs/2503.20660) ++ [DR-PETS: Learning-Based Control With Planning in Adversarial Environments](https://arxiv.org/abs/2503.20660) Hozefa Jesawada, Antonio Acernese, Giovanni Russo, Carmen Del Vecchiob -+ [How Secure is Forgetting? 
Linking Machine Unlearning to Machine Learning Attacks](https://arxiv.org//abs/2503.20257) ++ [How Secure is Forgetting? Linking Machine Unlearning to Machine Learning Attacks](https://arxiv.org/abs/2503.20257) Muhammed Shafi K. P., Serena Nicolazzo, Antonino Nocera, Vinod P -+ [Reflex: Faster Secure Collaborative Analytics via Controlled Intermediate Result Size Disclosure](https://arxiv.org//abs/2503.20932) ++ [Reflex: Faster Secure Collaborative Analytics via Controlled Intermediate Result Size Disclosure](https://arxiv.org/abs/2503.20932) Long Gu, Shaza Zeitouni, Carsten Binnig, Zsolt István # 2025-03-25 -+ [Process or Result? Manipulated Ending Tokens Can Mislead Reasoning LLMs to Ignore the Correct Reasoning Steps](https://arxiv.org//abs/2503.19326) ++ [Process or Result? Manipulated Ending Tokens Can Mislead Reasoning LLMs to Ignore the Correct Reasoning Steps](https://arxiv.org/abs/2503.19326) Yu Cui, Bryan Hooi, Yujun Cai, Yiwei Wang -+ [Bitstream Collisions in Neural Image Compression via Adversarial Perturbations](https://arxiv.org//abs/2503.19817) ++ [Bitstream Collisions in Neural Image Compression via Adversarial Perturbations](https://arxiv.org/abs/2503.19817) Jordan Madden, Lhamo Dorje, Xiaohua Li -+ [Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent](https://arxiv.org//abs/2503.19347) ++ [Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent](https://arxiv.org/abs/2503.19347) Philip Doldo, Derek Everett, Amol Khanna, Andre T Nguyen, Edward Raff -+ [SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation](https://arxiv.org//abs/2503.19791) ++ [SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation](https://arxiv.org/abs/2503.19791) Jingdan Kang, Haoxin Yang, Yan Cai, Huaidong Zhang, Xuemiao Xu, Yong Du, Shengfeng He -+ [Membership Inference Attacks on Large-Scale Models: A Survey](https://arxiv.org//abs/2503.19338) ++ [Membership Inference Attacks on Large-Scale Models: A Survey](https://arxiv.org/abs/2503.19338) Hengyu Wu, Yang Cao -+ [Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization](https://arxiv.org//abs/2503.19591) ++ [Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization](https://arxiv.org/abs/2503.19591) Weifei Jin, Junjie Su, Hejia Wang, Yulin Ye, Jie Hao -+ [Efficient Adversarial Detection Frameworks for Vehicle-to-Microgrid Services in Edge Computing](https://arxiv.org//abs/2503.19318) ++ [Efficient Adversarial Detection Frameworks for Vehicle-to-Microgrid Services in Edge Computing](https://arxiv.org/abs/2503.19318) Ahmed Omara, Burak Kantarci -+ [Towards Imperceptible Adversarial Attacks for Time Series Classification with Local Perturbations and Frequency Analysis](https://arxiv.org//abs/2503.19519) ++ [Towards Imperceptible Adversarial Attacks for Time Series Classification with Local Perturbations and Frequency Analysis](https://arxiv.org/abs/2503.19519) Wenwei Gu, Renyi Zhong, Jianping Zhang, Michael R. 
Lyu -+ [Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks](https://arxiv.org//abs/2503.20844) ++ [Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks](https://arxiv.org/abs/2503.20844) Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui, Yue Gao -+ [Robust Federated Learning Against Poisoning Attacks: A GAN-Based Defense Framework](https://arxiv.org//abs/2503.20884) ++ [Robust Federated Learning Against Poisoning Attacks: A GAN-Based Defense Framework](https://arxiv.org/abs/2503.20884) Usama Zafar, André Teixeira, Salman Toor -+ [Prototype Guided Backdoor Defense](https://arxiv.org//abs/2503.20925) ++ [Prototype Guided Backdoor Defense](https://arxiv.org/abs/2503.20925) Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini, Avani Gupta, Narayanan P J -+ [TS-Inverse: A Gradient Inversion Attack Tailored for Federated Time Series Forecasting Models](https://arxiv.org//abs/2503.20952) ++ [TS-Inverse: A Gradient Inversion Attack Tailored for Federated Time Series Forecasting Models](https://arxiv.org/abs/2503.20952) Caspar Meijer, Jiyue Huang, Shreshtha Sharma, Elena Lazovik, Lydia Y. Chen -+ [Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy](https://arxiv.org//abs/2503.20823) ++ [Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy](https://arxiv.org/abs/2503.20823) Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang # 2025-03-24 -+ [When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD](https://arxiv.org//abs/2503.18290) ++ [When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD](https://arxiv.org/abs/2503.18290) Paul K. Mandal -+ [Anchor-based oversampling for imbalanced tabular data via contrastive and adversarial learning](https://arxiv.org//abs/2503.18569) ++ [Anchor-based oversampling for imbalanced tabular data via contrastive and adversarial learning](https://arxiv.org/abs/2503.18569) Hadi Mohammadi, Ehsan Nazerfard, Mostafa Haghir Chehreghani -+ [Defeating Prompt Injections by Design](https://arxiv.org//abs/2503.18813) ++ [Defeating Prompt Injections by Design](https://arxiv.org/abs/2503.18813) Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr -+ [J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain](https://arxiv.org//abs/2503.18360) ++ [J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain](https://arxiv.org/abs/2503.18360) Yiran Hu, Huanghai Liu, Qingjing Chen, Ning Zheng, Chong Wang, Yun Liu, Charles L.A. Clarke, Weixing Shen -+ [NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping](https://arxiv.org//abs/2503.18678) ++ [NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping](https://arxiv.org/abs/2503.18678) Tianyi Wang, Harry Cheng, Xiao Zhang, Yinglong Wang -+ [Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection](https://arxiv.org//abs/2503.18784) ++ [Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection](https://arxiv.org/abs/2503.18784) Wenxi Chen, Raymond A. 
Yeh, Shaoshuai Mou, Yan Gu -+ [Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations](https://arxiv.org//abs/2503.18503) ++ [Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations](https://arxiv.org/abs/2503.18503) Jiate Li, Meng Pang, Yun Dong, Binghui Wang -+ [Graph-Level Label-Only Membership Inference Attack against Graph Neural Networks](https://arxiv.org//abs/2503.19070) ++ [Graph-Level Label-Only Membership Inference Attack against Graph Neural Networks](https://arxiv.org/abs/2503.19070) Jiazhu Dai, Yubing Lu -+ [Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification](https://arxiv.org//abs/2503.19099) ++ [Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification](https://arxiv.org/abs/2503.19099) Kenneth Alperin, Rohan Leekha, Adaku Uchendu, Trang Nguyen, Srilakshmi Medarametla, Carlos Levya Capote, Seth Aycock, Charlie Dagli -+ [MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks](https://arxiv.org//abs/2503.19134) ++ [MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks](https://arxiv.org/abs/2503.19134) Wenhao You, Bryan Hooi, Yiwei Wang, Youke Wang, Zong Ke, Ming-Hsuan Yang, Zi Huang, Yujun Cai -+ [Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels](https://arxiv.org//abs/2503.19142) ++ [Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels](https://arxiv.org/abs/2503.19142) Jesse Spielman, David Oswald, Mark Ryan, Jo Van Bulck -+ [MODIS: Multi-Omics Data Integration for Small and unpaired datasets](https://arxiv.org//abs/2503.18856) ++ [MODIS: Multi-Omics Data Integration for Small and unpaired datasets](https://arxiv.org/abs/2503.18856) Daniel Lepe-Soltero, Thierry Artières, Anaïs Baudot, Paul Villoutreix # 2025-03-23 -+ [STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models](https://arxiv.org//abs/2503.17932) ++ [STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models](https://arxiv.org/abs/2503.17932) Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang -+ [Metaphor-based Jailbreaking Attacks on Text-to-Image Models](https://arxiv.org//abs/2503.17987) ++ [Metaphor-based Jailbreaking Attacks on Text-to-Image Models](https://arxiv.org/abs/2503.17987) Chenyu Zhang, Yiwen Ma, Lanjun Wang, Wenhui Li, Yi Tu, An-An Liu -+ [Enhance GNNs with Reliable Confidence Estimation via Adversarial Calibration Learning](https://arxiv.org//abs/2503.18235) ++ [Enhance GNNs with Reliable Confidence Estimation via Adversarial Calibration Learning](https://arxiv.org/abs/2503.18235) Yilong Wang, Jiahao Zhang, Tianxiang Zhao, Suhang Wang -+ [Model-Guardian: Protecting against Data-Free Model Stealing Using Gradient Representations and Deceptive Predictions](https://arxiv.org//abs/2503.18081) ++ [Model-Guardian: Protecting against Data-Free Model Stealing Using Gradient Representations and Deceptive Predictions](https://arxiv.org/abs/2503.18081) Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao -+ [HAIR: Hardness-Aware Inverse Reinforcement Learning with Introspective Reasoning for LLM Alignment](https://arxiv.org//abs/2503.18991) ++ [HAIR: Hardness-Aware Inverse Reinforcement Learning with Introspective Reasoning for 
LLM Alignment](https://arxiv.org/abs/2503.18991)

Ruoxi Cheng, Haoxuan Ma, Weixin Wang

-+ [Personalized Language Models via Privacy-Preserving Evolutionary Model Merging](https://arxiv.org//abs/2503.18008)
++ [Personalized Language Models via Privacy-Preserving Evolutionary Model Merging](https://arxiv.org/abs/2503.18008)

Kyuyoung Kim, Jinwoo Shin, Jaehyung Kim

# 2025-03-22

-+ [Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model](https://arxiv.org//abs/2503.17724)
++ [Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model](https://arxiv.org/abs/2503.17724)

Jie Zhang, Zhongqi Wang, Shiguang Shan, Xilin Chen

# 2025-03-21

-+ [Rethinking the Role of Spatial Mixing](https://arxiv.org//abs/2503.16760)
++ [Rethinking the Role of Spatial Mixing](https://arxiv.org/abs/2503.16760)

George Cazenavette, Joel Julin, Simon Lucey

-+ [EasyRobust: A Comprehensive and Easy-to-use Toolkit for Robust and Generalized Vision](https://arxiv.org//abs/2503.16975)
++ [EasyRobust: A Comprehensive and Easy-to-use Toolkit for Robust and Generalized Vision](https://arxiv.org/abs/2503.16975)

Xiaofeng Mao, Yuefeng Chen, Rong Zhang, Hui Xue, Zhao Li, Hang Su

-+ [Hi-ALPS -- An Experimental Robustness Quantification of Six LiDAR-based Object Detection Systems for Autonomous Driving](https://arxiv.org//abs/2503.17168)
++ [Hi-ALPS -- An Experimental Robustness Quantification of Six LiDAR-based Object Detection Systems for Autonomous Driving](https://arxiv.org/abs/2503.17168)

Alexandra Arzberger, Ramin Tavakoli Kolagari

-+ [Lie Detector: Unified Backdoor Detection via Cross-Examination Framework](https://arxiv.org//abs/2503.16872)
++ [Lie Detector: Unified Backdoor Detection via Cross-Examination Framework](https://arxiv.org/abs/2503.16872)

Xuan Wang, Siyuan Liang, Dongping Liao, Han Fang, Aishan Liu, Xiaochun Cao, Yu-liang Lu, Ee-Chien Chang, Xitong Gao

-+ [Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising](https://arxiv.org//abs/2503.17198)
++ [Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising](https://arxiv.org/abs/2503.17198)

Yongli Xiang, Ziming Hong, Lina Yao, Dadong Wang, Tongliang Liu

-+ [Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers](https://arxiv.org//abs/2503.17172)
++ [Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers](https://arxiv.org/abs/2503.17172)

Gaojie Jin, Tianjin Huang, Ronghui Mu, Xiaowei Huang

-+ [Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability](https://arxiv.org//abs/2503.17173)
++ [Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability](https://arxiv.org/abs/2503.17173)

Sanjif Shanmugavelu, Mathieu Taillefumier, Christopher Culver, Vijay Ganesh, Oscar Hernandez, Ada Sedova

-+ [Measuring the Robustness of Audio Deepfake Detectors](https://arxiv.org//abs/2503.17577)
++ [Measuring the Robustness of Audio Deepfake Detectors](https://arxiv.org/abs/2503.17577)

Xiang Li, Pin-Yu Chen, Wenqi Wei

-+ [Understanding Bias Reinforcement in LLM Agents Debate](https://arxiv.org//abs/2503.16814)
++ [Understanding Bias Reinforcement in LLM Agents Debate](https://arxiv.org/abs/2503.16814)

Jihwan Oh, Minchan Jeong, Jongwoo Ko, Se-Young Yun

# 2025-03-20

-+ [AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration](https://arxiv.org//abs/2503.15754)
++ [AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration](https://arxiv.org/abs/2503.15754)

Andy Zhou, Kevin Wu, Francesco Pinto, Zhaorun Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li

-+ [AI Agents in Cryptoland: Practical Attacks and No Silver Bullet](https://arxiv.org//abs/2503.16248)
++ [AI Agents in Cryptoland: Practical Attacks and No Silver Bullet](https://arxiv.org/abs/2503.16248)

Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar, Prateek Mittal, Pramod Viswanath

-+ [Narrowing Class-Wise Robustness Gaps in Adversarial Training](https://arxiv.org//abs/2503.16179)
++ [Narrowing Class-Wise Robustness Gaps in Adversarial Training](https://arxiv.org/abs/2503.16179)

Fatemeh Amerehi, Patrick Healy

-+ [RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility in Autonomous Vehicles](https://arxiv.org//abs/2503.16251)
++ [RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility in Autonomous Vehicles](https://arxiv.org/abs/2503.16251)

Dawood Wasif, Terrence J. Moore, Jin-Hee Cho

-+ [Rethinking Robustness in Machine Learning: A Posterior Agreement Approach](https://arxiv.org//abs/2503.16271)
++ [Rethinking Robustness in Machine Learning: A Posterior Agreement Approach](https://arxiv.org/abs/2503.16271)

João Borges S. Carvalho, Alessandro Torcinovich, Victor Jimenez Rodriguez, Antonio E. Cinà, Carlos Cotrini, Lea Schönherr, Joachim M. Buhmann

-+ [BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models](https://arxiv.org//abs/2503.16023)
++ [BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models](https://arxiv.org/abs/2503.16023)

Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

-+ [From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning](https://arxiv.org//abs/2503.16266)
++ [From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning](https://arxiv.org/abs/2503.16266)

Ziang Li, Hongguang Zhang, Juan Wang, Meihui Chen, Hongxin Hu, Wenzhe Yi, Xiaoyang Xu, Mengda Yang, Chenjun Ma

-+ [CAARMA: Class Augmentation with Adversarial Mixup Regularization](https://arxiv.org//abs/2503.16718)
++ [CAARMA: Class Augmentation with Adversarial Mixup Regularization](https://arxiv.org/abs/2503.16718)

Massa Baali, Xiang Li, Hao Chen, Rita Singh, Bhiksha Raj

-+ [ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks](https://arxiv.org//abs/2503.16693)
++ [ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks](https://arxiv.org/abs/2503.16693)

Zhan Cheng, Bolin Shen, Tianming Sha, Yuan Gao, Shibo Li, Yushun Dong

-+ [Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI](https://arxiv.org//abs/2503.16233)
++ [Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI](https://arxiv.org/abs/2503.16233)

Dawood Wasif, Dian Chen, Sindhuja Madabushi, Nithin Alluru, Terrence J. Moore, Jin-Hee Cho

# 2025-03-19

-+ [MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models](https://arxiv.org//abs/2503.14827)
++ [MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models](https://arxiv.org/abs/2503.14827)

Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, Zhun Wang, Zhuowen Yuan, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song

-+ [A Semantic and Clean-label Backdoor Attack against Graph Convolutional Networks](https://arxiv.org//abs/2503.14922)
++ [A Semantic and Clean-label Backdoor Attack against Graph Convolutional Networks](https://arxiv.org/abs/2503.14922)

Jiazhu Dai, Haoyu Sun

-+ [Test-Time Backdoor Detection for Object Detection Models](https://arxiv.org//abs/2503.15293)
++ [Test-Time Backdoor Detection for Object Detection Models](https://arxiv.org/abs/2503.15293)

Hangtao Zhang, Yichen Wang, Shihui Yan, Chenyu Zhu, Ziqi Zhou, Linshan Hou, Shengshan Hu, Minghui Li, Yanjun Zhang, Leo Yu Zhang

-+ [Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement](https://arxiv.org//abs/2503.15404)
++ [Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement](https://arxiv.org/abs/2503.15404)

Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen

-+ [On the Robustness Tradeoff in Fine-Tuning](https://arxiv.org//abs/2503.14836)
++ [On the Robustness Tradeoff in Fine-Tuning](https://arxiv.org/abs/2503.14836)

Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Blaine Hoak, Yohan Beugin, Eric Pauley, Patrick McDaniel

-+ [Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization](https://arxiv.org//abs/2503.16550)
++ [Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization](https://arxiv.org/abs/2503.16550)

Yudao Sun, Juan Yin, Juan Zhao, Fan Zhang, Yongheng Liu, Hongji Chen

-+ [Defending Against Gradient Inversion Attacks for Biomedical Images via Learnable Data Perturbation](https://arxiv.org//abs/2503.16542)
++ [Defending Against Gradient Inversion Attacks for Biomedical Images via Learnable Data Perturbation](https://arxiv.org/abs/2503.16542)

Shiyi Jiang, Farshad Firouzi, Krishnendu Chakrabarty

# 2025-03-18

-+ [Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization](https://arxiv.org//abs/2503.13945)
++ [Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization](https://arxiv.org/abs/2503.13945)

Long Tang, Dengpan Ye, Sirun Chen, Xiuwen Shi, Yunna Lv, Ziyi Liu

-+ [Survey of Adversarial Robustness in Multimodal Large Language Models](https://arxiv.org//abs/2503.13962)
++ [Survey of Adversarial Robustness in Multimodal Large Language Models](https://arxiv.org/abs/2503.13962)

Chengze Jiang, Zhuangzhuang Wang, Minjing Dong, Jie Gui

-+ [Towards properties of adversarial image perturbations](https://arxiv.org//abs/2503.14111)
++ [Towards properties of adversarial image perturbations](https://arxiv.org/abs/2503.14111)

Egor Kuznetsov, Kirill Aistov, Maxim Koroteev

-+ [Empirical Calibration and Metric Differential Privacy in Language Models](https://arxiv.org//abs/2503.13872)
++ [Empirical Calibration and Metric Differential Privacy in Language Models](https://arxiv.org/abs/2503.13872)

Pedro Faustini, Natasha Fernandes, Annabelle McIver, Mark Dras

-+ [Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory](https://arxiv.org//abs/2503.14299)
++ [Unveiling the Role of Randomization in Multiclass Adversarial Classification: Insights from Graph Theory](https://arxiv.org/abs/2503.14299)

Lucas Gnecco-Heredia, Matteo Sammut, Muni Sreenivas Pydi, Rafael Pinot, Benjamin Negrevergne, Yann Chevaleyre

-+ [XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants](https://arxiv.org//abs/2503.14281)
++ [XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants](https://arxiv.org/abs/2503.14281)

Adam Štorek, Mukur Gupta, Noopur Bhatt, Aditya Gupta, Janie Kim, Prashast Srivastava, Suman Jana

-+ [LipShiFT: A Certifiably Robust Shift-based Vision Transformer](https://arxiv.org//abs/2503.14751)
++ [LipShiFT: A Certifiably Robust Shift-based Vision Transformer](https://arxiv.org/abs/2503.14751)

Rohan Menon, Nicola Franco, Stephan Günnemann

-+ [RAT: Boosting Misclassification Detection Ability without Extra Data](https://arxiv.org//abs/2503.14783)
++ [RAT: Boosting Misclassification Detection Ability without Extra Data](https://arxiv.org/abs/2503.14783)

Ge Yan, Tsui-Wei Weng

-+ [Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack](https://arxiv.org//abs/2503.15551)
++ [Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack](https://arxiv.org/abs/2503.15551)

Murong Yue, Ziyu Yao

-+ [Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models](https://arxiv.org//abs/2503.15560)
++ [Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models](https://arxiv.org/abs/2503.15560)

Prashant Kulkarni, Assaf Namer

-+ [Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts](https://arxiv.org//abs/2503.16529)
++ [Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts](https://arxiv.org/abs/2503.16529)

Wenjing Zhang, Xuejiao Lei, Zhaoxiang Liu, Limin Han, Jiaojiao Zhao, Junting Guo, Zhenhong Long, Shu Yang, Meijuan An, Beibei Huang, Rongjia Du, Ning Wang, Kai Wang, Shiguo Lian

@@ -12574,657 +12574,657 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Lucas Gnecco-Heredia, Matteo Sammut, Muni Sreenivas Pydi, Rafael Pinot, Benjamin Negrevergne, Yann Chevaleyre

# 2025-03-17

-+ [Analyzing sequential activity and travel decisions with interpretable deep inverse reinforcement learning](https://arxiv.org//abs/2503.12761)
++ [Analyzing sequential activity and travel decisions with interpretable deep inverse reinforcement learning](https://arxiv.org/abs/2503.12761)

Yuebing Liang, Shenhao Wang, Jiangbo Yu, Zhan Zhao, Jinhua Zhao, Sandy Pentland

-+ [MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting](https://arxiv.org//abs/2503.12931)
++ [MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting](https://arxiv.org/abs/2503.12931)

Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang

-+ [A Framework to Assess Multilingual Vulnerabilities of LLMs](https://arxiv.org//abs/2503.13081)
++ [A Framework to Assess Multilingual Vulnerabilities of LLMs](https://arxiv.org/abs/2503.13081)

Likai Tang, Niruth Bogahawatta, Yasod Ginige, Jiarui Xu, Shixuan Sun, Surangika Ranathunga, Suranga Seneviratne

-+ [Securing Virtual Reality Experiences: Unveiling and Tackling Cybersickness Attacks with Explainable AI](https://arxiv.org//abs/2503.13419)
++ [Securing Virtual Reality Experiences: Unveiling and Tackling Cybersickness Attacks with Explainable AI](https://arxiv.org/abs/2503.13419)

Ripan Kumar Kundu, Matthew Denton, Genova Mongalo, Prasad Calyam, Khaza Anuarul Hoque

-+ [GSBAK$^K$: $top$-$K$ Geometric Score-based Black-box Attack](https://arxiv.org//abs/2503.12827)
++ [GSBAK$^K$: $top$-$K$ Geometric Score-based Black-box Attack](https://arxiv.org/abs/2503.12827)

Md Farhamdur Reza, Richeng Jin, Tianfu Wu, Huaiyu Dai

-+ [Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models](https://arxiv.org//abs/2503.12874)
++ [Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models](https://arxiv.org/abs/2503.12874)

Xiaojun Jia, Sensen Gao, Simeng Qin, Ke Ma, Xinfeng Li, Yihao Huang, Wei Dong, Yang Liu, Xiaochun Cao

-+ [Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization](https://arxiv.org//abs/2503.12793)
++ [Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization](https://arxiv.org/abs/2503.12793)

Yechao Zhang, Yingzhe Xu, Junyu Shi, Leo Yu Zhang, Shengshan Hu, Minghui Li, Yanjun Zhang

-+ [BLIA: Detect model memorization in binary classification model through passive Label Inference attack](https://arxiv.org//abs/2503.12801)
++ [BLIA: Detect model memorization in binary classification model through passive Label Inference attack](https://arxiv.org/abs/2503.12801)

Mohammad Wahiduzzaman Khan, Sheng Chen, Ilya Mironov, Leizhen Zhang, Rabib Noor

-+ [ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction](https://arxiv.org//abs/2503.13224)
++ [ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction](https://arxiv.org/abs/2503.13224)

Tong Zhou, Shijin Duan, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu

-+ [Safeguarding LLM Embeddings in End-Cloud Collaboration via Entropy-Driven Perturbation](https://arxiv.org//abs/2503.12896)
++ [Safeguarding LLM Embeddings in End-Cloud Collaboration via Entropy-Driven Perturbation](https://arxiv.org/abs/2503.12896)

Shuaifan Jin, Xiaoyi Pang, Zhibo Wang, He Wang, Jiacheng Du, Jiahui Hu, Kui Ren

-+ [Web Artifact Attacks Disrupt Vision Language Models](https://arxiv.org//abs/2503.13652)
++ [Web Artifact Attacks Disrupt Vision Language Models](https://arxiv.org/abs/2503.13652)

Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

-+ [A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation](https://arxiv.org//abs/2503.12899)
++ [A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation](https://arxiv.org/abs/2503.12899)

Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

-+ [Robust Decision-Making Via Free Energy Minimization](https://arxiv.org//abs/2503.13223)
++ [Robust Decision-Making Via Free Energy Minimization](https://arxiv.org/abs/2503.13223)

Allahkaram Shafiei, Hozefa Jesawada, Karl Friston, Giovanni Russo

# 2025-03-16

-+ [Augmented Adversarial Trigger Learning](https://arxiv.org//abs/2503.12339)
++ [Augmented Adversarial Trigger Learning](https://arxiv.org/abs/2503.12339)

Zhe Wang, Yanjun Qi

-+ [Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy](https://arxiv.org//abs/2503.12497)
++ [Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy](https://arxiv.org/abs/2503.12497)

Jian-Ping Mei, Weibin Zhang, Jie Chen, Xuyun Zhang, Tiantian Zhu

-+ [UniBERTs: Adversarial Training for Language-Universal Representations](https://arxiv.org//abs/2503.12608)
++ [UniBERTs: Adversarial Training for Language-Universal Representations](https://arxiv.org/abs/2503.12608)

Andrei-Marius Avram, Marian Lupaşcu, Dumitru-Clementin Cercel, Ionuţ Mironică, Ştefan Trăuşan-Matu

-+ [One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise](https://arxiv.org//abs/2503.12301)
++ [One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise](https://arxiv.org/abs/2503.12301)

Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall

-+ [GAN-Based Single-Stage Defense for Traffic Sign Classification Under Adversarial Patch Attack](https://arxiv.org//abs/2503.12567)
++ [GAN-Based Single-Stage Defense for Traffic Sign Classification Under Adversarial Patch Attack](https://arxiv.org/abs/2503.12567)

Abyad Enan, Mashrur Chowdhury

-+ [Algebraic Adversarial Attacks on Explainability Models](https://arxiv.org//abs/2503.12683)
++ [Algebraic Adversarial Attacks on Explainability Models](https://arxiv.org/abs/2503.12683)

Lachlan Simpson, Federico Costanza, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew

-+ [Towards Privacy-Preserving Data-Driven Education: The Potential of Federated Learning](https://arxiv.org//abs/2503.13550)
++ [Towards Privacy-Preserving Data-Driven Education: The Potential of Federated Learning](https://arxiv.org/abs/2503.13550)

Mohammad Khalil, Ronas Shakya, Qinyi Liu

# 2025-03-15

-+ [Winning the MIDST Challenge: New Membership Inference Attacks on Diffusion Models for Tabular Data Synthesis](https://arxiv.org//abs/2503.12008)
++ [Winning the MIDST Challenge: New Membership Inference Attacks on Diffusion Models for Tabular Data Synthesis](https://arxiv.org/abs/2503.12008)

Xiaoyu Wu, Yifei Pang, Terrance Liu, Steven Wu

-+ [Revisiting Training-Inference Trigger Intensity in Backdoor Attacks](https://arxiv.org//abs/2503.12058)
++ [Revisiting Training-Inference Trigger Intensity in Backdoor Attacks](https://arxiv.org/abs/2503.12058)

Chenhao Lin, Chenyang Zhao, Shiwei Wang, Longtian Wang, Chao Shen, Zhengyu Zhao

-+ [Robust Dataset Distillation by Matching Adversarial Trajectories](https://arxiv.org//abs/2503.12069)
++ [Robust Dataset Distillation by Matching Adversarial Trajectories](https://arxiv.org/abs/2503.12069)

Wei Lai, Tianyu Ding, ren dongdong, Lei Wang, Jing Huo, Yang Gao, Wenbin Li

-+ [A Bubble-Cluster Federated Learning Framework for Privacy-Preserving Demand Forecasting on Heterogeneous Retail Data](https://arxiv.org//abs/2503.12220)
++ [A Bubble-Cluster Federated Learning Framework for Privacy-Preserving Demand Forecasting on Heterogeneous Retail Data](https://arxiv.org/abs/2503.12220)

Yunbo Long, Liming Xu, Ge Zheng, Alexandra Brintrup

-+ [Multi-Agent Systems Execute Arbitrary Malicious Code](https://arxiv.org//abs/2503.12188)
++ [Multi-Agent Systems Execute Arbitrary Malicious Code](https://arxiv.org/abs/2503.12188)

Harold Triedman, Rishi Jha, Vitaly Shmatikov

# 2025-03-14

-+ [Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks](https://arxiv.org//abs/2503.11517)
++ [Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks](https://arxiv.org/abs/2503.11517)

Diego Gosmar, Deborah A. Dahl, Dario Gosmar

-+ [Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification](https://arxiv.org//abs/2503.11185)
++ [Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification](https://arxiv.org/abs/2503.11185)

Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen

-+ [PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders](https://arxiv.org//abs/2503.11232)
++ [PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders](https://arxiv.org/abs/2503.11232)

Ahmed Frikha, Muhammad Reza Ar Razi, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou

-+ [Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models](https://arxiv.org//abs/2503.11519)
++ [Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models](https://arxiv.org/abs/2503.11519)

Hao Cheng, Erjia Xiao, Yichi Wang, Kaidi Xu, Mengshu Sun, Jindong Gu, Renjing Xu

-+ [Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data](https://arxiv.org//abs/2503.11032)
++ [Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data](https://arxiv.org/abs/2503.11032)

Lilin Zhang, Chengpei Wu, Ning Yang

-+ [Are Deep Speech Denoising Models Robust to Adversarial Noise?](https://arxiv.org//abs/2503.11627)
++ [Are Deep Speech Denoising Models Robust to Adversarial Noise?](https://arxiv.org/abs/2503.11627)

Will Schwarzer, Philip S. Thomas, Andrea Fanelli, Xiaoyu Liu

-+ [Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense](https://arxiv.org//abs/2503.11619)
++ [Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense](https://arxiv.org/abs/2503.11619)

Shuyang Hao, Yiwei Wang, Bryan Hooi, Ming-Hsuan Yang, Jun Liu, Chengcheng Tang, Zi Huang, Yujun Cai

-+ [Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning](https://arxiv.org//abs/2503.11832)
++ [Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning](https://arxiv.org/abs/2503.11832)

Yiwei Chen, Yuguang Yao, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu

-+ [A Framework for Evaluating Emerging Cyberattack Capabilities of AI](https://arxiv.org//abs/2503.11917)
++ [A Framework for Evaluating Emerging Cyberattack Capabilities of AI](https://arxiv.org/abs/2503.11917)

Mikel Rodriguez, Raluca Ada Popa, Four Flynn, Lihao Liang, Allan Dafoe, Anna Wang

-+ [Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization](https://arxiv.org//abs/2503.11750)
++ [Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization](https://arxiv.org/abs/2503.11750)

Shuyang Hao, Yiwei Wang, Bryan Hooi, Jun Liu, Muhao Chen, Zi Huang, Yujun Cai

-+ [Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection](https://arxiv.org//abs/2503.11841)
++ [Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection](https://arxiv.org/abs/2503.11841)

Tianwei Lan, Luca Demetrio, Farid Nait-Abdesselam, Yufei Han, Simone Aonzo

-+ [reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs](https://arxiv.org//abs/2503.11751)
++ [reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs](https://arxiv.org/abs/2503.11751)

Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad

# 2025-03-13

-+ [Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search](https://arxiv.org//abs/2503.10619)
++ [Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search](https://arxiv.org/abs/2503.10619)

Andy Zhou

-+ [Robustness Tokens: Towards Adversarial Robustness of Transformers](https://arxiv.org//abs/2503.10191)
++ [Robustness Tokens: Towards Adversarial Robustness of Transformers](https://arxiv.org/abs/2503.10191)

Brian Pulfer, Yury Belousov, Slava Voloshynovskiy

-+ [A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1](https://arxiv.org//abs/2503.10635)
++ [A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1](https://arxiv.org/abs/2503.10635)

Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu, Jiacheng Cui, Zhiqiang Shen

-+ [ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content](https://arxiv.org//abs/2503.09964)
++ [ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content](https://arxiv.org/abs/2503.09964)

Bhavik Chandna, Mariam Aboujenane, Usman Naseem

-+ [AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption](https://arxiv.org//abs/2503.10081)
++ [AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption](https://arxiv.org/abs/2503.10081)

Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-eui Yoon

-+ [Enhancing Facial Privacy Protection via Weakening Diffusion Purification](https://arxiv.org//abs/2503.10350)
++ [Enhancing Facial Privacy Protection via Weakening Diffusion Purification](https://arxiv.org/abs/2503.10350)

Ali Salar, Qing Liu, Yingli Tian, Guoying Zhao

-+ [MASQUE: A Text-Guided Diffusion-Based Framework for Localized and Customized Adversarial Makeup](https://arxiv.org//abs/2503.10549)
++ [MASQUE: A Text-Guided Diffusion-Based Framework for Localized and Customized Adversarial Makeup](https://arxiv.org/abs/2503.10549)

Youngjin Kwon, Xiao Zhang

-+ [Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology](https://arxiv.org//abs/2503.10629)
++ [Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology](https://arxiv.org/abs/2503.10629)

Hashmat Shadab Malik, Shahina Kunhimon, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

-+ [Policy Teaching via Data Poisoning in Learning from Human Preferences](https://arxiv.org//abs/2503.10228)
++ [Policy Teaching via Data Poisoning in Learning from Human Preferences](https://arxiv.org/abs/2503.10228)

Andi Nika, Jonathan Nöther, Debmalya Mandal, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

-+ [DP-GPL: Differentially Private Graph Prompt Learning](https://arxiv.org//abs/2503.10544)
++ [DP-GPL: Differentially Private Graph Prompt Learning](https://arxiv.org/abs/2503.10544)

Jing Xu, Franziska Boenisch, Iyiola Emmanuel Olatunji, Adam Dziedzic

-+ [ASIDE: Architectural Separation of Instructions and Data in Language Models](https://arxiv.org//abs/2503.10566)
++ [ASIDE: Architectural Separation of Instructions and Data in Language Models](https://arxiv.org/abs/2503.10566)

Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Alexandra Volkova, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert

-+ [Targeted Data Poisoning for Black-Box Audio Datasets Ownership Verification](https://arxiv.org//abs/2503.10269)
++ [Targeted Data Poisoning for Black-Box Audio Datasets Ownership Verification](https://arxiv.org/abs/2503.10269)

Wassim Bouaziz, El-Mahdi El-Mhamdi, Nicolas Usunier

-+ [I Can Tell Your Secrets: Inferring Privacy Attributes from Mini-app Interaction History in Super-apps](https://arxiv.org//abs/2503.10239)
++ [I Can Tell Your Secrets: Inferring Privacy Attributes from Mini-app Interaction History in Super-apps](https://arxiv.org/abs/2503.10239)

Yifeng Cai, Ziqi Zhang, Mengyu Yao, Junlin Liu, Xiaoke Zhao, Xinyi Fu, Ruoyu Li, Zhe Li, Xiangqun Chen, Yao Guo, Ding Li

-+ [TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models](https://arxiv.org//abs/2503.10872)
++ [TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models](https://arxiv.org/abs/2503.10872)

Xiangyu Yin, Yi Qi, Jinwei Hu, Zhen Chen, Yi Dong, Xingyu Zhao, Xiaowei Huang, Wenjie Ruan

-+ [ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models](https://arxiv.org//abs/2503.10937)
++ [ChatGPT Encounters Morphing Attack Detection: Zero-Shot MAD with Multi-Modal Large Language Models and General Vision Models](https://arxiv.org/abs/2503.10937)

Haoyu Zhang, Raghavendra Ramachandra, Kiran Raja, Christoph Busch

-+ [Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks](https://arxiv.org//abs/2503.11514)
++ [Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks](https://arxiv.org/abs/2503.11514)

Pengxin Guo, Runxi Wang, Shuang Zeng, Jinjing Zhu, Haoning Jiang, Yanran Wang, Yuyin Zhou, Feifei Wang, Hui Xiong, Liangqiong Qu

-+ [Attacking Multimodal OS Agents with Malicious Image Patches](https://arxiv.org//abs/2503.10809)
++ [Attacking Multimodal OS Agents with Malicious Image Patches](https://arxiv.org/abs/2503.10809)

Lukas Aichberger, Alasdair Paren, Yarin Gal, Philip Torr, Adel Bibi

# 2025-03-12

-+ [In-Context Defense in Computer Agents: An Empirical Study](https://arxiv.org//abs/2503.09241)
++ [In-Context Defense in Computer Agents: An Empirical Study](https://arxiv.org/abs/2503.09241)

Pei Yang, Hai Ci, Mike Zheng Shou

-+ [JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing](https://arxiv.org//abs/2503.08990)
++ [JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing](https://arxiv.org/abs/2503.08990)

Vasudev Gohil

-+ [Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States](https://arxiv.org//abs/2503.09066)
++ [Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States](https://arxiv.org/abs/2503.09066)

Xin Wei Chia, Jonathan Pan

-+ [Probing Network Decisions: Capturing Uncertainties and Unveiling Vulnerabilities Without Label Information](https://arxiv.org//abs/2503.09068)
++ [Probing Network Decisions: Capturing Uncertainties and Unveiling Vulnerabilities Without Label Information](https://arxiv.org/abs/2503.09068)

Youngju Joung, Sehyun Lee, Jaesik Choi

-+ [Robust Asymmetric Heterogeneous Federated Learning with Corrupted Clients](https://arxiv.org//abs/2503.09206)
++ [Robust Asymmetric Heterogeneous Federated Learning with Corrupted Clients](https://arxiv.org/abs/2503.09206)

Xiuwen Fang, Mang Ye, Bo Du

-+ [Membership Inference Attacks fueled by Few-Short Learning to detect privacy leakage tackling data integrity](https://arxiv.org//abs/2503.09365)
++ [Membership Inference Attacks fueled by Few-Short Learning to detect privacy leakage tackling data integrity](https://arxiv.org/abs/2503.09365)

Daniel Jiménez-López, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera

-+ [Revealing Unintentional Information Leakage in Low-Dimensional Facial Portrait Representations](https://arxiv.org//abs/2503.09306)
++ [Revealing Unintentional Information Leakage in Low-Dimensional Facial Portrait Representations](https://arxiv.org/abs/2503.09306)

Kathleen Anderson, Thomas Martinetz

-+ [Stealthy Patch-Wise Backdoor Attack in 3D Point Cloud via Curvature Awareness](https://arxiv.org//abs/2503.09336)
++ [Stealthy Patch-Wise Backdoor Attack in 3D Point Cloud via Curvature Awareness](https://arxiv.org/abs/2503.09336)

Yu Feng, Dingxin Zhang, Runkai Zhao, Yong Xia, Heng Huang, Weidong Cai

-+ [C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion](https://arxiv.org//abs/2503.09095)
++ [C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion](https://arxiv.org/abs/2503.09095)

Lijie Hu, Junchi Liao, Weimin Lyu, Shaopeng Fu, Tianhao Huang, Shu Yang, Guimin Hu, Di Wang

-+ [AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks](https://arxiv.org//abs/2503.09124)
++ [AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks](https://arxiv.org/abs/2503.09124)

Jin Li, Ziqiang He, Anwei Luo, Jian-Fang Hu, Z. Jane Wang, Xiangui Kang

-+ [Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks](https://arxiv.org//abs/2503.08973)
++ [Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks](https://arxiv.org/abs/2503.08973)

Idris Zakariyya, Ferheen Ayaz, Mounia Kharbouche-Harrari, Jeremy Singer, Sye Loong Keoh, Danilo Pau, José Cano

-+ [Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning](https://arxiv.org//abs/2503.08976)
++ [Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning](https://arxiv.org/abs/2503.08976)

Zirui Gong, Yanjun Zhang, Leo Yu Zhang, Zhaoxi Zhang, Yong Xiang, Shirui Pan

-+ [Adaptive Backdoor Attacks with Reasonable Constraints on Graph Neural Networks](https://arxiv.org//abs/2503.09049)
++ [Adaptive Backdoor Attacks with Reasonable Constraints on Graph Neural Networks](https://arxiv.org/abs/2503.09049)

Xuewen Dong, Jiachen Li, Shujun Li, Zhichao You, Qiang Qu, Yaroslav Kholodov, Yulong Shen

-+ [Mitigating Membership Inference Vulnerability in Personalized Federated Learning](https://arxiv.org//abs/2503.09414)
++ [Mitigating Membership Inference Vulnerability in Personalized Federated Learning](https://arxiv.org/abs/2503.09414)

Kangsoo Jung, Sayan Biswas, Catuscia Palamidessi

-+ [Prompt Inversion Attack against Collaborative Inference of Large Language Models](https://arxiv.org//abs/2503.09022)
++ [Prompt Inversion Attack against Collaborative Inference of Large Language Models](https://arxiv.org/abs/2503.09022)

Wenjie Qu, Yuguang Zhou, Yongji Wu, Tingsong Xiao, Binhang Yuan, Yiming Li, Jiaheng Zhang

-+ [Prompt Inference Attack on Distributed Large Language Model Inference Frameworks](https://arxiv.org//abs/2503.09291)
++ [Prompt Inference Attack on Distributed Large Language Model Inference Frameworks](https://arxiv.org/abs/2503.09291)

Xinjian Luo, Ting Yu, Xiaokui Xiao

-+ [Detecting and Preventing Data Poisoning Attacks on AI Models](https://arxiv.org//abs/2503.09302)
++ [Detecting and Preventing Data Poisoning Attacks on AI Models](https://arxiv.org/abs/2503.09302)

Halima I. Kure, Pradipta Sarkar, Ahmed B. Ndanusa, Augustine O. Nwajana

-+ [AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents](https://arxiv.org//abs/2503.09780)
++ [AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents](https://arxiv.org/abs/2503.09780)

Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Maya Pavlova, Ruslan Salakhutdinov, Kamalika Chaudhuri

-+ [Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models](https://arxiv.org//abs/2503.09669)
++ [Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models](https://arxiv.org/abs/2503.09669)

Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang

-+ [Revisiting Backdoor Attacks on Time Series Classification in the Frequency Domain](https://arxiv.org//abs/2503.09712)
++ [Revisiting Backdoor Attacks on Time Series Classification in the Frequency Domain](https://arxiv.org/abs/2503.09712)

Yuanmin Huang, Mi Zhang, Zhaoxiang Wang, Wenxuan Li, Min Yang

-+ [Enhancing Adversarial Example Detection Through Model Explanation](https://arxiv.org//abs/2503.09735)
++ [Enhancing Adversarial Example Detection Through Model Explanation](https://arxiv.org/abs/2503.09735)

Qian Ma, Ziping Ye

-+ [How Feasible is Augmenting Fake Nodes with Learnable Features as a Counter-strategy against Link Stealing Attacks?](https://arxiv.org//abs/2503.09726)
++ [How Feasible is Augmenting Fake Nodes with Learnable Features as a Counter-strategy against Link Stealing Attacks?](https://arxiv.org/abs/2503.09726)

Mir Imtiaz Mostafiz, Imtiaz Karim, Elisa Bertino

-+ [All Your Knowledge Belongs to Us: Stealing Knowledge Graphs via Reasoning APIs](https://arxiv.org//abs/2503.09727)
++ [All Your Knowledge Belongs to Us: Stealing Knowledge Graphs via Reasoning APIs](https://arxiv.org/abs/2503.09727)

Zhaohan Xi

-+ [Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models](https://arxiv.org//abs/2503.10690)
++ [Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models](https://arxiv.org/abs/2503.10690)

Shahnewaz Karim Sakib, Anindya Bijoy Das, Shibbir Ahmed

# 2025-03-11

-+ [Generalized Kullback-Leibler Divergence Loss](https://arxiv.org//abs/2503.08038)
++ [Generalized Kullback-Leibler Divergence Loss](https://arxiv.org/abs/2503.08038)

Jiequan Cui, Beier Zhu, Qingshan Xu, Zhuotao Tian, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

-+ [A Grey-box Text Attack Framework using Explainable AI](https://arxiv.org//abs/2503.08226)
++ [A Grey-box Text Attack Framework using Explainable AI](https://arxiv.org/abs/2503.08226)

Esther Chiramal, Kelvin Soh Boon Kai

-+ [Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks](https://arxiv.org//abs/2503.08269)
++ [Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks](https://arxiv.org/abs/2503.08269)

Junying Wang, Hongyuan Zhang, Yuan Yuan

-+ [MINT-Demo: Membership Inference Test Demonstrator](https://arxiv.org//abs/2503.08332)
++ [MINT-Demo: Membership Inference Test Demonstrator](https://arxiv.org/abs/2503.08332)

Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Ruben Vera-Rodriguez

-+ [Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation](https://arxiv.org//abs/2503.08195)
++ [Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation](https://arxiv.org/abs/2503.08195)

Wenlong Meng, Fan Zhang, Wendao Yao, Zhenyuan Guo, Yuwei Li, Chengkun Wei, Wenzhi Chen

-+ [Birds look like cars: Adversarial analysis of intrinsically interpretable deep learning](https://arxiv.org//abs/2503.08636)
++ [Birds look like cars: Adversarial analysis of intrinsically interpretable deep learning](https://arxiv.org/abs/2503.08636)

Hubert Baniecki, Przemyslaw Biecek

-+ [Interpreting the Repeated Token Phenomenon in Large Language Models](https://arxiv.org//abs/2503.08908)
++ [Interpreting the Repeated Token Phenomenon in Large Language Models](https://arxiv.org/abs/2503.08908)

Itay Yona, Ilia Shumailov, Jamie Hayes, Federico Barbero, Yossi Gandelsman

-+ [FairDeFace: Evaluating the Fairness and Adversarial Robustness of Face Obfuscation Methods](https://arxiv.org//abs/2503.08731)
++ [FairDeFace: Evaluating the Fairness and Adversarial Robustness of Face Obfuscation Methods](https://arxiv.org/abs/2503.08731)

Seyyed Mohammad Sadegh Moosavi Khorzooghi, Poojitha Thota, Mohit Singhal, Abolfazl Asudeh, Gautam Das, Shirin Nilizadeh

-+ [Enhanced Estimation Techniques for Certified Radii in Randomized Smoothing](https://arxiv.org//abs/2503.08801)
++ [Enhanced Estimation Techniques for Certified Radii in Randomized Smoothing](https://arxiv.org/abs/2503.08801)

Zixuan Liang

-+ [Seal Your Backdoor with Variational Defense](https://arxiv.org//abs/2503.08829)
++ [Seal Your Backdoor with Variational Defense](https://arxiv.org/abs/2503.08829)

Ivan Sabolić, Matej Grcić, Siniša Šegvić

# 2025-03-10

-+ [Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs](https://arxiv.org//abs/2503.07384)
++ [Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs](https://arxiv.org/abs/2503.07384)

Gonzalo Mancera, Daniel de Alcala, Julian Fierrez, Ruben Tolosana, Aythami Morales

-+ [TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models](https://arxiv.org//abs/2503.07389)
++ [TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models](https://arxiv.org/abs/2503.07389)

Ruidong Chen, Honglin Guo, Lanjun Wang, Chenyu Zhang, Weizhi Nie, An-An Liu

-+ [Efficient Membership Inference Attacks by Bayesian Neural Network](https://arxiv.org//abs/2503.07482)
++ [Efficient Membership Inference Attacks by Bayesian Neural Network](https://arxiv.org/abs/2503.07482)

Zhenlong Liu, Wenyu Jiang, Feng Zhou, Hongxin Wei

-+ [Runtime Detection of Adversarial Attacks in AI Accelerators Using Performance Counters](https://arxiv.org//abs/2503.07568)
++ [Runtime Detection of Adversarial Attacks in AI Accelerators Using Performance Counters](https://arxiv.org/abs/2503.07568)

Habibur Rahaman, Atri Chatterjee, Swarup Bhunia

-+ [CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation](https://arxiv.org//abs/2503.06950)
++ [CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation](https://arxiv.org/abs/2503.06950)

Runqi Sui

-+ [When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack](https://arxiv.org//abs/2503.06903)
++ [When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack](https://arxiv.org/abs/2503.06903)

Hanqing Liu, Shouwei Ruan, Yao Huang, Shiji Zhao, Xingxing Wei

-+ [MIGA: Mutual Information-Guided Attack on Denoising Models for Semantic Manipulation](https://arxiv.org//abs/2503.06966)
++ [MIGA: Mutual Information-Guided Attack on Denoising Models for Semantic Manipulation](https://arxiv.org/abs/2503.06966)

Guanghao Li, Mingzhi Chen, Hao Yu, Shuting Dong, Wenhao Jiang, Ming Tang, Chun Yuan

-+ [ConcreTizer: Model Inversion Attack via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration](https://arxiv.org//abs/2503.06986)
++ [ConcreTizer: Model Inversion Attack via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration](https://arxiv.org/abs/2503.06986)

Youngseok Kim, Sunwook Hwang, Hyung-Sin Kim, Saewoong Bahk

-+ [Breaking the Limits of Quantization-Aware Defenses: QADT-R for Robustness Against Patch-Based Adversarial Attacks in QNNs](https://arxiv.org//abs/2503.07058)
++ [Breaking the Limits of Quantization-Aware Defenses: QADT-R for Robustness Against Patch-Based Adversarial Attacks in QNNs](https://arxiv.org/abs/2503.07058)

Amira Guesmi, Bassem Ouni, Muhammad Shafique

-+ [Probabilistic Segmentation for Robust Field of View Estimation](https://arxiv.org//abs/2503.07375)
++ [Probabilistic Segmentation for Robust Field of View Estimation](https://arxiv.org/abs/2503.07375)

R. Spencer Hallyburton, David Hunt, Yiwei He, Judy He, Miroslav Pajic

-+ [Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs](https://arxiv.org//abs/2503.06989)
++ [Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs](https://arxiv.org/abs/2503.06989)

Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Xiangzheng Zhang

-+ [FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates](https://arxiv.org//abs/2503.07216)
++ [FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates](https://arxiv.org/abs/2503.07216)

Sangwoo Park, Seanie Lee, Byungjoo Kim, Sung Ju Hwang

-+ [Learning to Localize Leakage of Cryptographic Sensitive Variables](https://arxiv.org//abs/2503.07464)
++ [Learning to Localize Leakage of Cryptographic Sensitive Variables](https://arxiv.org/abs/2503.07464)

Jimmy Gammell, Anand Raghunathan, Abolfazl Hashemi, Kaushik Roy

-+ [Safety Guardrails for LLM-Enabled Robots](https://arxiv.org//abs/2503.07885)
++ [Safety Guardrails for LLM-Enabled Robots](https://arxiv.org/abs/2503.07885)

Zachary Ravichandran, Alexander Robey, Vijay Kumar, George J. Pappas, Hamed Hassani

-+ [TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors](https://arxiv.org//abs/2503.08708)
++ [TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors](https://arxiv.org/abs/2503.08708)

Jingyi Zheng, Junfeng Wang, Zhen Sun, Wenhan Dong, Yule Liu, Xinlei He

-+ [AuthorMist: Evading AI Text Detectors with Reinforcement Learning](https://arxiv.org//abs/2503.08716)
++ [AuthorMist: Evading AI Text Detectors with Reinforcement Learning](https://arxiv.org/abs/2503.08716)

Isaac David, Arthur Gervais

-+ [PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models](https://arxiv.org//abs/2503.07697)
++ [PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models](https://arxiv.org/abs/2503.07697)

Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang

# 2025-03-09

-+ [PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training](https://arxiv.org//abs/2503.06486)
++ [PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training](https://arxiv.org/abs/2503.06486)

Cong Chen, Mingyu Liu, Chenchen Jing, Yizhou Zhou, Fengyun Rao, Hao Chen, Bo Zhang, Chunhua Shen

-+ [Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation](https://arxiv.org//abs/2503.06519)
++ [Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation](https://arxiv.org/abs/2503.06519)

Wenhui Zhang, Huiyu Xu, Zhibo Wang, Zeqing He, Ziqi Zhu, Kui Ren

-+ [AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection](https://arxiv.org//abs/2503.06529)
++ [AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection](https://arxiv.org/abs/2503.06529)

Jialin Lu, Junjie Shan, Ziqi Zhao, Ka-Ho Chow

-+ [Enhancing NLP Robustness and Generalization through LLM-Generated Contrast Sets: A Scalable Framework for Systematic Evaluation and Adversarial Training](https://arxiv.org//abs/2503.06648)
++ [Enhancing NLP Robustness and Generalization through LLM-Generated Contrast Sets: A Scalable Framework for Systematic Evaluation and Adversarial Training](https://arxiv.org/abs/2503.06648)

Hender Lin

-+ [Privacy Auditing of Large Language Models](https://arxiv.org//abs/2503.06808)
++ [Privacy Auditing of Large Language Models](https://arxiv.org/abs/2503.06808)

Ashwinee Panda, Xinyu Tang, Milad Nasr, Christopher A. Choquette-Choo, Prateek Mittal

-+ [Long-tailed Adversarial Training with Self-Distillation](https://arxiv.org//abs/2503.06461)
++ [Long-tailed Adversarial Training with Self-Distillation](https://arxiv.org/abs/2503.06461)

Seungju Cho, Hongsin Lee, Changick Kim

-+ [MMARD: Improving the Min-Max Optimization Process in Adversarial Robustness Distillation](https://arxiv.org//abs/2503.06559)
++ [MMARD: Improving the Min-Max Optimization Process in Adversarial Robustness Distillation](https://arxiv.org/abs/2503.06559)

Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yuanhang Wang, Lizhe Qi

-+ [NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation](https://arxiv.org//abs/2503.06453)
++ [NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation](https://arxiv.org/abs/2503.06453)

Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang

-+ [BDPFL: Backdoor Defense for Personalized Federated Learning via Explainable Distillation](https://arxiv.org//abs/2503.06554)
++ [BDPFL: Backdoor Defense for Personalized Federated Learning via Explainable Distillation](https://arxiv.org/abs/2503.06554)

Chengcheng Zhu, Jiale Zhang, Di Wu, Guodong Long

-+ [Life-Cycle Routing Vulnerabilities of LLM Router](https://arxiv.org//abs/2503.08704)
++ [Life-Cycle Routing Vulnerabilities of LLM Router](https://arxiv.org/abs/2503.08704)

Qiqi Lin, Xiaoyang Ji, Shengfang Zhai, Qingni Shen, Zhi Zhang, Yuejian Fang, Yansong Gao

# 2025-03-08

-+ [Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models](https://arxiv.org//abs/2503.06269)
++ [Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models](https://arxiv.org/abs/2503.06269)

Thomas Winninger, Boussad Addad, Katarzyna Kapusta

-+ [Mitigating Memorization in LLMs using Activation Steering](https://arxiv.org//abs/2503.06040)
++ [Mitigating Memorization in LLMs using Activation Steering](https://arxiv.org/abs/2503.06040)

Manan Suri, Nishit Anand, Amisha Bhaskar

-+ [Boosting the Local Invariance for Better Adversarial Transferability](https://arxiv.org//abs/2503.06140)
++ [Boosting the Local Invariance for Better Adversarial Transferability](https://arxiv.org/abs/2503.06140)

Bohan Liu, Xiaosen Wang

-+ [Reinforced Diffuser for Red Teaming Large Vision-Language Models](https://arxiv.org//abs/2503.06223)
++ [Reinforced Diffuser for Red Teaming Large Vision-Language Models](https://arxiv.org/abs/2503.06223)

Ruofan Wang, Xiang Zheng, Xiaosen Wang, Cong Wang, Xingjun Ma

-+ [Exploring Adversarial Transferability between Kolmogorov-arnold Networks](https://arxiv.org//abs/2503.06276)
++ [Exploring Adversarial Transferability between Kolmogorov-arnold Networks](https://arxiv.org/abs/2503.06276)

Songping Wang, Xinquan Yue, Yueming Lyu, Caifeng Shan

-+ [Adversarial Robustness of Discriminative Self-Supervised Learning in Vision](https://arxiv.org//abs/2503.06361)
++ [Adversarial Robustness of Discriminative Self-Supervised Learning in Vision](https://arxiv.org/abs/2503.06361)

Ömer Veysel Çağatan, Ömer Faruk Tal, M. Emre Gürsoy

-+ [FedEM: A Privacy-Preserving Framework for Concurrent Utility Preservation in Federated Learning](https://arxiv.org//abs/2503.06021)
++ [FedEM: A Privacy-Preserving Framework for Concurrent Utility Preservation in Federated Learning](https://arxiv.org/abs/2503.06021)

Mingcong Xu, Xiaojin Zhang, Wei Chen, Hai Jin

-+ [Do Fairness Interventions Come at the Cost of Privacy: Evaluations for Binary Classifiers](https://arxiv.org//abs/2503.06150)
++ [Do Fairness Interventions Come at the Cost of Privacy: Evaluations for Binary Classifiers](https://arxiv.org/abs/2503.06150)

Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou

-+ [Attackers Can Do Better: Over- and Understated Factors of Model Stealing Attacks](https://arxiv.org//abs/2503.06188)
++ [Attackers Can Do Better: Over- and Understated Factors of Model Stealing Attacks](https://arxiv.org/abs/2503.06188)

Daryna Oliynyk, Rudolf Mayer, Andreas Rauber

-+ [MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming](https://arxiv.org//abs/2503.06253)
++ [MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming](https://arxiv.org/abs/2503.06253)

Stefan Schoepf, Muhammad Zaid Hameed, Ambrish Rawat, Kieran Fraser, Giulio Zizzo, Giandomenico Cornacchia, Mark Purcell

-+ [Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation](https://arxiv.org//abs/2503.06254)
++ [Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation](https://arxiv.org/abs/2503.06254)

Yinuo Liu, Zenghui Yuan, Guiyao Tie, Jiawen Shi, Lichao Sun, Neil Zhenqiang Gong

-+ [Backdoor Attacks on Discrete Graph Diffusion Models](https://arxiv.org//abs/2503.06340)
++ [Backdoor Attacks on Discrete Graph Diffusion Models](https://arxiv.org/abs/2503.06340)

Jiawen Wang, Samin Karim, Yuan Hong, Binghui Wang

-+ [Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy](https://arxiv.org//abs/2503.07661)
++ [Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy](https://arxiv.org/abs/2503.07661)

Wei Junhao, Yu Zhe, Sakuma Jun

-+ [CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models](https://arxiv.org//abs/2503.10661)
++ [CeTAD: Towards Certified Toxicity-Aware Distance in Vision Language Models](https://arxiv.org/abs/2503.10661)

Xiangyu Yin, Jiaxu Liu, Zhen Chen, Jinwei Hu, Yi Dong, Xiaowei Huang, Wenjie Ruan

@@ -13235,178 +13235,178 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yixin Wu, Feiran Zhang, Tianyuan Shi, Ruicheng Yin, Zhenghua Wang, Zhenliang Gan, Xiaohua Wang, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

# 2025-03-07

-+ [Jailbreaking is (Mostly) Simpler Than You Think](https://arxiv.org//abs/2503.05264)
++ [Jailbreaking is (Mostly) Simpler Than You Think](https://arxiv.org/abs/2503.05264)

Mark Russinovich, Ahmed Salem

-+ [Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models](https://arxiv.org//abs/2503.05595)
++ [Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models](https://arxiv.org/abs/2503.05595)

Zheng Li, Liangbin Xie, Jiantao Zhou, Xintao Wang, Haiwei Wu, Jinyu Tian

-+ [Safety-Critical Traffic Simulation with Adversarial Transfer of Driving Intentions](https://arxiv.org//abs/2503.05180)
++ [Safety-Critical Traffic Simulation with Adversarial Transfer of Driving Intentions](https://arxiv.org/abs/2503.05180)

Zherui Huang, Xing Gao, Guanjie Zheng, Licheng Wen, Xuemeng Yang, Xiao Sun

-+ [Robust Intrusion Detection System with Explainable Artificial Intelligence](https://arxiv.org//abs/2503.05303)
++ [Robust Intrusion Detection System with Explainable Artificial Intelligence](https://arxiv.org/abs/2503.05303)

Betül Güvenç Paltun, Ramin Fuladi, Rim El Malki

-+ [Are Your LLM-based Text-to-SQL Models Secure? Exploring SQL Injection via Backdoor Attacks](https://arxiv.org//abs/2503.05445)
++ [Are Your LLM-based Text-to-SQL Models Secure? Exploring SQL Injection via Backdoor Attacks](https://arxiv.org/abs/2503.05445)

Meiyu Lin, Haichuan Zhang, Jiale Lao, Renyuan Li, Yuanchun Zhou, Carl Yang, Yang Cao, Mingjie Tang

-+ [This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs](https://arxiv.org//abs/2503.05856)
++ [This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs](https://arxiv.org/abs/2503.05856)

Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic

# 2025-03-06

-+ [Activation Space Interventions Can Be Transferred Between Large Language Models](https://arxiv.org//abs/2503.04429)
++ [Activation Space Interventions Can Be Transferred Between Large Language Models](https://arxiv.org/abs/2503.04429)

Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Michael Lan, Abir Harrasse, Amirali Abdullah

-+ [Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization](https://arxiv.org//abs/2503.04315)
++ [Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization](https://arxiv.org/abs/2503.04315)

Shuang Liu, Yihan Wang, Yifan Zhu, Yibo Miao, Xiao-Shan Gao

-+ [Privacy Preserving and Robust Aggregation for Cross-Silo Federated Learning in Non-IID Settings](https://arxiv.org//abs/2503.04451)
++ [Privacy Preserving and Robust Aggregation for Cross-Silo Federated Learning in Non-IID Settings](https://arxiv.org/abs/2503.04451)

Marco Arazzi, Mert Cihangiroglu, Antonino Nocera

-+ [Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution](https://arxiv.org//abs/2503.04385)
++ [Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution](https://arxiv.org/abs/2503.04385)

Yihao Huang, Xin Luo, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Weikai Miao, Geguang Pu, Yang Liu

-+ [Controlled privacy leakage propagation throughout overlapping grouped learning](https://arxiv.org//abs/2503.04054)
++ [Controlled privacy leakage propagation throughout overlapping grouped learning](https://arxiv.org/abs/2503.04054)

Shahrzad Kiani, Franziska Boenisch, Stark C. Draper

-+ [Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges](https://arxiv.org//abs/2503.04474)
++ [Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges](https://arxiv.org/abs/2503.04474)

Francisco Eiras, Eliott Zemour, Eric Lin, Vaikkunth Mugunthan

-+ [The Challenge of Identifying the Origin of Black-Box Large Language Models](https://arxiv.org//abs/2503.04332)
++ [The Challenge of Identifying the Origin of Black-Box Large Language Models](https://arxiv.org/abs/2503.04332)

Ziqing Yang, Yixin Wu, Yun Shen, Wei Dai, Michael Backes, Yang Zhang

-+ [Poisoning Bayesian Inference via Data Deletion and Replication](https://arxiv.org//abs/2503.04480)
++ [Poisoning Bayesian Inference via Data Deletion and Replication](https://arxiv.org/abs/2503.04480)

Matthieu Carreau, Roi Naveiro, William N. Caballero

-+ [Runtime Backdoor Detection for Federated Learning via Representational Dissimilarity Analysis](https://arxiv.org//abs/2503.04473)
++ [Runtime Backdoor Detection for Federated Learning via Representational Dissimilarity Analysis](https://arxiv.org/abs/2503.04473)

Xiyue Zhang, Xiaoyong Xue, Xiaoning Du, Xiaofei Xie, Yang Liu, Meng Sun

-+ [From Pixels to Trajectory: Universal Adversarial Example Detection via Temporal Imprints](https://arxiv.org//abs/2503.04853)
++ [From Pixels to Trajectory: Universal Adversarial Example Detection via Temporal Imprints](https://arxiv.org/abs/2503.04853)

Yansong Gao, Huaibing Peng, Hua Ma, Zhiyang Dai, Shuo Wang, Hongsheng Hu, Anmin Fu, Minhui Xue

-+ [One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs](https://arxiv.org//abs/2503.04856)
++ [One-Shot is Enough: Consolidating Multi-Turn Attacks into Efficient Single-Turn Prompts for LLMs](https://arxiv.org/abs/2503.04856)

Junwoo Ha, Hyunjun Kim, Sangyoon Yu, Haon Park, Ashkan Yousefpour, Yuna Park, Suhyun Kim

-+ [Energy-Latency Attacks: A New Adversarial Threat to Deep Learning](https://arxiv.org//abs/2503.04963)
++ [Energy-Latency Attacks: A New Adversarial Threat to Deep Learning](https://arxiv.org/abs/2503.04963)

Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, Olivier Deforges

-+ [Poisoning Attacks to Local Differential Privacy Protocols for Trajectory Data](https://arxiv.org//abs/2503.07483)
++ [Poisoning Attacks to Local Differential Privacy Protocols for Trajectory Data](https://arxiv.org/abs/2503.07483)

I-Jung Hsu, Chih-Hsun Lin, Chia-Mu Yu, Sy-Yen Kuo, Chun-Ying Huang

-+ [AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management](https://arxiv.org//abs/2503.04392)
++ [AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management](https://arxiv.org/abs/2503.04392)

Junyuan Mao, Fanci Meng, Yifan Duan, Miao Yu, Xiaojun Jia, Junfeng Fang, Yuxuan Liang, Kun Wang, Qingsong Wen

-+ [Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation](https://arxiv.org//abs/2503.04151)
++ [Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation](https://arxiv.org/abs/2503.04151)

Jie Xu, Na Zhao, Gang Niu, Masashi Sugiyama, Xiaofeng Zhu

-+ [DP-GTR: Differentially Private Prompt Protection via Group Text Rewriting](https://arxiv.org//abs/2503.04990)
++ [DP-GTR: Differentially Private Prompt Protection via Group Text Rewriting](https://arxiv.org/abs/2503.04990)

Mingchen Li, Heng Fan, Song Fu, Junhua Ding, Yunhe Feng

# 2025-03-05

-+ [CURVALID: Geometrically-guided Adversarial Prompt Detection](https://arxiv.org//abs/2503.03502)
++ [CURVALID: Geometrically-guided Adversarial Prompt Detection](https://arxiv.org/abs/2503.03502)

Canaan Yung, Hanxun Huang, Sarah Monazam Erfani, Christopher Leckie

-+ [Token-Level Privacy in Large Language Models](https://arxiv.org//abs/2503.03652)
++ [Token-Level Privacy in Large Language Models](https://arxiv.org/abs/2503.03652)

Re'em Harel, Niv Gilboa, Yuval Pinter

-+ [Improving LLM Safety Alignment with Dual-Objective Optimization](https://arxiv.org//abs/2503.03710)
++ [Improving LLM Safety Alignment with Dual-Objective Optimization](https://arxiv.org/abs/2503.03710)

Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song

-+ [Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients](https://arxiv.org//abs/2503.03272)
++ [Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients](https://arxiv.org/abs/2503.03272)

Li Lun, Kunyu Feng, Qinglong Ni, Ling Liang, Yuan Wang, Ying Li, Dunshan Yu, Xiaoxin Cui

-+ [CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP](https://arxiv.org//abs/2503.03613)
++ [CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP](https://arxiv.org/abs/2503.03613)

Songlong Xing, Zhengyu Zhao, Nicu Sebe

-+ [Towards Trustworthy Federated Learning](https://arxiv.org//abs/2503.03684)
++ [Towards Trustworthy Federated Learning](https://arxiv.org/abs/2503.03684)

Alina Basharat, Yijun Bian, Ping Xu, Zhi Tian

-+ [A Practical Memory Injection Attack against LLM Agents](https://arxiv.org//abs/2503.03704)
++ [A Practical Memory Injection Attack against LLM Agents](https://arxiv.org/abs/2503.03704)

Shen Dong, Shaocheng Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, Zhen Xiang

-+ [Data Poisoning Attacks to Locally Differentially Private Range Query Protocols](https://arxiv.org//abs/2503.03454)
++ [Data Poisoning Attacks to Locally Differentially Private Range Query Protocols](https://arxiv.org/abs/2503.03454)

I-Jung Hsu, Chih-Hsun Lin, Chia-Mu Yu, Sy-Yen Kuo, Chun-Ying Huang

-+ [PriFFT: Privacy-preserving Federated Fine-tuning of Large Language Models via Function Secret Sharing](https://arxiv.org//abs/2503.03146)
++ [PriFFT: Privacy-preserving Federated Fine-tuning of Large Language Models via Function Secret Sharing](https://arxiv.org/abs/2503.03146)

Zhichao You, Xuewen Dong, Ke Cheng, Xutong Mu, Jiaxuan Fu, Shiyang Ma, Qiang Qu, Yulong Shen

-+ [Task-Agnostic Attacks Against Vision Foundation Models](https://arxiv.org//abs/2503.03842)
++ [Task-Agnostic Attacks Against Vision Foundation Models](https://arxiv.org/abs/2503.03842)

Brian Pulfer, Yury Belousov, Vitaliy Kinakh, Teddy Furon, Slava Voloshynovskiy

-+ [GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors](https://arxiv.org//abs/2503.03944)
++ [GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors](https://arxiv.org/abs/2503.03944)

Yaopei Zeng, Yuanpu Cao, Lu Lin

-+ [Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks](https://arxiv.org//abs/2503.04833)
++ [Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks](https://arxiv.org/abs/2503.04833)

Liming Lu, Shuchao Pang, Siyuan Liang, Haotian Zhu, Xiyu Zeng, Aishan Liu, Yunhuai Liu, Yongbin Zhou

-+ [Adversarial Example Based Fingerprinting for Robust Copyright Protection in Split Learning](https://arxiv.org//abs/2503.04825)
++ [Adversarial Example Based Fingerprinting for Robust Copyright Protection in Split Learning](https://arxiv.org/abs/2503.04825)

Zhangting Lin, Mingfu Xue, Kewei Chen, Wenmao Liu, Xiang Gao, Leo Yu Zhang, Jian Wang, Yushu Zhang

@@ -13416,367 +13416,367 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song

# 2025-03-04

-+ [LLM Misalignment via Adversarial RLHF Platforms](https://arxiv.org//abs/2503.03039)
++ [LLM Misalignment via Adversarial RLHF Platforms](https://arxiv.org/abs/2503.03039)

Erfan Entezami, Ali Naseh

-+ [Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis](https://arxiv.org//abs/2503.02986)
++ [Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis](https://arxiv.org/abs/2503.02986)

Jeonghwan Park, Niall McLaughlin, Ihsen Alouani

-+ [Adversarial Tokenization](https://arxiv.org//abs/2503.02174)
++ [Adversarial Tokenization](https://arxiv.org/abs/2503.02174)

Renato Lui Geh, Zilei Shao, Guy Van den Broeck

-+ [One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy](https://arxiv.org//abs/2503.02169)
++ [One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy](https://arxiv.org/abs/2503.02169)

Jiacheng Zhang, Benjamin I. P. Rubinstein, Jingfeng Zhang, Feng Liu

# 2025-03-03

-+ [Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness](https://arxiv.org//abs/2503.01345)
++ [Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness](https://arxiv.org/abs/2503.01345)

Tingchen Fu, Fazl Barez

-+ [Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification](https://arxiv.org//abs/2503.01407)
++ [Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification](https://arxiv.org/abs/2503.01407)

Gaozheng Pei, Shaojie Lyu, Gong Chen, Ke Ma, Qianqian Xu, Yingfei Sun, Qingming Huang

-+ [Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems](https://arxiv.org//abs/2503.01470)
++ [Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems](https://arxiv.org/abs/2503.01470)

Ben Bucknall, Robert F. Trager, Michael A. Osborne

-+ [Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning](https://arxiv.org//abs/2503.01734)
++ [Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning](https://arxiv.org/abs/2503.01734)

Kyle Domico, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Eric Pauley, Josiah Hanna, Patrick McDaniel

-+ [Zero-Trust Artificial Intelligence Model Security Based on Moving Target Defense and Content Disarm and Reconstruction](https://arxiv.org//abs/2503.01758)
++ [Zero-Trust Artificial Intelligence Model Security Based on Moving Target Defense and Content Disarm and Reconstruction](https://arxiv.org/abs/2503.01758)

Daniel Gilkarov, Ran Dubin

-+ [AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses](https://arxiv.org//abs/2503.01811)
++ [AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses](https://arxiv.org/abs/2503.01811)

Nicholas Carlini, Javier Rando, Edoardo Debenedetti, Milad Nasr, Florian Tramèr

-+ [Jailbreaking Safeguarded Text-to-Image Models via Large Language Models](https://arxiv.org//abs/2503.01839)
++ [Jailbreaking Safeguarded Text-to-Image Models via Large Language Models](https://arxiv.org/abs/2503.01839)

Zhengyuan Jiang, Yuepeng Hu, Yuchen Yang, Yinzhi Cao, Neil Zhenqiang Gong

-+ [Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models](https://arxiv.org//abs/2503.01742)
++ [Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models](https://arxiv.org/abs/2503.01742)

Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Shahed Sorower

-+ [Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models](https://arxiv.org//abs/2503.01781)
++ [Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models](https://arxiv.org/abs/2503.01781)

Meghana Rajeev, Rajkumar Ramamurthy, Prapti Trivedi, Vikas Yadav, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudan, James Zou, Nazneen Rajani

-+ [Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models](https://arxiv.org//abs/2503.01208)
++ [Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models](https://arxiv.org/abs/2503.01208)

Tianjie Ju, Yi Hua, Hao Fei, Zhenyu Shao, Yubin Zheng, Haodong Zhao, Mong-Li Lee, Wynne Hsu, Zhuosheng Zhang, Gongshen Liu

-+ [Revisiting Locally Differentially Private Protocols: Towards Better Trade-offs in Privacy, Utility, and Attack Resistance](https://arxiv.org//abs/2503.01482)
++ [Revisiting Locally Differentially Private Protocols: Towards Better Trade-offs in Privacy, Utility, and Attack Resistance](https://arxiv.org/abs/2503.01482)

Héber H.
Arcolezi, Sébastien Gambs # 2025-03-02 -+ [Improving the Transferability of Adversarial Attacks by an Input Transpose](https://arxiv.org//abs/2503.00932) ++ [Improving the Transferability of Adversarial Attacks by an Input Transpose](https://arxiv.org/abs/2503.00932) Qing Wan, Shilong Deng, Xun Wang -+ [Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks](https://arxiv.org//abs/2503.00957) ++ [Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks](https://arxiv.org/abs/2503.00957) Chang Liu, Haolin Wu, Xi Yang, Kui Zhang, Cong Wu, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang -+ [DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging](https://arxiv.org//abs/2503.00905) ++ [DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging](https://arxiv.org/abs/2503.00905) Zhu Liu, Zijun Wang, Jinyuan Liu, Fanqi Meng, Long Ma, Risheng Liu -+ [AMUN: Adversarial Machine UNlearning](https://arxiv.org//abs/2503.00917) ++ [AMUN: Adversarial Machine UNlearning](https://arxiv.org/abs/2503.00917) Ali Ebrahimpour-Boroojeny, Hari Sundaram, Varun Chandrasekaran # 2025-03-01 -+ [Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems](https://arxiv.org//abs/2503.00383) ++ [Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems](https://arxiv.org/abs/2503.00383) Song Xia, Yi Yu, Wenhan Yang, Meiwen Ding, Zhuo Chen, Lingyu Duan, Alex C. Kot, Xudong Jiang -+ [A Survey of Adversarial Defenses in Vision-based Systems: Categorization, Methods and Challenges](https://arxiv.org//abs/2503.00384) ++ [A Survey of Adversarial Defenses in Vision-based Systems: Categorization, Methods and Challenges](https://arxiv.org/abs/2503.00384) Nandish Chattopadhyay, Abdul Basit, Bassem Ouni, Muhammad Shafique -+ [BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge](https://arxiv.org//abs/2503.00596) ++ [BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge](https://arxiv.org/abs/2503.00596) Terry Tong, Fei Wang, Zhe Zhao, Muhao Chen -+ [Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach](https://arxiv.org//abs/2503.00377) ++ [Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach](https://arxiv.org/abs/2503.00377) Guixu Lin, Muyao Niu, Qingtian Zhu, Zhengwei Yin, Zhuoxiao Li, Shengfeng He, Yinqiang Zheng # 2025-02-28 -+ [Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness](https://arxiv.org//abs/2502.20604) ++ [Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness](https://arxiv.org/abs/2502.20604) Hao Xuan, Bokai Yang, Xingyu Li -+ [Concealed Adversarial attacks on neural networks for sequential data](https://arxiv.org//abs/2502.20948) ++ [Concealed Adversarial attacks on neural networks for sequential data](https://arxiv.org/abs/2502.20948) Petr Sokerin, Dmitry Anikin, Sofia Krehova, Alexey Zaytsev -+ [Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing](https://arxiv.org//abs/2502.21041) ++ [Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing](https://arxiv.org/abs/2502.21041) Xuyang Zhong, Yixiao Huang, Chen Liu -+ [FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts](https://arxiv.org//abs/2502.21059) ++ [FC-Attack: Jailbreaking Large 
-+ [AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests](https://arxiv.org//abs/2502.21100)
++ [AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests](https://arxiv.org/abs/2502.21100)

Yukuan Yang, Xucheng Lu, Zhili Zhang, Zepeng Wu, Guoqi Li, Lingzhong Meng, Yunzhi Xue

-+ [Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models](https://arxiv.org//abs/2502.20650)
++ [Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models](https://arxiv.org/abs/2502.20650)

Yu Pan, Bingrong Dai, Jiahao Chen, Lin Wang, Yi Du, Jiao Liu

-+ [Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal](https://arxiv.org//abs/2502.20924)
++ [Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal](https://arxiv.org/abs/2502.20924)

Haonan An, Guang Hua, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang

-+ [BadRefSR: Backdoor Attacks Against Reference-based Image Super Resolution](https://arxiv.org//abs/2502.20943)
++ [BadRefSR: Backdoor Attacks Against Reference-based Image Super Resolution](https://arxiv.org/abs/2502.20943)

Xue Yang, Tao Chen, Lei Guo, Wenbo Jiang, Ji Guo, Yongming Li, Jiaming He

-+ [Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior](https://arxiv.org//abs/2502.21048)
++ [Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior](https://arxiv.org/abs/2502.21048)

Chanhui Lee, Yeonghwan Song, Jeany Son

-+ [SafeText: Safe Text-to-image Models via Aligning the Text Encoder](https://arxiv.org//abs/2502.20623)
++ [SafeText: Safe Text-to-image Models via Aligning the Text Encoder](https://arxiv.org/abs/2502.20623)

Yuepeng Hu, Zhengyuan Jiang, Neil Zhenqiang Gong

-+ [QFAL: Quantum Federated Adversarial Learning](https://arxiv.org//abs/2502.21171)
++ [QFAL: Quantum Federated Adversarial Learning](https://arxiv.org/abs/2502.21171)

Walid El Maouaki, Nouhaila Innan, Alberto Marchisio, Taoufik Said, Mohamed Bennai, Muhammad Shafique

-+ [Efficient Jailbreaking of Large Models by Freeze Training: Lower Layers Exhibit Greater Sensitivity to Harmful Content](https://arxiv.org//abs/2502.20952)
++ [Efficient Jailbreaking of Large Models by Freeze Training: Lower Layers Exhibit Greater Sensitivity to Harmful Content](https://arxiv.org/abs/2502.20952)

Hongyuan Shen, Min Zheng, Jincheng Wang, Yang Zhao

-+ [Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis](https://arxiv.org//abs/2502.21286)
++ [Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis](https://arxiv.org/abs/2502.21286)

Li Yang, Mirna El Rajab, Abdallah Shami, Sami Muhaidat

-+ [Towards Privacy-Preserving Split Learning: Destabilizing Adversarial Inference and Reconstruction Attacks in the Cloud](https://arxiv.org//abs/2502.20629)
++ [Towards Privacy-Preserving Split Learning: Destabilizing Adversarial Inference and Reconstruction Attacks in the Cloud](https://arxiv.org/abs/2502.20629)

Griffin Higgins, Roozbeh Razavi-Far, Xichen Zhang, Amir David, Ali Ghorbani, Tongyu Ge

-+ [The Effect of Hop-count Modification Attack on Random Walk-based SLP Schemes Developed for WSNs: a Study](https://arxiv.org//abs/2502.20902)
++ [The Effect of Hop-count Modification Attack on Random Walk-based SLP Schemes Developed for WSNs: a Study](https://arxiv.org/abs/2502.20902)

Manjula Rajaa, Anirban Ghoshb, Chukkapalli Praveen Kumarc, Suleiman Samba, C N Shariff

-+ [The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems](https://arxiv.org//abs/2502.20995)
++ [The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems](https://arxiv.org/abs/2502.20995)

Chanwoo Choi, Jinsoo Kim, Sukmin Cho, Soyeong Jeong, Buru Chang

-+ [1-Lipschitz Network Initialization for Certifiably Robust Classification Applications: A Decay Problem](https://arxiv.org//abs/2503.00240)
++ [1-Lipschitz Network Initialization for Certifiably Robust Classification Applications: A Decay Problem](https://arxiv.org/abs/2503.00240)

Marius F. R. Juston, William R. Norris, Dustin Nottage, Ahmet Soylemezoglu

-+ [Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks](https://arxiv.org//abs/2503.00187)
++ [Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks](https://arxiv.org/abs/2503.00187)

Hanjiang Hu, Alexander Robey, Changliu Liu

-+ [Approaching the Harm of Gradient Attacks While Only Flipping Labels](https://arxiv.org//abs/2503.00140)
++ [Approaching the Harm of Gradient Attacks While Only Flipping Labels](https://arxiv.org/abs/2503.00140)

Abdessamad El-Kabid, El-Mahdi El-Mhamdi

-+ [UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning](https://arxiv.org//abs/2503.01908)
++ [UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning](https://arxiv.org/abs/2503.01908)

Jiawei Zhang, Shuang Yang, Bo Li

-+ [DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping](https://arxiv.org//abs/2502.20900)
++ [DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping](https://arxiv.org/abs/2502.20900)

Yifan Zhong, Xuchuan Huang, Ruochong Li, Ceyao Zhang, Zhang Chen, Tianrui Guan, Fanlian Zeng, Ka Num Lui, Yuyao Ye, Yitao Liang, Yaodong Yang, Yuanpei Chen

# 2025-02-27

-+ [DeePen: Penetration Testing for Audio Deepfake Detection](https://arxiv.org//abs/2502.20427)
++ [DeePen: Penetration Testing for Audio Deepfake Detection](https://arxiv.org/abs/2502.20427)

Nicolas Müller, Piotr Kawa, Adriana Stan, Thien-Phuc Doan, Souhwan Jung, Wei Herng Choong, Philip Sperl, Konstantin Böttinger

-+ [Foot-In-The-Door: A Multi-turn Jailbreak for LLMs](https://arxiv.org//abs/2502.19820)
++ [Foot-In-The-Door: A Multi-turn Jailbreak for LLMs](https://arxiv.org/abs/2502.19820)

Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang

-+ [Behind the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models](https://arxiv.org//abs/2502.19883)
++ [Behind the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models](https://arxiv.org/abs/2502.19883)

Sibo Yi, Tianshuo Cong, Xinlei He, Qi Li, Jiaxing Song

-+ [Protecting multimodal large language models against misleading visualizations](https://arxiv.org//abs/2502.20503)
++ [Protecting multimodal large language models against misleading visualizations](https://arxiv.org/abs/2502.20503)

Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych

-+ [Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets](https://arxiv.org//abs/2502.20246)
++ [Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets](https://arxiv.org/abs/2502.20246)

Chi-Chien Tsai, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee

-+ [LISArD: Learning Image Similarity to Defend Against Gray-box Adversarial Attacks](https://arxiv.org//abs/2502.20562)
++ [LISArD: Learning Image Similarity to Defend Against Gray-box Adversarial Attacks](https://arxiv.org/abs/2502.20562)

Joana C. Costa, Tiago Roxo, Hugo Proença, Pedro R. M. Inácio

-+ [Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization](https://arxiv.org//abs/2502.19665)
++ [Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization](https://arxiv.org/abs/2502.19665)

Yuanchao Wang, Zhao-Rong Lai, Tianqi Zhong

-+ [NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary](https://arxiv.org//abs/2503.00063)
++ [NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary](https://arxiv.org/abs/2503.00063)

Zezeng Li, Xiaoyu Du, Na Lei, Liming Chen, Weimin Wang

-+ [Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents](https://arxiv.org//abs/2503.00061)
++ [Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents](https://arxiv.org/abs/2503.00061)

Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel Kang

-+ [ADAGE: Active Defenses Against GNN Extraction](https://arxiv.org//abs/2503.00065)
++ [ADAGE: Active Defenses Against GNN Extraction](https://arxiv.org/abs/2503.00065)

Jing Xu, Franziska Boenisch, Adam Dziedzic

-+ [CRFU: Compressive Representation Forgetting Against Privacy Leakage on Machine Unlearning](https://arxiv.org//abs/2503.00062)
++ [CRFU: Compressive Representation Forgetting Against Privacy Leakage on Machine Unlearning](https://arxiv.org/abs/2503.00062)

Weiqi Wang, Chenhan Zhang, Zhiyi Tian, Shushu Liu, Shui Yu

-+ [Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis](https://arxiv.org//abs/2502.20383)
++ [Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis](https://arxiv.org/abs/2502.20383)

Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen

-+ [Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion](https://arxiv.org//abs/2502.19697)
++ [Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion](https://arxiv.org/abs/2502.19697)

Yuan Bian, Min Liu, Yunqi Yi, Xueping Wang, Yaonan Wang

# 2025-02-26

-+ [Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems](https://arxiv.org//abs/2502.19145)
++ [Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems](https://arxiv.org/abs/2502.19145)

Pierre Peigne-Lefebvre, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schroeder de Witt, Esben Kran

-+ [JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models](https://arxiv.org//abs/2502.18935)
++ [JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models](https://arxiv.org/abs/2502.18935)

Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang

-+ [A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models](https://arxiv.org//abs/2502.19047)
++ [A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models](https://arxiv.org/abs/2502.19047)

Vu Tuan Truong, Long Bao Le

-+ [Evaluating Membership Inference Attacks in heterogeneous-data setups](https://arxiv.org//abs/2502.18986)
++ [Evaluating Membership Inference Attacks in heterogeneous-data setups](https://arxiv.org/abs/2502.18986)

Bram van Dartel, Marc Damie, Florian Hahn

-+ [Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs](https://arxiv.org//abs/2502.19041)
++ [Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs](https://arxiv.org/abs/2502.19041)

Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen

-+ [One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs](https://arxiv.org//abs/2502.18862)
++ [One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs](https://arxiv.org/abs/2502.18862)

Jacob Dunefsky, Arman Cohan

-+ [Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP](https://arxiv.org//abs/2502.19269)
++ [Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP](https://arxiv.org/abs/2502.19269)

Jiawei Kong, Hao Fang, Sihang Guo, Chenxi Qing, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Ke Xu

# 2025-02-25

-+ [MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks](https://arxiv.org//abs/2502.17832)
++ [MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks](https://arxiv.org/abs/2502.17832)

Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-wei Chang, Daniel Kang, Heng Ji

-+ [CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification](https://arxiv.org//abs/2502.18176)
++ [CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification](https://arxiv.org/abs/2502.18176)

Mingkun Zhang, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng

-+ [Examining the Threat Landscape: Foundation Models and Model Stealing](https://arxiv.org//abs/2502.18077)
++ [Examining the Threat Landscape: Foundation Models and Model Stealing](https://arxiv.org/abs/2502.18077)

Ankita Raj, Deepankar Varma, Chetan Arora

-+ [Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models](https://arxiv.org//abs/2502.18290)
++ [Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models](https://arxiv.org/abs/2502.18290)

Zhaoyi Liu, Huan Zhang

-+ [VVRec: Reconstruction Attacks on DL-based Volumetric Video Upstreaming via Latent Diffusion Model with Gamma Distribution](https://arxiv.org//abs/2502.17880)
++ [VVRec: Reconstruction Attacks on DL-based Volumetric Video Upstreaming via Latent Diffusion Model with Gamma Distribution](https://arxiv.org/abs/2502.17880)

Rui Lu, Bihai Zhang, Dan Wang

-+ [Model-Free Adversarial Purification via Coarse-To-Fine Tensor Network Representation](https://arxiv.org//abs/2502.17972)
++ [Model-Free Adversarial Purification via Coarse-To-Fine Tensor Network Representation](https://arxiv.org/abs/2502.17972)

Guang Lin, Duc Thien Nguyen, Zerui Tao, Konstantinos Slavakis, Toshihisa Tanaka, Qibin Zhao

-+ [Learning atomic forces from uncertainty-calibrated adversarial attacks](https://arxiv.org//abs/2502.18314)
++ [Learning atomic forces from uncertainty-calibrated adversarial attacks](https://arxiv.org/abs/2502.18314)

Henrique Musseli Cezar, Tilmann Bodenstein, Henrik Andersen Sveinsson, Morten Ledum, Simen Reine, Sigbjørn Løland Bore

-+ [Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs](https://arxiv.org//abs/2503.00037)
++ [Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs](https://arxiv.org/abs/2503.00037)

Wei Zhao, Zhe Li, Yige Li, Jun Sun

-+ [from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors](https://arxiv.org//abs/2503.00038)
++ [from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors](https://arxiv.org/abs/2503.00038)

Yu Yan, Sheng Sun, Zenghao Duan, Teli Liu, Min Liu, Zhiyi Yin, Qi Li, Jiangyu Lei

@@ -13791,112 +13791,112 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng Ji

# 2025-02-24

-+ [Emoti-Attack: Zero-Perturbation Adversarial Attacks on NLP Systems via Emoji Sequences](https://arxiv.org//abs/2502.17392)
++ [Emoti-Attack: Zero-Perturbation Adversarial Attacks on NLP Systems via Emoji Sequences](https://arxiv.org/abs/2502.17392)

Yangshijie Zhang

-+ [AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement](https://arxiv.org//abs/2502.16776)
++ [AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement](https://arxiv.org/abs/2502.16776)

Zhexin Zhang, Leqi Lei, Junxiao Yang, Xijie Huang, Yida Lu, Shiyao Cui, Renmiao Chen, Qinglin Zhang, Xinyuan Wang, Hao Wang, Hao Li, Xianqi Lei, Chengwei Pan, Lei Sha, Hongning Wang, Minlie Huang

-+ [VGFL-SA: Vertical Graph Federated Learning Structure Attack Based on Contrastive Learning](https://arxiv.org//abs/2502.16793)
++ [VGFL-SA: Vertical Graph Federated Learning Structure Attack Based on Contrastive Learning](https://arxiv.org/abs/2502.16793)

Yang Chen, Bin Zhou

-+ [Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs](https://arxiv.org//abs/2502.16901)
++ [Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs](https://arxiv.org/abs/2502.16901)

Himanshu Beniwal, Sailesh Panda, Mayank Singh

-+ [Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation](https://arxiv.org//abs/2502.17003)
++ [Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation](https://arxiv.org/abs/2502.17003)

Wenyuan Wu, Zheng Liu, Yong Chen, Chao Su, Dezhong Peng, Xu Wang

-+ [Improved Diffusion-based Generative Model with Better Adversarial Robustness](https://arxiv.org//abs/2502.17099)
++ [Improved Diffusion-based Generative Model with Better Adversarial Robustness](https://arxiv.org/abs/2502.17099)

Zekun Wang, Mingyang Yi, Shuchen Xue, Zhenguo Li, Ming Liu, Bing Qin, Zhi-Ming Ma

-+ [Adversarial Training for Defense Against Label Poisoning Attacks](https://arxiv.org//abs/2502.17121)
++ [Adversarial Training for Defense Against Label Poisoning Attacks](https://arxiv.org/abs/2502.17121)

Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

-+ [The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence](https://arxiv.org//abs/2502.17420)
++ [The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence](https://arxiv.org/abs/2502.17420)

Tom Wollschläger, Jannes Elstner, Simon Geisler, Vincent Cohen-Addad, Stephan Günnemann, Johannes Gasteiger

-+ [Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs](https://arxiv.org//abs/2502.17424)
++ [Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs](https://arxiv.org/abs/2502.17424)

Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans

-+ [GuidedBench: Equipping Jailbreak Evaluation with Guidelines](https://arxiv.org//abs/2502.16903)
++ [GuidedBench: Equipping Jailbreak Evaluation with Guidelines](https://arxiv.org/abs/2502.16903)

Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang

-+ [On the Vulnerability of Concept Erasure in Diffusion Models](https://arxiv.org//abs/2502.17537)
++ [On the Vulnerability of Concept Erasure in Diffusion Models](https://arxiv.org/abs/2502.17537)

Lucas Beerens, Alex D. Richardson, Kaicheng Zhang, Dongdong Chen

-+ [Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility](https://arxiv.org//abs/2502.17591)
++ [Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility](https://arxiv.org/abs/2502.17591)

Martin Kuo, Jingyang Zhang, Jianyi Zhang, Minxue Tang, Louis DiValentin, Aolin Ding, Jingwei Sun, William Chen, Amin Hass, Tianlong Chen, Yiran Chen, Hai Li

-+ [FedSV: Byzantine-Robust Federated Learning via Shapley Value](https://arxiv.org//abs/2502.17526)
++ [FedSV: Byzantine-Robust Federated Learning via Shapley Value](https://arxiv.org/abs/2502.17526)

Khaoula Otmani (AU, LIA), Rachid Elazouzi (LIA, CMU), Vincent Labatut (AU, LIA)

-+ [The Cyber Immune System: Harnessing Adversarial Forces for Security Resilience](https://arxiv.org//abs/2502.17698)
++ [The Cyber Immune System: Harnessing Adversarial Forces for Security Resilience](https://arxiv.org/abs/2502.17698)

Krti Tallam

-+ [Language Model Re-rankers are Fooled by Lexical Similarities](https://arxiv.org//abs/2502.17036)
++ [Language Model Re-rankers are Fooled by Lexical Similarities](https://arxiv.org/abs/2502.17036)

Lovisa Hagström, Ercong Nie, Ruben Halifa, Helmut Schmid, Richard Johansson, Alexander Junge

# 2025-02-23

-+ [Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images](https://arxiv.org//abs/2502.16593)
++ [Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images](https://arxiv.org/abs/2502.16593)

Yubo Wang, Jianting Tang, Chaohu Liu, Linli Xu

-+ [FedNIA: Noise-Induced Activation Analysis for Mitigating Data Poisoning in FL](https://arxiv.org//abs/2502.16396)
++ [FedNIA: Noise-Induced Activation Analysis for Mitigating Data Poisoning in FL](https://arxiv.org/abs/2502.16396)

Ehsan Hallaji, Roozbeh Razavi-Far, Mehrdad Saif

-+ [AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs](https://arxiv.org//abs/2502.16610)
++ [AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs](https://arxiv.org/abs/2502.16610)

Francisco Caetano, Christiaan Viviers, Lena Filatova, Peter H. N. de With, Fons van der Sommen

-+ [Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models](https://arxiv.org//abs/2502.16491)
++ [Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models](https://arxiv.org/abs/2502.16491)

Yuyi Huang, Runzhe Zhan, Derek F. Wong, Lidia S. Chao, Ailin Tao

-+ [Uncovering the Hidden Threat of Text Watermarking from Users with Cross-Lingual Knowledge](https://arxiv.org//abs/2502.16699)
++ [Uncovering the Hidden Threat of Text Watermarking from Users with Cross-Lingual Knowledge](https://arxiv.org/abs/2502.16699)

Mansour Al Ghanim, Jiaqi Xue, Rochana Prih Hastuti, Mengxin Zheng, Yan Solihin, Qian Lou

-+ [Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs](https://arxiv.org//abs/2502.18518)
++ [Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs](https://arxiv.org/abs/2502.18518)

Peng Yifeng, Wu Zhizheng, Chen Chen

-+ [Class-Conditional Neural Polarizer: A Lightweight and Effective Backdoor Defense by Purifying Poisoned Features](https://arxiv.org//abs/2502.18520)
++ [Class-Conditional Neural Polarizer: A Lightweight and Effective Backdoor Defense by Purifying Poisoned Features](https://arxiv.org/abs/2502.18520)

Mingli Zhu, Shaokui Wei, Hongyuan Zha, Baoyuan Wu

@@ -13905,51 +13905,51 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Saikat Barua, Mostafizur Rahman, Md Jafor Sadek, Rafiul Islam, Shehenaz Khaled, Ahmedul Kabir

-+ [Can Indirect Prompt Injection Attacks Be Detected and Removed?](https://arxiv.org//abs/2502.16580)
++ [Can Indirect Prompt Injection Attacks Be Detected and Removed?](https://arxiv.org/abs/2502.16580)

Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi

-+ [Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension](https://arxiv.org//abs/2502.16523)
++ [Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension](https://arxiv.org/abs/2502.16523)

Yulong Wu, Viktor Schlegel, Riza Batista-Navarro

# 2025-02-22

-+ [Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving](https://arxiv.org//abs/2502.16012)
++ [Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving](https://arxiv.org/abs/2502.16012)

Prashant Shekhar, Bidur Devkota, Dumindu Samaraweera, Laxima Niure Kandel, Manoj Babu

-+ [A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments](https://arxiv.org//abs/2502.16065)
++ [A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments](https://arxiv.org/abs/2502.16065)

Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

-+ [PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models](https://arxiv.org//abs/2502.16167)
++ [PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models](https://arxiv.org/abs/2502.16167)

Xinwei Liu, Xiaojun Jia, Yuan Xun, Hua Zhang, Xiaochun Cao

-+ [Verification of Bit-Flip Attacks against Quantized Neural Networks](https://arxiv.org//abs/2502.16286)
++ [Verification of Bit-Flip Attacks against Quantized Neural Networks](https://arxiv.org/abs/2502.16286)

Yedi Zhang, Lei Huang, Pengfei Gao, Fu Song, Jun Sun, Jin Song Dong

-+ [A generative approach to LLM harmfulness detection with special red flag tokens](https://arxiv.org//abs/2502.16366)
++ [A generative approach to LLM harmfulness detection with special red flag tokens](https://arxiv.org/abs/2502.16366)

Sophie Xhonneux, David Dobre, Mehrnaz Mohfakhami, Leo Schwinn, Gauthier Gidel

-+ [Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming](https://arxiv.org//abs/2502.16109)
++ [Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming](https://arxiv.org/abs/2502.16109)

Rui Li, Peiyi Wang, Jingyuan Ma, Di Zhang, Lei Sha, Zhifang Sui

-+ [REFINE: Inversion-Free Backdoor Defense via Model Reprogramming](https://arxiv.org//abs/2502.18508)
++ [REFINE: Inversion-Free Backdoor Defense via Model Reprogramming](https://arxiv.org/abs/2502.18508)

Yukun Chen, Shuo Shao, Enhao Huang, Yiming Li, Pin-Yu Chen, Zhan Qin, Kui Ren

-+ [ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models](https://arxiv.org//abs/2502.18511)
++ [ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models](https://arxiv.org/abs/2502.18511)

Xuxu Liu, Siyuan Liang, Mengya Han, Yong Luo, Aishan Liu, Xiantao Cai, Zheng He, Dacheng Tao

@@ -13960,61 +13960,61 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Ivoline Ngong, Swanand Kadhe, Hao Wang, Keerthiram Murugesan, Justin D. Weisz, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy

# 2025-02-21

-+ [Methods and Trends in Detecting Generated Images: A Comprehensive Review](https://arxiv.org//abs/2502.15176)
++ [Methods and Trends in Detecting Generated Images: A Comprehensive Review](https://arxiv.org/abs/2502.15176)

Arpan Mahara, Naphtali Rishe

-+ [IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector](https://arxiv.org//abs/2502.15902)
++ [IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector](https://arxiv.org/abs/2502.15902)

Zheng Chen, Yushi Feng, Changyang He, Yue Deng, Hongxi Pu, Bo Li

-+ [TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice](https://arxiv.org//abs/2502.18504)
++ [TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice](https://arxiv.org/abs/2502.18504)

Aman Goel, Xian Carrie Wu, Zhe Wang, Dmitriy Bespalov, Yanjun Qi

-+ [SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention](https://arxiv.org//abs/2502.15594)
++ [SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention](https://arxiv.org/abs/2502.15594)

Jiaqi Wu, Chen Chen, Chunyan Hou, Xiaojie Yuan

-+ [A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare](https://arxiv.org//abs/2502.15871)
++ [A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare](https://arxiv.org/abs/2502.15871)

Manar Aljohani, Jun Hou, Sindhura Kommu, Xuan Wang

# 2025-02-20

-+ [How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation](https://arxiv.org//abs/2502.14486)
++ [How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation](https://arxiv.org/abs/2502.14486)

Zhuohang Long, Siyuan Wang, Shujun Liu, Yuhang Lai, Xuanjing Huang, Zhongyu Wei

-+ [CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models](https://arxiv.org//abs/2502.14529)
++ [CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models](https://arxiv.org/abs/2502.14529)

Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo

-+ [FUIA: Model Inversion Attack against Federated Unlearning](https://arxiv.org//abs/2502.14558)
++ [FUIA: Model Inversion Attack against Federated Unlearning](https://arxiv.org/abs/2502.14558)

Lei Zhou, Youwen Zhu

-+ [Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach](https://arxiv.org//abs/2502.14285)
++ [Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach](https://arxiv.org/abs/2502.14285)

Yurong Wu, Fangwen Mu, Qiuhong Zhang, Jinjing Zhao, Xinrun Xu, Lingrui Mei, Yang Wu, Lin Shi, Junjie Wang, Zhiming Ding, Yiwei Wang

-+ [CyberSentinel: An Emergent Threat Detection System for AI Security](https://arxiv.org//abs/2502.14966)
++ [CyberSentinel: An Emergent Threat Detection System for AI Security](https://arxiv.org/abs/2502.14966)

Krti Tallam

-+ [Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code Naturalness](https://arxiv.org//abs/2502.15830)
++ [Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code Naturalness](https://arxiv.org/abs/2502.15830)

Weisong Sun, Yuchen Chen, Mengzhe Yuan, Chunrong Fang, Zhenpeng Chen, Chong Wang, Yang Liu, Baowen Xu, Zhenyu Chen

-+ [Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models](https://arxiv.org//abs/2502.15836)
++ [Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models](https://arxiv.org/abs/2502.15836)

Haokun Chen, Sebastian Szyller, Weilin Xu, Nageen Himayat

@@ -14024,174 +14024,174 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Mark Russinovich, Ahmed Salem

-+ [Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces](https://arxiv.org//abs/2502.14790)
++ [Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces](https://arxiv.org/abs/2502.14790)

Alexander Terenin, Jeffrey Negrea

-+ [Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models](https://arxiv.org//abs/2502.15086)
++ [Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models](https://arxiv.org/abs/2502.15086)

Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Mehrab Tanjim, Sangwu Park, Kibum Kim, Chanyoung Park

-+ [Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs](https://arxiv.org//abs/2502.14828)
++ [Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs](https://arxiv.org/abs/2502.14828)

Xander Davies, Eric Winsor, Alexandra Souly, Tomek Korbak, Robert Kirk, Christian Schroeder de Witt, Yarin Gal

# 2025-02-19

-+ [Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking](https://arxiv.org//abs/2502.13527)
++ [Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking](https://arxiv.org/abs/2502.13527)

Yanzeng Li, Yunfan Xiong, Jialun Zhong, Jinchao Zhang, Jie Zhou, Lei Zou

-+ [Efficient Safety Retrofitting Against Jailbreaking for LLMs](https://arxiv.org//abs/2502.13603)
++ [Efficient Safety Retrofitting Against Jailbreaking for LLMs](https://arxiv.org/abs/2502.13603)

Dario Garcia-Gasulla, Anna Arias-Duart, Adrian Tormos, Daniel Hinjos, Oscar Molina-Sedano, Ashwin Kumar Gururajan, Maria Eugenia Cardello

-+ [Secure Federated Data Distillation](https://arxiv.org//abs/2502.13728)
++ [Secure Federated Data Distillation](https://arxiv.org/abs/2502.13728)

Marco Arazzi, Mert Cihangiroglu, Serena Nicolazzo, Antonino Nocera

-+ [Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region](https://arxiv.org//abs/2502.13946)
++ [Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region](https://arxiv.org/abs/2502.13946)

Chak Tou Leong, Qingyu Yin, Jian Wang, Wenjie Li

-+ [PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models](https://arxiv.org//abs/2502.13564)
++ [PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models](https://arxiv.org/abs/2502.13564)

Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei

-+ [Toward Robust Non-Transferable Learning: A Survey and Benchmark](https://arxiv.org//abs/2502.13593)
++ [Toward Robust Non-Transferable Learning: A Survey and Benchmark](https://arxiv.org/abs/2502.13593)

Ziming Hong, Yongli Xiang, Tongliang Liu

-+ [Contrastive Learning-Based privacy metrics in Tabular Synthetic Datasets](https://arxiv.org//abs/2502.13833)
++ [Contrastive Learning-Based privacy metrics in Tabular Synthetic Datasets](https://arxiv.org/abs/2502.13833)

Milton Nicolás Plasencia Palacios, Sebastiano Saccani, Gabriele Sgroi, Alexander Boudewijn, Luca Bortolussi

-+ [Poisoned Source Code Detection in Code Models](https://arxiv.org//abs/2502.13459)
++ [Poisoned Source Code Detection in Code Models](https://arxiv.org/abs/2502.13459)

Ehab Ghannoum, Mohammad Ghafari

-+ [A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos](https://arxiv.org//abs/2502.15806)
++ [A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos](https://arxiv.org/abs/2502.15806)

Yang Yao, Xuan Tong, Ruofan Wang, Yixu Wang, Lujundong Li, Liang Liu, Yan Teng, Yingchun Wang

-+ [The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text](https://arxiv.org//abs/2502.14921)
++ [The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text](https://arxiv.org/abs/2502.14921)

Matthieu Meeus, Lukas Wutschitz, Santiago Zanella-Béguelin, Shruti Tople, Reza Shokri

# 2025-02-18

-+ [Computational Safety for Generative AI: A Signal Processing Perspective](https://arxiv.org//abs/2502.12445)
++ [Computational Safety for Generative AI: A Signal Processing Perspective](https://arxiv.org/abs/2502.12445)

Pin-Yu Chen

-+ [Boosting Illuminant Estimation in Deep Color Constancy through Enhancing Brightness Robustness](https://arxiv.org//abs/2502.12418)
++ [Boosting Illuminant Estimation in Deep Color Constancy through Enhancing Brightness Robustness](https://arxiv.org/abs/2502.12418)

Mengda Xie, Chengzhi Zhong, Yiling He, Zhan Qin, Meie Fang

-+ [DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent](https://arxiv.org//abs/2502.12575)
++ [DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent](https://arxiv.org/abs/2502.12575)

Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, Sen Su

-+ [Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach](https://arxiv.org//abs/2502.12630)
++ [Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach](https://arxiv.org/abs/2502.12630)

Tvrtko Sternak, Davor Runje, Dorian Granoša, Chi Wang

-+ [The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1](https://arxiv.org//abs/2502.12659)
++ [The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1](https://arxiv.org/abs/2502.12659)

Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Shreedhar Jangam, Jayanth Srinivasa, Gaowen Liu, Dawn Song, Xin Eric Wang

-+ [UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models](https://arxiv.org//abs/2502.13141)
++ [UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models](https://arxiv.org/abs/2502.13141)

Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao

-+ [Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review](https://arxiv.org//abs/2502.12510)
++ [Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review](https://arxiv.org/abs/2502.12510)

Jiatao Li, Yanheng Li, Xinyu Hu, Mingqi Gao, Xiaojun Wan

-+ [R.R.: Unveiling LLM Training Privacy through Recollection and Ranking](https://arxiv.org//abs/2502.12658)
++ [R.R.: Unveiling LLM Training Privacy through Recollection and Ranking](https://arxiv.org/abs/2502.12658)

Wenlong Meng, Zhenyuan Guo, Lenan Wu, Chen Gong, Wenyan Liu, Weixian Li, Chengkun Wei, Wenzhi Chen

-+ [H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking](https://arxiv.org//abs/2502.12893)
++ [H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking](https://arxiv.org/abs/2502.12893)

Martin Kuo, Jianyi Zhang, Aolin Ding, Qinsi Wang, Louis DiValentin, Yujia Bao, Wei Wei, Da-Cheng Juan, Hai Li, Yiran Chen

-+ [Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking](https://arxiv.org//abs/2502.12970)
++ [Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking](https://arxiv.org/abs/2502.12970)

Junda Zhu, Lingyong Yan, Shuaiqiang Wang, Dawei Yin, Lei Sha

-+ [AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks](https://arxiv.org//abs/2502.13053)
++ [AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks](https://arxiv.org/abs/2502.13053)

Yurun Chen, Xueyu Hu, Keting Yin, Juncheng Li, Shengyu Zhang

-+ [Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training](https://arxiv.org//abs/2502.12734)
++ [Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training](https://arxiv.org/abs/2502.12734)

Yuanfan Li, Zhaohan Zhang, Chengzhengxu Li, Chao Shen, Xiaoming Liu

-+ [Preventing the Popular Item Embedding Based Attack in Federated Recommendations](https://arxiv.org//abs/2502.12958)
++ [Preventing the Popular Item Embedding Based Attack in Federated Recommendations](https://arxiv.org/abs/2502.12958)

Jun Zhang, Huan Li, Dazhong Rong, Yan Zhao, Ke Chen, Lidan Shou

-+ [SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain](https://arxiv.org//abs/2502.12497)
++ [SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain](https://arxiv.org/abs/2502.12497)

Shenao Wang, Yanjie Zhao, Zhao Liu, Quanchen Zou, Haoyu Wang

-+ [Towards Robust and Secure Embodied AI: A Survey on Vulnerabilities and Attacks](https://arxiv.org//abs/2502.13175)
++ [Towards Robust and Secure Embodied AI: A Survey on Vulnerabilities and Attacks](https://arxiv.org/abs/2502.13175)

Wenpeng Xing, Minghao Li, Mohan Li, Meng Han

-+ [On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis](https://arxiv.org//abs/2502.13191)
++ [On the Privacy Risks of Spiking Neural Networks: A Membership Inference Analysis](https://arxiv.org/abs/2502.13191)

Junyi Guan, Abhijith Sharma, Chong Tian, Salem Lahlou

-+ [Pruning as a Defense: Reducing Memorization in Large Language Models](https://arxiv.org//abs/2502.15796)
++ [Pruning as a Defense: Reducing Memorization in Large Language Models](https://arxiv.org/abs/2502.15796)

Mansi Gupta, Nikhar Waghela, Sarthak Gupta, Shourya Goel, Sanjif Shanmugavelu

-+ [Decentralized and Robust Privacy-Preserving Model Using Blockchain-Enabled Federated Deep Learning in Intelligent Enterprises](https://arxiv.org//abs/2502.17485)
++ [Decentralized and Robust Privacy-Preserving Model Using Blockchain-Enabled Federated Deep Learning in Intelligent Enterprises](https://arxiv.org/abs/2502.17485)

Reza Fotohi, Fereidoon Shams Aliee, Bahar Farahani

-+ [Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer](https://arxiv.org//abs/2502.12964)
++ [Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer](https://arxiv.org/abs/2502.12964)

Adi Simhi, Itay Itzhak, Fazl Barez, Gabriel Stanovsky, Yonatan Belinkov

-+ [Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions](https://arxiv.org//abs/2502.12616)
++ [Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions](https://arxiv.org/abs/2502.12616)

Leonardo Ranaldi, Marco Valentino, Andrè Freitas

-+ [Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection](https://arxiv.org//abs/2502.13061)
++ [Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection](https://arxiv.org/abs/2502.13061)

Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne

-+ [SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning](https://arxiv.org//abs/2502.12520)
++ [SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning](https://arxiv.org/abs/2502.12520)

Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Shuliang Liu, PeiJun Wu, Peijie Jiang, Jia Liu, Xuming Hu

@@ -14200,32 +14200,32 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, Sen Su

# 2025-02-17

-+ [Mimicking the Familiar: Dynamic Command Generation for Information Theft Attacks in LLM Tool-Learning System](https://arxiv.org//abs/2502.11358)
++ [Mimicking the Familiar: Dynamic Command Generation for Information Theft Attacks in LLM Tool-Learning System](https://arxiv.org/abs/2502.11358)

Ziyou Jiang, Mingyang Li, Guowei Yang, Junjie Wang, Yuekai Huang, Zhiyuan Chang, Qing Wang

-+ [Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions](https://arxiv.org//abs/2502.12360)
++ [Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions](https://arxiv.org/abs/2502.12360)

Sujan Sai Gannamaneni, Rohil Prakash Rao, Michael Mock, Maram Akila, Stefan Wrobel

-+ [Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?](https://arxiv.org//abs/2502.12377)
++ [Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?](https://arxiv.org/abs/2502.12377)

Blaine Hoak, Kunyang Li, Patrick McDaniel

-+ [Unveiling Privacy Risks in LLM Agent Memory](https://arxiv.org//abs/2502.13172)
++ [Unveiling Privacy Risks in LLM Agent Memory](https://arxiv.org/abs/2502.13172)

Bo Wang, Weiyi He, Pengfei He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang

-+ [Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives](https://arxiv.org//abs/2502.11858)
++ [Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives](https://arxiv.org/abs/2502.11858)

Zeliang Zhang, Susan Liang, Daiki Shimada, Chenliang Xu

-+ [DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing](https://arxiv.org//abs/2502.11647)
++ [DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing](https://arxiv.org/abs/2502.11647)

Yi Wang, Fenghua Weng, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang

@@ -14233,168 +14233,168 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu

-+ [StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models](https://arxiv.org//abs/2502.11853)
++ [StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models](https://arxiv.org/abs/2502.11853)

Shehel Yoosuf, Temoor Ali, Ahmed Lekssays, Mashael AlSabah, Issa Khalil

-+ [BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model](https://arxiv.org//abs/2502.11798)
++ [BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model](https://arxiv.org/abs/2502.11798)

Weilin Lin, Nanjun Zhou, Yanyun Wang, Jianze Li, Hui Xiong, Li Liu

# 2025-02-16

-+ [BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack](https://arxiv.org//abs/2502.12202)
++ [BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack](https://arxiv.org/abs/2502.12202)

Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu

-+ [PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN](https://arxiv.org//abs/2502.12207)
++ [PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN](https://arxiv.org/abs/2502.12207)

Jiayu Zhang, Zhiyu Zhu, Xinyi Wang, Silin Liao, Zhibo Jin, Flora D. Salim, Huaming Chen

-+ [Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models](https://arxiv.org//abs/2502.11054)
++ [Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models](https://arxiv.org/abs/2502.11054)

Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, Dacheng Tao

-+ [SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks](https://arxiv.org//abs/2502.11090)
++ [SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks](https://arxiv.org/abs/2502.11090)

Hongye Cao, Yanming Wang, Sijia Jing, Ziyue Peng, Zhixin Bai, Zhe Cao, Meng Fang, Fan Feng, Boyan Wang, Jiaheng Liu, Tianpei Yang, Jing Huo, Yang Gao, Fanyu Meng, Xi Yang, Chao Deng, Junlan Feng

-+ [ALGEN: Few-shot Inversion Attacks on Textual Embeddings using Alignment and Generation](https://arxiv.org//abs/2502.11308)
++ [ALGEN: Few-shot Inversion Attacks on Textual Embeddings using Alignment and Generation](https://arxiv.org/abs/2502.11308)

Yiyi Chen, Qiongkai Xu, Johannes Bjerva

-+ [ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs](https://arxiv.org//abs/2502.13162)
++ [ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs](https://arxiv.org/abs/2502.13162)

Ziyi Ni, Hao Wang, Huacan Wang

-+ [Rewrite to Jailbreak: Discover Learnable and Transferable Implicit Harmfulness Instruction](https://arxiv.org//abs/2502.11084)
++ [Rewrite to Jailbreak: Discover Learnable and Transferable Implicit Harmfulness Instruction](https://arxiv.org/abs/2502.11084)

Yuting Huang, Chengyuan Liu, Yifeng Feng, Yiquan Wu, Chao Wu, Fei Wu, Kun Kuang

# 2025-02-15

-+ [CAE-Net: Generalized Deepfake Image Detection using Convolution and Attention Mechanisms with Spatial and Frequency Domain Features](https://arxiv.org//abs/2502.10682)
++ [CAE-Net: Generalized Deepfake Image Detection using Convolution and Attention Mechanisms with Spatial and Frequency Domain Features](https://arxiv.org/abs/2502.10682)

Kafi Anan, Anindya Bhattacharjee, Ashir Intesher, Kaidul Islam, Abrar Assaeem Fuad, Utsab Saha, Hafiz Imtiaz

-+ [Generalizable speech deepfake detection via meta-learned LoRA](https://arxiv.org//abs/2502.10838)
++ [Generalizable speech deepfake detection via meta-learned LoRA](https://arxiv.org/abs/2502.10838)

Janne Laakkonen, Ivan Kukanov, Ville Hautamäki

# 2025-02-14

-+ [Has My System Prompt Been Used? Large Language Model Prompt Membership Inference](https://arxiv.org//abs/2502.09974)
++ [Has My System Prompt Been Used? Large Language Model Prompt Membership Inference](https://arxiv.org/abs/2502.09974)

Roman Levin, Valeriia Cherepanova, Abhimanyu Hans, Avi Schwarzschild, Tom Goldstein

-+ [X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability](https://arxiv.org//abs/2502.09990)
++ [X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability](https://arxiv.org/abs/2502.09990)

Xiaoya Lu, Dongrui Liu, Yi Yu, Luxin Xu, Jing Shao

-+ [Adversarial Mixup Unlearning](https://arxiv.org//abs/2502.10288)
++ [Adversarial Mixup Unlearning](https://arxiv.org/abs/2502.10288)

Zhuoyi Peng, Yixuan Tang, Yi Yang

-+ [VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect](https://arxiv.org//abs/2502.10329)
++ [VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect](https://arxiv.org/abs/2502.10329)

Qingyuan Fei, Wenjie Hou, Xuan Hai, Xin Liu

-+ [VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap](https://arxiv.org//abs/2502.10486)
++ [VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap](https://arxiv.org/abs/2502.10486)

Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen

-+ [Fast Proxies for LLM Robustness Evaluation](https://arxiv.org//abs/2502.10487)
++ [Fast Proxies for LLM Robustness Evaluation](https://arxiv.org/abs/2502.10487)

Tim Beyer, Jan Schuchardt, Leo Schwinn, Stephan Günnemann

-+ [A Robust Attack: Displacement Backdoor Attack](https://arxiv.org//abs/2502.10490)
++ [A Robust Attack: Displacement Backdoor Attack](https://arxiv.org/abs/2502.10490)

Yong Li, Han Gao

# 2025-02-13

-+ [On the Promise for Assurance of Differentiable Neurosymbolic Reasoning Paradigms](https://arxiv.org//abs/2502.08932)
++ [On the Promise for Assurance of Differentiable Neurosymbolic Reasoning Paradigms](https://arxiv.org/abs/2502.08932)

Luke E. Richards, Jessie Yaros, Jasen Babcock, Coung Ly, Robin Cosbey, Timothy Doster, Cynthia Matuszek

-+ [RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage](https://arxiv.org//abs/2502.08966)
++ [RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage](https://arxiv.org/abs/2502.08966)

Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller

-+ [RLSA-PFL: Robust Lightweight Secure Aggregation with Model Inconsistency Detection in Privacy-Preserving Federated Learning](https://arxiv.org//abs/2502.08989)
++ [RLSA-PFL: Robust Lightweight Secure Aggregation with Model Inconsistency Detection in Privacy-Preserving Federated Learning](https://arxiv.org/abs/2502.08989)

Nazatul H. Sultan, Yan Bo, Yansong Gao, Seyit Camtepe, Arash Mahboubi, Hang Thanh Bui, Aufeef Chauhan, Hamed Aboutorab, Michael Bewong, Praveen Gauravaram, Rafiqul Islam, Sharif Abuadbba

-+ [DynSegNet: Dynamic Architecture Adjustment for Adversarial Learning in Segmenting Hemorrhagic Lesions from Fundus Images](https://arxiv.org//abs/2502.09256)
++ [DynSegNet: Dynamic Architecture Adjustment for Adversarial Learning in Segmenting Hemorrhagic Lesions from Fundus Images](https://arxiv.org/abs/2502.09256)

Zesheng Li, Minwen Liao, Haoran Chen, Yan Su, Chengchang Pan, Honggang Qi

-+ [LiSA: Leveraging Link Recommender to Attack Graph Neural Networks via Subgraph Injection](https://arxiv.org//abs/2502.09271)
++ [LiSA: Leveraging Link Recommender to Attack Graph Neural Networks via Subgraph Injection](https://arxiv.org/abs/2502.09271)

Wenlun Zhang, Enyan Dai, Kentaro Yoshioka

-+ [Pulling Back the Curtain: Unsupervised Adversarial Detection via Contrastive Auxiliary Networks](https://arxiv.org//abs/2502.09110)
++ [Pulling Back the Curtain: Unsupervised Adversarial Detection via Contrastive Auxiliary Networks](https://arxiv.org/abs/2502.09110)

Eylon Mizrahi, Raz Lapid, Moshe Sipper

-+ [Redistribute Ensemble Training for Mitigating Memorization in Diffusion Models](https://arxiv.org//abs/2502.09434)
++ [Redistribute Ensemble Training for Mitigating Memorization in Diffusion Models](https://arxiv.org/abs/2502.09434)

Xiaoliu Guan, Yu Wu, Huayang Huang, Xiao Liu, Jiaxu Miao, Yi Yang

-+ [Wasserstein distributional adversarial training for deep neural networks](https://arxiv.org//abs/2502.09352)
++ [Wasserstein distributional adversarial training for deep neural networks](https://arxiv.org/abs/2502.09352)

Xingjian Bai, Guangyi He, Yifan Jiang, Jan Obloj

-+ [A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack](https://arxiv.org//abs/2502.09396)
++ [A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack](https://arxiv.org/abs/2502.09396)

Richard J. Preen, Jim Smith

-+ [SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops](https://arxiv.org//abs/2502.09553)
++ [SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops](https://arxiv.org/abs/2502.09553)

Eshaq Jamdar, Amith Kamath Belman

-+ [Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks](https://arxiv.org//abs/2502.08865)
++ [Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks](https://arxiv.org/abs/2502.08865)

Zijian Huang, Yicheng Zhang, Sophie Chen, Nael Abu-Ghazaleh, Jiasi Chen

-+ [Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models](https://arxiv.org//abs/2502.09723)
++ [Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models](https://arxiv.org/abs/2502.09723)

Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang

-+ [On the robustness of multimodal language model towards distractions](https://arxiv.org//abs/2502.09818)
++ [On the robustness of multimodal language model towards distractions](https://arxiv.org/abs/2502.09818)

Ming Liu, Hao Chen, Jindong Wang, Wensheng Zhang

-+ [Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization](https://arxiv.org//abs/2502.09755)
++ [Enhancing Jailbreak Attacks via Compliance-Refusal-Based Initialization](https://arxiv.org/abs/2502.09755)

Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

-+ [The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions](https://arxiv.org//abs/2502.09674)
++ [The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions](https://arxiv.org/abs/2502.09674)

Wenbo Pan, Zhichao Liu, Qiguang Chen, Xiangyang Zhou, Haining Yu, Xiaohua Jia

@@ -14406,11 +14406,11 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Richard J. Preen, Jim Smith

-+ [Scalable Private Partition Selection via Adaptive Weighting](https://arxiv.org//abs/2502.08878)
++ [Scalable Private Partition Selection via Adaptive Weighting](https://arxiv.org/abs/2502.08878)

Justin Y.
Chen, Vincent Cohen-Addad, Alessandro Epasto, Morteza Zadimoghaddam -+ [Differentially Private Compression and the Sensitivity of LZ77](https://arxiv.org//abs/2502.09584) ++ [Differentially Private Compression and the Sensitivity of LZ77](https://arxiv.org/abs/2502.09584) Jeremiah Blocki, Seunghoon Lee, Brayan Sebastián Yepes Garcia @@ -14419,132 +14419,132 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin # 2025-02-12 -+ [Compromising Honesty and Harmlessness in Language Models via Deception Attacks](https://arxiv.org//abs/2502.08301) ++ [Compromising Honesty and Harmlessness in Language Models via Deception Attacks](https://arxiv.org/abs/2502.08301) Laurène Vaugrante, Francesca Carlon, Maluna Menke, Thilo Hagendorff -+ [Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks](https://arxiv.org//abs/2502.08586) ++ [Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks](https://arxiv.org/abs/2502.08586) Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah Goldblum -+ [MAA: Meticulous Adversarial Attack against Vision-Language Pre-trained Models](https://arxiv.org//abs/2502.08079) ++ [MAA: Meticulous Adversarial Attack against Vision-Language Pre-trained Models](https://arxiv.org/abs/2502.08079) Peng-Fei Zhang, Guangdong Bai, Zi Huang -+ [ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation](https://arxiv.org//abs/2502.08097) ++ [ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation](https://arxiv.org/abs/2502.08097) Qianrui Teng, Xing Cui, Xuannan Liu, Peipei Li, Zekun Li, Huaibo Huang, Ran He -+ [AdvSwap: Covert Adversarial Perturbation with High Frequency Info-swapping for Autonomous Driving Perception](https://arxiv.org//abs/2502.08374) ++ [AdvSwap: Covert Adversarial Perturbation with High Frequency Info-swapping for Autonomous Driving Perception](https://arxiv.org/abs/2502.08374) Yuanhao Huang, Qinfan Zhang, Jiandong Xing, Mengyue Cheng, Haiyang Yu, Yilong Ren, Xiao Xiong -+ [Cascading Bandits Robust to Adversarial Corruptions](https://arxiv.org//abs/2502.08077) ++ [Cascading Bandits Robust to Adversarial Corruptions](https://arxiv.org/abs/2502.08077) Jize Xie, Cheng Chen, Zhiyong Wang, Shuai Li -+ [SLVR: Securely Leveraging Client Validation for Robust Federated Learning](https://arxiv.org//abs/2502.08055) ++ [SLVR: Securely Leveraging Client Validation for Robust Federated Learning](https://arxiv.org/abs/2502.08055) Jihye Choi, Sai Rahul Rachuri, Ke Wang, Somesh Jha, Yizhen Wang -+ [General Coded Computing: Adversarial Settings](https://arxiv.org//abs/2502.08058) ++ [General Coded Computing: Adversarial Settings](https://arxiv.org/abs/2502.08058) Parsa Moradi, Hanzaleh Akbarinodehi, Mohammad Ali Maddah-Ali -+ [Provably Robust Federated Reinforcement Learning](https://arxiv.org//abs/2502.08123) ++ [Provably Robust Federated Reinforcement Learning](https://arxiv.org/abs/2502.08123) Minghong Fang, Xilong Wang, Neil Zhenqiang Gong -+ [Local Differential Privacy is Not Enough: A Sample Reconstruction Attack against Federated Learning with Local Differential Privacy](https://arxiv.org//abs/2502.08151) ++ [Local Differential Privacy is Not Enough: A Sample Reconstruction Attack against Federated Learning with Local Differential Privacy](https://arxiv.org/abs/2502.08151) Zhichao You, Xuewen Dong, Shujun Li, Ximeng Liu, Siqi Ma, Yulong Shen -+ 
[Typographic Attacks in a Multi-Image Setting](https://arxiv.org//abs/2502.08193) ++ [Typographic Attacks in a Multi-Image Setting](https://arxiv.org/abs/2502.08193) Xiaomeng Wang, Zhengyu Zhao, Martha Larson -+ [Investigating Vulnerabilities of GPS Trip Data to Trajectory-User Linking Attacks](https://arxiv.org//abs/2502.08217) ++ [Investigating Vulnerabilities of GPS Trip Data to Trajectory-User Linking Attacks](https://arxiv.org/abs/2502.08217) Benedikt Ströbl, Alexandra Kapp -+ [Quaternion-Hadamard Network: A Novel Defense Against Adversarial Attacks with a New Dataset](https://arxiv.org//abs/2502.10452) ++ [Quaternion-Hadamard Network: A Novel Defense Against Adversarial Attacks with a New Dataset](https://arxiv.org/abs/2502.10452) Vladimir Frants, Sos Agaian -+ [A Survey on Pre-Trained Diffusion Model Distillations](https://arxiv.org//abs/2502.08364) ++ [A Survey on Pre-Trained Diffusion Model Distillations](https://arxiv.org/abs/2502.08364) Xuhui Fan, Zhangkai Wu, Hongyu Wu # 2025-02-11 -+ [LUNAR: LLM Unlearning via Neural Activation Redirection](https://arxiv.org//abs/2502.07218) ++ [LUNAR: LLM Unlearning via Neural Activation Redirection](https://arxiv.org/abs/2502.07218) William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane -+ [No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips](https://arxiv.org//abs/2502.07408) ++ [No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips](https://arxiv.org/abs/2502.07408) Ido Galil, Moshe Kimhi, Ran El-Yaniv -+ [The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation](https://arxiv.org//abs/2502.07516) ++ [The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation](https://arxiv.org/abs/2502.07516) Raman Dutt -+ [Auditing Prompt Caching in Language Model APIs](https://arxiv.org//abs/2502.07776) ++ [Auditing Prompt Caching in Language Model APIs](https://arxiv.org/abs/2502.07776) Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto -+ [CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models](https://arxiv.org//abs/2502.07225) ++ [CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models](https://arxiv.org/abs/2502.07225) Sen Peng, Mingyue Wang, Jianfei He, Jijia Yang, Xiaohua Jia -+ [Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection](https://arxiv.org//abs/2502.07778) ++ [Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection](https://arxiv.org/abs/2502.07778) Anirudh Sundara Rajan, Yong Jae Lee -+ [Universal Adversarial Attack on Aligned Multimodal LLMs](https://arxiv.org//abs/2502.07987) ++ [Universal Adversarial Attack on Aligned Multimodal LLMs](https://arxiv.org/abs/2502.07987) Temurbek Rahmatullaev, Polina Druzhinina, Matvey Mikhalchuk, Andrey Kuznetsov, Anton Razzhigaev -+ [DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities](https://arxiv.org//abs/2502.07905) ++ [DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities](https://arxiv.org/abs/2502.07905) Chashi Mahiul Islam, Samuel Jacob Chacko, Preston Horne, Xiuwen Liu -+ [An Interactive Framework for Implementing Privacy-Preserving Federated 
Learning: Experiments on Large Language Models](https://arxiv.org//abs/2502.08008) ++ [An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models](https://arxiv.org/abs/2502.08008) Kasra Ahmadi, Rouzbeh Behnia, Reza Ebrahimi, Mehran Mozaffari Kermani, Jeremiah Birrell, Jason Pacheco, Attila A Yavuz -+ [Optimal Actuator Attacks on Autonomous Vehicles Using Reinforcement Learning](https://arxiv.org//abs/2502.07839) ++ [Optimal Actuator Attacks on Autonomous Vehicles Using Reinforcement Learning](https://arxiv.org/abs/2502.07839) Pengyu Wang, Jialu Li, Ling Shi -+ [Unveiling Client Privacy Leakage from Public Dataset Usage in Federated Distillation](https://arxiv.org//abs/2502.08001) ++ [Unveiling Client Privacy Leakage from Public Dataset Usage in Federated Distillation](https://arxiv.org/abs/2502.08001) Haonan Shi, Tu Ouyang, An Wang -+ [Trustworthy AI on Safety, Bias, and Privacy: A Survey](https://arxiv.org//abs/2502.10450) ++ [Trustworthy AI on Safety, Bias, and Privacy: A Survey](https://arxiv.org/abs/2502.10450) Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim @@ -14559,296 +14559,296 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane # 2025-02-10 -+ [Hyperparameters in Score-Based Membership Inference Attacks](https://arxiv.org//abs/2502.06374) ++ [Hyperparameters in Score-Based Membership Inference Attacks](https://arxiv.org/abs/2502.06374) Gauri Pradhan, Joonas Jälkö, Marlon Tobaben, Antti Honkela -+ [Predictive Red Teaming: Breaking Policies Without Breaking Robots](https://arxiv.org//abs/2502.06575) ++ [Predictive Red Teaming: Breaking Policies Without Breaking Robots](https://arxiv.org/abs/2502.06575) Anirudha Majumdar, Mohit Sharma, Dmitry Kalashnikov, Sumeet Singh, Pierre Sermanet, Vikas Sindhwani -+ [When Data Manipulation Meets Attack Goals: An In-depth Survey of Attacks for VLMs](https://arxiv.org//abs/2502.06390) ++ [When Data Manipulation Meets Attack Goals: An In-depth Survey of Attacks for VLMs](https://arxiv.org/abs/2502.06390) Aobotao Dai, Xinyu Ma, Lei Chen, Songze Li, Lin Wang -+ [Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation](https://arxiv.org//abs/2502.06418) ++ [Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation](https://arxiv.org/abs/2502.06418) Zhongjie Ba, Yitao Zhang, Peng Cheng, Bin Gong, Xinyu Zhang, Qinglong Wang, Kui Ren -+ [Krum Federated Chain (KFC): Using blockchain to defend against adversarial attacks in Federated Learning](https://arxiv.org//abs/2502.06917) ++ [Krum Federated Chain (KFC): Using blockchain to defend against adversarial attacks in Federated Learning](https://arxiv.org/abs/2502.06917) Mario García-Márquez, Nuria Rodríguez-Barroso, M.Victoria Luzón, Francisco Herrera -+ [SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation](https://arxiv.org//abs/2502.07101) ++ [SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation](https://arxiv.org/abs/2502.07101) Saurabh Kumar Pandey, Sachin Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury -+ [Amnesia as a Catalyst for Enhancing Black Box Pixel Attacks in Image Classification and Object Detection](https://arxiv.org//abs/2502.07821) ++ [Amnesia as a Catalyst for Enhancing 
Black Box Pixel Attacks in Image Classification and Object Detection](https://arxiv.org/abs/2502.07821) Dongsu Song, Daehwa Ko, Jay Hoon Jung -+ [DROP: Poison Dilution via Knowledge Distillation for Federated Learning](https://arxiv.org//abs/2502.07011) ++ [DROP: Poison Dilution via Knowledge Distillation for Federated Learning](https://arxiv.org/abs/2502.07011) Georgios Syros, Anshuman Suri, Farinaz Koushanfar, Cristina Nita-Rotaru, Alina Oprea # 2025-02-09 -+ [Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning](https://arxiv.org//abs/2502.05739) ++ [Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning](https://arxiv.org/abs/2502.05739) Ruotong Geng, Mingyang Geng, Shangwen Wang, Haotian Wang, Zhipeng Lin, Dezun Dong -+ [Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails](https://arxiv.org//abs/2502.05772) ++ [Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails](https://arxiv.org/abs/2502.05772) Yijun Yang, Lichao Wang, Xiao Yang, Lanqing Hong, Jun Zhu -+ [Impact of Data Poisoning Attacks on Feasibility and Optimality of Neural Power System Optimizers](https://arxiv.org//abs/2502.05727) ++ [Impact of Data Poisoning Attacks on Feasibility and Optimality of Neural Power System Optimizers](https://arxiv.org/abs/2502.05727) Nora Agah, Meiyi Li, Javad Mohammadi -+ [Filter, Obstruct and Dilute: Defending Against Backdoor Attacks on Semi-Supervised Learning](https://arxiv.org//abs/2502.05755) ++ [Filter, Obstruct and Dilute: Defending Against Backdoor Attacks on Semi-Supervised Learning](https://arxiv.org/abs/2502.05755) Xinrui Wang, Chuanxing Geng, Wenhai Wan, Shao-yuan Li, Songcan Chen -+ [GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation](https://arxiv.org//abs/2502.05780) ++ [GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation](https://arxiv.org/abs/2502.05780) Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang -+ [Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks](https://arxiv.org//abs/2502.06892) ++ [Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks](https://arxiv.org/abs/2502.06892) Bowei He, Lihao Yin, Hui-Ling Zhen, Jianping Zhang, Lanqing Hong, Mingxuan Yuan, Chen Ma -+ [Jailbreaking to Jailbreak](https://arxiv.org//abs/2502.09638) ++ [Jailbreaking to Jailbreak](https://arxiv.org/abs/2502.09638) Jeremy Kritz, Vaughn Robinson, Robert Vacareanu, Bijan Varjavand, Michael Choi, Bobby Gogov, Scale Red Team, Summer Yue, Willow E. 
Primack, Zifan Wang -+ [RAMer: Reconstruction-based Adversarial Model for Multi-party Multi-modal Multi-label Emotion Recognition](https://arxiv.org//abs/2502.10435) ++ [RAMer: Reconstruction-based Adversarial Model for Multi-party Multi-modal Multi-label Emotion Recognition](https://arxiv.org/abs/2502.10435) Xudong Yang, Yizhang Zhu, Nan Tang, Yuyu Luo -+ [Injecting Universal Jailbreak Backdoors into LLMs in Minutes](https://arxiv.org//abs/2502.10438) ++ [Injecting Universal Jailbreak Backdoors into LLMs in Minutes](https://arxiv.org/abs/2502.10438) Zhuowei Chen, Qiannan Zhang, Shichao Pei -+ [HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models](https://arxiv.org//abs/2502.05945) ++ [HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models](https://arxiv.org/abs/2502.05945) Paul Darm, Annalisa Riccardi -+ [Privacy-Preserving Dataset Combination](https://arxiv.org//abs/2502.05765) ++ [Privacy-Preserving Dataset Combination](https://arxiv.org/abs/2502.05765) Keren Fuentes, Mimee Xu, Irene Chen # 2025-02-08 -+ [Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning](https://arxiv.org//abs/2502.05547) ++ [Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning](https://arxiv.org/abs/2502.05547) Runhua Xu, Shiqi Gao, Chao Li, James Joshi, Jianxin Li -+ [Adversarial Machine Learning: Attacks, Defenses, and Open Challenges](https://arxiv.org//abs/2502.05637) ++ [Adversarial Machine Learning: Attacks, Defenses, and Open Challenges](https://arxiv.org/abs/2502.05637) Pranav K Jha -+ [Rigid Body Adversarial Attacks](https://arxiv.org//abs/2502.05669) ++ [Rigid Body Adversarial Attacks](https://arxiv.org/abs/2502.05669) Aravind Ramakrishnan, David I.W. Levin, Alec Jacobson -+ [Do Spikes Protect Privacy? Investigating Black-Box Model Inversion Attacks in Spiking Neural Networks](https://arxiv.org//abs/2502.05509) ++ [Do Spikes Protect Privacy? 
Investigating Black-Box Model Inversion Attacks in Spiking Neural Networks](https://arxiv.org/abs/2502.05509) Hamed Poursiami, Ayana Moshruba, Maryam Parsa -+ [Democratic Training Against Universal Adversarial Perturbations](https://arxiv.org//abs/2502.05542) ++ [Democratic Training Against Universal Adversarial Perturbations](https://arxiv.org/abs/2502.05542) Bing Sun, Jun Sun, Wei Zhao -+ [Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey](https://arxiv.org//abs/2502.06872) ++ [Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey](https://arxiv.org/abs/2502.06872) Bo Ni, Zheyuan Liu, Leyao Wang, Yongjia Lei, Yuying Zhao, Xueqi Cheng, Qingkai Zeng, Luna Dong, Yinglong Xia, Krishnaram Kenthapadi, Ryan Rossi, Franck Dernoncourt, Md Mehrab Tanjim, Nesreen Ahmed, Xiaorui Liu, Wenqi Fan, Erik Blasch, Yu Wang, Meng Jiang, Tyler Derr -+ [The Odyssey of the Fittest: Can Agents Survive and Still Be Good?](https://arxiv.org//abs/2502.05442) ++ [The Odyssey of the Fittest: Can Agents Survive and Still Be Good?](https://arxiv.org/abs/2502.05442) Dylan Waldner, Risto Miikkulainen # 2025-02-07 -+ [DMPA: Model Poisoning Attacks on Decentralized Federated Learning for Model Differences](https://arxiv.org//abs/2502.04771) ++ [DMPA: Model Poisoning Attacks on Decentralized Federated Learning for Model Differences](https://arxiv.org/abs/2502.04771) Chao Feng, Yunlong Li, Yuanzhe Gao, Alberto Huertas Celdrán, Jan von der Assen, Gérôme Bovet, Burkhard Stiller -+ [Robust Graph Learning Against Adversarial Evasion Attacks via Prior-Free Diffusion-Based Structure Purification](https://arxiv.org//abs/2502.05000) ++ [Robust Graph Learning Against Adversarial Evasion Attacks via Prior-Free Diffusion-Based Structure Purification](https://arxiv.org/abs/2502.05000) Jiayi Luo, Qingyun Sun, Haonan Yuan, Xingcheng Fu, Jianxin Li -+ [Federated Learning for Anomaly Detection in Energy Consumption Data: Assessing the Vulnerability to Adversarial Attacks](https://arxiv.org//abs/2502.05041) ++ [Federated Learning for Anomaly Detection in Energy Consumption Data: Assessing the Vulnerability to Adversarial Attacks](https://arxiv.org/abs/2502.05041) Yohannis Kifle Telila, Damitha Senevirathne, Dumindu Tissera, Apurva Narayan, Miriam A.M. 
Capretz, Katarina Grolinger -+ [ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework](https://arxiv.org//abs/2502.05084) ++ [ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework](https://arxiv.org/abs/2502.05084) Xiaoyu Deng, Ye Zhang, Tianmin Guo, Yongzhe Zhang, Zhengjian Kang, Hang Yang -+ [Confidence Elicitation: A New Attack Vector for Large Language Models](https://arxiv.org//abs/2502.04643) ++ [Confidence Elicitation: A New Attack Vector for Large Language Models](https://arxiv.org/abs/2502.04643) Brian Formento, Chuan Sheng Foo, See-Kiong Ng -+ [ELITE: Enhanced Language-Image Toxicity Evaluation for Safety](https://arxiv.org//abs/2502.04757) ++ [ELITE: Enhanced Language-Image Toxicity Evaluation for Safety](https://arxiv.org/abs/2502.04757) Wonjun Lee, Doehyeon Lee, Eugene Choi, Sangyoon Yu, Ashkan Yousefpour, Haon Park, Bumsub Ham, Suhyun Kim -+ [Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers](https://arxiv.org//abs/2502.04679) ++ [Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers](https://arxiv.org/abs/2502.04679) Chashi Mahiul Islam, Samuel Jacob Chacko, Mao Nishino, Xiuwen Liu -+ [Adversarially-Robust TD Learning with Markovian Data: Finite-Time Rates and Fundamental Limits](https://arxiv.org//abs/2502.04662) ++ [Adversarially-Robust TD Learning with Markovian Data: Finite-Time Rates and Fundamental Limits](https://arxiv.org/abs/2502.04662) Sreejeet Maity, Aritra Mitra -+ [Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning](https://arxiv.org//abs/2502.04890) ++ [Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning](https://arxiv.org/abs/2502.04890) Yuchen Liu, Chen Chen, Lingjuan Lyu, Yaochu Jin, Gang Chen -+ [Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond](https://arxiv.org//abs/2502.05374) ++ [Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond](https://arxiv.org/abs/2502.05374) Chongyu Fan, Jinghan Jia, Yihua Zhang, Anil Ramakrishna, Mingyi Hong, Sijia Liu -+ [Training Set Reconstruction from Differentially Private Forests: How Effective is DP?](https://arxiv.org//abs/2502.05307) ++ [Training Set Reconstruction from Differentially Private Forests: How Effective is DP?](https://arxiv.org/abs/2502.05307) Alice Gorgé, Julien Ferry, Sébastien Gambs, Thibaut Vidal -+ [From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks](https://arxiv.org//abs/2502.05325) ++ [From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks](https://arxiv.org/abs/2502.05325) Awa Khouna, Julien Ferry, Thibaut Vidal -+ [Removing Neural Signal Artifacts with Autoencoder-Targeted Adversarial Transformers (AT-AT)](https://arxiv.org//abs/2502.05332) ++ [Removing Neural Signal Artifacts with Autoencoder-Targeted Adversarial Transformers (AT-AT)](https://arxiv.org/abs/2502.05332) Benjamin J. 
Choi -+ [CP-Guard+: A New Paradigm for Malicious Agent Detection and Defense in Collaborative Perception](https://arxiv.org//abs/2502.07807) ++ [CP-Guard+: A New Paradigm for Malicious Agent Detection and Defense in Collaborative Perception](https://arxiv.org/abs/2502.07807) Senkang Hu, Yihang Tao, Zihan Fang, Guowen Xu, Yiqin Deng, Sam Kwong, Yuguang Fang -+ [MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison](https://arxiv.org//abs/2502.05174) ++ [MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison](https://arxiv.org/abs/2502.05174) Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, William Yang Wang -+ [Neural Encrypted State Transduction for Ransomware Classification: A Novel Approach Using Cryptographic Flow Residuals](https://arxiv.org//abs/2502.05341) ++ [Neural Encrypted State Transduction for Ransomware Classification: A Novel Approach Using Cryptographic Flow Residuals](https://arxiv.org/abs/2502.05341) Barnaby Fortescue, Edmund Hawksmoor, Alistair Wetherington, Frederick Marlowe, Kevin Pekepok # 2025-02-06 -+ [SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning](https://arxiv.org//abs/2502.03801) ++ [SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning](https://arxiv.org/abs/2502.03801) Heyi Zhang, Yule Liu, Xinlei He, Jun Wu, Tianshuo Cong, Xinyi Huang -+ [Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples](https://arxiv.org//abs/2502.03957) ++ [Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples](https://arxiv.org/abs/2502.03957) Konstantinos Tsigos, Evlampios Apostolidis, Vasileios Mezaris -+ [Dark Distillation: Backdooring Distilled Datasets without Accessing Raw Data](https://arxiv.org//abs/2502.04229) ++ [Dark Distillation: Backdooring Distilled Datasets without Accessing Raw Data](https://arxiv.org/abs/2502.04229) Ziyuan Yang, Ming Yan, Yi Zhang, Joey Tianyi Zhou -+ [Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions](https://arxiv.org//abs/2502.04322) ++ [Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions](https://arxiv.org/abs/2502.04322) Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, Marzyeh Ghassemi -+ [DocMIA: Document-Level Membership Inference Attacks against DocVQA Models](https://arxiv.org//abs/2502.03692) ++ [DocMIA: Document-Level Membership Inference Attacks against DocVQA Models](https://arxiv.org/abs/2502.03692) Khanh Nguyen, Raouf Kerkouche, Mario Fritz, Dimosthenis Karatzas -+ [Improving Adversarial Robustness via Phase and Amplitude-aware Prompting](https://arxiv.org//abs/2502.03758) ++ [Improving Adversarial Robustness via Phase and Amplitude-aware Prompting](https://arxiv.org/abs/2502.03758) Yibo Xu, Dawei Zhou, Decheng Liu, Nannan Wang -+ [Synthetic Poisoning Attacks: The Impact of Poisoned MRI Image on U-Net Brain Tumor Segmentation](https://arxiv.org//abs/2502.03825) ++ [Synthetic Poisoning Attacks: The Impact of Poisoned MRI Image on U-Net Brain Tumor Segmentation](https://arxiv.org/abs/2502.03825) Tianhao Li, Tianyu Zeng, Yujia Zheng, Chulong Zhang, Jingyu Lu, Haotian Huang, Chuangxin Chu, Fang-Fang Yin, Zhenyu Yang -+ [How vulnerable is my policy? Adversarial attacks on modern behavior cloning policies](https://arxiv.org//abs/2502.03698) ++ [How vulnerable is my policy? 
Adversarial attacks on modern behavior cloning policies](https://arxiv.org/abs/2502.03698) Basavasagar Patil, Akansha Kalra, Guanhong Tao, Daniel S. Brown -+ [Comparing privacy notions for protection against reconstruction attacks in machine learning](https://arxiv.org//abs/2502.04045) ++ [Comparing privacy notions for protection against reconstruction attacks in machine learning](https://arxiv.org/abs/2502.04045) Sayan Biswas, Mark Dras, Pedro Faustini, Natasha Fernandes, Annabelle McIver, Catuscia Palamidessi, Parastoo Sadeghi -+ ["Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence](https://arxiv.org//abs/2502.04204) ++ ["Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence](https://arxiv.org/abs/2502.04204) Shaopeng Fu, Liang Ding, Di Wang -+ [Adapting to Evolving Adversaries with Regularized Continual Robust Training](https://arxiv.org//abs/2502.04248) ++ [Adapting to Evolving Adversaries with Regularized Continual Robust Training](https://arxiv.org/abs/2502.04248) Sihui Dai, Christian Cianfarani, Arjun Bhagoji, Vikash Sehwag, Prateek Mittal -+ [Detecting Backdoor Attacks via Similarity in Semantic Communication Systems](https://arxiv.org//abs/2502.03721) ++ [Detecting Backdoor Attacks via Similarity in Semantic Communication Systems](https://arxiv.org/abs/2502.03721) Ziyang Wei, Yili Jiang, Jiaqi Huang, Fangtian Zhong, Sohan Gyawali -+ [The Gradient Puppeteer: Adversarial Domination in Gradient Leakage Attacks through Model Poisoning](https://arxiv.org//abs/2502.04106) ++ [The Gradient Puppeteer: Adversarial Domination in Gradient Leakage Attacks through Model Poisoning](https://arxiv.org/abs/2502.04106) Kunlan Xiang, Haomiao Yang, Meng Hao, Haoxin Wang, Shaofeng Li, Zikang Ding, Tianwei Zhang -+ [Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks](https://arxiv.org//abs/2502.04224) ++ [Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks](https://arxiv.org/abs/2502.04224) Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang -+ [Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer](https://arxiv.org//abs/2502.04573) ++ [Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer](https://arxiv.org/abs/2502.04573) Yulun Wu, Doron L. 
Bergman -+ [A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluations](https://arxiv.org//abs/2502.05224) ++ [A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluations](https://arxiv.org/abs/2502.05224) Yihe Zhou, Tao Ni, Wei-Bin Lee, Qingchuan Zhao -+ [BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks](https://arxiv.org//abs/2502.05225) ++ [BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks](https://arxiv.org/abs/2502.05225) Hanyong Lee, Chaelyn Lee, Yongjae Lee, Jaesung Lee @@ -14859,57 +14859,57 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Chhavi Yadav, Evan Monroe Laufer, Dan Boneh, Kamalika Chaudhuri # 2025-02-05 -+ [Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning](https://arxiv.org//abs/2502.02844) ++ [Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2502.02844) Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han -+ [Position: Editing Large Language Models Poses Serious Safety Risks](https://arxiv.org//abs/2502.02958) ++ [Position: Editing Large Language Models Poses Serious Safety Risks](https://arxiv.org/abs/2502.02958) Paul Youssef, Zhixue Zhao, Daniel Braun, Jörg Schlötterer, Christin Seifert -+ [Privacy Token: Surprised to Find Out What You Accidentally Revealed](https://arxiv.org//abs/2502.02913) ++ [Privacy Token: Surprised to Find Out What You Accidentally Revealed](https://arxiv.org/abs/2502.02913) Jiayang Meng, Tao Huang, Xin Shi, Qingyu Huang, Chen Hou, Hong Chen -+ [Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models](https://arxiv.org//abs/2502.02970) ++ [Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models](https://arxiv.org/abs/2502.02970) Muxing Li, Zesheng Ye, Yixuan Li, Andy Song, Guangquan Zhang, Feng Liu -+ [Understanding and Enhancing the Transferability of Jailbreaking Attacks](https://arxiv.org//abs/2502.03052) ++ [Understanding and Enhancing the Transferability of Jailbreaking Attacks](https://arxiv.org/abs/2502.03052) Runqi Lin, Bo Han, Fengwang Li, Tongling Liu -+ [Large Language Model Adversarial Landscape Through the Lens of Attack Objectives](https://arxiv.org//abs/2502.02960) ++ [Large Language Model Adversarial Landscape Through the Lens of Attack Objectives](https://arxiv.org/abs/2502.02960) Nan Wang, Kane Walter, Yansong Gao, Alsharif Abuadbba -+ [Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation](https://arxiv.org//abs/2502.03233) ++ [Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation](https://arxiv.org/abs/2502.03233) Bo Lin, Shangwen Wang, Liqian Chen, Xiaoguang Mao -+ [Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings](https://arxiv.org//abs/2502.04386) ++ [Towards Fair Medical AI: Adversarial Debiasing of 3D CT Foundation Embeddings](https://arxiv.org/abs/2502.04386) Guangyao Zheng, Michael A. Jacobs, Vladimir Braverman, Vishwa S. 
Parekh -+ [MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction](https://arxiv.org//abs/2502.04360) ++ [MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction](https://arxiv.org/abs/2502.04360) Xiao Hu, Eric Liu, Weizhou Wang, Xiangyu Guo, David Lie -+ [KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs](https://arxiv.org//abs/2502.05223) ++ [KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs](https://arxiv.org/abs/2502.05223) Buyun Liang, Kwan Ho Ryan Chan, Darshan Thaker, Jinqi Luo, René Vidal -+ [Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach](https://arxiv.org//abs/2502.06832) ++ [Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach](https://arxiv.org/abs/2502.06832) Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang @@ -14920,119 +14920,119 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang # 2025-02-04 -+ [FRAUD-RLA: A new reinforcement learning adversarial attack against credit card fraud detection](https://arxiv.org//abs/2502.02290) ++ [FRAUD-RLA: A new reinforcement learning adversarial attack against credit card fraud detection](https://arxiv.org/abs/2502.02290) Daniele Lunghi, Yannick Molinghen, Alkis Simitsis, Tom Lenaerts, Gianluca Bontempi -+ [Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment](https://arxiv.org//abs/2502.02438) ++ [Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment](https://arxiv.org/abs/2502.02438) Yaling Shen, Zhixiong Zhuang, Kun Yuan, Maria-Irina Nicolae, Nassir Navab, Nicolas Padoy, Mario Fritz -+ [PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling](https://arxiv.org//abs/2502.01925) ++ [PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling](https://arxiv.org/abs/2502.01925) Avery Ma, Yangchen Pan, Amir-massoud Farahmand -+ [INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy](https://arxiv.org//abs/2502.01896) ++ [INTACT: Inducing Noise Tolerance through Adversarial Curriculum Training for LiDAR-based Safety-Critical Perception and Autonomy](https://arxiv.org/abs/2502.01896) Nastaran Darabi, Divake Kumar, Sina Tayebati, Amit Ranjan Trivedi -+ [Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization](https://arxiv.org//abs/2502.02096) ++ [Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization](https://arxiv.org/abs/2502.02096) Yixiao Chen, Shikun Sun, Jianshu Li, Ruoyu Li, Zhe Li, Junliang Xing -+ [Privacy Attacks on Image AutoRegressive Models](https://arxiv.org//abs/2502.02514) ++ [Privacy Attacks on Image AutoRegressive Models](https://arxiv.org/abs/2502.02514) Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch, Adam Dziedzic -+ [Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks](https://arxiv.org//abs/2502.02537) ++ [Uncertainty Quantification for Collaborative Object Detection Under Adversarial Attacks](https://arxiv.org/abs/2502.02537) Huiqun Huang, Cong Chen, Jean-Philippe Monteuuis, Jonathan Petit, Fei Miao -+ [Query-Based and Unnoticeable Graph Injection Attack from Neighborhood 
Perspective](https://arxiv.org//abs/2502.01936) ++ [Query-Based and Unnoticeable Graph Injection Attack from Neighborhood Perspective](https://arxiv.org/abs/2502.01936) Chang Liu, Hai Huang, Yujie Xing, Xingquan Zuo -+ [Adversarial ML Problems Are Getting Harder to Solve and to Evaluate](https://arxiv.org//abs/2502.02260) ++ [Adversarial ML Problems Are Getting Harder to Solve and to Evaluate](https://arxiv.org/abs/2502.02260) Javier Rando, Jie Zhang, Nicholas Carlini, Florian Tramèr -+ [OVERTHINKING: Slowdown Attacks on Reasoning LLMs](https://arxiv.org//abs/2502.02542) ++ [OVERTHINKING: Slowdown Attacks on Reasoning LLMs](https://arxiv.org/abs/2502.02542) Abhinav Kumar, Jaechul Roh, Ali Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian -+ [SMTFL: Secure Model Training to Untrusted Participants in Federated Learning](https://arxiv.org//abs/2502.02038) ++ [SMTFL: Secure Model Training to Untrusted Participants in Federated Learning](https://arxiv.org/abs/2502.02038) Zhihui Zhao, Xiaorong Dong, Yimo Ren, Jianhua Wang, Dan Yu, Hongsong Zhu, Yongle Chen -+ [Investigating the Robustness of Deductive Reasoning with Large Language Models](https://arxiv.org//abs/2502.04352) ++ [Investigating the Robustness of Deductive Reasoning with Large Language Models](https://arxiv.org/abs/2502.04352) Fabian Hoppe, Filip Ilievski, Jan-Christoph Kalo -+ [CoRPA: Adversarial Image Generation for Chest X-rays Using Concept Vector Perturbations and Generative Models](https://arxiv.org//abs/2502.05214) ++ [CoRPA: Adversarial Image Generation for Chest X-rays Using Concept Vector Perturbations and Generative Models](https://arxiv.org/abs/2502.05214) Amy Rafferty, Rishi Ramaesh, Ajitha Rajan -+ [From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios](https://arxiv.org//abs/2502.02145) ++ [From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios](https://arxiv.org/abs/2502.02145) Yuan Gao, Mattia Piccinini, Korbinian Moller, Johannes Betz -+ [Robust LLM Alignment via Distributionally Robust Direct Preference Optimization](https://arxiv.org//abs/2502.01930) ++ [Robust LLM Alignment via Distributionally Robust Direct Preference Optimization](https://arxiv.org/abs/2502.01930) Zaiyan Xu, Sushil Vemuri, Kishan Panaganti, Dileep Kalathil, Rahul Jain, Deepak Ramachandran # 2025-02-03 -+ [A Privacy-Preserving Domain Adversarial Federated learning for multi-site brain functional connectivity analysis](https://arxiv.org//abs/2502.01885) ++ [A Privacy-Preserving Domain Adversarial Federated learning for multi-site brain functional connectivity analysis](https://arxiv.org/abs/2502.01885) Yipu Zhang, Likai Wang, Kuan-Jui Su, Aiying Zhang, Hao Zhu, Xiaowen Liu, Hui Shen, Vince D. Calhoun, Yuping Wang, Hongwen Deng -+ [Mitigation of Camouflaged Adversarial Attacks in Autonomous Vehicles--A Case Study Using CARLA Simulator](https://arxiv.org//abs/2502.05208) ++ [Mitigation of Camouflaged Adversarial Attacks in Autonomous Vehicles--A Case Study Using CARLA Simulator](https://arxiv.org/abs/2502.05208) Yago Romano Martinez, Brady Carter, Abhijeet Solanki, Wesam Al Amiri, Syed Rafay Hasan, Terry N. 
Guo -+ [Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities](https://arxiv.org//abs/2502.05209) ++ [Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities](https://arxiv.org/abs/2502.05209) Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell -+ [Decoding FL Defenses: Systemization, Pitfalls, and Remedies](https://arxiv.org//abs/2502.05211) ++ [Decoding FL Defenses: Systemization, Pitfalls, and Remedies](https://arxiv.org/abs/2502.05211) Momin Ahmad Khan, Virat Shejwalkar, Yasra Chandio, Amir Houmansadr, Fatima Muhammad Anwar -+ [Detecting Backdoor Samples in Contrastive Language Image Pretraining](https://arxiv.org//abs/2502.01385) ++ [Detecting Backdoor Samples in Contrastive Language Image Pretraining](https://arxiv.org/abs/2502.01385) Hanxun Huang, Sarah Erfani, Yige Li, Xingjun Ma, James Bailey -+ [Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees](https://arxiv.org//abs/2502.01027) ++ [Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees](https://arxiv.org/abs/2502.01027) Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi -+ [Bias Beware: The Impact of Cognitive Biases on LLM-Driven Product Recommendations](https://arxiv.org//abs/2502.01349) ++ [Bias Beware: The Impact of Cognitive Biases on LLM-Driven Product Recommendations](https://arxiv.org/abs/2502.01349) Giorgos Filandrianos, Angeliki Dimitriou, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou -+ [Refining Adaptive Zeroth-Order Optimization at Ease](https://arxiv.org//abs/2502.01014) ++ [Refining Adaptive Zeroth-Order Optimization at Ease](https://arxiv.org/abs/2502.01014) Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai -+ [Adversarial Reasoning at Jailbreaking Time](https://arxiv.org//abs/2502.01633) ++ [Adversarial Reasoning at Jailbreaking Time](https://arxiv.org/abs/2502.01633) Mahdi Sabbaghi, Paul Kassianik, George Pappas, Yaron Singer, Amin Karbasi, Hamed Hassani @@ -15040,81 +15040,81 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell -+ [GRADIEND: Feature Learning within Neural Networks Exemplified through Biases](https://arxiv.org//abs/2502.01406) ++ [GRADIEND: Feature Learning within Neural Networks Exemplified through Biases](https://arxiv.org/abs/2502.01406) Jonathan Drechsel, Steffen Herbold -+ [Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search](https://arxiv.org//abs/2502.01609) ++ [Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search](https://arxiv.org/abs/2502.01609) Yanbo Wang, Zixiang Xu, Yue Huang, Chujie Gao, Siyuan Wu, Jiayi Ye, Pin-Yu Chen, Xiuying Chen, Xiangliang Zhang -+ [FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model](https://arxiv.org//abs/2502.01472) ++ [FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model](https://arxiv.org/abs/2502.01472) Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, Xiaowei Huang # 2025-02-02 -+ [Safety at Scale: A Comprehensive Survey of Large 
Model Safety](https://arxiv.org//abs/2502.05206) ++ [Safety at Scale: A Comprehensive Survey of Large Model Safety](https://arxiv.org/abs/2502.05206) Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Jiaming Zhang, Xiang Zheng, Yang Bai, Henghui Ding, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Bo Li, Masashi Sugiyama, Dacheng Tao, James Bailey, Yu-Gang Jiang -+ [`Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs](https://arxiv.org//abs/2502.00735) ++ [`Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs](https://arxiv.org/abs/2502.00735) Chun Wai Chiu, Linghan Huang, Bo Li, Huaming Chen -+ ["I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models](https://arxiv.org//abs/2502.00718) ++ ["I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models](https://arxiv.org/abs/2502.00718) Isha Gupta, David Khachaturov, Robert Mullins -+ [Reformulation is All You Need: Addressing Malicious Text Features in DNNs](https://arxiv.org//abs/2502.00652) ++ [Reformulation is All You Need: Addressing Malicious Text Features in DNNs](https://arxiv.org/abs/2502.00652) Yi Jiang, Oubo Ma, Yong Yang, Tong Zhang, Shouling Ji -+ [AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement](https://arxiv.org//abs/2502.00757) ++ [AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement](https://arxiv.org/abs/2502.00757) J Rosser, Jakob Foerster # 2025-02-01 -+ [Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation](https://arxiv.org//abs/2502.00306) ++ [Riddle Me This! 
Stealthy Membership Inference for Retrieval-Augmented Generation](https://arxiv.org/abs/2502.00306) Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr -+ [Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities](https://arxiv.org//abs/2502.00451) ++ [Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities](https://arxiv.org/abs/2502.00451) Aishik Mandal, Tanmoy Chakraborty, Iryna Gurevych # 2025-01-31 -+ [Adversarial Machine Learning: Attacking and Safeguarding Image Datasets](https://arxiv.org//abs/2502.05203) ++ [Adversarial Machine Learning: Attacking and Safeguarding Image Datasets](https://arxiv.org/abs/2502.05203) Koushik Chowdhury -+ [Deep Learning Model Inversion Attacks and Defenses: A Comprehensive Survey](https://arxiv.org//abs/2501.18934) ++ [Deep Learning Model Inversion Attacks and Defenses: A Comprehensive Survey](https://arxiv.org/abs/2501.18934) Wencheng Yang, Song Wang, Di Wu, Taotao Cai, Yanming Zhu, Shicheng Wei, Yiying Zhang, Xu Yang, Zhaohui Tang, Yan Li -+ [Towards the Worst-case Robustness of Large Language Models](https://arxiv.org//abs/2501.19040) ++ [Towards the Worst-case Robustness of Large Language Models](https://arxiv.org/abs/2501.19040) Huanran Chen, Yinpeng Dong, Zeming Wei, Hang Su, Jun Zhu -+ [Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach](https://arxiv.org//abs/2501.19403) ++ [Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach](https://arxiv.org/abs/2501.19403) Yingdan Shi, Sijia Liu, Ren Wang -+ [Improving LLM Unlearning Robustness via Random Perturbations](https://arxiv.org//abs/2501.19202) ++ [Improving LLM Unlearning Robustness via Random Perturbations](https://arxiv.org/abs/2501.19202) Dang Huu-Tien, Hoang Thanh-Tung, Anh Bui, Le-Minh Nguyen, Naoya Inoue -+ [Concept Steerers: Leveraging K-Sparse Autoencoders for Test-Time Controllable Generations](https://arxiv.org//abs/2501.19066) ++ [Concept Steerers: Leveraging K-Sparse Autoencoders for Test-Time Controllable Generations](https://arxiv.org/abs/2501.19066) Dahye Kim, Deepti Ghadiyaram -+ [Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential](https://arxiv.org//abs/2501.18834) ++ [Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential](https://arxiv.org/abs/2501.18834) Chenyu Gao, Kaiwen Xu, Michael E. Kim, Lianrui Zuo, Zhiyuan Li, Derek B. Archer, Timothy J. Hohman, Ann Zenobia Moore, Luigi Ferrucci, Lori L. Beason-Held, Susan M. Resnick, Christos Davatzikos, Jerry L. Prince, Bennett A. 
Landman -+ [SWAT: Sliding Window Adversarial Training for Gradual Domain Adaptation](https://arxiv.org//abs/2501.19155) ++ [SWAT: Sliding Window Adversarial Training for Gradual Domain Adaptation](https://arxiv.org/abs/2501.19155) Zixi Wang, Xiangxu Zhao, Tonglan Xie, Mengmeng Jing, Lin Zuo @@ -15131,26 +15131,26 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Dahye Kim, Deepti Ghadiyaram # 2025-01-30 -+ [Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models](https://arxiv.org//abs/2501.18280) ++ [Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models](https://arxiv.org/abs/2501.18280) Haoyu Liang, Youran Sun, Yunfeng Cai, Jun Zhu, Bo Zhang -+ [Exploring Audio Editing Features as User-Centric Privacy Defenses Against Large Language Model(LLM) Based Emotion Inference Attacks](https://arxiv.org//abs/2501.18727) ++ [Exploring Audio Editing Features as User-Centric Privacy Defenses Against Large Language Model(LLM) Based Emotion Inference Attacks](https://arxiv.org/abs/2501.18727) Mohd. Farhan Israk Soumik, W.K.M. Mithsara, Abdur R. Shahid, Ahmed Imteaj -+ [Deceptive Sequential Decision-Making via Regularized Policy Optimization](https://arxiv.org//abs/2501.18803) ++ [Deceptive Sequential Decision-Making via Regularized Policy Optimization](https://arxiv.org/abs/2501.18803) Yerin Kim, Alexander Benvenuti, Bo Chen, Mustafa Karabag, Abhishek Kulkarni, Nathaniel D. Bastian, Ufuk Topcu, Matthew Hale # 2025-01-29 -+ [SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders](https://arxiv.org//abs/2501.18052) ++ [SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders](https://arxiv.org/abs/2501.18052) Bartosz Cywiński, Kamil Deja -+ [Improving Your Model Ranking on Chatbot Arena by Vote Rigging](https://arxiv.org//abs/2501.17858) ++ [Improving Your Model Ranking on Chatbot Arena by Vote Rigging](https://arxiv.org/abs/2501.17858) Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min Lin @@ -15159,90 +15159,90 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Zhengpeng Xie, Yulong Zhang # 2025-01-28 -+ [Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation](https://arxiv.org//abs/2501.18638) ++ [Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation](https://arxiv.org/abs/2501.18638) Daniel Schwartz, Dmitriy Bespalov, Zhe Wang, Ninad Kulkarni, Yanjun Qi -+ [Blockchain Address Poisoning](https://arxiv.org//abs/2501.16681) ++ [Blockchain Address Poisoning](https://arxiv.org/abs/2501.16681) Taro Tsuchiya, Jin-Dong Dong, Kyle Soska, Nicolas Christin # 2025-01-27 -+ [The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs](https://arxiv.org//abs/2501.18626) ++ [The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs](https://arxiv.org/abs/2501.18626) Sergey Berezin, Reza Farahbakhsh, Noel Crespi -+ [FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting](https://arxiv.org//abs/2501.16029) ++ [FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting](https://arxiv.org/abs/2501.16029) Zhiyuan Fu, Junfan Chen, Lan Zhang, Ting Yang, Jun Niu, Hongyu Sun, Ruidong Li, Peng Liu, Yuqing Zhang -+ [LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language 
Models](https://arxiv.org//abs/2501.15850)
++ [LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models](https://arxiv.org/abs/2501.15850)

Yuewen Mei, Tong Nie, Jian Sun, Ye Tian

-+ [Adversarially Robust Bloom Filters: Privacy, Reductions, and Open Problems](https://arxiv.org//abs/2501.15751)
++ [Adversarially Robust Bloom Filters: Privacy, Reductions, and Open Problems](https://arxiv.org/abs/2501.15751)

Hayder Tirmazi

-+ [Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges](https://arxiv.org//abs/2501.16490)
++ [Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges](https://arxiv.org/abs/2501.16490)

Emad Efatinasab, Alessandro Brighente, Denis Donadel, Mauro Conti, Mirco Rampazzo

-+ [Rethinking the Bias of Foundation Model under Long-tailed Distribution](https://arxiv.org//abs/2501.15955)
++ [Rethinking the Bias of Foundation Model under Long-tailed Distribution](https://arxiv.org/abs/2501.15955)

Jiahao Chen, Bin Qin, Jiangmeng Li, Hao Chen, Bing Su

-+ [TombRaider: Entering the Vault of History to Jailbreak Large Language Models](https://arxiv.org//abs/2501.18628)
++ [TombRaider: Entering the Vault of History to Jailbreak Large Language Models](https://arxiv.org/abs/2501.18628)

Junchen Ding, Jiahao Zhang, Yi Liu, Ziqi Ding, Gelei Deng, Yuekang Li

-+ [Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs](https://arxiv.org//abs/2501.16534)
++ [Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs](https://arxiv.org/abs/2501.16534)

Jean-Charles Noirot Ferrand, Yohan Beugin, Eric Pauley, Ryan Sheatsley, Patrick McDaniel

# 2025-01-26

-+ [FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint](https://arxiv.org//abs/2501.15509)
++ [FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint](https://arxiv.org/abs/2501.15509)

Shuo Shao, Haozhe Zhu, Hongwei Yao, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

# 2025-01-25

-+ [DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning](https://arxiv.org//abs/2501.15005)
++ [DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning](https://arxiv.org/abs/2501.15005)

Bohan Liu, Yang Xiao, Ruimeng Ye, Zinan Ling, Xiaolong Ma, Bo Hui

-+ [A Portable and Stealthy Inaudible Voice Attack Based on Acoustic Metamaterials](https://arxiv.org//abs/2501.15031)
++ [A Portable and Stealthy Inaudible Voice Attack Based on Acoustic Metamaterials](https://arxiv.org/abs/2501.15031)

Zhiyuan Ning, Juan He, Zhanyong Tang, Weihang Hu, Xiaojiang Chen

# 2025-01-24

-+ [Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors](https://arxiv.org//abs/2501.14250)
++ [Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors](https://arxiv.org/abs/2501.14250)

Yi Zhao, Youzhi Zhang

-+ [GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm](https://arxiv.org//abs/2501.14230)
++ [GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm](https://arxiv.org/abs/2501.14230)

Hanrui Wang, Ching-Chun Chang, Chun-Shien Lu, Christopher Leckie, Isao Echizen

-+ [A Note on Implementation Errors in Recent Adaptive Attacks Against Multi-Resolution Self-Ensembles](https://arxiv.org//abs/2501.14496)
++ [A Note on Implementation Errors in Recent Adaptive Attacks Against Multi-Resolution Self-Ensembles](https://arxiv.org/abs/2501.14496)

Stanislav Fort

-+ [Optimal Strategies for Federated Learning Maintaining Client Privacy](https://arxiv.org//abs/2501.14453)
++ [Optimal Strategies for Federated Learning Maintaining Client Privacy](https://arxiv.org/abs/2501.14453)

Uday Bhaskar, Varul Srivastava, Avyukta Manjunatha Vummintala, Naresh Manwani, Sujit Gujar

-+ [Real-world Edge Neural Network Implementations Leak Private Interactions Through Physical Side Channel](https://arxiv.org//abs/2501.14512)
++ [Real-world Edge Neural Network Implementations Leak Private Interactions Through Physical Side Channel](https://arxiv.org/abs/2501.14512)

Zhuoran Liu, Senna van Hoek, Péter Horváth, Dirk Lauret, Xiaoyun Xu, Lejla Batina

-+ [Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated Noise](https://arxiv.org//abs/2501.14644)
++ [Optimizing Privacy-Utility Trade-off in Decentralized Learning with Generalized Correlated Noise](https://arxiv.org/abs/2501.14644)

Angelo Rodio, Zheng Chen, Erik G. Larsson

@@ -15251,232 +15251,232 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Hanrui Wang, Ching-Chun Chang, Chun-Shien Lu, Christopher Leckie, Isao Echizen

# 2025-01-23

-+ [Ensuring Medical AI Safety: Explainable AI-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data](https://arxiv.org//abs/2501.13818)
++ [Ensuring Medical AI Safety: Explainable AI-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data](https://arxiv.org/abs/2501.13818)

Frederik Pahde, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

-+ [Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving](https://arxiv.org//abs/2501.13563)
++ [Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving](https://arxiv.org/abs/2501.13563)

Lu Wang, Tianyuan Zhang, Yang Qu, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu, Dacheng Tao

-+ [Certified Robustness Under Bounded Levenshtein Distance](https://arxiv.org//abs/2501.13676)
++ [Certified Robustness Under Bounded Levenshtein Distance](https://arxiv.org/abs/2501.13676)

Elias Abad Rocamora, Grigorios G. Chrysos, Volkan Cevher

-+ [Defending against Adversarial Malware Attacks on ML-based Android Malware Detection Systems](https://arxiv.org//abs/2501.13782)
++ [Defending against Adversarial Malware Attacks on ML-based Android Malware Detection Systems](https://arxiv.org/abs/2501.13782)

Ping He, Lorenzo Cavallaro, Shouling Ji

-+ [Gradient-Free Adversarial Purification with Diffusion Models](https://arxiv.org//abs/2501.13336)
++ [Gradient-Free Adversarial Purification with Diffusion Models](https://arxiv.org/abs/2501.13336)

Xuelong Dai, Dong Wang, Duan Mingxing, Bin Xiao

-+ [Retrievals Can Be Detrimental: A Contrastive Backdoor Attack Paradigm on Retrieval-Augmented Diffusion Models](https://arxiv.org//abs/2501.13340)
++ [Retrievals Can Be Detrimental: A Contrastive Backdoor Attack Paradigm on Retrieval-Augmented Diffusion Models](https://arxiv.org/abs/2501.13340)

Hao Fang, Xiaohang Sui, Hongyao Yu, Jiawei Kong, Sijin Yu, Bin Chen, Hao Wu, Shu-Tao Xia

-+ [HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor](https://arxiv.org//abs/2501.13677)
++ [HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor](https://arxiv.org/abs/2501.13677)

Zihui Wu, Haichang Gao, Jiacheng Luo, Zhaoxiang Liu

-+ [Crossfire: An Elastic Defense Framework for Graph Neural Networks Under Bit Flip Attacks](https://arxiv.org//abs/2501.13776)
++ [Crossfire: An Elastic Defense Framework for Graph Neural Networks Under Bit Flip Attacks](https://arxiv.org/abs/2501.13776)

Lorenz Kummer, Samir Moustafa, Wilfried Gansterer, Nils Kriege

-+ [Device-aware Optical Adversarial Attack for a Portable Projector-camera System](https://arxiv.org//abs/2501.14005)
++ [Device-aware Optical Adversarial Attack for a Portable Projector-camera System](https://arxiv.org/abs/2501.14005)

Ning Jiang, Yanhong Liu, Dingheng Zeng, Yue Feng, Weihong Deng, Ying Li

-+ [Reinforcement Learning Platform for Adversarial Black-box Attacks with Custom Distortion Filters](https://arxiv.org//abs/2501.14122)
++ [Reinforcement Learning Platform for Adversarial Black-box Attacks with Custom Distortion Filters](https://arxiv.org/abs/2501.14122)

Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Ricardo Luna Gutierrez, Antonio Guillen

-+ [LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language](https://arxiv.org//abs/2501.14073)
++ [LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language](https://arxiv.org/abs/2501.14073)

Yubin Ge, Neeraja Kirtane, Hao Peng, Dilek Hakkani-Tür

-+ [Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models](https://arxiv.org//abs/2501.13772)
++ [Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models](https://arxiv.org/abs/2501.13772)

Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Sheng, Philip Torr, Jindong Gu, Renjing Xu

-+ [PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy](https://arxiv.org//abs/2501.13916)
++ [PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy](https://arxiv.org/abs/2501.13916)

Linh Tran, Timothy Castiglia, Stacy Patterson, Ana Milanova

# 2025-01-22

-+ [Robust Representation Consistency Model via Contrastive Denoising](https://arxiv.org//abs/2501.13094)
++ [Robust Representation Consistency Model via Contrastive Denoising](https://arxiv.org/abs/2501.13094)

Jiachen Lei, Julius Berner, Jiongxiao Wang, Zhongzhu Chen, Zhongjia Ba, Kui Ren, Jun Zhu, Anima Anandkumar

-+ [Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment](https://arxiv.org//abs/2501.13080)
++ [Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment](https://arxiv.org/abs/2501.13080)

Melissa Kazemi Rad, Huy Nghiem, Andy Luo, Sahil Wadhwa, Mohammad Sorower, Stephen Rawls

-+ [Modality Unified Attack for Omni-Modality Person Re-Identification](https://arxiv.org//abs/2501.12761)
++ [Modality Unified Attack for Omni-Modality Person Re-Identification](https://arxiv.org/abs/2501.12761)

Yuan Bian, Min Liu, Yunqi Yi, Xueping Wang, Yunfeng Ma, Yaonan Wang

-+ [Bad-PFL: Exploring Backdoor Attacks against Personalized Federated Learning](https://arxiv.org//abs/2501.12736)
++ [Bad-PFL: Exploring Backdoor Attacks against Personalized Federated Learning](https://arxiv.org/abs/2501.12736)

Mingyuan Fan, Zhanyi Hu, Fuyi Wang, Cen Chen

-+ [Intelligent Attacks on Cyber-Physical Systems and Critical Infrastructures](https://arxiv.org//abs/2501.12762)
++ [Intelligent Attacks on Cyber-Physical Systems and Critical Infrastructures](https://arxiv.org/abs/2501.12762)

Alan Oliveira de Sá, Charles Bezerra Prado, Mariana Luiza Flavio, Luiz F. Rust da C. Carmo

-+ [FedDAG: Federated Domain Adversarial Generation Towards Generalizable Medical Image Analysis](https://arxiv.org//abs/2501.13967)
++ [FedDAG: Federated Domain Adversarial Generation Towards Generalizable Medical Image Analysis](https://arxiv.org/abs/2501.13967)

Haoxuan Che, Yifei Wu, Haibo Jin, Yong Xia, Hao Chen

-+ [A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning](https://arxiv.org//abs/2501.12911)
++ [A Selective Homomorphic Encryption Approach for Faster Privacy-Preserving Federated Learning](https://arxiv.org/abs/2501.12911)

Abdulkadir Korkmaz, Praveen Rao

-+ [Unveiling Zero-Space Detection: A Novel Framework for Autonomous Ransomware Identification in High-Velocity Environments](https://arxiv.org//abs/2501.12811)
++ [Unveiling Zero-Space Detection: A Novel Framework for Autonomous Ransomware Identification in High-Velocity Environments](https://arxiv.org/abs/2501.12811)

Lafedi Svet, Arthur Brightwell, Augustus Wildflower, Cecily Marshwood

-+ [ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality](https://arxiv.org//abs/2501.12553)
++ [ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality](https://arxiv.org/abs/2501.12553)

Yanming Xiu, Tim Scargill, Maria Gorlatova

# 2025-01-21

-+ [FedCLEAN: byzantine defense by CLustering Errors of Activation maps in Non-IID federated learning environments](https://arxiv.org//abs/2501.12123)
++ [FedCLEAN: byzantine defense by CLustering Errors of Activation maps in Non-IID federated learning environments](https://arxiv.org/abs/2501.12123)

Mehdi Ben Ghali, Reda Bellafqira, Gouenou Coatrieux

-+ [With Great Backbones Comes Great Adversarial Transferability](https://arxiv.org//abs/2501.12275)
++ [With Great Backbones Comes Great Adversarial Transferability](https://arxiv.org/abs/2501.12275)

Erik Arakelyan, Karen Hambardzumyan, Davit Papikyan, Pasquale Minervini, Albert Gordo, Isabelle Augenstein, Aram H. Markosyan

-+ [Cross-Entropy Attacks to Language Models via Rare Event Simulation](https://arxiv.org//abs/2501.11852)
++ [Cross-Entropy Attacks to Language Models via Rare Event Simulation](https://arxiv.org/abs/2501.11852)

Mingze Ni, Yongshun Gong, Wei Liu

-+ [Extend Adversarial Policy Against Neural Machine Translation via Unknown Token](https://arxiv.org//abs/2501.12183)
++ [Extend Adversarial Policy Against Neural Machine Translation via Unknown Token](https://arxiv.org/abs/2501.12183)

Wei Zou, Shujian Huang, Jiajun Chen

-+ [CogMorph: Cognitive Morphing Attacks for Text-to-Image Models](https://arxiv.org//abs/2501.11815)
++ [CogMorph: Cognitive Morphing Attacks for Text-to-Image Models](https://arxiv.org/abs/2501.11815)

Zonglei Jing, Zonghao Ying, Le Wang, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao

-+ [Enhancing Adversarial Transferability via Component-Wise Augmentation Method](https://arxiv.org//abs/2501.11901)
++ [Enhancing Adversarial Transferability via Component-Wise Augmentation Method](https://arxiv.org/abs/2501.11901)

Hangyu Liu, Bo Peng, Pengxiang Ding, Donglin Wang

-+ [Provably effective detection of effective data poisoning attacks](https://arxiv.org//abs/2501.11795)
++ [Provably effective detection of effective data poisoning attacks](https://arxiv.org/abs/2501.11795)

Jonathan Gallagher, Yasaman Esfandiari, Callen MacPhee, Michael Warren

-+ [FedMUA: Exploring the Vulnerabilities of Federated Learning to Malicious Unlearning Attacks](https://arxiv.org//abs/2501.11848)
++ [FedMUA: Exploring the Vulnerabilities of Federated Learning to Malicious Unlearning Attacks](https://arxiv.org/abs/2501.11848)

Jian Chen, Zehui Lin, Wanyu Lin, Wenlong Shi, Xiaoyan Yin, Di Wang

-+ [You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense](https://arxiv.org//abs/2501.12210)
++ [You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense](https://arxiv.org/abs/2501.12210)

Wuyuao Mai, Geng Hong, Pei Chen, Xudong Pan, Baojun Liu, Yuan Zhang, Haixin Duan, Min Yang

-+ [An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts](https://arxiv.org//abs/2501.12521)
++ [An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts](https://arxiv.org/abs/2501.12521)

Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan

-+ [Robustness of Selected Learning Models under Label-Flipping Attack](https://arxiv.org//abs/2501.12516)
++ [Robustness of Selected Learning Models under Label-Flipping Attack](https://arxiv.org/abs/2501.12516)

Sarvagya Bhargava, Mark Stamp

-+ [Topology of Out-of-Distribution Examples in Deep Neural Networks](https://arxiv.org//abs/2501.12522)
++ [Topology of Out-of-Distribution Examples in Deep Neural Networks](https://arxiv.org/abs/2501.12522)

Esha Datta, Johanna Hennig, Eva Domschot, Connor Mattes, Michael R. Smith

# 2025-01-20

-+ [Trojan Detection Through Pattern Recognition for Large Language Models](https://arxiv.org//abs/2501.11621)
++ [Trojan Detection Through Pattern Recognition for Large Language Models](https://arxiv.org/abs/2501.11621)

Vedant Bhasin, Matthew Yudin, Razvan Stefanescu, Rauf Izmailov

-+ [On the Adversarial Vulnerabilities of Transfer Learning in Remote Sensing](https://arxiv.org//abs/2501.11462)
++ [On the Adversarial Vulnerabilities of Transfer Learning in Remote Sensing](https://arxiv.org/abs/2501.11462)

Tao Bai, Xingjian Tian, Yonghao Xu, Bihan Wen

-+ [Rethinking Membership Inference Attacks Against Transfer Learning](https://arxiv.org//abs/2501.11577)
++ [Rethinking Membership Inference Attacks Against Transfer Learning](https://arxiv.org/abs/2501.11577)

Cong Wu, Jing Chen, Qianru Fang, Kun He, Ziming Zhao, Hao Ren, Guowen Xu, Yang Liu, Yang Xiang

# 2025-01-19

-+ [Tell me about yourself: LLMs are aware of their learned behaviors](https://arxiv.org//abs/2501.11120)
++ [Tell me about yourself: LLMs are aware of their learned behaviors](https://arxiv.org/abs/2501.11120)

Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans

-+ [Explainable Adversarial Attacks on Coarse-to-Fine Classifiers](https://arxiv.org//abs/2501.10906)
++ [Explainable Adversarial Attacks on Coarse-to-Fine Classifiers](https://arxiv.org/abs/2501.10906)

Akram Heidarizadeh, Connor Hatfield, Lorenzo Lazzarotto, HanQin Cai, George Atia

-+ [GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models](https://arxiv.org//abs/2501.10985)
++ [GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models](https://arxiv.org/abs/2501.10985)

Jiadong Lou, Xu Yuan, Rui Zhang, Xingliang Yuan, Neil Gong, Nian-Feng Tzeng

-+ [Temporal Analysis of Adversarial Attacks in Federated Learning](https://arxiv.org//abs/2501.11054)
++ [Temporal Analysis of Adversarial Attacks in Federated Learning](https://arxiv.org/abs/2501.11054)

Rohit Mapakshi, Sayma Akther, Mark Stamp

-+ [Federated Testing (FedTest): A New Scheme to Enhance Convergence and Mitigate Adversarial Attacks in Federating Learning](https://arxiv.org//abs/2501.11167)
++ [Federated Testing (FedTest): A New Scheme to Enhance Convergence and Mitigate Adversarial Attacks in Federating Learning](https://arxiv.org/abs/2501.11167)

Mustafa Ghaleb, Mohanad Obeed, Muhamad Felemban, Anas Chaaban, Halim Yanikomeroglu

-+ [Effectiveness of Adversarial Benign and Malware Examples in Evasion and Poisoning Attacks](https://arxiv.org//abs/2501.10996)
++ [Effectiveness of Adversarial Benign and Malware Examples in Evasion and Poisoning Attacks](https://arxiv.org/abs/2501.10996)

Matouš Kozák, Martin Jureček

# 2025-01-18

-+ [Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks](https://arxiv.org//abs/2501.10639)
++ [Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks](https://arxiv.org/abs/2501.10639)

Xin Yi, Yue Li, Linlin Wang, Xiaoling Wang, Liang He

-+ [Jailbreaking Large Language Models in Infinitely Many Ways](https://arxiv.org//abs/2501.10800)
++ [Jailbreaking Large Language Models in Infinitely Many Ways](https://arxiv.org/abs/2501.10800)

Oliver Goldstein, Emanuele La Malfa, Felix Drinkall, Samuele Marro, Michael Wooldridge

-+ [Certifying Robustness via Topological Representations](https://arxiv.org//abs/2501.10876)
++ [Certifying Robustness via Topological Representations](https://arxiv.org/abs/2501.10876)

Jens Agerberg, Andrea Guidolin, Andrea Martinelli, Pepijn Roos Hoefgeest, David Eklund, Martina Scolamiero

@@ -15490,294 +15490,294 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Bin Han, Ye Yuan, Hans D. Schotten

# 2025-01-17

-+ [Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach](https://arxiv.org//abs/2501.10202)
++ [Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach](https://arxiv.org/abs/2501.10202)

Nicolas Atienza, Christophe Labreuche, Johanne Cohen, Michele Sebag

-+ [CaFA: Cost-aware, Feasible Attacks With Database Constraints Against Neural Tabular Classifiers](https://arxiv.org//abs/2501.10013)
++ [CaFA: Cost-aware, Feasible Attacks With Database Constraints Against Neural Tabular Classifiers](https://arxiv.org/abs/2501.10013)

Matan Ben-Tov, Daniel Deutch, Nave Frost, Mahmood Sharif

-+ [Michscan: Black-Box Neural Network Integrity Checking at Runtime Through Power Analysis](https://arxiv.org//abs/2501.10174)
++ [Michscan: Black-Box Neural Network Integrity Checking at Runtime Through Power Analysis](https://arxiv.org/abs/2501.10174)

Robi Paul, Michael Zuzak

-+ [Differentiable Adversarial Attacks for Marked Temporal Point Processes](https://arxiv.org//abs/2501.10606)
++ [Differentiable Adversarial Attacks for Marked Temporal Point Processes](https://arxiv.org/abs/2501.10606)

Pritish Chakraborty, Vinayak Gupta, Rahul R, Srikanta J. Bedathur, Abir De

# 2025-01-16

-+ [A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy](https://arxiv.org//abs/2501.09431)
++ [A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy](https://arxiv.org/abs/2501.09431)

Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li

-+ [Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks](https://arxiv.org//abs/2501.09328)
++ [Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks](https://arxiv.org/abs/2501.09328)

Yixiao Xu, Binxing Fang, Rui Wang, Yinghai Zhou, Shouling Ji, Yuan Liu, Mohan Li, Zhihong Tian

-+ [Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness](https://arxiv.org//abs/2501.09446)
++ [Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness](https://arxiv.org/abs/2501.09446)

Zeyu Wang, Cihang Xie, Brian Bartoldson, Bhavya Kailkhura

-+ [Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning](https://arxiv.org//abs/2501.09320)
++ [Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning](https://arxiv.org/abs/2501.09320)

Seohyun Lee, Wenzhi Fang, Anindya Bijoy Das, Seyyedali Hosseinalipour, David J. Love, Christopher G. Brinton

-+ [Adversarial-Ensemble Kolmogorov Arnold Networks for Enhancing Indoor Wi-Fi Positioning: A Defensive Approach Against Spoofing and Signal Manipulation Attacks](https://arxiv.org//abs/2501.09609)
++ [Adversarial-Ensemble Kolmogorov Arnold Networks for Enhancing Indoor Wi-Fi Positioning: A Defensive Approach Against Spoofing and Signal Manipulation Attacks](https://arxiv.org/abs/2501.09609)

Mitul Goswami, Romit Chatterjee, Somnath Mahato, Prasant Kumar Pattnaik

-+ [Enhancing Generalization in Chain of Thought Reasoning for Smaller Models](https://arxiv.org//abs/2501.09804)
++ [Enhancing Generalization in Chain of Thought Reasoning for Smaller Models](https://arxiv.org/abs/2501.09804)

Maxwell J. Yin, Dingyi Jiang, Yongbing Chen, Boyu Wang, Charles Ling

-+ [Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer](https://arxiv.org//abs/2501.09817)
++ [Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer](https://arxiv.org/abs/2501.09817)

Haoyu Zhang, Raghavendra Ramachandra, Kiran Raja, Christoph Busch

-+ [Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API](https://arxiv.org//abs/2501.09798)
++ [Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API](https://arxiv.org/abs/2501.09798)

Andrey Labunets, Nishit V. Pandya, Ashish Hooda, Xiaohan Fu, Earlence Fernandes

# 2025-01-15

-+ [Salient Information Preserving Adversarial Training Improves Clean and Robust Accuracy](https://arxiv.org//abs/2501.09086)
++ [Salient Information Preserving Adversarial Training Improves Clean and Robust Accuracy](https://arxiv.org/abs/2501.09086)

Timothy Redgrave, Adam Czajka

-+ [Improving the Efficiency of Self-Supervised Adversarial Training through Latent Clustering-Based Selection](https://arxiv.org//abs/2501.10466)
++ [Improving the Efficiency of Self-Supervised Adversarial Training through Latent Clustering-Based Selection](https://arxiv.org/abs/2501.10466)

Somrita Ghosh, Yuelin Xu, Xiao Zhang

# 2025-01-14

-+ [Self-Instruct Few-Shot Jailbreaking: Decompose the Attack into Pattern and Behavior Learning](https://arxiv.org//abs/2501.07959)
++ [Self-Instruct Few-Shot Jailbreaking: Decompose the Attack into Pattern and Behavior Learning](https://arxiv.org/abs/2501.07959)

Jiaqi Hua, Wanxu Wei

-+ [Gandalf the Red: Adaptive Security for LLMs](https://arxiv.org//abs/2501.07927)
++ [Gandalf the Red: Adaptive Security for LLMs](https://arxiv.org/abs/2501.07927)

Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla

-+ [READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data](https://arxiv.org//abs/2501.08035)
++ [READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data](https://arxiv.org/abs/2501.08035)

Rohit Sharma, Shanu Kumar, Avinash Kumar

-+ [ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving](https://arxiv.org//abs/2501.08203)
++ [ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving](https://arxiv.org/abs/2501.08203)

Zain Ul Abedin, Shahzeb Qamar, Lucie Flek, Akbar Karimi

-+ [VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models](https://arxiv.org//abs/2501.07922)
++ [VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models](https://arxiv.org/abs/2501.07922)

Hui Kuurila-Zhang, Haoyu Chen, Guoying Zhao

-+ [Energy Backdoor Attack to Deep Neural Networks](https://arxiv.org//abs/2501.08152)
++ [Energy Backdoor Attack to Deep Neural Networks](https://arxiv.org/abs/2501.08152)

Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, Olivier Déforges, Kassem Kallas

-+ [Towards an End-to-End (E2E) Adversarial Learning and Application in the Physical World](https://arxiv.org//abs/2501.08258)
++ [Towards an End-to-End (E2E) Adversarial Learning and Application in the Physical World](https://arxiv.org/abs/2501.08258)

Dudi Biton, Jacob Shams, Koda Satoru, Asaf Shabtai, Yuval Elovici, Ben Nassi

-+ [Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints](https://arxiv.org//abs/2501.08246)
++ [Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints](https://arxiv.org/abs/2501.08246)

Jonathan Nöther, Adish Singla, Goran Radanović

-+ [Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack](https://arxiv.org//abs/2501.08454)
++ [Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack](https://arxiv.org/abs/2501.08454)

Sagiv Antebi, Edan Habler, Asaf Shabtai, Yuval Elovici

# 2025-01-13

-+ [Lessons From Red Teaming 100 Generative AI Products](https://arxiv.org//abs/2501.07238)
++ [Lessons From Red Teaming 100 Generative AI Products](https://arxiv.org/abs/2501.07238)

Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, Whitney Maxwell, Joris de Gruyter, Katherine Pratt, Saphir Qi, Nina Chikanov, Roman Lutz, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Eugenia Kim, Justin Song, Keegan Hines, Daniel Jones, Giorgio Severi, Richard Lundeen, Sam Vaughan, Victoria Westerhoff, Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich

-+ [MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework](https://arxiv.org//abs/2501.07251)
++ [MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework](https://arxiv.org/abs/2501.07251)

Ping Guo, Cheng Gong, Xi Lin, Fei Liu, Zhichao Lu, Qingfu Zhang, Zhenkun Wang

-+ [Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities](https://arxiv.org//abs/2501.07044)
++ [Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities](https://arxiv.org/abs/2501.07044)

Jialin Wu, Kaikai Pan, Yanjiao Chen, Jiangyi Deng, Shengyuan Pang, Wenyuan Xu

-+ [Generating Poisoning Attacks against Ridge Regression Models with Categorical Features](https://arxiv.org//abs/2501.07275)
++ [Generating Poisoning Attacks against Ridge Regression Models with Categorical Features](https://arxiv.org/abs/2501.07275)

Monse Guedes-Ayala, Lars Schewe, Zeynep Suvak, Miguel Anjos

-+ [Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards](https://arxiv.org//abs/2501.07493)
++ [Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards](https://arxiv.org/abs/2501.07493)

Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, Ion Stoica, Florian Tramer, Chiyuan Zhang

-+ [Pantomime: Motion Data Anonymization using Foundation Motion Models](https://arxiv.org//abs/2501.07149)
++ [Pantomime: Motion Data Anonymization using Foundation Motion Models](https://arxiv.org/abs/2501.07149)

Simon Hanisch, Julian Todt, Thorsten Strufe

# 2025-01-12

-+ [Measuring the Robustness of Reference-Free Dialogue Evaluation Systems](https://arxiv.org//abs/2501.06728)
++ [Measuring the Robustness of Reference-Free Dialogue Evaluation Systems](https://arxiv.org/abs/2501.06728)

Justin Vasselli, Adam Nohejl, Taro Watanabe

-+ [ZOQO: Zero-Order Quantized Optimization](https://arxiv.org//abs/2501.06736)
++ [ZOQO: Zero-Order Quantized Optimization](https://arxiv.org/abs/2501.06736)

Noga Bar, Raja Giryes

-+ [Understanding and Mitigating Membership Inference Risks of Neural Ordinary Differential Equations](https://arxiv.org//abs/2501.06686)
++ [Understanding and Mitigating Membership Inference Risks of Neural Ordinary Differential Equations](https://arxiv.org/abs/2501.06686)

Sanghyun Hong, Fan Wu, Anthony Gruber, Kookjin Lee

-+ [KeTS: Kernel-based Trust Segmentation against Model Poisoning Attacks](https://arxiv.org//abs/2501.06729)
++ [KeTS: Kernel-based Trust Segmentation against Model Poisoning Attacks](https://arxiv.org/abs/2501.06729)

Ankit Gangwal, Mauro Conti, Tommaso Pauselli

# 2025-01-11

-+ [DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy](https://arxiv.org//abs/2501.06533)
++ [DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy](https://arxiv.org/abs/2501.06533)

Wenshu Fan, Minxing Zhang, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Xiangyu Yue, Michael Backes, Xiao Zhang

-+ [SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in Split Learning](https://arxiv.org//abs/2501.06650)
++ [SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in Split Learning](https://arxiv.org/abs/2501.06650)

Phillip Rieger, Alessandro Pegoraro, Kavita Kumari, Tigist Abera, Jonathan Knauer, Ahmad-Reza Sadeghi

# 2025-01-10

-+ [UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping](https://arxiv.org//abs/2501.05783)
++ [UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping](https://arxiv.org/abs/2501.05783)

Yanjie Li, Wenxuan Zhang, Kaisheng Liang, Bin Xiao

-+ [Towards Backdoor Stealthiness in Model Parameter Space](https://arxiv.org//abs/2501.05928)
++ [Towards Backdoor Stealthiness in Model Parameter Space](https://arxiv.org/abs/2501.05928)

Xiaoyun Xu, Zhuoran Liu, Stefanos Koffas, Stjepan Picek

-+ [Effective faking of verbal deception detection with target-aligned adversarial attacks](https://arxiv.org//abs/2501.05962)
++ [Effective faking of verbal deception detection with target-aligned adversarial attacks](https://arxiv.org/abs/2501.05962)

Bennett Kleinberg, Riccardo Loconte, Bruno Verschuere

-+ [Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data](https://arxiv.org//abs/2501.05835)
++ [Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data](https://arxiv.org/abs/2501.05835)

Jiale Zhang, Bosen Rao, Chengcheng Zhu, Xiaobing Sun, Qingming Li, Haibo Hu, Xiapu Luo, Qingqing Ye, Shouling Ji

-+ [Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory](https://arxiv.org//abs/2501.05965)
++ [Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory](https://arxiv.org/abs/2501.05965)

Yunmeng Shu, Shaofeng Li, Tian Dong, Yan Meng, Haojin Zhu

-+ [SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech](https://arxiv.org//abs/2505.09616)
++ [SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech](https://arxiv.org/abs/2505.09616)

Yuqi Li, Yuanzhong Zheng, Zhongtian Guo, Yaoxuan Wang, Jianjun Yin, Haojun Fei

-+ [ActMiner: Applying Causality Tracking and Increment Aligning for Graph-based Cyber Threat Hunting](https://arxiv.org//abs/2501.05793)
++ [ActMiner: Applying Causality Tracking and Increment Aligning for Graph-based Cyber Threat Hunting](https://arxiv.org/abs/2501.05793)

Mingjun Ma, Tiantian Zhu, Shuang Li, Tieming Chen, Mingqi Lv, Zhengqiu Weng, Guolang Chen

# 2025-01-09

-+ [Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency](https://arxiv.org//abs/2501.04931)
++ [Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency](https://arxiv.org/abs/2501.04931)

Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei

-+ [On Measuring Unnoticeability of Graph Adversarial Attacks: Observations, New Measure, and Applications](https://arxiv.org//abs/2501.05015)
++ [On Measuring Unnoticeability of Graph Adversarial Attacks: Observations, New Measure, and Applications](https://arxiv.org/abs/2501.05015)

Hyeonsoo Jo, Hyunjin Hwang, Fanchen Bu, Soo Yong Lee, Chanyoung Park, Kijung Shin

-+ [TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning](https://arxiv.org//abs/2501.05053)
++ [TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning](https://arxiv.org/abs/2501.05053)

Runhua Xu, Bo Li, Chao Li, James B.D. Joshi, Shuai Ma, Jianxin Li

-+ [CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models](https://arxiv.org//abs/2501.05359)
++ [CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2501.05359)

Junha Park, Ian Ryu, Jaehui Hwang, Hyungkeun Park, Jiyoon Kim, Jong-Seok Lee

-+ [Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks on Traffic Scene Perception](https://arxiv.org//abs/2501.05239)
++ [Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks on Traffic Scene Perception](https://arxiv.org/abs/2501.05239)

Wenhao Liao, Sineng Yan, Youqian Zhang, Xinwei Zhai, Yuanyuan Wang, Eugene Yujun Fu

-+ [Targeted Adversarial Denoising Autoencoders (TADA) for Neural Time Series Filtration](https://arxiv.org//abs/2501.04967)
++ [Targeted Adversarial Denoising Autoencoders (TADA) for Neural Time Series Filtration](https://arxiv.org/abs/2501.04967)

Benjamin J. Choi, Griffin Milsap, Clara A. Scholl, Francesco Tenore, Mattson Ogg

# 2025-01-08

-+ [Towards Fair Class-wise Robustness: Class Optimal Distribution Adversarial Training](https://arxiv.org//abs/2501.04527)
++ [Towards Fair Class-wise Robustness: Class Optimal Distribution Adversarial Training](https://arxiv.org/abs/2501.04527)

Hongxin Zhi, Hongtao Yu, Shaome Li, Xiuming Zhao, Yiteng Wu

-+ [Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning](https://arxiv.org//abs/2501.04453)
++ [Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning](https://arxiv.org/abs/2501.04453)

Bin Li, Xiaoye Miao, Yongheng Shang, Xinkui Zhao, Shuiguang Deng, Jianwei Yin

-+ [Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval](https://arxiv.org//abs/2501.04802)
++ [Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval](https://arxiv.org/abs/2501.04802)

Yongkang Li, Panagiotis Eustratiadis, Evangelos Kanoulas

-+ [LayerMix: Enhanced Data Augmentation through Fractal Integration for Robust Deep Learning](https://arxiv.org//abs/2501.04861)
++ [LayerMix: Enhanced Data Augmentation through Fractal Integration for Robust Deep Learning](https://arxiv.org/abs/2501.04861)

Hafiz Mughees Ahmad, Dario Morle, Afshin Rahimi

# 2025-01-07

-+ [Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective](https://arxiv.org//abs/2501.03562)
++ [Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective](https://arxiv.org/abs/2501.03562)

Tianyang Duan, Zongyuan Zhang, Zheng Lin, Yue Gao, Ling Xiong, Yong Cui, Hongbin Liang, Xianhao Chen, Heming Cui, Dong Huang

-+ [Synthetic Data Privacy Metrics](https://arxiv.org//abs/2501.03941)
++ [Synthetic Data Privacy Metrics](https://arxiv.org/abs/2501.03941)

Amy Steier, Lipika Ramaswamy, Andre Manoel, Alexa Haushalter

-+ [An Empirical Study of Accuracy-Robustness Tradeoff and Training Efficiency in Self-Supervised Learning](https://arxiv.org//abs/2501.03507)
++ [An Empirical Study of Accuracy-Robustness Tradeoff and Training Efficiency in Self-Supervised Learning](https://arxiv.org/abs/2501.03507)

Fatemeh Ghofrani, Pooyan Jamshidi

-+ [MADation: Face Morphing Attack Detection with Foundation Models](https://arxiv.org//abs/2501.03800)
++ [MADation: Face Morphing Attack Detection with Foundation Models](https://arxiv.org/abs/2501.03800)

Eduarda Caldeira, Guray Ozgur, Tahar Chettaoui, Marija Ivanovska, Fadi Boutros, Vitomir Struc, Naser Damer

-+ [Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection](https://arxiv.org//abs/2501.03940)
++ [Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection](https://arxiv.org/abs/2501.03940)

Pablo Miralles-González, Javier Huertas-Tato, Alejandro Martín, David Camacho

@@ -15786,230 +15786,230 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Mario Bravo, Juan P. Flores-Mella, Cristóbal Guzmán

# 2025-01-06

-+ [From Models to Network Topologies: A Topology Inference Attack in Decentralized Federated Learning](https://arxiv.org//abs/2501.03119)
++ [From Models to Network Topologies: A Topology Inference Attack in Decentralized Federated Learning](https://arxiv.org/abs/2501.03119)

Chao Feng, Yuanzhe Gao, Alberto Huertas Celdran, Gerome Bovet, Burkhard Stiller

-+ [MBTSAD: Mitigating Backdoors in Language Models Based on Token Splitting and Attention Distillation](https://arxiv.org//abs/2501.02754)
++ [MBTSAD: Mitigating Backdoors in Language Models Based on Token Splitting and Attention Distillation](https://arxiv.org/abs/2501.02754)

Yidong Ding, Jiafei Niu, Ping Yi

-+ [Persistence of Backdoor-based Watermarks for Neural Networks: A Comprehensive Evaluation](https://arxiv.org//abs/2501.02704)
++ [Persistence of Backdoor-based Watermarks for Neural Networks: A Comprehensive Evaluation](https://arxiv.org/abs/2501.02704)

Anh Tu Ngo, Chuan Song Heng, Nandish Chattopadhyay, Anupam Chattopadhyay

-+ [Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective](https://arxiv.org//abs/2501.03301)
++ [Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective](https://arxiv.org/abs/2501.03301)

Zhongjian Zhang, Mengmei Zhang, Xiao Wang, Lingjuan Lyu, Bo Yan, Junping Du, Chuan Shi

-+ [DAMAGE: Detecting Adversarially Modified AI Generated Text](https://arxiv.org//abs/2501.03437)
++ [DAMAGE: Detecting Adversarially Modified AI Generated Text](https://arxiv.org/abs/2501.03437)

Elyas Masrour, Bradley Emi, Max Spero

-+ [The Robustness of Spiking Neural Networks in Federated Learning with Compression Against Non-omniscient Byzantine Attacks](https://arxiv.org//abs/2501.03306)
++ [The Robustness of Spiking Neural Networks in Federated Learning with Compression Against Non-omniscient Byzantine Attacks](https://arxiv.org/abs/2501.03306)

Manh V. Nguyen, Liang Zhao, Bobin Deng, Shaoen Wu

-+ [On the Adversarial Robustness of Benjamini Hochberg](https://arxiv.org//abs/2501.03402)
++ [On the Adversarial Robustness of Benjamini Hochberg](https://arxiv.org/abs/2501.03402)

Louis L Chen, Roberto Szechtman, Matan Seri

# 2025-01-05

-+ [Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense](https://arxiv.org//abs/2501.02629)
++ [Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense](https://arxiv.org/abs/2501.02629)

Yang Ouyang, Hengrui Gu, Shuhang Lin, Wenyue Hua, Jie Peng, Bhavya Kailkhura, Tianlong Chen, Kaixiong Zhou

-+ [Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks](https://arxiv.org//abs/2501.02654)
++ [Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks](https://arxiv.org/abs/2501.02654)

Yang Wang, Chenghua Lin

-+ [GCP: Guarded Collaborative Perception with Spatial-Temporal Aware Malicious Agent Detection](https://arxiv.org//abs/2501.02450)
++ [GCP: Guarded Collaborative Perception with Spatial-Temporal Aware Malicious Agent Detection](https://arxiv.org/abs/2501.02450)

Yihang Tao, Senkang Hu, Yue Hu, Haonan An, Hangcheng Cao, Yuguang Fang

-+ [Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models](https://arxiv.org//abs/2501.03272)
++ [Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models](https://arxiv.org/abs/2501.03272)

Peihai Jiang, Xixiang Lyu, Yige Li, Jing Ma

-+ [Towards the Anonymization of the Language Modeling](https://arxiv.org//abs/2501.02407)
++ [Towards the Anonymization of the Language Modeling](https://arxiv.org/abs/2501.02407)

Antoine Boutet, Lucas Magnana, Juliette Sénéchal, Helain Zimmermann

# 2025-01-04

-+ [AdaMixup: A Dynamic Defense Framework for Membership Inference Attack Mitigation](https://arxiv.org//abs/2501.02182)
++ [AdaMixup: A Dynamic Defense Framework for Membership Inference Attack Mitigation](https://arxiv.org/abs/2501.02182)

Ying Chen, Jiajing Chen, Yijie Weng, ChiaHua Chang, Dezhi Yu, Guanbiao Lin

-+ [Distillation-Enhanced Physical Adversarial Attacks](https://arxiv.org//abs/2501.02232)
++ [Distillation-Enhanced Physical Adversarial Attacks](https://arxiv.org/abs/2501.02232)

Wei Liu, Yonglin Wu, Chaoqun Li, Zhuodong Liu, Huanqian Yan

-+ [BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors](https://arxiv.org//abs/2501.02373)
++ [BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors](https://arxiv.org/abs/2501.02373)

Chia-Yi Hsu, Yu-Lin Tsai, Yu Zhe, Yan-Lun Chen, Chih-Hsun Lin, Chia-Mu Yu, Yang Zhang, Chun-Ying Huang, Jun Sakuma

-+ [Exploring Secure Machine Learning Through Payload Injection and FGSM Attacks on ResNet-50](https://arxiv.org//abs/2501.02147)
++ [Exploring Secure Machine Learning Through Payload Injection and FGSM Attacks on ResNet-50](https://arxiv.org/abs/2501.02147)

Umesh Yadav, Suman Niraula, Gaurav Kumar Gupta, Bicky Yadav

# 2025-01-03

-+ [BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems](https://arxiv.org//abs/2501.01593)
++ [BLAST: A Stealthy Backdoor Leverage Attack against Cooperative Multi-Agent Deep Reinforcement Learning based Systems](https://arxiv.org/abs/2501.01593)

Yinbo Yu, Saihao Yan, Xueyu Yin, Jing Fang, Jiajia Liu

-+ [Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models](https://arxiv.org//abs/2501.01830)
++ [Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models](https://arxiv.org/abs/2501.01830)

Yanjiang Liu, Shuhen Zhou, Yaojie Lu, Huijia Zhu, Weiqiang Wang, Hongyu Lin, Ben He, Xianpei Han, Le Sun

-+ [Mingling with the Good to Backdoor Federated Learning](https://arxiv.org//abs/2501.01913)
++ [Mingling with the Good to Backdoor Federated Learning](https://arxiv.org/abs/2501.01913)

Nuno Neves

-+ [Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions](https://arxiv.org//abs/2501.01872)
++ [Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions](https://arxiv.org/abs/2501.01872)

Rachneet Sachdeva, Rima Hazra, Iryna Gurevych

-+ [Detecting and Mitigating Adversarial Attacks on Deep Learning-Based MRI Reconstruction Without Any Retraining](https://arxiv.org//abs/2501.01908)
++ [Detecting and Mitigating Adversarial Attacks on Deep Learning-Based MRI Reconstruction Without Any Retraining](https://arxiv.org/abs/2501.01908)

Mahdi Saberi, Chi Zhang, Mehmet Akcakaya

-+ [Adaptive Meta-learning-based Adversarial Training for Robust Automatic Modulation Classification](https://arxiv.org//abs/2501.01620)
++ [Adaptive Meta-learning-based Adversarial Training for Robust Automatic Modulation Classification](https://arxiv.org/abs/2501.01620)

Amirmohammad Bamdad, Ali Owfi, Fatemeh Afghah

-+ [Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models](https://arxiv.org//abs/2501.02029)
++ [Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models](https://arxiv.org/abs/2501.02029)

Ziwei Zheng, Junyao Zhao, Le Yang, Lijun He, Fan Li

-+ [AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs](https://arxiv.org//abs/2501.02135)
++ [AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs](https://arxiv.org/abs/2501.02135)

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Yaoting Wang, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha

-+ [Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI](https://arxiv.org//abs/2501.02042)
++ [Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI](https://arxiv.org/abs/2501.02042)

Christopher Burger, Charles Walter, Thai Le, Lingwei Chen

-+ [How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models](https://arxiv.org//abs/2501.01741)
++ [How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models](https://arxiv.org/abs/2501.01741)

Simone Corbo, Luca Bancale, Valeria De Gennaro, Livia Lestingi, Vincenzo Scotti, Matteo Camilli

# 2025-01-02

-+ [Towards Adversarially Robust Deep Metric Learning](https://arxiv.org//abs/2501.01025)
++ [Towards Adversarially Robust Deep Metric Learning](https://arxiv.org/abs/2501.01025)

Xiaopeng Ke

-+ [Stealthy Backdoor Attack to Real-world Models in Android Apps](https://arxiv.org//abs/2501.01263)
++ [Stealthy Backdoor Attack to Real-world Models in Android Apps](https://arxiv.org/abs/2501.01263)

Jiali Wei, Ming Fan, Xicheng Zhang, Wenjing Jiao, Haijun Wang, Ting Liu

-+ [Boosting Adversarial Transferability with Spatial Adversarial Alignment](https://arxiv.org//abs/2501.01015)
++ [Boosting Adversarial Transferability with Spatial Adversarial Alignment](https://arxiv.org/abs/2501.01015)

Zhaoyu Chen, Haijing Guo, Kaixun Jiang, Jiyuan Fu, Xinyu Zhou, Dingkang Yang, Hao Tang, Bo Li, Wenqiang Zhang

-+ [Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs](https://arxiv.org//abs/2501.01042)
++ [Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs](https://arxiv.org/abs/2501.01042)

Linhao Huang, Xue Jiang, Zhiqiang Wang, Wentao Mo, Xi Xiao, Bo Han, Yongjie Yin, Feng Zheng

-+ [AIM: Additional Image Guided Generation of Transferable Adversarial Attacks](https://arxiv.org//abs/2501.01106)
++ [AIM: Additional Image Guided Generation of Transferable Adversarial Attacks](https://arxiv.org/abs/2501.01106)

Teng Li, Xingjun Ma, Yu-Gang Jiang

-+ [HoneypotNet: Backdoor Attacks Against Model Extraction](https://arxiv.org//abs/2501.01090)
++ [HoneypotNet: Backdoor Attacks Against Model Extraction](https://arxiv.org/abs/2501.01090)

Yixu Wang, Tianle Gu, Yan Teng, Yingchun Wang, Xingjun Ma

-+ [Best Transition Matrix Esitimation or Best Label Noise Robustness Classifier? Two Possible Methods to Enhance the Performance of T-revision](https://arxiv.org//abs/2501.01402)
++ [Best Transition Matrix Esitimation or Best Label Noise Robustness Classifier? Two Possible Methods to Enhance the Performance of T-revision](https://arxiv.org/abs/2501.01402)

Haixu Liu, Zerui Tao, Naihui Zhang, Sixing Liu

-+ [A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking](https://arxiv.org//abs/2501.01194)
++ [A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking](https://arxiv.org/abs/2501.01194)

Chaoyue Huang, Hanzhou Wu

-+ [Improving Robustness Estimates in Natural Language Explainable AI though Synonymity Weighted Similarity Measures](https://arxiv.org//abs/2501.01516)
++ [Improving Robustness Estimates in Natural Language Explainable AI though Synonymity Weighted Similarity Measures](https://arxiv.org/abs/2501.01516)

Christopher Burger

-+ [SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers](https://arxiv.org//abs/2501.01529)
++ [SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers](https://arxiv.org/abs/2501.01529)

Bhavna Gopal, Huanrui Yang, Mark Horton, Yiran Chen

-+ [Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs](https://arxiv.org//abs/2501.02018)
++ [Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs](https://arxiv.org/abs/2501.02018)

Joao Fonseca, Andrew Bell, Julia Stoyanovich

-+ [Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection](https://arxiv.org//abs/2501.01184)
++ [Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection](https://arxiv.org/abs/2501.01184)

Dat Nguyen, Marcella Astrid, Anis Kacem, Enjie Ghorbel, Djamila Aouada

-+ [Domain-invariant feature learning in brain MR imaging for content-based image retrieval](https://arxiv.org//abs/2501.01326)
++ [Domain-invariant feature learning in brain MR imaging for content-based image retrieval](https://arxiv.org/abs/2501.01326)

Shuya Tobari, Shuhei Tomoshige, Hayato Muraki, Kenichi Oishi, Hitoshi Iyatomi

# 2025-01-01

-+ [Everywhere Attack: Attacking Locally and Globally to Boost Targeted Transferability](https://arxiv.org//abs/2501.00707)
++ [Everywhere Attack: Attacking Locally and Globally to Boost Targeted Transferability](https://arxiv.org/abs/2501.00707)

Hui Zeng, Sanshuai Cui, Biwei Chen, Anjie Peng

-+ [Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines](https://arxiv.org//abs/2501.00745)
++ [Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines](https://arxiv.org/abs/2501.00745)

Xiyang Hu

-+ [Make Shuffling Great Again: A Side-Channel Resistant Fisher-Yates Algorithm for Protecting Neural Networks](https://arxiv.org//abs/2501.00798)
++ [Make Shuffling Great Again: A Side-Channel Resistant Fisher-Yates Algorithm for Protecting Neural Networks](https://arxiv.org/abs/2501.00798)

Leonard Puškáč, Marek Benovič, Jakub Breier, Xiaolu Hou

-+ [TrustRAG: Enhancing Robustness and Trustworthiness in RAG](https://arxiv.org//abs/2501.00879)
++ [TrustRAG: Enhancing Robustness and Trustworthiness in RAG](https://arxiv.org/abs/2501.00879)

Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan, Yue Chen, Zhenhao Li

-+ [Information Sifting Funnel: Privacy-preserving Collaborative Inference Against Model Inversion Attacks](https://arxiv.org//abs/2501.00824)
++ [Information Sifting Funnel: Privacy-preserving Collaborative Inference Against Model Inversion Attacks](https://arxiv.org/abs/2501.00824)

Rongke Liu

-+ [A Survey of Secure Semantic Communications](https://arxiv.org//abs/2501.00842)
++ [A Survey of Secure Semantic Communications](https://arxiv.org/abs/2501.00842)

Rui Meng, Song Gao, Dayu Fan, Haixiao Gao, Yining Wang, Xiaodong Xu, Bizhu Wang, Suyu Lv, Zhidi Zhang, Mengying Sun, Shujun Han, Chen Dong, Xiaofeng Tao, Ping Zhang

@@ -16019,387 +16019,387 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Hadi Askari, Shivanshu Gupta, Terry Tong, Fei Wang, Anshuman Chhabra, Muhao Chen

# 2024-12-31

-+ [Extending XReason: Formal Explanations for Adversarial Detection](https://arxiv.org//abs/2501.00537)
++ [Extending XReason: Formal Explanations for Adversarial Detection](https://arxiv.org/abs/2501.00537)

Amira Jemaa, Adnan Rashid, Sofiene Tahar

-+ [Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models](https://arxiv.org//abs/2501.00418)
++ [Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models](https://arxiv.org/abs/2501.00418)

Martin Pawelczyk, Lillian Sun, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju

-+ [A Method for Enhancing the Safety of Large Model Generation Based on Multi-dimensional Attack and Defense](https://arxiv.org//abs/2501.00517)
++ [A Method for Enhancing the Safety of Large Model Generation Based on Multi-dimensional Attack and Defense](https://arxiv.org/abs/2501.00517)

Keke Zhai

# 2024-12-30

-+ [Enhancing AI Safety Through the Fusion of Low Rank Adapters](https://arxiv.org//abs/2501.06208)
++ [Enhancing AI Safety Through the Fusion of Low Rank Adapters](https://arxiv.org/abs/2501.06208)

Satya Swaroop Gudipudi, Sreeram Vipparla, Harpreet Singh, Shashwat Goel, Ponnurangam Kumaraguru

-+ [ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation](https://arxiv.org//abs/2412.21123)
++ [ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation](https://arxiv.org/abs/2412.21123)

Ruixuan Liu, Toan Tran, Tianhao Wang, Hongsheng Hu, Shuo Wang, Li Xiong

-+ [BridgePure: Limited Protection Leakage Can Break Black-Box Data Protection](https://arxiv.org//abs/2412.21061)
++ [BridgePure: Limited Protection Leakage Can Break Black-Box Data Protection](https://arxiv.org/abs/2412.21061)

Yihan Wang, Yiwei Lu, Xiao-Shan Gao, Gautam Kamath, Yaoliang Yu

-+ [Inclusion 2024 Global Multimedia Deepfake Detection Challenge: Towards Multi-dimensional Face Forgery Detection](https://arxiv.org//abs/2412.20833)
++ [Inclusion 2024 Global Multimedia Deepfake Detection Challenge: Towards Multi-dimensional Face Forgery Detection](https://arxiv.org/abs/2412.20833)

Yi Zhang, Weize Gao, Changtao Miao, Man Luo, Jianshu Li, Wenzhong Deng, Zhe Li, Bingyu Hu, Weibin Yao, Yunfeng Diao, Wenbo Zhou, Tao Gong, Qi Chu

-+ [GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search](https://arxiv.org//abs/2412.20953)
++ [GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search](https://arxiv.org/abs/2412.20953)

Matan Ben-Tov, Mahmood Sharif

# 2024-12-29

-+ [On Adversarial Robustness of Language Models in Transfer Learning](https://arxiv.org//abs/2501.00066)
++ [On Adversarial Robustness of Language Models in Transfer Learning](https://arxiv.org/abs/2501.00066)

Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy

-+ [Adversarial Negotiation Dynamics in Generative Language Models](https://arxiv.org//abs/2501.00069)
++ [Adversarial Negotiation Dynamics in Generative Language Models](https://arxiv.org/abs/2501.00069)

Arinbjörn Kolbeinsson, Benedikt Kolbeinsson
# 2024-12-28 -+ [AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors](https://arxiv.org//abs/2501.00054) ++ [AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors](https://arxiv.org/abs/2501.00054) Mengnan Zhao, Lihe Zhang, Xingyi Yang, Tianhang Zheng, Baocai Yin -+ [LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models](https://arxiv.org//abs/2501.00055) ++ [LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models](https://arxiv.org/abs/2501.00055) Miao Yu, Junfeng Fang, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen -+ [Learning in Multiple Spaces: Few-Shot Network Attack Detection with Metric-Fused Prototypical Networks](https://arxiv.org//abs/2501.00050) ++ [Learning in Multiple Spaces: Few-Shot Network Attack Detection with Metric-Fused Prototypical Networks](https://arxiv.org/abs/2501.00050) Fernando Martinez-Lopez, Lesther Santana, Mohamed Rahouti # 2024-12-25 -+ [Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations](https://arxiv.org//abs/2412.18781) ++ [Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations](https://arxiv.org/abs/2412.18781) Shingo Ayabe, Takuto Otomo, Hiroshi Kera, Kazuhiko Kawamoto # 2024-12-24 -+ [Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases](https://arxiv.org//abs/2412.18295) ++ [Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases](https://arxiv.org/abs/2412.18295) Christian Di Maio, Cristian Cosci, Marco Maggini, Valentina Poggioni, Stefano Melacci -+ [Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges](https://arxiv.org//abs/2412.18365) ++ [Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges](https://arxiv.org/abs/2412.18365) Meixia He, Peican Zhu, Keke Tang, Yangming Guo -+ [Unveiling the Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks against GNN-Based Fraud Detectors](https://arxiv.org//abs/2412.18370) ++ [Unveiling the Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks against GNN-Based Fraud Detectors](https://arxiv.org/abs/2412.18370) Jinhyeok Choi, Heehyeon Kim, Joyce Jiyoung Whang -+ [Robustness-aware Automatic Prompt Optimization](https://arxiv.org//abs/2412.18196) ++ [Robustness-aware Automatic Prompt Optimization](https://arxiv.org/abs/2412.18196) Zeru Shi, Zhenting Wang, Yongye Su, Weidi Luo, Fan Yang, Yongfeng Zhang -+ [AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models](https://arxiv.org//abs/2412.18123) ++ [AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models](https://arxiv.org/abs/2412.18123) Yiming Wang, Jiahao Chen, Qingming Li, Xing Yang, Shouling Ji -+ [FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models](https://arxiv.org//abs/2412.18302) ++ [FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models](https://arxiv.org/abs/2412.18302) Jaechul Roh, Andrew Yuan, Jinsong Mao -+ [On the Effectiveness of Adversarial Training on Malware Classifiers](https://arxiv.org//abs/2412.18218) ++ [On the Effectiveness of Adversarial Training on Malware Classifiers](https://arxiv.org/abs/2412.18218) Hamid Bostani, Jacopo Cortellazzi, Daniel Arp, Fabio Pierazzi, Veelasha Moonsamy, Lorenzo Cavallaro -+ [An Empirical Analysis of Federated Learning Models Subject to Label-Flipping Adversarial Attack](https://arxiv.org//abs/2412.18507) ++ [An Empirical 
Analysis of Federated Learning Models Subject to Label-Flipping Adversarial Attack](https://arxiv.org/abs/2412.18507) Kunal Bhatnagar, Sagana Chattanathan, Angela Dang, Bhargav Eranki, Ronnit Rana, Charan Sridhar, Siddharth Vedam, Angie Yao, Mark Stamp -+ [Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models](https://arxiv.org//abs/2412.18171) ++ [Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models](https://arxiv.org/abs/2412.18171) Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho -+ [On the Local Complexity of Linear Regions in Deep ReLU Networks](https://arxiv.org//abs/2412.18283) ++ [On the Local Complexity of Linear Regions in Deep ReLU Networks](https://arxiv.org/abs/2412.18283) Niket Patel, Guido Montufar # 2024-12-23 -+ [Retention Score: Quantifying Jailbreak Risks for Vision Language Models](https://arxiv.org//abs/2412.17544) ++ [Retention Score: Quantifying Jailbreak Risks for Vision Language Models](https://arxiv.org/abs/2412.17544) Zaitang Li, Pin-Yu Chen, Tsung-Yi Ho -+ [Large Language Model Safety: A Holistic Survey](https://arxiv.org//abs/2412.17686) ++ [Large Language Model Safety: A Holistic Survey](https://arxiv.org/abs/2412.17686) Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong -+ [Double Landmines: Invisible Textual Backdoor Attacks based on Dual-Trigger](https://arxiv.org//abs/2412.17531) ++ [Double Landmines: Invisible Textual Backdoor Attacks based on Dual-Trigger](https://arxiv.org/abs/2412.17531) Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou -+ [Emerging Security Challenges of Large Language Models](https://arxiv.org//abs/2412.17614) ++ [Emerging Security Challenges of Large Language Models](https://arxiv.org/abs/2412.17614) Herve Debar, Sven Dietrich, Pavel Laskov, Emil C. Lupu, Eirini Ntoutsi -+ [Learning from Mistakes: Self-correct Adversarial Training for Chinese Unnatural Text Correction](https://arxiv.org//abs/2412.17279) ++ [Learning from Mistakes: Self-correct Adversarial Training for Chinese Unnatural Text Correction](https://arxiv.org/abs/2412.17279) Xuan Feng, Tianlong Gu, Xiaoli Liu, Liang Chang -+ [DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak](https://arxiv.org//abs/2412.17522) ++ [DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak](https://arxiv.org/abs/2412.17522) Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, Chengwei Pan, MinLie Huang, Lei Sha -+ [Sensitivity Curve Maximization: Attacking Robust Aggregators in Distributed Learning](https://arxiv.org//abs/2412.17740) ++ [Sensitivity Curve Maximization: Attacking Robust Aggregators in Distributed Learning](https://arxiv.org/abs/2412.17740) Christian A. Schroth, Stefan Vlaski, Abdelhak M. 
Zoubir -+ [Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool](https://arxiv.org//abs/2412.17213) ++ [Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool](https://arxiv.org/abs/2412.17213) Jiangtong Li, Dungy Liu, Dawei Cheng, Changchun Jiang -+ [EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling](https://arxiv.org//abs/2412.17249) ++ [EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling](https://arxiv.org/abs/2412.17249) Zichen Song, Sitan Huang, Zhongfeng Kang -+ [Trading Devil RL: Backdoor attack via Stock market, Bayesian Optimization and Reinforcement Learning](https://arxiv.org//abs/2412.17908) ++ [Trading Devil RL: Backdoor attack via Stock market, Bayesian Optimization and Reinforcement Learning](https://arxiv.org/abs/2412.17908) Orson Mengara # 2024-12-22 -+ [Adversarial Diffusion Model for Unsupervised Domain-Adaptive Semantic Segmentation](https://arxiv.org//abs/2412.16859) ++ [Adversarial Diffusion Model for Unsupervised Domain-Adaptive Semantic Segmentation](https://arxiv.org/abs/2412.16859) Jongmin Yu, Zhongtian Sun, Shan Luo -+ [Preventing Non-intrusive Load Monitoring Privacy Invasion: A Precise Adversarial Attack Scheme for Networked Smart Meters](https://arxiv.org//abs/2412.16893) ++ [Preventing Non-intrusive Load Monitoring Privacy Invasion: A Precise Adversarial Attack Scheme for Networked Smart Meters](https://arxiv.org/abs/2412.16893) Jialing He, Jiacheng Wang, Ning Wang, Shangwei Guo, Liehuang Zhu, Dusit Niyato, Tao Xiang -+ [A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification](https://arxiv.org//abs/2412.16905) ++ [A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification](https://arxiv.org/abs/2412.16905) Yuan Ma, Xu Ma, Jiankang Wei, Jinmeng Tang, Xiaoyu Zhang, Yilun Lyu, Kehao Chen, Jingtong Huang -+ [ErasableMask: A Robust and Erasable Privacy Protection Scheme against Black-box Face Recognition Models](https://arxiv.org//abs/2412.17038) ++ [ErasableMask: A Robust and Erasable Privacy Protection Scheme against Black-box Face Recognition Models](https://arxiv.org/abs/2412.17038) Sipeng Shen, Yunming Zhang, Dengpan Ye, Xiuwen Shi, Long Tang, Haoran Duan, Ziyi Liu -+ [DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately](https://arxiv.org//abs/2412.17053) ++ [DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately](https://arxiv.org/abs/2412.17053) Huiwen Wu, Deyi Zhang, Xiaohan Li, Xiaogang Xu, Jiafei Wu, Zhe Liu -+ [Robustness of Large Language Models Against Adversarial Attacks](https://arxiv.org//abs/2412.17011) ++ [Robustness of Large Language Models Against Adversarial Attacks](https://arxiv.org/abs/2412.17011) Yiyi Tao, Yixian Shen, Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du -+ [Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models](https://arxiv.org//abs/2412.17034) ++ [Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models](https://arxiv.org/abs/2412.17034) Lang Gao, Xiangliang Zhang, Preslav Nakov, Xiuying Chen -+ [NumbOD: A Spatial-Frequency Fusion Attack Against Object Detectors](https://arxiv.org//abs/2412.16955) ++ [NumbOD: A Spatial-Frequency Fusion 
Ziqi Zhou, Bowen Li, Yufei Song, Zhifei Yu, Shengshan Hu, Wei Wan, Leo Yu Zhang, Dezhong Yao, Hai Jin

-+ [Breaking Barriers in Physical-World Adversarial Examples: Improving Robustness and Transferability via Robust Feature](https://arxiv.org//abs/2412.16958)

++ [Breaking Barriers in Physical-World Adversarial Examples: Improving Robustness and Transferability via Robust Feature](https://arxiv.org/abs/2412.16958)

Yichen Wang, Yuxuan Chou, Ziqi Zhou, Hangtao Zhang, Wei Wan, Shengshan Hu, Minghui Li

# 2024-12-21

-+ [Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions](https://arxiv.org//abs/2412.16504)

++ [Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions](https://arxiv.org/abs/2412.16504)

Hao Du, Shang Liu, Lele Zheng, Yang Cao, Atsuyoshi Nakamura, Lei Chen

-+ [TrojFlow: Flow Models are Natural Targets for Trojan Attacks](https://arxiv.org//abs/2412.16512)

++ [TrojFlow: Flow Models are Natural Targets for Trojan Attacks](https://arxiv.org/abs/2412.16512)

Zhengyang Qi, Xiaohua Xu

-+ [POEX: Policy Executable Embodied AI Jailbreak Attacks](https://arxiv.org//abs/2412.16633)

++ [POEX: Policy Executable Embodied AI Jailbreak Attacks](https://arxiv.org/abs/2412.16633)

Xuancun Lu, Zhengxian Huang, Xinfeng Li, Xiaoyu ji, Wenyuan Xu

-+ [PB-UAP: Hybrid Universal Adversarial Attack For Image Segmentation](https://arxiv.org//abs/2412.16651)

++ [PB-UAP: Hybrid Universal Adversarial Attack For Image Segmentation](https://arxiv.org/abs/2412.16651)

Yufei Song, Ziqi Zhou, Minghui Li, Xianlong Wang, Menghao Deng, Wei Wan, Shengshan Hu, Leo Yu Zhang

-+ [Adversarial Attack Against Images Classification based on Generative Adversarial Networks](https://arxiv.org//abs/2412.16662)

++ [Adversarial Attack Against Images Classification based on Generative Adversarial Networks](https://arxiv.org/abs/2412.16662)

Yahe Yang

-+ [The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents](https://arxiv.org//abs/2412.16682)

++ [The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents](https://arxiv.org/abs/2412.16682)

Feiran Jia, Tong Wu, Xin Qin, Anna Squicciarini

-+ [Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models](https://arxiv.org//abs/2412.16555)

++ [Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models](https://arxiv.org/abs/2412.16555)

Yanxu Mao, Peipei Liu, Tiehan Cui, Congying Liu, Datao You

-+ [Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification](https://arxiv.org//abs/2412.16780)

++ [Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification](https://arxiv.org/abs/2412.16780)

Changchang Sun, Ren Wang, Yihua Zhang, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Sijia Liu, Yan Yan

# 2024-12-20

-+ [JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs](https://arxiv.org//abs/2412.15623)

++ [JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs](https://arxiv.org/abs/2412.15623)

Hongyi Li, Jiawei Ye, Jie Wu, Tianjie Yan, Chu Wang, Zhixin Li

-+ [Watertox: The Art of Simplicity in Universal Attacks A Cross-Model Framework for Robust Adversarial Generation](https://arxiv.org//abs/2412.15924)

++ [Watertox: The Art of Simplicity in Universal Attacks A Cross-Model Framework for Robust Adversarial Generation](https://arxiv.org/abs/2412.15924)
Zhenghao Gao, Shengjie Xu, Meixi Chen, Fangyao Zhao

-+ [Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM](https://arxiv.org//abs/2412.15614)

++ [Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM](https://arxiv.org/abs/2412.15614)

Yangyang Guo, Ziwei Xu, Xilie Xu, YongKang Wong, Liqiang Nie, Mohan Kankanhalli

-+ [Prompt-based Unifying Inference Attack on Graph Neural Networks](https://arxiv.org//abs/2412.15735)

++ [Prompt-based Unifying Inference Attack on Graph Neural Networks](https://arxiv.org/abs/2412.15735)

Yuecen Wei, Xingcheng Fu, Lingyun Liu, Qingyun Sun, Hao Peng, Chunming Hu

-+ [Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers](https://arxiv.org//abs/2412.15503)

++ [Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers](https://arxiv.org/abs/2412.15503)

Ruofei Wang, Hongzhan Lin, Ziyuan Luo, Ka Chun Cheung, Simon See, Jing Ma, Renjie Wan

-+ [PoisonCatcher: Revealing and Identifying LDP Poisoning Attacks in IIoT](https://arxiv.org//abs/2412.15704)

++ [PoisonCatcher: Revealing and Identifying LDP Poisoning Attacks in IIoT](https://arxiv.org/abs/2412.15704)

Lisha Shuai, Shaofeng Tan, Nan Zhang, Jiamin Zhang, Min Zhang, Xiaolong Yang

-+ [Adversarial Robustness through Dynamic Ensemble Learning](https://arxiv.org//abs/2412.16254)

++ [Adversarial Robustness through Dynamic Ensemble Learning](https://arxiv.org/abs/2412.16254)

Hetvi Waghela, Jaydip Sen, Sneha Rakshit

-+ [Texture- and Shape-based Adversarial Attacks for Vehicle Detection in Synthetic Overhead Imagery](https://arxiv.org//abs/2412.16358)

++ [Texture- and Shape-based Adversarial Attacks for Vehicle Detection in Synthetic Overhead Imagery](https://arxiv.org/abs/2412.16358)

Mikael Yeghiazaryan, Sai Abhishek Siddhartha Namburu, Emily Kim, Stanislav Panev, Celso de Melo, Brent Lance, Fernando De la Torre, Jessica K. Hodgins
# 2024-12-19

-+ [FRIDAY: Mitigating Unintentional Facial Identity in Deepfake Detectors Guided by Facial Recognizers](https://arxiv.org//abs/2412.14623)

++ [FRIDAY: Mitigating Unintentional Facial Identity in Deepfake Detectors Guided by Facial Recognizers](https://arxiv.org/abs/2412.14623)

Younhun Kim, Myung-Joon Kwon, Wonjun Lee, Changick Kim

-+ [AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving](https://arxiv.org//abs/2412.15206)

++ [AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving](https://arxiv.org/abs/2412.15206)

Shuo Xing, Hongyuan Hua, Xiangbo Gao, Shenzhe Zhu, Renjie Li, Kexin Tian, Xiaopeng Li, Heng Huang, Tianbao Yang, Zhangyang Wang, Yang Zhou, Huaxiu Yao, Zhengzhong Tu

-+ [Holistic Adversarially Robust Pruning](https://arxiv.org//abs/2412.14714)

++ [Holistic Adversarially Robust Pruning](https://arxiv.org/abs/2412.14714)

Qi Zhao, Christian Wressnegger

-+ [Boosting GNN Performance via Training Sample Selection Based on Adversarial Robustness Evaluation](https://arxiv.org//abs/2412.14738)

++ [Boosting GNN Performance via Training Sample Selection Based on Adversarial Robustness Evaluation](https://arxiv.org/abs/2412.14738)

Yongyu Wang

-+ [SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage](https://arxiv.org//abs/2412.15289)

++ [SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage](https://arxiv.org/abs/2412.15289)

Xiaoning Dong, Wenbo Hu, Wei Xu, Tianxing He

-+ [Understanding the Dark Side of LLMs' Intrinsic Self-Correction](https://arxiv.org//abs/2412.14959)

++ [Understanding the Dark Side of LLMs' Intrinsic Self-Correction](https://arxiv.org/abs/2412.14959)

Qingjie Zhang, Di Wang, Haoting Qian, Yiming Li, Tianwei Zhang, Minlie Huang, Ke Xu, Hewu Li, Yan Liu, Han Qiu

-+ [Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security](https://arxiv.org//abs/2412.14855)

++ [Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security](https://arxiv.org/abs/2412.14855)

Lukas Bieringer, Sean McGregor, Nicole Nichols, Kevin Paeth, Jochen Stängler, Andreas Wespi, Alexandre Alahi, Kathrin Grosse

# 2024-12-18

-+ [Safeguarding System Prompts for LLMs](https://arxiv.org//abs/2412.13426)

++ [Safeguarding System Prompts for LLMs](https://arxiv.org/abs/2412.13426)

Zhifeng Jiang, Zhihua Jin, Guoliang He

-+ [A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models](https://arxiv.org//abs/2412.13475)

++ [A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models](https://arxiv.org/abs/2412.13475)

Bowen Chen, Namgi Han, Yusuke Miyao

-+ [Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation](https://arxiv.org//abs/2412.13705)

++ [Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation](https://arxiv.org/abs/2412.13705)

Minkyoung Kim, Yunha Kim, Hyeram Seo, Heejung Choi, Jiye Han, Gaeun Kee, Soyoung Ko, HyoJe Jung, Byeolhee Kim, Young-Hak Kim, Sanghyun Park, Tae Joon Jun

-+ [Physics-Based Adversarial Attack on Near-Infrared Human Detector for Nighttime Surveillance Camera Systems](https://arxiv.org//abs/2412.13709)

++ [Physics-Based Adversarial Attack on Near-Infrared Human Detector for Nighttime Surveillance Camera Systems](https://arxiv.org/abs/2412.13709)

Muyao Niu, Zhuoxiao Li, Yifan Zhan, Huy H. Nguyen, Isao Echizen, Yinqiang Zheng
-+ [A Black-Box Evaluation Framework for Semantic Robustness in Bird's Eye View Detection](https://arxiv.org//abs/2412.13913)

++ [A Black-Box Evaluation Framework for Semantic Robustness in Bird's Eye View Detection](https://arxiv.org/abs/2412.13913)

Fu Wang, Yanghao Zhang, Xiangyu Yin, Guangliang Cheng, Zeyu Fu, Xiaowei Huang, Wenjie Ruan

-+ [Cultivating Archipelago of Forests: Evolving Robust Decision Trees through Island Coevolution](https://arxiv.org//abs/2412.13762)

++ [Cultivating Archipelago of Forests: Evolving Robust Decision Trees through Island Coevolution](https://arxiv.org/abs/2412.13762)

Adam Żychowski, Andrew Perrault, Jacek Mańdziuk

-+ [On the Robustness of Distributed Machine Learning against Transfer Attacks](https://arxiv.org//abs/2412.14080)

++ [On the Robustness of Distributed Machine Learning against Transfer Attacks](https://arxiv.org/abs/2412.14080)

Sébastien Andreina, Pascal Zimmer, Ghassan Karame

-+ [A Review of the Duality of Adversarial Learning in Network Intrusion: Attacks and Countermeasures](https://arxiv.org//abs/2412.13880)

++ [A Review of the Duality of Adversarial Learning in Network Intrusion: Attacks and Countermeasures](https://arxiv.org/abs/2412.13880)

Shalini Saini, Anitha Chennamaneni, Babatunde Sawyerr

-+ [Adversarial Hubness in Multi-Modal Retrieval](https://arxiv.org//abs/2412.14113)

++ [Adversarial Hubness in Multi-Modal Retrieval](https://arxiv.org/abs/2412.14113)

Tingwei Zhang, Fnu Suya, Rishi Jha, Collin Zhang, Vitaly Shmatikov

-+ [Exploring Query Efficient Data Generation towards Data-free Model Stealing in Hard Label Setting](https://arxiv.org//abs/2412.15276)

++ [Exploring Query Efficient Data Generation towards Data-free Model Stealing in Hard Label Setting](https://arxiv.org/abs/2412.15276)

Gaozheng Pei, Shaojie lyu, Ke Ma, Pinci Yang, Qianqian Xu, Yingfei Sun

@@ -16409,92 +16409,92 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Aneta Zugecova, Dominik Macko, Ivan Srba, Robert Moro, Jakub Kopal, Katarina Marcincinova, Matus Mesarcik

# 2024-12-17

-+ [Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQL](https://arxiv.org//abs/2412.12522)

++ [Solid-SQL: Enhanced Schema-linking based In-context Learning for Robust Text-to-SQL](https://arxiv.org/abs/2412.12522)

Geling Liu, Yunzhi Tan, Ruichao Zhong, Yuanzhen Xie, Lingchen Zhao, Qian Wang, Bo Hu, Zang Li

-+ [Defending LVLMs Against Vision Attacks through Partial-Perception Supervision](https://arxiv.org//abs/2412.12722)

++ [Defending LVLMs Against Vision Attacks through Partial-Perception Supervision](https://arxiv.org/abs/2412.12722)

Qi Zhou, Tianlin Li, Qing Guo, Dongxia Wang, Yun Lin, Yang Liu, Jin Song Dong

-+ [Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning](https://arxiv.org//abs/2412.12850)

++ [Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning](https://arxiv.org/abs/2412.12850)

Qingqing Fang, Qinliang Su, Wenxi Lv, Wenchao Xu, Jianxing Yu

-+ [Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script](https://arxiv.org//abs/2412.12478)

++ [Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script](https://arxiv.org/abs/2412.12478)

Xi Cao, Yuan Sun, Jiajun Li, Quzong Gesang, Nuo Qun, Tashi Nyima

-+ [NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning](https://arxiv.org//abs/2412.12497)

++ [NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning](https://arxiv.org/abs/2412.12497)
Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He

-+ [Jailbreaking? One Step Is Enough!](https://arxiv.org//abs/2412.12621)

++ [Jailbreaking? One Step Is Enough!](https://arxiv.org/abs/2412.12621)

Weixiong Zheng, Peijian Zeng, Yiwei Li, Hongyan Wu, Nankai Lin, Junhao Chen, Aimin Yang, Yongmei Zhou

-+ [Truthful Text Sanitization Guided by Inference Attacks](https://arxiv.org//abs/2412.12928)

++ [Truthful Text Sanitization Guided by Inference Attacks](https://arxiv.org/abs/2412.12928)

Ildikó Pilán, Benet Manzanares-Salor, David Sánchez, Pierre Lison

-+ [Invisible Watermarks: Attacks and Robustness](https://arxiv.org//abs/2412.12511)

++ [Invisible Watermarks: Attacks and Robustness](https://arxiv.org/abs/2412.12511)

Dongjun Hwang, Sungwon Woo, Tom Gao, Raymond Luo, Sunghwan Baek

-+ [Improving the Transferability of 3D Point Cloud Attack via Spectral-aware Admix and Optimization Designs](https://arxiv.org//abs/2412.12626)

++ [Improving the Transferability of 3D Point Cloud Attack via Spectral-aware Admix and Optimization Designs](https://arxiv.org/abs/2412.12626)

Shiyu Hu, Daizong Liu, Wei Hu

-+ [A New Adversarial Perspective for LiDAR-based 3D Object Detection](https://arxiv.org//abs/2412.13017)

++ [A New Adversarial Perspective for LiDAR-based 3D Object Detection](https://arxiv.org/abs/2412.13017)

Shijun Zheng, Weiquan Liu, Yu Guo, Yu Zang, Siqi Shen, Cheng Wang

-+ [Building Gradient Bridges: Label Leakage from Restricted Gradient Sharing in Federated Learning](https://arxiv.org//abs/2412.12640)

++ [Building Gradient Bridges: Label Leakage from Restricted Gradient Sharing in Federated Learning](https://arxiv.org/abs/2412.12640)

Rui Zhang, Ka-Ho Chow, Ping Li

-+ [Deep Learning for Resilient Adversarial Decision Fusion in Byzantine Networks](https://arxiv.org//abs/2412.12739)

++ [Deep Learning for Resilient Adversarial Decision Fusion in Byzantine Networks](https://arxiv.org/abs/2412.12739)

Kassem Kallas

-+ [Scrutinizing the Vulnerability of Decentralized Learning to Membership Inference Attacks](https://arxiv.org//abs/2412.12837)

++ [Scrutinizing the Vulnerability of Decentralized Learning to Membership Inference Attacks](https://arxiv.org/abs/2412.12837)

Ousmane Touat, Jezekael Brunon, Yacine Belal, Julien Nicolas, Mohamed Maouche, César Sabater, Sonia Ben Mokhtar

-+ [Adversarially robust generalization theory via Jacobian regularization for deep neural networks](https://arxiv.org//abs/2412.12449)

++ [Adversarially robust generalization theory via Jacobian regularization for deep neural networks](https://arxiv.org/abs/2412.12449)

Dongya Wu, Xin Li

-+ [Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs -- A Graph Sequential Embedding Method](https://arxiv.org//abs/2412.13134)

++ [Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs -- A Graph Sequential Embedding Method](https://arxiv.org/abs/2412.13134)

Jiate Li, Meng Pang, Binghui Wang

-+ [RemoteRAG: A Privacy-Preserving LLM Cloud RAG Service](https://arxiv.org//abs/2412.12775)

++ [RemoteRAG: A Privacy-Preserving LLM Cloud RAG Service](https://arxiv.org/abs/2412.12775)

Yihang Cheng, Lan Zhang, Junyang Wang, Mu Yuan, Yunhao Yao

-+ [BadSAD: Clean-Label Backdoor Attacks against Deep Semi-Supervised Anomaly Detection](https://arxiv.org//abs/2412.13324)

++ [BadSAD: Clean-Label Backdoor Attacks against Deep Semi-Supervised Anomaly Detection](https://arxiv.org/abs/2412.13324)
He Cheng, Depeng Xu, Shuhan Yuan

-+ [Targeted View-Invariant Adversarial Perturbations for 3D Object Recognition](https://arxiv.org//abs/2412.13376)

++ [Targeted View-Invariant Adversarial Perturbations for 3D Object Recognition](https://arxiv.org/abs/2412.13376)

Christian Green, Mehmet Ergezer, Abdurrahman Zeybey

@@ -16505,153 +16505,153 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Ousmane Touat, Jezekael Brunon, Yacine Belal, Julien Nicolas, Mohamed Maouche, César Sabater, Sonia Ben Mokhtar

# 2024-12-16

-+ [Stepwise Reasoning Error Disruption Attack of LLMs](https://arxiv.org//abs/2412.11934)

++ [Stepwise Reasoning Error Disruption Attack of LLMs](https://arxiv.org/abs/2412.11934)

Jingyu Peng, Maolin Wang, Xiangyu Zhao, Kai Zhang, Wanyu Wang, Pengyue Jia, Qidong Liu, Ruocheng Guo, Qi Liu

-+ [Red Pill and Blue Pill: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning](https://arxiv.org//abs/2412.11471)

++ [Red Pill and Blue Pill: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning](https://arxiv.org/abs/2412.11471)

Siyuan Liang, Jiajun Gong, Tianmeng Fang, Aishan Liu, Tao Wang, Xianglong Liu, Xiaochun Cao, Dacheng Tao, Chang Ee-Chien

-+ [Transferable Adversarial Face Attack with Text Controlled Attribute](https://arxiv.org//abs/2412.11735)

++ [Transferable Adversarial Face Attack with Text Controlled Attribute](https://arxiv.org/abs/2412.11735)

Wenyun Li, Zheng Zhang, Xiangyuan Lan, Dongmei Jiang

-+ [The Impact of Generalization Techniques on the Interplay Among Privacy, Utility, and Fairness in Image Classification](https://arxiv.org//abs/2412.11951)

++ [The Impact of Generalization Techniques on the Interplay Among Privacy, Utility, and Fairness in Image Classification](https://arxiv.org/abs/2412.11951)

Ahmad Hassanpour, Amir Zarei, Khawla Mallat, Anderson Santana de Oliveira, Bian Yang

-+ [How Private are Language Models in Abstractive Summarization?](https://arxiv.org//abs/2412.12040)

++ [How Private are Language Models in Abstractive Summarization?](https://arxiv.org/abs/2412.12040)

Anthony Hughes, Nikolaos Aletras, Ning Ma

-+ [Relation-Guided Adversarial Learning for Data-free Knowledge Transfer](https://arxiv.org//abs/2412.11380)

++ [Relation-Guided Adversarial Learning for Data-free Knowledge Transfer](https://arxiv.org/abs/2412.11380)

Yingping Liang, Ying Fu

-+ [Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation](https://arxiv.org//abs/2412.11608)

++ [Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation](https://arxiv.org/abs/2412.11608)

Svetlana Pavlitska, Enrico Eisen, J. Marius Zöllner
-+ [IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation](https://arxiv.org//abs/2412.11638)

++ [IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation](https://arxiv.org/abs/2412.11638)

Yiren Song, Pei Yang, Hai Ci, Mike Zheng Shou

-+ [Vertical Federated Unlearning via Backdoor Certification](https://arxiv.org//abs/2412.11476)

++ [Vertical Federated Unlearning via Backdoor Certification](https://arxiv.org/abs/2412.11476)

Mengde Han, Tianqing Zhu, Lefeng Zhang, Huan Huo, Wanlei Zhou

-+ [Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning](https://arxiv.org//abs/2412.11689)

++ [Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning](https://arxiv.org/abs/2412.11689)

Andrei Semenov, Philip Zmushko, Alexander Pichugin, Aleksandr Beznosikov

-+ [Efficiently Achieving Secure Model Training and Secure Aggregation to Ensure Bidirectional Privacy-Preservation in Federated Learning](https://arxiv.org//abs/2412.11737)

++ [Efficiently Achieving Secure Model Training and Secure Aggregation to Ensure Bidirectional Privacy-Preservation in Federated Learning](https://arxiv.org/abs/2412.11737)

Xue Yang, Depan Peng, Yan Feng, Xiaohu Tang, Weijun Fang, Jun Shao

-+ [Accurate, Robust and Privacy-Preserving Brain-Computer Interface Decoding](https://arxiv.org//abs/2412.11390)

++ [Accurate, Robust and Privacy-Preserving Brain-Computer Interface Decoding](https://arxiv.org/abs/2412.11390)

Xiaoqing Chen, Tianwang Jia, Dongrui Wu

-+ [UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models](https://arxiv.org//abs/2412.11441)

++ [UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models](https://arxiv.org/abs/2412.11441)

Yuning Han, Bingyin Zhao, Rui Chu, Feng Luo, Biplab Sikdar, Yingjie Lao

-+ [A Comprehensive Review of Adversarial Attacks on Machine Learning](https://arxiv.org//abs/2412.11384)

++ [A Comprehensive Review of Adversarial Attacks on Machine Learning](https://arxiv.org/abs/2412.11384)

Syed Quiser Ahmed, Bharathi Vokkaliga Ganesh, Sathyanarayana Sampath Kumar, Prakhar Mishra, Ravi Anand, Bhanuteja Akurathi

-+ [Comprehensive Survey on Adversarial Examples in Cybersecurity: Impacts, Challenges, and Mitigation Strategies](https://arxiv.org//abs/2412.12217)

++ [Comprehensive Survey on Adversarial Examples in Cybersecurity: Impacts, Challenges, and Mitigation Strategies](https://arxiv.org/abs/2412.12217)

Li Li

-+ [Quantum Adversarial Machine Learning and Defense Strategies: Challenges and Opportunities](https://arxiv.org//abs/2412.12373)

++ [Quantum Adversarial Machine Learning and Defense Strategies: Challenges and Opportunities](https://arxiv.org/abs/2412.12373)

Eric Yocam, Anthony Rizi, Mahesh Kamepalli, Varghese Vaidyan, Yong Wang, Gurcan Comert

-+ [CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird's Eye View Perception](https://arxiv.org//abs/2412.12000)

++ [CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird's Eye View Perception](https://arxiv.org/abs/2412.12000)

Senkang Hu, Yihang Tao, Guowen Xu, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

# 2024-12-15

-+ [Impact of Adversarial Attacks on Deep Learning Model Explainability](https://arxiv.org//abs/2412.11119)

++ [Impact of Adversarial Attacks on Deep Learning Model Explainability](https://arxiv.org/abs/2412.11119)

Gazi Nazia Nur, Mohammad Ahnaf Sadat

-+ [Unpacking the Resilience of SNLI Contradiction Examples to Attacks](https://arxiv.org//abs/2412.11172)
++ [Unpacking the Resilience of SNLI Contradiction Examples to Attacks](https://arxiv.org/abs/2412.11172)

Chetan Verma, Archit Agarwal

-+ [Sequence-Level Analysis of Leakage Risk of Training Data in Large Language Models](https://arxiv.org//abs/2412.11302)

++ [Sequence-Level Analysis of Leakage Risk of Training Data in Large Language Models](https://arxiv.org/abs/2412.11302)

Trishita Tiwari, G. Edward Suh

-+ [Learning Robust and Privacy-Preserving Representations via Information Theory](https://arxiv.org//abs/2412.11066)

++ [Learning Robust and Privacy-Preserving Representations via Information Theory](https://arxiv.org/abs/2412.11066)

Binghui Zhang, Sayedeh Leila Noorbakhsh, Yun Dong, Yuan Hong, Binghui Wang

-+ [PGD-Imp: Rethinking and Unleashing Potential of Classic PGD with Dual Strategies for Imperceptible Adversarial Attacks](https://arxiv.org//abs/2412.11168)

++ [PGD-Imp: Rethinking and Unleashing Potential of Classic PGD with Dual Strategies for Imperceptible Adversarial Attacks](https://arxiv.org/abs/2412.11168)

Jin Li, Zitong Yu, Ziqiang He, Z. Jane Wang, Xiangui Kang

-+ [Finding a Wolf in Sheep's Clothing: Combating Adversarial Text-To-Image Prompts with Text Summarization](https://arxiv.org//abs/2412.12212)

++ [Finding a Wolf in Sheep's Clothing: Combating Adversarial Text-To-Image Prompts with Text Summarization](https://arxiv.org/abs/2412.12212)

Portia Cooper, Harshita Narnoli, Mihai Surdeanu

# 2024-12-14

-+ [Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages](https://arxiv.org//abs/2412.10805)

++ [Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages](https://arxiv.org/abs/2412.10805)

Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya

-+ [One Pixel is All I Need](https://arxiv.org//abs/2412.10681)

++ [One Pixel is All I Need](https://arxiv.org/abs/2412.10681)

Deng Siqin, Zhou Xiaoyi

-+ [Centaur: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference](https://arxiv.org//abs/2412.10652)

++ [Centaur: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference](https://arxiv.org/abs/2412.10652)

Jinglong Luo, Guanzhong Chen, Yehong Zhang, Shiyu Liu, Hui Wang, Yue Yu, Xun Zhou, Yuan Qi, Zenglin Xu

-+ [Improving Graph Neural Networks via Adversarial Robustness Evaluation](https://arxiv.org//abs/2412.10850)

++ [Improving Graph Neural Networks via Adversarial Robustness Evaluation](https://arxiv.org/abs/2412.10850)

Yongyu Wang

-+ [Towards Action Hijacking of Large Language Model-based Agent](https://arxiv.org//abs/2412.10807)

++ [Towards Action Hijacking of Large Language Model-based Agent](https://arxiv.org/abs/2412.10807)

Yuyang Zhang, Kangjie Chen, Xudong Jiang, Yuxiang Sun, Run Wang, Lina Wang

-+ [TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System](https://arxiv.org//abs/2412.12196)

++ [TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System](https://arxiv.org/abs/2412.12196)

Zeyu Zhang, Jianxun Lian, Chen Ma, Yaning Qu, Ye Luo, Lei Wang, Rui Li, Xu Chen, Yankai Lin, Le Wu, Xing Xie, Ji-Rong Wen

-+ [BlockDoor: Blocking Backdoor Based Watermarks in Deep Neural Networks](https://arxiv.org//abs/2412.12194)

++ [BlockDoor: Blocking Backdoor Based Watermarks in Deep Neural Networks](https://arxiv.org/abs/2412.12194)

Yi Hao Puah, Anh Tu Ngo, Nandish Chattopadhyay, Anupam Chattopadhyay
@@ -16661,215 +16661,215 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yuyang Zhang, Kangjie Chen, Jiaxin Gao, Ronghao Cui, Run Wang, Lina Wang, Tianwei Zhang

-+ [Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning](https://arxiv.org//abs/2412.10924)

++ [Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning](https://arxiv.org/abs/2412.10924)

Julia Witte Zimmerman, Denis Hudon, Kathryn Cramer, Alejandro J. Ruiz, Calla Beauregard, Ashley Fehr, Mikaela Irene Fudolig, Bradford Demarest, Yoshi Meke Bird, Milo Z. Trujillo, Christopher M. Danforth, Peter Sheridan Dodds

# 2024-12-13

-+ [BiCert: A Bilinear Mixed Integer Programming Formulation for Precise Certified Bounds Against Data Poisoning Attacks](https://arxiv.org//abs/2412.10186)

++ [BiCert: A Bilinear Mixed Integer Programming Formulation for Precise Certified Bounds Against Data Poisoning Attacks](https://arxiv.org/abs/2412.10186)

Tobias Lorenz, Marta Kwiatkowska, Mario Fritz

-+ [From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection](https://arxiv.org//abs/2412.10198)

++ [From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection](https://arxiv.org/abs/2412.10198)

Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang

-+ [Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models](https://arxiv.org//abs/2412.10257)

++ [Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models](https://arxiv.org/abs/2412.10257)

Harry J. Davies, Giorgos Iacovides, Danilo P. Mandic
-+ [AdvPrefix: An Objective for Nuanced LLM Jailbreaks](https://arxiv.org//abs/2412.10321)

++ [AdvPrefix: An Objective for Nuanced LLM Jailbreaks](https://arxiv.org/abs/2412.10321)

Sicheng Zhu, Brandon Amos, Yuandong Tian, Chuan Guo, Ivan Evtimov

-+ [Real-time Identity Defenses against Malicious Personalization of Diffusion Models](https://arxiv.org//abs/2412.09844)

++ [Real-time Identity Defenses against Malicious Personalization of Diffusion Models](https://arxiv.org/abs/2412.09844)

Hanzhong Guo, Shen Nie, Chao Du, Tianyu Pang, Hao Sun, Chongxuan Li

-+ [Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images](https://arxiv.org//abs/2412.09910)

++ [Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images](https://arxiv.org/abs/2412.09910)

Yasamin Medghalchi, Moein Heidari, Clayton Allard, Leonid Sigal, Ilker Hacihaliloglu

-+ [FaceShield: Defending Facial Image against Deepfake Threats](https://arxiv.org//abs/2412.09921)

++ [FaceShield: Defending Facial Image against Deepfake Threats](https://arxiv.org/abs/2412.09921)

Jaehwan Jeong, Sumin In, Sieun Kim, Hannie Shin, Jongheon Jeong, Sang Ho Yoon, Jaewook Chung, Sangpil Kim

-+ [$\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion](https://arxiv.org//abs/2412.09954)

++ [$\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion](https://arxiv.org/abs/2412.09954)

Jiawei Li, Hongwei Yu, Jiansheng Chen, Xinlong Ding, Jinlong Wang, Jinyuan Liu, Bochao Zou, Huimin Ma

-+ [Robust image classification with multi-modal large language models](https://arxiv.org//abs/2412.10353)

++ [Robust image classification with multi-modal large language models](https://arxiv.org/abs/2412.10353)

Francesco Villani, Igor Maljkovic, Dario Lazzaro, Angelo Sotgiu, Antonio Emanuele Cinà, Fabio Roli

-+ [Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication](https://arxiv.org//abs/2412.10265)

++ [Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication](https://arxiv.org/abs/2412.10265)

Alireza Furutanpey, Pantelis A. Frangoudis, Patrik Szabo, Schahram Dustdar
-+ [On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models](https://arxiv.org//abs/2412.10535)

++ [On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models](https://arxiv.org/abs/2412.10535)

April Yang, Jordan Tab, Parth Shah, Paul Kotchavong

-+ [Too Big to Fool: Resisting Deception in Language Models](https://arxiv.org//abs/2412.10558)

++ [Too Big to Fool: Resisting Deception in Language Models](https://arxiv.org/abs/2412.10558)

Mohammad Reza Samsami, Mats Leon Richter, Juan Rodriguez, Megh Thakkar, Sarath Chandar, Maxime Gasse

-+ [Client-Side Patching against Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2412.10605)

++ [Client-Side Patching against Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2412.10605)

Borja Molina Coronado

-+ [BinarySelect to Improve Accessibility of Black-Box Attack Research](https://arxiv.org//abs/2412.10617)

++ [BinarySelect to Improve Accessibility of Black-Box Attack Research](https://arxiv.org/abs/2412.10617)

Shatarupa Ghosh, Jonathan Rusert

-+ [Err on the Side of Texture: Texture Bias on Real Data](https://arxiv.org//abs/2412.10597)

++ [Err on the Side of Texture: Texture Bias on Real Data](https://arxiv.org/abs/2412.10597)

Blaine Hoak, Ryan Sheatsley, Patrick McDaniel

-+ [No Free Lunch for Defending Against Prefilling Attack by In-Context Learning](https://arxiv.org//abs/2412.12192)

++ [No Free Lunch for Defending Against Prefilling Attack by In-Context Learning](https://arxiv.org/abs/2412.12192)

Zhiyu Xue, Guangliang Liu, Bocheng Chen, Kristen Marie Johnson, Ramtin Pedarsani

# 2024-12-12

-+ [SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning](https://arxiv.org//abs/2412.09073)

++ [SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning](https://arxiv.org/abs/2412.09073)

Wenqian Li, Pengfei Fang, Hui Xue

-+ [Evaluating Adversarial Attacks on Traffic Sign Classifiers beyond Standard Baselines](https://arxiv.org//abs/2412.09150)

++ [Evaluating Adversarial Attacks on Traffic Sign Classifiers beyond Standard Baselines](https://arxiv.org/abs/2412.09150)

Svetlana Pavlitska, Leopold Müller, J. Marius Zöllner

-+ [Obfuscated Activations Bypass LLM Latent-Space Defenses](https://arxiv.org//abs/2412.09565)

++ [Obfuscated Activations Bypass LLM Latent-Space Defenses](https://arxiv.org/abs/2412.09565)

Luke Bailey, Alex Serrano, Abhay Sheshadri, Mikhail Seleznyov, Jordan Taylor, Erik Jenner, Jacob Hilton, Stephen Casper, Carlos Guestrin, Scott Emmons

-+ [Deep Learning Model Security: Threats and Defenses](https://arxiv.org//abs/2412.08969)

++ [Deep Learning Model Security: Threats and Defenses](https://arxiv.org/abs/2412.08969)

Tianyang Wang, Ziqian Bi, Yichao Zhang, Ming Liu, Weiche Hsieh, Pohsun Feng, Lawrence K.Q. Yan, Yizhu Wen, Benji Peng, Junyu Liu, Keyu Chen, Sen Zhang, Ming Li, Chuanqi Jiang, Xinyuan Song, Junjie Yang, Bowen Jing, Jintao Ren, Junhao Song, Hong-Ming Tseng, Silin Chen, Yunze Wang, Chia Xin Liang, Jiawei Xu, Xuanhe Pan, Jinlang Wang, Qian Niu
-+ [On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection](https://arxiv.org//abs/2412.09195)

++ [On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection](https://arxiv.org/abs/2412.09195)

Chenyang Guo, Liping Chen, Zhuhai Li, Kong Aik Lee, Zhen-Hua Ling, Wu Guo

-+ [A Semi Black-Box Adversarial Bit-Flip Attack with Limited DNN Model Information](https://arxiv.org//abs/2412.09450)

++ [A Semi Black-Box Adversarial Bit-Flip Attack with Limited DNN Model Information](https://arxiv.org/abs/2412.09450)

Behnam Ghavami, Mani Sadati, Mohammad Shahidzadeh, Lesley Shannon, Steve Wilton

-+ [AI Red-Teaming is a Sociotechnical System. Now What?](https://arxiv.org//abs/2412.09751)

++ [AI Red-Teaming is a Sociotechnical System. Now What?](https://arxiv.org/abs/2412.09751)

Tarleton Gillespie, Ryland Shaw, Mary L. Gray, Jina Suh

-+ [TOAP: Towards Better Robustness in Universal Transferable Anti-Facial Retrieval](https://arxiv.org//abs/2412.09692)

++ [TOAP: Towards Better Robustness in Universal Transferable Anti-Facial Retrieval](https://arxiv.org/abs/2412.09692)

Yunna Lv, Long Tang, Dengpan Ye, Caiyun Xie, Jiacheng Deng, Yiheng He

# 2024-12-11

-+ [MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents](https://arxiv.org//abs/2412.08014)

++ [MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents](https://arxiv.org/abs/2412.08014)

Yun Xing, Nhat Chung, Jie Zhang, Yue Cao, Ivor Tsang, Yang Liu, Lei Ma, Qing Guo

-+ [DynamicPAE: Generating Scene-Aware Physical Adversarial Examples in Real-Time](https://arxiv.org//abs/2412.08053)

++ [DynamicPAE: Generating Scene-Aware Physical Adversarial Examples in Real-Time](https://arxiv.org/abs/2412.08053)

Jin Hu, Xianglong Liu, Jiakai Wang, Junkai Zhang, Xianqi Yang, Haotong Qin, Yuqing Ma, Ke Xu

-+ [Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting](https://arxiv.org//abs/2412.08099)

++ [Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting](https://arxiv.org/abs/2412.08099)

Fuqiang Liu, Sicong Jiang, Luis Miranda-Moreno, Seongjin Choi, Lijun Sun

-+ [Antelope: Potent and Concealed Jailbreak Attack Strategy](https://arxiv.org//abs/2412.08156)

++ [Antelope: Potent and Concealed Jailbreak Attack Strategy](https://arxiv.org/abs/2412.08156)

Xin Zhao, Xiaojun Chen, Haoyu Gao

-+ [How Does the Smoothness Approximation Method Facilitate Generalization for Federated Adversarial Learning?](https://arxiv.org//abs/2412.08282)

++ [How Does the Smoothness Approximation Method Facilitate Generalization for Federated Adversarial Learning?](https://arxiv.org/abs/2412.08282)

Wenjun Ding, Ying An, Lixing Chen, Shichao Kan, Fan Wu, Zhe Qu

-+ [AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models](https://arxiv.org//abs/2412.08608)

++ [AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models](https://arxiv.org/abs/2412.08608)

Mintong Kang, Chejian Xu, Bo Li

-+ [Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models](https://arxiv.org//abs/2412.08615)

++ [Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models](https://arxiv.org/abs/2412.08615)
Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, Yu Hong

-+ [Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation](https://arxiv.org//abs/2412.08108)

++ [Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation](https://arxiv.org/abs/2412.08108)

Hee-Seon Kim, Minbeom Kim, Changick Kim

-+ [Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling](https://arxiv.org//abs/2412.07996)

++ [Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling](https://arxiv.org/abs/2412.07996)

Masora Okano, Koichi Ito, Masakatsu Nishigaki, Tetsushi Ohki

-+ [Local Features Meet Stochastic Anonymization: Revolutionizing Privacy-Preserving Face Recognition for Black-Box Models](https://arxiv.org//abs/2412.08276)

++ [Local Features Meet Stochastic Anonymization: Revolutionizing Privacy-Preserving Face Recognition for Black-Box Models](https://arxiv.org/abs/2412.08276)

Yuanwei Liu, Chengyu Jia, Ruqi Xiao, Xuemai Jia, Hui Wei, Kui Jiang, Zheng Wang

-+ [Backdoor attacks on DNN and GBDT -- A Case Study from the insurance domain](https://arxiv.org//abs/2412.08366)

++ [Backdoor attacks on DNN and GBDT -- A Case Study from the insurance domain](https://arxiv.org/abs/2412.08366)

Robin Kühlem, Daniel Otten, Daniel Ludwig, Anselm Hudde, Alexander Rosenbaum, Andreas Mauthe

-+ [Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds](https://arxiv.org//abs/2412.08394)

++ [Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds](https://arxiv.org/abs/2412.08394)

Shuhai Zhang, Jiahao Yang, Hui Luo, Jie Chen, Li Wang, Feng Liu, Bo Han, Mingkui Tan

-+ [Training Data Reconstruction: Privacy due to Uncertainty?](https://arxiv.org//abs/2412.08544)

++ [Training Data Reconstruction: Privacy due to Uncertainty?](https://arxiv.org/abs/2412.08544)

Christina Runkel, Kanchana Vaishnavi Gandikota, Jonas Geiping, Carola-Bibiane Schönlieb, Michael Moeller

-+ [Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending against Poisoning Attacks](https://arxiv.org//abs/2412.08555)

++ [Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending against Poisoning Attacks](https://arxiv.org/abs/2412.08555)

Ao Liu, Wenshan Li, Beibei Li, Wengang Ma, Tao Li, Pan Zhou

-+ [Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning](https://arxiv.org//abs/2412.08559)

++ [Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning](https://arxiv.org/abs/2412.08559)

Rongzhe Wei, Mufei Li, Mohsen Ghassemi, Eleonora Kreačić, Yifan Li, Xiang Yue, Bo Li, Vamsi K. Potluru, Pan Li, Eli Chien
-+ [Model-Editing-Based Jailbreak against Safety-aligned Large Language Models](https://arxiv.org//abs/2412.08201)

++ [Model-Editing-Based Jailbreak against Safety-aligned Large Language Models](https://arxiv.org/abs/2412.08201)

Yuxi Li, Zhibo Zhang, Kailong Wang, Ling Shi, Haoyu Wang

-+ [Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images](https://arxiv.org//abs/2412.08755)

++ [Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images](https://arxiv.org/abs/2412.08755)

Kyle Stein, Andrew Arash Mahyari, Guillermo Francia, Eman El-Sheikh

@@ -16880,358 +16880,358 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Nathanaël Carraz Rakotonirina, Corentin Kervadec, Francesca Franzon, Marco Baroni

# 2024-12-10

-+ [Defensive Dual Masking for Robust Adversarial Defense](https://arxiv.org//abs/2412.07078)

++ [Defensive Dual Masking for Robust Adversarial Defense](https://arxiv.org/abs/2412.07078)

Wangli Yang, Jie Yang, Yi Guo, Johan Barthelemy

-+ [On Evaluating the Durability of Safeguards for Open-Weight LLMs](https://arxiv.org//abs/2412.07097)

++ [On Evaluating the Durability of Safeguards for Open-Weight LLMs](https://arxiv.org/abs/2412.07097)

Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, Peter Henderson

-+ [Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation](https://arxiv.org//abs/2412.07249)

++ [Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation](https://arxiv.org/abs/2412.07249)

Xin Zhao, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao

-+ [Tazza: Shuffling Neural Network Parameters for Secure and Private Federated Learning](https://arxiv.org//abs/2412.07454)

++ [Tazza: Shuffling Neural Network Parameters for Secure and Private Federated Learning](https://arxiv.org/abs/2412.07454)

Kichang Lee, Jaeho Jin, JaeYeon Park, JeongGil Ko

-+ [FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks](https://arxiv.org//abs/2412.07672)

++ [FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks](https://arxiv.org/abs/2412.07672)

Bocheng Chen, Hanqing Guo, Qiben Yan

-+ [A Parametric Approach to Adversarial Augmentation for Cross-Domain Iris Presentation Attack Detection](https://arxiv.org//abs/2412.07199)

++ [A Parametric Approach to Adversarial Augmentation for Cross-Domain Iris Presentation Attack Detection](https://arxiv.org/abs/2412.07199)

Debasmita Pal, Redwan Sony, Arun Ross

-+ [CapGen:An Environment-Adaptive Generator of Adversarial Patches](https://arxiv.org//abs/2412.07253)

++ [CapGen:An Environment-Adaptive Generator of Adversarial Patches](https://arxiv.org/abs/2412.07253)

Chaoqun Li, Zhuodong Liu, Huanqian Yan, Hang Su

-+ [Backdoor Attacks against No-Reference Image Quality Assessment Models via A Scalable Trigger](https://arxiv.org//abs/2412.07277)

++ [Backdoor Attacks against No-Reference Image Quality Assessment Models via A Scalable Trigger](https://arxiv.org/abs/2412.07277)

Yi Yu, Song Xia, Xun Lin, Wenhan Yang, Shijian Lu, Yap-peng Tan, Alex Kot

-+ [Stealthy and Robust Backdoor Attack against 3D Point Clouds through Additional Point Features](https://arxiv.org//abs/2412.07511)

++ [Stealthy and Robust Backdoor Attack against 3D Point Clouds through Additional Point Features](https://arxiv.org/abs/2412.07511)
Xiaoyang Ning, Qing Xie, Jinyu Xu, Wenbo Jiang, Jiachen Li, Yanchun Ma

-+ [A New Federated Learning Framework Against Gradient Inversion Attacks](https://arxiv.org//abs/2412.07187)

++ [A New Federated Learning Framework Against Gradient Inversion Attacks](https://arxiv.org/abs/2412.07187)

Pengxin Guo, Shuang Zeng, Wenhao Chen, Xiaodan Zhang, Weihong Ren, Yuyin Zhou, Liangqiong Qu

-+ [Addressing Key Challenges of Adversarial Attacks and Defenses in the Tabular Domain: A Methodological Framework for Coherence and Consistency](https://arxiv.org//abs/2412.07326)

++ [Addressing Key Challenges of Adversarial Attacks and Defenses in the Tabular Domain: A Methodological Framework for Coherence and Consistency](https://arxiv.org/abs/2412.07326)

Yael Itzhakev, Amit Giloni, Yuval Elovici, Asaf Shabtai

-+ [AHSG: Adversarial Attacks on High-level Semantics in Graph Neural Networks](https://arxiv.org//abs/2412.07468)

++ [AHSG: Adversarial Attacks on High-level Semantics in Graph Neural Networks](https://arxiv.org/abs/2412.07468)

Kai Yuan, Xiaobing Pei, Haoran Yang

-+ [Adaptive Epsilon Adversarial Training for Robust Gravitational Wave Parameter Estimation Using Normalizing Flows](https://arxiv.org//abs/2412.07559)

++ [Adaptive Epsilon Adversarial Training for Robust Gravitational Wave Parameter Estimation Using Normalizing Flows](https://arxiv.org/abs/2412.07559)

Yiqian Yang, Xihua Zhu, Fan Zhang

-+ [PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips](https://arxiv.org//abs/2412.07192)

++ [PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips](https://arxiv.org/abs/2412.07192)

Zachary Coalson, Jeonghyun Woo, Shiyang Chen, Yu Sun, Lishan Yang, Prashant Nair, Bo Fang, Sanghyun Hong

-+ [Adversarial Filtering Based Evasion and Backdoor Attacks to EEG-Based Brain-Computer Interfaces](https://arxiv.org//abs/2412.07231)

++ [Adversarial Filtering Based Evasion and Backdoor Attacks to EEG-Based Brain-Computer Interfaces](https://arxiv.org/abs/2412.07231)

Lubin Meng, Xue Jiang, Xiaoqing Chen, Wenzhong Liu, Hanbin Luo, Dongrui Wu

-+ [Defending Against Neural Network Model Inversion Attacks via Data Poisoning](https://arxiv.org//abs/2412.07575)

++ [Defending Against Neural Network Model Inversion Attacks via Data Poisoning](https://arxiv.org/abs/2412.07575)

Shuai Zhou, Dayong Ye, Tianqing Zhu, Wanlei Zhou

-+ [Adversarial Autoencoders in Operator Learning](https://arxiv.org//abs/2412.07811)

++ [Adversarial Autoencoders in Operator Learning](https://arxiv.org/abs/2412.07811)

Dustin Enyeart, Guang Lin

# 2024-12-09

-+ [Enhancing Adversarial Resistance in LLMs with Recursion](https://arxiv.org//abs/2412.06181)

++ [Enhancing Adversarial Resistance in LLMs with Recursion](https://arxiv.org/abs/2412.06181)

Bryan Li, Sounak Bagchi, Zizhan Wang

-+ [A Real-Time Defense Against Object Vanishing Adversarial Patch Attacks for Object Detection in Autonomous Vehicles](https://arxiv.org//abs/2412.06215)

++ [A Real-Time Defense Against Object Vanishing Adversarial Patch Attacks for Object Detection in Autonomous Vehicles](https://arxiv.org/abs/2412.06215)

Jaden Mu

-+ [Data Free Backdoor Attacks](https://arxiv.org//abs/2412.06219)

++ [Data Free Backdoor Attacks](https://arxiv.org/abs/2412.06219)

Bochuan Cao, Jinyuan Jia, Chuxuan Hu, Wenbo Guo, Zhen Xiang, Jinghui Chen, Bo Li, Dawn Song

-+ [An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers](https://arxiv.org//abs/2412.06149)
++ [An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers](https://arxiv.org/abs/2412.06149)

Xueluan Gong, Bowei Tian, Meng Xue, Yuan Wu, Yanjiao Chen, Qian Wang

-+ [Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection](https://arxiv.org//abs/2412.06727)

++ [Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection](https://arxiv.org/abs/2412.06727)

Caiyun Xie, Dengpan Ye, Yunming Zhang, Long Tang, Yunna Lv, Jiacheng Deng, Jiawei Song

-+ [Membership Inference Attacks and Defenses in Federated Learning: A Survey](https://arxiv.org//abs/2412.06157)

++ [Membership Inference Attacks and Defenses in Federated Learning: A Survey](https://arxiv.org/abs/2412.06157)

Li Bai, Haibo Hu, Qingqing Ye, Haoyang Li, Leixia Wang, Jianliang Xu

-+ [Understanding Transformer-based Vision Models through Inversion](https://arxiv.org//abs/2412.06534)

++ [Understanding Transformer-based Vision Models through Inversion](https://arxiv.org/abs/2412.06534)

Jan Rathjens, Shirin Reyhanian, David Kappel, Laurenz Wiskott

-+ [Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions](https://arxiv.org//abs/2412.06606)

++ [Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions](https://arxiv.org/abs/2412.06606)

Jhih-Yi Hsieh, Aditi Raghunathan, Nihar B. Shah

# 2024-12-08

-+ [Large Language Models Merging for Enhancing the Link Stealing Attack on Graph Neural Networks](https://arxiv.org//abs/2412.05830)

++ [Large Language Models Merging for Enhancing the Link Stealing Attack on Graph Neural Networks](https://arxiv.org/abs/2412.05830)

Faqian Guan, Tianqing Zhu, Wenhan Chang, Wei Ren, Wanlei Zhou

-+ [BAMBA: A Bimodal Adversarial Multi-Round Black-Box Jailbreak Attacker for LVLMs](https://arxiv.org//abs/2412.05892)

++ [BAMBA: A Bimodal Adversarial Multi-Round Black-Box Jailbreak Attacker for LVLMs](https://arxiv.org/abs/2412.05892)

Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Shaowei Yuan, Zhiqiang Wang, Xiaojun Jia

-+ [Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models](https://arxiv.org//abs/2412.05934)

++ [Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models](https://arxiv.org/abs/2412.05934)

Ma Teng, Jia Xiaojun, Duan Ranjie, Li Xinfeng, Huang Yihao, Chu Zhixuan, Liu Yang, Ren Wenqi

-+ [Trust No AI: Prompt Injection Along The CIA Security Triad](https://arxiv.org//abs/2412.06090)

++ [Trust No AI: Prompt Injection Along The CIA Security Triad](https://arxiv.org/abs/2412.06090)

Johann Rehberger (Independent Researcher, Embrace The Red)

-+ [Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling](https://arxiv.org//abs/2412.05943)

++ [Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling](https://arxiv.org/abs/2412.05943)

Jie Ning, Jiebao Sun, Shengzhu Shi, Zhichang Guo, Yao Li, Hongwei Li, Boying Wu

-+ [Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation](https://arxiv.org//abs/2412.05980)

++ [Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation](https://arxiv.org/abs/2412.05980)

Yiren Song, Shengtao Lou, Xiaokang Liu, Hai Ci, Pei Yang, Jiaming Liu, Mike Zheng Shou
-+ [DeMem: Privacy-Enhanced Robust Adversarial Learning via De-Memorization](https://arxiv.org//abs/2412.05767)

++ [DeMem: Privacy-Enhanced Robust Adversarial Learning via De-Memorization](https://arxiv.org/abs/2412.05767)

Xiaoyu Luo, Qiongxiu Li

-+ [Understanding the Impact of Graph Reduction on Adversarial Robustness in Graph Neural Networks](https://arxiv.org//abs/2412.05883)

++ [Understanding the Impact of Graph Reduction on Adversarial Robustness in Graph Neural Networks](https://arxiv.org/abs/2412.05883)

Kerui Wu, Ka-Ho Chow, Wenqi Wei, Lei Yu

-+ [Perceptual Hash Inversion Attacks on Image-Based Sexual Abuse Removal Tools](https://arxiv.org//abs/2412.06056)

++ [Perceptual Hash Inversion Attacks on Image-Based Sexual Abuse Removal Tools](https://arxiv.org/abs/2412.06056)

Sophie Hawkes, Christian Weinert, Teresa Almeida, Maryam Mehrnezhad

# 2024-12-07

-+ [PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage](https://arxiv.org//abs/2412.05734)

++ [PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage](https://arxiv.org/abs/2412.05734)

Yuzhou Nie, Zhun Wang, Ye Yu, Xian Wu, Xuandong Zhao, Wenbo Guo, Dawn Song

-+ [Uncovering Vision Modality Threats in Image-to-Image Tasks](https://arxiv.org//abs/2412.05538)

++ [Uncovering Vision Modality Threats in Image-to-Image Tasks](https://arxiv.org/abs/2412.05538)

Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu

-+ [Nearly Solved? Robust Deepfake Detection Requires More than Visual Forensics](https://arxiv.org//abs/2412.05676)

++ [Nearly Solved? Robust Deepfake Detection Requires More than Visual Forensics](https://arxiv.org/abs/2412.05676)

Guy Levy, Nathan Liebmann

# 2024-12-06

-+ [Backdooring Outlier Detection Methods: A Novel Attack Approach](https://arxiv.org//abs/2412.05010)

++ [Backdooring Outlier Detection Methods: A Novel Attack Approach](https://arxiv.org/abs/2412.05010)

ZeinabSadat Taghavi, Hossein Mirzaei

-+ [A Practical Examination of AI-Generated Text Detectors for Large Language Models](https://arxiv.org//abs/2412.05139)

++ [A Practical Examination of AI-Generated Text Detectors for Large Language Models](https://arxiv.org/abs/2412.05139)

Brian Tufts, Xuandong Zhao, Lei Li

-+ [LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds](https://arxiv.org//abs/2412.05232)

++ [LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds](https://arxiv.org/abs/2412.05232)

James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, Mubarak Shah

-+ [Megatron: Evasive Clean-Label Backdoor Attacks against Vision Transformer](https://arxiv.org//abs/2412.04776)

++ [Megatron: Evasive Clean-Label Backdoor Attacks against Vision Transformer](https://arxiv.org/abs/2412.04776)

Xueluan Gong, Bowei Tian, Meng Xue, Shuike Li, Yanjiao Chen, Qian Wang

-+ [SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models](https://arxiv.org//abs/2412.04852)

++ [SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models](https://arxiv.org/abs/2412.04852)

Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, Zhengzhong Tu

-+ [Privacy Drift: Evolving Privacy Concerns in Incremental Learning](https://arxiv.org//abs/2412.05183)

++ [Privacy Drift: Evolving Privacy Concerns in Incremental Learning](https://arxiv.org/abs/2412.05183)

Sayyed Farid Ahamed, Soumya Banerjee, Sandip Roy, Aayush Kapoor, Marc Vucovich, Kevin Choi, Abdul Rahman, Edward Bowen, Sachin Shetty

-+ [A Differentially Private Kaplan-Meier Estimator for Privacy-Preserving Survival Analysis](https://arxiv.org//abs/2412.05164)

++ [A Differentially Private Kaplan-Meier Estimator for Privacy-Preserving Survival Analysis](https://arxiv.org/abs/2412.05164)
Narasimha Raghavan Veeraragavan, Sai Praneeth Karimireddy, Jan Franz Nygård

-+ [Towards Predicting the Success of Transfer-based Attacks by Quantifying Shared Feature Representations](https://arxiv.org//abs/2412.05351)

++ [Towards Predicting the Success of Transfer-based Attacks by Quantifying Shared Feature Representations](https://arxiv.org/abs/2412.05351)

Ashley S. Dale, Mei Qiu, Foo Bin Che, Thomas Bsaibes, Lauren Christopher, Paul Salama

-+ [BadGPT-4o: stripping safety finetuning from GPT models](https://arxiv.org//abs/2412.05346)

++ [BadGPT-4o: stripping safety finetuning from GPT models](https://arxiv.org/abs/2412.05346)

Ekaterina Krupkina, Dmitrii Volkov

-+ [Differentially Private Random Feature Model](https://arxiv.org//abs/2412.04785)

++ [Differentially Private Random Feature Model](https://arxiv.org/abs/2412.04785)

Chunyang Liao, Deanna Needell, Hayden Schaeffer, Alexander Xue

# 2024-12-05

-+ [Dimension Reduction via Random Projection for Privacy in Multi-Agent Systems](https://arxiv.org//abs/2412.04031)

++ [Dimension Reduction via Random Projection for Privacy in Multi-Agent Systems](https://arxiv.org/abs/2412.04031)

Puspanjali Ghoshal, Ashok Singh Sairam

-+ [Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes](https://arxiv.org//abs/2412.04140)

++ [Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes](https://arxiv.org/abs/2412.04140)

Dongjae Jeon, Dueun Kim, Albert No

# 2024-12-04

-+ [Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies](https://arxiv.org//abs/2412.03051)

++ [Less is More: A Stealthy and Efficient Adversarial Attack Method for DRL-based Autonomous Driving Policies](https://arxiv.org/abs/2412.03051)

Junchao Fan, Xuyang Lei, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić

-+ [Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?](https://arxiv.org//abs/2412.03235)

++ [Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?](https://arxiv.org/abs/2412.03235)

Sravanti Addepalli, Yerram Varun, Arun Suggala, Karthikeyan Shanmugam, Prateek Jain

-+ [Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models](https://arxiv.org//abs/2412.03283)

++ [Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models](https://arxiv.org/abs/2412.03283)

Andreas Müller, Denis Lukovnikov, Jonas Thietke, Asja Fischer, Erwin Quiring

-+ [PBP: Post-training Backdoor Purification for Malware Classifiers](https://arxiv.org//abs/2412.03441)

++ [PBP: Post-training Backdoor Purification for Malware Classifiers](https://arxiv.org/abs/2412.03441)

Dung Thuy Nguyen, Ngoc N. Tran, Taylor T. Johnson, Kevin Leach
-+ [NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model](https://arxiv.org//abs/2412.03539)

++ [NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model](https://arxiv.org/abs/2412.03539)

Xinheng Xie, Yue Wu, Cuiyu He

-+ [Best-of-N Jailbreaking](https://arxiv.org//abs/2412.03556)

++ [Best-of-N Jailbreaking](https://arxiv.org/abs/2412.03556)

John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma

-+ [Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks](https://arxiv.org//abs/2412.03453)

++ [Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks](https://arxiv.org/abs/2412.03453)

Dario Serez, Marco Cristani, Alessio Del Bue, Vittorio Murino, Pietro Morerio

-+ [A Taxonomy of System-Level Attacks on Deep Learning Models in Autonomous Vehicles](https://arxiv.org//abs/2412.04510)

++ [A Taxonomy of System-Level Attacks on Deep Learning Models in Autonomous Vehicles](https://arxiv.org/abs/2412.04510)

Masoud Jamshidiyan Tehrani, Jinhan Kim, Rosmael Zidane Lekeufack Foulefack, Alessandro Marchetto, Paolo Tonella

# 2024-12-03

-+ [Trust & Safety of LLMs and LLMs in Trust & Safety](https://arxiv.org//abs/2412.02113)

++ [Trust & Safety of LLMs and LLMs in Trust & Safety](https://arxiv.org/abs/2412.02113)

Doohee You, Dan Chon

-+ [Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach](https://arxiv.org//abs/2412.02159)

++ [Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach](https://arxiv.org/abs/2412.02159)

Tony T. Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir Shavit, Ethan Perez

-+ [Sustainable Self-evolution Adversarial Training](https://arxiv.org//abs/2412.02270)

++ [Sustainable Self-evolution Adversarial Training](https://arxiv.org/abs/2412.02270)

Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

-+ [Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining](https://arxiv.org//abs/2412.02454)

++ [Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining](https://arxiv.org/abs/2412.02454)

Zongru Wu, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu

-+ [Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script](https://arxiv.org//abs/2412.02323)

++ [Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script](https://arxiv.org/abs/2412.02323)
-+ [Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model](https://arxiv.org//abs/2412.02343)
++ [Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model](https://arxiv.org/abs/2412.02343)

Xi Cao, Nuo Qun, Quzong Gesang, Yulei Zhu, Trashi Nyima

-+ [TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity](https://arxiv.org//abs/2412.02371)
++ [TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity](https://arxiv.org/abs/2412.02371)

Xi Cao, Quzong Gesang, Yuan Sun, Nuo Qun, Tashi Nyima

-+ [Underload: Defending against Latency Attacks for Object Detectors on Edge Devices](https://arxiv.org//abs/2412.02171)
++ [Underload: Defending against Latency Attacks for Object Detectors on Edge Devices](https://arxiv.org/abs/2412.02171)

Tianyi Wang, Zichen Wang, Cong Wang, Yuanchao Shu, Ruilong Deng, Peng Cheng, Jiming Chen (Zhejiang University, Hangzhou, China)

-+ [Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization](https://arxiv.org//abs/2412.02535)
++ [Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization](https://arxiv.org/abs/2412.02535)

Nicolás García Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu

-+ [The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis](https://arxiv.org//abs/2412.02576)
++ [The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis](https://arxiv.org/abs/2412.02576)

Qilong Wu, Varun Chandrasekaran

-+ [Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects](https://arxiv.org//abs/2412.02803)
++ [Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects](https://arxiv.org/abs/2412.02803)

Abdurrahman Zeybey, Mehmet Ergezer, Tommy Nguyen

-+ [Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks](https://arxiv.org//abs/2412.02795)
++ [Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks](https://arxiv.org/abs/2412.02795)

Zijiao Yang, Xiangxi Shi, Eric Slyman, Stefan Lee

-+ [GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing](https://arxiv.org//abs/2412.02366)
++ [GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing](https://arxiv.org/abs/2412.02366)

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar, Naveed Akhtar

@@ -17240,642 +17240,642 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

# 2024-12-02

-+ [CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models](https://arxiv.org//abs/2412.01528)
++ [CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models](https://arxiv.org/abs/2412.01528)

Zhixiang Guo, Siyuan Liang, Aishan Liu, Dacheng Tao

-+ [R.I.P.: A Simple Black-box Attack on Continual Test-time Adaptation](https://arxiv.org//abs/2412.01154)
++ [R.I.P.: A Simple Black-box Attack on Continual Test-time Adaptation](https://arxiv.org/abs/2412.01154)

Trung-Hieu Hoang, Duc Minh Vo, Minh N. Do
-+ [Negative Token Merging: Image-based Adversarial Feature Guidance](https://arxiv.org//abs/2412.01339)
++ [Negative Token Merging: Image-based Adversarial Feature Guidance](https://arxiv.org/abs/2412.01339)

Jaskirat Singh, Lindsey Li, Weijia Shi, Ranjay Krishna, Yejin Choi, Pang Wei Koh, Michael F. Cohen, Stephen Gould, Liang Zheng, Luke Zettlemoyer

-+ [Behavior Backdoor for Deep Learning Models](https://arxiv.org//abs/2412.01369)
++ [Behavior Backdoor for Deep Learning Models](https://arxiv.org/abs/2412.01369)

Jiakai Wang, Pengfei Zhang, Renshuai Tao, Jian Yang, Hao Liu, Xianglong Liu, Yunchao Wei, Yao Zhao

-+ [Adversarial Attacks on Hyperbolic Networks](https://arxiv.org//abs/2412.01495)
++ [Adversarial Attacks on Hyperbolic Networks](https://arxiv.org/abs/2412.01495)

Max van Spengler, Jan Zahálka, Pascal Mettes

-+ [Improved Large Language Model Jailbreak Detection via Pretrained Embeddings](https://arxiv.org//abs/2412.01547)
++ [Improved Large Language Model Jailbreak Detection via Pretrained Embeddings](https://arxiv.org/abs/2412.01547)

Erick Galinkin, Martin Sablotny

-+ [Privacy-Preserving Federated Learning via Homomorphic Adversarial Networks](https://arxiv.org//abs/2412.01650)
++ [Privacy-Preserving Federated Learning via Homomorphic Adversarial Networks](https://arxiv.org/abs/2412.01650)

Wenhan Dong, Chao Lin, Xinlei He, Xinyi Huang, Shengmin Xu

-+ [Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios](https://arxiv.org//abs/2412.01756)
++ [Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios](https://arxiv.org/abs/2412.01756)

Sangyeon Yoon, Wonje Jeung, Albert No

-+ [NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training](https://arxiv.org//abs/2412.02030)
++ [NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training](https://arxiv.org/abs/2412.02030)

Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song

-+ [BadPatch: Diffusion-Based Generation of Physical Adversarial Patches](https://arxiv.org//abs/2412.01440)
++ [BadPatch: Diffusion-Based Generation of Physical Adversarial Patches](https://arxiv.org/abs/2412.01440)

Zhixiang Wang, Xingjun Ma, Yu-Gang Jiang

# 2024-12-01

-+ [Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance](https://arxiv.org//abs/2412.00621)
++ [Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance](https://arxiv.org/abs/2412.00621)

Chen-Wei Chang, Shailik Sarkar, Shutonu Mitra, Qi Zhang, Hossein Salemi, Hemant Purohit, Fengxiu Zhang, Michin Hong, Jin-Hee Cho, Chang-Tien Lu

-+ [SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts](https://arxiv.org//abs/2412.00765)
++ [SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts](https://arxiv.org/abs/2412.00765)

Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia

-+ [Online Poisoning Attack Against Reinforcement Learning under Black-box Environments](https://arxiv.org//abs/2412.00797)
++ [Online Poisoning Attack Against Reinforcement Learning under Black-box Environments](https://arxiv.org/abs/2412.00797)

Jianhui Li, Bokang Zhang, Junfeng Wu

# 2024-11-30

-+ [Fusing Physics-Driven Strategies and Cross-Modal Adversarial Learning: Toward Multi-Domain Applications](https://arxiv.org//abs/2412.00341)
++ [Fusing Physics-Driven Strategies and Cross-Modal Adversarial Learning: Toward Multi-Domain Applications](https://arxiv.org/abs/2412.00341)

Hana Satou, Alan Mitkiy
-+ [Hard-Label Black-Box Attacks on 3D Point Clouds](https://arxiv.org//abs/2412.00404)
++ [Hard-Label Black-Box Attacks on 3D Point Clouds](https://arxiv.org/abs/2412.00404)

Daizong Liu, Yunbo Tao, Pan Zhou, Wei Hu

-+ [Jailbreak Large Visual Language Models Through Multi-Modal Linkage](https://arxiv.org//abs/2412.00473)
++ [Jailbreak Large Visual Language Models Through Multi-Modal Linkage](https://arxiv.org/abs/2412.00473)

Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He

-+ [Exact Certification of (Graph) Neural Networks Against Label Poisoning](https://arxiv.org//abs/2412.00537)
++ [Exact Certification of (Graph) Neural Networks Against Label Poisoning](https://arxiv.org/abs/2412.00537)

Mahalakshmi Sabanayagam, Lukas Gosch, Stephan Günnemann, Debarghya Ghoshdastidar

# 2024-11-29

-+ [Gradient Inversion Attack on Graph Neural Networks](https://arxiv.org//abs/2411.19440)
++ [Gradient Inversion Attack on Graph Neural Networks](https://arxiv.org/abs/2411.19440)

Divya Anand Sinha, Yezi Liu, Ruijie Du, Yanning Shen

-+ [FLARE: Towards Universal Dataset Purification against Backdoor Attacks](https://arxiv.org//abs/2411.19479)
++ [FLARE: Towards Universal Dataset Purification against Backdoor Attacks](https://arxiv.org/abs/2411.19479)

Linshan Hou, Wei Luo, Zhongyun Hua, Songhua Chen, Leo Yu Zhang, Yiming Li

-+ [Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook](https://arxiv.org//abs/2411.19537)
++ [Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook](https://arxiv.org/abs/2411.19537)

Florinel-Alin Croitoru, Andrei-Iulian Hiji, Vlad Hondru, Nicolae Catalin Ristea, Paul Irofti, Marius Popescu, Cristian Rusu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

-+ [LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states](https://arxiv.org//abs/2411.19876)
++ [LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states](https://arxiv.org/abs/2411.19876)

Luis Ibanez-Lissen, Lorena Gonzalez-Manzano, Jose Maria de Fuentes, Nicolas Anciaux, Joaquin Garcia-Alfaro

-+ [Towards Class-wise Robustness Analysis](https://arxiv.org//abs/2411.19853)
++ [Towards Class-wise Robustness Analysis](https://arxiv.org/abs/2411.19853)

Tejaswini Medi, Julia Grabinski, Margret Keuper

-+ [On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code](https://arxiv.org//abs/2411.19508)
++ [On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code](https://arxiv.org/abs/2411.19508)

Md Imran Hossen, Xiali Hei

# 2024-11-28

-+ [Knowledge Database or Poison Base? Detecting RAG Poisoning Attack through LLM Activations](https://arxiv.org//abs/2411.18948)
++ [Knowledge Database or Poison Base? Detecting RAG Poisoning Attack through LLM Activations](https://arxiv.org/abs/2411.18948)

Xue Tan, Hao Luan, Mingyu Luo, Xiaoyan Sun, Ping Chen, Jun Dai
-+ [Random Sampling for Diffusion-based Adversarial Purification](https://arxiv.org//abs/2411.18956)
++ [Random Sampling for Diffusion-based Adversarial Purification](https://arxiv.org/abs/2411.18956)

Jiancheng Zhang, Peiran Dong, Yongyong Chen, Yin-Ping Zhao, Song Guo

-+ [LADDER: Multi-objective Backdoor Attack via Evolutionary Algorithm](https://arxiv.org//abs/2411.19075)
++ [LADDER: Multi-objective Backdoor Attack via Evolutionary Algorithm](https://arxiv.org/abs/2411.19075)

Dazhuang Liu, Yanqi Qiao, Rui Wang, Kaitai Liang, Georgios Smaragdakis

-+ [PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning](https://arxiv.org//abs/2411.19335)
++ [PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2411.19335)

Shenghui Li, Edith C.-H. Ngai, Fanghua Ye, Thiemo Voigt

-+ [Swarm Intelligence-Driven Client Selection for Federated Learning in Cybersecurity applications](https://arxiv.org//abs/2411.18877)
++ [Swarm Intelligence-Driven Client Selection for Federated Learning in Cybersecurity applications](https://arxiv.org/abs/2411.18877)

Koffka Khan, Wayne Goodridge

-+ [SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments](https://arxiv.org//abs/2412.00114)
++ [SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments](https://arxiv.org/abs/2412.00114)

Yue Cao, Yun Xing, Jie Zhang, Di Lin, Tianwei Zhang, Ivor Tsang, Yang Liu, Qing Guo

# 2024-11-27

-+ [Hidden Data Privacy Breaches in Federated Learning](https://arxiv.org//abs/2411.18269)
++ [Hidden Data Privacy Breaches in Federated Learning](https://arxiv.org/abs/2411.18269)

Xueluan Gong, Yuji Wang, Shuaike Li, Mengyuan Sun, Songze Li, Qian Wang, Kwok-Yan Lam, Chen Chen

-+ [Neutralizing Backdoors through Information Conflicts for Large Language Models](https://arxiv.org//abs/2411.18280)
++ [Neutralizing Backdoors through Information Conflicts for Large Language Models](https://arxiv.org/abs/2411.18280)

Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, Kwok-Yan Lam

-+ [Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models](https://arxiv.org//abs/2411.18000)
++ [Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models](https://arxiv.org/abs/2411.18000)

Shuyang Hao, Bryan Hooi, Jun Liu, Kai-Wei Chang, Zi Huang, Yujun Cai

-+ [Visual Adversarial Attack on Vision-Language Models for Autonomous Driving](https://arxiv.org//abs/2411.18275)
++ [Visual Adversarial Attack on Vision-Language Models for Autonomous Driving](https://arxiv.org/abs/2411.18275)

Tianyuan Zhang, Lu Wang, Xinwei Zhang, Yitong Zhang, Boyi Jia, Siyuan Liang, Shengshan Hu, Qiang Fu, Aishan Liu, Xianglong Liu

-+ [Adversarial Training in Low-Label Regimes with Margin-Based Interpolation](https://arxiv.org//abs/2411.17959)
++ [Adversarial Training in Low-Label Regimes with Margin-Based Interpolation](https://arxiv.org/abs/2411.17959)

Tian Ye, Rajgopal Kannan, Viktor Prasanna

-+ [InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks](https://arxiv.org//abs/2411.18191)
++ [InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks](https://arxiv.org/abs/2411.18191)

Xinyao Zheng, Husheng Han, Shangyi Shi, Qiyan Fang, Zidong Du, Qi Guo, Xing Hu
-+ [Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs](https://arxiv.org//abs/2411.18216)
++ [Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs](https://arxiv.org/abs/2411.18216)

Samuele Pasini, Jinhan Kim, Tommaso Aiello, Rocio Cabrera Lozoya, Antonino Sabetta, Paolo Tonella

-+ [PRSI: Privacy-Preserving Recommendation Model Based on Vector Splitting and Interactive Protocols](https://arxiv.org//abs/2411.18653)
++ [PRSI: Privacy-Preserving Recommendation Model Based on Vector Splitting and Interactive Protocols](https://arxiv.org/abs/2411.18653)

Xiaokai Cao, Wenjin Mo, Zhenyu He, Changdong Wang

-+ [Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment](https://arxiv.org//abs/2411.18688)
++ [Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment](https://arxiv.org/abs/2411.18688)

Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi

-+ [An indicator for effectiveness of text-to-image guardrails utilizing the Single-Turn Crescendo Attack (STCA)](https://arxiv.org//abs/2411.18699)
++ [An indicator for effectiveness of text-to-image guardrails utilizing the Single-Turn Crescendo Attack (STCA)](https://arxiv.org/abs/2411.18699)

Ted Kwartler, Nataliia Bagan, Ivan Banny, Alan Aqrawi, Arian Abbasi

-+ [Fall Leaf Adversarial Attack on Traffic Sign Classification](https://arxiv.org//abs/2411.18776)
++ [Fall Leaf Adversarial Attack on Traffic Sign Classification](https://arxiv.org/abs/2411.18776)

Anthony Etim, Jakub Szefer

-+ [Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech](https://arxiv.org//abs/2411.18177)
++ [Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech](https://arxiv.org/abs/2411.18177)

Emma Reyner-Fuentes, Esther Rituerto-Gonzalez, Carmen Pelaez-Moreno

# 2024-11-26

-+ [RED: Robust Environmental Design](https://arxiv.org//abs/2411.17026)
++ [RED: Robust Environmental Design](https://arxiv.org/abs/2411.17026)

Jinghan Yan

-+ [LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks](https://arxiv.org//abs/2411.17209)
++ [LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks](https://arxiv.org/abs/2411.17209)

Tianyi Wang, Mengxiao Huang, Harry Cheng, Xiao Zhang, Zhiqi Shen

-+ [BadScan: An Architectural Backdoor Attack on Visual State Space Models](https://arxiv.org//abs/2411.17283)
++ [BadScan: An Architectural Backdoor Attack on Visual State Space Models](https://arxiv.org/abs/2411.17283)

Om Suhas Deshmukh, Sankalp Nagaonkar, Achyut Mani Tripathi, Ashish Mishra

-+ [Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers](https://arxiv.org//abs/2411.17468)
++ [Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers](https://arxiv.org/abs/2411.17468)

Fatemeh Nourilenjan Nokabadi, Jean-Francois Lalonde, Christian Gagné

-+ [CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening](https://arxiv.org//abs/2411.16996)
++ [CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening](https://arxiv.org/abs/2411.16996)

Amar Kulkarni, Shangtong Zhang, Madhur Behl

-+ [PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning](https://arxiv.org//abs/2411.17453)
++ [PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2411.17453)

Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, Rongmao Chen, Xingshuo Han, Xinyi Huang
-+ [RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation](https://arxiv.org//abs/2411.17569)
++ [RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation](https://arxiv.org/abs/2411.17569)

Lakshmi Likhitha Mankali, Jitendra Bhandari, Manaar Alam, Ramesh Karri, Michail Maniatakos, Ozgur Sinanoglu, Johann Knechtel

-+ [Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey](https://arxiv.org//abs/2411.17911)
++ [Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey](https://arxiv.org/abs/2411.17911)

Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen, Nhien-An Le-Khac

-+ [Stealthy Multi-Task Adversarial Attacks](https://arxiv.org//abs/2411.17936)
++ [Stealthy Multi-Task Adversarial Attacks](https://arxiv.org/abs/2411.17936)

Jiacheng Guo, Tianyun Zhang, Lei Li, Haochen Yang, Hongkai Yu, Minghai Qin

-+ [MADE: Graph Backdoor Defense with Masked Unlearning](https://arxiv.org//abs/2411.18648)
++ [MADE: Graph Backdoor Defense with Masked Unlearning](https://arxiv.org/abs/2411.18648)

Xiao Lin, Mingjie Li, Yisen Wang

# 2024-11-25

-+ [Imperceptible Adversarial Examples in the Physical World](https://arxiv.org//abs/2411.16622)
++ [Imperceptible Adversarial Examples in the Physical World](https://arxiv.org/abs/2411.16622)

Weilin Xu, Sebastian Szyller, Cory Cornelius, Luis Murillo Rojas, Marius Arvinte, Alvaro Velasquez, Jason Martin, Nageen Himayat

-+ [Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective](https://arxiv.org//abs/2411.16642)
++ [Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective](https://arxiv.org/abs/2411.16642)

Jean Marie Tshimula, Xavier Ndona, D'Jeff K. Nkashama, Pierre-Martin Tardif, Froduald Kabanza, Marc Frappier, Shengrui Wang
-+ [Sparse patches adversarial attacks via extrapolating point-wise information](https://arxiv.org//abs/2411.16162)
++ [Sparse patches adversarial attacks via extrapolating point-wise information](https://arxiv.org/abs/2411.16162)

Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

-+ [Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial Attack](https://arxiv.org//abs/2411.16437)
++ [Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial Attack](https://arxiv.org/abs/2411.16437)

Xide Xu, Muhammad Atif Butt, Sandesh Kamath, Bogdan Raducanu

-+ [Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models](https://arxiv.org//abs/2411.16512)
++ [Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models](https://arxiv.org/abs/2411.16512)

Songning Lai, Yu Huang, Jiayu Yang, Gaoxiang Huang, Wenshuo Chen, Yutao Yue

-+ [Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification](https://arxiv.org//abs/2411.16598)
++ [Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification](https://arxiv.org/abs/2411.16598)

Andre Kassis, Urs Hengartner, Yaoliang Yu

-+ [DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders](https://arxiv.org//abs/2411.16154)
++ [DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders](https://arxiv.org/abs/2411.16154)

Sizai Hou, Songze Li, Duanyi Yao

-+ [BadSFL: Backdoor Attack against Scaffold Federated Learning](https://arxiv.org//abs/2411.16167)
++ [BadSFL: Backdoor Attack against Scaffold Federated Learning](https://arxiv.org/abs/2411.16167)

Xingshuo Han, Xiang Lan, Haozhao Wang, Shengmin Xu, Shen Ren, Jason Zeng, Ming Wu, Michael Heinrich, Tianwei Zhang

-+ [Adversarial Attacks for Drift Detection](https://arxiv.org//abs/2411.16591)
++ [Adversarial Attacks for Drift Detection](https://arxiv.org/abs/2411.16591)

Fabian Hinder, Valerie Vaquet, Barbara Hammer

-+ [Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing](https://arxiv.org//abs/2411.16832)
++ [Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing](https://arxiv.org/abs/2411.16832)

Hanhui Wang, Yihua Zhang, Ruizheng Bai, Yue Zhao, Sijia Liu, Zhengzhong Tu

-+ [In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models](https://arxiv.org//abs/2411.16769)
++ [In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models](https://arxiv.org/abs/2411.16769)

Zhi-Yi Chin, Kuan-Chen Mu, Mario Fritz, Pin-Yu Chen, Wei-Chen Chiu

-+ [Scaling Laws for Black box Adversarial Attacks](https://arxiv.org//abs/2411.16782)
++ [Scaling Laws for Black box Adversarial Attacks](https://arxiv.org/abs/2411.16782)

Chuan Liu, Huanran Chen, Yichi Zhang, Yinpeng Dong, Jun Zhu

# 2024-11-24

-+ [Data Lineage Inference: Uncovering Privacy Vulnerabilities of Dataset Pruning](https://arxiv.org//abs/2411.15796)
++ [Data Lineage Inference: Uncovering Privacy Vulnerabilities of Dataset Pruning](https://arxiv.org/abs/2411.15796)

Qi Li, Cheng-Long Wang, Yinzhi Cao, Di Wang

-+ [Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks](https://arxiv.org//abs/2411.15720)
++ [Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks](https://arxiv.org/abs/2411.15720)

Peng Xie, Yequan Bie, Jianda Mao, Yangqiu Song, Yang Wang, Hao Chen, Kani Chen
-+ [A Tunable Despeckling Neural Network Stabilized via Diffusion Equation](https://arxiv.org//abs/2411.15921)
++ [A Tunable Despeckling Neural Network Stabilized via Diffusion Equation](https://arxiv.org/abs/2411.15921)

Yi Ran, Zhichang Guo, Jia Li, Yao Li, Martin Burger, Boying Wu

-+ [ExAL: An Exploration Enhanced Adversarial Learning Algorithm](https://arxiv.org//abs/2411.15878)
++ [ExAL: An Exploration Enhanced Adversarial Learning Algorithm](https://arxiv.org/abs/2411.15878)

A Vinil, Aneesh Sreevallabh Chivukula, Pranav Chintareddy

-+ [Hide in Plain Sight: Clean-Label Backdoor for Auditing Membership Inference](https://arxiv.org//abs/2411.16763)
++ [Hide in Plain Sight: Clean-Label Backdoor for Auditing Membership Inference](https://arxiv.org/abs/2411.16763)

Depeng Chen, Hao Chen, Hulin Jin, Jie Cui, Hong Zhong

# 2024-11-23

-+ [Twin Trigger Generative Networks for Backdoor Attacks against Object Detection](https://arxiv.org//abs/2411.15439)
++ [Twin Trigger Generative Networks for Backdoor Attacks against Object Detection](https://arxiv.org/abs/2411.15439)

Zhiying Li, Zhi Liu, Guanggang Geng, Shreyank N Gowda, Shuyuan Lin, Jian Weng, Xiaobo Jin

-+ [Improving Transferable Targeted Attacks with Feature Tuning Mixup](https://arxiv.org//abs/2411.15553)
++ [Improving Transferable Targeted Attacks with Feature Tuning Mixup](https://arxiv.org/abs/2411.15553)

Kaisheng Liang, Xuelong Dai, Yanjie Li, Dong Wang, Bin Xiao

-+ [Enhancing the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation](https://arxiv.org//abs/2411.15555)
++ [Enhancing the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation](https://arxiv.org/abs/2411.15555)

Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang

-+ [Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment](https://arxiv.org//abs/2411.15673)
++ [Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment](https://arxiv.org/abs/2411.15673)

Alvi Md Ishmam, Christopher Thomas

-+ [Unveiling the Achilles' Heel: Backdoor Watermarking Forgery Attack in Public Dataset Protection](https://arxiv.org//abs/2411.15450)
++ [Unveiling the Achilles' Heel: Backdoor Watermarking Forgery Attack in Public Dataset Protection](https://arxiv.org/abs/2411.15450)

Zhiying Li, Zhi Liu, Dongjie Liu, Shengda Zhuo, Guanggang Geng, Jian Weng, Shanxiang Lyu, Xiaobo Jin

-+ [Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks](https://arxiv.org//abs/2411.16721)
++ [Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks](https://arxiv.org/abs/2411.16721)

Han Wang, Gang Wang, Huan Zhang

-+ ["Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks](https://arxiv.org//abs/2411.16730)
++ ["Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks](https://arxiv.org/abs/2411.16730)

Libo Wang

-+ [LoBAM: LoRA-Based Backdoor Attack on Model Merging](https://arxiv.org//abs/2411.16746)
++ [LoBAM: LoRA-Based Backdoor Attack on Model Merging](https://arxiv.org/abs/2411.16746)

Ming Yin, Jingyang Zhang, Jingwei Sun, Minghong Fang, Hai Li, Yiran Chen

-+ [Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens](https://arxiv.org//abs/2411.16724)
++ [Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens](https://arxiv.org/abs/2411.16724)

Zhangqi Jiang, Junkai Chen, Beier Zhu, Tingjin Luo, Yankun Shen, Xu Yang
-+ [MUNBa: Machine Unlearning via Nash Bargaining](https://arxiv.org//abs/2411.15537)
++ [MUNBa: Machine Unlearning via Nash Bargaining](https://arxiv.org/abs/2411.15537)

Jing Wu, Mehrtash Harandi

# 2024-11-22

-+ [Universal and Context-Independent Triggers for Precise Control of LLM Outputs](https://arxiv.org//abs/2411.14738)
++ [Universal and Context-Independent Triggers for Precise Control of LLM Outputs](https://arxiv.org/abs/2411.14738)

Jiashuo Liang, Guancheng Li, Yang Yu

-+ [Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Language Models](https://arxiv.org//abs/2411.14842)
++ [Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Language Models](https://arxiv.org/abs/2411.14842)

Wanqi Yang, Yanda Li, Meng Fang, Yunchao Wei, Tianyi Zhou, Ling Chen

-+ [Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning](https://arxiv.org//abs/2411.14937)
++ [Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning](https://arxiv.org/abs/2411.14937)

Junjie Shan, Ziqi Zhao, Jialin Lu, Rui Zhang, Siu Ming Yiu, Ka-Ho Chow

-+ [TrojanEdit: Backdooring Text-Based Image Editing Models](https://arxiv.org//abs/2411.14681)
++ [TrojanEdit: Backdooring Text-Based Image Editing Models](https://arxiv.org/abs/2411.14681)

Ji Guo, Peihong Chen, Wenbo Jiang, Guoming Lu

-+ [GraphTheft: Quantifying Privacy Risks in Graph Prompt Learning](https://arxiv.org//abs/2411.14718)
++ [GraphTheft: Quantifying Privacy Risks in Graph Prompt Learning](https://arxiv.org/abs/2411.14718)

Jiani Zhu, Xi Lin, Yuxin Qi, Qinghua Mao

-+ [SecONN: An Optical Neural Network Framework with Concurrent Detection of Thermal Fault Injection Attacks](https://arxiv.org//abs/2411.14741)
++ [SecONN: An Optical Neural Network Framework with Concurrent Detection of Thermal Fault Injection Attacks](https://arxiv.org/abs/2411.14741)

Kota Nishida, Yoshihiro Midoh, Noriyuki Miura, Satoshi Kawakami, Jun Shiomi

-+ [Adversarial Prompt Distillation for Vision-Language Models](https://arxiv.org//abs/2411.15244)
++ [Adversarial Prompt Distillation for Vision-Language Models](https://arxiv.org/abs/2411.15244)

Lin Luo, Xin Wang, Bojia Zi, Shihao Zhao, Xingjun Ma

-+ [Heavy-tailed Contamination is Easier than Adversarial Contamination](https://arxiv.org//abs/2411.15306)
++ [Heavy-tailed Contamination is Easier than Adversarial Contamination](https://arxiv.org/abs/2411.15306)

Yeshwanth Cherapanamjeri, Daniel Lee

-+ [Exploring the Robustness and Transferability of Patch-Based Adversarial Attacks in Quantized Neural Networks](https://arxiv.org//abs/2411.15246)
++ [Exploring the Robustness and Transferability of Patch-Based Adversarial Attacks in Quantized Neural Networks](https://arxiv.org/abs/2411.15246)

Amira Guesmi, Bassem Ouni, Muhammad Shafique

-+ [Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI](https://arxiv.org//abs/2411.15265)
++ [Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI](https://arxiv.org/abs/2411.15265)

Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye

-+ [Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings](https://arxiv.org//abs/2411.14639)
++ [Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings](https://arxiv.org/abs/2411.14639)

Pura Peetathawatchai, Wei-Ning Chen, Berivan Isik, Sanmi Koyejo, Albert No
# 2024-11-21

-+ [AttentionBreaker: Adaptive Evolutionary Optimization for Unmasking Vulnerabilities in LLMs through Bit-Flip Attacks](https://arxiv.org//abs/2411.13757)
++ [AttentionBreaker: Adaptive Evolutionary Optimization for Unmasking Vulnerabilities in LLMs through Bit-Flip Attacks](https://arxiv.org/abs/2411.13757)

Sanjay Das, Swastik Bhattacharya, Souvik Kundu, Shamik Kundu, Anand Menon, Arnab Raha, Kanad Basu

-+ [A Survey on Adversarial Robustness of LiDAR-based Machine Learning Perception in Autonomous Vehicles](https://arxiv.org//abs/2411.13778)
++ [A Survey on Adversarial Robustness of LiDAR-based Machine Learning Perception in Autonomous Vehicles](https://arxiv.org/abs/2411.13778)

Junae Kim, Amardeep Kaur

-+ [GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs](https://arxiv.org//abs/2411.14133)
++ [GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs](https://arxiv.org/abs/2411.14133)

Advik Raj Basani, Xiao Zhang

-+ [AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection](https://arxiv.org//abs/2411.14243)
++ [AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection](https://arxiv.org/abs/2411.14243)

Jialin Lu, Junjie Shan, Ziqi Zhao, Ka-Ho Chow

-+ [Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders](https://arxiv.org//abs/2411.14263)
++ [Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders](https://arxiv.org/abs/2411.14263)

Alexander Stevens, Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

-+ [Adversarial Poisoning Attack on Quantum Machine Learning Models](https://arxiv.org//abs/2411.14412)
++ [Adversarial Poisoning Attack on Quantum Machine Learning Models](https://arxiv.org/abs/2411.14412)

Satwik Kundu, Swaroop Ghosh

-+ [Learning Fair Robustness via Domain Mixup](https://arxiv.org//abs/2411.14424)
++ [Learning Fair Robustness via Domain Mixup](https://arxiv.org/abs/2411.14424)

Meiyu Zhong, Ravi Tandon

-+ [RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks](https://arxiv.org//abs/2411.14110)
++ [RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks](https://arxiv.org/abs/2411.14110)

Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Min Yang

-+ [Memory Backdoor Attacks on Neural Networks](https://arxiv.org//abs/2411.14516)
++ [Memory Backdoor Attacks on Neural Networks](https://arxiv.org/abs/2411.14516)

Eden Luzon, Guy Amit, Roy Weiss, Yisroel Mirsky

-+ [Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation](https://arxiv.org//abs/2411.15222)
++ [Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation](https://arxiv.org/abs/2411.15222)

Ke Zhao, Huayang Huang, Miao Li, Yu Wu

-+ [XAgents: A Framework for Interpretable Rule-Based Multi-Agents Cooperation](https://arxiv.org//abs/2411.13932)
++ [XAgents: A Framework for Interpretable Rule-Based Multi-Agents Cooperation](https://arxiv.org/abs/2411.13932)

Hailong Yang, Mingxian Gu, Renhuo Zhao, Fuping Hu, Zhaohong Deng, Yitang Chen
# 2024-11-20

-+ [Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning](https://arxiv.org//abs/2411.13116)
++ [Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning](https://arxiv.org/abs/2411.13116)

Zhi Luo, Xiyuan Yang, Pan Zhou, Di Wang

-+ [SoK: A Systems Perspective on Compound AI Threats and Countermeasures](https://arxiv.org//abs/2411.13459)
++ [SoK: A Systems Perspective on Compound AI Threats and Countermeasures](https://arxiv.org/abs/2411.13459)

Sarbartha Banerjee, Prateek Sahu, Mulong Luo, Anjo Vahldiek-Oberwagner, Neeraja J. Yadwadkar, Mohit Tiwari

-+ [WaterPark: A Robustness Assessment of Language Model Watermarking](https://arxiv.org//abs/2411.13425)
++ [WaterPark: A Robustness Assessment of Language Model Watermarking](https://arxiv.org/abs/2411.13425)

Jiacheng Liang, Zian Wang, Lauren Hong, Shouling Ji, Ting Wang

-+ [TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models](https://arxiv.org//abs/2411.13136)
++ [TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models](https://arxiv.org/abs/2411.13136)

Xin Wang, Kai Chen, Jiaming Zhang, Jingjing Chen, Xingjun Ma

-+ [Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors](https://arxiv.org//abs/2411.13047)
++ [Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors](https://arxiv.org/abs/2411.13047)

Satoru Koda, Ikuya Morikawa

-+ [Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks](https://arxiv.org//abs/2411.15210)
++ [Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks](https://arxiv.org/abs/2411.15210)

Yong Xie, Weijie Zheng, Hanxun Huang, Guangnan Ye, Xingjun Ma

# 2024-11-19

-+ [DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning](https://arxiv.org//abs/2411.12220)
++ [DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning](https://arxiv.org/abs/2411.12220)

Kichang Lee, Yujin Shin, Jonghyuk Yun, Jun Han, JeongGil Ko

-+ [Attribute Inference Attacks for Federated Regression Tasks](https://arxiv.org//abs/2411.12697)
++ [Attribute Inference Attacks for Federated Regression Tasks](https://arxiv.org/abs/2411.12697)

Francesco Diana, Othmane Marfoq, Chuan Xu, Giovanni Neglia, Frédéric Giroire, Eoin Thomas

-+ [Combinational Backdoor Attack against Customized Text-to-Image Models](https://arxiv.org//abs/2411.12389)
++ [Combinational Backdoor Attack against Customized Text-to-Image Models](https://arxiv.org/abs/2411.12389)

Wenbo Jiang, Jiaming He, Hongwei Li, Guowen Xu, Rui Zhang, Hanxiao Chen, Meng Hao, Haomiao Yang

-+ [When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations](https://arxiv.org//abs/2411.12701)
++ [When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations](https://arxiv.org/abs/2411.12701)

Huaizhi Ge, Yiming Li, Qifan Wang, Yongfeng Zhang, Ruixiang Tang

-+ [NMT-Obfuscator Attack: Ignore a sentence in translation with only one word](https://arxiv.org//abs/2411.12473)
++ [NMT-Obfuscator Attack: Ignore a sentence in translation with only one word](https://arxiv.org/abs/2411.12473)

Sahar Sadrizadeh, César Descalzo, Ljiljana Dolamic, Pascal Frossard

-+ [Stochastic BIQA: Median Randomized Smoothing for Certified Blind Image Quality Assessment](https://arxiv.org//abs/2411.12575)
++ [Stochastic BIQA: Median Randomized Smoothing for Certified Blind Image Quality Assessment](https://arxiv.org/abs/2411.12575)

Ekaterina Shumitskaya, Mikhail Pautov, Dmitriy Vatolin, Anastasia Antsiferova
-+ [CDI: Copyrighted Data Identification in Diffusion Models](https://arxiv.org//abs/2411.12858)
++ [CDI: Copyrighted Data Identification in Diffusion Models](https://arxiv.org/abs/2411.12858)

Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic

-+ [Trojan Cleansing with Neural Collapse](https://arxiv.org//abs/2411.12914)
++ [Trojan Cleansing with Neural Collapse](https://arxiv.org/abs/2411.12914)

Xihe Gu, Greg Fields, Yaman Jandali, Tara Javidi, Farinaz Koushanfar

-+ [ProSec: Fortifying Code LLMs with Proactive Security Alignment](https://arxiv.org//abs/2411.12882)
++ [ProSec: Fortifying Code LLMs with Proactive Security Alignment](https://arxiv.org/abs/2411.12882)

Xiangzhe Xu, Zian Su, Jinyao Guo, Kaiyuan Zhang, Zhenting Wang, Xiangyu Zhang

# 2024-11-18

-+ [Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment](https://arxiv.org//abs/2411.11543)
++ [Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment](https://arxiv.org/abs/2411.11543)

Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

-+ [TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World](https://arxiv.org//abs/2411.11683)
++ [TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World](https://arxiv.org/abs/2411.11683)

Xianlong Wang, Hewen Pan, Hangtao Zhang, Minghui Li, Shengshan Hu, Ziqi Zhou, Lulu Xue, Peijin Guo, Yichen Wang, Wei Wan, Aishan Liu, Leo Yu Zhang

-+ [Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods](https://arxiv.org//abs/2411.11795)
++ [Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods](https://arxiv.org/abs/2411.11795)

Egor Kovalev, Georgii Bychkov, Khaled Abud, Aleksandr Gushchin, Anna Chistyakova, Sergey Lavrushkin, Dmitriy Vatolin, Anastasia Antsiferova

-+ [Membership Inference Attack against Long-Context Large Language Models](https://arxiv.org//abs/2411.11424)
++ [Membership Inference Attack against Long-Context Large Language Models](https://arxiv.org/abs/2411.11424)

Zixiong Wang, Gaoyang Liu, Yang Yang, Chen Wang

-+ [Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models](https://arxiv.org//abs/2411.11496)
++ [Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models](https://arxiv.org/abs/2411.11496)

Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua
-+ [Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization](https://arxiv.org//abs/2411.11525)
++ [Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization](https://arxiv.org/abs/2411.11525)

Mingda Zhang, Mingli Zhu, Zihao Zhu, Baoyuan Wu

-+ [Theoretical Corrections and the Leveraging of Reinforcement Learning to Enhance Triangle Attack](https://arxiv.org//abs/2411.12071)
++ [Theoretical Corrections and the Leveraging of Reinforcement Learning to Enhance Triangle Attack](https://arxiv.org/abs/2411.12071)

Nicole Meng, Caleb Manicke, David Chen, Yingjie Lao, Caiwen Ding, Pengyu Hong, Kaleel Mahmood

-+ [Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics](https://arxiv.org//abs/2411.13587)
++ [Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics](https://arxiv.org/abs/2411.13587)

Taowen Wang, Dongfang Liu, James Chenhao Liang, Wenhao Yang, Qifan Wang, Cheng Han, Jiebo Luo, Ruixiang Tang

-+ [Parallelly Tempered Generative Adversarial Nets: Toward Stabilized Gradients](https://arxiv.org//abs/2411.11786)
++ [Parallelly Tempered Generative Adversarial Nets: Toward Stabilized Gradients](https://arxiv.org/abs/2411.11786)

Jinwon Sohn, Qifan Song

-+ [Watermarking Visual Concepts for Diffusion Models](https://arxiv.org//abs/2411.11688)
++ [Watermarking Visual Concepts for Diffusion Models](https://arxiv.org/abs/2411.11688)

Liangqi Lei, Keke Gai, Jing Yu, Liehuang Zhu, Qi Wu

-+ [Secure Reinforcement Learning via Shuffle Privacy Model](https://arxiv.org//abs/2411.11647)
++ [Secure Reinforcement Learning via Shuffle Privacy Model](https://arxiv.org/abs/2411.11647)

Shaojie Bai, Mohammad Sadegh Talebi, Chengcheng Zhao, Peng Cheng, Jiming Chen

@@ -17884,17 +17884,17 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Mohamad Fazelnia, Sara Moshtari, Mehdi Mirakhorli

# 2024-11-17

-+ [BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation](https://arxiv.org//abs/2411.11006)
++ [BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation](https://arxiv.org/abs/2411.11006)

Haiyang Yu, Tian Xie, Jiaping Gui, Pengyang Wang, Ping Yi, Yue Wu

-+ [Time Step Generating: A Universal Synthesized Deepfake Image Detector](https://arxiv.org//abs/2411.11016)
++ [Time Step Generating: A Universal Synthesized Deepfake Image Detector](https://arxiv.org/abs/2411.11016)

Ziyue Zeng, Haoyuan Liu, Dingjie Peng, Luoxu Jing, Hiroshi Watanabe

-+ [CLMIA: Membership Inference Attacks via Unsupervised Contrastive Learning](https://arxiv.org//abs/2411.11144)
++ [CLMIA: Membership Inference Attacks via Unsupervised Contrastive Learning](https://arxiv.org/abs/2411.11144)

Depeng Chen, Xiao Liu, Jie Cui, Hong Zhong (School of Computer Science and Technology, Anhui University)

@@ -17903,291 +17903,291 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Minhua Lin, Enyan Dai, Junjie Xu, Jinyuan Jia, Xiang Zhang, Suhang Wang

-+ [SoK: The Security-Safety Continuum of Multimodal Foundation Models through Information Flow and Game-Theoretic Defenses](https://arxiv.org//abs/2411.11195)
++ [SoK: The Security-Safety Continuum of Multimodal Foundation Models through Information Flow and Game-Theoretic Defenses](https://arxiv.org/abs/2411.11195)

Ruoxi Sun, Jiamin Chang, Hammond Pearce, Chaowei Xiao, Bo Li, Qi Wu, Surya Nepal, Minhui Xue
# 2024-11-16

-+ [Playing Language Game with LLMs Leads to Jailbreaking](https://arxiv.org//abs/2411.12762)
++ [Playing Language Game with LLMs Leads to Jailbreaking](https://arxiv.org/abs/2411.12762)

Yu Peng, Zewen Long, Fangming Dong, Congyi Li, Shu Wu, Kai Chen

# 2024-11-15

-+ [TEESlice: Protecting Sensitive Neural Network Models in Trusted Execution Environments When Attackers have Pre-Trained Models](https://arxiv.org//abs/2411.09945)
++ [TEESlice: Protecting Sensitive Neural Network Models in Trusted Execution Environments When Attackers have Pre-Trained Models](https://arxiv.org/abs/2411.09945)

Ding Li, Ziqi Zhang, Mengyu Yao, Yifeng Cai, Yao Guo, Xiangqun Chen

-+ [A Hard-Label Cryptanalytic Extraction of Non-Fully Connected Deep Neural Networks using Side-Channel Attacks](https://arxiv.org//abs/2411.10174)
++ [A Hard-Label Cryptanalytic Extraction of Non-Fully Connected Deep Neural Networks using Side-Channel Attacks](https://arxiv.org/abs/2411.10174)

Benoit Coqueret, Mathieu Carbone, Olivier Sentieys, Gabriel Zaid

-+ [Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding](https://arxiv.org//abs/2411.10329)
++ [Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding](https://arxiv.org/abs/2411.10329)

Huming Qiu, Guanxu Chen, Mi Zhang, Min Yang

-+ [Continual Adversarial Reinforcement Learning (CARL) of False Data Injection detection: forgetting and explainability](https://arxiv.org//abs/2411.10367)
++ [Continual Adversarial Reinforcement Learning (CARL) of False Data Injection detection: forgetting and explainability](https://arxiv.org/abs/2411.10367)

Pooja Aslami, Kejun Chen, Timothy M. Hansen, Malik Hassanaly
-+ [Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations](https://arxiv.org//abs/2411.10414)
++ [Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations](https://arxiv.org/abs/2411.10414)

Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti

-+ [Toward Robust and Accurate Adversarial Camouflage Generation against Vehicle Detectors](https://arxiv.org//abs/2411.10029)
++ [Toward Robust and Accurate Adversarial Camouflage Generation against Vehicle Detectors](https://arxiv.org/abs/2411.10029)

Jiawei Zhou, Linye Lyu, Daojing He, Yu Li

-+ [Model Inversion Attacks: A Survey of Approaches and Countermeasures](https://arxiv.org//abs/2411.10023)
++ [Model Inversion Attacks: A Survey of Approaches and Countermeasures](https://arxiv.org/abs/2411.10023)

Zhanke Zhou, Jianing Zhu, Fengfei Yu, Xuan Li, Xiong Peng, Tongliang Liu, Bo Han

-+ [EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations](https://arxiv.org//abs/2411.10034)
++ [EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations](https://arxiv.org/abs/2411.10034)

Jung-Woo Chang, Ke Sun, David Xia, Xinyu Zhang, Farinaz Koushanfar

-+ [Edge-Only Universal Adversarial Attacks in Distributed Learning](https://arxiv.org//abs/2411.10500)
++ [Edge-Only Universal Adversarial Attacks in Distributed Learning](https://arxiv.org/abs/2411.10500)

Giulio Rossolini, Tommaso Baldi, Alessandro Biondi, Giorgio Buttazzo

-+ [Prompt-Guided Environmentally Consistent Adversarial Patch](https://arxiv.org//abs/2411.10498)
++ [Prompt-Guided Environmentally Consistent Adversarial Patch](https://arxiv.org/abs/2411.10498)

Chaoqun Li, Huanqian Yan, Lifeng Zhou, Tairan Chen, Zhuodong Liu, Hang Su

-+ [Embedding Byzantine Fault Tolerance into Federated Learning via Consistency Scoring](https://arxiv.org//abs/2411.10212)
++ [Embedding Byzantine Fault Tolerance into Federated Learning via Consistency Scoring](https://arxiv.org/abs/2411.10212)

Youngjoon Lee, Jinu Gong, Joonhyuk Kang

# 2024-11-14

-+ [DROJ: A Prompt-Driven Attack against Large Language Models](https://arxiv.org//abs/2411.09125)
++ [DROJ: A Prompt-Driven Attack against Large Language Models](https://arxiv.org/abs/2411.09125)

Leyang Hu, Boran Wang

-+ [Transferable Adversarial Attacks against ASR](https://arxiv.org//abs/2411.09220)
++ [Transferable Adversarial Attacks against ASR](https://arxiv.org/abs/2411.09220)

Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, Haizhou Li

-+ [How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception](https://arxiv.org//abs/2411.09266)
++ [How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception](https://arxiv.org/abs/2411.09266)

Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang

-+ [Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models](https://arxiv.org//abs/2411.09540)
++ [Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models](https://arxiv.org/abs/2411.09540)

Zi-Xuan Huang, Jia-Wei Chen, Zhi-Peng Zhang, Chia-Mu Yu

-+ [Adversarial Vessel-Unveiling Semi-Supervised Segmentation for Retinopathy of Prematurity Diagnosis](https://arxiv.org//abs/2411.09140)
++ [Adversarial Vessel-Unveiling Semi-Supervised Segmentation for Retinopathy of Prematurity Diagnosis](https://arxiv.org/abs/2411.09140)

Gozde Merve Demirci, Jiachen Yao, Ming-Chih Ho, Xiaoling Hu, Wei-Chi Wu, Chao Chen, Chia-Ling Tsai
-+ [BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation](https://arxiv.org//abs/2411.09265)
++ [BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation](https://arxiv.org/abs/2411.09265)

Zheng Zhou, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiaowei Huang, Qi Zhao

-+ [Injection Attacks Against End-to-End Encrypted Applications](https://arxiv.org//abs/2411.09228)
++ [Injection Attacks Against End-to-End Encrypted Applications](https://arxiv.org/abs/2411.09228)

Andrés Fábrega, Carolina Ortega Pérez, Armin Namavari, Ben Nassi, Rachit Agarwal, Thomas Ristenpart

-+ [Backdoor Mitigation by Distance-Driven Detoxification](https://arxiv.org//abs/2411.09585)
++ [Backdoor Mitigation by Distance-Driven Detoxification](https://arxiv.org/abs/2411.09585)

Shaokui Wei, Jiayin Liu, Hongyuan Zha

-+ [Adversarial Attacks Using Differentiable Rendering: A Survey](https://arxiv.org//abs/2411.09749)
++ [Adversarial Attacks Using Differentiable Rendering: A Survey](https://arxiv.org/abs/2411.09749)

Matthew Hull, Chao Zhang, Zsolt Kira, Duen Horng Chau

-+ [Combining Machine Learning Defenses without Conflicts](https://arxiv.org//abs/2411.09776)
++ [Combining Machine Learning Defenses without Conflicts](https://arxiv.org/abs/2411.09776)

Vasisht Duddu, Rui Zhang, N. Asokan

-+ [Rethinking Weight-Averaged Model-merging](https://arxiv.org//abs/2411.09263)
++ [Rethinking Weight-Averaged Model-merging](https://arxiv.org/abs/2411.09263)

Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub

# 2024-11-13

-+ [Neural Corrective Machine Unranking](https://arxiv.org//abs/2411.08562)
++ [Neural Corrective Machine Unranking](https://arxiv.org/abs/2411.08562)

Jingrui Hou, Axel Finke, Georgina Cosma

# 2024-11-12

-+ [New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook](https://arxiv.org//abs/2411.07691)
++ [New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook](https://arxiv.org/abs/2411.07691)

Meng Yang, Tianqing Zhu, Chi Liu, WanLei Zhou, Shui Yu, Philip S. Yu
-+ [Can adversarial attacks by large language models be attributed?](https://arxiv.org//abs/2411.08003)
++ [Can adversarial attacks by large language models be attributed?](https://arxiv.org/abs/2411.08003)

Manuel Cebrian, Jan Arne Telle

-+ [SecEncoder: Logs are All You Need in Security](https://arxiv.org//abs/2411.07528)
++ [SecEncoder: Logs are All You Need in Security](https://arxiv.org/abs/2411.07528)

Muhammed Fatih Bulut, Yingqi Liu, Naveed Ahmad, Maximilian Turner, Sami Ait Ouahmane, Cameron Andrews, Lloyd Greenwald

-+ [Model Stealing for Any Low-Rank Language Model](https://arxiv.org//abs/2411.07536)
++ [Model Stealing for Any Low-Rank Language Model](https://arxiv.org/abs/2411.07536)

Allen Liu, Ankur Moitra

-+ [Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models](https://arxiv.org//abs/2411.07559)
++ [Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models](https://arxiv.org/abs/2411.07559)

Tiejin Chen, Kaishen Wang, Hua Wei

-+ [DecoPrompt : Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises](https://arxiv.org//abs/2411.07457)
++ [DecoPrompt : Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises](https://arxiv.org/abs/2411.07457)

Nan Xu, Xuezhe Ma

-+ [Rapid Response: Mitigating LLM Jailbreaks with a Few Examples](https://arxiv.org//abs/2411.07494)
++ [Rapid Response: Mitigating LLM Jailbreaks with a Few Examples](https://arxiv.org/abs/2411.07494)

Alwin Peng, Julian Michael, Henry Sleight, Ethan Perez, Mrinank Sharma

-+ [Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors](https://arxiv.org//abs/2411.07472)
++ [Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors](https://arxiv.org/abs/2411.07472)

Anisha Pal, Julia Kruk, Mansi Phute, Manognya Bhattaram, Diyi Yang, Duen Horng Chau, Judy Hoffman

-+ [A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations](https://arxiv.org//abs/2411.07597)
++ [A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations](https://arxiv.org/abs/2411.07597)

Yulong Yang, Haoran Fan, Chenhao Lin, Qian Li, Zhengyu Zhao, Chao Shen, Xiaohong Guan

# 2024-11-11

-+ [Adversarial Detection with a Dynamically Stable System](https://arxiv.org//abs/2411.06666)
++ [Adversarial Detection with a Dynamically Stable System](https://arxiv.org/abs/2411.06666)

Xiaowei Long, Jie Lin, Xiangyuan Yang

-+ [Computable Model-Independent Bounds for Adversarial Quantum Machine Learning](https://arxiv.org//abs/2411.06863)
++ [Computable Model-Independent Bounds for Adversarial Quantum Machine Learning](https://arxiv.org/abs/2411.06863)

Bacui Li, Tansu Alpcan, Chandra Thapa, Udaya Parampalli

-+ [LongSafetyBench: Long-Context LLMs Struggle with Safety Issues](https://arxiv.org//abs/2411.06899)
++ [LongSafetyBench: Long-Context LLMs Struggle with Safety Issues](https://arxiv.org/abs/2411.06899)

Mianqiu Huang, Xiaoran Liu, Shaojun Zhou, Mozhi Zhang, Chenkun Tan, Pengyu Wang, Qipeng Guo, Zhe Xu, Linyang Li, Zhikai Lei, Linlin Li, Qun Liu, Yaqian Zhou, Xipeng Qiu, Xuanjing Huang

-+ [TinyML Security: Exploring Vulnerabilities in Resource-Constrained Machine Learning Systems](https://arxiv.org//abs/2411.07114)
++ [TinyML Security: Exploring Vulnerabilities in Resource-Constrained Machine Learning Systems](https://arxiv.org/abs/2411.07114)

Jacob Huckelberry, Yuke Zhang, Allison Sansone, James Mickens, Peter A. Beerel, Vijay Janapa Reddi
-+ [ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models](https://arxiv.org//abs/2411.07036)
++ [ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models](https://arxiv.org/abs/2411.07036)

Tao Ren, Qiongxiu Li

-+ [The Inherent Adversarial Robustness of Analog In-Memory Computing](https://arxiv.org//abs/2411.07023)
++ [The Inherent Adversarial Robustness of Analog In-Memory Computing](https://arxiv.org/abs/2411.07023)

Corey Lammie, Julian Büchel, Athanasios Vasilopoulos, Manuel Le Gallo, Abu Sebastian

-+ [SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models](https://arxiv.org//abs/2411.07336)
++ [SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models](https://arxiv.org/abs/2411.07336)

Bardiya Akhbari, Manish Gawali, Nicholas A. Dronen

# 2024-11-10

-+ [SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains](https://arxiv.org//abs/2411.06426)
++ [SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains](https://arxiv.org/abs/2411.06426)

Bijoy Ahmed Saiem, MD Sadik Hossain Shanto, Rakib Ahsan, Md Rafi ur Rashid

-+ [Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques](https://arxiv.org//abs/2411.06445)
++ [Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques](https://arxiv.org/abs/2411.06445)

Daniil Sulimov

-+ [vTune: Verifiable Fine-Tuning for LLMs Through Backdooring](https://arxiv.org//abs/2411.06611)
++ [vTune: Verifiable Fine-Tuning for LLMs Through Backdooring](https://arxiv.org/abs/2411.06611)

Eva Zhang, Arka Pal, Akilesh Potti, Micah Goldblum

-+ [SamRobNODDI: Q-Space Sampling-Augmented Continuous Representation Learning for Robust and Generalized NODDI](https://arxiv.org//abs/2411.06444)
++ [SamRobNODDI: Q-Space Sampling-Augmented Continuous Representation Learning for Robust and Generalized NODDI](https://arxiv.org/abs/2411.06444)

Taohui Xiao, Jian Cheng, Wenxin Fan, Enqing Dong, Hairong Zheng, Shanshan Wang

-+ [Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling](https://arxiv.org//abs/2411.06458)
++ [Protection against Source Inference Attacks in Federated Learning using Unary Encoding and Shuffling](https://arxiv.org/abs/2411.06458)

Andreas Athanasiou, Kangsoo Jung, Catuscia Palamidessi

-+ [InvisMark: Invisible and Robust Watermarking for AI-generated Image Provenance](https://arxiv.org//abs/2411.07795)
++ [InvisMark: Invisible and Robust Watermarking for AI-generated Image Provenance](https://arxiv.org/abs/2411.07795)

Rui Xu, Mengya (Mia) Hu, Deren Lei, Yaxi Li, David Lowe, Alex Gorevski, Mingyu Wang, Emily Ching, Alex Deng

-+ [Deferred Backdoor Functionality Attacks on Deep Learning Models](https://arxiv.org//abs/2411.14449)
++ [Deferred Backdoor Functionality Attacks on Deep Learning Models](https://arxiv.org/abs/2411.14449)

Jeongjin Shin, Sangdon Park

# 2024-11-08

-+ [Reasoning Robustness of LLMs to Adversarial Typographical Errors](https://arxiv.org//abs/2411.05345)
++ [Reasoning Robustness of LLMs to Adversarial Typographical Errors](https://arxiv.org/abs/2411.05345)

Esther Gan, Yiran Zhao, Liying Cheng, Yancan Mao, Anirudh Goyal, Kenji Kawaguchi, Min-Yen Kan, Michael Shieh

Esther Gan, Yiran Zhao, Liying Cheng, Yancan Mao, Anirudh Goyal, Kenji Kawaguchi, Min-Yen Kan, Michael Shieh

-+ [A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics](https://arxiv.org//abs/2411.05718)
++ [A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics](https://arxiv.org/abs/2411.05718)

Puze Liu, Jonas Günster, Niklas Funk, Simon Gröger, Dong Chen, Haitham Bou-Ammar, Julius Jankowski, Ante Marić, Sylvain Calinon, Andrej Orsula, Miguel Olivares-Mendez, Hongyi Zhou, Rudolf Lioutikov, Gerhard Neumann, Amarildo Likmeta, Amirhossein Zhalehmehrabi, Thomas Bonenfant, Marcello Restelli, Davide Tateo, Ziyuan Liu, Jan Peters

-+ [A Quality-Centric Framework for Generic Deepfake Detection](https://arxiv.org//abs/2411.05335)
++ [A Quality-Centric Framework for Generic Deepfake Detection](https://arxiv.org/abs/2411.05335)

Wentang Song, Zhiyuan Yan, Yuzhen Lin, Taiping Yao, Changsheng Chen, Shen Chen, Yandan Zhao, Shouhong Ding, Bin Li

-+ [Post-Hoc Robustness Enhancement in Graph Neural Networks with Conditional Random Fields](https://arxiv.org//abs/2411.05399)
++ [Post-Hoc Robustness Enhancement in Graph Neural Networks with Conditional Random Fields](https://arxiv.org/abs/2411.05399)

Yassine Abbahaddou, Sofiane Ennadir, Johannes F. Lutzeyer, Fragkiskos D. Malliaros, Michalis Vazirgiannis

-+ [Towards a Re-evaluation of Data Forging Attacks in Practice](https://arxiv.org//abs/2411.05658)
++ [Towards a Re-evaluation of Data Forging Attacks in Practice](https://arxiv.org/abs/2411.05658)

Mohamed Suliman, Anisa Halimi, Swanand Kadhe, Nathalie Baracaldo, Douglas Leith

-+ [Revisiting the Robustness of Watermarking to Paraphrasing Attacks](https://arxiv.org//abs/2411.05277)
++ [Revisiting the Robustness of Watermarking to Paraphrasing Attacks](https://arxiv.org/abs/2411.05277)

Saksham Rastogi, Danish Pruthi

-+ [Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models](https://arxiv.org//abs/2411.05708)
++ [Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models](https://arxiv.org/abs/2411.05708)

Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas

-+ [Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model](https://arxiv.org//abs/2411.05878)
++ [Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model](https://arxiv.org/abs/2411.05878)

Shuchang Lyu, Qi Zhao, Guangliang Cheng, Yiwei He, Zheng Zhou, Guangbiao Wang, Zhenwei Shi

@@ -18197,60 +18197,60 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Joseph Pollock, Igor Shilov, Euodia Dodd, Yves-Alexandre de Montjoye

-+ [Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking](https://arxiv.org//abs/2411.05375)
++ [Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking](https://arxiv.org/abs/2411.05375)

Mubashara Akhtar, Michael Schlichtkrull, Andreas Vlachos

# 2024-11-07

-+ [Adversarial Robustness of In-Context Learning in Transformers for Linear Regression](https://arxiv.org//abs/2411.05189)
++ [Adversarial Robustness of In-Context Learning in Transformers for Linear Regression](https://arxiv.org/abs/2411.05189)

Usman Anwar, Johannes Von Oswald, Louis Kirsch, David Krueger, Spencer Frei

# 2024-11-06

-+ [Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination](https://arxiv.org//abs/2411.03823)
++ [Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination](https://arxiv.org/abs/2411.03823)

Dingjie Song, Sicheng Lai, Shunian Chen, Lichao Sun, Benyou Wang

-+ [Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts](https://arxiv.org//abs/2411.03829)
++ [Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts](https://arxiv.org/abs/2411.03829)

Zhitong Gao, Bingnan Li, Mathieu Salzmann, Xuming He

-+ [Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning](https://arxiv.org//abs/2411.03926)
++ [Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning](https://arxiv.org/abs/2411.03926)

Tao Liu, Wu Yang, Chen Xu, Jiguang Lv, Huanran Wang, Yuhang Zhang, Shuchun Xu, Dapeng Man

-+ [Optimal Defenses Against Gradient Reconstruction Attacks](https://arxiv.org//abs/2411.03746)
++ [Optimal Defenses Against Gradient Reconstruction Attacks](https://arxiv.org/abs/2411.03746)

Yuxiao Chen, Gamze Gürsoy, Qi Lei

-+ [Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion](https://arxiv.org//abs/2411.05034)
++ [Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion](https://arxiv.org/abs/2411.05034)

Tiantian Liu, Hongwei Yao, Tong Wu, Zhan Qin, Feng Lin, Kui Ren, Chun Chen

-+ [A Fundamental Accuracy--Robustness Trade-off in Regression and Classification](https://arxiv.org//abs/2411.05853)
++ [A Fundamental Accuracy--Robustness Trade-off in Regression and Classification](https://arxiv.org/abs/2411.05853)

Sohail Bahmani

# 2024-11-05

-+ [Membership Inference Attacks against Large Vision-Language Models](https://arxiv.org//abs/2411.02902)
++ [Membership Inference Attacks against Large Vision-Language Models](https://arxiv.org/abs/2411.02902)

Zhan Li, Yongtao Wu, Yihang Chen, Francesco Tonin, Elias Abad Rocamora, Volkan Cevher

-+ [DM4Steal: Diffusion Model For Link Stealing Attack On Graph Neural Networks](https://arxiv.org//abs/2411.03364)
++ [DM4Steal: Diffusion Model For Link Stealing Attack On Graph Neural Networks](https://arxiv.org/abs/2411.03364)

Jinyin Chen, Haonan Ma, Haibin Zheng

-+ [TDDBench: A Benchmark for Training data detection](https://arxiv.org//abs/2411.03363)
++ [TDDBench: A Benchmark for Training data detection](https://arxiv.org/abs/2411.03363)

Zhihao Zhu, Yi Yang, Defu Lian

@@ -18260,169 +18260,169 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yanzhe Zhang, Tao Yu, Diyi Yang

# 2024-11-03

-+ [Undermining Image and Text Classification Algorithms Using Adversarial Attacks](https://arxiv.org//abs/2411.03348)
++ [Undermining Image and Text Classification Algorithms Using Adversarial Attacks](https://arxiv.org/abs/2411.03348)

Langalibalele Lunga, Suhas Sreehari

-+ [Building the Self-Improvement Loop: Error Detection and Correction in Goal-Oriented Semantic Communications](https://arxiv.org//abs/2411.01544)
++ [Building the Self-Improvement Loop: Error Detection and Correction in Goal-Oriented Semantic Communications](https://arxiv.org/abs/2411.01544)

Peizheng Li, Xinyi Lin, Adnan Aijaz

# 2024-11-02

-+ [What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks](https://arxiv.org//abs/2411.03343)
++ [What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks](https://arxiv.org/abs/2411.03343)

Nathalie Maria Kirch, Severin Field, Stephen Casper

# 2024-11-01

-+ [Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing](https://arxiv.org//abs/2411.00425)
++ [Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing](https://arxiv.org/abs/2411.00425)

Naufal Suryanto, Andro Aprila Adiputra, Ahmada Yusril Kadiptya, Thi-Thu-Huong Le, Derry Pratama, Yongsu Kim, Howon Kim

-+ [ROSS:RObust decentralized Stochastic learning based on Shapley values](https://arxiv.org//abs/2411.00365)
++ [ROSS:RObust decentralized Stochastic learning based on Shapley values](https://arxiv.org/abs/2411.00365)

Lina Wang, Yunsheng Yuan, Feng Li, Lingjie Duan

-+ [Defense Against Prompt Injection Attack by Leveraging Attack Techniques](https://arxiv.org//abs/2411.00459)
++ [Defense Against Prompt Injection Attack by Leveraging Attack Techniques](https://arxiv.org/abs/2411.00459)

Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Dekai Wu, Bryan Hooi

# 2024-10-31

-+ [Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models](https://arxiv.org//abs/2410.23558)
++ [Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models](https://arxiv.org/abs/2410.23558)

Yiqi Yang, Hongye Fu

-+ [Pseudo-Conversation Injection for LLM Goal Hijacking](https://arxiv.org//abs/2410.23678)
++ [Pseudo-Conversation Injection for LLM Goal Hijacking](https://arxiv.org/abs/2410.23678)

Zheng Chen, Buhui Yao

-+ [Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models](https://arxiv.org//abs/2410.23861)
++ [Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models](https://arxiv.org/abs/2410.23861)

Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari

-+ [DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection](https://arxiv.org//abs/2410.23663)
++ [DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection](https://arxiv.org/abs/2410.23663)

Fan Nie, Jiangqun Ni, Jian Zhang, Bin Zhang, Weizhe Zhang

-+ [Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey](https://arxiv.org//abs/2410.23687)
++ [Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey](https://arxiv.org/abs/2410.23687)

Chiyu Zhang, Xiaogang Xu, Jiafei Wu, Zhe Liu, Lu Zhou

-+ [DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination](https://arxiv.org//abs/2410.24006)
++ [DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination](https://arxiv.org/abs/2410.24006)

Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst

-+ [Wide Two-Layer Networks can Learn from Adversarial Perturbations](https://arxiv.org//abs/2410.23677)
++ [Wide Two-Layer Networks can Learn from Adversarial Perturbations](https://arxiv.org/abs/2410.23677)

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

-+ [Noise as a Double-Edged Sword: Reinforcement Learning Exploits Randomized Defenses in Neural Networks](https://arxiv.org//abs/2410.23870)
++ [Noise as a Double-Edged Sword: Reinforcement Learning Exploits Randomized Defenses in Neural Networks](https://arxiv.org/abs/2410.23870)

Steve Bakos, Pooria Madani, Heidar Davoudi

-+ [I Can Hear You: Selective Robust Training for Deepfake Audio Detection](https://arxiv.org//abs/2411.00121)
++ [I Can Hear You: Selective Robust Training for Deepfake Audio Detection](https://arxiv.org/abs/2411.00121)

Zirui Zhang, Wei Hao, Aroon Sankoh, William Lin, Emanuel Mendiola-Ortiz, Junfeng Yang, Chengzhi Mao

-+ [Protecting Feed-Forward Networks from Adversarial Attacks Using Predictive Coding](https://arxiv.org//abs/2411.00222)
++ [Protecting Feed-Forward Networks from Adversarial Attacks Using Predictive Coding](https://arxiv.org/abs/2411.00222)

Ehsan Ganjidoost, Jeff Orchard

-+ [Optical Lens Attack on Monocular Depth Estimation for Autonomous Driving](https://arxiv.org//abs/2411.00192)
++ [Optical Lens Attack on Monocular Depth Estimation for Autonomous Driving](https://arxiv.org/abs/2411.00192)

Ce Zhou, Qiben Yan, Daniel Kent, Guangjing Wang, Weikang Ding, Ziqi Zhang, Hayder Radha

-+ [Attention Tracker: Detecting Prompt Injection Attacks in LLMs](https://arxiv.org//abs/2411.00348)
++ [Attention Tracker: Detecting Prompt Injection Attacks in LLMs](https://arxiv.org/abs/2411.00348)

Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen

# 2024-10-30

-+ [Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion](https://arxiv.org//abs/2410.22678)
++ [Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion](https://arxiv.org/abs/2410.22678)

Ji Guo, Hongwei Li, Wenbo Jiang, Guoming Lu

-+ [InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models](https://arxiv.org//abs/2410.22770)
++ [InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models](https://arxiv.org/abs/2410.22770)

Hao Li, Xiaogeng Liu, Chaowei Xiao

-+ [Contrastive Learning and Adversarial Disentanglement for Privacy-Preserving Task-Oriented Semantic Communications](https://arxiv.org//abs/2410.22784)
++ [Contrastive Learning and Adversarial Disentanglement for Privacy-Preserving Task-Oriented Semantic Communications](https://arxiv.org/abs/2410.22784)

Omar Erak, Omar Alhussein, Wen Tong

-+ [HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models](https://arxiv.org//abs/2410.22832)
++ [HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models](https://arxiv.org/abs/2410.22832)

Yucheng Zhang, Qinfeng Li, Tianyu Du, Xuhong Zhang, Xinkui Zhao, Zhengwen Feng, Jianwei Yin

-+ [Stealing User Prompts from Mixture of Experts](https://arxiv.org//abs/2410.22884)
++ [Stealing User Prompts from Mixture of Experts](https://arxiv.org/abs/2410.22884)

Itay Yona, Ilia Shumailov, Jamie Hayes, Nicholas Carlini

-+ [Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set](https://arxiv.org//abs/2410.23118)
++ [Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set](https://arxiv.org/abs/2410.23118)

Chris Achard

-+ [ProTransformer: Robustify Transformers via Plug-and-Play Paradigm](https://arxiv.org//abs/2410.23182)
++ [ProTransformer: Robustify Transformers via Plug-and-Play Paradigm](https://arxiv.org/abs/2410.23182)

Zhichao Hou, Weizhi Gao, Yuchen Shen, Feiyi Wang, Xiaorui Liu

-+ [One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks](https://arxiv.org//abs/2410.22725)
++ [One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks](https://arxiv.org/abs/2410.22725)

Ji Guo, Wenbo Jiang, Rui Zhang, Guoming Lu, Hongwei Li, Weiren Wu

-+ [CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense](https://arxiv.org//abs/2410.23091)
++ [CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense](https://arxiv.org/abs/2410.23091)

Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng

-+ [FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training](https://arxiv.org//abs/2410.23142)
++ [FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training](https://arxiv.org/abs/2410.23142)

Tejaswini Medi, Steffen Jung, Margret Keuper

-+ [Byzantine-Robust Federated Learning: An Overview With Focus on Developing Sybil-based Attacks to Backdoor Augmented Secure Aggregation Protocols](https://arxiv.org//abs/2410.22680)
++ [Byzantine-Robust Federated Learning: An Overview With Focus on Developing Sybil-based Attacks to Backdoor Augmented Secure Aggregation Protocols](https://arxiv.org/abs/2410.22680)

Atharv Deshmukh

-+ [Crosstalk Attack Resilient RNS Quantum Addition](https://arxiv.org//abs/2410.23217)
++ [Crosstalk Attack Resilient RNS Quantum Addition](https://arxiv.org/abs/2410.23217)

Bhaskar Gaur, Himanshu Thapliyal

-+ [Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System](https://arxiv.org//abs/2410.23483)
++ [Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System](https://arxiv.org/abs/2410.23483)

Julian Collado, Kevin Stangl

-+ [Causality-Driven Audits of Model Robustness](https://arxiv.org//abs/2410.23494)
++ [Causality-Driven Audits of Model Robustness](https://arxiv.org/abs/2410.23494)

Nathan Drenkow, Chris Ribaudo, Mathias Unberath

@@ -18433,104 +18433,104 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Lam Nguyen Tung, Steven Cho, Xiaoning Du, Neelofar Neelofar, Valerio Terragni, Stefano Ruberto, Aldeida Aleti

# 2024-10-29

-+ [Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models](https://arxiv.org//abs/2410.21802)
++ [Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models](https://arxiv.org/abs/2410.21802)

Lu Yu, Haiyang Zhang, Changsheng Xu

-+ [Benchmarking OpenAI o1 in Cyber Security](https://arxiv.org//abs/2410.21939)
++ [Benchmarking OpenAI o1 in Cyber Security](https://arxiv.org/abs/2410.21939)

Dan Ristea, Vasilios Mavroudis, Chris Hicks

-+ [CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs](https://arxiv.org//abs/2410.21695)
++ [CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs](https://arxiv.org/abs/2410.21695)

Zhihao Liu, Chenhui Hu

-+ [Enhancing Adversarial Attacks through Chain of Thought](https://arxiv.org//abs/2410.21791)
++ [Enhancing Adversarial Attacks through Chain of Thought](https://arxiv.org/abs/2410.21791)

Jingbo Su

-+ [Distinguishing Ignorance from Error in LLM Hallucinations](https://arxiv.org//abs/2410.22071)
++ [Distinguishing Ignorance from Error in LLM Hallucinations](https://arxiv.org/abs/2410.22071)

Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

-+ [AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts](https://arxiv.org//abs/2410.22143)
++ [AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts](https://arxiv.org/abs/2410.22143)

Vishal Kumar, Zeyi Liao, Jaylen Jones, Huan Sun

-+ [Benchmarking LLM Guardrails in Handling Multilingual Toxicity](https://arxiv.org//abs/2410.22153)
++ [Benchmarking LLM Guardrails in Handling Multilingual Toxicity](https://arxiv.org/abs/2410.22153)

Yahan Yang, Soham Dan, Dan Roth, Insup Lee

-+ [Fingerprints of Super Resolution Networks](https://arxiv.org//abs/2410.21653)
++ [Fingerprints of Super Resolution Networks](https://arxiv.org/abs/2410.21653)

Jeremy Vonderfecht, Feng Liu

-+ [FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection](https://arxiv.org//abs/2410.21964)
++ [FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection](https://arxiv.org/abs/2410.21964)

Dat Nguyen, Marcella Astrid, Enjie Ghorbel, Djamila Aouada

-+ [Embedding-based classifiers can detect prompt injection attacks](https://arxiv.org//abs/2410.22284)
++ [Embedding-based classifiers can detect prompt injection attacks](https://arxiv.org/abs/2410.22284)

Md. Ahsan Ayub, Subhabrata Majumdar

-+ [Robust training of implicit generative models for multivariate and heavy-tailed distributions with an invariant statistical loss](https://arxiv.org//abs/2410.22381)
++ [Robust training of implicit generative models for multivariate and heavy-tailed distributions with an invariant statistical loss](https://arxiv.org/abs/2410.22381)

José Manuel de Frutos, Manuel A. Vázquez, Pablo Olmos, Joaquín Míguez

-+ [Power side-channel leakage localization through adversarial training of deep neural networks](https://arxiv.org//abs/2410.22425)
++ [Power side-channel leakage localization through adversarial training of deep neural networks](https://arxiv.org/abs/2410.22425)

Jimmy Gammell, Anand Raghunathan, Kaushik Roy

# 2024-10-28

-+ [Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks](https://arxiv.org//abs/2410.20911)
++ [Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks](https://arxiv.org/abs/2410.20911)

Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese

-+ [BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks](https://arxiv.org//abs/2410.20971)
++ [BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks](https://arxiv.org/abs/2410.20971)

Yunhan Zhao, Xiang Zheng, Lin Luo, Yige Li, Xingjun Ma, Yu-Gang Jiang

-+ [Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring](https://arxiv.org//abs/2410.21083)
++ [Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring](https://arxiv.org/abs/2410.21083)

Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, Libo Qin, Xiaoming Shi, Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che

-+ [SeriesGAN: Time Series Generation via Adversarial and Autoregressive Learning](https://arxiv.org//abs/2410.21203)
++ [SeriesGAN: Time Series Generation via Adversarial and Autoregressive Learning](https://arxiv.org/abs/2410.21203)

MohammadReza EskandariNasab, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi

-+ [Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models](https://arxiv.org//abs/2410.20940)
++ [Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models](https://arxiv.org/abs/2410.20940)

Piotr Przybyła

-+ [Evaluating the Robustness of LiDAR Point Cloud Tracking Against Adversarial Attack](https://arxiv.org//abs/2410.20893)
++ [Evaluating the Robustness of LiDAR Point Cloud Tracking Against Adversarial Attack](https://arxiv.org/abs/2410.20893)

Shengjing Tian, Yinan Han, Xiantong Zhao, Bin Liu, Xiuping Liu

-+ [Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models](https://arxiv.org//abs/2410.21088)
++ [Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models](https://arxiv.org/abs/2410.21088)

Wenda Li, Huijie Zhang, Qing Qu

-+ [Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization](https://arxiv.org//abs/2410.21117)
++ [Robustness and Generalization in Quantum Reinforcement Learning via Lipschitz Regularization](https://arxiv.org/abs/2410.21117)

Nico Meyer, Julian Berberich, Christopher Mutschler, Daniel D. Scherer

@@ -18540,53 +18540,53 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Md Abdur Rahman, Fan Wu, Alfredo Cuzzocrea, Sheikh Iqbal Ahamed

-+ [TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors](https://arxiv.org//abs/2410.21443)
++ [TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors](https://arxiv.org/abs/2410.21443)

Adonisz Dimitriu, Tamás Michaletzky, Viktor Remeli

-+ [AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models](https://arxiv.org//abs/2410.21471)
++ [AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models](https://arxiv.org/abs/2410.21471)

Yaopei Zeng, Yuanpu Cao, Bochuan Cao, Yurui Chang, Jinghui Chen, Lu Lin

-+ [Trustworthiness of Stochastic Gradient Descent in Distributed Learning](https://arxiv.org//abs/2410.21491)
++ [Trustworthiness of Stochastic Gradient Descent in Distributed Learning](https://arxiv.org/abs/2410.21491)

Hongyang Li, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry

-+ [FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks](https://arxiv.org//abs/2410.21492)
++ [FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks](https://arxiv.org/abs/2410.21492)

Jiongxiao Wang, Fangzhou Wu, Wendi Li, Jinsheng Pan, Edward Suh, Z. Morley Mao, Muhao Chen, Chaowei Xiao

-+ [Inverting Gradient Attacks Naturally Makes Data Poisons: An Availability Attack on Neural Networks](https://arxiv.org//abs/2410.21453)
++ [Inverting Gradient Attacks Naturally Makes Data Poisons: An Availability Attack on Neural Networks](https://arxiv.org/abs/2410.21453)

Wassim Bouaziz, El-Mahdi El-Mhamdi, Nicolas Usunier

-+ [SCULPT: Systematic Tuning of Long Prompts](https://arxiv.org//abs/2410.20788)
++ [SCULPT: Systematic Tuning of Long Prompts](https://arxiv.org/abs/2410.20788)

Shanu Kumar, Akhila Yesantarao Venkata, Shubhanshu Khandelwal, Bishal Santra, Parag Agrawal, Manish Gupta

# 2024-10-27

-+ [Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains](https://arxiv.org//abs/2410.20340)
++ [Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains](https://arxiv.org/abs/2410.20340)

Jiemin Wu, Songning Lai, Ruiqiang Xiao, Tianlang Xue, Jiayu Yang, Yutao Yue

-+ [Integrating uncertainty quantification into randomized smoothing based robustness guarantees](https://arxiv.org//abs/2410.20432)
++ [Integrating uncertainty quantification into randomized smoothing based robustness guarantees](https://arxiv.org/abs/2410.20432)

Sina Däubener, Kira Maag, David Krueger, Asja Fischer

-+ [LLM Robustness Against Misinformation in Biomedical Question Answering](https://arxiv.org//abs/2410.21330)
++ [LLM Robustness Against Misinformation in Biomedical Question Answering](https://arxiv.org/abs/2410.21330)

Alexander Bondarenko, Adrian Viehweger

-+ [Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness](https://arxiv.org//abs/2410.21331)
++ [Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness](https://arxiv.org/abs/2410.21331)

Qi Zhang, Yifei Wang, Jingyi Cui, Xiang Pan, Qi Lei, Stefanie Jegelka, Yisen Wang

@@ -18597,278 +18597,278 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Ananya Malik, Kartik Sharma, Shaily Bhatt, Lynnette Hui Xian Ng

# 2024-10-26

-+ [Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics](https://arxiv.org//abs/2410.20024)
++ [Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics](https://arxiv.org/abs/2410.20024)

Mikhail Rumiantsau, Aliaksei Vertsel, Ilya Hrytsuk, Isaiah Ballah

-+ [Vulnerability of LLMs to Vertically Aligned Text Manipulations](https://arxiv.org//abs/2410.20016)
++ [Vulnerability of LLMs to Vertically Aligned Text Manipulations](https://arxiv.org/abs/2410.20016)

Zhecheng Li, Yiwei Wang, Bryan Hooi, Yujun Cai, Zhen Xiong, Nanyun Peng, Kai-wei Chang

-+ [Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification](https://arxiv.org//abs/2410.20097)
++ [Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification](https://arxiv.org/abs/2410.20097)

Yue Su, Hao Li, Maoguo Gong

-+ [Prompt Diffusion Robustifies Any-Modality Prompt Learning](https://arxiv.org//abs/2410.20164)
++ [Prompt Diffusion Robustifies Any-Modality Prompt Learning](https://arxiv.org/abs/2410.20164)

Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

-+ [Transferable Adversarial Attacks on SAM and Its Downstream Models](https://arxiv.org//abs/2410.20197)
++ [Transferable Adversarial Attacks on SAM and Its Downstream Models](https://arxiv.org/abs/2410.20197)

Song Xia, Wenhan Yang, Yi Yu, Xun Lin, Henghui Ding, Lingyu Duan, Xudong Jiang

-+ [Proactive Fraud Defense: Machine Learning's Evolving Role in Protecting Against Online Fraud](https://arxiv.org//abs/2410.20281)
++ [Proactive Fraud Defense: Machine Learning's Evolving Role in Protecting Against Online Fraud](https://arxiv.org/abs/2410.20281)

Md Kamrul Hasan Chy

-+ [CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification](https://arxiv.org//abs/2410.20136)
++ [CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification](https://arxiv.org/abs/2410.20136)

Fangwen Mu, Junjie Wang, Zhuohao Yu, Lin Shi, Song Wang, Mingyang Li, Qing Wang

# 2024-10-25

-+ [Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models](https://arxiv.org//abs/2410.19427)
++ [Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models](https://arxiv.org/abs/2410.19427)

Yige Li, Hanxun Huang, Jiaming Zhang, Xingjun Ma, Yu-Gang Jiang

-+ [Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models](https://arxiv.org//abs/2410.19385)
++ [Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models](https://arxiv.org/abs/2410.19385)

Liam Barkley, Brink van der Merwe

-+ [Adversarial Environment Design via Regret-Guided Diffusion Models](https://arxiv.org//abs/2410.19715)
++ [Adversarial Environment Design via Regret-Guided Diffusion Models](https://arxiv.org/abs/2410.19715)

Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh

-+ [The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News](https://arxiv.org//abs/2410.19250)
++ [The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News](https://arxiv.org/abs/2410.19250)

Xinyu Wang, Wenbo Zhang, Sai Koneru, Hangzhi Guo, Bonam Mingole, S. Shyam Sundar, Sarah Rajtmajer, Amulya Yadav

-+ [A Debate-Driven Experiment on LLM Hallucinations and Accuracy](https://arxiv.org//abs/2410.19485)
++ [A Debate-Driven Experiment on LLM Hallucinations and Accuracy](https://arxiv.org/abs/2410.19485)

Ray Li, Tanishka Bagade, Kevin Martinez, Flora Yasmin, Grant Ayala, Michael Lam, Kevin Zhu

-+ [Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors](https://arxiv.org//abs/2410.19230)
++ [Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors](https://arxiv.org/abs/2410.19230)

Tianchun Wang, Yuanzhou Chen, Zichuan Liu, Zhanwen Chen, Haifeng Chen, Xiang Zhang, Wei Cheng

# 2024-10-24

-+ [GADT: Enhancing Transferable Adversarial Attacks through Gradient-guided Adversarial Data Transformation](https://arxiv.org//abs/2410.18648)
++ [GADT: Enhancing Transferable Adversarial Attacks through Gradient-guided Adversarial Data Transformation](https://arxiv.org/abs/2410.18648)

Yating Ma, Xiaogang Xu, Liming Fang, Zhe Liu

-+ [Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness](https://arxiv.org//abs/2410.18556)
++ [Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness](https://arxiv.org/abs/2410.18556)

David Khachaturov, Robert Mullins

-+ [DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations](https://arxiv.org//abs/2410.18860)
++ [DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations](https://arxiv.org/abs/2410.18860)

Aryo Pradipta Gema, Chen Jin, Ahmed Abdulaal, Tom Diethe, Philip Teare, Beatrice Alex, Pasquale Minervini, Amrutha Saseendran

-+ [Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities](https://arxiv.org//abs/2410.18469)
++ [Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities](https://arxiv.org/abs/2410.18469)

Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao

-+ [Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model](https://arxiv.org//abs/2410.18640)
++ [Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model](https://arxiv.org/abs/2410.18640)

Wenhong Zhu, Zhiwei He, Xiaofeng Wang, Pengfei Liu, Rui Wang

-+ [Adversarial Attacks on Large Language Models Using Regularized Relaxation](https://arxiv.org//abs/2410.19160)
++ [Adversarial Attacks on Large Language Models Using Regularized Relaxation](https://arxiv.org/abs/2410.19160)

Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu

# 2024-10-23

-+ [Advancing NLP Security by Leveraging LLMs as Adversarial Engines](https://arxiv.org//abs/2410.18215)
++ [Advancing NLP Security by Leveraging LLMs as Adversarial Engines](https://arxiv.org/abs/2410.18215)

Sudarshan Srinivasan, Maria Mahbub, Amir Sadovnik

-+ [Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing](https://arxiv.org//abs/2410.18267)
++ [Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing](https://arxiv.org/abs/2410.18267)

Dongliang Guo, Mengxuan Hu, Zihan Guan, Junfeng Guo, Thomas Hartvigsen, Sheng Li

-+ [Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks](https://arxiv.org//abs/2410.18210)
++ [Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks](https://arxiv.org/abs/2410.18210)

Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi

-+ [Large Language Models Still Exhibit Bias in Long Text](https://arxiv.org//abs/2410.17519)
++ [Large Language Models Still Exhibit Bias in Long Text](https://arxiv.org/abs/2410.17519)

Wonje Jeung, Dongjae Jeon, Ashkan Yousefpour, Jonghyun Choi

-+ [Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning](https://arxiv.org//abs/2410.17910)
++ [Slot: Provenance-Driven APT Detection through Graph Reinforcement Learning](https://arxiv.org/abs/2410.17910)

Wei Qiao, Yebo Feng, Teng Li, Zhuo Ma, Yulong Shen, JianFeng Ma, Yang Liu

# 2024-10-22

-+ [Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In](https://arxiv.org//abs/2410.16950)
++ [Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In](https://arxiv.org/abs/2410.16950)

Itay Nakash, George Kour, Guy Uziel, Ateret Anaby-Tavor

-+ [Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods](https://arxiv.org//abs/2410.17222)
++ [Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods](https://arxiv.org/abs/2410.17222)

Tsachi Blau, Moshe Kimhi, Yonatan Belinkov, Alexander Bronstein, Chaim Baskin

-+ [DENOASR: Debiasing ASRs through Selective Denoising](https://arxiv.org//abs/2410.16712)
++ [DENOASR: Debiasing ASRs through Selective Denoising](https://arxiv.org/abs/2410.16712)

Anand Kumar Rai, Siddharth D Jaiswal, Shubham Prakash, Bendi Pragnya Sree, Animesh Mukherjee

-+ [Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection](https://arxiv.org//abs/2410.16802)
++ [Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection](https://arxiv.org/abs/2410.16802)

Laurent Colbois, Sébastien Marcel

-+ [Dual-Model Defense: Safeguarding Diffusion Models from Membership Inference Attacks through Disjoint Data Splitting](https://arxiv.org//abs/2410.16657)
++ [Dual-Model Defense: Safeguarding Diffusion Models from Membership Inference Attacks through Disjoint Data Splitting](https://arxiv.org/abs/2410.16657)

Bao Q. Tran, Viet Nguyen, Anh Tran, Toan Tran

-+ [LLM-Assisted Red Teaming of Diffusion Models through "Failures Are Fated, But Can Be Faded"](https://arxiv.org//abs/2410.16738)
++ [LLM-Assisted Red Teaming of Diffusion Models through "Failures Are Fated, But Can Be Faded"](https://arxiv.org/abs/2410.16738)

Som Sagar, Aditya Taparia, Ransalu Senanayake

-+ [Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost](https://arxiv.org//abs/2410.16805)
++ [Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost](https://arxiv.org/abs/2410.16805)

Cheng-Han Yeh, Kuanchun Yu, Chun-Shien Lu

-+ [Optimal Robust Estimation under Local and Global Corruptions: Stronger Adversary and Smaller Error](https://arxiv.org//abs/2410.17230)
++ [Optimal Robust Estimation under Local and Global Corruptions: Stronger Adversary and Smaller Error](https://arxiv.org/abs/2410.17230)

Thanasis Pittas, Ankit Pensia

-+ [BETA: Automated Black-box Exploration for Timing Attacks in Processors](https://arxiv.org//abs/2410.16648)
++ [BETA: Automated Black-box Exploration for Timing Attacks in Processors](https://arxiv.org/abs/2410.16648)

Congcong Chen, Jinhua Cui, Jiliang Zhang

-+ [On the Vulnerability of Text Sanitization](https://arxiv.org//abs/2410.17052)
++ [On the Vulnerability of Text Sanitization](https://arxiv.org/abs/2410.17052)

Meng Tong, Kejiang Chen, Xiaojian Yuan, Jiayang Liu, Weiming Zhang, Nenghai Yu, Jie Zhang

# 2024-10-21

-+ [Boosting Jailbreak Transferability for Large Language Models](https://arxiv.org//abs/2410.15645)
++ [Boosting Jailbreak Transferability for Large Language Models](https://arxiv.org/abs/2410.15645)

Hanqing Liu, Lifeng Zhou, Huanqian Yan

-+ [Reducing Hallucinations in Vision-Language Models via Latent Space Steering](https://arxiv.org//abs/2410.15778)
++ [Reducing Hallucinations in Vision-Language Models via Latent Space Steering](https://arxiv.org/abs/2410.15778)

Sheng Liu, Haotian Ye, James Zou

-+ [Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples](https://arxiv.org//abs/2410.15889)
++ [Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples](https://arxiv.org/abs/2410.15889)

Kirill Lukyanov, Andrew Perminov, Denis Turdakov, Mikhail Pautov

-+ [SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis](https://arxiv.org//abs/2410.15641)
++ [SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis](https://arxiv.org/abs/2410.15641)

Aidan Wong, He Cao, Zijing Liu, Yu Li

-+ [A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns](https://arxiv.org//abs/2410.16155)
++ [A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns](https://arxiv.org/abs/2410.16155)

Tianyi Men, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao

-+ [Can Knowledge Editing Really Correct Hallucinations?](https://arxiv.org//abs/2410.16251)
++ [Can Knowledge Editing Really Correct Hallucinations?](https://arxiv.org/abs/2410.16251)

Baixiang Huang, Canyu Chen, Xiongxiao Xu, Ali Payani, Kai Shu

-+ [Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation](https://arxiv.org//abs/2410.15618)
++ [Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation](https://arxiv.org/abs/2410.15618)

Anh Bui, Long Vuong, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

-+ [A Realistic Threat Model for Large Language Model Jailbreaks](https://arxiv.org//abs/2410.16222)
++ [A Realistic Threat Model for Large Language Model Jailbreaks](https://arxiv.org/abs/2410.16222)

Valentyn Boreiko, Alexander Panfilov, Vaclav Voracek, Matthias Hein, Jonas Geiping

-+ [On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds](https://arxiv.org//abs/2410.16073)
++ [On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds](https://arxiv.org/abs/2410.16073)

Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe

-+ [Conflict-Aware Adversarial Training](https://arxiv.org//abs/2410.16579)
++ [Conflict-Aware Adversarial Training](https://arxiv.org/abs/2410.16579)

Zhiyu Xue, Haohan Wang, Yao Qin, Ramtin Pedarsani

-+ [Simplicity Bias via Global Convergence of Sharpness Minimization](https://arxiv.org//abs/2410.16401)
++ [Simplicity Bias via Global Convergence of Sharpness Minimization](https://arxiv.org/abs/2410.16401)

Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka

-+ [Enhancing PAC Learning of Half spaces Through Robust Optimization Techniques](https://arxiv.org//abs/2410.16573)
++ [Enhancing PAC Learning of Half spaces Through Robust Optimization Techniques](https://arxiv.org/abs/2410.16573)

Shirmohammad Tavangari, Zahra Shakarami, Aref Yelghi, Asef Yelghi

# 2024-10-20

-+ [Jailbreaking and Mitigation of Vulnerabilities in Large Language Models](https://arxiv.org//abs/2410.15236)
++ [Jailbreaking and Mitigation of Vulnerabilities in Large Language Models](https://arxiv.org/abs/2410.15236)

Benji Peng, Ziqian Bi, Qian Niu, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K.Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin

-+ [Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models](https://arxiv.org//abs/2410.15362)
++ [Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models](https://arxiv.org/abs/2410.15362)

Xiao Li, Zhuhong Li, Qiongxiu Li, Bingze Lee, Jinghao Cui, Xiaolin Hu

-+ [The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks](https://arxiv.org//abs/2410.15396)
++ [The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks](https://arxiv.org/abs/2410.15396)

Daniel Ayzenshteyn, Roy Weiss, Yisroel Mirsky

-+ [PEAS: A Strategy for Crafting Transferable Adversarial Examples](https://arxiv.org//abs/2410.15409)
++ [PEAS: A Strategy for Crafting Transferable Adversarial Examples](https://arxiv.org/abs/2410.15409)

Bar Avraham, Yisroel Mirsky

-+ [CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges](https://arxiv.org//abs/2410.15393)
++ [CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges](https://arxiv.org/abs/2410.15393)

Haitao Li, Junjie Chen, Qingyao Ai, Zhumin Chu, Yujia Zhou, Qian Dong, Yiqun Liu

-+ [Modality-Fair Preference Optimization for Trustworthy MLLM Alignment](https://arxiv.org//abs/2410.15334)
++ [Modality-Fair Preference Optimization for Trustworthy MLLM Alignment](https://arxiv.org/abs/2410.15334)

Songtao Jiang, Yan Zhang, Ruizhe Chen, Yeying Jin, Zuozhu Liu

-+ [DynaVINS++: Robust Visual-Inertial State Estimator in Dynamic Environments by Adaptive Truncated Least Squares and Stable State Recovery](https://arxiv.org//abs/2410.15373)
++ [DynaVINS++: Robust Visual-Inertial State Estimator in Dynamic Environments by Adaptive Truncated Least Squares and Stable State Recovery](https://arxiv.org/abs/2410.15373)

Seungwon Song, Hyungtae Lim, Alex Junho Lee, Hyun Myung

@@ -18879,134 +18879,134 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Arisrei Lim, Abhiram Maddukuri

# 2024-10-19

-+ [Bias Amplification: Language Models as Increasingly Biased Media](https://arxiv.org//abs/2410.15234)
++ [Bias Amplification: Language Models as Increasingly Biased Media](https://arxiv.org/abs/2410.15234)

Ze Wang, Zekun Wu, Jeremy Zhang, Navya Jain, Xin Guan, Adriano Koshiyama

-+ [Adversarial Training: A Survey](https://arxiv.org//abs/2410.15042)
++ [Adversarial Training: A Survey](https://arxiv.org/abs/2410.15042)

Mengnan Zhao, Lihe Zhang, Jingwen Ye, Huchuan Lu, Baocai Yin, Xinchao Wang

-+ [Mind the Remaining: Mechanism Design for Robust Federated Unlearning](https://arxiv.org//abs/2410.15045)
++ [Mind the Remaining: Mechanism Design for Robust Federated Unlearning](https://arxiv.org/abs/2410.15045)

Jiaqi Shao, Tao Lin, Bing Luo

-+ [SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation](https://arxiv.org//abs/2410.15075)
++ [SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation](https://arxiv.org/abs/2410.15075)

Chen-Hsiu Huang, Ja-Ling Wu

-+ [Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models](https://arxiv.org//abs/2410.15116)
++ [Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models](https://arxiv.org/abs/2410.15116)

Qitan Lv, Jie Wang, Hanzhu Chen, Bin Li, Yongdong Zhang, Feng Wu

-+ [Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models](https://arxiv.org//abs/2410.15107)
++ [Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models](https://arxiv.org/abs/2410.15107)

Seong-Il Park, Jay-Yoon Lee

-+ [Attack as Defense: Run-time Backdoor Implantation for Image Content Protection](https://arxiv.org//abs/2410.14966)
++ [Attack as Defense: Run-time Backdoor Implantation for Image Content Protection](https://arxiv.org/abs/2410.14966)

Haichuan Zhang, Meiyu Lin, Zhaoyi Liu, Renyuan Li, Zhiyuan Cheng, Carl Yang, Mingjie Tang

# 2024-10-18

-+ [Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation](https://arxiv.org//abs/2410.14425)
++ [Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation](https://arxiv.org/abs/2410.14425)

Shuai Zhao, Xiaobao Wu, Cong-Duy Nguyen, Meihuizi Jia, Yichao Feng, Luu Anh Tuan

-+ [Real-time Fake News from Adversarial Feedback](https://arxiv.org//abs/2410.14651)
++ [Real-time Fake News from Adversarial Feedback](https://arxiv.org/abs/2410.14651)

Sanxing Chen, Yukun Huang, Bhuwan Dhingra

-+ [NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples](https://arxiv.org//abs/2410.14669)
++ [NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples](https://arxiv.org/abs/2410.14669)

Baiqi Li, Zhiqiu Lin, Wenxuan Peng, Jean de Dieu Nyandwi, Daniel Jiang, Zixian Ma, Simran Khanuja, Ranjay Krishna, Graham Neubig, Deva Ramanan

-+ [DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks](https://arxiv.org//abs/2410.14105)
++ [DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks](https://arxiv.org/abs/2410.14105)

Hao Sui, Bing Chen, Jiale Zhang, Chengcheng Zhu, Di Wu, Qinghua Lu, Guodong Long

-+ [Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models](https://arxiv.org//abs/2410.14479)
++ [Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models](https://arxiv.org/abs/2410.14479)

Cody Clop, Yannick Teglia

-+ [SignAttention: On the Interpretability of Transformer Models for Sign Language Translation](https://arxiv.org//abs/2410.14506)
++ [SignAttention: On the Interpretability of Transformer Models for Sign Language Translation](https://arxiv.org/abs/2410.14506)

Pedro Alejandro Dal Bianco, Oscar Agustín Stanchi, Facundo Manuel Quiroga, Franco Ronchetti, Enzo Ferrante

-+ [On the Regularization of Learnable Embeddings for Time Series Processing](https://arxiv.org//abs/2410.14630)
++ [On the Regularization of Learnable Embeddings for Time Series Processing](https://arxiv.org/abs/2410.14630)

Luca Butera, Giovanni De Felice, Andrea Cini, Cesare Alippi

-+ [MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time](https://arxiv.org//abs/2410.14184)
++ [MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time](https://arxiv.org/abs/2410.14184)

Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

-+ [MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps](https://arxiv.org//abs/2410.14668)
++ [MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps](https://arxiv.org/abs/2410.14668)

Xiongtao Zhou, Jie He, Lanyu Chen, jingyu li, Haojing Chen, Victor Gutierrez Basulto, Jeff Z. Pan, Hanjie Chen

-+ [Zero-shot Action Localization via the Confidence of Large Vision-Language Models](https://arxiv.org//abs/2410.14340)
++ [Zero-shot Action Localization via the Confidence of Large Vision-Language Models](https://arxiv.org/abs/2410.14340)

Josiah Aklilu, Xiaohan Wang, Serena Yeung-Levy

-+ [A Mirror Descent Perspective of Smoothed Sign Descent](https://arxiv.org//abs/2410.14158)
++ [A Mirror Descent Perspective of Smoothed Sign Descent](https://arxiv.org/abs/2410.14158)

Shuyang Wang, Diego Klabjan

-+ [Decomposing The Dark Matter of Sparse Autoencoders](https://arxiv.org//abs/2410.14670)
++ [Decomposing The Dark Matter of Sparse Autoencoders](https://arxiv.org/abs/2410.14670)

Joshua Engels, Logan Riggs, Max Tegmark

-+ [Optimizing importance weighting in the presence of sub-population shifts](https://arxiv.org//abs/2410.14315)
++ [Optimizing importance weighting in the presence of sub-population shifts](https://arxiv.org/abs/2410.14315)

Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks

-+ [A Lipschitz spaces view of infinitely wide shallow neural networks](https://arxiv.org//abs/2410.14591)
++ [A Lipschitz spaces view of infinitely wide shallow neural networks](https://arxiv.org/abs/2410.14591)

Francesca Bartolucci, Marcello Carioni, José A. Iglesias, Yury Korolev, Emanuele Naldi, Stefano Vigogna

-+ [Not Sure Your Car Withstands Cyberwarfare](https://arxiv.org//abs/2410.14320)
++ [Not Sure Your Car Withstands Cyberwarfare](https://arxiv.org/abs/2410.14320)

Giampaolo Bella, Gianpietro Castiglione, Sergio Esposito, Mario Raciti, Salvatore Riccobene

-+ [Safeguarding Blockchain Ecosystem: Understanding and Detecting Attack Transactions on Cross-chain Bridges](https://arxiv.org//abs/2410.14493)
++ [Safeguarding Blockchain Ecosystem: Understanding and Detecting Attack Transactions on Cross-chain Bridges](https://arxiv.org/abs/2410.14493)

Jiajing Wu, Kaixin Lin, Dan Lin, Bozhao Zhang, Zhiying Wu, Jianzhong Su

-+ [Soft-Label Integration for Robust Toxicity Classification](https://arxiv.org//abs/2410.14894)
++ [Soft-Label Integration for Robust Toxicity Classification](https://arxiv.org/abs/2410.14894)

Zelei Cheng, Xian Wu, Jiahao Yu, Shuo Han, Xin-Qiang Cai, Xinyu Xing

-+ [Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment](https://arxiv.org//abs/2410.14827)
++ [Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment](https://arxiv.org/abs/2410.14827)

Zedian Shao, Hongbin Liu, Jaden Mu, Neil Zhenqiang Gong

-+ [A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models](https://arxiv.org//abs/2410.14911)
++ [A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models](https://arxiv.org/abs/2410.14911)

Yuhan Liang, Yijun Li, Yumeng Niu, Qianhe Shen, Hangyu Liu

@@ -19017,586 +19017,586 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Sanxing Chen, Yukun Huang, Bhuwan Dhingra

# 2024-10-17

-+ [From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization](https://arxiv.org//abs/2410.13961)
++ [From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization](https://arxiv.org/abs/2410.13961)

Catarina G. Belem, Pouya Pezeshkpour, Hayate Iso, Seiji Maekawa, Nikita Bhutani, Estevam Hruschka

-+ [MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks](https://arxiv.org//abs/2410.14089)
++ [MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks](https://arxiv.org/abs/2410.14089)

Xinxin Liu, Zhongliang Guo, Siyuan Huang, Chun Pong Lau

-+ [Trojan Prompt Attacks on Graph Neural Networks](https://arxiv.org//abs/2410.13974)
++ [Trojan Prompt Attacks on Graph Neural Networks](https://arxiv.org/abs/2410.13974)

Minhua Lin, Zhiwei Zhang, Enyan Dai, Zongyu Wu, Yilong Wang, Xiang Zhang, Suhang Wang

-+ [Adversarial Inception for Bounded Backdoor Poisoning in Deep Reinforcement Learning](https://arxiv.org//abs/2410.13995)
++ [Adversarial Inception for Bounded Backdoor Poisoning in Deep Reinforcement Learning](https://arxiv.org/abs/2410.13995)

Ethan Rathbun, Christopher Amato, Alina Oprea

-+ [Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace](https://arxiv.org//abs/2410.13910)
++ [Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace](https://arxiv.org/abs/2410.13910)

Jinluan Yang, Anke Tang, Didi Zhu, Zhengyu Chen, Li Shen, Fei Wu

-+ [Cyber Attacks Prevention Towards Prosumer-based EV Charging Stations: An Edge-assisted Federated Prototype Knowledge Distillation Approach](https://arxiv.org//abs/2410.13260)
++ [Cyber Attacks Prevention Towards Prosumer-based EV Charging Stations: An Edge-assisted Federated Prototype Knowledge Distillation Approach](https://arxiv.org/abs/2410.13260)

Luyao Zou, Quang Hieu Vo, Kitae Kim, Huy Q. Le, Chu Myaet Thwal, Chaoning Zhang, Choong Seon Hong

-+ [Private Counterfactual Retrieval](https://arxiv.org//abs/2410.13812)
++ [Private Counterfactual Retrieval](https://arxiv.org/abs/2410.13812)

Mohamed Nomeir, Pasan Dissanayake, Shreya Meel, Sanghamitra Dutta, Sennur Ulukus

# 2024-10-16

-+ [Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning](https://arxiv.org//abs/2410.12130)
++ [Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning](https://arxiv.org/abs/2410.12130)

Huiwen Wu, Xiaohan Li, Xiaogang Xu, Jiafei Wu, Deyi Zhang, Zhe Liu

-+ [Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving](https://arxiv.org//abs/2410.12568)
++ [Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving](https://arxiv.org/abs/2410.12568)

Sihao Wu, Jiaxu Liu, Xiangyu Yin, Guangliang Cheng, Meng Fang, Xingyu Zhao, Xinping Yi, Xiaowei Huang

-+ [Low-Rank Adversarial PGD Attack](https://arxiv.org//abs/2410.12607)
++ [Low-Rank Adversarial PGD Attack](https://arxiv.org/abs/2410.12607)

Dayana Savostianova, Emanuele Zangrando, Francesco Tudisco

-+ [SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation](https://arxiv.org//abs/2410.12761)
++ [SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation](https://arxiv.org/abs/2410.12761)

Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal

-+ [Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks](https://arxiv.org//abs/2410.12772)
++ [Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks](https://arxiv.org/abs/2410.12772)

Hunmin Lee, Hongju Seong, Wonbin Kim, Hyeokchan Kwon, Daehee Seo

-+ [DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain](https://arxiv.org//abs/2410.12307)
++ [DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain](https://arxiv.org/abs/2410.12307)

Fengpeng Li, Kemou Li, Haiwei Wu, Jinyu Tian, Jiantao Zhou

# 2024-10-15

-+ [Towards General Deepfake Detection with Dynamic Curriculum](https://arxiv.org//abs/2410.11162)
++ [Towards General Deepfake Detection with Dynamic Curriculum](https://arxiv.org/abs/2410.11162)

Wentang Song, Yuzhen Lin, Bin Li

-+ [Archilles' Heel in Semi-open LLMs: Hiding Bottom against Recovery Attacks](https://arxiv.org//abs/2410.11182)
++ [Archilles' Heel in Semi-open LLMs: Hiding Bottom against Recovery Attacks](https://arxiv.org/abs/2410.11182)

Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Ruoyu Sun, Zhuotao Liu, Shiyu Liang

-+ [Backdoor Attack on Vertical Federated Graph Neural Network Learning](https://arxiv.org//abs/2410.11290)
++ [Backdoor Attack on Vertical Federated Graph Neural Network Learning](https://arxiv.org/abs/2410.11290)

Jirui Yang, Peng Chen, Zhihui Lu, Ruijun Deng, Qiang Duan, Jianping Zeng

-+ [Multi-round jailbreak attack on large language models](https://arxiv.org//abs/2410.11533)
++ [Multi-round jailbreak attack on large language models](https://arxiv.org/abs/2410.11533)

Yihua Zhou, Xiaochuan Shi

-+ [Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions](https://arxiv.org//abs/2410.11701)
++ [Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions](https://arxiv.org/abs/2410.11701)

Yuhan Fu, Ruobing Xie, Jiazhen Liu, Bangxiang Lan, Xingwu Sun, Zhanhui Kang, Xirong Li

-+ [Cognitive Overload Attack:Prompt Injection for Long Context](https://arxiv.org//abs/2410.11272)
++ [Cognitive Overload Attack:Prompt Injection for Long Context](https://arxiv.org/abs/2410.11272)

Bibek Upadhayay, Vahid Behzadan, Amin Karbasi

-+ [Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation](https://arxiv.org//abs/2410.11317)
++ [Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation](https://arxiv.org/abs/2410.11317)

Qizhang Li, Xiaochen Yang, Wangmeng Zuo, Yiwen Guo

-+ [Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models](https://arxiv.org//abs/2410.11639)
++ [Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models](https://arxiv.org/abs/2410.11639)

Fan Yang, Yihao Huang, Kailong Wang, Ling Shi, Geguang Pu, Yang Liu, Haoyu Wang

-+ [Adversarially Guided Stateful Defense Against Backdoor Attacks in Federated Deep Learning](https://arxiv.org//abs/2410.11205)
++ [Adversarially Guided Stateful Defense Against Backdoor Attacks in Federated Deep Learning](https://arxiv.org/abs/2410.11205)

Hassan Ali, Surya Nepal, Salil S. Kanhere, Sanjay Jha

-+ [AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment](https://arxiv.org//abs/2410.11283)
++ [AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment](https://arxiv.org/abs/2410.11283)

Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang

-+ [Bias Similarity Across Large Language Models](https://arxiv.org//abs/2410.12010)
++ [Bias Similarity Across Large Language Models](https://arxiv.org/abs/2410.12010)

Hyejun Jeong, Shiqing Ma, Amir Houmansadr

-+ [Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction](https://arxiv.org//abs/2410.12040)
++ [Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction](https://arxiv.org/abs/2410.12040)

Kaiqiao Han, Tianqing Fang, Zhaowei Wang, Yangqiu Song, Mark Steedman

-+ [Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning](https://arxiv.org//abs/2410.12085)
++ [Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning](https://arxiv.org/abs/2410.12085)

Fengyu Gao, Ruida Zhou, Tianhao Wang, Cong Shen, Jing Yang

-+ [Taking off the Rose-Tinted Glasses: A Critical Look at Adversarial ML Through the Lens of Evasion Attacks](https://arxiv.org//abs/2410.12076)
++ [Taking off the Rose-Tinted Glasses: A Critical Look at Adversarial ML Through the Lens of Evasion Attacks](https://arxiv.org/abs/2410.12076)

Kevin Eykholt, Farhan Ahmed, Pratik Vaishnavi, Amir Rahmati

-+ [BeniFul: Backdoor Defense via Middle Feature Analysis for Deep Neural Networks](https://arxiv.org//abs/2410.14723)
++ [BeniFul: Backdoor Defense via Middle Feature Analysis for Deep Neural Networks](https://arxiv.org/abs/2410.14723)

Xinfu Li, Junying Zhang, Xindi Ma

-+ [DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks for Image Analysis](https://arxiv.org//abs/2410.19794)
++ [DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks for Image Analysis](https://arxiv.org/abs/2410.19794)

Zohreh Aghababaeyan, Manel Abdellatif, Lionel Briand, Ramesh S

# 2024-10-14

-+ [Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting](https://arxiv.org//abs/2410.10150)
++ [Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting](https://arxiv.org/abs/2410.10150)

Yifan Luo, Zhennan Zhou, Meitan Wang, Bin Dong

-+ [Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention](https://arxiv.org//abs/2410.10184)
++ [Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention](https://arxiv.org/abs/2410.10184)

Ying Liu, Ge Bai, Chenji Lu, Shilong Li, Zhang Zhang, Ruifang Liu, Wenbin Guo

-+ [ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection](https://arxiv.org//abs/2410.10554)
++ [ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection](https://arxiv.org/abs/2410.10554)

Martin Aubard, László Antal, Ana Madureira, Luis F. Teixeira, Erika Ábrahám
Teixeira, Erika Ábrahám -+ [Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach](https://arxiv.org//abs/2410.10674) ++ [Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach](https://arxiv.org/abs/2410.10674) Rory Young, Nicolas Pugeault -+ [Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues](https://arxiv.org//abs/2410.10700) ++ [Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues](https://arxiv.org/abs/2410.10700) Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao -+ [Locking Down the Finetuned LLMs Safety](https://arxiv.org//abs/2410.10343) ++ [Locking Down the Finetuned LLMs Safety](https://arxiv.org/abs/2410.10343) Minjun Zhu, Linyi Yang, Yifan Wei, Ningyu Zhang, Yue Zhang -+ [Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors](https://arxiv.org//abs/2410.10091) ++ [Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors](https://arxiv.org/abs/2410.10091) Tao Lin, Lijia Yu, Gaojie Jin, Renjue Li, Peng Wu, Lijun Zhang -+ [Identity-Focused Inference and Extraction Attacks on Diffusion Models](https://arxiv.org//abs/2410.10177) ++ [Identity-Focused Inference and Extraction Attacks on Diffusion Models](https://arxiv.org/abs/2410.10177) Jayneel Vora, Aditya Krishnan, Nader Bouacida, Prabhu RV Shankar, Prasant Mohapatra -+ [Capture Artifacts via Progressive Disentangling and Purifying Blended Identities for Deepfake Detection](https://arxiv.org//abs/2410.10244) ++ [Capture Artifacts via Progressive Disentangling and Purifying Blended Identities for Deepfake Detection](https://arxiv.org/abs/2410.10244) Weijie Zhou, Xiaoqing Luo, Zhancheng Zhang, Jiachen He, Xiaojun Wu -+ [Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings](https://arxiv.org//abs/2410.10744) ++ [Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings](https://arxiv.org/abs/2410.10744) Hossein Mirzaei, Mackenzie W. 
Mathis -+ [Regularized Robustly Reliable Learners and Instance Targeted Attacks](https://arxiv.org//abs/2410.10572) ++ [Regularized Robustly Reliable Learners and Instance Targeted Attacks](https://arxiv.org/abs/2410.10572) Avrim Blum, Donya Saless -+ [Towards Calibrated Losses for Adversarial Robust Reject Option Classification](https://arxiv.org//abs/2410.10736) ++ [Towards Calibrated Losses for Adversarial Robust Reject Option Classification](https://arxiv.org/abs/2410.10736) Vrund Shah, Tejas Chaudhari, Naresh Manwani -+ [SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI](https://arxiv.org//abs/2410.11096) ++ [SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI](https://arxiv.org/abs/2410.11096) Yuzhou Nie, Zhun Wang, Yu Yang, Ruizhe Jiang, Yuheng Tang, Xander Davies, Yarin Gal, Bo Li, Wenbo Guo, Dawn Song # 2024-10-13 -+ [Robust 3D Point Clouds Classification based on Declarative Defenders](https://arxiv.org//abs/2410.09691) ++ [Robust 3D Point Clouds Classification based on Declarative Defenders](https://arxiv.org/abs/2410.09691) Kaidong Li, Tianxiao Zhang, Chuncong Zhong, Ziming Zhang, Guanghui Wang -+ [BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models](https://arxiv.org//abs/2410.09804) ++ [BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models](https://arxiv.org/abs/2410.09804) Xinyuan Wang, Victor Shea-Jay Huang, Renmiao Chen, Hao Wang, Chengwei Pan, Lei Sha, Minlie Huang -+ [Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense](https://arxiv.org//abs/2410.09838) ++ [Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense](https://arxiv.org/abs/2410.09838) Rui Min, Zeyu Qin, Nevin L. Zhang, Li Shen, Minhao Cheng -+ [Understanding Robustness of Parameter-Efficient Tuning for Image Classification](https://arxiv.org//abs/2410.09845) ++ [Understanding Robustness of Parameter-Efficient Tuning for Image Classification](https://arxiv.org/abs/2410.09845) Jiacheng Ruan, Xian Gao, Suncheng Xiang, Mingye Xie, Ting Liu, Yuzhuo Fu -+ [LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models](https://arxiv.org//abs/2410.09962) ++ [LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models](https://arxiv.org/abs/2410.09962) Han Qiu, Jiaxing Huang, Peng Gao, Qin Qi, Xiaoqin Zhang, Ling Shao, Shijian Lu -+ [Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation](https://arxiv.org//abs/2410.09760) ++ [Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation](https://arxiv.org/abs/2410.09760) Guozhi Liu, Weiwei Lin, Tiansheng Huang, Ruichao Mo, Qi Mu, Li Shen -+ [Uncovering Attacks and Defenses in Secure Aggregation for Federated Deep Learning](https://arxiv.org//abs/2410.09676) ++ [Uncovering Attacks and Defenses in Secure Aggregation for Federated Deep Learning](https://arxiv.org/abs/2410.09676) Yiwei Zhang, Rouzbeh Behnia, Attila A. Yavuz, Reza Ebrahimi, Elisa Bertino # 2024-10-12 -+ [Are You Human? An Adversarial Benchmark to Expose LLMs](https://arxiv.org//abs/2410.09569) ++ [Are You Human? 
An Adversarial Benchmark to Expose LLMs](https://arxiv.org/abs/2410.09569) Gilad Gressel, Rahul Pankajakshan, Yisroel Mirsky -+ [Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbations](https://arxiv.org//abs/2410.09318) ++ [Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbations](https://arxiv.org/abs/2410.09318) Saiful Islam Salim, Rubin Yuchan Yang, Alexander Cooper, Suryashree Ray, Saumya Debray, Sazzadur Rahaman -+ [A Speaker Turn-Aware Multi-Task Adversarial Network for Joint User Satisfaction Estimation and Sentiment Analysis](https://arxiv.org//abs/2410.09556) ++ [A Speaker Turn-Aware Multi-Task Adversarial Network for Joint User Satisfaction Estimation and Sentiment Analysis](https://arxiv.org/abs/2410.09556) Kaisong Song, Yangyang Kang, Jiawei Liu, Xurui Li, Changlong Sun, Xiaozhong Liu -+ [Debiasing Vison-Language Models with Text-Only Training](https://arxiv.org//abs/2410.09365) ++ [Debiasing Vison-Language Models with Text-Only Training](https://arxiv.org/abs/2410.09365) Yunfan Yang, Chaoquan Jiang, Zhiyu Lin, Jinlin Xiao, Jiaming Zhang, Jitao Sang -+ [Decision-Point Guided Safe Policy Improvement](https://arxiv.org//abs/2410.09361) ++ [Decision-Point Guided Safe Policy Improvement](https://arxiv.org/abs/2410.09361) Abhishek Sharma, Leo Benac, Sonali Parbhoo, Finale Doshi-Velez -+ [Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy](https://arxiv.org//abs/2410.09591) ++ [Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy](https://arxiv.org/abs/2410.09591) Yangsibo Huang, Daogao Liu, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Milad Nasr, Amer Sinha, Chiyuan Zhang # 2024-10-11 -+ [Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning](https://arxiv.org//abs/2410.08540) ++ [Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning](https://arxiv.org/abs/2410.08540) Xinran Li, Ling Pan, Jun Zhang -+ [RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process](https://arxiv.org//abs/2410.08660) ++ [RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process](https://arxiv.org/abs/2410.08660) Peiran Wang, Xiaogeng Liu, Chaowei Xiao -+ [PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning](https://arxiv.org//abs/2410.08811) ++ [PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning](https://arxiv.org/abs/2410.08811) Tingchen Fu, Mrinank Sharma, Philip Torr, Shay B. 
Cohen, David Krueger, Fazl Barez -+ [The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses](https://arxiv.org//abs/2410.08864) ++ [The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses](https://arxiv.org/abs/2410.08864) Grzegorz Głuch, Berkant Turan, Sai Ganesh Nagarajan, Sebastian Pokutta -+ [On the Adversarial Transferability of Generalized "Skip Connections"](https://arxiv.org//abs/2410.08950) ++ [On the Adversarial Transferability of Generalized "Skip Connections"](https://arxiv.org/abs/2410.08950) Yisen Wang, Yichuan Mo, Dongxian Wu, Mingjie Li, Xingjun Ma, Zhouchen Lin -+ [Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements](https://arxiv.org//abs/2410.08968) ++ [Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements](https://arxiv.org/abs/2410.08968) Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme -+ [NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models](https://arxiv.org//abs/2410.08970) ++ [NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models](https://arxiv.org/abs/2410.08970) Zheng Yi Ho, Siyuan Liang, Sen Zhang, Yibing Zhan, Dacheng Tao -+ [AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents](https://arxiv.org//abs/2410.09024) ++ [AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents](https://arxiv.org/abs/2410.09024) Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, Eric Winsor, Jerome Wynne, Yarin Gal, Xander Davies -+ [RoRA-VLM: Robust Retrieval-Augmented Vision Language Models](https://arxiv.org//abs/2410.08876) ++ [RoRA-VLM: Robust Retrieval-Augmented Vision Language Models](https://arxiv.org/abs/2410.08876) Jingyuan Qi, Zhiyang Xu, Rulin Shao, Yang Chen, Jing Di, Yu Cheng, Qifan Wang, Lifu Huang -+ [AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation](https://arxiv.org//abs/2410.09040) ++ [AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation](https://arxiv.org/abs/2410.09040) Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie -+ [Natural Language Induced Adversarial Images](https://arxiv.org//abs/2410.08620) ++ [Natural Language Induced Adversarial Images](https://arxiv.org/abs/2410.08620) Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yingpeng Dong, Xiaolin Hu -+ [Gradients Stand-in for Defending Deep Leakage in Federated Learning](https://arxiv.org//abs/2410.08734) ++ [Gradients Stand-in for Defending Deep Leakage in Federated Learning](https://arxiv.org/abs/2410.08734) H. Yi, H. Ren, C. Hu, Y. Li, J. Deng, X. 
Xie -+ [Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data](https://arxiv.org//abs/2410.08503) ++ [Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data](https://arxiv.org/abs/2410.08503) Binghui Li, Yuanzhi Li -+ [Fragile Giants: Understanding the Susceptibility of Models to Subpopulation Attacks](https://arxiv.org//abs/2410.08872) ++ [Fragile Giants: Understanding the Susceptibility of Models to Subpopulation Attacks](https://arxiv.org/abs/2410.08872) Isha Gupta, Hidde Lycklama, Emanuel Opel, Evan Rose, Anwar Hithnawi -+ [Training on Fake Labels: Mitigating Label Leakage in Split Learning via Secure Dimension Transformation](https://arxiv.org//abs/2410.09125) ++ [Training on Fake Labels: Mitigating Label Leakage in Split Learning via Secure Dimension Transformation](https://arxiv.org/abs/2410.09125) Yukun Jiang, Peiran Wang, Chengguo Lin, Ziyue Huang, Yong Cheng -+ [Multi-Agent Actor-Critics in Autonomous Cyber Defense](https://arxiv.org//abs/2410.09134) ++ [Multi-Agent Actor-Critics in Autonomous Cyber Defense](https://arxiv.org/abs/2410.09134) Mingjun Wang, Remington Dechene -+ [Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection](https://arxiv.org//abs/2410.09250) ++ [Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection](https://arxiv.org/abs/2410.09250) Chu-Hsuan Abraham Lin, Chen-Yu Liu, Samuel Yen-Chi Chen, Kuan-Cheng Chen # 2024-10-10 -+ [Adversarial Robustness Overestimation and Instability in TRADES](https://arxiv.org//abs/2410.07675) ++ [Adversarial Robustness Overestimation and Instability in TRADES](https://arxiv.org/abs/2410.07675) Jonathan Weiping Li, Ren-Wei Liang, Cheng-Han Yeh, Cheng-Chang Tsai, Kuanchun Yu, Chun-Shien Lu, Shang-Tse Chen -+ [Private Language Models via Truncated Laplacian Mechanism](https://arxiv.org//abs/2410.08027) ++ [Private Language Models via Truncated Laplacian Mechanism](https://arxiv.org/abs/2410.08027) Tianhao Huang, Tao Yang, Ivan Habernal, Lijie Hu, Di Wang -+ [DPL: Cross-quality DeepFake Detection via Dual Progressive Learning](https://arxiv.org//abs/2410.07633) ++ [DPL: Cross-quality DeepFake Detection via Dual Progressive Learning](https://arxiv.org/abs/2410.07633) Dongliang Zhang, Yunfei Li, Jiaran Zhou, Yuezun Li -+ [Poison-splat: Computation Cost Attack on 3D Gaussian Splatting](https://arxiv.org//abs/2410.08190) ++ [Poison-splat: Computation Cost Attack on 3D Gaussian Splatting](https://arxiv.org/abs/2410.08190) Jiahao Lu, Yifan Zhang, Qiuhong Shen, Xinchao Wang, Shuicheng Yan -+ [Provable Privacy Attacks on Trained Shallow Neural Networks](https://arxiv.org//abs/2410.07632) ++ [Provable Privacy Attacks on Trained Shallow Neural Networks](https://arxiv.org/abs/2410.07632) Guy Smorodinsky, Gal Vardi, Itay Safran -+ [Understanding Adversarially Robust Generalization via Weight-Curvature Index](https://arxiv.org//abs/2410.07719) ++ [Understanding Adversarially Robust Generalization via Weight-Curvature Index](https://arxiv.org/abs/2410.07719) Yuelin Xu, Xiao Zhang -+ [Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery](https://arxiv.org//abs/2410.07643) ++ [Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery](https://arxiv.org/abs/2410.07643) Yangchun Zhang, Wang Zhou, Yirui Zhou -+ [Invisibility Cloak: 
Disappearance under Human Pose Estimation via Backdoor Attacks](https://arxiv.org//abs/2410.07670) ++ [Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks](https://arxiv.org/abs/2410.07670) Minxing Zhang, Michael Backes, Xiao Zhang -+ [Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities](https://arxiv.org//abs/2410.09114) ++ [Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities](https://arxiv.org/abs/2410.09114) Andrey Anurin, Jonathan Ng, Kibo Schaffer, Ziyue Wang, Jason Schreiber, Esben Kran -+ [Privately Learning from Graphs with Applications in Fine-tuning Large Language Models](https://arxiv.org//abs/2410.08299) ++ [Privately Learning from Graphs with Applications in Fine-tuning Large Language Models](https://arxiv.org/abs/2410.08299) Haoteng Yin, Rongzhe Wei, Eli Chien, Pan Li # 2024-10-09 -+ [Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders](https://arxiv.org//abs/2410.06462) ++ [Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders](https://arxiv.org/abs/2410.06462) David Noever, Forrest McKee -+ [Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models](https://arxiv.org//abs/2410.06699) ++ [Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models](https://arxiv.org/abs/2410.06699) Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu -+ [Defending Membership Inference Attacks via Privacy-aware Sparsity Tuning](https://arxiv.org//abs/2410.06814) ++ [Defending Membership Inference Attacks via Privacy-aware Sparsity Tuning](https://arxiv.org/abs/2410.06814) Qiang Hu, Hengxiang Zhang, Hongxin Wei -+ [Understanding Model Ensemble in Transferable Adversarial Attack](https://arxiv.org//abs/2410.06851) ++ [Understanding Model Ensemble in Transferable Adversarial Attack](https://arxiv.org/abs/2410.06851) Wei Yao, Zeliang Zhang, Huayi Tang, Yong Liu -+ [Secure Video Quality Assessment Resisting Adversarial Attacks](https://arxiv.org//abs/2410.06866) ++ [Secure Video Quality Assessment Resisting Adversarial Attacks](https://arxiv.org/abs/2410.06866) Ao-Xiang Zhang, Yu Ran, Weixuan Tang, Yuan-Gen Wang, Qingxiao Guan, Chunsheng Yang -+ [PFAttack: Stealthy Attack Bypassing Group Fairness in Federated Learning](https://arxiv.org//abs/2410.06509) ++ [PFAttack: Stealthy Attack Bypassing Group Fairness in Federated Learning](https://arxiv.org/abs/2410.06509) Jiashi Gao, Ziwei Wang, Xiangyu Zhao, Xin Yao, Xuetao Wei -+ [Average Certified Radius is a Poor Metric for Randomized Smoothing](https://arxiv.org//abs/2410.06895) ++ [Average Certified Radius is a Poor Metric for Randomized Smoothing](https://arxiv.org/abs/2410.06895) Chenhao Sun, Yuhao Mao, Mark Niklas Müller, Martin Vechev -+ [Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility](https://arxiv.org//abs/2410.06921) ++ [Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility](https://arxiv.org/abs/2410.06921) Rajdeep Haldar, Yue Xing, Qifan Song, Guang Lin -+ [Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats](https://arxiv.org//abs/2410.06587) ++ [Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats](https://arxiv.org/abs/2410.06587) Kai-Hsiang Chou, Yi-Min Lin, Yi-An Wang, Jonathan Weiping Li, Tiffany Hyun-Jin Kim, 
Hsu-Chun Hsiao -+ [Mind Your Questions Towards Backdoor Attacks on Text-to-Visualization Models](https://arxiv.org//abs/2410.06782) ++ [Mind Your Questions Towards Backdoor Attacks on Text-to-Visualization Models](https://arxiv.org/abs/2410.06782) Shuaimin Li, Yuanfeng Song, Xuanang Chen, Anni Peng, Zhuoyue Wan, Chen Jason Zhang, Raymond Chi-Wing Wong -+ [Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems](https://arxiv.org//abs/2410.07283) ++ [Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems](https://arxiv.org/abs/2410.07283) Donghyun Lee, Mo Tiwari -+ [Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations](https://arxiv.org//abs/2410.09097) ++ [Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations](https://arxiv.org/abs/2410.09097) Tarun Raheja, Nilay Pochhi -+ [Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy](https://arxiv.org//abs/2410.09102) ++ [Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy](https://arxiv.org/abs/2410.09102) Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou -+ [Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning](https://arxiv.org//abs/2410.09101) ++ [Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning](https://arxiv.org/abs/2410.09101) Wassim Bouaziz, El-Mahdi El-Mhamdi, Nicolas Usunier -+ [Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification](https://arxiv.org//abs/2410.06816) ++ [Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification](https://arxiv.org/abs/2410.06816) Yuhao Mao, Yani Zhang, Martin Vechev -+ [Degree-Conscious Spiking Graph for Cross-Domain Adaptation](https://arxiv.org//abs/2410.06883) ++ [Degree-Conscious Spiking Graph for Cross-Domain Adaptation](https://arxiv.org/abs/2410.06883) Yingxu Wang, Mengzhu Wang, Houcheng Su, Nan Yin, Quanming Yao, James Kwok # 2024-10-08 -+ [Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning](https://arxiv.org//abs/2410.06304) ++ [Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning](https://arxiv.org/abs/2410.06304) Ruosen Li, Ziming Luo, Xinya Du -+ [DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing](https://arxiv.org//abs/2410.05694) ++ [DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing](https://arxiv.org/abs/2410.05694) June Suk Choi, Kyungmin Lee, Jongheon Jeong, Saining Xie, Jinwoo Shin, Kimin Lee -+ [Hyper Adversarial Tuning for Boosting Adversarial Robustness of Pretrained Large Vision Models](https://arxiv.org//abs/2410.05951) ++ [Hyper Adversarial Tuning for Boosting Adversarial Robustness of Pretrained Large Vision Models](https://arxiv.org/abs/2410.05951) Kangtao Lv, Huangsen Cao, Kainan Tu, Yihuai Xu, Zhimeng Zhang, Xin Ding, Yongwei Wang -+ [$\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection](https://arxiv.org//abs/2410.06126) ++ [$\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection](https://arxiv.org/abs/2410.06126) Yize Chen, Zhiyuan Yan, Siwei Lyu, Baoyuan Wu -+ [CALoR: Towards Comprehensive Model Inversion Defense](https://arxiv.org//abs/2410.05814) ++ [CALoR: Towards Comprehensive Model Inversion 
Defense](https://arxiv.org/abs/2410.05814) Hongyao Yu, Yixiang Qiu, Hao Fang, Bin Chen, Sijin Yu, Bin Wang, Shu-Tao Xia, Ke Xu -+ [Solving robust MDPs as a sequence of static RL problems](https://arxiv.org//abs/2410.06212) ++ [Solving robust MDPs as a sequence of static RL problems](https://arxiv.org/abs/2410.06212) Adil Zouitine, Matthieu Geist, Emmanuel Rachelson -+ [Filtered Randomized Smoothing: A New Defense for Robust Modulation Classification](https://arxiv.org//abs/2410.06339) ++ [Filtered Randomized Smoothing: A New Defense for Robust Modulation Classification](https://arxiv.org/abs/2410.06339) Wenhan Zhang, Meiyu Zhong, Ravi Tandon, Marwan Krunz # 2024-10-07 -+ [Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models](https://arxiv.org//abs/2410.04884) ++ [Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models](https://arxiv.org/abs/2410.04884) Dehong Kong, Siyuan Liang, Xiaopeng Zhu, Yuansheng Zhong, Wenqi Ren -+ [Defense-as-a-Service: Black-box Shielding against Backdoored Graph Models](https://arxiv.org//abs/2410.04916) ++ [Defense-as-a-Service: Black-box Shielding against Backdoored Graph Models](https://arxiv.org/abs/2410.04916) Xiao Yang, Kai Zhou, Yuni Lai, Gaolei Li -+ [Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality](https://arxiv.org//abs/2410.04780) ++ [Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality](https://arxiv.org/abs/2410.04780) Guanyu Zhou, Yibo Yan, Xin Zou, Kun Wang, Aiwei Liu, Xuming Hu -+ [CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models](https://arxiv.org//abs/2410.04823) ++ [CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models](https://arxiv.org/abs/2410.04823) Songning Lai, Jiayu Yang, Yu Huang, Lijie Hu, Tianlang Xue, Zhangyi Hu, Jiaxu Li, Haicheng Liao, Yutao Yue -+ [MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense](https://arxiv.org//abs/2410.05159) ++ [MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense](https://arxiv.org/abs/2410.05159) Yixiang Qiu, Hongyao Yu, Hao Fang, Wenbo Yu, Bin Chen, Xuan Wang, Shu-Tao Xia, Ke Xu -+ [On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning](https://arxiv.org//abs/2410.04682) ++ [On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning](https://arxiv.org/abs/2410.04682) Yongyi Su, Yushu Li, Nanqing Liu, Kui Jia, Xulei Yang, Chuan-Sheng Foo, Xun Xu -+ [FRIDA: Free-Rider Detection using Privacy Attacks](https://arxiv.org//abs/2410.05020) ++ [FRIDA: Free-Rider Detection using Privacy Attacks](https://arxiv.org/abs/2410.05020) Pol G. 
Recasens, Ádám Horváth, Alberto Gutierrez-Torre, Jordi Torres, Josep Ll.Berral, Balázs Pejó -+ [LOTOS: Layer-wise Orthogonalization for Training Robust Ensembles](https://arxiv.org//abs/2410.05136) ++ [LOTOS: Layer-wise Orthogonalization for Training Robust Ensembles](https://arxiv.org/abs/2410.05136) Ali Ebrahimpour-Boroojeny, Hari Sundaram, Varun Chandrasekaran -+ [AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models](https://arxiv.org//abs/2410.05346) ++ [AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models](https://arxiv.org/abs/2410.05346) Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Jitao Sang, Dit-Yan Yeung @@ -19607,96 +19607,96 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Chaithanya Bandi, Abir Harrasse # 2024-10-06 -+ [DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination](https://arxiv.org//abs/2410.04514) ++ [DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination](https://arxiv.org/abs/2410.04514) Xuan Gong, Tianshi Ming, Xinpeng Wang, Zhihua Wei -+ [Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning](https://arxiv.org//abs/2410.04524) ++ [Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning](https://arxiv.org/abs/2410.04524) Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Fenglei Fan, Ting Liu, Bing Qin -+ [Suspiciousness of Adversarial Texts to Human](https://arxiv.org//abs/2410.04377) ++ [Suspiciousness of Adversarial Texts to Human](https://arxiv.org/abs/2410.04377) Shakila Mahjabin Tonni, Pedro Faustini, Mark Dras -+ [DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion](https://arxiv.org//abs/2410.04372) ++ [DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion](https://arxiv.org/abs/2410.04372) Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji -+ [Robustness Reprogramming for Representation Learning](https://arxiv.org//abs/2410.04577) ++ [Robustness Reprogramming for Representation Learning](https://arxiv.org/abs/2410.04577) Zhichao Hou, MohamadAli Torkamani, Hamid Krim, Xiaorui Liu # 2024-10-05 -+ [Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models](https://arxiv.org//abs/2410.04190) ++ [Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models](https://arxiv.org/abs/2410.04190) Yiting Dong, Guobin Shen, Dongcheng Zhao, Xiang He, Yi Zeng -+ [Improving Generalization with Flat Hilbert Bayesian Inference](https://arxiv.org//abs/2410.04196) ++ [Improving Generalization with Flat Hilbert Bayesian Inference](https://arxiv.org/abs/2410.04196) Tuan Truong, Quyen Tran, Quan Pham-Ngoc, Nhat Ho, Dinh Phung, Trung Le -+ [Impact of Regularization on Calibration and Robustness: from the Representation Space Perspective](https://arxiv.org//abs/2410.03999) ++ [Impact of Regularization on Calibration and Robustness: from the Representation Space Perspective](https://arxiv.org/abs/2410.03999) Jonghyun Park, Juyeop Kim, Jong-Seok Lee # 2024-10-04 -+ [Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments](https://arxiv.org//abs/2410.03847) ++ [Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic 
Environments](https://arxiv.org/abs/2410.03847) Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu -+ [Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step](https://arxiv.org//abs/2410.03869) ++ [Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step](https://arxiv.org/abs/2410.03869) Wenxuan Wang, Kuiyi Gao, Zihan Jia, Youliang Yuan, Jen-tse Huang, Qiuzhi Liu, Shuai Wang, Wenxiang Jiao, Zhaopeng Tu -+ [A Brain-Inspired Regularizer for Adversarial Robustness](https://arxiv.org//abs/2410.03952) ++ [A Brain-Inspired Regularizer for Adversarial Robustness](https://arxiv.org/abs/2410.03952) Elie Attias, Cengiz Pehlevan, Dina Obeid -+ [You Know What I'm Saying -- Jailbreak Attack via Implicit Reference](https://arxiv.org//abs/2410.03857) ++ [You Know What I'm Saying -- Jailbreak Attack via Implicit Reference](https://arxiv.org/abs/2410.03857) Tianyu Wu, Lingrui Mei, Ruibin Yuan, Lujun Li, Wei Xue, Yike Guo # 2024-10-03 -+ [Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge](https://arxiv.org//abs/2410.03775) ++ [Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge](https://arxiv.org/abs/2410.03775) Aparna Elangovan, Jongwoo Ko, Lei Xu, Mahsa Elyasi, Ling Liu, Sravan Bodapati, Dan Roth -+ [LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations](https://arxiv.org//abs/2410.02707) ++ [LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations](https://arxiv.org/abs/2410.02707) Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov -+ [DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation](https://arxiv.org//abs/2410.03782) ++ [DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation](https://arxiv.org/abs/2410.03782) Changdae Oh, Yixuan Li, Kyungwoo Song, Sangdoo Yun, Dongyoon Han -+ [CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification](https://arxiv.org//abs/2410.03038) ++ [CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification](https://arxiv.org/abs/2410.03038) Jinghao Shi, Xiang Shen, Kaili Zhao, Xuedong Wang, Vera Wen, Zixuan Wang, Yifan Wu, Zhixin Zhang -+ [Optimizing Adaptive Attacks against Watermarks for Language Models](https://arxiv.org//abs/2410.02440) ++ [Optimizing Adaptive Attacks against Watermarks for Language Models](https://arxiv.org/abs/2410.02440) Abdulrahman Diaa, Toluwani Aremu, Nils Lukas -+ [Discovering Spoofing Attempts on Language Model Watermarks](https://arxiv.org//abs/2410.02693) ++ [Discovering Spoofing Attempts on Language Model Watermarks](https://arxiv.org/abs/2410.02693) Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev @@ -19704,22 +19704,22 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Shuangpeng Han, Mengmi Zhang -+ [Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models](https://arxiv.org//abs/2410.03039) ++ [Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models](https://arxiv.org/abs/2410.03039) Xiaoyu Wu, Jiaru Zhang, Zhiwei Steven Wu # 2024-10-02 -+ [Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs](https://arxiv.org//abs/2410.03768) ++ [Hidden in Plain Text: Emergence & Mitigation 
of Steganographic Collusion in LLMs](https://arxiv.org/abs/2410.03768) Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots -+ [Adversarial Robustness of AI-Generated Image Detectors in the Real World](https://arxiv.org//abs/2410.01574) ++ [Adversarial Robustness of AI-Generated Image Detectors in the Real World](https://arxiv.org/abs/2410.01574) Sina Mavali, Jonas Ricker, David Pape, Asja Fischer, Lea Schönherr -+ [Deep Unlearn: Benchmarking Machine Unlearning for Image Classification](https://arxiv.org//abs/2410.01276) ++ [Deep Unlearn: Benchmarking Machine Unlearning for Image Classification](https://arxiv.org/abs/2410.01276) Xavier F. Cadet, Anastasia Borovykh, Mohammad Malekzadeh, Sara Ahmadi-Abhari, Hamed Haddadi @@ -19729,486 +19729,486 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Yang Li, Wenhan Yu, Jun Zhao # 2024-09-29 -+ [BadHMP: Backdoor Attack against Human Motion Prediction](https://arxiv.org//abs/2409.19638) ++ [BadHMP: Backdoor Attack against Human Motion Prediction](https://arxiv.org/abs/2409.19638) Chaohui Xu, Si Wang, Chip-Hong Chang # 2024-09-27 -+ [Predicting memorization within Large Language Models fine-tuned for classification](https://arxiv.org//abs/2409.18858) ++ [Predicting memorization within Large Language Models fine-tuned for classification](https://arxiv.org/abs/2409.18858) Jérémie Dentan, Davide Buscaldi, Aymen Shabou, Sonia Vanier -+ [Multimodal Pragmatic Jailbreak on Text-to-image Models](https://arxiv.org//abs/2409.19149) ++ [Multimodal Pragmatic Jailbreak on Text-to-image Models](https://arxiv.org/abs/2409.19149) Tong Liu, Zhixin Lai, Jiawen Wang, Gengyuan Zhang, Shuo Chen, Philip Torr, Vera Demberg, Volker Tresp, Jindong Gu -+ [Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems](https://arxiv.org//abs/2409.18708) ++ [Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems](https://arxiv.org/abs/2409.18708) Sergey Berezin, Reza Farahbakhsh, Noel Crespi # 2024-09-26 -+ [Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples](https://arxiv.org//abs/2409.17568) ++ [Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples](https://arxiv.org/abs/2409.17568) Yujiang Liu, Wenjian Luo, Zhijian Chen, Muhammad Luqman Naseem -+ [DarkSAM: Fooling Segment Anything Model to Segment Nothing](https://arxiv.org//abs/2409.17874) ++ [DarkSAM: Fooling Segment Anything Model to Segment Nothing](https://arxiv.org/abs/2409.17874) Ziqi Zhou, Yufei Song, Minghui Li, Shengshan Hu, Xianlong Wang, Leo Yu Zhang, Dezhong Yao, Hai Jin -+ [Improving Fast Adversarial Training via Self-Knowledge Guidance](https://arxiv.org//abs/2409.17589) ++ [Improving Fast Adversarial Training via Self-Knowledge Guidance](https://arxiv.org/abs/2409.17589) Chengze Jiang, Junkai Wang, Minjing Dong, Jie Gui, Xinli Shi, Yuan Cao, Yuan Yan Tang, James Tin-Yau Kwok -+ [TA-Cleaner: A Fine-grained Text Alignment Backdoor Defense Strategy for Multimodal Contrastive Learning](https://arxiv.org//abs/2409.17601) ++ [TA-Cleaner: A Fine-grained Text Alignment Backdoor Defense Strategy for Multimodal Contrastive Learning](https://arxiv.org/abs/2409.17601) Yuan Xun, Siyuan Liang, Xiaojun Jia, Xinwei Liu, Xiaochun Cao -+ [Efficient Bias Mitigation Without Privileged Information](https://arxiv.org//abs/2409.17691) ++ [Efficient Bias Mitigation 
Without Privileged Information](https://arxiv.org/abs/2409.17691) Mateo Espinosa Zarlenga, Swami Sankaranarayanan, Jerone T. A. Andrews, Zohreh Shams, Mateja Jamnik, Alice Xiang -+ [MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks](https://arxiv.org//abs/2409.17699) ++ [MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks](https://arxiv.org/abs/2409.17699) Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hamed, Ambrish Rawat, Mark Purcell -+ [Federated Learning under Attack: Improving Gradient Inversion for Batch of Images](https://arxiv.org//abs/2409.17767) ++ [Federated Learning under Attack: Improving Gradient Inversion for Batch of Images](https://arxiv.org/abs/2409.17767) Luiz Leite, Yuri Santo, Bruno L. Dalmazo, André Riker -+ [Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations](https://arxiv.org//abs/2409.17774) ++ [Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations](https://arxiv.org/abs/2409.17774) Supriya Manna, Niladri Sett -+ [PhantomLiDAR: Cross-modality Signal Injection Attacks against LiDAR](https://arxiv.org//abs/2409.17907) ++ [PhantomLiDAR: Cross-modality Signal Injection Attacks against LiDAR](https://arxiv.org/abs/2409.17907) Zizhi Jin, Qinhong Jiang, Xuancun Lu, Chen Yan, Xiaoyu Ji, Wenyuan Xu -+ [Weak-To-Strong Backdoor Attacks for LLMs with Contrastive Knowledge Distillation](https://arxiv.org//abs/2409.17946) ++ [Weak-To-Strong Backdoor Attacks for LLMs with Contrastive Knowledge Distillation](https://arxiv.org/abs/2409.17946) Shuai Zhao, Leilei Gan, Zhongliang Guo, Xiaobao Wu, Luwei Xiao, Xiaoyu Xu, Cong-Duy Nguyen, Luu Anh Tuan -+ [An Adversarial Perspective on Machine Unlearning for AI Safety](https://arxiv.org//abs/2409.18025) ++ [An Adversarial Perspective on Machine Unlearning for AI Safety](https://arxiv.org/abs/2409.18025) Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, Javier Rando -+ [DARE: Diverse Visual Question Answering with Robustness Evaluation](https://arxiv.org//abs/2409.18023) ++ [DARE: Diverse Visual Question Answering with Robustness Evaluation](https://arxiv.org/abs/2409.18023) Hannah Sterz, Jonas Pfeiffer, Ivan Vulić -+ [RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking](https://arxiv.org//abs/2409.17458) ++ [RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking](https://arxiv.org/abs/2409.17458) Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee -+ [HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection](https://arxiv.org//abs/2409.17504) ++ [HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection](https://arxiv.org/abs/2409.17504) Xuefeng Du, Chaowei Xiao, Yixuan Li -+ [Dark Miner: Defend against unsafe generation for text-to-image diffusion models](https://arxiv.org//abs/2409.17682) ++ [Dark Miner: Defend against unsafe generation for text-to-image diffusion models](https://arxiv.org/abs/2409.17682) Zheling Meng, Bo Peng, Xiaochuan Jin, Yue Jiang, Jing Dong, Wei Wang, Tieniu Tan -+ [Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense](https://arxiv.org//abs/2409.17941) ++ [Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense](https://arxiv.org/abs/2409.17941) Filippo Bartolucci, Iacopo Masi, Giuseppe Lisanti -+ [CNCA: Toward Customizable and Natural Generation of Adversarial 
Camouflage for Vehicle Detectors](https://arxiv.org//abs/2409.17963) ++ [CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors](https://arxiv.org/abs/2409.17963) Linye Lyu, Jiawei Zhou, Daojing He, Yu Li -+ [Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization](https://arxiv.org//abs/2409.17977) ++ [Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization](https://arxiv.org/abs/2409.17977) Yunpeng Gong, Qingyuan Zeng, Dejun Xu, Zhenzhong Wang, Min Jiang -+ [Discovering New Shadow Patterns for Black-Box Attacks on Lane Detection of Autonomous Vehicles](https://arxiv.org//abs/2409.18248) ++ [Discovering New Shadow Patterns for Black-Box Attacks on Lane Detection of Autonomous Vehicles](https://arxiv.org/abs/2409.18248) Pedram MohajerAnsari, Amir Salarpour, Jan de Voor, Alkim Domeke, Arkajyoti Mitra, Grace Johnson, Habeeb Olufowobi, Mohammad Hamad, Mert D. Pese -+ [AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure](https://arxiv.org//abs/2409.17642) ++ [AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure](https://arxiv.org/abs/2409.17642) Zhiyang Zhang, Xi Chen, Fangkai Yang, Xiaoting Qin, Chao Du, Xi Cheng, Hangxin Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang -+ [Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey](https://arxiv.org//abs/2409.18214) ++ [Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey](https://arxiv.org/abs/2409.18214) Yi Zhang, Zhen Chen, Chih-Hong Cheng, Wenjie Ruan, Xiaowei Huang, Dezong Zhao, David Flynn, Siddartha Khastgir, Xingyu Zhao # 2024-09-25 -+ [Claim-Guided Textual Backdoor Attack for Practical Applications](https://arxiv.org//abs/2409.16618) ++ [Claim-Guided Textual Backdoor Attack for Practical Applications](https://arxiv.org/abs/2409.16618) Minkyoo Song, Hanna Kim, Jaehan Kim, Youngjin Jin, Seungwon Shin -+ [RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems](https://arxiv.org//abs/2409.16727) ++ [RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems](https://arxiv.org/abs/2409.16727) Yihong Tang, Bo Wang, Xu Wang, Dongming Zhao, Jing Liu, Jijun Zhang, Ruifang He, Yuexian Hou -+ [EventHallusion: Diagnosing Event Hallucinations in Video LLMs](https://arxiv.org//abs/2409.16597) ++ [EventHallusion: Diagnosing Event Hallucinations in Video LLMs](https://arxiv.org/abs/2409.16597) Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang -+ [Verified Relative Safety Margins for Neural Network Twins](https://arxiv.org//abs/2409.16726) ++ [Verified Relative Safety Margins for Neural Network Twins](https://arxiv.org/abs/2409.16726) Anahita Baninajjar, Kamran Hosseini, Ahmed Rezine, Amir Aminifar -+ [RESAA: A Removal and Structural Analysis Attack Against Compound Logic Locking](https://arxiv.org//abs/2409.16959) ++ [RESAA: A Removal and Structural Analysis Attack Against Compound Logic Locking](https://arxiv.org/abs/2409.16959) Felipe Almeida, Levent Aksoy, Samuel Pagliarini -+ [Transient Adversarial 3D Projection Attacks on Object Detection in Autonomous Driving](https://arxiv.org//abs/2409.17403) ++ [Transient Adversarial 3D Projection Attacks on Object Detection in Autonomous Driving](https://arxiv.org/abs/2409.17403) Ce Zhou, Qiben Yan, Sijia Liu -+ [Optical Lens Attack on Deep Learning Based Monocular Depth Estimation](https://arxiv.org//abs/2409.17376) ++ [Optical Lens Attack on Deep Learning Based 
Monocular Depth Estimation](https://arxiv.org/abs/2409.17376) Ce Zhou, Qiben Yan, Daniel Kent, Guangjing Wang, Ziqi Zhang, Hayder Radha -+ [SHEATH: Defending Horizontal Collaboration for Distributed CNNs against Adversarial Noise](https://arxiv.org//abs/2409.17279) ++ [SHEATH: Defending Horizontal Collaboration for Distributed CNNs against Adversarial Noise](https://arxiv.org/abs/2409.17279) Muneeba Asif, Mohammad Kumail Kazmi, Mohammad Ashiqur Rahman, Syed Rafay Hasan, Soamar Homsi # 2024-09-24 -+ [Revisiting Acoustic Features for Robust ASR](https://arxiv.org//abs/2409.16399) ++ [Revisiting Acoustic Features for Robust ASR](https://arxiv.org/abs/2409.16399) Muhammad A. Shah, Bhiksha Raj -+ [A Unified Hallucination Mitigation Framework for Large Vision-Language Models](https://arxiv.org//abs/2409.16494) ++ [A Unified Hallucination Mitigation Framework for Large Vision-Language Models](https://arxiv.org/abs/2409.16494) Yue Chang, Liqiang Jing, Xiaopeng Zhang, Yue Zhang -+ [Proactive Schemes: A Survey of Adversarial Attacks for Social Good](https://arxiv.org//abs/2409.16491) ++ [Proactive Schemes: A Survey of Adversarial Attacks for Social Good](https://arxiv.org/abs/2409.16491) Vishal Asnani, Xi Yin, Xiaoming Liu # 2024-09-23 -+ [PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs](https://arxiv.org//abs/2409.14729) ++ [PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs](https://arxiv.org/abs/2409.14729) Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi, Xinyu Xing -+ [Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs](https://arxiv.org//abs/2409.14866) ++ [Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs](https://arxiv.org/abs/2409.14866) Xueluan Gong, Mingzhe Li, Yilin Zhang, Fengyuan Ran, Chen Chen, Yanjiao Chen, Qian Wang, Kwok-Yan Lam -+ [Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping](https://arxiv.org//abs/2409.15100) ++ [Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping](https://arxiv.org/abs/2409.15100) Jiaxing Li, Zihan Chen, Kai Fong Ernest Chong, Bikramjit Das, Tony Q. S. Quek, Howard H. 
Yang -+ [Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models](https://arxiv.org//abs/2409.14785) ++ [Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models](https://arxiv.org/abs/2409.14785) Patrick Amadeus Irawan, Genta Indra Winata, Samuel Cahyawijaya, Ayu Purwarianti -+ [Evaluating the Usability of LLMs in Threat Intelligence Enrichment](https://arxiv.org//abs/2409.15072) ++ [Evaluating the Usability of LLMs in Threat Intelligence Enrichment](https://arxiv.org/abs/2409.15072) Sanchana Srikanth, Mohammad Hasanuzzaman, Farah Tasnur Meem -+ [Improving Adversarial Robustness for 3D Point Cloud Recognition at Test-Time through Purified Self-Training](https://arxiv.org//abs/2409.14940) ++ [Improving Adversarial Robustness for 3D Point Cloud Recognition at Test-Time through Purified Self-Training](https://arxiv.org/abs/2409.14940) Jinpeng Lin, Xulei Yang, Tianrui Li, Xun Xu -+ [Interpretability-Guided Test-Time Adversarial Defense](https://arxiv.org//abs/2409.15190) ++ [Interpretability-Guided Test-Time Adversarial Defense](https://arxiv.org/abs/2409.15190) Akshay Kulkarni, Tsui-Wei Weng -+ [RoWSFormer: A Robust Watermarking Framework with Swin Transformer for Enhanced Geometric Attack Resilience](https://arxiv.org//abs/2409.14829) ++ [RoWSFormer: A Robust Watermarking Framework with Swin Transformer for Enhanced Geometric Attack Resilience](https://arxiv.org/abs/2409.14829) Weitong Chen, Yuheng Li -+ [SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning](https://arxiv.org//abs/2409.14805) ++ [SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning](https://arxiv.org/abs/2409.14805) Minyeong Choe, Cheolhee Park, Changho Seo, Hyunil Kim -+ [Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI](https://arxiv.org//abs/2409.15398) ++ [Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI](https://arxiv.org/abs/2409.15398) Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, Kieran Fraser, Erik Miehling, Beat Buesser, Elizabeth M. Daly, Mark Purcell, Prasanna Sattigeri, Pin-Yu Chen, Kush R. Varshney -+ [In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models](https://arxiv.org//abs/2409.15454) ++ [In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models](https://arxiv.org/abs/2409.15454) Pengrui Han, Peiyang Song, Haofei Yu, Jiaxuan You # 2024-09-22 -+ [Dormant: Defending against Pose-driven Human Image Animation](https://arxiv.org//abs/2409.14424) ++ [Dormant: Defending against Pose-driven Human Image Animation](https://arxiv.org/abs/2409.14424) Jiachen Zhou, Mingsi Wang, Tianlin Li, Guozhu Meng, Kai Chen -+ [Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks](https://arxiv.org//abs/2409.14488) ++ [Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks](https://arxiv.org/abs/2409.14488) Ruoyu Song, Muslum Ozgur Ozmen, Hyungsub Kim, Antonio Bianchi, Z. 
Berkay Celik -+ [Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions](https://arxiv.org//abs/2409.14572) ++ [Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions](https://arxiv.org/abs/2409.14572) Hongchen Wang, Kangming Li, Scott Ramsay, Yao Fehlis, Edward Kim, Jason Hattrick-Simpers -+ [Backtracking Improves Generation Safety](https://arxiv.org//abs/2409.14586) ++ [Backtracking Improves Generation Safety](https://arxiv.org/abs/2409.14586) Yiming Zhang, Jianfeng Chi, Hailey Nguyen, Kartikeya Upasani, Daniel M. Bikel, Jason Weston, Eric Michael Smith # 2024-09-21 -+ [PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach](https://arxiv.org//abs/2409.14177) ++ [PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach](https://arxiv.org/abs/2409.14177) Zhihao Lin, Wei Ma, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Yang Liu, Jun Wang, Li Li -+ [Data-centric NLP Backdoor Defense from the Lens of Memorization](https://arxiv.org//abs/2409.14200) ++ [Data-centric NLP Backdoor Defense from the Lens of Memorization](https://arxiv.org/abs/2409.14200) Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma -+ [When Witnesses Defend: A Witness Graph Topological Layer for Adversarial Graph Learning](https://arxiv.org//abs/2409.14161) ++ [When Witnesses Defend: A Witness Graph Topological Layer for Adversarial Graph Learning](https://arxiv.org/abs/2409.14161) Naheed Anjum Arafat, Debabrota Basu, Yulia Gel, Yuzhou Chen -+ [Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation](https://arxiv.org//abs/2409.15381) ++ [Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation](https://arxiv.org/abs/2409.15381) G M Shahariar, Jia Chen, Jiachen Li, Yue Dong # 2024-09-20 -+ [Relationship between Uncertainty in DNNs and Adversarial Attacks](https://arxiv.org//abs/2409.13232) ++ [Relationship between Uncertainty in DNNs and Adversarial Attacks](https://arxiv.org/abs/2409.13232) Abigail Adeniran, Adewale Adeyemo -+ [ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification](https://arxiv.org//abs/2409.13349) ++ [ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification](https://arxiv.org/abs/2409.13349) Zuomin Qu, Wei Lu, Xiangyang Luo, Qian Wang, Xiaochun Cao -+ [Certified Adversarial Robustness via Partition-based Randomized Smoothing](https://arxiv.org//abs/2409.13546) ++ [Certified Adversarial Robustness via Partition-based Randomized Smoothing](https://arxiv.org/abs/2409.13546) Hossein Goli, Farzan Farnia -+ [Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations](https://arxiv.org//abs/2409.13559) ++ [Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations](https://arxiv.org/abs/2409.13559) Athanasios Karagounis -+ [Neurosymbolic Conformal Classification](https://arxiv.org//abs/2409.13585) ++ [Neurosymbolic Conformal Classification](https://arxiv.org/abs/2409.13585) Arthur Ledaguenel, Céline Hudelot, Mostepha Khouadjia -+ [Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection](https://arxiv.org//abs/2409.13331) ++ [Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks 
Detection](https://arxiv.org/abs/2409.13331) Md Abdur Rahman, Hossain Shahriar, Fan Wu, Alfredo Cuzzocrea -+ [Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks](https://arxiv.org//abs/2409.13464) ++ [Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks](https://arxiv.org/abs/2409.13464) Guibiao Liao, Wei Gao -+ [PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models](https://arxiv.org//abs/2409.13945) ++ [PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models](https://arxiv.org/abs/2409.13945) Vu Tuan Truong, Long Bao Le -+ [MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety](https://arxiv.org//abs/2409.13867) ++ [MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety](https://arxiv.org/abs/2409.13867) Justin Wang, Haimin Hu, Duy Phuong Nguyen, Jaime Fernández Fisac -+ [ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer](https://arxiv.org//abs/2409.13828) ++ [ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer](https://arxiv.org/abs/2409.13828) Shihua Sun, Kenechukwu Nwodo, Shridatt Sugrim, Angelos Stavrou, Haining Wang -+ [Persistent Backdoor Attacks in Continual Learning](https://arxiv.org//abs/2409.13864) ++ [Persistent Backdoor Attacks in Continual Learning](https://arxiv.org/abs/2409.13864) Zhen Guo, Abhinav Kumar, Reza Tourani # 2024-09-19 -+ [Privacy-Preserving Student Learning with Differentially Private Data-Free Distillation](https://arxiv.org//abs/2409.12384) ++ [Privacy-Preserving Student Learning with Differentially Private Data-Free Distillation](https://arxiv.org/abs/2409.12384) Bochao Liu, Jianghu Lu, Pengju Wang, Junjie Zhang, Dan Zeng, Zhenxing Qian, Shiming Ge -+ [Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition](https://arxiv.org//abs/2409.12386) ++ [Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition](https://arxiv.org/abs/2409.12386) Chien-Chun Wang, Li-Wei Chen, Cheng-Kang Chou, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang -+ [ITPatch: An Invisible and Triggered Physical Adversarial Patch against Traffic Sign Recognition](https://arxiv.org//abs/2409.12394) ++ [ITPatch: An Invisible and Triggered Physical Adversarial Patch against Traffic Sign Recognition](https://arxiv.org/abs/2409.12394) Shuai Yuan, Hongwei Li, Xingshuo Han, Guowen Xu, Wenbo Jiang, Tao Ni, Qingchuan Zhao, Yuguang Fang -+ [TEAM: Temporal Adversarial Examples Attack Model against Network Intrusion Detection System Applied to RNN](https://arxiv.org//abs/2409.12472) ++ [TEAM: Temporal Adversarial Examples Attack Model against Network Intrusion Detection System Applied to RNN](https://arxiv.org/abs/2409.12472) Ziyi Liu, Dengpan Ye, Long Tang, Yunming Zhang, Jiacheng Deng -+ [The Robustness of Spiking Neural Networks in Communication and its Application towards Network Efficiency in Federated Learning](https://arxiv.org//abs/2409.12769) ++ [The Robustness of Spiking Neural Networks in Communication and its Application towards Network Efficiency in Federated Learning](https://arxiv.org/abs/2409.12769) Manh V. 
Nguyen, Liang Zhao, Bobin Deng, William Severa, Honghui Xu, Shaoen Wu

-+ [Defending against Reverse Preference Attacks is Difficult](https://arxiv.org//abs/2409.12914)
++ [Defending against Reverse Preference Attacks is Difficult](https://arxiv.org/abs/2409.12914)

Domenic Rosati, Giles Edkins, Harsh Raj, David Atanasov, Subhabrata Majumdar, Janarthanan Rajendran, Frank Rudzicz, Hassan Sajjad

-+ [Enhancing 3D Robotic Vision Robustness by Minimizing Adversarial Mutual Information through a Curriculum Training Approach](https://arxiv.org//abs/2409.12379)
++ [Enhancing 3D Robotic Vision Robustness by Minimizing Adversarial Mutual Information through a Curriculum Training Approach](https://arxiv.org/abs/2409.12379)

Nastaran Darabi, Dinithi Jayasuriya, Devashri Naik, Theja Tulabandhula, Amit Ranjan Trivedi

-+ [Revisiting Semi-supervised Adversarial Robustness via Noise-aware Online Robust Distillation](https://arxiv.org//abs/2409.12946)
++ [Revisiting Semi-supervised Adversarial Robustness via Noise-aware Online Robust Distillation](https://arxiv.org/abs/2409.12946)

Tsung-Han Wu, Hung-Ting Su, Shang-Tse Chen, Winston H. Hsu

-+ [On the Regret of Coded Caching with Adversarial Requests](https://arxiv.org//abs/2409.12387)
++ [On the Regret of Coded Caching with Adversarial Requests](https://arxiv.org/abs/2409.12387)

Anupam Nayak, Kota Srinivas Reddy, Nikhil Karamchandani

-+ [On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine Attacks](https://arxiv.org//abs/2409.12882)
++ [On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine Attacks](https://arxiv.org/abs/2409.12882)

Hairi, Minghong Fang, Zifan Zhang, Alvaro Velasquez, Jia Liu

-+ [VCAT: Vulnerability-aware and Curiosity-driven Adversarial Training for Enhancing Autonomous Vehicle Robustness](https://arxiv.org//abs/2409.12997)
++ [VCAT: Vulnerability-aware and Curiosity-driven Adversarial Training for Enhancing Autonomous Vehicle Robustness](https://arxiv.org/abs/2409.12997)

Xuan Cai, Zhiyong Cui, Xuesong Bai, Ruimin Ke, Zhenshu Ma, Haiyang Yu, Yilong Ren

-+ [FedAT: Federated Adversarial Training for Distributed Insider Threat Detection](https://arxiv.org//abs/2409.13083)
++ [FedAT: Federated Adversarial Training for Distributed Insider Threat Detection](https://arxiv.org/abs/2409.13083)

R G Gayathri, Atul Sajjanhar, Md Palash Uddin, Yong Xiang

# 2024-09-18

-+ [GReDP: A More Robust Approach for Differential Privacy Training with Gradient-Preserving Noise Reduction](https://arxiv.org//abs/2409.11663)
++ [GReDP: A More Robust Approach for Differential Privacy Training with Gradient-Preserving Noise Reduction](https://arxiv.org/abs/2409.11663)

Haodi Wang, Tangyu Jiang, Yu Guo, Xiaohua Jia, Chengjun Cai

-+ [NPAT Null-Space Projected Adversarial Training Towards Zero Deterioration](https://arxiv.org//abs/2409.11754)
++ [NPAT Null-Space Projected Adversarial Training Towards Zero Deterioration](https://arxiv.org/abs/2409.11754)

Hanyi Hu, Qiao Han, Kui Chen, Yao Yang

-+ [PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning](https://arxiv.org//abs/2409.12072)
++ [PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning](https://arxiv.org/abs/2409.12072)

Yukai Xu, Yujie Gu, Kouichi Sakurai

# 2024-09-17

-+ [Jailbreaking Large Language Models with Symbolic Mathematics](https://arxiv.org//abs/2409.11445)
++ [Jailbreaking Large Language Models with Symbolic Mathematics](https://arxiv.org/abs/2409.11445)

Emet Bethany, Mazal Bethany, Juan Arturo Nolazco Flores, Sumit Kumar Jha, Peyman Najafirad

-+ [Golden Ratio Search: A Low-Power Adversarial Attack for Deep Learning based Modulation Classification](https://arxiv.org//abs/2409.11454)
++ [Golden Ratio Search: A Low-Power Adversarial Attack for Deep Learning based Modulation Classification](https://arxiv.org/abs/2409.11454)

Deepsayan Sadhukhan, Nitin Priyadarshini Shankar, Sheetal Kalyani

# 2024-09-16

-+ [HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making](https://arxiv.org//abs/2409.10011)
++ [HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making](https://arxiv.org/abs/2409.10011)

Sumera Anjum, Hanzhi Zhang, Wenjun Zhou, Eun Jin Paek, Xiaopeng Zhao, Yunhe Feng

-+ [Towards Physically-Realizable Adversarial Attacks in Embodied Vision Navigation](https://arxiv.org//abs/2409.10071)
++ [Towards Physically-Realizable Adversarial Attacks in Embodied Vision Navigation](https://arxiv.org/abs/2409.10071)

Meng Chen, Jiawei Tu, Chao Qi, Yonghao Dang, Feng Zhou, Wei Wei, Jianqin Yin

-+ [Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities](https://arxiv.org//abs/2409.10764)
++ [Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities](https://arxiv.org/abs/2409.10764)

Zikai Zhang, Suman Rath, Jiaohao Xu, Tingsong Xiao

# 2024-09-13

-+ [Fingerprint Vector: Enabling Scalable and Efficient Model Fingerprint Transfer via Vector Addition](https://arxiv.org//abs/2409.08846)
++ [Fingerprint Vector: Enabling Scalable and Efficient Model Fingerprint Transfer via Vector Addition](https://arxiv.org/abs/2409.08846)

Zhenhua Xu, Qichen Liu, Zhebo Wang, Wenpeng Xing, Dezhang Kong, Mohan Li, Meng Han

# 2024-09-12

-+ [A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning](https://arxiv.org//abs/2409.07775)
++ [A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning](https://arxiv.org/abs/2409.07775)

Yinbo Yu, Saihao Yan, Jiajia Liu

-+ [Attack End-to-End Autonomous Driving through Module-Wise Noise](https://arxiv.org//abs/2409.07706)
++ [Attack End-to-End Autonomous Driving through Module-Wise Noise](https://arxiv.org/abs/2409.07706)

Lu Wang, Tianyuan Zhang, Yikai Han, Muyang Fang, Ting Jin, Jiaqi Kang

-+ [Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking](https://arxiv.org//abs/2409.08045)
++ [Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking](https://arxiv.org/abs/2409.08045)

Stav Cohen, Ron Bitton, Ben Nassi

-+ [LoRID: Low-Rank Iterative Diffusion for Adversarial Purification](https://arxiv.org//abs/2409.08255)
++ [LoRID: Low-Rank Iterative Diffusion for Adversarial Purification](https://arxiv.org/abs/2409.08255)

Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai

-+ [GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices](https://arxiv.org//abs/2409.08122)
++ [GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices](https://arxiv.org/abs/2409.08122)

Hanqiu Wang, Zihao Zhan, Haoqi Shan, Siqi Dai, Max Panoff, Shuo Wang

-+ [Efficient Privacy-Preserving KAN Inference Using Homomorphic Encryption](https://arxiv.org//abs/2409.07751)
++ [Efficient Privacy-Preserving KAN Inference Using Homomorphic Encryption](https://arxiv.org/abs/2409.07751)

Zhizheng Lai, Yufei Zhou, Peijia Zheng, Lin Chen

-+ [DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning](https://arxiv.org//abs/2409.07734)
++ [DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning](https://arxiv.org/abs/2409.07734)

Kangyang Luo, Shuai Wang, Yexuan Fu, Renrong Shao, Xiang Li, Yunshi Lan, Ming Gao, Jinlong Shu

@@ -20218,514 +20218,514 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Nikolai L. Kühne, Astrid H. F. Kitchen, Marie S. Jensen, Mikkel S. L. Brøndt, Martin Gonzalez, Christophe Biscio, Zheng-Hua Tan

-+ [LogoRA: Local-Global Representation Alignment for Robust Time Series Classification](https://arxiv.org//abs/2409.12169)
++ [LogoRA: Local-Global Representation Alignment for Robust Time Series Classification](https://arxiv.org/abs/2409.12169)

Huanyu Zhang, Yi-Fan Zhang, Zhang Zhang, Qingsong Wen, Liang Wang

-+ [Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data](https://arxiv.org//abs/2409.11423)
++ [Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data](https://arxiv.org/abs/2409.11423)

Atilla Akkus, Mingjie Li, Junjie Chu, Michael Backes, Yang Zhang, Sinem Sav

-+ [On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains](https://arxiv.org//abs/2409.17275)
++ [On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains](https://arxiv.org/abs/2409.17275)

Xun Xian, Ganghua Wang, Xuan Bi, Jayanth Srinivasa, Ashish Kundu, Charles Fleming, Mingyi Hong, Jie Ding

# 2024-09-11

-+ [Cyber Deception: State of the art, Trends and Open challenges](https://arxiv.org//abs/2409.07194)
++ [Cyber Deception: State of the art, Trends and Open challenges](https://arxiv.org/abs/2409.07194)

Pedro Beltrán López, Manuel Gil Pérez, Pantaleone Nespoli

-+ [Exploring User-level Gradient Inversion with a Diffusion Prior](https://arxiv.org//abs/2409.07291)
++ [Exploring User-level Gradient Inversion with a Diffusion Prior](https://arxiv.org/abs/2409.07291)

Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Bradley Malin, Kieran Parsons, Ye Wang

-+ [Module-wise Adaptive Adversarial Training for End-to-end Autonomous Driving](https://arxiv.org//abs/2409.07321)
++ [Module-wise Adaptive Adversarial Training for End-to-end Autonomous Driving](https://arxiv.org/abs/2409.07321)

Tianyuan Zhang, Lu Wang, Jiaqi Kang, Xinwei Zhang, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu

-+ [Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks](https://arxiv.org//abs/2409.07353)
++ [Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks](https://arxiv.org/abs/2409.07353)

Md Zarif Hossain, Ahmed Imteaj

-+ [Introducing Perturb-ability Score (PS) to Enhance Robustness Against Evasion Adversarial Attacks on ML-NIDS](https://arxiv.org//abs/2409.07448)
++ [Introducing Perturb-ability Score (PS) to Enhance Robustness Against Evasion Adversarial Attacks on ML-NIDS](https://arxiv.org/abs/2409.07448)

Mohamed elShehaby, Ashraf Matrawy

-+ [AdvLogo: Adversarial Patch Attack against Object Detectors based on Diffusion Models](https://arxiv.org//abs/2409.07002)
++ [AdvLogo: Adversarial Patch Attack against Object Detectors based on Diffusion Models](https://arxiv.org/abs/2409.07002)

Boming Miao, Chunxiao Li, Yao Zhu, Weixiang Sun, Zizhe Wang, Xiaoyi Wang, Chuanlong Xie

-+ [AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs](https://arxiv.org//abs/2409.07503)
++ [AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs](https://arxiv.org/abs/2409.07503)

Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu

-+ [A Cost-Aware Approach to Adversarial Robustness in Neural Networks](https://arxiv.org//abs/2409.07609)
++ [A Cost-Aware Approach to Adversarial Robustness in Neural Networks](https://arxiv.org/abs/2409.07609)

Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt, Erik Elmroth

-+ [Context-Aware Membership Inference Attacks against Pre-trained Large Language Models](https://arxiv.org//abs/2409.13745)
++ [Context-Aware Membership Inference Attacks against Pre-trained Large Language Models](https://arxiv.org/abs/2409.13745)

Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri

-+ [SoK: Security and Privacy Risks of Healthcare AI](https://arxiv.org//abs/2409.07415)
++ [SoK: Security and Privacy Risks of Healthcare AI](https://arxiv.org/abs/2409.07415)

Yuanhaur Chang, Han Liu, Chenyang Lu, Ning Zhang

# 2024-09-10

-+ [On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective](https://arxiv.org//abs/2409.06130)
++ [On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective](https://arxiv.org/abs/2409.06130)

Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller

-+ [Towards Robust Uncertainty-Aware Incomplete Multi-View Classification](https://arxiv.org//abs/2409.06270)
++ [Towards Robust Uncertainty-Aware Incomplete Multi-View Classification](https://arxiv.org/abs/2409.06270)

Mulin Chen, Haojian Huang, Qiang Li

-+ [LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs](https://arxiv.org//abs/2409.06323)
++ [LAMP: Learnable Meta-Path Guided Adversarial Contrastive Learning for Heterogeneous Graphs](https://arxiv.org/abs/2409.06323)

Siqing Li, Jin-Duk Park, Wei Huang, Xin Cao, Won-Yong Shin, Zhiqiang Xu

-+ [Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models](https://arxiv.org//abs/2409.06420)
++ [Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models](https://arxiv.org/abs/2409.06420)

Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang

-+ [BACKRUNNER: Mitigating Smart Contract Attacks in the Real World](https://arxiv.org//abs/2409.06213)
++ [BACKRUNNER: Mitigating Smart Contract Attacks in the Real World](https://arxiv.org/abs/2409.06213)

Chaofan Shou, Yuanyu Ke, Yupeng Yang, Qi Su, Or Dadosh, Assaf Eli, David Benchimol, Doudou Lu, Daniel Tong, Dex Chen, Zoey Tan, Jacob Chia, Koushik Sen, Wenke Lee

-+ [Adversary Resilient Learned Bloom Filters](https://arxiv.org//abs/2409.06556)
++ [Adversary Resilient Learned Bloom Filters](https://arxiv.org/abs/2409.06556)

Allison Bishop, Hayder Tirmazi

-+ [Adversarial Attacks to Multi-Modal Models](https://arxiv.org//abs/2409.06793)
++ [Adversarial Attacks to Multi-Modal Models](https://arxiv.org/abs/2409.06793)

Zhihao Dou, Xin Hu, Haibo Yang, Zhuqing Liu, Minghong Fang

-+ [DV-FSR: A Dual-View Target Attack Framework for Federated Sequential Recommendation](https://arxiv.org//abs/2409.07500)
++ [DV-FSR: A Dual-View Target Attack Framework for Federated Sequential Recommendation](https://arxiv.org/abs/2409.07500)

Qitao Qin, Yucong Luo, Mingyue Cheng, Qingyang Mao, Chenyi Lei

# 2024-09-09

-+ [TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors](https://arxiv.org//abs/2409.05294)
++ [TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors](https://arxiv.org/abs/2409.05294)

Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang

-+ [Seeing Through the Mask: Rethinking Adversarial Examples for CAPTCHAs](https://arxiv.org//abs/2409.05558)
++ [Seeing Through the Mask: Rethinking Adversarial Examples for CAPTCHAs](https://arxiv.org/abs/2409.05558)

Yahya Jabary, Andreas Plesner, Turlan Kuzhagaliyev, Roger Wattenhofer

-+ [DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification](https://arxiv.org//abs/2409.05587)
++ [DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification](https://arxiv.org/abs/2409.05587)

Junzhou Chen, Zirui Zhang, Jing Yu, Heqiang Huang, Ronghui Zhang, Xuemiao Xu, Bin Sheng, Hong Yan

-+ [Adversarial Attacks on Data Attribution](https://arxiv.org//abs/2409.05657)
++ [Adversarial Attacks on Data Attribution](https://arxiv.org/abs/2409.05657)

Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma

# 2024-09-08

-+ [PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions](https://arxiv.org//abs/2409.05076)
++ [PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions](https://arxiv.org/abs/2409.05076)

Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Yu Wang

-+ [Vision-fused Attack: Advancing Aggressive and Stealthy Adversarial Text against Neural Machine Translation](https://arxiv.org//abs/2409.05021)
++ [Vision-fused Attack: Advancing Aggressive and Stealthy Adversarial Text against Neural Machine Translation](https://arxiv.org/abs/2409.05021)

Yanni Xue, Haojie Hao, Jiakai Wang, Qiang Sheng, Renshuai Tao, Yu Liang, Pu Feng, Xianglong Liu

-+ [Natias: Neuron Attribution based Transferable Image Adversarial Steganography](https://arxiv.org//abs/2409.04968)
++ [Natias: Neuron Attribution based Transferable Image Adversarial Steganography](https://arxiv.org/abs/2409.04968)

Zexin Fan, Kejiang Chen, Kai Zeng, Jiansong Zhang, Weiming Zhang, Nenghai Yu

-+ [Sight View Constraint for Robust Point Cloud Registration](https://arxiv.org//abs/2409.05065)
++ [Sight View Constraint for Robust Point Cloud Registration](https://arxiv.org/abs/2409.05065)

Yaojie Zhang, Weijun Wang, Tianlun Huang, Zhiyong Wang, Wei Feng

-+ [Can OOD Object Detectors Learn from Foundation Models?](https://arxiv.org//abs/2409.05162)
++ [Can OOD Object Detectors Learn from Foundation Models?](https://arxiv.org/abs/2409.05162)

Jiahui Liu, Xin Wen, Shizhen Zhao, Yingxian Chen, Xiaojuan Qi

-+ [Balancing Security and Accuracy: A Novel Federated Learning Approach for Cyberattack Detection in Blockchain Networks](https://arxiv.org//abs/2409.04972)
++ [Balancing Security and Accuracy: A Novel Federated Learning Approach for Cyberattack Detection in Blockchain Networks](https://arxiv.org/abs/2409.04972)

Tran Viet Khoa, Mohammad Abu Alsheikh, Yibeltal Alem, Dinh Thai Hoang

-+ [Efficient Homomorphically Encrypted Convolutional Neural Network Without Rotation](https://arxiv.org//abs/2409.05205)
++ [Efficient Homomorphically Encrypted Convolutional Neural Network Without Rotation](https://arxiv.org/abs/2409.05205)

Sajjad Akherati, Xinmiao Zhang

# 2024-09-07

-+ [Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis](https://arxiv.org//abs/2409.04734)
++ [Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis](https://arxiv.org/abs/2409.04734)

Preetu Mehta, Aman Sagar, Suchi Kumari

-+ [PANTS: Practical Adversarial Network Traffic Samples against ML-powered Networking Classifiers](https://arxiv.org//abs/2409.04691)
++ [PANTS: Practical Adversarial Network Traffic Samples against ML-powered Networking Classifiers](https://arxiv.org/abs/2409.04691)

Minhao Jin, Maria Apostolaki

-+ [PIXHELL Attack: Leaking Sensitive Information from Air-Gap Computers via `Singing Pixels'](https://arxiv.org//abs/2409.04930)
++ [PIXHELL Attack: Leaking Sensitive Information from Air-Gap Computers via `Singing Pixels'](https://arxiv.org/abs/2409.04930)

Mordechai Guri

# 2024-09-06

-+ [A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage](https://arxiv.org//abs/2409.04040)
++ [A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage](https://arxiv.org/abs/2409.04040)

Huan Yang, Deyu Zhang, Yudong Zhao, Yuanchun Li, Yunxin Liu

-+ [Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers](https://arxiv.org//abs/2409.04142)
++ [Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers](https://arxiv.org/abs/2409.04142)

Gorka Abad, Stjepan Picek, Lorenzo Cavallaro, Aitor Urbieta

-+ [AGR: Age Group fairness Reward for Bias Mitigation in LLMs](https://arxiv.org//abs/2409.04340)
++ [AGR: Age Group fairness Reward for Bias Mitigation in LLMs](https://arxiv.org/abs/2409.04340)

Shuirong Cao, Ruoxi Cheng, Zhiqiang Wang

-+ [Learning to Learn Transferable Generative Attack for Person Re-Identification](https://arxiv.org//abs/2409.04208)
++ [Learning to Learn Transferable Generative Attack for Person Re-Identification](https://arxiv.org/abs/2409.04208)

Yuan Bian, Min Liu, Xueping Wang, Yunfeng Ma, Yaonan Wang

# 2024-09-05

-+ [Bypassing DARCY Defense: Indistinguishable Universal Adversarial Triggers](https://arxiv.org//abs/2409.03183)
++ [Bypassing DARCY Defense: Indistinguishable Universal Adversarial Triggers](https://arxiv.org/abs/2409.03183)

Zuquan Peng, Yuanyuan He, Jianbing Ni, Ben Niu

-+ [Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG](https://arxiv.org//abs/2409.03646)
++ [Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG](https://arxiv.org/abs/2409.03646)

Manshan Guo, Bhavin Choksi, Sari Sadiya, Alessandro T. Gifford, Martina G. Vilas, Radoslaw M. Cichy, Gemma Roig

-+ [Active Fake: DeepFake Camouflage](https://arxiv.org//abs/2409.03200)
++ [Active Fake: DeepFake Camouflage](https://arxiv.org/abs/2409.03200)

Pu Sun, Honggang Qi, Yuezun Li

-+ [Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks](https://arxiv.org//abs/2409.03458)
++ [Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks](https://arxiv.org/abs/2409.03458)

Akshay Jain, Shiv Ram Dubey, Satish Kumar Singh, KC Santosh, Bidyut Baran Chaudhuri

-+ [A practical approach to evaluating the adversarial distance for machine learning classifiers](https://arxiv.org//abs/2409.03598)
++ [A practical approach to evaluating the adversarial distance for machine learning classifiers](https://arxiv.org/abs/2409.03598)

Georg Siedel, Ekagra Gupta, Andrey Morozov

-+ [Simplex-enabled Safe Continual Learning Machine](https://arxiv.org//abs/2409.05898)
++ [Simplex-enabled Safe Continual Learning Machine](https://arxiv.org/abs/2409.05898)

Yihao Cai, Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

-+ [Revisiting Privacy-Utility Trade-off for DP Training with Pre-existing Knowledge](https://arxiv.org//abs/2409.03344)
++ [Revisiting Privacy-Utility Trade-off for DP Training with Pre-existing Knowledge](https://arxiv.org/abs/2409.03344)

Yu Zheng, Wenchao Zhang, Yonggang Zhang, Wei Song, Kai Zhou, Bo Han

# 2024-09-04

-+ [TASAR: Transferable Attack on Skeletal Action Recognition](https://arxiv.org//abs/2409.02483)
++ [TASAR: Transferable Attack on Skeletal Action Recognition](https://arxiv.org/abs/2409.02483)

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang

-+ [Adversarial Attacks on Machine Learning-Aided Visualizations](https://arxiv.org//abs/2409.02485)
++ [Adversarial Attacks on Machine Learning-Aided Visualizations](https://arxiv.org/abs/2409.02485)

Takanori Fujiwara, Kostiantyn Kucher, Junpeng Wang, Rafael M. Martins, Andreas Kerren, Anders Ynnerman

-+ [AdvSecureNet: A Python Toolkit for Adversarial Machine Learning](https://arxiv.org//abs/2409.02629)
++ [AdvSecureNet: A Python Toolkit for Adversarial Machine Learning](https://arxiv.org/abs/2409.02629)

Melih Catal, Manuel Günther

-+ [Alignment-Aware Model Extraction Attacks on Large Language Models](https://arxiv.org//abs/2409.02718)
++ [Alignment-Aware Model Extraction Attacks on Large Language Models](https://arxiv.org/abs/2409.02718)

Zi Liang, Qingqing Ye, Yanyun Wang, Sen Zhang, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu

-+ [Benchmarking Spurious Bias in Few-Shot Image Classifiers](https://arxiv.org//abs/2409.02882)
++ [Benchmarking Spurious Bias in Few-Shot Image Classifiers](https://arxiv.org/abs/2409.02882)

Guangtao Zheng, Wenqian Ye, Aidong Zhang

-+ [Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA](https://arxiv.org//abs/2409.02346)
++ [Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA](https://arxiv.org/abs/2409.02346)

Shuangyi Chen, Yue Ju, Hardik Dalal, Zhongwen Zhu, Ashish Khisti

-+ [Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble](https://arxiv.org//abs/2409.02802)
++ [Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble](https://arxiv.org/abs/2409.02802)

Chang Dong, Zhengyang Li, Liangwei Zheng, Weitong Chen, Wei Emma Zhang

-+ [Transfer-based Adversarial Poisoning Attacks for Online (MIMO-)Deep Receviers](https://arxiv.org//abs/2409.02430)
++ [Transfer-based Adversarial Poisoning Attacks for Online (MIMO-)Deep Receviers](https://arxiv.org/abs/2409.02430)

Kunze Wu, Weiheng Jiang, Dusit Niyato, Yinghuan Li, Chuang Luo

# 2024-09-03

-+ [Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor](https://arxiv.org//abs/2409.01952)
++ [Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor](https://arxiv.org/abs/2409.01952)

Abdullah Arafat Miah, Yu Bi

-+ [In Defense of RAG in the Era of Long-Context Language Models](https://arxiv.org//abs/2409.01666)
++ [In Defense of RAG in the Era of Long-Context Language Models](https://arxiv.org/abs/2409.01666)

Tan Yu, Anbang Xu, Rama Akkiraju

-+ [Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge](https://arxiv.org//abs/2409.01627)
++ [Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge](https://arxiv.org/abs/2409.01627)

Hyejin Park, Dongbo Min

-+ [NoiseAttack: An Evasive Sample-Specific Multi-Targeted Backdoor Attack Through White Gaussian Noise](https://arxiv.org//abs/2409.02251)
++ [NoiseAttack: An Evasive Sample-Specific Multi-Targeted Backdoor Attack Through White Gaussian Noise](https://arxiv.org/abs/2409.02251)

Abdullah Arafat Miah, Kaan Icer, Resit Sendag, Yu Bi

-+ [Safeguarding AI Agents: Developing and Analyzing Safety Architectures](https://arxiv.org//abs/2409.03793)
++ [Safeguarding AI Agents: Developing and Analyzing Safety Architectures](https://arxiv.org/abs/2409.03793)

Ishaan Domkundwar, Mukunda N S

# 2024-09-02

-+ [CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models](https://arxiv.org//abs/2409.01193)
++ [CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models](https://arxiv.org/abs/2409.01193)

Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji

-+ [Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions](https://arxiv.org//abs/2409.01072)
++ [Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions](https://arxiv.org/abs/2409.01072)

Taorong Liu, Jing Xiao, Liang Liao, Chia-Wen Lin

-+ [Defending against Model Inversion Attacks via Random Erasing](https://arxiv.org//abs/2409.01062)
++ [Defending against Model Inversion Attacks via Random Erasing](https://arxiv.org/abs/2409.01062)

Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ngai-man Cheung

-+ [Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness](https://arxiv.org//abs/2409.01249)
++ [Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness](https://arxiv.org/abs/2409.01249)

Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio, Giorgio Giacinto, Fabio Roli

-+ [Backdoor Defense through Self-Supervised and Generative Learning](https://arxiv.org//abs/2409.01185)
++ [Backdoor Defense through Self-Supervised and Generative Learning](https://arxiv.org/abs/2409.01185)

Ivan Sabolić, Ivan Grubišić, Siniša Šegvić

-+ [Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack](https://arxiv.org//abs/2409.00960)
++ [Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack](https://arxiv.org/abs/2409.00960)

Guanzhong Chen, Zhenhan Qin, Mingxin Yang, Yajie Zhou, Tao Fan, Tianyu Du, Zenglin Xu

-+ [No Peer, no Cry: Network Application Fuzzing via Fault Injection](https://arxiv.org//abs/2409.01059)
++ [No Peer, no Cry: Network Application Fuzzing via Fault Injection](https://arxiv.org/abs/2409.01059)

Nils Bars, Moritz Schloegel, Nico Schiller, Lukas Bernhard, Thorsten Holz

-+ [Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning (Full Version)](https://arxiv.org//abs/2409.01470)
++ [Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning (Full Version)](https://arxiv.org/abs/2409.01470)

Jonathan Knauer, Phillip Rieger, Hossein Fereidooni, Ahmad-Reza Sadeghi

# 2024-09-01

-+ [The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs](https://arxiv.org//abs/2409.00787)
++ [The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs](https://arxiv.org/abs/2409.00787)

Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan

-+ [Fisher Information guided Purification against Backdoor Attacks](https://arxiv.org//abs/2409.00863)
++ [Fisher Information guided Purification against Backdoor Attacks](https://arxiv.org/abs/2409.00863)

Nazmul Karim, Abdullah Al Arafat, Adnan Siraj Rakin, Zhishan Guo, Nazanin Rahnavard

-+ [Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models](https://arxiv.org//abs/2409.00598)
++ [Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models](https://arxiv.org/abs/2409.00598)

Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang

# 2024-08-31

-+ [Robust off-policy Reinforcement Learning via Soft Constrained Adversary](https://arxiv.org//abs/2409.00418)
++ [Robust off-policy Reinforcement Learning via Soft Constrained Adversary](https://arxiv.org/abs/2409.00418)

Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii

-+ [Rethinking Backdoor Detection Evaluation for Language Models](https://arxiv.org//abs/2409.00399)
++ [Rethinking Backdoor Detection Evaluation for Language Models](https://arxiv.org/abs/2409.00399)

Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia

-+ [LightPure: Realtime Adversarial Image Purification for Mobile Devices Using Diffusion Models](https://arxiv.org//abs/2409.00340)
++ [LightPure: Realtime Adversarial Image Purification for Mobile Devices Using Diffusion Models](https://arxiv.org/abs/2409.00340)

Hossein Khalili, Seongbin Park, Vincent Li, Brandan Bright, Ali Payani, Ramana Rao Kompella, Nader Sehatbakhsh

-+ [HSF: Defending against Jailbreak Attacks with Hidden State Filtering](https://arxiv.org//abs/2409.03788)
++ [HSF: Defending against Jailbreak Attacks with Hidden State Filtering](https://arxiv.org/abs/2409.03788)

Cheng Qian, Hainan Zhang, Lei Sha, Zhiming Zheng

# 2024-08-30

-+ [Safety Layers of Aligned Large Language Models: The Key to LLM Security](https://arxiv.org//abs/2408.17003)
++ [Safety Layers of Aligned Large Language Models: The Key to LLM Security](https://arxiv.org/abs/2408.17003)

Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li

-+ [Instant Adversarial Purification with Adversarial Consistency Distillation](https://arxiv.org//abs/2408.17064)
++ [Instant Adversarial Purification with Adversarial Consistency Distillation](https://arxiv.org/abs/2408.17064)

Chun Tong Lei, Hon Ming Yam, Zhongliang Guo, Chun Pong Lau

-+ [Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage](https://arxiv.org//abs/2408.17354)
++ [Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage](https://arxiv.org/abs/2408.17354)

Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang

-+ [Can We Leave Deepfake Data Behind in Training Deepfake Detector?](https://arxiv.org//abs/2408.17052)
++ [Can We Leave Deepfake Data Behind in Training Deepfake Detector?](https://arxiv.org/abs/2408.17052)

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li

# 2024-08-29

-+ [DetectBERT: Towards Full App-Level Representation Learning to Detect Android Malware](https://arxiv.org//abs/2408.16353)
++ [DetectBERT: Towards Full App-Level Representation Learning to Detect Android Malware](https://arxiv.org/abs/2408.16353)

Tiezhu Sun, Nadia Daoudi, Kisub Kim, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

-+ [SFR-GNN: Simple and Fast Robust GNNs against Structural Attacks](https://arxiv.org//abs/2408.16537)
++ [SFR-GNN: Simple and Fast Robust GNNs against Structural Attacks](https://arxiv.org/abs/2408.16537)

Xing Ai, Guanyu Zhu, Yulin Zhu, Yu Zheng, Gaolei Li, Jianhua Li, Kai Zhou

-+ [PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning](https://arxiv.org//abs/2408.16769)
++ [PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning](https://arxiv.org/abs/2408.16769)

Noor Hussein, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

-+ [GL-TSVM: A robust and smooth twin support vector machine with guardian loss function](https://arxiv.org//abs/2408.16336)
++ [GL-TSVM: A robust and smooth twin support vector machine with guardian loss function](https://arxiv.org/abs/2408.16336)

Mushir Akhtar, M. Tanveer, Mohd. Arshad

-+ [STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models](https://arxiv.org//abs/2408.16807)
++ [STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models](https://arxiv.org/abs/2408.16807)

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

-+ [Tex-ViT: A Generalizable, Robust, Texture-based dual-branch cross-attention deepfake detector](https://arxiv.org//abs/2408.16892)
++ [Tex-ViT: A Generalizable, Robust, Texture-based dual-branch cross-attention deepfake detector](https://arxiv.org/abs/2408.16892)

Deepak Dagar, Dinesh Kumar Vishwakarma

# 2024-08-28

-+ [Evaluating Model Robustness Using Adaptive Sparse L0 Regularization](https://arxiv.org//abs/2408.15702)
++ [Evaluating Model Robustness Using Adaptive Sparse L0 Regularization](https://arxiv.org/abs/2408.15702)

Weiyou Liu, Zhenyang Li, Weitong Chen

-+ [Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks](https://arxiv.org//abs/2408.15721)
++ [Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks](https://arxiv.org/abs/2408.15721)

Oscar Chew, Po-Yi Lu, Jayden Lin, Hsuan-Tien Lin

-+ [Network transferability of adversarial patches in real-time object detection](https://arxiv.org//abs/2408.15833)
++ [Network transferability of adversarial patches in real-time object detection](https://arxiv.org/abs/2408.15833)

Jens Bayer, Stefan Becker, David Münch, Michael Arens

-+ [Certified Causal Defense with Generalizable Robustness](https://arxiv.org//abs/2408.15451)
++ [Certified Causal Defense with Generalizable Robustness](https://arxiv.org/abs/2408.15451)

Yiran Qiao, Yu Yin, Chen Chen, Jing Ma

-+ [VFLIP: A Backdoor Defense for Vertical Federated Learning via Identification and Purification](https://arxiv.org//abs/2408.15591)
++ [VFLIP: A Backdoor Defense for Vertical Federated Learning via Identification and Purification](https://arxiv.org/abs/2408.15591)

Yungi Cho, Woorim Han, Miseon Yu, Ho Bae, Yunheung Paek

# 2024-08-27

-+ [TART: Boosting Clean Accuracy Through Tangent Direction Guided Adversarial Training](https://arxiv.org//abs/2408.14728)
++ [TART: Boosting Clean Accuracy Through Tangent Direction Guided Adversarial Training](https://arxiv.org/abs/2408.14728)

Bongsoo Yi, Rongjie Lai, Yao Li

-+ [Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models](https://arxiv.org//abs/2408.14853)
++ [Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models](https://arxiv.org/abs/2408.14853)

Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao

-+ [Adversarial Attacks and Defenses in Multivariate Time-Series Forecasting for Smart and Connected Infrastructures](https://arxiv.org//abs/2408.14875)
++ [Adversarial Attacks and Defenses in Multivariate Time-Series Forecasting for Smart and Connected Infrastructures](https://arxiv.org/abs/2408.14875)

Pooja Krishan, Rohan Mohapatra, Saptarshi Sengupta

-+ [Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering](https://arxiv.org//abs/2408.15037)
++ [Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering](https://arxiv.org/abs/2408.15037)

Haowei Du, Huishuai Zhang, Dongyan Zhao

-+ [LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet](https://arxiv.org//abs/2408.15221)
++ [LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet](https://arxiv.org/abs/2408.15221)

Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue

-+ [Adversarial Manhole: Challenging Monocular Depth Estimation and Semantic Segmentation Models with Patch Attack](https://arxiv.org//abs/2408.14879)
++ [Adversarial Manhole: Challenging Monocular Depth Estimation and Semantic Segmentation Models with Patch Attack](https://arxiv.org/abs/2408.14879)

Naufal Suryanto, Andro Aprila Adiputra, Ahmada Yusril Kadiptya, Yongsu Kim, Howon Kim

-+ [Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation](https://arxiv.org//abs/2408.14738)
++ [Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation](https://arxiv.org/abs/2408.14738)

Bochao Liu, Pengju Wang, Shiming Ge

-+ [Improving Adversarial Robustness in Android Malware Detection by Reducing the Impact of Spurious Correlations](https://arxiv.org//abs/2408.16025)
++ [Improving Adversarial Robustness in Android Malware Detection by Reducing the Impact of Spurious Correlations](https://arxiv.org/abs/2408.16025)

Hamid Bostani, Zhengyu Zhao, Veelasha Moonsamy

@@ -20733,696 +20733,696 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

# 2024-08-26

-+ [Celtibero: Robust Layered Aggregation for Federated Learning](https://arxiv.org//abs/2408.14240)
++ [Celtibero: Robust Layered Aggregation for Federated Learning](https://arxiv.org/abs/2408.14240)

Borja Molina-Coronado

-+ [MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues](https://arxiv.org//abs/2408.14418)
++ [MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues](https://arxiv.org/abs/2408.14418)

Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T. Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F. Chen, Stefan Winkler

-+ [TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models](https://arxiv.org//abs/2408.13985)
++ [TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models](https://arxiv.org/abs/2408.13985)

Zelin Li, Kehai Chen, Xuefeng Bai, Lemao Liu, Mingming Yang, Yang Xiang, Min Zhang

-+ [Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation](https://arxiv.org//abs/2408.13983)
++ [Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation](https://arxiv.org/abs/2408.13983)

Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He

-+ [2D-Malafide: Adversarial Attacks Against Face Deepfake Detection Systems](https://arxiv.org//abs/2408.14143)
++ [2D-Malafide: Adversarial Attacks Against Face Deepfake Detection Systems](https://arxiv.org/abs/2408.14143)

Chiara Galdi, Michele Panariello, Massimiliano Todisco, Nicholas Evans

# 2024-08-25

-+ [SAB:A Stealing and Robust Backdoor Attack based on Steganographic Algorithm against Federated Learning](https://arxiv.org//abs/2408.13773)
++ [SAB:A Stealing and Robust Backdoor Attack based on Steganographic Algorithm against Federated Learning](https://arxiv.org/abs/2408.13773)

Weida Xu, Yang Xu, Sicong Zhang

-+ [On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective](https://arxiv.org//abs/2408.13809)
++ [On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective](https://arxiv.org/abs/2408.13809)

Tal Alter, Raz Lapid, Moshe Sipper

-+ [RT-Attack: Jailbreaking Text-to-Image Models via Random Token](https://arxiv.org//abs/2408.13896)
++ [RT-Attack: Jailbreaking Text-to-Image Models via Random Token](https://arxiv.org/abs/2408.13896)

Sensen Gao, Xiaojun Jia, Yihao Huang, Ranjie Duan, Jindong Gu, Yang Liu, Qing Guo

-+ [CAMH: Advancing Model Hijacking Attack in Machine Learning](https://arxiv.org//abs/2408.13741)
++ [CAMH: Advancing Model Hijacking Attack in Machine Learning](https://arxiv.org/abs/2408.13741)

Xing He, Jiahao Chen, Yuwen Pu, Qingming Li, Chunyi Zhou, Yingcai Wu, Jinbao Li, Shouling Ji

-+ [Sample-Independent Federated Learning Backdoor Attack](https://arxiv.org//abs/2408.13849)
++ [Sample-Independent Federated Learning Backdoor Attack](https://arxiv.org/abs/2408.13849)

Weida Xu, Yang Xu, Sicong Zhang

# 2024-08-24

-+ [Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach](https://arxiv.org//abs/2408.13461)
++ [Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach](https://arxiv.org/abs/2408.13461)

Jiwei Guan, Tianyu Ding, Longbing Cao, Lei Pan, Chen Wang, Xi Zheng

-+ [DeepVoting: Learning and Fine-Tuning Voting Rules with Canonical Embeddings](https://arxiv.org//abs/2408.13630)
++ [DeepVoting: Learning and Fine-Tuning Voting Rules with Canonical Embeddings](https://arxiv.org/abs/2408.13630)

Leonardo Matone, Ben Abramowitz, Ben Armstrong, Avinash Balakrishnan, Nicholas Mattei

# 2024-08-23

-+ [On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World](https://arxiv.org//abs/2408.12122)
++ [On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World](https://arxiv.org/abs/2408.12122)

Bao Gia Doan, Dang Quang Nguyen, Callum Lindquist, Paul Montague, Tamas Abraham, Olivier De Vel, Seyit Camtepe, Salil S. Kanhere, Ehsan Abbasnejad, Damith C. Ranasinghe

-+ [BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models](https://arxiv.org//abs/2408.12798)
++ [BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models](https://arxiv.org/abs/2408.12798)

Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun

-+ [Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks](https://arxiv.org//abs/2408.12806)
++ [Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks](https://arxiv.org/abs/2408.12806)

Yusuf Usman, Aadesh Upadhyay, Prashnna Gyawali, Robin Chataut

-+ [Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks](https://arxiv.org//abs/2408.13102)
++ [Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks](https://arxiv.org/abs/2408.13102)

Zhenyu Liu, Haoran Duan, Huizhi Liang, Yang Long, Vaclav Snasel, Guiseppe Nicosia, Rajiv Ranjan, Varun Ojha

-+ [Protecting against simultaneous data poisoning attacks](https://arxiv.org//abs/2408.13221)
++ [Protecting against simultaneous data poisoning attacks](https://arxiv.org/abs/2408.13221)

Neel Alex, Shoaib Ahmed Siddiqui, Amartya Sanyal, David Krueger

# 2024-08-22

-+ [Query-Efficient Video Adversarial Attack with Stylized Logo](https://arxiv.org//abs/2408.12099)
++ [Query-Efficient Video Adversarial Attack with Stylized Logo](https://arxiv.org/abs/2408.12099)

Duoxun Tang, Yuxin Cao, Xi Xiao, Derui Wang, Sheng Wen, Tianqing Zhu

-+ [MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer](https://arxiv.org//abs/2408.12312)
++ [MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer](https://arxiv.org/abs/2408.12312)

Ming Sun, Lihua Jing, Zixuan Zhu, Rui Wang

-+ [Enhancing Transferability of Adversarial Attacks with GE-AdvGAN+: A Comprehensive Framework for Gradient Editing](https://arxiv.org//abs/2408.12673)
++ [Enhancing Transferability of Adversarial Attacks with GE-AdvGAN+: A Comprehensive Framework for Gradient Editing](https://arxiv.org/abs/2408.12673)

Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Yuchen Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

-+ [Leveraging Information Consistency in Frequency and Spatial Domain for Adversarial Attacks](https://arxiv.org//abs/2408.12670)
++ [Leveraging Information Consistency in Frequency and Spatial Domain for Adversarial Attacks](https://arxiv.org/abs/2408.12670)

Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Xinyi Wang, Yiyun Huang, Huaming Chen

-+ [BankTweak: Adversarial Attack against Multi-Object Trackers by Manipulating Feature Banks](https://arxiv.org//abs/2408.12727)
++ [BankTweak: Adversarial Attack against Multi-Object Trackers by Manipulating Feature Banks](https://arxiv.org/abs/2408.12727)

Woojin Shin, Donghwa Kang, Daejin Choi, Brent Kang, Jinkyu Lee, Hyeongboo Baek

-+ [Assessing the Uncertainty and Robustness of the Laptop Refurbishing Software](https://arxiv.org//abs/2409.03782)
++ [Assessing the Uncertainty and Robustness of the Laptop Refurbishing Software](https://arxiv.org/abs/2409.03782)

Chengjie Lu, Jiahui Wu, Shaukat Ali, Mikkel Labori Olsen

# 2024-08-21

-+ [Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer](https://arxiv.org//abs/2408.11313)
++ [Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer](https://arxiv.org/abs/2408.11313)

Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen

-+ [Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering](https://arxiv.org//abs/2408.11491)
++ [Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering](https://arxiv.org/abs/2408.11491)

Zouying Cao, Yifei Yang, Hai Zhao

-+ [Efficient Detection of Toxic Prompts in Large Language Models](https://arxiv.org//abs/2408.11727)
++ [Efficient Detection of Toxic Prompts in Large Language Models](https://arxiv.org/abs/2408.11727)

Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu

-+ [Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks](https://arxiv.org//abs/2408.11587)
++ [Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks](https://arxiv.org/abs/2408.11587)

Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li

-+ [Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks](https://arxiv.org//abs/2408.11749)
++ [Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks](https://arxiv.org/abs/2408.11749)

Yiyi Chen, Russa Biswas, Heather Lent, Johannes Bjerva

-+ [Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection](https://arxiv.org//abs/2408.11408)
++ [Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection](https://arxiv.org/abs/2408.11408)

Jingwei Sun, Xuchong Zhang, Changfeng Sun, Qicheng Bai, Hongbin Sun

-+ [Exploring Robustness of Visual State Space model against Backdoor Attacks](https://arxiv.org//abs/2408.11679)
++ [Exploring Robustness of Visual State Space model against Backdoor Attacks](https://arxiv.org/abs/2408.11679)

Cheng-Yi Lee, Cheng-Chang Tsai, Chia-Mu Yu, Chun-Shien Lu

-+ [Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models](https://arxiv.org//abs/2408.11810)
++ [Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models](https://arxiv.org/abs/2408.11810)

Chun-Yen Shih, Li-Xuan Peng, Jia-Wei Liao, Ernie Chu, Cheng-Fu Chou, Jun-Cheng Chen

-+ [First line of defense: A robust first layer mitigates adversarial attacks](https://arxiv.org//abs/2408.11680)
++ [First line of defense: A robust first layer mitigates adversarial attacks](https://arxiv.org/abs/2408.11680)

Janani Suresh, Nancy Nayak, Sheetal Kalyani

-+ [A Practical Trigger-Free Backdoor Attack on Neural Networks](https://arxiv.org//abs/2408.11444)
++ [A Practical Trigger-Free Backdoor Attack on Neural Networks](https://arxiv.org/abs/2408.11444)

Jiahao Wang, Xianglong Zhang, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

-+ [Defending against Jailbreak through Early Exit Generation of Large Language Models](https://arxiv.org//abs/2408.11308)
++ [Defending against Jailbreak through Early Exit Generation of Large Language Models](https://arxiv.org/abs/2408.11308)

Chongwen Zhao, Zhihao Dou, Kaizhu Huang

# 2024-08-20

-+ [Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models](https://arxiv.org//abs/2408.10571)
++ [Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models](https://arxiv.org/abs/2408.10571)

Cong Wan, Yuhang He, Xiang Song, Yihong Gong

-+ [Privacy-preserving Universal Adversarial Defense for Black-box Models](https://arxiv.org//abs/2408.10647)
++ [Privacy-preserving Universal Adversarial Defense for Black-box Models](https://arxiv.org/abs/2408.10647)

Qiao Li, Cong Wu, Jing Chen, Zijun Zhang, Kun He, Ruiying Du, Xinxin Wang, Qingchuang Zhao, Yang Liu

-+ [Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation](https://arxiv.org//abs/2408.10668)
++ [Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation](https://arxiv.org/abs/2408.10668)

Haoyu Wang, Bingzhe Wu, Yatao Bian, Yongzhe Chang, Xueqian Wang, Peilin Zhao

-+ [Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models](https://arxiv.org//abs/2408.10682)
++ [Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models](https://arxiv.org/abs/2408.10682)

Hongbang Yuan, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

-+ [MEGen: Generative Backdoor in Large Language Models via Model Editing](https://arxiv.org//abs/2408.10722)
++ [MEGen: Generative Backdoor in Large Language Models via Model Editing](https://arxiv.org/abs/2408.10722)

Jiyang Qiu, Xinbei Ma, Zhuosheng Zhang, Hai Zhao

-+ [Security Assessment of Hierarchical Federated Deep Learning](https://arxiv.org//abs/2408.10752)
++ [Security Assessment of Hierarchical Federated Deep Learning](https://arxiv.org/abs/2408.10752)

D Alqattan, R Sun, H Liang, G Nicosia, V Snasel, R Ranjan, V Ojha

-+ [Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?](https://arxiv.org//abs/2408.10853)
++ [Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?](https://arxiv.org/abs/2408.10853)

Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

-+ [A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse](https://arxiv.org//abs/2408.10901)
++ [A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse](https://arxiv.org/abs/2408.10901)

Zhongliang Guo, Lei Fang, Jingyu Lin, Yifei Qian, Shuai Zhao, Zeyu Wang, Junhao Dong, Cunjian Chen, Ognjen Arandjelović, Chun Pong Lau

-+ [GAIM: Attacking Graph Neural Networks via Adversarial Influence Maximization](https://arxiv.org//abs/2408.10948)
++ [GAIM: Attacking Graph Neural Networks via Adversarial Influence Maximization](https://arxiv.org/abs/2408.10948)

Xiaodong Yang, Xiaoting Li, Huiyuan Chen, Yiwei Cai

-+ [Adversarial Attack for Explanation Robustness of Rationalization Models](https://arxiv.org//abs/2408.10795)
++ [Adversarial Attack for Explanation Robustness of Rationalization Models](https://arxiv.org/abs/2408.10795)

Yuankai Zhang, Lingxiao Kong, Haozhao Wang, Ruixuan Li, Jun Wang, Yuhua Li, Wei Liu

-+ [MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification](https://arxiv.org//abs/2408.10694)
++ [MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification](https://arxiv.org/abs/2408.10694)

Huafeng Qin, Yuming Fu, Huiyan Zhang, Mounim A. El-Yacoubi, Xinbo Gao, Qun Song, Jun Wang

-+ [Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting](https://arxiv.org//abs/2408.10463)
++ [Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting](https://arxiv.org/abs/2408.10463)

Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

-+ [Iterative Window Mean Filter: Thwarting Diffusion-based Adversarial Purification](https://arxiv.org//abs/2408.10673)
++ [Iterative Window Mean Filter: Thwarting Diffusion-based Adversarial Purification](https://arxiv.org/abs/2408.10673)

Hanrui Wang, Ruoxi Sun, Cunjian Chen, Minhui Xue, Lay-Ki Soon, Shuo Wang, Zhe Jin

-+ [Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles](https://arxiv.org//abs/2408.11182)
++ [Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles](https://arxiv.org/abs/2408.11182)

Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu

-+ [Revisiting Min-Max Optimization Problem in Adversarial Training](https://arxiv.org//abs/2408.11218)
++ [Revisiting Min-Max Optimization Problem in Adversarial Training](https://arxiv.org/abs/2408.11218)

Sina Hajer Ahmadi, Hassan Bahrami

-+ [Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks](https://arxiv.org//abs/2408.13274)
++ [Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks](https://arxiv.org/abs/2408.13274)

Hetvi Waghela, Jaydip Sen, Sneha Rakshit

# 2024-08-19

-+- [Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis](https://arxiv.org//abs/2408.10021)
++ [Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis](https://arxiv.org/abs/2408.10021)

Kira Maag, Roman Resner, Asja Fischer

-+ [Regularization for Adversarial Robust Learning](https://arxiv.org//abs/2408.09672)
++ [Regularization for Adversarial Robust Learning](https://arxiv.org/abs/2408.09672)

Jie Wang, Rui Gao, Yao Xie

-+ [Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting](https://arxiv.org//abs/2408.09798)
++ [Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting](https://arxiv.org/abs/2408.09798)

Yun-Da Tsai, Ting-Yu Yen, Keng-Te Liao, Shou-De Lin

-+ [Transferring Backdoors between Large Language Models by Knowledge Distillation](https://arxiv.org//abs/2408.09878)
++ [Transferring Backdoors between Large Language Models by Knowledge Distillation](https://arxiv.org/abs/2408.09878)

Pengzhou Cheng, Zongru Wu, Tianjie Ju, Wei Du, Zhuosheng Zhang, Gongshen Liu

-+ [The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks](https://arxiv.org//abs/2408.10446)
++ [The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks](https://arxiv.org/abs/2408.10446)

Niyar R Barman, Krish Sharma, Ashhar Aziz, Shashwat Bajpai, Shwetangshu Biswas, Vasu Sharma, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das

-+ [Differentially Private Stochastic Gradient Descent with Fixed-Size Minibatches: Tighter RDP Guarantees with or without Replacement](https://arxiv.org//abs/2408.10456)
++ [Differentially Private Stochastic Gradient Descent with Fixed-Size Minibatches: Tighter RDP Guarantees with or without Replacement](https://arxiv.org/abs/2408.10456)

Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, Jason Pacheco

# 2024-08-18

-+ [Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning](https://arxiv.org//abs/2408.09600)
++ [Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning](https://arxiv.org/abs/2408.09600)

Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi, Josh Kimball, Ling Liu

-+ [Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks](https://arxiv.org//abs/2408.09326)
++ [Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks](https://arxiv.org/abs/2408.09326)

Kexin Chen, Yi Liu, Dongxia Wang, Jiaying Chen, Wenhai Wang

-+ [Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection](https://arxiv.org//abs/2408.09431)
++ [Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection](https://arxiv.org/abs/2408.09431)

Kaiwen Wang, Yinzhe Shen, Martin Lauer

-+ [Enhancing Adversarial Transferability with Adversarial Weight Tuning](https://arxiv.org//abs/2408.09469)
++ [Enhancing Adversarial Transferability with Adversarial Weight Tuning](https://arxiv.org/abs/2408.09469)

Jiahao Chen, Zhou Feng, Rui Zeng, Yuwen Pu, Chunyi Zhou, Yi Jiang, Yuyou Gan, Jinbao Li, Shouling Ji

-+ [NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models](https://arxiv.org//abs/2408.10280)
++ [NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models](https://arxiv.org/abs/2408.10280)

Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wenhan Luo, Wei Xue, Yike Guo

-+ [DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization](https://arxiv.org//abs/2408.11071)
++ [DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization](https://arxiv.org/abs/2408.11071)

Pucheng Dang, Xing Hu, Dong Li, Rui Zhang, Qi Guo, Kaidi Xu

-+ [Say My Name: a Model's Bias Discovery Framework](https://arxiv.org//abs/2408.09570)
++ [Say My Name: a Model's Bias Discovery Framework](https://arxiv.org/abs/2408.09570)

Massimiliano Ciranni, Luca Molinaro, Carlo Alberto Barbano, Attilio Fiandrotti, Vittorio Murino, Vito Paolo Pastore, Enzo Tartaglione

# 2024-08-17

-+ [Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons](https://arxiv.org//abs/2408.08655)
++ [Mitigating Backdoor Attacks in Federated Learning via Flipping Weight Updates of Low-Activation Input Neurons](https://arxiv.org/abs/2408.08655)

Binbin Ding, Penghui Yang, Zeqing Ge, Shengjun Huang

-+ [Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?](https://arxiv.org//abs/2408.08685)
++ [Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?](https://arxiv.org/abs/2408.08685)

Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi

-+ [Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions](https://arxiv.org//abs/2408.08780)
++ [Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions](https://arxiv.org/abs/2408.08780)

Chenming Tang, Zhixiang Wang, Yunfang Wu

-+ [Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness](https://arxiv.org//abs/2408.08502)
-+ [Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness](https://arxiv.org//abs/2408.08502)
++ [Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness](https://arxiv.org/abs/2408.08502)

Hefei Mei, Minjing Dong, Chang Xu

-+ [Visual-Friendly Concept Protection via Selective Adversarial Perturbations](https://arxiv.org//abs/2408.08518)
++ [Visual-Friendly Concept Protection via Selective Adversarial Perturbations](https://arxiv.org/abs/2408.08518)

Xiaoyue Mi, Fan Tang, Juan Cao, Peng Li, Yang Liu

-+ [Towards Physical World Backdoor Attacks against Skeleton Action Recognition](https://arxiv.org//abs/2408.08671)
++ [Towards Physical World Backdoor Attacks against Skeleton Action Recognition](https://arxiv.org/abs/2408.08671)

Qichen Zheng, Yi Yu, Siyuan Yang, Jun Liu, Kwok-Yan Lam, Alex Kot

-+ [\textit{MMJ-Bench}: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models](https://arxiv.org//abs/2408.08464)
++ [\textit{MMJ-Bench}: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models](https://arxiv.org/abs/2408.08464)

Fenghua Weng, Yue Xu, Chengyan Fu, Wenjie Wang

-+ [Gradient-Variation Online Learning under Generalized Smoothness](https://arxiv.org//abs/2408.09074)
++ [Gradient-Variation Online Learning under Generalized Smoothness](https://arxiv.org/abs/2408.09074)

Yan-Feng Xie, Peng Zhao, Zhi-Hua Zhou

-+ [Scalable and Certifiable Graph Unlearning via Lazy Local Propagation](https://arxiv.org//abs/2408.09212)
++ [Scalable and Certifiable Graph Unlearning via Lazy Local Propagation](https://arxiv.org/abs/2408.09212)

Lu Yi, Zhewei Wei

-+ [Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model](https://arxiv.org//abs/2408.09300)
++ [Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model](https://arxiv.org/abs/2408.09300)

Massimiliano Todisco, Michele Panariello, Xin Wang, Héctor Delgado, Kong Aik Lee, Nicholas Evans

-+ [BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger](https://arxiv.org//abs/2408.09093)
++ [BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger](https://arxiv.org/abs/2408.09093)

Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song

-+ [Attack Anything: Blind DNNs via Universal Background Adversarial Attack](https://arxiv.org//abs/2409.00029)
++ [Attack Anything: Blind DNNs via Universal Background Adversarial Attack](https://arxiv.org/abs/2409.00029)

Jiawei Lian, Shaohui Mei, Xiaofei Wang, Yi Wang, Lefan Wang, Yingjie Lu, Mingyang Ma, Lap-Pui Chau

-+ [SA-GDA: Spectral Augmentation for Graph Domain Adaptation](https://arxiv.org//abs/2408.09189)
++ [SA-GDA: Spectral Augmentation for Graph Domain Adaptation](https://arxiv.org/abs/2408.09189)

Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin

# 2024-08-16

-+ [KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning](https://arxiv.org//abs/2408.08146)
++ [KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning](https://arxiv.org/abs/2408.08146)

Kaiqi Zhang, Jing Zhao, Rui Chen

-+ [A Multi-task Adversarial Attack Against Face Authentication](https://arxiv.org//abs/2408.08205)
++ [A Multi-task Adversarial Attack Against Face Authentication](https://arxiv.org/abs/2408.08205)

Hanrui Wang, Shuo Wang, Cunjian Chen, Massimo Tistarelli, Zhe Jin

-+ [Evaluating Text Classification Robustness to Part-of-Speech Adversarial Examples](https://arxiv.org//abs/2408.08374)
++ [Evaluating Text Classification Robustness to Part-of-Speech Adversarial Examples](https://arxiv.org/abs/2408.08374)

Anahita Samadi, Allison Sullivan

-+ [Penny-Wise and Pound-Foolish in Deepfake Detection](https://arxiv.org//abs/2408.08412)
++ [Penny-Wise and Pound-Foolish in Deepfake Detection](https://arxiv.org/abs/2408.08412)

Yabin Wang, Zhiwu Huang, Su Zhou, Adam Prugel-Bennett, Xiaopeng Hong

-+ [Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks](https://arxiv.org//abs/2408.08924)
++ [Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks](https://arxiv.org/abs/2408.08924)

Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Weiming Zhang

-+ [Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models](https://arxiv.org//abs/2408.08989)
++ [Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models](https://arxiv.org/abs/2408.08989)

Qingyuan Zeng, Zhenzhong Wang, Yiu-ming Cheung, Min Jiang

-+ [See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses](https://arxiv.org//abs/2408.08978)
++ [See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses](https://arxiv.org/abs/2408.08978)

Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

-+ [LEVIS: Large Exact Verifiable Input Spaces for Neural Networks](https://arxiv.org//abs/2408.08824)
++ [LEVIS: Large Exact Verifiable Input Spaces for Neural Networks](https://arxiv.org/abs/2408.08824)

Mohamad Fares El Hajj Chehade, Wenting Li, Brian W. Bell, Russell Bent, Saif R. Kazi, Hao Zhu
# 2024-08-14

-+ [UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection](https://arxiv.org//abs/2408.07430)
++ [UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection](https://arxiv.org/abs/2408.07430)

Mu Chen, Minghan Chen, Yi Yang

-+ [BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning](https://arxiv.org//abs/2408.07440)
++ [BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning](https://arxiv.org/abs/2408.07440)

Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan, Karthik Nandakumar, Salman Khan, Rao Muhammad Anwer

-+ [Robust Active Learning (RoAL): Countering Dynamic Adversaries in Active Learning with Elastic Weight Consolidation](https://arxiv.org//abs/2408.07364)
++ [Robust Active Learning (RoAL): Countering Dynamic Adversaries in Active Learning with Elastic Weight Consolidation](https://arxiv.org/abs/2408.07364)

Ricky Maulana Fajri, Yulong Pei, Lu Yin, Mykola Pechenizkiy

-+ [TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases](https://arxiv.org//abs/2408.07579)
++ [TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases](https://arxiv.org/abs/2408.07579)

Thibault Simonetto, Salah Ghamizi, Maxime Cordy

-+ [BadMerging: Backdoor Attacks Against Model Merging](https://arxiv.org//abs/2408.07362)
++ [BadMerging: Backdoor Attacks Against Model Merging](https://arxiv.org/abs/2408.07362)

Jinghuai Zhang, Jianfeng Chi, Zheng Li, Kunlin Cai, Yang Zhang, Yuan Tian

-+ [Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack](https://arxiv.org//abs/2408.07733)
++ [Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack](https://arxiv.org/abs/2408.07733)

Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

-+ [CodeMirage: Hallucinations in Code Generated by Large Language Models](https://arxiv.org//abs/2408.08333)
++ [CodeMirage: Hallucinations in Code Generated by Large Language Models](https://arxiv.org/abs/2408.08333)

Vibhor Agarwal, Yulong Pei, Salwa Alamir, Xiaomo Liu

# 2024-08-13

-+ [RW-NSGCN: A Robust Approach to Structural Attacks via Negative Sampling](https://arxiv.org//abs/2408.06665)
++ [RW-NSGCN: A Robust Approach to Structural Attacks via Negative Sampling](https://arxiv.org/abs/2408.06665)

Shuqi He, Jun Zhuang, Ding Wang, Jun Song

-+ [DePatch: Towards Robust Adversarial Patch for Evading Person Detectors in the Real World](https://arxiv.org//abs/2408.06625)
++ [DePatch: Towards Robust Adversarial Patch for Evading Person Detectors in the Real World](https://arxiv.org/abs/2408.06625)

Jikang Cheng, Ying Zhang, Zhongyuan Wang, Zou Qin, Chen Li

-+ [VulCatch: Enhancing Binary Vulnerability Detection through CodeT5 Decompilation and KAN Advanced Feature Extraction](https://arxiv.org//abs/2408.07181)
++ [VulCatch: Enhancing Binary Vulnerability Detection through CodeT5 Decompilation and KAN Advanced Feature Extraction](https://arxiv.org/abs/2408.07181)

Abdulrahman Hamman Adama Chukkol, Senlin Luo, Kashif Sharif, Yunusa Haruna, Muhammad Muhammad Abdullahi

-+ [FedMADE: Robust Federated Learning for Intrusion Detection in IoT Networks Using a Dynamic Aggregation Method](https://arxiv.org//abs/2408.07152)
++ [FedMADE: Robust Federated Learning for Intrusion Detection in IoT Networks Using a Dynamic Aggregation Method](https://arxiv.org/abs/2408.07152)

Shihua Sun, Pragya Sharma, Kenechukwu Nwodo, Angelos Stavrou, Haining Wang

-+ [ED$^4$: Explicit Data-level Debiasing for Deepfake Detection](https://arxiv.org//abs/2408.06779)
++ [ED$^4$: Explicit Data-level Debiasing for Deepfake Detection](https://arxiv.org/abs/2408.06779)

Jikang Cheng, Ying Zhang, Qin Zou, Zhiyuan Yan, Chao Liang, Zhongyuan Wang, Chen Li

# 2024-08-12

-+ [Understanding Byzantine Robustness in Federated Learning with A Black-box Server](https://arxiv.org//abs/2408.06042)
++ [Understanding Byzantine Robustness in Federated Learning with A Black-box Server](https://arxiv.org/abs/2408.06042)

Fangyuan Zhao, Yuexiang Xie, Xuebin Ren, Bolin Ding, Shusen Yang, Yaliang Li

-+ [Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information](https://arxiv.org//abs/2408.05900)
++ [Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information](https://arxiv.org/abs/2408.05900)

Mingkun Zhang, Jianing Li, Wei Chen, Jiafeng Guo, Xueqi Cheng

-+ [Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment](https://arxiv.org//abs/2408.06079)
++ [Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment](https://arxiv.org/abs/2408.06079)

Kejia Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li

-+ [LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization](https://arxiv.org//abs/2408.06297)
++ [LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization](https://arxiv.org/abs/2408.06297)

Adarsh Barik, Anand Krishna, Vincent Y. F. Tan

-+ [Nob-MIAs: Non-biased Membership Inference Attacks Assessment on Large Language Models with Ex-Post Dataset Construction](https://arxiv.org//abs/2408.05968)
++ [Nob-MIAs: Non-biased Membership Inference Attacks Assessment on Large Language Models with Ex-Post Dataset Construction](https://arxiv.org/abs/2408.05968)

Cédric Eichler, Nathan Champeil, Nicolas Anciaux, Alexandra Bensamoun, Heber Hwang Arcolezi, José Maria De Fuentes

-+ [Lancelot: Towards Efficient and Privacy-Preserving Byzantine-Robust Federated Learning within Fully Homomorphic Encryption](https://arxiv.org//abs/2408.06197)
++ [Lancelot: Towards Efficient and Privacy-Preserving Byzantine-Robust Federated Learning within Fully Homomorphic Encryption](https://arxiv.org/abs/2408.06197)

Siyang Jiang, Hao Yang, Qipeng Xie, Chuan Ma, Sen Wang, Guoliang Xing

-+ [Fooling SHAP with Output Shuffling Attacks](https://arxiv.org//abs/2408.06509)
++ [Fooling SHAP with Output Shuffling Attacks](https://arxiv.org/abs/2408.06509)

Jun Yuan, Aritra Dasgupta

# 2024-08-11

-+ [StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model](https://arxiv.org//abs/2408.05669)
++ [StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model](https://arxiv.org/abs/2408.05669)

Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji

-+ [Improving Adversarial Transferability with Neighbourhood Gradient Information](https://arxiv.org//abs/2408.05745)
++ [Improving Adversarial Transferability with Neighbourhood Gradient Information](https://arxiv.org/abs/2408.05745)

Haijing Guo, Jiafeng Wang, Zhaoyu Chen, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Jinglun Li, Wenqiang Zhang

# 2024-08-10

-+ [MABR: A Multilayer Adversarial Bias Removal Approach Without Prior Bias Knowledge](https://arxiv.org//abs/2408.05497)
++ [MABR: A Multilayer Adversarial Bias Removal Approach Without Prior Bias Knowledge](https://arxiv.org/abs/2408.05497)

Maxwell J. Yin, Boyu Wang, Charles Ling

-+ [ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack](https://arxiv.org//abs/2408.05479)
++ [ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack](https://arxiv.org/abs/2408.05479)

Ziyi Gao, Kai Chen, Zhipeng Wei, Tingshu Mou, Jingjing Chen, Zhiyu Tan, Hao Li, Yu-Gang Jiang

# 2024-08-09

-+ [h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment](https://arxiv.org//abs/2408.04811)
++ [h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment](https://arxiv.org/abs/2408.04811)

Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia, Davide Ghilardi, Anna Goldie, Federico Bianchi, Dan Jurafsky, Christopher D. Manning

-+ [Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change](https://arxiv.org//abs/2408.04842)
++ [Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change](https://arxiv.org/abs/2408.04842)

Ignacy Stępka, Mateusz Lango, Jerzy Stefanowski

-+ [A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares](https://arxiv.org//abs/2408.05061)
++ [A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares](https://arxiv.org/abs/2408.05061)

Stav Cohen, Ron Bitton, Ben Nassi

-+ [Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery](https://arxiv.org//abs/2408.04958)
++ [Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery](https://arxiv.org/abs/2408.04958)

Long Bai, Guankun Wang, Mobarakol Islam, Lalithkumar Seenivasan, An Wang, Hongliang Ren

-+ [Adversarially Robust Industrial Anomaly Detection Through Diffusion Model](https://arxiv.org//abs/2408.04839)
++ [Adversarially Robust Industrial Anomaly Detection Through Diffusion Model](https://arxiv.org/abs/2408.04839)

Yuanpu Cao, Lu Lin, Jinghui Chen

-+ [Model Debiasing by Learnable Data Augmentation](https://arxiv.org//abs/2408.04955)
++ [Model Debiasing by Learnable Data Augmentation](https://arxiv.org/abs/2408.04955)

Pietro Morerio, Ruggero Ragonesi, Vittorio Murino

-+ [Range Membership Inference Attacks](https://arxiv.org//abs/2408.05131)
++ [Range Membership Inference Attacks](https://arxiv.org/abs/2408.05131)

Jiashu Tao, Reza Shokri

-+ [Federated Hypergraph Learning with Local Differential Privacy: Toward Privacy-Aware Hypergraph Structure Completion](https://arxiv.org//abs/2408.05160)
++ [Federated Hypergraph Learning with Local Differential Privacy: Toward Privacy-Aware Hypergraph Structure Completion](https://arxiv.org/abs/2408.05160)

Linfeng Luo, Zhiqi Guo, Fengxiao Tang, Zihao Qiu, Ming Zhao

# 2024-08-08

-+ [Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness](https://arxiv.org//abs/2408.04585)
++ [Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness](https://arxiv.org/abs/2408.04585)

Xiaojing Fan, Chunliang Tao

-+ [Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit](https://arxiv.org//abs/2408.04310)
++ [Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit](https://arxiv.org/abs/2408.04310)

Duanyi Yao, Songze Li, Ye Xue, Jin Liu
-+ [FDI: Attack Neural Code Generation Systems through User Feedback Channel](https://arxiv.org//abs/2408.04194)
++ [FDI: Attack Neural Code Generation Systems through User Feedback Channel](https://arxiv.org/abs/2408.04194)

Zhensu Sun, Xiaoning Du, Xiapu Luo, Fu Song, David Lo, Li Li

-+ [Eliminating Backdoors in Neural Code Models via Trigger Inversion](https://arxiv.org//abs/2408.04683)
++ [Eliminating Backdoors in Neural Code Models via Trigger Inversion](https://arxiv.org/abs/2408.04683)

Weisong Sun, Yuchen Chen, Chunrong Fang, Yebo Feng, Yuan Xiao, An Guo, Quanjun Zhang, Yang Liu, Baowen Xu, Zhenyu Chen

-+ [Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles](https://arxiv.org//abs/2408.04686)
++ [Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles](https://arxiv.org/abs/2408.04686)

Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Hui Li

-+ [Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness](https://arxiv.org//abs/2408.05446)
++ [Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness](https://arxiv.org/abs/2408.05446)

Stanislav Fort, Balaji Lakshminarayanan

-+ [VideoQA in the Era of LLMs: An Empirical Study](https://arxiv.org//abs/2408.04223)
++ [VideoQA in the Era of LLMs: An Empirical Study](https://arxiv.org/abs/2408.04223)

Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua, Angela Yao

# 2024-08-07

-+ [EnJa: Ensemble Jailbreak on Large Language Models](https://arxiv.org//abs/2408.03603)
++ [EnJa: Ensemble Jailbreak on Large Language Models](https://arxiv.org/abs/2408.03603)

Jiahao Zhang, Zilong Wang, Ruofan Wang, Xingjun Ma, Yu-Gang Jiang

-+ [LaFA: Latent Feature Attacks on Non-negative Matrix Factorization](https://arxiv.org//abs/2408.03909)
++ [LaFA: Latent Feature Attacks on Non-negative Matrix Factorization](https://arxiv.org/abs/2408.03909)

Minh Vu, Ben Nebgen, Erik Skau, Geigh Zollicoffer, Juan Castorena, Kim Rasmussen, Boian Alexandrov, Manish Bhattarai

-+ [TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization](https://arxiv.org//abs/2408.03637)
++ [TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization](https://arxiv.org/abs/2408.03637)

Kien T. Pham, Jingye Chen, Qifeng Chen

-+ [Enhancing Output Diversity Improves Conjugate Gradient-based Adversarial Attacks](https://arxiv.org//abs/2408.03972)
++ [Enhancing Output Diversity Improves Conjugate Gradient-based Adversarial Attacks](https://arxiv.org/abs/2408.03972)

Keiichiro Yamamura, Issa Oe, Hiroki Ishikura, Katsuki Fujisawa

-+ [PushPull-Net: Inhibition-driven ResNet robust to image corruptions](https://arxiv.org//abs/2408.04077)
++ [PushPull-Net: Inhibition-driven ResNet robust to image corruptions](https://arxiv.org/abs/2408.04077)

Guru Swaroop Bennabhaktula, Enrique Alegre, Nicola Strisciuglio, George Azzopardi

-+ [Exploring RAG-based Vulnerability Augmentation with LLMs](https://arxiv.org//abs/2408.04125)
++ [Exploring RAG-based Vulnerability Augmentation with LLMs](https://arxiv.org/abs/2408.04125)

Seyed Shayan Daneshvar, Yu Nong, Xu Yang, Shaowei Wang, Haipeng Cai

# 2024-08-06

-+ [Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)](https://arxiv.org//abs/2408.04664)
++ [Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)](https://arxiv.org/abs/2408.04664)

Avshalom Manevich, Reut Tsarfaty

# 2024-08-05

-+ [RCDM: Enabling Robustness for Conditional Diffusion Model](https://arxiv.org//abs/2408.02710)
++ [RCDM: Enabling Robustness for Conditional Diffusion Model](https://arxiv.org/abs/2408.02710)

Weifeng Xu, Xiang Zhu, Xiaoyong Li

-+ [Mitigating Malicious Attacks in Federated Learning via Confidence-aware Defense](https://arxiv.org//abs/2408.02813)
++ [Mitigating Malicious Attacks in Federated Learning via Confidence-aware Defense](https://arxiv.org/abs/2408.02813)

Qilei Li, Ahmed M. Abdelmoniem

-+ [Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models](https://arxiv.org//abs/2408.02980)
++ [Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models](https://arxiv.org/abs/2408.02980)

Haonan Zheng, Wen Jiang, Xinyang Deng, Wenrui Li

-+ [DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning](https://arxiv.org//abs/2408.07080)
++ [DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning](https://arxiv.org/abs/2408.07080)

Dino Ienco (EVERGREEN, UMR TETIS, INRAE), Cassio Fraga Dantas (UMR TETIS, INRAE, EVERGREEN)

@@ -21432,41 +21432,41 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Shaopeng Fu, Xuexue Sun, Ke Qing, Tianhang Zheng, Di Wang

-+ [Black-Box Adversarial Attacks on LLM-Based Code Completion](https://arxiv.org//abs/2408.02509)
++ [Black-Box Adversarial Attacks on LLM-Based Code Completion](https://arxiv.org/abs/2408.02509)

Slobodan Jenko, Niels Mündler, Jingxuan He, Mark Vero, Martin Vechev

-+ [Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions](https://arxiv.org//abs/2408.02544)
++ [Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions](https://arxiv.org/abs/2408.02544)

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao

# 2024-08-04

-+ [Top K Enhanced Reinforcement Learning Attacks on Heterogeneous Graph Node Classification](https://arxiv.org//abs/2408.01964)
++ [Top K Enhanced Reinforcement Learning Attacks on Heterogeneous Graph Node Classification](https://arxiv.org/abs/2408.01964)

Honglin Gao, Xiang Li, Yajuan Sun, Gaoxi Xiao

# 2024-08-02

-+ [Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition](https://arxiv.org//abs/2408.01139)
++ [Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition](https://arxiv.org/abs/2408.01139)

Róisín Luo, James McDermott, Colm O'Riordan

-+ [Mission Impossible: A Statistical Perspective on Jailbreaking LLMs](https://arxiv.org//abs/2408.01420)
++ [Mission Impossible: A Statistical Perspective on Jailbreaking LLMs](https://arxiv.org/abs/2408.01420)

Jingtong Su, Julia Kempe, Karen Ullrich

-+ [Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs](https://arxiv.org//abs/2408.01355)
++ [Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs](https://arxiv.org/abs/2408.01355)

Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang

-+ [Assessing Robustness of Machine Learning Models using Covariate Perturbations](https://arxiv.org//abs/2408.01300)
++ [Assessing Robustness of Machine Learning Models using Covariate Perturbations](https://arxiv.org/abs/2408.01300)

Arun Prakash R, Anwesha Bhattacharyya, Joel Vaughan, Vijayan N. Nair

-+ [EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody](https://arxiv.org//abs/2408.01178)
++ [EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody](https://arxiv.org/abs/2408.01178)

Coen Schoof, Stefanos Koffas, Mauro Conti, Stjepan Picek

@@ -21477,167 +21477,167 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Alexander Gushchin, Khaled Abud, Georgii Bychkov, Ekaterina Shumitskaya, Anna Chistyakova, Sergey Lavrushkin, Bader Rasheed, Kirill Malyshev, Dmitriy Vatolin, Anastasia Antsiferova

# 2024-08-01

-+ [Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck](https://arxiv.org//abs/2408.00295)
++ [Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck](https://arxiv.org/abs/2408.00295)

Yuntao Shou, Haozhi Lan, Xiangyong Cao

-+ [ADBM: Adversarial diffusion bridge model for reliable adversarial purification](https://arxiv.org//abs/2408.00315)
++ [ADBM: Adversarial diffusion bridge model for reliable adversarial purification](https://arxiv.org/abs/2408.00315)

Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu

-+ [OTAD: An Optimal Transport-Induced Robust Model for Agnostic Adversarial Attack](https://arxiv.org//abs/2408.00329)
++ [OTAD: An Optimal Transport-Induced Robust Model for Agnostic Adversarial Attack](https://arxiv.org/abs/2408.00329)

Kuo Gai, Sicong Wang, Shihua Zhang

-+ [CERT-ED: Certifiably Robust Text Classification for Edit Distance](https://arxiv.org//abs/2408.00728)
++ [CERT-ED: Certifiably Robust Text Classification for Edit Distance](https://arxiv.org/abs/2408.00728)

Zhuoqun Huang, Neil G Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein
-+ [Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion](https://arxiv.org//abs/2408.00352)
++ [Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion](https://arxiv.org/abs/2408.00352)

Honglei Miao, Fan Ma, Ruijie Quan, Kun Zhan, Yi Yang

-+ [Revocable Backdoor for Deep Model Trading](https://arxiv.org//abs/2408.00255)
++ [Revocable Backdoor for Deep Model Trading](https://arxiv.org/abs/2408.00255)

Yiran Xu, Nan Zhong, Zhenxing Qian, Xinpeng Zhang

-+ [Adversarial Text Rewriting for Text-aware Recommender Systems](https://arxiv.org//abs/2408.00312)
++ [Adversarial Text Rewriting for Text-aware Recommender Systems](https://arxiv.org/abs/2408.00312)

Sejoon Oh, Gaurav Verma, Srijan Kumar

-+ [Benchmarking Attacks on Learning with Errors](https://arxiv.org//abs/2408.00882)
++ [Benchmarking Attacks on Learning with Errors](https://arxiv.org/abs/2408.00882)

Emily Wenger, Eshika Saxena, Mohamed Malhou, Ellie Thieu, Kristin Lauter

-+ [Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models](https://arxiv.org//abs/2408.00523)
++ [Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models](https://arxiv.org/abs/2408.00523)

Yingkai Dong, Xiangtao Meng, Ning Yu, Zheng Li, Shanqing Guo

# 2024-07-31

-+ [Measuring What Matters: Intrinsic Distance Preservation as a Robust Metric for Embedding Quality](https://arxiv.org//abs/2407.21590)
++ [Measuring What Matters: Intrinsic Distance Preservation as a Robust Metric for Embedding Quality](https://arxiv.org/abs/2407.21590)

Steven N. Hart, Thomas E. Tavolara

-+ [Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?](https://arxiv.org//abs/2407.21792)
++ [Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?](https://arxiv.org/abs/2407.21792)

Richard Ren, Steven Basart, Adam Khoja, Alice Gatti, Long Phan, Xuwang Yin, Mantas Mazeika, Alexander Pan, Gabriel Mukobi, Ryan H. Kim, Stephen Fitz, Dan Hendrycks

-+ [Defending Jailbreak Attack in VLMs via Cross-modality Information Detector](https://arxiv.org//abs/2407.21659)
++ [Defending Jailbreak Attack in VLMs via Cross-modality Information Detector](https://arxiv.org/abs/2407.21659)

Yue Xu, Xiuyuan Qi, Zhan Qin, Wenjie Wang

-+ [Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model](https://arxiv.org//abs/2407.21408)
++ [Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model](https://arxiv.org/abs/2407.21408)

Zhichao Zhang, Xinyue Li, Wei Sun, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Puyi Wang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Guangtao Zhai

-+ [Conditioned Prompt-Optimization for Continual Deepfake Detection](https://arxiv.org//abs/2407.21554)
++ [Conditioned Prompt-Optimization for Continual Deepfake Detection](https://arxiv.org/abs/2407.21554)

Francesco Laiti, Benedetta Liberatori, Thomas De Min, Elisa Ricci

-+ [Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs](https://arxiv.org//abs/2407.21771)
++ [Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs](https://arxiv.org/abs/2407.21771)

Shi Liu, Kecheng Zheng, Wei Chen

-+ [Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models](https://arxiv.org//abs/2407.21316)
++ [Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models](https://arxiv.org/abs/2407.21316)

Jiang Hao, Xiao Jin, Hu Xiaoguang, Chen Tianyou

-+ [Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges](https://arxiv.org//abs/2408.00193)
++ [Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges](https://arxiv.org/abs/2408.00193)

Sazzad Sayyed, Milin Zhang, Shahriar Rifat, Ananthram Swami, Michael De Lucia, Francesco Restuccia

# 2024-07-30

-+ [FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks](https://arxiv.org//abs/2407.20653)
++ [FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks](https://arxiv.org/abs/2407.20653)

Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon

-+ [Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks](https://arxiv.org//abs/2407.20657)
++ [Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks](https://arxiv.org/abs/2407.20657)

Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon

-+ [PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning](https://arxiv.org//abs/2407.20705)
++ [PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning](https://arxiv.org/abs/2407.20705)

Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk

-+ [Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks](https://arxiv.org//abs/2407.20836)
++ [Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks](https://arxiv.org/abs/2407.20836)

Yunfeng Diao, Naixin Zhai, Changtao Miao, Xun Yang, Meng Wang

-+ [Can LLMs be Fooled? Investigating Vulnerabilities in LLMs](https://arxiv.org//abs/2407.20529)
++ [Can LLMs be Fooled? Investigating Vulnerabilities in LLMs](https://arxiv.org/abs/2407.20529)

Sara Abdali, Jia He, CJ Barberan, Richard Anarfi

-+ [Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification](https://arxiv.org//abs/2407.20859)
++ [Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification](https://arxiv.org/abs/2407.20859)

Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang

-+ [DeepBaR: Fault Backdoor Attack on Deep Neural Network Layers](https://arxiv.org//abs/2407.21220)
++ [DeepBaR: Fault Backdoor Attack on Deep Neural Network Layers](https://arxiv.org/abs/2407.21220)

C. A. Martínez-Mejía, J. Solano, J. Breier, D. Bucko, X. Hou

# 2024-07-29

-+ [Can Editing LLMs Inject Harm?](https://arxiv.org//abs/2407.20224)
++ [Can Editing LLMs Inject Harm?](https://arxiv.org/abs/2407.20224)

Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

-+ [Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability](https://arxiv.org//abs/2407.19842)
++ [Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability](https://arxiv.org/abs/2407.19842)

Jorge García-Carrasco, Alejandro Maté, Juan Trujillo

-+ [RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding](https://arxiv.org//abs/2407.20099)
++ [RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding](https://arxiv.org/abs/2407.20099)

Keming Wu, Man Yao, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li

-+ [BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning](https://arxiv.org//abs/2407.19845)
++ [BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning](https://arxiv.org/abs/2407.19845)

Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang, Li Liu, Chao Shen

-+ [Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank](https://arxiv.org//abs/2407.19943)
++ [Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank](https://arxiv.org/abs/2407.19943)

Shashank Gupta, Harrie Oosterhuis, Maarten de Rijke

-+ [Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities](https://arxiv.org//abs/2407.20337)
++ [Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities](https://arxiv.org/abs/2407.20337)

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

-+ [From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks](https://arxiv.org//abs/2407.20361)
++ [From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks](https://arxiv.org/abs/2407.20361)

Aditya Kulkarni, Vivek Balachandran, Dinil Mon Divakaran, Tamal Das

-+ [Enhancing Adversarial Text Attacks on BERT Models with Projected Gradient Descent](https://arxiv.org//abs/2407.21073)
++ [Enhancing Adversarial Text Attacks on BERT Models with Projected Gradient Descent](https://arxiv.org/abs/2407.21073)

Hetvi Waghela, Jaydip Sen, Sneha Rakshit

@@ -21647,364 +21647,364 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca
Aditya Kulkarni, Vivek Balachandran, Dinil Mon Divakaran, Tamal Das

# 2024-07-28

-+ [Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection](https://arxiv.org//abs/2407.19553)
++ [Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection](https://arxiv.org/abs/2407.19553)

Vincenzo De Rosa, Fabrizio Guillaro, Giovanni Poggi, Davide Cozzolino, Luisa Verdoliva

# 2024-07-27

-+ [Towards Clean-Label Backdoor Attacks in the Physical World](https://arxiv.org//abs/2407.19203)
++ [Towards Clean-Label Backdoor Attacks in the Physical World](https://arxiv.org/abs/2407.19203)

Thinh Dao, Cuong Chi Le, Khoa D Doan, Kok-Seng Wong

-+ [EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection](https://arxiv.org//abs/2407.19216)
++ [EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection](https://arxiv.org/abs/2407.19216)

Shigang Liu, Di Cao, Junae Kim, Tamas Abraham, Paul Montague, Seyit Camtepe, Jun Zhang, Yang Xiang

-+ [Debiased Graph Poisoning Attack via Contrastive Surrogate Objective](https://arxiv.org//abs/2407.19155)
++ [Debiased Graph Poisoning Attack via Contrastive Surrogate Objective](https://arxiv.org/abs/2407.19155)

Kanghoon Yoon, Yeonjun In, Namkyeong Lee, Kibum Kim, Chanyoung Park

# 2024-07-26

-+ [Adversarial Robustification via Text-to-Image Diffusion Models](https://arxiv.org//abs/2407.18658)
++ [Adversarial Robustification via Text-to-Image Diffusion Models](https://arxiv.org/abs/2407.18658)

Daewon Choi, Jongheon Jeong, Huiwon Jang, Jinwoo Shin

-+ [Robust VAEs via Generating Process of Noise Augmented Data](https://arxiv.org//abs/2407.18632)
++ [Robust VAEs via Generating Process of Noise Augmented Data](https://arxiv.org/abs/2407.18632)

Hiroo Irobe, Wataru Aoki, Kimihiro Yamazaki, Yuhui Zhang, Takumi Nakagawa, Hiroki Waida, Yuichiro Wada, Takafumi Kanamori

-+ [Accuracy-Privacy Trade-off in the Mitigation of Membership Inference Attack in Federated Learning](https://arxiv.org//abs/2407.19119)
++ [Accuracy-Privacy Trade-off in the Mitigation of Membership Inference Attack in Federated Learning](https://arxiv.org/abs/2407.19119)

Sayyed Farid Ahamed, Soumya Banerjee, Sandip Roy, Devin Quinn, Marc Vucovich, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty

# 2024-07-25

-+ [A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models](https://arxiv.org//abs/2407.17797)
++ [A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models](https://arxiv.org/abs/2407.17797)

Haonan Zheng, Xinyang Deng, Wen Jiang, Wenrui Li

-+ [The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models](https://arxiv.org//abs/2407.17915)
++ [The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models](https://arxiv.org/abs/2407.17915)

Zihui Wu, Haichang Gao, Jianping He, Ping Wang

-+ [Peak-Controlled Logits Poisoning Attack in Federated Distillation](https://arxiv.org//abs/2407.18039)
++ [Peak-Controlled Logits Poisoning Attack in Federated Distillation](https://arxiv.org/abs/2407.18039)

Yuhan Tang, Aoxu Zhang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

-+ [Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?](https://arxiv.org//abs/2407.17870)
++ [Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?](https://arxiv.org/abs/2407.17870)

Avanti Bhandarkar, Ronald Wilson, Anushka Swarup, Mengdi Zhu, Damon Woodard

-+ [Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis](https://arxiv.org//abs/2407.18251)
++ [Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis](https://arxiv.org/abs/2407.18251)

Cristian-Alexandru Botocan, Raphael Meier, Ljiljana Dolamic

-+ [RIDA: A Robust Attack Framework on Incomplete Graphs](https://arxiv.org//abs/2407.18170)
++ [RIDA: A Robust Attack Framework on Incomplete Graphs](https://arxiv.org/abs/2407.18170)

Jianke Yu, Hanchen Wang, Chen Chen, Xiaoyang Wang, Wenjie Zhang, Ying Zhang

-+ [Adversarial Robust Decision Transformer: Enhancing Robustness of RvS via Minimax Returns-to-go](https://arxiv.org//abs/2407.18414)
++ [Adversarial Robust Decision Transformer: Enhancing Robustness of RvS via Minimax Returns-to-go](https://arxiv.org/abs/2407.18414)

Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic

# 2024-07-24

-+ [Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence](https://arxiv.org//abs/2407.17164)
++ [Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence](https://arxiv.org/abs/2407.17164)

Xiaoyu Tan, Bin Li, Xihe Qiu, Jingjing Huang, Yinghui Xu, Wei Chu

-+ [How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?](https://arxiv.org//abs/2407.17291)
++ [How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?](https://arxiv.org/abs/2407.17291)

Leo Yu-Ho Lo, Huamin Qu

-+ [Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches](https://arxiv.org//abs/2407.17312)
++ [Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches](https://arxiv.org/abs/2407.17312)

Chenxing Zhao, Yang Li, Shihao Wu, Wenyi Tan, Shuangju Zhou, Quan Pan

-+ [Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?](https://arxiv.org//abs/2407.17417)
++ [Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?](https://arxiv.org/abs/2407.17417)

Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang

-+ [From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM](https://arxiv.org//abs/2407.16928)
++ [From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM](https://arxiv.org/abs/2407.16928)

Lingzhi Wang, Jiahui Wang, Kyle Jung, Kedar Thiagarajan, Emily Wei, Xiangmin Shen, Yan Chen, Zhenyuan Li

# 2024-07-23

-+ [Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models](https://arxiv.org//abs/2407.16205)
++ [Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models](https://arxiv.org/abs/2407.16205)

Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han

-+ [RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent](https://arxiv.org//abs/2407.16667)
++ [RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent](https://arxiv.org/abs/2407.16667)

Huiyu Xu, Wenhui Zhang, Zhibo Wang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren

-+ [Can Large Language Models Automatically Jailbreak GPT-4V?](https://arxiv.org//abs/2407.16686)
++ [Can Large Language Models Automatically Jailbreak GPT-4V?](https://arxiv.org/abs/2407.16686)

Yuanwei Wu, Yue Huang, Yixin Liu, Xiang Li, Pan Zhou, Lichao Sun

-+ [Algebraic Adversarial Attacks on Integrated Gradients](https://arxiv.org//abs/2407.16233)
++ [Algebraic Adversarial Attacks on Integrated Gradients](https://arxiv.org/abs/2407.16233)

Lachlan Simpson, Federico Costanza, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew

-+ [STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments](https://arxiv.org//abs/2407.16337)
++ [STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments](https://arxiv.org/abs/2407.16337)

Hao Zhou, Kun Sun, Shaoming Li, Yangfeng Fan, Guibin Jiang, Jiaqi Zheng, Tao Li

-+ [Backdoor Attacks against Hybrid Classical-Quantum Neural Networks](https://arxiv.org//abs/2407.16273)
++ [Backdoor Attacks against Hybrid Classical-Quantum Neural Networks](https://arxiv.org/abs/2407.16273)

Ji Guo, Wenbo Jiang, Rui Zhang, Wenshu Fan, Jiachen Li, Guoming Lu

-+ [Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning](https://arxiv.org//abs/2407.16307)
++ [Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning](https://arxiv.org/abs/2407.16307)

Xinwei Liu, Xiaojun Jia, Yuan Xun, Siyuan Liang, Xiaochun Cao

-+ [Theoretical Analysis of Privacy Leakage in Trustworthy Federated Learning: A Perspective from Linear Algebra and Optimization Theory](https://arxiv.org//abs/2407.16735)
++ [Theoretical Analysis of Privacy Leakage in Trustworthy Federated Learning: A Perspective from Linear Algebra and Optimization Theory](https://arxiv.org/abs/2407.16735)

Xiaojin Zhang, Wei Chen

-+ [S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks](https://arxiv.org//abs/2407.17587)
++ [S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks](https://arxiv.org/abs/2407.17587)

Neha A S, Vivek Chaturvedi, Muhammad Shafique

# 2024-07-22

-+ [ImPress: Securing DRAM Against Data-Disturbance Errors via Implicit Row-Press Mitigation](https://arxiv.org//abs/2407.16006)
++ [ImPress: Securing DRAM Against Data-Disturbance Errors via Implicit Row-Press Mitigation](https://arxiv.org/abs/2407.16006)

Moinuddin Qureshi, Anish Saxena, Aamer Jaleel

# 2024-07-20

-+ [Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)](https://arxiv.org//abs/2407.14937)
++ [Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)](https://arxiv.org/abs/2407.14937)

Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, NhatHai Phan

# 2024-07-18

-+ [DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving](https://arxiv.org//abs/2407.13690)
++ [DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving](https://arxiv.org/abs/2407.13690)

Yuxuan Tong, Xiwen Zhang, Rui Wang, Ruidong Wu, Junxian He

-+ [Visually Robust Adversarial Imitation Learning from Videos with Contrastive Learning](https://arxiv.org//abs/2407.12792)
++ [Visually Robust Adversarial Imitation Learning from Videos with Contrastive Learning](https://arxiv.org/abs/2407.12792)

Vittorio Giammarino, James Queeney, Ioannis Ch. Paschalidis
-+ [Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift](https://arxiv.org//abs/2407.13700)
++ [Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift](https://arxiv.org/abs/2407.13700)

Qingyuan Zeng, Yunpeng Gong, Min Jiang

-+ [Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models](https://arxiv.org//abs/2407.13757)
++ [Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models](https://arxiv.org/abs/2407.13757)

Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

-+ [Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models](https://arxiv.org//abs/2407.13252)
++ [Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models](https://arxiv.org/abs/2407.13252)

Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

-+ [PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving](https://arxiv.org//abs/2407.13111)
++ [PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving](https://arxiv.org/abs/2407.13111)

Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Shuyong Gao, Wenqiang Zhang

-+ [Krait: A Backdoor Attack Against Graph Prompt Tuning](https://arxiv.org//abs/2407.13068)
++ [Krait: A Backdoor Attack Against Graph Prompt Tuning](https://arxiv.org/abs/2407.13068)

Ying Song, Rita Singh, Balaji Palanisamy

-+ [Motif-Consistent Counterfactuals with Adversarial Refinement for Graph-Level Anomaly Detection](https://arxiv.org//abs/2407.13251)
++ [Motif-Consistent Counterfactuals with Adversarial Refinement for Graph-Level Anomaly Detection](https://arxiv.org/abs/2407.13251)

Chunjing Xiao, Shikang Pang, Wenxin Tai, Yanlong Huang, Goce Trajcevski, Fan Zhou

-+ [Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls](https://arxiv.org//abs/2407.13625)
++ [Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls](https://arxiv.org/abs/2407.13625)

Aras Selvi, Eleonora Kreacic, Mohsen Ghassemi, Vamsi Potluru, Tucker Balch, Manuela Veloso

-+ [BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization](https://arxiv.org//abs/2407.13928)
++ [BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization](https://arxiv.org/abs/2407.13928)

Ahmed Allam

-+ [A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks](https://arxiv.org//abs/2407.13863)
++ [A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks](https://arxiv.org/abs/2407.13863)

Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

-+ [Baba Is AI: Break the Rules to Beat the Benchmark](https://arxiv.org//abs/2407.13729)
++ [Baba Is AI: Break the Rules to Beat the Benchmark](https://arxiv.org/abs/2407.13729)

Nathan Cloos, Meagan Jens, Michelangelo Naim, Yen-Ling Kuo, Ignacio Cases, Andrei Barbu, Christopher J. Cueva

# 2024-07-17

-+ [Turning Generative Models Degenerate: The Power of Data Poisoning Attacks](https://arxiv.org//abs/2407.12281)
++ [Turning Generative Models Degenerate: The Power of Data Poisoning Attacks](https://arxiv.org/abs/2407.12281)

Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo

-+ [Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection](https://arxiv.org//abs/2407.12292)
++ [Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection](https://arxiv.org/abs/2407.12292)

Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

-+ [Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks](https://arxiv.org//abs/2407.12588)
++ [Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks](https://arxiv.org/abs/2407.12588)

Antoni Kowalczuk, Jan Dubiński, Atiyeh Ashari Ghomi, Yi Sui, George Stein, Jiapeng Wu, Jesse C. Cresswell, Franziska Boenisch, Adam Dziedzic

-+ [LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models](https://arxiv.org//abs/2407.12772)
++ [LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models](https://arxiv.org/abs/2407.12772)

Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu

-+ [Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective](https://arxiv.org//abs/2407.12443)
++ [Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective](https://arxiv.org/abs/2407.12443)

Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin

-+ [Contrastive Adversarial Training for Unsupervised Domain Adaptation](https://arxiv.org//abs/2407.12782)
++ [Contrastive Adversarial Training for Unsupervised Domain Adaptation](https://arxiv.org/abs/2407.12782)

Jiahong Chen, Zhilin Zhang, Lucy Li, Behzad Shahrasbi, Arjun Mishra

-+ [AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases](https://arxiv.org//abs/2407.12784)
++ [AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases](https://arxiv.org/abs/2407.12784)

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

-+ [Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion](https://arxiv.org//abs/2407.21032)
++ [Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion](https://arxiv.org/abs/2407.21032)

Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee

-+ [Direct Unlearning Optimization for Robust and Safe Text-to-Image Models](https://arxiv.org//abs/2407.21035)
++ [Direct Unlearning Optimization for Robust and Safe Text-to-Image Models](https://arxiv.org/abs/2407.21035)

Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee

# 2024-07-16

-+ [EARN Fairness: Explaining, Asking, Reviewing and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders](https://arxiv.org//abs/2407.11442)
++ [EARN Fairness: Explaining, Asking, Reviewing and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders](https://arxiv.org/abs/2407.11442)

Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf

-+ [Feature Inference Attack on Shapley Values](https://arxiv.org//abs/2407.11359)
++ [Feature Inference Attack on Shapley Values](https://arxiv.org/abs/2407.11359)

Xinjian Luo, Yangfan Jiang, Xiaokui Xiao

-+ [AEMIM: Adversarial Examples Meet Masked Image Modeling](https://arxiv.org//abs/2407.11537)
++ [AEMIM: Adversarial Examples Meet Masked Image Modeling](https://arxiv.org/abs/2407.11537)

Wenzhao Xiang, Chang Liu, Hang Su, Hongyang Yu

-+ [Enhancing TinyML Security: Study of Adversarial Attack Transferability](https://arxiv.org//abs/2407.11599)
++ [Enhancing TinyML Security: Study of Adversarial Attack Transferability](https://arxiv.org/abs/2407.11599)

Parin Shah, Yuvaraj Govindarajulu, Pavan Kulkarni, Manojkumar Parmar

-+ [Variational Randomized Smoothing for Sample-Wise Adversarial Robustness](https://arxiv.org//abs/2407.11844)
++ [Variational Randomized Smoothing for Sample-Wise Adversarial Robustness](https://arxiv.org/abs/2407.11844)

Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons

-+ [Does Refusal Training in LLMs Generalize to the Past Tense?](https://arxiv.org//abs/2407.11969)
++ [Does Refusal Training in LLMs Generalize to the Past Tense?](https://arxiv.org/abs/2407.11969)

Maksym Andriushchenko, Nicolas Flammarion

-+ [Model Inversion Attacks Through Target-Specific Conditional Diffusion Models](https://arxiv.org//abs/2407.11424)
++ [Model Inversion Attacks Through Target-Specific Conditional Diffusion Models](https://arxiv.org/abs/2407.11424)

Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng

-+ [Cycle Contrastive Adversarial Learning for Unsupervised image Deraining](https://arxiv.org//abs/2407.11750)
++ [Cycle Contrastive Adversarial Learning for Unsupervised image Deraining](https://arxiv.org/abs/2407.11750)

Chen Zhao, Weiling Cai, ChengWei Hu, Zheng Yuan

-+ [SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge](https://arxiv.org//abs/2407.11906)
++ [SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge](https://arxiv.org/abs/2407.11906)

Hao Ding, Tuxun Lu, Yuqian Zhang, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Mathias Unberath

-+ [IPA-NeRF: Illusory Poisoning Attack Against Neural Radiance Fields](https://arxiv.org//abs/2407.11921)
++ [IPA-NeRF: Illusory Poisoning Attack Against Neural Radiance Fields](https://arxiv.org/abs/2407.11921)

Wenxiang Jiang, Hanwei Zhang, Shuo Zhao, Zhongwen Guo, Hao Wang

-+ [UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening](https://arxiv.org//abs/2407.11372)
++ [UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening](https://arxiv.org/abs/2407.11372)

Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang

-+ [Relaxing Graph Transformers for Adversarial Attacks](https://arxiv.org//abs/2407.11764)
++ [Relaxing Graph Transformers for Adversarial Attacks](https://arxiv.org/abs/2407.11764)

Philipp Foth, Lukas Gosch, Simon Geisler, Leo Schwinn, Stephan Günnemann

-+ [One-Shot Unlearning of Personal Identities](https://arxiv.org//abs/2407.12069)
++ [One-Shot Unlearning of Personal Identities](https://arxiv.org/abs/2407.12069)

Thomas De Min, Subhankar Roy, Massimiliano Mancini, Stéphane Lathuilière, Elisa Ricci

-+ [Generalized Coverage for More Robust Low-Budget Active Learning](https://arxiv.org//abs/2407.12212)
++ [Generalized Coverage for More Robust Low-Budget Active Learning](https://arxiv.org/abs/2407.12212)

Wonho Bae, Junhyug Noh, Danica J. Sutherland
Sutherland # 2024-07-15 -+ [Backdoor Attacks against Image-to-Image Networks](https://arxiv.org//abs/2407.10445) ++ [Backdoor Attacks against Image-to-Image Networks](https://arxiv.org/abs/2407.10445) Wenbo Jiang, Hongwei Li, Jiaming He, Rui Zhang, Guowen Xu, Tianwei Zhang, Rongxing Lu -+ [Learning to Unlearn for Robust Machine Unlearning](https://arxiv.org//abs/2407.10494) ++ [Learning to Unlearn for Robust Machine Unlearning](https://arxiv.org/abs/2407.10494) Mark He Huang, Lin Geng Foo, Jun Liu -+ [Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks](https://arxiv.org//abs/2407.10825) ++ [Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks](https://arxiv.org/abs/2407.10825) Quang H. Nguyen, Nguyen Ngoc-Hieu, The-Anh Ta, Thanh Nguyen-Tang, Hoang Thanh-Tung, Khoa D. Doan -+ [Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks](https://arxiv.org//abs/2407.10867) ++ [Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks](https://arxiv.org/abs/2407.10867) Lukas Gosch, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Stephan Günnemann @@ -22015,370 +22015,370 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca Mark Russinovich, Ahmed Salem # 2024-07-14 -+ [Look Within, Why LLMs Hallucinate: A Causal Perspective](https://arxiv.org//abs/2407.10153) ++ [Look Within, Why LLMs Hallucinate: A Causal Perspective](https://arxiv.org/abs/2407.10153) He Li, Haoang Chi, Mingyu Liu, Wenjing Yang -+ [Augmented Neural Fine-Tuning for Efficient Backdoor Purification](https://arxiv.org//abs/2407.10052) ++ [Augmented Neural Fine-Tuning for Efficient Backdoor Purification](https://arxiv.org/abs/2407.10052) Nazmul Karim, Abdullah Al Arafat, Umar Khalid, Zhishan Guo, Nazanin Rahnavard -+ [CLIP-Guided Networks for Transferable Targeted Attacks](https://arxiv.org//abs/2407.10179) ++ [CLIP-Guided Networks for Transferable Targeted Attacks](https://arxiv.org/abs/2407.10179) Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia -+ [SENTINEL: Securing Indoor Localization against Adversarial Attacks with Capsule Neural Networks](https://arxiv.org//abs/2407.11091) ++ [SENTINEL: Securing Indoor Localization against Adversarial Attacks with Capsule Neural Networks](https://arxiv.org/abs/2407.11091) Danish Gufran, Pooja Anandathirtha, Sudeep Pasricha -+ [Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques](https://arxiv.org//abs/2407.11121) ++ [Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques](https://arxiv.org/abs/2407.11121) Rishika Bhagwatkar, Shravan Nayak, Reza Bayat, Alexis Roger, Daniel Z Kaplan, Pouya Bashivan, Irina Rish -+ [Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models](https://arxiv.org//abs/2407.11282) ++ [Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models](https://arxiv.org/abs/2407.11282) Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang # 2024-07-13 -+ [Partner in Crime: Boosting Targeted Poisoning Attacks against Federated Learning](https://arxiv.org//abs/2407.09958) ++ [Partner in Crime: Boosting Targeted Poisoning Attacks against Federated Learning](https://arxiv.org/abs/2407.09958) Shihua Sun, Shridatt Sugrim, 
Angelos Stavrou, Haining Wang -+ [SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images](https://arxiv.org//abs/2407.11073) ++ [SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images](https://arxiv.org/abs/2407.11073) Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu -+ [MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models](https://arxiv.org//abs/2407.09972) ++ [MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models](https://arxiv.org/abs/2407.09972) Shanghao Shi, Md Shahedul Haque, Abhijeet Parida, Chaoyu Zhang, Marius George Linguraru, Y.Thomas Hou, Syed Muhammad Anwar, Wenjing Lou # 2024-07-12 -+ [Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations](https://arxiv.org//abs/2407.08983) ++ [Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations](https://arxiv.org/abs/2407.08983) David N. Palacio, Daniel Rodriguez-Cardenas, Alejandro Velasco, Dipin Khati, Kevin Moran, Denys Poshyvanyk -+ [Robustness of LLMs to Perturbations in Text](https://arxiv.org//abs/2407.08989) ++ [Robustness of LLMs to Perturbations in Text](https://arxiv.org/abs/2407.08989) Ayush Singh, Navpreet Singh, Shubham Vatsal -+ [Refusing Safe Prompts for Multi-modal Large Language Models](https://arxiv.org//abs/2407.09050) ++ [Refusing Safe Prompts for Multi-modal Large Language Models](https://arxiv.org/abs/2407.09050) Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong -+ [Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training](https://arxiv.org//abs/2407.09121) ++ [Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training](https://arxiv.org/abs/2407.09121) Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu -+ [TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs](https://arxiv.org//abs/2407.09164) ++ [TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs](https://arxiv.org/abs/2407.09164) Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren -+ [Deep Adversarial Defense Against Multilevel-Lp Attacks](https://arxiv.org//abs/2407.09251) ++ [Deep Adversarial Defense Against Multilevel-Lp Attacks](https://arxiv.org/abs/2407.09251) Ren Wang, Yuxuan Li, Alfred Hero -+ [Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off](https://arxiv.org//abs/2407.09150) ++ [Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off](https://arxiv.org/abs/2407.09150) Levente Halmosi, Bálint Mohos, Márk Jelasity -+ [Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses](https://arxiv.org//abs/2407.08935) ++ [Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses](https://arxiv.org/abs/2407.08935) Yuxin Yang, Qiang Li, Jinyuan Jia, Yuan Hong, Binghui Wang -+ [PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning](https://arxiv.org//abs/2407.08954) ++ [PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning](https://arxiv.org/abs/2407.08954) Sizai Hou, Songze Li, Tayyebeh Jahani-Nezhad, Giuseppe Caire -+ [DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks](https://arxiv.org//abs/2407.08956) ++ [DeCE: Deceptive Cross-Entropy 
Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen

-+ [CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models](https://arxiv.org//abs/2407.09292)
++ [CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models](https://arxiv.org/abs/2407.09292)

Dong Shu, Mingyu Jin, Tianle Chen, Chong Zhang, Yongfeng Zhang

-+ [BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning](https://arxiv.org//abs/2407.09658)
++ [BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning](https://arxiv.org/abs/2407.09658)

Ning Wang, Shanghao Shi, Yang Xiao, Yimin Chen, Y. Thomas Hou, Wenjing Lou

-+ [MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants](https://arxiv.org//abs/2407.11072)
++ [MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants](https://arxiv.org/abs/2407.11072)

John Heibel, Daniel Lowd

-+ [ASTPrompter: Preference-Aligned Automated Language Model Red-Teaming to Generate Low-Perplexity Unsafe Prompts](https://arxiv.org//abs/2407.09447)
++ [ASTPrompter: Preference-Aligned Automated Language Model Red-Teaming to Generate Low-Perplexity Unsafe Prompts](https://arxiv.org/abs/2407.09447)

Amelia F. Hardy, Houjun Liu, Bernard Lange, Duncan Eddy, Mykel J. Kochenderfer

-+ [ASTPrompter: Preference-Aligned Automated Language Model Red-Teaming to Generate Low-Perplexity Unsafe Prompts](https://arxiv.org//abs/2407.09447)
++ [ASTPrompter: Preference-Aligned Automated Language Model Red-Teaming to Generate Low-Perplexity Unsafe Prompts](https://arxiv.org/abs/2407.09447)

Amelia F. Hardy, Houjun Liu, Allie Griffith, Bernard Lange, Duncan Eddy, Mykel J. Kochenderfer
# 2024-07-11

-+ [Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment](https://arxiv.org//abs/2407.08127)
++ [Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment](https://arxiv.org/abs/2407.08127)

Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

-+ [Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization](https://arxiv.org//abs/2407.08374)
++ [Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization](https://arxiv.org/abs/2407.08374)

Jinlong Li, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe

-+ [Rethinking the Threat and Accessibility of Adversarial Attacks against Face Recognition Systems](https://arxiv.org//abs/2407.08514)
++ [Rethinking the Threat and Accessibility of Adversarial Attacks against Face Recognition Systems](https://arxiv.org/abs/2407.08514)

Yuxin Cao, Yumeng Zhu, Derui Wang, Sheng Wen, Minhui Xue, Jin Lu, Hao Ge

-+ [Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space](https://arxiv.org//abs/2407.08572)
++ [Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space](https://arxiv.org/abs/2407.08572)

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

-+ [How to beat a Bayesian adversary](https://arxiv.org//abs/2407.08678)
++ [How to beat a Bayesian adversary](https://arxiv.org/abs/2407.08678)

Zihan Ding, Kexin Jin, Jonas Latz, Chenguang Liu

-+ [Model-agnostic clean-label backdoor mitigation in cybersecurity environments](https://arxiv.org//abs/2407.08159)
++ [Model-agnostic clean-label backdoor mitigation in cybersecurity environments](https://arxiv.org/abs/2407.08159)

Giorgio Severi, Simona Boboila, John Holodnak, Kendra Kratkiewicz, Rauf Izmailov, Alina Oprea

-+ [Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks](https://arxiv.org//abs/2407.08529)
++ [Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks](https://arxiv.org/abs/2407.08529)

Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

-+ [A Survey on the Application of Generative Adversarial Networks in Cybersecurity: Prospective, Direction and Open Research Scopes](https://arxiv.org//abs/2407.08839)
++ [A Survey on the Application of Generative Adversarial Networks in Cybersecurity: Prospective, Direction and Open Research Scopes](https://arxiv.org/abs/2407.08839)

Md Mashrur Arifin, Md Shoaib Ahmed, Tanmai Kumar Ghosh, Jun Zhuang, Jyh-haw Yeh

-+ [Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation](https://arxiv.org//abs/2407.08838)
++ [Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation](https://arxiv.org/abs/2407.08838)

D'Jeff K. Nkashama, Jordan Masakuna Félicien, Arian Soltani, Jean-Charles Verdier, Pierre-Martin Tardif, Marc Frappier, Froduald Kabanza
-+ [HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks](https://arxiv.org//abs/2407.08806)
++ [HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks](https://arxiv.org/abs/2407.08806)

Raffaele Mura, Giuseppe Floris, Luca Scionis, Giorgio Piras, Maura Pintor, Ambra Demontis, Giorgio Giacinto, Battista Biggio, Fabio Roli

# 2024-07-10

-+ [Tuning Vision-Language Models with Candidate Labels by Prompt Alignment](https://arxiv.org//abs/2407.07638)
++ [Tuning Vision-Language Models with Candidate Labels by Prompt Alignment](https://arxiv.org/abs/2407.07638)

Zhifang Zhang, Beibei Li

-+ [Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization](https://arxiv.org//abs/2407.07880)
++ [Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization](https://arxiv.org/abs/2407.07880)

Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

-+ [Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities](https://arxiv.org//abs/2407.07791)
++ [Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities](https://arxiv.org/abs/2407.07791)

Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu

-+ [Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison](https://arxiv.org//abs/2407.07840)
++ [Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison](https://arxiv.org/abs/2407.07840)

Qian Yang, Weixiang Yan, Aishwarya Agrawal

-+ [Mitigating Backdoor Attacks using Activation-Guided Model Editing](https://arxiv.org//abs/2407.07662)
++ [Mitigating Backdoor Attacks using Activation-Guided Model Editing](https://arxiv.org/abs/2407.07662)

Felix Hsieh, Huy H. Nguyen, AprilPyone MaungMaung, Dmitrii Usynin, Isao Echizen
# 2024-07-09

-+ [A Hybrid Training-time and Run-time Defense Against Adversarial Attacks in Modulation Classification](https://arxiv.org//abs/2407.06807)
++ [A Hybrid Training-time and Run-time Defense Against Adversarial Attacks in Modulation Classification](https://arxiv.org/abs/2407.06807)

Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Guisheng Liao, Ambra Demontis, Fabio Roli

-+ [Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective](https://arxiv.org//abs/2407.06992)
++ [Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective](https://arxiv.org/abs/2407.06992)

Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

-+ [Hiding Local Manipulations on SAR Images: a Counter-Forensic Attack](https://arxiv.org//abs/2407.07041)
++ [Hiding Local Manipulations on SAR Images: a Counter-Forensic Attack](https://arxiv.org/abs/2407.07041)

Sara Mandelli, Edoardo Daniele Cannas, Paolo Bestagini, Stefano Tebaldini, Stefano Tubaro

-+ [Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization](https://arxiv.org//abs/2407.06688)
++ [Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization](https://arxiv.org/abs/2407.06688)

Donghua Wang, Wen Yao, Tingsong Jiang, Chao Li, Xiaoqian Chen

-+ [Improving the Transferability of Adversarial Examples by Feature Augmentation](https://arxiv.org//abs/2407.06714)
++ [Improving the Transferability of Adversarial Examples by Feature Augmentation](https://arxiv.org/abs/2407.06714)

Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu, Xiaoqian Chen

-+ [AstroSpy: On detecting Fake Images in Astronomy via Joint Image-Spectral Representations](https://arxiv.org//abs/2407.06817)
++ [AstroSpy: On detecting Fake Images in Astronomy via Joint Image-Spectral Representations](https://arxiv.org/abs/2407.06817)

Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray

-+ [Towards Physics-informed Cyclic Adversarial Multi-PSF Lensless Imaging](https://arxiv.org//abs/2407.06727)
++ [Towards Physics-informed Cyclic Adversarial Multi-PSF Lensless Imaging](https://arxiv.org/abs/2407.06727)

Abeer Banerjee, Sanjay Singh

-+ [Event Trojan: Asynchronous Event-based Backdoor Attacks](https://arxiv.org//abs/2407.06838)
++ [Event Trojan: Asynchronous Event-based Backdoor Attacks](https://arxiv.org/abs/2407.06838)

Ruofei Wang, Qing Guo, Haoliang Li, Renjie Wan

-+ [Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning](https://arxiv.org//abs/2407.07221)
++ [Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning](https://arxiv.org/abs/2407.07221)

Yuqi Jia, Minghong Fang, Hongbin Liu, Jinghuai Zhang, Neil Zhenqiang Gong

-+ [Context-Masked Meta-Prompting for Privacy-Preserving LLM Adaptation in Finance](https://arxiv.org//abs/2407.18920)
++ [Context-Masked Meta-Prompting for Privacy-Preserving LLM Adaptation in Finance](https://arxiv.org/abs/2407.18920)

Sayash Raaj Hiraou

# 2024-07-08

-+ [$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning](https://arxiv.org//abs/2407.05557)
++ [$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning](https://arxiv.org/abs/2407.05557)

Mintong Kang, Bo Li

-+ [KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions](https://arxiv.org//abs/2407.05868)
++ [KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions](https://arxiv.org/abs/2407.05868)
Yanxu Zhu, Jinlin Xiao, Yuhang Wang, Jitao Sang

-+ [Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise](https://arxiv.org//abs/2407.05973)
++ [Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise](https://arxiv.org/abs/2407.05973)

Bidur Khanal, Tianhong Dai, Binod Bhattarai, Cristian Linte

-+ [Enhanced Model Robustness to Input Corruptions by Per-corruption Adaptation of Normalization Statistics](https://arxiv.org//abs/2407.06450)
++ [Enhanced Model Robustness to Input Corruptions by Per-corruption Adaptation of Normalization Statistics](https://arxiv.org/abs/2407.06450)

Elena Camuffo, Umberto Michieli, Simone Milani, Jijoong Moon, Mete Ozay

-+ [FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols](https://arxiv.org//abs/2407.06348)
++ [FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols](https://arxiv.org/abs/2407.06348)

Hongbo Wen, Hanzhi Liu, Jiaxin Song, Yanju Chen, Wenbo Guo, Yu Feng

-+ [Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment](https://arxiv.org//abs/2407.06443)
++ [Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment](https://arxiv.org/abs/2407.06443)

Qizhang Feng, Siva Rajesh Kasa, Santhosh Kumar Kasa, Hyokun Yun, Choon Hui Teo, Sravan Babu Bodapati

-+ [When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails](https://arxiv.org//abs/2407.06323)
++ [When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails](https://arxiv.org/abs/2407.06323)

Manish Nagireddy, Inkit Padhi, Soumya Ghosh, Prasanna Sattigeri

# 2024-07-07

-+ [Gradient Diffusion: A Perturbation-Resilient Gradient Leakage Attack](https://arxiv.org//abs/2407.05285)
++ [Gradient Diffusion: A Perturbation-Resilient Gradient Leakage Attack](https://arxiv.org/abs/2407.05285)

Xuan Liu, Siqi Cai, Qihua Zhou, Song Guo, Ruibin Li, Kaiwei Lin

-+ [Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense](https://arxiv.org//abs/2407.05396)
++ [Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense](https://arxiv.org/abs/2407.05396)

Qi Zhou, Zipeng Ye, Yubo Tang, Wenjian Luo, Yuhui Shi, Yan Jia

# 2024-07-06

-+ [BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records](https://arxiv.org//abs/2407.05213)
++ [BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records](https://arxiv.org/abs/2407.05213)

Weimin Lyu, Zexin Bi, Fusheng Wang, Chao Chen

# 2024-07-05

-+ [Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models](https://arxiv.org//abs/2407.04482)
++ [Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models](https://arxiv.org/abs/2407.04482)

Vyas Raina, Mark Gales

-+ [T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models](https://arxiv.org//abs/2407.04215)
++ [T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models](https://arxiv.org/abs/2407.04215)

Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen

-+ [Self-Supervised Representation Learning for Adversarial Attack Detection](https://arxiv.org//abs/2407.04382)
++ [Self-Supervised Representation Learning for Adversarial Attack Detection](https://arxiv.org/abs/2407.04382)
Yi Li, Plamen Angelov, Neeraj Suri

-+ [Late Breaking Results: Fortifying Neural Networks: Safeguarding Against Adversarial Attacks with Stochastic Computing](https://arxiv.org//abs/2407.04861)
++ [Late Breaking Results: Fortifying Neural Networks: Safeguarding Against Adversarial Attacks with Stochastic Computing](https://arxiv.org/abs/2407.04861)

Faeze S. Banitaba, Sercan Aygun, M. Hassan Najafi

-+ [Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape](https://arxiv.org//abs/2407.07917)
++ [Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape](https://arxiv.org/abs/2407.07917)

Tuan Nguyen, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong

# 2024-07-04

-+ [Adversarial Robustness of VAEs across Intersectional Subgroups](https://arxiv.org//abs/2407.03864)
++ [Adversarial Robustness of VAEs across Intersectional Subgroups](https://arxiv.org/abs/2407.03864)

Chethan Krishnamurthy Ramanaik, Arjun Roy, Eirini Ntoutsi

-+ [Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models](https://arxiv.org//abs/2407.04121)
++ [Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models](https://arxiv.org/abs/2407.04121)

Yuyan Chen, Qiang Fu, Yichen Yuan, Zhihao Wen, Ge Fan, Dayiheng Liu, Dongmei Zhang, Zhixu Li, Yanghua Xiao

-+ [Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers](https://arxiv.org//abs/2407.04151)
++ [Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers](https://arxiv.org/abs/2407.04151)

Terry Tong, Jiashu Xu, Qin Liu, Muhao Chen

-+ [Defense Against Syntactic Textual Backdoor Attacks with Token Substitution](https://arxiv.org//abs/2407.04179)
++ [Defense Against Syntactic Textual Backdoor Attacks with Token Substitution](https://arxiv.org/abs/2407.04179)

Xinglin Li, Xianwen He, Yao Li, Minhao Cheng

-+ [DART: Deep Adversarial Automated Red Teaming for LLM Safety](https://arxiv.org//abs/2407.03876)
++ [DART: Deep Adversarial Automated Red Teaming for LLM Safety](https://arxiv.org/abs/2407.03876)

Bojian Jiang, Yi Jing, Tianhao Shen, Qing Yang, Deyi Xiong

-+ [TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers](https://arxiv.org//abs/2407.03946)
++ [TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers](https://arxiv.org/abs/2407.03946)

Fatemeh Nourilenjan Nokabadi, Yann Batiste Pequignot, Jean-Francois Lalonde, Christian Gagné

-+ [Mitigating Low-Frequency Bias: Feature Recalibration and Frequency Attention Regularization for Adversarial Robustness](https://arxiv.org//abs/2407.04016)
++ [Mitigating Low-Frequency Bias: Feature Recalibration and Frequency Attention Regularization for Adversarial Robustness](https://arxiv.org/abs/2407.04016)

Kejia Zhang, Juanjuan Weng, Yuanzheng Cai, Zhiming Luo, Shaozi Li

-+ [Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs](https://arxiv.org//abs/2407.04108)
++ [Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs](https://arxiv.org/abs/2407.04108)

Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland

@@ -22410,16 +22410,16 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Xiang Ling, Zhiyu Wu, Bin Wang, Wei Deng, Jingzheng Wu, Shouling Ji, Tianyue Luo, Yanjun Wu

-+ [Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning](https://arxiv.org//abs/2407.03391)
++ [Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning](https://arxiv.org/abs/2407.03391)
Simon Ostermann, Kevin Baum, Christoph Endres, Julia Masloh, Patrick Schramowski

-+ [SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing](https://arxiv.org//abs/2407.02811)
++ [SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing](https://arxiv.org/abs/2407.02811)

Meiyu Zhong, Ravi Tandon

-+ [PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding](https://arxiv.org//abs/2407.02943)
++ [PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding](https://arxiv.org/abs/2407.02943)

Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou

@@ -22449,341 +22449,341 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Yash More, Prakhar Ganesh, Golnoosh Farnadi

-+ [Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval](https://arxiv.org//abs/2407.02395)
++ [Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval](https://arxiv.org/abs/2407.02395)

Jiexin Wang, Xitong Luo, Liuwen Cao, Hongkui He, Hailin Huang, Jiayuan Xie, Adam Jatowt, Yi Cai

-+ [Funny-Valen-Tine: Planning Solution Distribution Enhances Machine Abstract Reasoning Ability](https://arxiv.org//abs/2407.02688)
++ [Funny-Valen-Tine: Planning Solution Distribution Enhances Machine Abstract Reasoning Ability](https://arxiv.org/abs/2407.02688)

Ruizhuo Song, Beiming Yuan

# 2024-07-01

-+ [Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid](https://arxiv.org//abs/2407.01168)
++ [Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid](https://arxiv.org/abs/2407.01168)

Kalibinuer Tiliwalidi, Chengyin Hu, Weiwen Shi

-+ [Learning Robust 3D Representation from CLIP via Dual Denoising](https://arxiv.org//abs/2407.00905)
++ [Learning Robust 3D Representation from CLIP via Dual Denoising](https://arxiv.org/abs/2407.00905)

Shuqing Luo, Bowen Qu, Wei Gao

-+ [Semantic-guided Adversarial Diffusion Model for Self-supervised Shadow Removal](https://arxiv.org//abs/2407.01104)
++ [Semantic-guided Adversarial Diffusion Model for Self-supervised Shadow Removal](https://arxiv.org/abs/2407.01104)

Ziqi Zeng, Chen Zhao, Weiling Cai, Chenyu Dong

-+ [Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability](https://arxiv.org//abs/2407.01306)
++ [Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability](https://arxiv.org/abs/2407.01306)

Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, Reza Tourani

-+ [Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks](https://arxiv.org//abs/2407.00869)
++ [Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks](https://arxiv.org/abs/2407.00869)

Yue Zhou, Henry Peng Zou, Barbara Di Eugenio, Yang Zhang

# 2024-06-30

-+ [Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP](https://arxiv.org//abs/2407.00592)
++ [Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP](https://arxiv.org/abs/2407.00592)

Ayush Ranjan, Daniel Wen, Karthik Bhat
-+ [Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness](https://arxiv.org//abs/2407.00623)
++ [Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness](https://arxiv.org/abs/2407.00623)

Yiquan Li, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Bo Li, Chaowei Xiao

-+ [A Whole-Process Certifiably Robust Aggregation Method Against Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2407.00719)
++ [A Whole-Process Certifiably Robust Aggregation Method Against Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2407.00719)

Anqi Zhou, Yezheng Liu, Yidong Chai, Hongyi Zhu, Xinyue Ge, Yuanchun Jiang, Meng Wang

# 2024-06-29

-+ [Query-Efficient Hard-Label Black-Box Attack against Vision Transformers](https://arxiv.org//abs/2407.00389)
++ [Query-Efficient Hard-Label Black-Box Attack against Vision Transformers](https://arxiv.org/abs/2407.00389)

Chao Zhou, Xiaowen Shi, Yuan-Gen Wang

# 2024-06-28

-+ [Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness](https://arxiv.org//abs/2406.19622)
++ [Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness](https://arxiv.org/abs/2406.19622)

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

-+ [Deceptive Diffusion: Generating Synthetic Adversarial Examples](https://arxiv.org//abs/2406.19807)
++ [Deceptive Diffusion: Generating Synthetic Adversarial Examples](https://arxiv.org/abs/2406.19807)

Lucas Beerens, Catherine F. Higham, Desmond J. Higham

-+ [Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation](https://arxiv.org//abs/2406.20053)
++ [Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation](https://arxiv.org/abs/2406.20053)

Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt
-+ [IDT: Dual-Task Adversarial Attacks for Privacy Protection](https://arxiv.org//abs/2406.19642)
++ [IDT: Dual-Task Adversarial Attacks for Privacy Protection](https://arxiv.org/abs/2406.19642)

Pedro Faustini, Shakila Mahjabin Tonni, Annabelle McIver, Qiongkai Xu, Mark Dras

-+ [NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations](https://arxiv.org//abs/2406.19783)
++ [NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations](https://arxiv.org/abs/2406.19783)

Junkai Chen, Zhenhao Li, Xing Hu, Xin Xia

-+ [GM-DF: Generalized Multi-Scenario Deepfake Detection](https://arxiv.org//abs/2406.20078)
++ [GM-DF: Generalized Multi-Scenario Deepfake Detection](https://arxiv.org/abs/2406.20078)

Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

-+ [AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation](https://arxiv.org//abs/2406.19649)
++ [AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation](https://arxiv.org/abs/2406.19649)

Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

-+ [Backdoor Attack in Prompt-Based Continual Learning](https://arxiv.org//abs/2406.19753)
++ [Backdoor Attack in Prompt-Based Continual Learning](https://arxiv.org/abs/2406.19753)

Trang Nguyen, Anh Tran, Nhat Ho

-+ [Virtual Context: Enhancing Jailbreak Attacks with Special Token Injection](https://arxiv.org//abs/2406.19845)
++ [Virtual Context: Enhancing Jailbreak Attacks with Special Token Injection](https://arxiv.org/abs/2406.19845)

Yuqi Zhou, Lin Lu, Hanchi Sun, Pan Zhou, Lichao Sun

-+ [DiffuseDef: Improved Robustness to Adversarial Attacks](https://arxiv.org//abs/2407.00248)
++ [DiffuseDef: Improved Robustness to Adversarial Attacks](https://arxiv.org/abs/2407.00248)

Zhenhao Li, Marek Rei, Lucia Specia

# 2024-06-27

-+ [Rethinking harmless refusals when fine-tuning foundation models](https://arxiv.org//abs/2406.19552)
++ [Rethinking harmless refusals when fine-tuning foundation models](https://arxiv.org/abs/2406.19552)

Florin Pop, Judd Rosenblatt, Diogo Schwerz de Lucena, Michael Vaiana

-+ [Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols](https://arxiv.org//abs/2406.19466)
++ [Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols](https://arxiv.org/abs/2406.19466)

Wei Tong, Haoyu Chen, Jiacheng Niu, Sheng Zhong

-+ [CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network](https://arxiv.org//abs/2407.09550)
++ [CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network](https://arxiv.org/abs/2407.09550)

Jia-Hau Bai, Chi-Ting Liu, Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

# 2024-06-26

-+ [Poisoned LangChain: Jailbreak LLMs by LangChain](https://arxiv.org//abs/2406.18122)
++ [Poisoned LangChain: Jailbreak LLMs by LangChain](https://arxiv.org/abs/2406.18122)

Ziqiu Wang, Jun Liu, Shengkai Zhang, Yang Yang

-+ [MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization](https://arxiv.org//abs/2406.18379)
++ [MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization](https://arxiv.org/abs/2406.18379)

Haolang Lu, Hongrui Peng, Guoshun Nan, Jiaoyang Cui, Cheng Wang, Weifei Jin

-+ [WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models](https://arxiv.org//abs/2406.18510)
++ [WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models](https://arxiv.org/abs/2406.18510)
Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

-+ [SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance](https://arxiv.org//abs/2406.18118)
++ [SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance](https://arxiv.org/abs/2406.18118)

Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

# 2024-06-25

-+ [Machine Unlearning Fails to Remove Data Poisoning Attacks](https://arxiv.org//abs/2406.17216)
++ [Machine Unlearning Fails to Remove Data Poisoning Attacks](https://arxiv.org/abs/2406.17216)

Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel

-+ [Diffusion-based Adversarial Purification for Intrusion Detection](https://arxiv.org//abs/2406.17606)
++ [Diffusion-based Adversarial Purification for Intrusion Detection](https://arxiv.org/abs/2406.17606)

Mohamed Amine Merzouk, Erwan Beurier, Reda Yaich, Nora Boulahia-Cuppens, Frédéric Cuppens

-+ [A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens](https://arxiv.org//abs/2406.17378)
++ [A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens](https://arxiv.org/abs/2406.17378)

Zhijie Nie, Richong Zhang, Zhanyu Wu

-+ [Inherent Challenges of Post-Hoc Membership Inference for Large Language Models](https://arxiv.org//abs/2406.17975)
++ [Inherent Challenges of Post-Hoc Membership Inference for Large Language Models](https://arxiv.org/abs/2406.17975)

Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye

-+ [Banishing LLM Hallucinations Requires Rethinking Generalization](https://arxiv.org//abs/2406.17642)
++ [Banishing LLM Hallucinations Requires Rethinking Generalization](https://arxiv.org/abs/2406.17642)

Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

-+ [Detection of Synthetic Face Images: Accuracy, Robustness, Generalization](https://arxiv.org//abs/2406.17547)
++ [Detection of Synthetic Face Images: Accuracy, Robustness, Generalization](https://arxiv.org/abs/2406.17547)

Nela Petrzelkova, Jan Cech

-+ [A New Benchmark Dataset and Mixture-of-Experts Language Models for Adversarial Natural Language Inference in Vietnamese](https://arxiv.org//abs/2406.17716)
++ [A New Benchmark Dataset and Mixture-of-Experts Language Models for Adversarial Natural Language Inference in Vietnamese](https://arxiv.org/abs/2406.17716)

Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

# 2024-06-24

-+ [UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class Identification](https://arxiv.org//abs/2406.16501)
++ [UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class Identification](https://arxiv.org/abs/2406.16501)

Alvaro Lopez Pellicer, Kittipos Giatgong, Yi Li, Neeraj Suri, Plamen Angelov

-+ [Evaluating the Robustness of Deep-Learning Algorithm-Selection Models by Evolving Adversarial Instances](https://arxiv.org//abs/2406.16609)
++ [Evaluating the Robustness of Deep-Learning Algorithm-Selection Models by Evolving Adversarial Instances](https://arxiv.org/abs/2406.16609)
Emma Hart, Quentin Renau, Kevin Sim, Mohamad Alissa

-+ [Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization](https://arxiv.org//abs/2406.16743)
++ [Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization](https://arxiv.org/abs/2406.16743)

Zhengyue Zhao, Xiaoyun Zhang, Kaidi Xu, Xing Hu, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

-+ [Evaluating and Analyzing Relationship Hallucinations in LVLMs](https://arxiv.org//abs/2406.16449)
++ [Evaluating and Analyzing Relationship Hallucinations in LVLMs](https://arxiv.org/abs/2406.16449)

Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

-+ [Improving robustness to corruptions with multiplicative weight perturbations](https://arxiv.org//abs/2406.16540)
++ [Improving robustness to corruptions with multiplicative weight perturbations](https://arxiv.org/abs/2406.16540)

Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski

-+ [Noisy Neighbors: Efficient membership inference attacks against LLMs](https://arxiv.org//abs/2406.16565)
++ [Noisy Neighbors: Efficient membership inference attacks against LLMs](https://arxiv.org/abs/2406.16565)

Filippo Galli, Luca Melis, Tommaso Cucinotta

-+ [BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models](https://arxiv.org//abs/2406.17092)
++ [BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models](https://arxiv.org/abs/2406.17092)

Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia

-+ [Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models](https://arxiv.org//abs/2406.17115)
++ [Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models](https://arxiv.org/abs/2406.17115)

Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen

-+ [Automated Adversarial Discovery for Safety Classifiers](https://arxiv.org//abs/2406.17104)
++ [Automated Adversarial Discovery for Safety Classifiers](https://arxiv.org/abs/2406.17104)

Yash Kumar Lal, Preethi Lahoti, Aradhana Sinha, Yao Qin, Ananth Balashankar

# 2024-06-23

-+ [Towards unlocking the mystery of adversarial fragility of neural networks](https://arxiv.org//abs/2406.16200)
++ [Towards unlocking the mystery of adversarial fragility of neural networks](https://arxiv.org/abs/2406.16200)

Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

-+ [CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack](https://arxiv.org//abs/2406.16125)
++ [CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack](https://arxiv.org/abs/2406.16125)

Hanfeng Xia, Haibo Hong, Ruili Wang

-+ [Blind Baselines Beat Membership Inference Attacks for Foundation Models](https://arxiv.org//abs/2406.16201)
++ [Blind Baselines Beat Membership Inference Attacks for Foundation Models](https://arxiv.org/abs/2406.16201)

Debeshee Das, Jie Zhang, Florian Tramèr

# 2024-06-22

-+ [Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs](https://arxiv.org//abs/2406.15927)
++ [Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs](https://arxiv.org/abs/2406.15927)

Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal

-+ [EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation](https://arxiv.org//abs/2406.15863)
++ [EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation](https://arxiv.org/abs/2406.15863)
Tianyu Wei, Shanmin Pang, Qi Guo, Yizhuo Ma, Qing Guo

-+ [Federated Adversarial Learning for Robust Autonomous Landing Runway Detection](https://arxiv.org//abs/2406.15925)
++ [Federated Adversarial Learning for Robust Autonomous Landing Runway Detection](https://arxiv.org/abs/2406.15925)

Yi Li, Plamen Angelov, Zhengxin Yu, Alvaro Lopez Pellicer, Neeraj Suri

-+ [Large Language Models for Link Stealing Attacks Against Graph Neural Networks](https://arxiv.org//abs/2406.16963)
++ [Large Language Models for Link Stealing Attacks Against Graph Neural Networks](https://arxiv.org/abs/2406.16963)

Faqian Guan, Tianqing Zhu, Hui Sun, Wanlei Zhou, Philip S. Yu

-+ [MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?](https://arxiv.org//abs/2406.17806)
++ [MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?](https://arxiv.org/abs/2406.17806)

Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

# 2024-06-21

-+ [From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking](https://arxiv.org//abs/2406.14859)
++ [From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking](https://arxiv.org/abs/2406.14859)

Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei

-+ [Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning](https://arxiv.org//abs/2406.14962)
++ [Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning](https://arxiv.org/abs/2406.14962)

Suyi Li, Chenyi Jiang, Shidong Wang, Yang Long, Zheng Zhang, Haofeng Zhang

-+ [DataFreeShield: Defending Adversarial Attacks without Training Data](https://arxiv.org//abs/2406.15635)
++ [DataFreeShield: Defending Adversarial Attacks without Training Data](https://arxiv.org/abs/2406.15635)

Hyeyoon Lee, Kanghyun Choi, Dain Kwon, Sunjong Park, Mayoore Selvarasa Jaiswal, Noseong Park, Jonghyun Choi, Jinho Lee

-+ [Backdooring Bias (B^2) into Stable Diffusion Models](https://arxiv.org//abs/2406.15213)
++ [Backdooring Bias (B^2) into Stable Diffusion Models](https://arxiv.org/abs/2406.15213)

Ali Naseh, Jaechul Roh, Eugene Bagdasaryan, Amir Houmansadr

-+ [Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models](https://arxiv.org//abs/2406.14855)
++ [Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models](https://arxiv.org/abs/2406.14855)

Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

# 2024-06-20

-+ [Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective](https://arxiv.org//abs/2406.14023)
++ [Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective](https://arxiv.org/abs/2406.14023)

Yuchen Wen, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng

-+ [Enhancing robustness of data-driven SHM models: adversarial training with circle loss](https://arxiv.org//abs/2406.14232)
++ [Enhancing robustness of data-driven SHM models: adversarial training with circle loss](https://arxiv.org/abs/2406.14232)

Xiangli Yang, Xijie Deng, Hanwei Zhang, Yang Zou, Jianxi Yang

-+ [ObscurePrompt: Jailbreaking Large Language Models via Obscure Input](https://arxiv.org//abs/2406.13662)
++ [ObscurePrompt: Jailbreaking Large Language Models via Obscure Input](https://arxiv.org/abs/2406.13662)

Yue Huang, Jingyu Tang, Dongping Chen, Bingda Tang, Yao Wan, Lichao Sun, Xiangliang Zhang
-+ [MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization](https://arxiv.org//abs/2406.14259)
++ [MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization](https://arxiv.org/abs/2406.14259)

Zhaozhe Hu, Jia-Li Yin, Bin Chen, Luojun Lin, Bo-Hao Chen, Ximeng Liu

-+ [Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks](https://arxiv.org//abs/2406.13920)
++ [Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks](https://arxiv.org/abs/2406.13920)

Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu

-+ [Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning](https://arxiv.org//abs/2406.14217)
++ [Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning](https://arxiv.org/abs/2406.14217)

Yujing Wang, Hainan Zhang, Sijia Wen, Wangjie Qiu, Binghui Guo

-+ [Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization](https://arxiv.org//abs/2406.14329)
++ [Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization](https://arxiv.org/abs/2406.14329)

Tanapat Ratchatorn, Masayuki Tanaka

-+ [Adversaries Can Misuse Combinations of Safe Models](https://arxiv.org//abs/2406.14595)
++ [Adversaries Can Misuse Combinations of Safe Models](https://arxiv.org/abs/2406.14595)

Erik Jones, Anca Dragan, Jacob Steinhardt

@@ -22793,589 +22793,589 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu

-+ [Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems](https://arxiv.org//abs/2406.14545)
++ [Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems](https://arxiv.org/abs/2406.14545)

Đorđe Klisura, Anthony Rios

# 2024-06-19

-+ [AGSOA:Graph Neural Network Targeted Attack Based on Average Gradient and Structure Optimization](https://arxiv.org//abs/2406.13228)
++ [AGSOA:Graph Neural Network Targeted Attack Based on Average Gradient and Structure Optimization](https://arxiv.org/abs/2406.13228)

Yang Chen, Bin Zhou

-+ [Bayes' capacity as a measure for reconstruction attacks in federated learning](https://arxiv.org//abs/2406.13569)
++ [Bayes' capacity as a measure for reconstruction attacks in federated learning](https://arxiv.org/abs/2406.13569)

Sayan Biswas, Mark Dras, Pedro Faustini, Natasha Fernandes, Annabelle McIver, Catuscia Palamidessi, Parastoo Sadeghi

-+ [Towards Trustworthy Unsupervised Domain Adaptation: A Representation Learning Perspective for Enhancing Robustness, Discrimination, and Generalization](https://arxiv.org//abs/2406.13180)
++ [Towards Trustworthy Unsupervised Domain Adaptation: A Representation Learning Perspective for Enhancing Robustness, Discrimination, and Generalization](https://arxiv.org/abs/2406.13180)

Jia-Li Yin, Haoyuan Zheng, Ximeng Liu

-+ [Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN](https://arxiv.org//abs/2406.13778)
++ [Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN](https://arxiv.org/abs/2406.13778)

Pablo Moriano, Steven C. Hespeler, Mingyan Li, Robert A. Bridges
# 2024-06-18

-+ [CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models](https://arxiv.org//abs/2406.12257)
++ [CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models](https://arxiv.org/abs/2406.12257)

Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

-+ [Adversarial Attacks on Large Language Models in Medicine](https://arxiv.org//abs/2406.12259)
++ [Adversarial Attacks on Large Language Models in Medicine](https://arxiv.org/abs/2406.12259)

Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu

-+ [Stealth edits for provably fixing or attacking large language models](https://arxiv.org//abs/2406.12670)
++ [Stealth edits for provably fixing or attacking large language models](https://arxiv.org/abs/2406.12670)

Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

-+ [UIFV: Data Reconstruction Attack in Vertical Federated Learning](https://arxiv.org//abs/2406.12588)
++ [UIFV: Data Reconstruction Attack in Vertical Federated Learning](https://arxiv.org/abs/2406.12588)

Jirui Yang, Peng Chen, Zhihui Lu, Qiang Duan, Yubing Bao

-+ [Can Go AIs be adversarially robust?](https://arxiv.org//abs/2406.12843)
++ [Can Go AIs be adversarially robust?](https://arxiv.org/abs/2406.12843)

Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave

-+ [ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations](https://arxiv.org//abs/2406.12223)
++ [ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations](https://arxiv.org/abs/2406.12223)

Yunze Xiao, Yujia Hu, Kenny Tsu Wei Choo, Roy Ka-wei Lee

-+ [Defending Against Social Engineering Attacks in the Age of LLMs](https://arxiv.org//abs/2406.12263)
++ [Defending Against Social Engineering Attacks in the Age of LLMs](https://arxiv.org/abs/2406.12263)

Lin Ai, Tharindu Kumarage, Amrita Bhattacharjee, Zizhou Liu, Zheng Hui, Michael Davinroy, James Cook, Laura Cassani, Kirill Trapeznikov, Matthias Kirchner, Arslan Basharat, Anthony Hoogs, Joshua Garland, Huan Liu, Julia Hirschberg

-+ [Adversarial Attacks on Multimodal Agents](https://arxiv.org//abs/2406.12814)
++ [Adversarial Attacks on Multimodal Agents](https://arxiv.org/abs/2406.12814)

Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan

-+ [Attack and Defense of Deep Learning Models in the Field of Web Attack Detection](https://arxiv.org//abs/2406.12605)
++ [Attack and Defense of Deep Learning Models in the Field of Web Attack Detection](https://arxiv.org/abs/2406.12605)

Lijia Shi, Shihao Dong

-+ [MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification](https://arxiv.org//abs/2406.13066)
++ [MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification](https://arxiv.org/abs/2406.13066)

Harrison Gietz, Jugal Kalita

-+ [NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks](https://arxiv.org//abs/2406.13073)
++ [NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks](https://arxiv.org/abs/2406.13073)

Md Hasan Shahriar, Ning Wang, Y. Thomas Hou, Wenjing Lou
-+ [DLP: towards active defense against backdoor attacks with decoupled learning process](https://arxiv.org//abs/2406.13098)
++ [DLP: towards active defense against backdoor attacks with decoupled learning process](https://arxiv.org/abs/2406.13098)

Zonghao Ying, Bin Wu

-+ [Saliency Attention and Semantic Similarity-Driven Adversarial Perturbation](https://arxiv.org//abs/2406.19413)
++ [Saliency Attention and Semantic Similarity-Driven Adversarial Perturbation](https://arxiv.org/abs/2406.19413)

Hetvi Waghela, Jaydip Sen, Sneha Rakshit

-+ [$k$-Submodular Interdiction Problems under Distributional Risk-Receptiveness and Robustness: Application to Machine Learning](https://arxiv.org//abs/2406.13023)
++ [$k$-Submodular Interdiction Problems under Distributional Risk-Receptiveness and Robustness: Application to Machine Learning](https://arxiv.org/abs/2406.13023)

Seonghun Park, Manish Bansal

-+ [What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering](https://arxiv.org//abs/2406.12334)
++ [What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering](https://arxiv.org/abs/2406.12334)

Federico Errica, Giuseppe Siracusano, Davide Sanvito, Roberto Bifulco

-+ [Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis](https://arxiv.org//abs/2406.12719)
++ [Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis](https://arxiv.org/abs/2406.12719)

Kushal Raj Bhandari, Sixue Xing, Soham Dan, Jianxi Gao

# 2024-06-17

-+ [Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection](https://arxiv.org//abs/2406.11260)
++ [Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection](https://arxiv.org/abs/2406.11260)

Sungwon Park, Sungwon Han, Meeyoung Cha

-+ [Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack](https://arxiv.org//abs/2406.11682)
++ [Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack](https://arxiv.org/abs/2406.11682)

Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li

-+ ["Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak](https://arxiv.org//abs/2406.11668)
++ ["Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak](https://arxiv.org/abs/2406.11668)

Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Jiayi Mao, Xueqi Cheng

-+ [Harmonizing Feature Maps: A Graph Convolutional Approach for Enhancing Adversarial Robustness](https://arxiv.org//abs/2406.11576)
++ [Harmonizing Feature Maps: A Graph Convolutional Approach for Enhancing Adversarial Robustness](https://arxiv.org/abs/2406.11576)

Kejia Zhang, Juanjuan Weng, Junwei Wu, Guoqing Yang, Shaozi Li, Zhiming Luo

-+ [A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving](https://arxiv.org//abs/2406.11707)
++ [A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving](https://arxiv.org/abs/2406.11707)

Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jianping Wang

-+ [Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness](https://arxiv.org//abs/2406.11458)
++ [Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness](https://arxiv.org/abs/2406.11458)

Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld

-+ [Obfuscating IoT Device Scanning Activity via Adversarial Example Generation](https://arxiv.org//abs/2406.11515)
++ [Obfuscating IoT Device Scanning Activity via Adversarial Example Generation](https://arxiv.org/abs/2406.11515)
Haocong Li, Yaxin Zhang, Long Cheng, Wenjia Niu, Haining Wang, Qiang Li

-+ [Is poisoning a real threat to LLM alignment? Maybe more so than you think](https://arxiv.org//abs/2406.12091)
++ [Is poisoning a real threat to LLM alignment? Maybe more so than you think](https://arxiv.org/abs/2406.12091)

Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang

-+ [Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI](https://arxiv.org//abs/2406.12027)
++ [Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI](https://arxiv.org/abs/2406.12027)

Robert Hönig, Javier Rando, Nicholas Carlini, Florian Tramèr

-+ [ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates](https://arxiv.org//abs/2406.12935)
++ [ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates](https://arxiv.org/abs/2406.12935)

Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

# 2024-06-16

-+ [KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs](https://arxiv.org//abs/2406.10802)
++ [KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs](https://arxiv.org/abs/2406.10802)

Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

-+ [Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition](https://arxiv.org//abs/2406.10932)
++ [Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition](https://arxiv.org/abs/2406.10932)

Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Weiping Wen

-+ [RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models](https://arxiv.org//abs/2406.11020)
++ [RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models](https://arxiv.org/abs/2406.11020)

Yuqing Wang, Yun Zhao

-+ [Imperceptible Face Forgery Attack via Adversarial Semantic Mask](https://arxiv.org//abs/2406.10887)
++ [Imperceptible Face Forgery Attack via Adversarial Semantic Mask](https://arxiv.org/abs/2406.10887)

Decheng Liu, Qixuan Su, Chunlei Peng, Nannan Wang, Xinbo Gao

-+ [Improving Adversarial Robustness via Decoupled Visual Representation Masking](https://arxiv.org//abs/2406.10933)
++ [Improving Adversarial Robustness via Decoupled Visual Representation Masking](https://arxiv.org/abs/2406.10933)

Decheng Liu, Tao Chen, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

# 2024-06-15

-+ [Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions](https://arxiv.org//abs/2406.10573)
++ [Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions](https://arxiv.org/abs/2406.10573)

Xiao Yang, Gaolei Li, Jianhua Li

-+ [Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models](https://arxiv.org//abs/2406.10630)
++ [Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models](https://arxiv.org/abs/2406.10630)

Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen

-+ [Trading Devil: Robust backdoor attack via Stochastic investment models and Bayesian approach](https://arxiv.org//abs/2406.10719)
++ [Trading Devil: Robust backdoor attack via Stochastic investment models and Bayesian approach](https://arxiv.org/abs/2406.10719)
Orson Mengara

-+ [E-SAGE: Explainability-based Defense Against Backdoor Attacks on Graph Neural Networks](https://arxiv.org//abs/2406.10655)
++ [E-SAGE: Explainability-based Defense Against Backdoor Attacks on Graph Neural Networks](https://arxiv.org/abs/2406.10655)

Dingqiang Yuan, Xiaohua Xu, Lei Yu, Tongchang Han, Rongchang Li, Meng Han

# 2024-06-14

-+ [Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model](https://arxiv.org//abs/2406.09976)
++ [Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model](https://arxiv.org/abs/2406.09976)

Siemen Herremans, Ali Anwar, Siegfried Mercelis

-+ [Bag of Lies: Robustness in Continuous Pre-training BERT](https://arxiv.org//abs/2406.09967)
++ [Bag of Lies: Robustness in Continuous Pre-training BERT](https://arxiv.org/abs/2406.09967)

Ine Gevers, Walter Daelemans

-+ [Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks](https://arxiv.org//abs/2406.09836)
++ [Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks](https://arxiv.org/abs/2406.09836)

Zhiwei Zhang, Minhua Lin, Junjie Xu, Zongyu Wu, Enyan Dai, Suhang Wang

-+ [Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis](https://arxiv.org//abs/2406.10090)
++ [Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis](https://arxiv.org/abs/2406.10090)

Zhang Chen, Luca Demetrio, Srishti Gupta, Xiaoyi Feng, Zhaoqiang Xia, Antonio Emanuele Cinà, Maura Pintor, Luca Oneto, Ambra Demontis, Battista Biggio, Fabio Roli

-+ [Semantic Membership Inference Attack against Large Language Models](https://arxiv.org//abs/2406.10218)
++ [Semantic Membership Inference Attack against Large Language Models](https://arxiv.org/abs/2406.10218)

Hamid Mozaffari, Virendra J. Marathe

-+ [Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models](https://arxiv.org//abs/2406.09669)
++ [Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models](https://arxiv.org/abs/2406.09669)

Changjiang Li, Ren Pang, Bochuan Cao, Jinghui Chen, Fenglong Ma, Shouling Ji, Ting Wang

-+ [PRISM: A Design Framework for Open-Source Foundation Model Safety](https://arxiv.org//abs/2406.10415)
++ [PRISM: A Design Framework for Open-Source Foundation Model Safety](https://arxiv.org/abs/2406.10415)

Terrence Neumann, Bryan Jones

-+ [Adaptive Randomized Smoothing: Certifying Multi-Step Defences against Adversarial Examples](https://arxiv.org//abs/2406.10427)
++ [Adaptive Randomized Smoothing: Certifying Multi-Step Defences against Adversarial Examples](https://arxiv.org/abs/2406.10427)

Saiyue Lyu, Shadab Shaikh, Frederick Shpilevskiy, Evan Shelhamer, Mathias Lécuyer

# 2024-06-13

-+ [Is Diffusion Model Safe? Severe Data Leakage via Gradient-Guided Diffusion Model](https://arxiv.org//abs/2406.09484)
++ [Is Diffusion Model Safe? Severe Data Leakage via Gradient-Guided Diffusion Model](https://arxiv.org/abs/2406.09484)
Jiayang Meng, Tao Huang, Hong Chen, Cuiping Li

-+ [A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability](https://arxiv.org//abs/2406.09031)
++ [A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability](https://arxiv.org/abs/2406.09031)

Pengyun Wang, Junyu Luo, Yanxin Shen, Ming Zhang, Shaoen Qin, Siyu Heng, Xiao Luo

# 2024-06-12

-+ [Graph Transductive Defense: a Two-Stage Defense for Graph Membership Inference Attacks](https://arxiv.org//abs/2406.07917)
++ [Graph Transductive Defense: a Two-Stage Defense for Graph Membership Inference Attacks](https://arxiv.org/abs/2406.07917)

Peizhi Niu, Chao Pan, Siheng Chen, Olgica Milenkovic

-+ [Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding](https://arxiv.org//abs/2406.08200)
++ [Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding](https://arxiv.org/abs/2406.08200)

Rui Wang, Liping Chen, Kong Aik Lee, Zhen-Hua Ling

-+ [Adversarial Evasion Attack Efficiency against Large Language Models](https://arxiv.org//abs/2406.08050)
++ [Adversarial Evasion Attack Efficiency against Large Language Models](https://arxiv.org/abs/2406.08050)

João Vitorino, Eva Maia, Isabel Praça

-+ [Are Objective Explanatory Evaluation metrics Trustworthy? An Adversarial Analysis](https://arxiv.org//abs/2406.07820)
++ [Are Objective Explanatory Evaluation metrics Trustworthy? An Adversarial Analysis](https://arxiv.org/abs/2406.07820)

Prithwijit Chowdhury, Mohit Prabhushankar, Ghassan AlRegib, Mohamed Deriche

-+ [Adversarial Patch for 3D Local Feature Extractor](https://arxiv.org//abs/2406.08102)
++ [Adversarial Patch for 3D Local Feature Extractor](https://arxiv.org/abs/2406.08102)

Yu Wen Pao, Li Chang Lai, Hong-Yi Lin

-+ [Transformation-Dependent Adversarial Attacks](https://arxiv.org//abs/2406.08443)
++ [Transformation-Dependent Adversarial Attacks](https://arxiv.org/abs/2406.08443)

Yaoteng Tan, Zikui Cai, M. Salman Asif
-+ [On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models](https://arxiv.org//abs/2406.08486)
++ [On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models](https://arxiv.org/abs/2406.08486)

Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Shahbaz Khan

-+ [RRLS : Robust Reinforcement Learning Suite](https://arxiv.org//abs/2406.08406)
++ [RRLS : Robust Reinforcement Learning Suite](https://arxiv.org/abs/2406.08406)

Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson

-+ [Genetic Column Generation for Computing Lower Bounds for Adversarial Classification](https://arxiv.org//abs/2406.08331)
++ [Genetic Column Generation for Computing Lower Bounds for Adversarial Classification](https://arxiv.org/abs/2406.08331)

Maximilian Penka

-+ [AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer](https://arxiv.org//abs/2406.08298)
++ [AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer](https://arxiv.org/abs/2406.08298)

Yitao Xu, Tong Zhang, Sabine Süsstrunk
-+ [On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models](https://arxiv.org//abs/2406.08486)
++ [On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models](https://arxiv.org/abs/2406.08486)

Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Shahbaz Khan

-+ [Genetic Column Generation for Computing Lower Bounds for Adversarial Classification](https://arxiv.org//abs/2406.08331)
++ [Genetic Column Generation for Computing Lower Bounds for Adversarial Classification](https://arxiv.org/abs/2406.08331)

Maximilian Penka

-+ [Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey](https://arxiv.org//abs/2406.07973)
++ [Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey](https://arxiv.org/abs/2406.07973)

Shang Wang, Tianqing Zhu, Bo Liu, Ding Ming, Xu Guo, Dayong Ye, Wanlei Zhou

-+ [I Don't Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors](https://arxiv.org//abs/2406.10285)
++ [I Don't Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors](https://arxiv.org/abs/2406.10285)

Zijin Lin, Yue Zhao, Kai Chen, Jinwen He

-+ [Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries](https://arxiv.org//abs/2406.10280)
++ [Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries](https://arxiv.org/abs/2406.10280)

Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin

# 2024-06-11

-+ [Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples](https://arxiv.org//abs/2406.06967)
++ [Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples](https://arxiv.org/abs/2406.06967)

Kailas Dayanandan, Anand Sinha, Brejesh Lall

-+ [Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study](https://arxiv.org//abs/2406.07057)
++ [Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study](https://arxiv.org/abs/2406.07057)

Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu

-+ [Merging Improves Self-Critique Against Jailbreak Attacks](https://arxiv.org//abs/2406.07188)
++ [Merging Improves Self-Critique Against Jailbreak Attacks](https://arxiv.org/abs/2406.07188)

Victor Gallego

-+ [AudioMarkBench: Benchmarking Robustness of Audio Watermarking](https://arxiv.org//abs/2406.06979)
++ [AudioMarkBench: Benchmarking Robustness of Audio Watermarking](https://arxiv.org/abs/2406.06979)

Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

-+ [Erasing Radio Frequency Fingerprinting via Active Adversarial Perturbation](https://arxiv.org//abs/2406.07349)
++ [Erasing Radio Frequency Fingerprinting via Active Adversarial Perturbation](https://arxiv.org/abs/2406.07349)

Zhaoyi Lu, Wenchao Xu, Ming Tu, Xin Xie, Cunqing Hua, Nan Cheng

-+ [Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions](https://arxiv.org//abs/2406.07685)
++ [Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions](https://arxiv.org/abs/2406.07685)

Leonardo Cotta, Chris J. Maddison
-+ [Adversarial Machine Unlearning](https://arxiv.org//abs/2406.07687)
++ [Adversarial Machine Unlearning](https://arxiv.org/abs/2406.07687)

Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, Yang Liu

-+ [Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions](https://arxiv.org//abs/2406.07685)
++ [Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions](https://arxiv.org/abs/2406.07685)

Leonardo Cotta, Chris J. Maddison

-+ [Adversarial Machine Unlearning](https://arxiv.org//abs/2406.07687)
++ [Adversarial Machine Unlearning](https://arxiv.org/abs/2406.07687)

Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, Yang Liu

# 2024-06-10

-+ [Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models](https://arxiv.org//abs/2406.05948)
++ [Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models](https://arxiv.org/abs/2406.05948)

Xi Li, Yusen Zhang, Renze Lou, Chen Wu, Jiaqi Wang

-+ [Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning](https://arxiv.org//abs/2406.06207)
++ [Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning](https://arxiv.org/abs/2406.06207)

Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, Xiangliang Zhang

-+ [Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness](https://arxiv.org//abs/2406.06792)
++ [Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness](https://arxiv.org/abs/2406.06792)

Dingrong Wang, Hitesh Sapkota, Zhiqiang Tao, Qi Yu

-+ [An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection](https://arxiv.org//abs/2406.06822)
++ [An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection](https://arxiv.org/abs/2406.06822)

Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, Yuan Hong

-+ [A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures](https://arxiv.org//abs/2406.06852)
++ [A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures](https://arxiv.org/abs/2406.06852)

Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan

# 2024-06-09

-+ [PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection](https://arxiv.org//abs/2406.05826)
++ [PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection](https://arxiv.org/abs/2406.05826)

Wei Li, Pin-Yu Chen, Sijia Liu, Ren Wang

-+ [Injecting Undetectable Backdoors in Deep Learning and Language Models](https://arxiv.org//abs/2406.05660)
++ [Injecting Undetectable Backdoors in Deep Learning and Language Models](https://arxiv.org/abs/2406.05660)

Alkis Kalavasis, Amin Karbasi, Argyris Oikonomou, Katerina Sotiraki, Grigoris Velegkas, Manolis Zampetakis

-+ [DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks](https://arxiv.org//abs/2406.07580)
++ [DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks](https://arxiv.org/abs/2406.07580)

Zhiyu Zhu, Jiayu Zhang, Xinyi Wang, Zhibo Jin, Huaming Chen

-+ [Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security](https://arxiv.org//abs/2406.07561)
++ [Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security](https://arxiv.org/abs/2406.07561)

Leroy Jacob Valencia
-+ [DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks](https://arxiv.org//abs/2406.07580)
++ [DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks](https://arxiv.org/abs/2406.07580)

Zhiyu Zhu, Jiayu Zhang, Xinyi Wang, Zhibo Jin, Huaming Chen

# 2024-06-08

-+ [SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner](https://arxiv.org//abs/2406.05498)
++ [SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner](https://arxiv.org/abs/2406.05498)

Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel

-+ [Enhancing Adversarial Transferability via Information Bottleneck Constraints](https://arxiv.org//abs/2406.05531)
++ [Enhancing Adversarial Transferability via Information Bottleneck Constraints](https://arxiv.org/abs/2406.05531)

Biqing Qi, Junqi Gao, Jianxing Liu, Ligang Wu, Bowen Zhou

-+ [Exploring Adversarial Robustness of Deep State Space Models](https://arxiv.org//abs/2406.05532)
++ [Exploring Adversarial Robustness of Deep State Space Models](https://arxiv.org/abs/2406.05532)

Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou

-+ [Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability](https://arxiv.org//abs/2406.05535)
++ [Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability](https://arxiv.org/abs/2406.05535)

Junqi Gao, Biqing Qi, Yao Li, Zhichang Guo, Dong Li, Yuming Xing, Dazhi Zhang

-+ [Adversarial flows: A gradient flow characterization of adversarial attacks](https://arxiv.org//abs/2406.05376)
++ [Adversarial flows: A gradient flow characterization of adversarial attacks](https://arxiv.org/abs/2406.05376)

Lukas Weigand, Tim Roith, Martin Burger

# 2024-06-07

-+ [Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations](https://arxiv.org//abs/2406.04755)
++ [Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations](https://arxiv.org/abs/2406.04755)

Weiran Lin, Anna Gerchanovsky, Omer Akgul, Lujo Bauer, Matt Fredrikson, Zifan Wang

-+ [ADBA:Approximation Decision Boundary Approach for Black-Box Adversarial Attacks](https://arxiv.org//abs/2406.04998)
++ [ADBA:Approximation Decision Boundary Approach for Black-Box Adversarial Attacks](https://arxiv.org/abs/2406.04998)

Feiyang Wang, Xingquan Zuo, Hai Huang, Gang Chen

-+ [Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks](https://arxiv.org//abs/2406.04932)
++ [Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks](https://arxiv.org/abs/2406.04932)

Lanzino Romeo, Fontana Federico, Diko Anxhelo, Marini Marco Raoul, Cinque Luigi

-+ [The Price of Implicit Bias in Adversarially Robust Generalization](https://arxiv.org//abs/2406.04981)
++ [The Price of Implicit Bias in Adversarially Robust Generalization](https://arxiv.org/abs/2406.04981)

Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe

-+ [Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs](https://arxiv.org//abs/2406.06622)
++ [Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs](https://arxiv.org/abs/2406.06622)

Fan Liu, Zhao Xu, Hao Liu

# 2024-06-06

-+ [Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection](https://arxiv.org//abs/2406.04070)
++ [Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection](https://arxiv.org/abs/2406.04070)

Yinting Wu, Pai Peng, Bo Cai, Le Li
-+ [Improving Alignment and Robustness with Short Circuiting](https://arxiv.org//abs/2406.04313)
++ [Improving Alignment and Robustness with Short Circuiting](https://arxiv.org/abs/2406.04313)

Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

-+ [Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt](https://arxiv.org//abs/2406.04031)
++ [Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt](https://arxiv.org/abs/2406.04031)

Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, Dacheng Tao

-+ [AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens](https://arxiv.org//abs/2406.03805)
++ [AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens](https://arxiv.org/abs/2406.03805)

Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou

-+ [PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning](https://arxiv.org//abs/2406.04478)
++ [PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning](https://arxiv.org/abs/2406.04478)

Tianrong Zhang, Zhaohan Xi, Ting Wang, Prasenjit Mitra, Jinghui Chen

# 2024-06-05

-+ [FREA: Feasibility-Guided Generation of Safety-Critical Scenarios with Reasonable Adversariality](https://arxiv.org//abs/2406.02983)
++ [FREA: Feasibility-Guided Generation of Safety-Critical Scenarios with Reasonable Adversariality](https://arxiv.org/abs/2406.02983)

Keyu Chen, Yuheng Lei, Hao Cheng, Haoran Wu, Wenchao Sun, Sifa Zheng

-+ [BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents](https://arxiv.org//abs/2406.03007)
++ [BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents](https://arxiv.org/abs/2406.03007)

Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian

-+ [DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross Domain](https://arxiv.org//abs/2406.03017)
++ [DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross Domain](https://arxiv.org/abs/2406.03017)

Jun Liu, Jiantao Zhou, Jiandian Zeng, Jinyu Tian

-+ [VQUNet: Vector Quantization U-Net for Defending Adversarial Atacks by Regularizing Unwanted Noise](https://arxiv.org//abs/2406.03117)
++ [VQUNet: Vector Quantization U-Net for Defending Adversarial Atacks by Regularizing Unwanted Noise](https://arxiv.org/abs/2406.03117)

Zhixun He, Mukesh Singhal

-+ [ZeroPur: Succinct Training-Free Adversarial Purification](https://arxiv.org//abs/2406.03143)
++ [ZeroPur: Succinct Training-Free Adversarial Purification](https://arxiv.org/abs/2406.03143)

Xiuli Bi, Zonglin Yang, Bo Liu, Xiaodong Cun, Chi-Man Pun, Pietro Lio, Bin Xiao

-+ [Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections](https://arxiv.org//abs/2406.03052)
++ [Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections](https://arxiv.org/abs/2406.03052)

Zihan Luo, Hong Huang, Yongkang Zhou, Jiping Zhang, Nuo Chen
-+ [Distributional Adversarial Loss](https://arxiv.org//abs/2406.03458)
++ [Distributional Adversarial Loss](https://arxiv.org/abs/2406.03458)

Saba Ahmadi, Siddharth Bhandari, Avrim Blum, Chen Dan, Prabhav Jain

-+ [Defending Large Language Models Against Attacks With Residual Stream Activation Analysis](https://arxiv.org//abs/2406.03230)
++ [Defending Large Language Models Against Attacks With Residual Stream Activation Analysis](https://arxiv.org/abs/2406.03230)

Amelia Kawasaki, Andrew Davis, Houssam Abbas

-+ [Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders](https://arxiv.org//abs/2406.03508)
++ [Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders](https://arxiv.org/abs/2406.03508)

Tingxu Han, Weisong Sun, Ziqi Ding, Chunrong Fang, Hanwei Qian, Jiaxun Li, Zhenyu Chen, Xiangyu Zhang

@@ -23386,238 +23386,238 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Erhu Liu, Zonglin Yang, Bo Liu, Bin Xiao, Xiuli Bi

# 2024-06-04

-+ [Certifiably Byzantine-Robust Federated Conformal Prediction](https://arxiv.org//abs/2406.01960)
++ [Certifiably Byzantine-Robust Federated Conformal Prediction](https://arxiv.org/abs/2406.01960)

Mintong Kang, Zhen Lin, Jimeng Sun, Cao Xiao, Bo Li

-+ [CR-UTP: Certified Robustness against Universal Text Perturbations](https://arxiv.org//abs/2406.01873)
++ [CR-UTP: Certified Robustness against Universal Text Perturbations](https://arxiv.org/abs/2406.01873)

Qian Lou, Xin Liang, Jiaqi Xue, Yancheng Zhang, Rui Xie, Mengxin Zheng

-+ [QROA: A Black-Box Query-Response Optimization Attack on LLMs](https://arxiv.org//abs/2406.02044)
++ [QROA: A Black-Box Query-Response Optimization Attack on LLMs](https://arxiv.org/abs/2406.02044)

Hussein Jawad, Nicolas J.-B. BRUNEL (LaMME)
-+ [MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training](https://arxiv.org//abs/2406.01867)
++ [MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training](https://arxiv.org/abs/2406.01867)

Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Shusuke Takahashi, Yuki Mitsufuji

-+ [SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks](https://arxiv.org//abs/2406.01894)
++ [SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks](https://arxiv.org/abs/2406.01894)

Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang

-+ [Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation](https://arxiv.org//abs/2406.02064)
++ [Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation](https://arxiv.org/abs/2406.02064)

Yaohua Liu, Jiaxin Gao, Xuan Liu, Xianghao Jiao, Xin Fan, Risheng Liu

-+ [Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing](https://arxiv.org//abs/2406.02309)
++ [Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing](https://arxiv.org/abs/2406.02309)

Youwei Shu, Xi Xiao, Derui Wang, Yuxin Cao, Siji Chen, Jason Xue, Linyi Li, Bo Li

-+ [Ai-Sampler: Adversarial Learning of Markov kernels with involutive maps](https://arxiv.org//abs/2406.02490)
++ [Ai-Sampler: Adversarial Learning of Markov kernels with involutive maps](https://arxiv.org/abs/2406.02490)

Evgenii Egorov, Ricardo Valperga, Efstratios Gavves

-+ [Auditing Privacy Mechanisms via Label Inference Attacks](https://arxiv.org//abs/2406.02797)
++ [Auditing Privacy Mechanisms via Label Inference Attacks](https://arxiv.org/abs/2406.02797)

Róbert István Busa-Fekete, Travis Dick, Claudio Gentile, Andrés Muñoz Medina, Adam Smith, Marika Swanberg

# 2024-06-03

-+ [BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models](https://arxiv.org//abs/2406.00083)
++ [BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models](https://arxiv.org/abs/2406.00083)

Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou

-+ [FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation](https://arxiv.org//abs/2406.01085)
++ [FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation](https://arxiv.org/abs/2406.01085)

Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang

-+ [Are AI-Generated Text Detectors Robust to Adversarial Perturbations?](https://arxiv.org//abs/2406.01179)
++ [Are AI-Generated Text Detectors Robust to Adversarial Perturbations?](https://arxiv.org/abs/2406.01179)

Guanhua Huang, Yuchen Zhang, Zhe Li, Yongjian You, Mingze Wang, Zhouwang Yang

-+ [PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration](https://arxiv.org//abs/2406.01394)
++ [PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration](https://arxiv.org/abs/2406.01394)

Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Huiping Zhuang, Cen Chen

-+ [Exploring Vulnerabilities and Protections in Large Language Models: A Survey](https://arxiv.org//abs/2406.00240)
++ [Exploring Vulnerabilities and Protections in Large Language Models: A Survey](https://arxiv.org/abs/2406.00240)

Frank Weizhen Liu, Chenhui Hu
-+ [Are you still on track!? Catching LLM Task Drift with Activations](https://arxiv.org//abs/2406.00799)
++ [Are you still on track!? Catching LLM Task Drift with Activations](https://arxiv.org/abs/2406.00799)

Sahar Abdelnabi, Aideen Fay, Giovanni Cherubin, Ahmed Salem, Mario Fritz, Andrew Paverd

-+ [Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading](https://arxiv.org//abs/2308.06795)
++ [Robust Infidelity: When Faithfulness Measures on Masked Language Models Are Misleading](https://arxiv.org/abs/2308.06795)

Evan Crothers, Herna Viktor, Nathalie Japkowicz

-+ [Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast](https://arxiv.org//abs/2402.08567)
++ [Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast](https://arxiv.org/abs/2402.08567)

Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin

-+ [Adversarial 3D Virtual Patches using Integrated Gradients](https://arxiv.org//abs/2406.00282)
++ [Adversarial 3D Virtual Patches using Integrated Gradients](https://arxiv.org/abs/2406.00282)

Chengzeng You, Zhongyuan Hau, Binbin Xu, Soteris Demetriou

-+ [Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training](https://arxiv.org//abs/2406.01018)
++ [Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training](https://arxiv.org/abs/2406.01018)

Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

-+ [Poisoning Attacks and Defenses in Recommender Systems: A Survey](https://arxiv.org//abs/2406.01022)
++ [Poisoning Attacks and Defenses in Recommender Systems: A Survey](https://arxiv.org/abs/2406.01022)

Zongwei Wang, Junliang Yu, Min Gao, Guanhua Ye, Shazia Sadiq, Hongzhi Yin

-+ [Constraint-based Adversarial Example Synthesis](https://arxiv.org//abs/2406.01219)
++ [Constraint-based Adversarial Example Synthesis](https://arxiv.org/abs/2406.01219)

Fang Yu, Ya-Yu Chi, Yu-Fang Chen

-+ [Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers](https://arxiv.org//abs/2406.01765)
++ [Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers](https://arxiv.org/abs/2406.01765)

Fatemeh Nourilenjan Nokabadi, Jean-François Lalonde, Christian Gagné

-+ [Model for Peanuts: Hijacking ML Models without Training Access is Possible](https://arxiv.org//abs/2406.01708)
++ [Model for Peanuts: Hijacking ML Models without Training Access is Possible](https://arxiv.org/abs/2406.01708)

Mahmoud Ghorbel, Halima Bouzidi, Ioan Marius Bilasco, Ihsen Alouani

-+ [Safeguarding Large Language Models: A Survey](https://arxiv.org//abs/2406.02622)
++ [Safeguarding Large Language Models: A Survey](https://arxiv.org/abs/2406.02622)

Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang

-+ [Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits](https://arxiv.org//abs/2406.02619)
++ [Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits](https://arxiv.org/abs/2406.02619)

Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt

# 2024-06-02

-+ [Stealing Image-to-Image Translation Models With a Single Query](https://arxiv.org//abs/2406.00828)
++ [Stealing Image-to-Image Translation Models With a Single Query](https://arxiv.org/abs/2406.00828)

Nurit Spingarn-Eliezer, Tomer Michaeli
-+ [Invisible Backdoor Attacks on Diffusion Models](https://arxiv.org//abs/2406.00816)
++ [Invisible Backdoor Attacks on Diffusion Models](https://arxiv.org/abs/2406.00816)

Sen Li, Junchi Ma, Minhao Cheng

-+ [Generalization Bound and New Algorithm for Clean-Label Backdoor Attack](https://arxiv.org//abs/2406.00588)
++ [Generalization Bound and New Algorithm for Clean-Label Backdoor Attack](https://arxiv.org/abs/2406.00588)

Lijia Yu, Shuang Liu, Yibo Miao, Xiao-Shan Gao, Lijun Zhang

-+ [Constrained Adaptive Attack: Effective Adversarial Attack Against Deep Neural Networks for Tabular Data](https://arxiv.org//abs/2406.00775)
++ [Constrained Adaptive Attack: Effective Adversarial Attack Against Deep Neural Networks for Tabular Data](https://arxiv.org/abs/2406.00775)

Thibault Simonetto, Salah Ghamizi, Maxime Cordy

-+ [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](https://arxiv.org//abs/2406.01637)
++ [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](https://arxiv.org/abs/2406.01637)

Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang

-+ [A Novel Defense Against Poisoning Attacks on Federated Learning: LayerCAM Augmented with Autoencoder](https://arxiv.org//abs/2406.02605)
++ [A Novel Defense Against Poisoning Attacks on Federated Learning: LayerCAM Augmented with Autoencoder](https://arxiv.org/abs/2406.02605)

Jingjing Zheng, Xin Yuan, Kai Li, Wei Ni, Eduardo Tovar, Jon Crowcroft

# 2024-06-01

-+ [Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training](https://arxiv.org//abs/2406.00685)
++ [Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training](https://arxiv.org/abs/2406.00685)

Jiacheng Zhang, Feng Liu, Dawei Zhou, Jingfeng Zhang, Tongliang Liu

-+ [Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model](https://arxiv.org//abs/2406.03409)
++ [Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model](https://arxiv.org/abs/2406.03409)

Jinyin Chen, Xiaoming Zhao, Haibin Zheng, Xiao Li, Sheng Xiang, Haifeng Guo

# 2024-05-31

-+ [Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens](https://arxiv.org//abs/2405.20653)
++ [Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens](https://arxiv.org/abs/2405.20653)

Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh, Wenbo Guo, Han Liu, Xinyu Xing

-+ [GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search](https://arxiv.org//abs/2405.20725)
++ [GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search](https://arxiv.org/abs/2405.20725)

Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

-+ [Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training](https://arxiv.org//abs/2405.20978)
++ [Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training](https://arxiv.org/abs/2405.20978)

Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu

-+ [Certifying Global Robustness for Deep Neural Networks](https://arxiv.org//abs/2405.20556)
++ [Certifying Global Robustness for Deep Neural Networks](https://arxiv.org/abs/2405.20556)

You Li, Guannan Zhao, Shuyu Kong, Yunqi He, Hai Zhou

-+ [Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization](https://arxiv.org//abs/2405.20584)
++ [Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization](https://arxiv.org/abs/2405.20584)

Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang
-+ [GANcrop: A Contrastive Defense Against Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2405.20727)
++ [GANcrop: A Contrastive Defense Against Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2405.20727)

Xiaoyun Gan, Shanyu Gan, Taizhi Su, Peng Liu

-+ [ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning](https://arxiv.org//abs/2405.20975)
++ [ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning](https://arxiv.org/abs/2405.20975)

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran

-+ [Improved Techniques for Optimization-Based Jailbreaking on Large Language Models](https://arxiv.org//abs/2405.21018)
++ [Improved Techniques for Optimization-Based Jailbreaking on Large Language Models](https://arxiv.org/abs/2405.21018)

Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin

-+ [Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations](https://arxiv.org//abs/2405.20672)
++ [Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations](https://arxiv.org/abs/2405.20672)

Davide Coppola, Hwee Kuan Lee

-+ [Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense](https://arxiv.org//abs/2405.20641)
++ [Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense](https://arxiv.org/abs/2405.20641)

Shaofei Li, Ziqi Zhang, Haomin Jia, Ding Li, Yao Guo, Xiangqun Chen

-+ [BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning](https://arxiv.org//abs/2405.20862)
++ [BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning](https://arxiv.org/abs/2405.20862)

Songze Li, Yanbo Dai

-+ [RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs](https://arxiv.org//abs/2405.20914)
++ [RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs](https://arxiv.org/abs/2405.20914)

Zuyan Wang, Jun Tao, Dika Zou

-+ [Exfiltration of personal information from ChatGPT via prompt injection](https://arxiv.org//abs/2406.00199)
++ [Exfiltration of personal information from ChatGPT via prompt injection](https://arxiv.org/abs/2406.00199)

Gregory Schwartzman

@@ -23628,305 +23628,305 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

# 2024-05-30

-+ [HOLMES: to Detect Adversarial Examples with Multiple Detectors](https://arxiv.org//abs/2405.19956)
++ [HOLMES: to Detect Adversarial Examples with Multiple Detectors](https://arxiv.org/abs/2405.19956)

Jing Wen

-+ [Efficient LLM-Jailbreaking by Introducing Visual Modality](https://arxiv.org//abs/2405.20015)
++ [Efficient LLM-Jailbreaking by Introducing Visual Modality](https://arxiv.org/abs/2405.20015)

Zhenxing Niu, Yuyao Sun, Haodong Ren, Haoxuan Ji, Quan Wang, Xiaoke Ma, Gang Hua, Rong Jin

-+ [Context Injection Attacks on Large Language Models](https://arxiv.org//abs/2405.20234)
++ [Context Injection Attacks on Large Language Models](https://arxiv.org/abs/2405.20234)

Cheng'an Wei, Kai Chen, Yue Zhao, Yujia Gong, Lu Xiang, Shenchen Zhu

-+ [Large Language Model Watermark Stealing With Mixed Integer Programming](https://arxiv.org//abs/2405.19677)
++ [Large Language Model Watermark Stealing With Mixed Integer Programming](https://arxiv.org/abs/2405.19677)

Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, Leo Yu Zhang, Chao Chen, Shengshan Hu, Asif Gill, Shirui Pan
-+ [AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization](https://arxiv.org//abs/2405.19668)
++ [AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization](https://arxiv.org/abs/2405.19668)

Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, Zhaoxia Yin, Hang Su

-+ [DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World](https://arxiv.org//abs/2405.19990)
++ [DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World](https://arxiv.org/abs/2405.19990)

Wenli Sun, Xinyang Jiang, Dongsheng Li, Cairong Zhao

-+ [Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models](https://arxiv.org//abs/2405.20090)
++ [Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models](https://arxiv.org/abs/2405.20090)

Hao Cheng, Erjia Xiao, Jiahang Cao, Le Yang, Kaidi Xu, Jindong Gu, Renjing Xu

-+ [Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness](https://arxiv.org//abs/2405.20291)
++ [Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness](https://arxiv.org/abs/2405.20291)

Weilin Lin, Li Liu, Shaokui Wei, Jianze Li, Hui Xiong

-+ [BAN: Detecting Backdoors Activated by Adversarial Neuron Noise](https://arxiv.org//abs/2405.19928)
++ [BAN: Detecting Backdoors Activated by Adversarial Neuron Noise](https://arxiv.org/abs/2405.19928)

Xiaoyun Xu, Zhuoran Liu, Stefanos Koffas, Shujian Yu, Stjepan Picek

-+ [Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable](https://arxiv.org//abs/2405.20272)
++ [Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable](https://arxiv.org/abs/2405.20272)

Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu

-+ [Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models](https://arxiv.org//abs/2405.19598)
++ [Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models](https://arxiv.org/abs/2405.19598)

Fujiao Ji, Kiho Lee, Hyungjoon Koo, Wenhao You, Euijin Choo, Hyoungshick Kim, Doowon Kim

-+ [Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks](https://arxiv.org//abs/2405.20099)
++ [Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks](https://arxiv.org/abs/2405.20099)

Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho

-+ [Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation](https://arxiv.org//abs/2405.20446)
++ [Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation](https://arxiv.org/abs/2405.20446)

Maya Anderson, Guy Amit, Abigail Goldsteen
-+ [Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters](https://arxiv.org//abs/2405.20413)
++ [Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters](https://arxiv.org/abs/2405.20413)

Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang

-+ [Phantom: General Trigger Attacks on Retrieval Augmented Language Generation](https://arxiv.org//abs/2405.20485)
++ [Phantom: General Trigger Attacks on Retrieval Augmented Language Generation](https://arxiv.org/abs/2405.20485)

Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea

-+ [Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images](https://arxiv.org//abs/2405.20469)
++ [Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images](https://arxiv.org/abs/2405.20469)

Krishnakant Singh, Thanush Navaratnam, Jannik Holmer, Simone Schaub-Meyer, Stefan Roth

-+ [Enhancing Adversarial Robustness in SNNs with Sparse Gradients](https://arxiv.org//abs/2405.20355)
++ [Enhancing Adversarial Robustness in SNNs with Sparse Gradients](https://arxiv.org/abs/2405.20355)

Yujia Liu, Tong Bu, Jianhao Ding, Zecheng Hao, Tiejun Huang, Zhaofei Yu

-+ [SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents](https://arxiv.org//abs/2405.20539)
++ [SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents](https://arxiv.org/abs/2405.20539)

Ethan Rathbun, Christopher Amato, Alina Oprea

-+ [Cutting Through the Noise: Boosting LLM Performance on Math Word Problems](https://arxiv.org//abs/2406.15444)
++ [Cutting Through the Noise: Boosting LLM Performance on Math Word Problems](https://arxiv.org/abs/2406.15444)

Ujjwala Anantheswaran, Himanshu Gupta, Kevin Scaria, Shreyas Verma, Chitta Baral, Swaroop Mishra

# 2024-05-29

-+ [Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks](https://arxiv.org//abs/2405.18770)
++ [Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks](https://arxiv.org/abs/2405.18770)

Futa Waseda, Antonio Tejero-de-Pablos

-+ [EntProp: High Entropy Propagation for Improving Accuracy and Robustness](https://arxiv.org//abs/2405.18931)
++ [EntProp: High Entropy Propagation for Improving Accuracy and Robustness](https://arxiv.org/abs/2405.18931)

Shohei Enomoto

-+ [Verifiably Robust Conformal Prediction](https://arxiv.org//abs/2405.18942)
++ [Verifiably Robust Conformal Prediction](https://arxiv.org/abs/2405.18942)

Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti

-+ [DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints](https://arxiv.org//abs/2405.19026)
++ [DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints](https://arxiv.org/abs/2405.19026)

Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-jin Liu, Zilong Zheng, Gao Huang

-+ [Convex neural network synthesis for robustness in the 1-norm](https://arxiv.org//abs/2405.19029)
++ [Convex neural network synthesis for robustness in the 1-norm](https://arxiv.org/abs/2405.19029)

Ross Drummond, Chris Guiver, Matthew C. Turner
-+ [Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior](https://arxiv.org//abs/2405.19098)
++ [Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior](https://arxiv.org/abs/2405.19098)

Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu

-+ [Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles](https://arxiv.org//abs/2405.19179)
++ [Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles](https://arxiv.org/abs/2405.19179)

Saurabh Pathak, Samridha Shrestha, Abdelrahman AlMahmoud

-+ [Robust Entropy Search for Safe Efficient Bayesian Optimization](https://arxiv.org//abs/2405.19059)
++ [Robust Entropy Search for Safe Efficient Bayesian Optimization](https://arxiv.org/abs/2405.19059)

Dorina Weichert, Alexander Kister, Patrick Link, Sebastian Houben, Gunar Ernis

-+ [Voice Jailbreak Attacks Against GPT-4o](https://arxiv.org//abs/2405.19103)
++ [Voice Jailbreak Attacks Against GPT-4o](https://arxiv.org/abs/2405.19103)

Xinyue Shen, Yixin Wu, Michael Backes, Yang Zhang

-+ [PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN](https://arxiv.org//abs/2405.18744)
++ [PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN](https://arxiv.org/abs/2405.18744)

Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng

-+ [Node Injection Attack Based on Label Propagation Against Graph Neural Network](https://arxiv.org//abs/2405.18824)
++ [Node Injection Attack Based on Label Propagation Against Graph Neural Network](https://arxiv.org/abs/2405.18824)

Peican Zhu, Zechen Pan, Keke Tang, Xiaodong Cui, Jinhuan Wang, Qi Xuan

-+ [AI Risk Management Should Incorporate Both Safety and Security](https://arxiv.org//abs/2405.19524)
++ [AI Risk Management Should Incorporate Both Safety and Security](https://arxiv.org/abs/2405.19524)

Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

-+ [Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies](https://arxiv.org//abs/2405.19424)
++ [Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies](https://arxiv.org/abs/2405.19424)

Yipu Chen, Haotian Xue, Yongxin Chen

# 2024-05-28

-+ [Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing](https://arxiv.org//abs/2405.18166)
++ [Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing](https://arxiv.org/abs/2405.18166)

Wei Zhao, Zhe Li, Yige Li, Ye Zhang, Jun Sun

-+ [Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding](https://arxiv.org//abs/2405.18180)
++ [Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding](https://arxiv.org/abs/2405.18180)

Daniel Bethell, Simos Gerasimou, Radu Calinescu, Calum Imrie

-+ [Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective](https://arxiv.org//abs/2405.17746)
++ [Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective](https://arxiv.org/abs/2405.17746)

Nan Li, Haiyang Yu, Ping Yi

-+ [Magnitude-based Neuron Pruning for Backdoor Defens](https://arxiv.org//abs/2405.17750)
++ [Magnitude-based Neuron Pruning for Backdoor Defens](https://arxiv.org/abs/2405.17750)

Nan Li, Haoyu Jiang, Ping Yi
-+ [White-box Multimodal Jailbreaks Against Large Vision-Language Models](https://arxiv.org//abs/2405.17894)
++ [White-box Multimodal Jailbreaks Against Large Vision-Language Models](https://arxiv.org/abs/2405.17894)

Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang

-+ [ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator](https://arxiv.org//abs/2405.18111)
++ [ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator](https://arxiv.org/abs/2405.18111)

Junda Zhu, Lingyong Yan, Haibo Shi, Dawei Yin, Lei Sha

-+ [Towards Unified Robustness Against Both Backdoor and Adversarial Attacks](https://arxiv.org//abs/2405.17929)
++ [Towards Unified Robustness Against Both Backdoor and Adversarial Attacks](https://arxiv.org/abs/2405.17929)

Zhenxing Niu, Yuyao Sun, Qiguang Miao, Rong Jin, Gang Hua

-+ [Cross-Context Backdoor Attacks against Graph Prompt Learning](https://arxiv.org//abs/2405.17984)
++ [Cross-Context Backdoor Attacks against Graph Prompt Learning](https://arxiv.org/abs/2405.17984)

Xiaoting Lyu, Yufei Han, Wei Wang, Hangwei Qian, Ivor Tsang, Xiangliang Zhang

-+ [Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder](https://arxiv.org//abs/2405.18255)
++ [Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder](https://arxiv.org/abs/2405.18255)

Wenlong Gou, Chuanhang Yu, Juntao Ma, Gang Wu, Vladimir Mordachev

-+ [Learning diverse attacks on large language models for robust red-teaming and safety tuning](https://arxiv.org//abs/2405.18540)
++ [Learning diverse attacks on large language models for robust red-teaming and safety tuning](https://arxiv.org/abs/2405.18540)

Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

-+ [Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning](https://arxiv.org//abs/2405.18641)
++ [Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning](https://arxiv.org/abs/2405.18641)

Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu

-+ [Improved Generation of Adversarial Examples Against Safety-aligned LLMs](https://arxiv.org//abs/2405.20778)
++ [Improved Generation of Adversarial Examples Against Safety-aligned LLMs](https://arxiv.org/abs/2405.20778)

Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen

-+ [Stochastic Adversarial Networks for Multi-Domain Text Classification](https://arxiv.org//abs/2406.00044)
++ [Stochastic Adversarial Networks for Multi-Domain Text Classification](https://arxiv.org/abs/2406.00044)

Xu Wang, Yuan Wu

-+ [Training More Robust Classification Model via Discriminative Loss and Gaussian Noise Injection](https://arxiv.org//abs/2405.18499)
++ [Training More Robust Classification Model via Discriminative Loss and Gaussian Noise Injection](https://arxiv.org/abs/2405.18499)

Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone

# 2024-05-27

-+ [TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models](https://arxiv.org//abs/2405.16783)
++ [TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models](https://arxiv.org/abs/2405.16783)

Yuzhou. Nie, Yanting. Wang, Jinyuan. Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo. Guo, Dawn. Song

-+ [Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization](https://arxiv.org//abs/2405.17067)
++ [Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization](https://arxiv.org/abs/2405.17067)

Dixuan Wang, Yanda Li, Junyuan Jiang, Zepeng Ding, Guochao Jiang, Jiaqing Liang, Deqing Yang
-+ [A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness](https://arxiv.org//abs/2405.17361)
++ [A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness](https://arxiv.org/abs/2405.17361)

Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni

-+ [Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training](https://arxiv.org//abs/2405.17130)
++ [Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training](https://arxiv.org/abs/2405.17130)

Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla

-+ [Privacy-Aware Visual Language Models](https://arxiv.org//abs/2405.17423)
++ [Privacy-Aware Visual Language Models](https://arxiv.org/abs/2405.17423)

Laurens Samson, Nimrod Barazani, Sennay Ghebreab, Yuki M. Asano

-+ [Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation](https://arxiv.org//abs/2405.16895)
++ [Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation](https://arxiv.org/abs/2405.16895)

Liang Shi, Jie Zhang, Shiguang Shan

-+ [Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models](https://arxiv.org//abs/2405.16940)
++ [Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models](https://arxiv.org/abs/2405.16940)

Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling

-+ [Spectral regularization for adversarially-robust representation learning](https://arxiv.org//abs/2405.17181)
++ [Spectral regularization for adversarially-robust representation learning](https://arxiv.org/abs/2405.17181)

Sheng Yang, Jacob A. Zavatone-Veth, Cengiz Pehlevan
-+ [Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models](https://arxiv.org//abs/2405.16833)
++ [Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models](https://arxiv.org/abs/2405.16833)

Chia-Yi Hsu, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-Ying Huang

-+ [The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective](https://arxiv.org//abs/2405.16918)
++ [The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective](https://arxiv.org/abs/2405.16918)

Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp

-+ [OSLO: One-Shot Label-Only Membership Inference Attacks](https://arxiv.org//abs/2405.16978)
++ [OSLO: One-Shot Label-Only Membership Inference Attacks](https://arxiv.org/abs/2405.16978)

Yuefeng Peng, Jaechul Roh, Subhransu Maji, Amir Houmansadr

-+ [Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models](https://arxiv.org//abs/2405.17374)
++ [Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models](https://arxiv.org/abs/2405.17374)

ShengYun Peng, Pin-Yu Chen, Matthew Hull, Duen Horng Chau

-+ [TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability](https://arxiv.org//abs/2405.17678)
++ [TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability](https://arxiv.org/abs/2405.17678)

Fengji Ma, Li Liu, Hei Victor Cheng

-+ [Exploring Backdoor Attacks against Large Language Model-based Decision Making](https://arxiv.org//abs/2405.20774)
++ [Exploring Backdoor Attacks against Large Language Model-based Decision Making](https://arxiv.org/abs/2405.20774)

Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, Qi Zhu

@@ -23937,1357 +23937,1357 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla

# 2024-05-26

-+ [Automatic Jailbreaking of the Text-to-Image Generative AI Systems](https://arxiv.org//abs/2405.16567)
++ [Automatic Jailbreaking of the Text-to-Image Generative AI Systems](https://arxiv.org/abs/2405.16567)

Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

-+ [Visualizing the Shadows: Unveiling Data Poisoning Behaviors in Federated Learning](https://arxiv.org//abs/2405.16707)
++ [Visualizing the Shadows: Unveiling Data Poisoning Behaviors in Federated Learning](https://arxiv.org/abs/2405.16707)

Xueqing Zhang, Junkai Zhang, Ka-Ho Chow, Juntao Chen, Ying Mao, Mohamed Rahouti, Xiang Li, Yuchen Liu, Wenqi Wei

-+ [Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level](https://arxiv.org//abs/2405.16405)
++ [Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level](https://arxiv.org/abs/2405.16405)

Runlin Lei, Yuwei Hu, Yuchen Ren, Zhewei Wei

-+ [Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer](https://arxiv.org//abs/2405.16436)
++ [Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer](https://arxiv.org/abs/2405.16436)

Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang

-+ [Partial train and isolate, mitigate backdoor attack](https://arxiv.org//abs/2405.16488)
++ [Partial train and isolate, mitigate backdoor attack](https://arxiv.org/abs/2405.16488)

Yong Li, Han Gao
-+ [Pruning for Robust Concept Erasing in Diffusion Models](https://arxiv.org//abs/2405.16534)
++ [Pruning for Robust Concept Erasing in Diffusion Models](https://arxiv.org/abs/2405.16534)

Tianyun Yang, Juan Cao, Chang Xu

-+ [Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models](https://arxiv.org//abs/2405.20775)
++ [Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models](https://arxiv.org/abs/2405.20775)

Xijie Huang, Xinyuan Wang, Hantao Zhang, Jiawen Xi, Jingkun An, Hao Wang, Chengwei Pan

# 2024-05-25

-+ [Diffusion-Reward Adversarial Imitation Learning](https://arxiv.org//abs/2405.16194)
++ [Diffusion-Reward Adversarial Imitation Learning](https://arxiv.org/abs/2405.16194)

Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun

-+ [Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack](https://arxiv.org//abs/2405.16134)
++ [Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack](https://arxiv.org/abs/2405.16134)

Mingli Zhu, Siyuan Liang, Baoyuan Wu

-+ [Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling](https://arxiv.org//abs/2405.16181)
++ [Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling](https://arxiv.org/abs/2405.16181)

Chunlin Qiu, Yiheng Duan, Lingchen Zhao, Qian Wang

-+ [Detecting Adversarial Data via Perturbation Forgery](https://arxiv.org//abs/2405.16226)
++ [Detecting Adversarial Data via Perturbation Forgery](https://arxiv.org/abs/2405.16226)

Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, Ping Li, Jiazhong Chen, Shijuan Huang, Ning Yu

-+ [Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination](https://arxiv.org//abs/2405.16260)
++ [Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination](https://arxiv.org/abs/2405.16260)

Shelly Golan, Roy Ganz, Michael Elad

-+ [R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model](https://arxiv.org//abs/2405.16341)
++ [R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model](https://arxiv.org/abs/2405.16341)

Changhoon Kim, Kyle Min, Yezhou Yang

-+ [Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness](https://arxiv.org//abs/2405.16036)
++ [Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness](https://arxiv.org/abs/2405.16036)

Jieren Deng, Hanbin Hong, Aaron Palmer, Xin Zhou, Jinbo Bi, Kaleel Mahmood, Yuan Hong, Derek Aguiar

-+ [Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor](https://arxiv.org//abs/2405.16112)
++ [Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor](https://arxiv.org/abs/2405.16112)

Shaokui Wei, Hongyuan Zha, Baoyuan Wu

-+ [Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency](https://arxiv.org//abs/2405.16262)
++ [Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency](https://arxiv.org/abs/2405.16262)

Runqi Lin, Chaojian Yu, Bo Han, Hang Su, Tongliang Liu

-+ [Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection](https://arxiv.org//abs/2405.17497)
++ [Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection](https://arxiv.org/abs/2405.17497)

M. Saeid HaghighiFard, Sinem Coleri
-+ [Towards Black-Box Membership Inference Attack for Diffusion Models](https://arxiv.org//abs/2405.20771)
++ [Towards Black-Box Membership Inference Attack for Diffusion Models](https://arxiv.org/abs/2405.20771)

Jingwei Li, Jing Dong, Tianxing He, Jingzhao Zhang

-+ [Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Characte](https://arxiv.org//abs/2405.20773)
++ [Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Characte](https://arxiv.org/abs/2405.20773)

Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu, Muhao Chen, Bo Li, Chaowei Xiao

# 2024-05-24

-+ [How Does Bayes Error Limit Probabilistic Robust Accuracy](https://arxiv.org//abs/2405.14923)
++ [How Does Bayes Error Limit Probabilistic Robust Accuracy](https://arxiv.org/abs/2405.14923)

Ruihan Zhang, Jun Sun

-+ [RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation](https://arxiv.org//abs/2405.15182)
++ [RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation](https://arxiv.org/abs/2405.15182)

Peihua Mai, Ran Yan, Yan Pang

-+ [Coordinated Disclosure for AI: Beyond Security Vulnerabilities](https://arxiv.org//abs/2402.07039)
++ [Coordinated Disclosure for AI: Beyond Security Vulnerabilities](https://arxiv.org/abs/2402.07039)

Sven Cattell, Avijit Ghosh, Lucie-Aimée Kaffee

-+ [Robust Diffusion Models for Adversarial Purification](https://arxiv.org//abs/2403.16067)
++ [Robust Diffusion Models for Adversarial Purification](https://arxiv.org/abs/2403.16067)

Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

-+ [Certifiably Robust RAG against Retrieval Corruption](https://arxiv.org//abs/2405.15556)
++ [Certifiably Robust RAG against Retrieval Corruption](https://arxiv.org/abs/2405.15556)

Chong Xiang, Tong Wu, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal

-+ [Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models](https://arxiv.org//abs/2405.15234)
++ [Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models](https://arxiv.org/abs/2405.15234)

Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, Sijia Liu

-+ [BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection](https://arxiv.org//abs/2405.15269)
++ [BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection](https://arxiv.org/abs/2405.15269)

Yuwei Niu, Shuo He, Qi Wei, Feng Liu, Lei Feng

-+ [Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection](https://arxiv.org//abs/2405.15465)
++ [Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection](https://arxiv.org/abs/2405.15465)

Fan Liu, Liang Yao, Chuanyi Zhang, Ting Wu, Xinlei Zhang, Jun Zhou, Xiruo Jiang

-+ [Better Membership Inference Privacy Measurement through Discrepancy](https://arxiv.org//abs/2405.15140)
++ [Better Membership Inference Privacy Measurement through Discrepancy](https://arxiv.org/abs/2405.15140)

Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri

-+ [Adversarial Attacks on Hidden Tasks in Multi-Task Learning](https://arxiv.org//abs/2405.15244)
++ [Adversarial Attacks on Hidden Tasks in Multi-Task Learning](https://arxiv.org/abs/2405.15244)

Yu Zhe, Rei Nagaike, Daiki Nishiyama, Kazuto Fukuchi, Jun Sakuma
-+ [Decaf: Data Distribution Decompose Attack against Federated Learning](https://arxiv.org//abs/2405.15316)
++ [Decaf: Data Distribution Decompose Attack against Federated Learning](https://arxiv.org/abs/2405.15316)

Zhiyang Dai, Chunyi Zhou, Anmin Fu

-+ [Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models](https://arxiv.org//abs/2405.15423)
++ [Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models](https://arxiv.org/abs/2405.15423)

Florent Guépin, Nataša Krčo, Matthieu Meeus, Yves-Alexandre de Montjoye

-+ [DAGER: Exact Gradient Inversion for Large Language Models](https://arxiv.org//abs/2405.15586)
++ [DAGER: Exact Gradient Inversion for Large Language Models](https://arxiv.org/abs/2405.15586)

Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

-+ [Efficient Adversarial Training in LLMs with Continuous Attacks](https://arxiv.org//abs/2405.15589)
++ [Efficient Adversarial Training in LLMs with Continuous Attacks](https://arxiv.org/abs/2405.15589)

Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn

-+ [TrojanForge: Adversarial Hardware Trojan Examples with Reinforcement Learning](https://arxiv.org//abs/2405.15184)
++ [TrojanForge: Adversarial Hardware Trojan Examples with Reinforcement Learning](https://arxiv.org/abs/2405.15184)

Amin Sarihi, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy

-+ [Adversarial Imitation Learning from Visual Observations using Latent Information](https://arxiv.org//abs/2309.17371)
++ [Adversarial Imitation Learning from Visual Observations using Latent Information](https://arxiv.org/abs/2309.17371)

Vittorio Giammarino, James Queeney, Ioannis Ch. Paschalidis

-+ [Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models](https://arxiv.org//abs/2405.15984)
++ [Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models](https://arxiv.org/abs/2405.15984)

Simon Chi Lok Yu, Jie He, Pasquale Minervini, Jeff Z. Pan
-+ [A Neurosymbolic Framework for Bias Correction in CNNs](https://arxiv.org//abs/2405.15886)
++ [A Neurosymbolic Framework for Bias Correction in CNNs](https://arxiv.org/abs/2405.15886)

Parth Padalkar, Natalia Ślusarz, Ekaterina Komendantskaya, Gopal Gupta

-+ [Robust width: A lightweight and certifiable adversarial defense](https://arxiv.org//abs/2405.15971)
++ [Robust width: A lightweight and certifiable adversarial defense](https://arxiv.org/abs/2405.15971)

Jonathan Peck, Bart Goossens

-+ [Can Implicit Bias Imply Adversarial Robustness?](https://arxiv.org//abs/2405.15942)
++ [Can Implicit Bias Imply Adversarial Robustness?](https://arxiv.org/abs/2405.15942)

Hancheng Min, René Vidal

-+ [BadGD: A unified data-centric framework to identify gradient descent vulnerabilities](https://arxiv.org//abs/2405.15979)
++ [BadGD: A unified data-centric framework to identify gradient descent vulnerabilities](https://arxiv.org/abs/2405.15979)

Chi-Hua Wang, Guang Cheng

-+ [Robustifying Safety-Aligned Large Language Models through Clean Data Curation](https://arxiv.org//abs/2405.19358)
++ [Robustifying Safety-Aligned Large Language Models through Clean Data Curation](https://arxiv.org/abs/2405.19358)

Xiaoqun Liu, Jiacheng Liang, Muchao Ye, Zhaohan Xi

-+ [ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users](https://arxiv.org//abs/2405.19360)
++ [ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users](https://arxiv.org/abs/2405.19360)

Guanlin Li, Kangjie Chen, Shudong Zhang, Jie Zhang, Tianwei Zhang

-+ [Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent](https://arxiv.org//abs/2405.20770)
++ [Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent](https://arxiv.org/abs/2405.20770)

Guang Lin, Qibin Zhao

# 2024-05-23

-+ [Learning to Transform Dynamically for Better Adversarial Transferability](https://arxiv.org//abs/2405.14077)
++ [Learning to Transform Dynamically for Better Adversarial Transferability](https://arxiv.org/abs/2405.14077)

Rongyi Zhu, Zeliang Zhang, Susan Liang, Zhuo Liu, Chenliang Xu

-+ [Certified Robustness against Sparse Adversarial Perturbations via Data Localization](https://arxiv.org//abs/2405.14176)
++ [Certified Robustness against Sparse Adversarial Perturbations via Data Localization](https://arxiv.org/abs/2405.14176)

Ambar Pal, René Vidal, Jeremias Sulam

-+ [SLIFER: Investigating Performance and Robustness of Malware Detection Pipelines](https://arxiv.org//abs/2405.14478)
++ [SLIFER: Investigating Performance and Robustness of Malware Detection Pipelines](https://arxiv.org/abs/2405.14478)

Andrea Ponte, Dmitrijs Trizna, Luca Demetrio, Battista Biggio, Fabio Roli

-+ [Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs](https://arxiv.org//abs/2405.14189)
++ [Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs](https://arxiv.org/abs/2405.14189)

Yihao Huang, Chong Wang, Xiaojun Jia, Qing Guo, Felix Juefei-Xu, Jian Zhang, Geguang Pu, Yang Liu

-+ [MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability](https://arxiv.org//abs/2405.14488)
++ [MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability](https://arxiv.org/abs/2405.14488)

Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

-+ [Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models](https://arxiv.org//abs/2405.14490)
++ [Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models](https://arxiv.org/abs/2405.14490)
Johan S Daniel, Anand Pal

-+ [Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models](https://arxiv.org//abs/2405.14646)
++ [Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models](https://arxiv.org/abs/2405.14646)

Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

-+ [Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography](https://arxiv.org//abs/2405.14169)
++ [Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography](https://arxiv.org/abs/2405.14169)

Nhat Chung, Sensen Gao, Tuan-Anh Vu, Jie Zhang, Aishan Liu, Yun Lin, Jin Song Dong, Qing Guo

-+ [Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds](https://arxiv.org//abs/2405.14210)
++ [Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds](https://arxiv.org/abs/2405.14210)

Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

-+ [Towards Imperceptible Backdoor Attack in Self-supervised Learning](https://arxiv.org//abs/2405.14672)
++ [Towards Imperceptible Backdoor Attack in Self-supervised Learning](https://arxiv.org/abs/2405.14672)

Hanrong Zhang, Zhenting Wang, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma

-+ [Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy](https://arxiv.org//abs/2405.14800)
++ [Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy](https://arxiv.org/abs/2405.14800)

Shengfang Zhai, Huanran Chen, Yinpeng Dong, Jiajun Li, Qingni Shen, Yansong Gao, Hang Su, Yang Liu

-+ [Boosting Robustness by Clipping Gradients in Distributed Learning](https://arxiv.org//abs/2405.14432)
++ [Boosting Robustness by Clipping Gradients in Distributed Learning](https://arxiv.org/abs/2405.14432)

Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan

-+ [Identity Inference from CLIP Models using Only Textual Data](https://arxiv.org//abs/2405.14517)
++ [Identity Inference from CLIP Models using Only Textual Data](https://arxiv.org/abs/2405.14517)

Songze Li, Ruoxi Cheng, Xiaojun Jia

-+ [A New Formulation for Zeroth-Order Optimization of Adversarial EXEmples in Malware Detection](https://arxiv.org//abs/2405.14519)
++ [A New Formulation for Zeroth-Order Optimization of Adversarial EXEmples in Malware Detection](https://arxiv.org/abs/2405.14519)

Marco Rando, Luca Demetrio, Lorenzo Rosasco, Fabio Roli

-+ [Nearly Tight Black-Box Auditing of Differentially Private Machine Learning](https://arxiv.org//abs/2405.14106)
++ [Nearly Tight Black-Box Auditing of Differentially Private Machine Learning](https://arxiv.org/abs/2405.14106)

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

-+ [Generating camera failures as a class of physics-based adversarial examples](https://arxiv.org//abs/2405.15033)
++ [Generating camera failures as a class of physics-based adversarial examples](https://arxiv.org/abs/2405.15033)

Manav Prabhakar, Jwalandhar Girnar, Arpan Kusari

-+ [Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution](https://arxiv.org//abs/2405.14934)
++ [Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution](https://arxiv.org/abs/2405.14934)
Zakariya Chaouai, Mohamed Tamaazousti

# 2024-05-22

-+ [Adversarial Training via Adaptive Knowledge Amalgamation of an Ensemble of Teachers](https://arxiv.org//abs/2405.13324)
++ [Adversarial Training via Adaptive Knowledge Amalgamation of an Ensemble of Teachers](https://arxiv.org/abs/2405.13324)

Shayan Mohajer Hamidi, Linfeng Ye

-+ [Safety Alignment for Vision Language Models](https://arxiv.org//abs/2405.13581)
++ [Safety Alignment for Vision Language Models](https://arxiv.org/abs/2405.13581)

Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

-+ [DeepNcode: Encoding-Based Protection against Bit-Flip Attacks on Neural Networks](https://arxiv.org//abs/2405.13891)
++ [DeepNcode: Encoding-Based Protection against Bit-Flip Attacks on Neural Networks](https://arxiv.org/abs/2405.13891)

Patrik Velčický, Jakub Breier, Xiaolu Hou, Mladen Kovačević

-+ [Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching](https://arxiv.org//abs/2405.13820)
++ [Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching](https://arxiv.org/abs/2405.13820)

Weixiang Zhao, Yulin Hu, Zhuojun Li, Yang Deng, Yanyan Zhao, Bing Qin, Tat-Seng Chua

-+ [TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models](https://arxiv.org//abs/2405.13401)
++ [TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models](https://arxiv.org/abs/2405.13401)

Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu

-+ [Towards Certification of Uncertainty Calibration under Adversarial Attacks](https://arxiv.org//abs/2405.13922)
++ [Towards Certification of Uncertainty Calibration under Adversarial Attacks](https://arxiv.org/abs/2405.13922)

Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz, Philip H.S. Torr, Adel Bibi
-+ [WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response](https://arxiv.org//abs/2405.14023)
++ [WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response](https://arxiv.org/abs/2405.14023)

Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen

-+ [Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization](https://arxiv.org//abs/2405.14033)
++ [Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization](https://arxiv.org/abs/2405.14033)

Daniel Kuelbs, Sanjay Lall, Mert Pilanci

-+ [Memory Scraping Attack on Xilinx FPGAs: Private Data Extraction from Terminated Processes](https://arxiv.org//abs/2405.13927)
++ [Memory Scraping Attack on Xilinx FPGAs: Private Data Extraction from Terminated Processes](https://arxiv.org/abs/2405.13927)

Bharadwaj Madabhushi, Sandip Kundu, Daniel Holcomb

-+ [Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models](https://arxiv.org//abs/2405.13798)
++ [Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models](https://arxiv.org/abs/2405.13798)

Tyler Bell, Avinash Mudireddy, Ivan Johnson-Eversoll, Soura Dasgupta, Raghu Mudumbai

# 2024-05-21

-+ [Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models](https://arxiv.org//abs/2405.12523)
++ [Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models](https://arxiv.org/abs/2405.12523)

Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi

-+ [Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming](https://arxiv.org//abs/2405.12604)
++ [Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming](https://arxiv.org/abs/2405.12604)

Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang

-+ [Generative AI and Large Language Models for Cyber Security: All Insights You Need](https://arxiv.org//abs/2405.12750)
++ [Generative AI and Large Language Models for Cyber Security: All Insights You Need](https://arxiv.org/abs/2405.12750)

Mohamed Amine Ferrag, Fatima Alwahedi, Ammar Battah, Bilel Cherif, Abdechakour Mechri, Norbert Tihanyi

-+ [Transparency Distortion Robustness for SOTA Image Segmentation Tasks](https://arxiv.org//abs/2405.12864)
++ [Transparency Distortion Robustness for SOTA Image Segmentation Tasks](https://arxiv.org/abs/2405.12864)

Volker Knauthe, Arne Rak, Tristan Wirth, Thomas Pöllabauer, Simon Metzler, Arjan Kuijper, Dieter W. Fellner
-+ [Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks](https://arxiv.org//abs/2405.12725)
++ [Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks](https://arxiv.org/abs/2405.12725)

Boheng Li, Yishuo Cai, Haowei Li, Feng Xue, Zhifeng Li, Yiming Li

-+ [Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image](https://arxiv.org//abs/2405.12872)
++ [Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image](https://arxiv.org/abs/2405.12872)

Zerui Zhang, Zhichao Sun, Zelong Liu, Bo Du, Rui Yu, Zhou Zhao, Yongchao Xu

-+ [Robust Classification via a Single Diffusion Model](https://arxiv.org//abs/2305.15241)
++ [Robust Classification via a Single Diffusion Model](https://arxiv.org/abs/2305.15241)

Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, Jun Zhu

-+ [Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers](https://arxiv.org//abs/2405.12424)
++ [Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers](https://arxiv.org/abs/2405.12424)

Fan Shi, Chong Zhang, Takahiro Miki, Joonho Lee, Marco Hutter, Stelian Coros

-+ [How to Train a Backdoor-Robust Model on a Poisoned Dataset without Auxiliary Data?](https://arxiv.org//abs/2405.12719)
++ [How to Train a Backdoor-Robust Model on a Poisoned Dataset without Auxiliary Data?](https://arxiv.org/abs/2405.12719)

Yuwen Pu, Jiahao Chen, Chunyi Zhou, Zhou Feng, Qingming Li, Chunqiang Hu, Shouling Ji

-+ [A Stealthy Backdoor Attack for Without-Label-Sharing Split Learning](https://arxiv.org//abs/2405.12751)
++ [A Stealthy Backdoor Attack for Without-Label-Sharing Split Learning](https://arxiv.org/abs/2405.12751)

Yuwen Pu, Zhuoyuan Ding, Jiahao Chen, Chunyi Zhou, Qingming Li, Chunqiang Hu, Shouling Ji

-+ [Rethinking the Vulnerabilities of Face Recognition Systems:From a Practical Perspective](https://arxiv.org//abs/2405.12786)
++ [Rethinking the Vulnerabilities of Face Recognition Systems:From a Practical Perspective](https://arxiv.org/abs/2405.12786)

Jiahao Chen, Zhiqiang Shen, Yuwen Pu, Chunyi Zhou, Shouling Ji

-+ [GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation](https://arxiv.org//abs/2405.13077)
++ [GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation](https://arxiv.org/abs/2405.13077)

Govind Ramesh, Yao Dou, Wei Xu

-+ [Interactive Simulations of Backdoors in Neural Networks](https://arxiv.org//abs/2405.13217)
++ [Interactive Simulations of Backdoors in Neural Networks](https://arxiv.org/abs/2405.13217)

Peter Bajcsy, Maxime Bros

-+ [EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection](https://arxiv.org//abs/2405.13080)
++ [EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection](https://arxiv.org/abs/2405.13080)

Yuwen Qian, Shuchi Wu, Kang Wei, Ming Ding, Di Xiao, Tao Xiang, Chuan Ma, Song Guo

-+ [A novel reliability attack of Physical Unclonable Functions](https://arxiv.org//abs/2405.13147)
++ [A novel reliability attack of Physical Unclonable Functions](https://arxiv.org/abs/2405.13147)

Gaoxiang Li, Yu Zhuang

# 2024-05-20

-+ [Fed-Credit: Robust Federated Learning with Credibility Management](https://arxiv.org//abs/2405.11758)
++ [Fed-Credit: Robust Federated Learning with Credibility Management](https://arxiv.org/abs/2405.11758)
Jiayan Chen, Zhirong Qian, Tianhui Meng, Xitong Gao, Tian Wang, Weijia Jia

-+ [Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space](https://arxiv.org//abs/2405.11982)
++ [Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space](https://arxiv.org/abs/2405.11982)

Qianmei Liu, Yufei Kuang, Jie Wang

-+ [Adaptive Batch Normalization Networks for Adversarial Robustness](https://arxiv.org//abs/2405.11708)
++ [Adaptive Batch Normalization Networks for Adversarial Robustness](https://arxiv.org/abs/2405.11708)

Shao-Yuan Lo, Vishal M. Patel

-+ [Adversarially Diversified Rehearsal Memory (ADRM): Mitigating Memory Overfitting Challenge in Continual Learning](https://arxiv.org//abs/2405.11829)
++ [Adversarially Diversified Rehearsal Memory (ADRM): Mitigating Memory Overfitting Challenge in Continual Learning](https://arxiv.org/abs/2405.11829)

Hikmat Khan, Ghulam Rasool, Nidhal Carla Bouaynaya

-+ [Data Contamination Calibration for Black-box LLMs](https://arxiv.org//abs/2405.11930)
++ [Data Contamination Calibration for Black-box LLMs](https://arxiv.org/abs/2405.11930)

Wentao Ye, Jiaqi Hu, Liyao Li, Haobo Wang, Gang Chen, Junbo Zhao

-+ [Decentralized Privacy Preservation for Critical Connections in Graphs](https://arxiv.org//abs/2405.11713)
++ [Decentralized Privacy Preservation for Critical Connections in Graphs](https://arxiv.org/abs/2405.11713)

Conggai Li, Wei Ni, Ming Ding, Youyang Qu, Jianjun Chen, David Smith, Wenjie Zhang, Thierry Rakotoarivelo

-+ [GAN-GRID: A Novel Generative Attack on Smart Grid Stability Prediction](https://arxiv.org//abs/2405.12076)
++ [GAN-GRID: A Novel Generative Attack on Smart Grid Stability Prediction](https://arxiv.org/abs/2405.12076)

Emad Efatinasab, Alessandro Brighente, Mirco Rampazzo, Nahal Azadi, Mauro Conti
-+ [Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks](https://arxiv.org//abs/2405.12295)
++ [Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks](https://arxiv.org/abs/2405.12295)

Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pregowska, Tomasz Michalak

-+ [Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation](https://arxiv.org//abs/2405.13068)
++ [Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation](https://arxiv.org/abs/2405.13068)

Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang

# 2024-05-19

-+ [An Invisible Backdoor Attack Based On Semantic Feature](https://arxiv.org//abs/2405.11551)
++ [An Invisible Backdoor Attack Based On Semantic Feature](https://arxiv.org/abs/2405.11551)

Yangming Chen

-+ [Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems](https://arxiv.org//abs/2405.11629)
++ [Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems](https://arxiv.org/abs/2405.11629)

Shengxiang Sun, Shenzhe Zhu

-+ [A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers](https://arxiv.org//abs/2405.11904)
++ [A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers](https://arxiv.org/abs/2405.11904)

Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

-+ [On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks](https://arxiv.org//abs/2405.11432)
++ [On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks](https://arxiv.org/abs/2405.11432)

Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester

-+ [Certified Robust Accuracy of Neural Networks Are Bounded due to Bayes Errors](https://arxiv.org//abs/2405.11547)
++ [Certified Robust Accuracy of Neural Networks Are Bounded due to Bayes Errors](https://arxiv.org/abs/2405.11547)

Ruihan Zhang, Jun Sun

-+ [A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure](https://arxiv.org//abs/2405.11440)
++ [A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure](https://arxiv.org/abs/2405.11440)

Wei Sun, Bo Gao, Ke Xiong, Yuwei Wang, Pingyi Fan, Khaled Ben Letaief

-+ [Sketches-based join size estimation under local differential privacy](https://arxiv.org//abs/2405.11419)
++ [Sketches-based join size estimation under local differential privacy](https://arxiv.org/abs/2405.11419)

Meifan Zhang, Xin Liu, Lihua Yin

-+ [BOSC: A Backdoor-based Framework for Open Set Synthetic Image Attribution](https://arxiv.org//abs/2405.11491)
++ [BOSC: A Backdoor-based Framework for Open Set Synthetic Image Attribution](https://arxiv.org/abs/2405.11491)

Jun Wang, Benedetta Tondi, Mauro Barni

-+ [Hummer: Towards Limited Competitive Preference Dataset](https://arxiv.org//abs/2405.11647)
++ [Hummer: Towards Limited Competitive Preference Dataset](https://arxiv.org/abs/2405.11647)

Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng

# 2024-05-18

-+ [Revisiting the Robust Generalization of Adversarial Prompt Tuning](https://arxiv.org//abs/2405.11154)
++ [Revisiting the Robust Generalization of Adversarial Prompt Tuning](https://arxiv.org/abs/2405.11154)

Fan Yang, Mingxuan Xia, Sangzhou Xia, Chicheng Ma, Hui Hui

-+ [Trustworthy Actionable Perturbations](https://arxiv.org//abs/2405.11195)
++ [Trustworthy Actionable Perturbations](https://arxiv.org/abs/2405.11195)

Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon
-+ [Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses](https://arxiv.org//abs/2405.11206)
++ [Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses](https://arxiv.org/abs/2405.11206)

Thanh Nguyen, Tung M. Luu, Tri Ton, Chang D. Yoo

-+ [SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks](https://arxiv.org//abs/2405.11575)
++ [SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks](https://arxiv.org/abs/2405.11575)

Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn

-+ [BadActs: A Universal Backdoor Defense in the Activation Space](https://arxiv.org//abs/2405.11227)
++ [BadActs: A Universal Backdoor Defense in the Activation Space](https://arxiv.org/abs/2405.11227)

Biao Yi, Sishuo Chen, Yiming Li, Tong Li, Baolei Zhang, Zheli Liu

-+ [UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers](https://arxiv.org//abs/2405.11336)
++ [UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers](https://arxiv.org/abs/2405.11336)

Duo Peng, Qiuhong Ke, Jun Liu

- [AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA](https://arxiv.org//abs/2405.11135)
+ [AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA](https://arxiv.org/abs/2405.11135)

Weitao Feng, Wenbo Zhou, Jiyan He, Jie Zhang, Tianyi Wei, Guanlin Li, Tianwei Zhang, Weiming Zhang, Nenghai Yu

-+ [Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning](https://arxiv.org//abs/2405.11258)
++ [Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning](https://arxiv.org/abs/2405.11258)

Udi Aharon, Revital Marbel, Ran Dubin, Amit Dvir, Chen Hajaj

-+ [Detecting Complex Multi-step Attacks with Explainable Graph Neural Network](https://arxiv.org//abs/2405.11335)
++ [Detecting Complex Multi-step Attacks with Explainable Graph Neural Network](https://arxiv.org/abs/2405.11335)

Wei Liu, Peng Gao, Haotian Zhang, Ke Li, Weiyong Yang, Xingshen Wei, Shuji Wu

-+ [MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection](https://arxiv.org//abs/2405.11315)
++ [MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection](https://arxiv.org/abs/2405.11315)

Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

# 2024-05-17

-+ [Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors](https://arxiv.org//abs/2405.10529)
++ [Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors](https://arxiv.org/abs/2405.10529)

Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao

-+ [Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers](https://arxiv.org//abs/2405.10612)
++ [Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers](https://arxiv.org/abs/2405.10612)

Sheng Yang, Jiawang Bai, Kuofeng Gao, Yong Yang, Yiming Li, Shu-tao Xia

-+ [Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation](https://arxiv.org//abs/2405.10870)
++ [Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation](https://arxiv.org/abs/2405.10870)

Yixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian Putz
-+ [Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective](https://arxiv.org//abs/2405.10757)
++ [Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective](https://arxiv.org/abs/2405.10757)

Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang

-+ [Boosting Few-Pixel Robustness Verification via Covering Verification Designs](https://arxiv.org//abs/2405.10924)
++ [Boosting Few-Pixel Robustness Verification via Covering Verification Designs](https://arxiv.org/abs/2405.10924)

Yuval Shapira, Naor Wiesel, Shahar Shabelman, Dana Drachsler-Cohen

-+ [Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing](https://arxiv.org//abs/2405.10521)
++ [Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing](https://arxiv.org/abs/2405.10521)

Yaoqi Yang, Bangning Zhang, Daoxing Guo, Hongyang Du, Zehui Xiong, Dusit Niyato, Zhu Han

# 2024-05-16

-+ [Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks](https://arxiv.org//abs/2405.09863)
++ [Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks](https://arxiv.org/abs/2405.09863)

Haonan An, Guang Hua, Zhiping Lin, Yuguang Fang

-+ [DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection](https://arxiv.org//abs/2405.09882)
++ [DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection](https://arxiv.org/abs/2405.09882)

Yuhao Sun, Lingyun Yu, Hongtao Xie, Jiaming Li, Yongdong Zhang

-+ [Keep It Private: Unsupervised Privatization of Online Text](https://arxiv.org//abs/2405.10260)
++ [Keep It Private: Unsupervised Privatization of Online Text](https://arxiv.org/abs/2405.10260)

Calvin Bao, Marine Carpuat

-+ [SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data](https://arxiv.org//abs/2405.09805)
++ [SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data](https://arxiv.org/abs/2405.09805)

Abdulrahman Alabdulakreem, Christian M Arnold, Yerim Lee, Pieter M Feenstra, Boris Katz, Andrei Barbu

-+ [Infrared Adversarial Car Stickers](https://arxiv.org//abs/2405.09924)
++ [Infrared Adversarial Car Stickers](https://arxiv.org/abs/2405.09924)

Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu, Jianmin Li, Xiaolin Hu

-+ [Adversarial Robustness for Visual Grounding of Multimodal Large Language Models](https://arxiv.org//abs/2405.09981)
++ [Adversarial Robustness for Visual Grounding of Multimodal Large Language Models](https://arxiv.org/abs/2405.09981)

Kuofeng Gao, Yang Bai, Jiawang Bai, Yong Yang, Shu-Tao Xia

-+ [IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency](https://arxiv.org//abs/2405.09786)
++ [IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency](https://arxiv.org/abs/2405.09786)

Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, Yiming Li

-+ [Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution](https://arxiv.org//abs/2405.09800)
++ [Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution](https://arxiv.org/abs/2405.09800)

Eslam Zaher, Maciej Trzaskowski, Quan Nguyen, Fred Roosta

-+ [Dealing Doubt: Unveiling Threat Models in Gradient Inversion Attacks under Federated Learning, A Survey and Taxonomy](https://arxiv.org//abs/2405.10376)
++ [Dealing Doubt: Unveiling Threat Models in Gradient Inversion Attacks under Federated Learning, A Survey and Taxonomy](https://arxiv.org/abs/2405.10376)
Yichuan Shi, Olivera Kotevska, Viktor Reshniak, Abhishek Singh, Ramesh Raskar

-+ [Adversarial Robustness Guarantees for Quantum Classifiers](https://arxiv.org//abs/2405.10360)
++ [Adversarial Robustness Guarantees for Quantum Classifiers](https://arxiv.org/abs/2405.10360)

Neil Dowling, Maxwell T. West, Angus Southwell, Azar C. Nakhl, Martin Sevior, Muhammad Usman, Kavan Modi

-+ [Learnable Privacy Neurons Localization in Language Models](https://arxiv.org//abs/2405.10989)
++ [Learnable Privacy Neurons Localization in Language Models](https://arxiv.org/abs/2405.10989)

Ruizhe Chen, Tianxiang Hu, Yang Feng, Zuozhu Liu

-+ ["What do you want from theory alone?" Experimenting with Tight Auditing of Differentially Private Synthetic Data Generation](https://arxiv.org//abs/2405.10994)
++ ["What do you want from theory alone?" Experimenting with Tight Auditing of Differentially Private Synthetic Data Generation](https://arxiv.org/abs/2405.10994)

Meenatchi Sundaram Muthu Selva Annamalai, Georgi Ganev, Emiliano De Cristofaro

# 2024-05-15

-+ [Identity Overlap Between Face Recognition Train/Test Data: Causing Optimistic Bias in Accuracy Measurement](https://arxiv.org//abs/2405.09403)
++ [Identity Overlap Between Face Recognition Train/Test Data: Causing Optimistic Bias in Accuracy Measurement](https://arxiv.org/abs/2405.09403)

Haiyu Wu, Sicong Tian, Jacob Gutierrez, Aman Bhatta, Kağan Öztürk, Kevin W. Bowyer

-+ [Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization](https://arxiv.org//abs/2405.09113)
++ [Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization](https://arxiv.org/abs/2405.09113)

Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

-+ [Cross-Input Certified Training for Universal Perturbations](https://arxiv.org//abs/2405.09176)
++ [Cross-Input Certified Training for Universal Perturbations](https://arxiv.org/abs/2405.09176)

Changming Xu, Gagandeep Singh

-+ [Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer](https://arxiv.org//abs/2405.09470)
++ [Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer](https://arxiv.org/abs/2405.09470)

Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

-+ [Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy](https://arxiv.org//abs/2405.09306)
++ [Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy](https://arxiv.org/abs/2405.09306)
Francesco Luigi De Faveri, Guglielmo Faggioli, Nicola Ferro

-+ [Training Deep Learning Models with Hybrid Datasets for Robust Automatic Target Detection on real SAR images](https://arxiv.org//abs/2405.09588)
++ [Training Deep Learning Models with Hybrid Datasets for Robust Automatic Target Detection on real SAR images](https://arxiv.org/abs/2405.09588)

Benjamin Camus, Théo Voillemin, Corentin Le Barbu, Jean-Christophe Louvigné, Carole Belloni, Emmanuel Vallée

-+ [Properties that allow or prohibit transferability of adversarial attacks among quantized networks](https://arxiv.org//abs/2405.09598)
++ [Properties that allow or prohibit transferability of adversarial attacks among quantized networks](https://arxiv.org/abs/2405.09598)

Abhishek Shrestha, Jürgen Großmann

-+ [DP-RuL: Differentially-Private Rule Learning for Clinical Decision Support Systems](https://arxiv.org//abs/2405.09721)
++ [DP-RuL: Differentially-Private Rule Learning for Clinical Decision Support Systems](https://arxiv.org/abs/2405.09721)

Josephine Lamp, Lu Feng, David Evans

-+ [Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models](https://arxiv.org//abs/2405.10986)
++ [Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models](https://arxiv.org/abs/2405.10986)

Anthony M. Barrett, Krystal Jackson, Evan R. Murphy, Nada Madkour, Jessica Newman

# 2024-05-14

-+ [Can we Defend Against the Unknown? An Empirical Study About Threshold Selection for Neural Network Monitoring](https://arxiv.org//abs/2405.08654)
++ [Can we Defend Against the Unknown? An Empirical Study About Threshold Selection for Neural Network Monitoring](https://arxiv.org/abs/2405.08654)

Khoi Tran Dang, Kevin Delmas, Jérémie Guiochet, Joris Guérin

-+ [Towards Safe Large Language Models for Medicine](https://arxiv.org//abs/2403.03744)
++ [Towards Safe Large Language Models for Medicine](https://arxiv.org/abs/2403.03744)

Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

-+ [SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models](https://arxiv.org//abs/2405.08317)
++ [SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models](https://arxiv.org/abs/2405.08317)

Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

-+ [UnMarker: A Universal Attack on Defensive Watermarking](https://arxiv.org//abs/2405.08363)
++ [UnMarker: A Universal Attack on Defensive Watermarking](https://arxiv.org/abs/2405.08363)

Andre Kassis, Urs Hengartner

-+ [Certifying Robustness of Graph Convolutional Networks for Node Perturbation with Polyhedra Abstract Interpretation](https://arxiv.org//abs/2405.08645)
++ [Certifying Robustness of Graph Convolutional Networks for Node Perturbation with Polyhedra Abstract Interpretation](https://arxiv.org/abs/2405.08645)

Boqi Chen, Kristóf Marussy, Oszkár Semeráth, Gunter Mussbacher, Dániel Varró

-+ [Differentially Private Federated Learning: A Systematic Review](https://arxiv.org//abs/2405.08299)
++ [Differentially Private Federated Learning: A Systematic Review](https://arxiv.org/abs/2405.08299)

Jie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang Cao
-+ [Work-in-Progress: Crash Course: Can (Under Attack) Autonomous Driving Beat Human Drivers?](https://arxiv.org//abs/2405.08466)
++ [Work-in-Progress: Crash Course: Can (Under Attack) Autonomous Driving Beat Human Drivers?](https://arxiv.org/abs/2405.08466)

Francesco Marchiori, Alessandro Brighente, Mauro Conti

-+ [Adversarial Machine Learning Threats to Spacecraft](https://arxiv.org//abs/2405.08834)
++ [Adversarial Machine Learning Threats to Spacecraft](https://arxiv.org/abs/2405.08834)

Rajiv Thummala, Shristi Sharma, Matteo Calabrese, Gregory Falco

-+ [Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning](https://arxiv.org//abs/2405.08920)
++ [Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning](https://arxiv.org/abs/2405.08920)

Chendi Wang, Yuqing Zhu, Weijie J. Su, Yu-Xiang Wang

-+ [The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks](https://arxiv.org//abs/2405.08886)
++ [The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks](https://arxiv.org/abs/2405.08886)

Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

-+ [RS-Reg: Probabilistic and Robust Certified Regression Through Randomized Smoothing](https://arxiv.org//abs/2405.08892)
++ [RS-Reg: Probabilistic and Robust Certified Regression Through Randomized Smoothing](https://arxiv.org/abs/2405.08892)

Aref Miri Rekavandi, Olga Ohrimenko, Benjamin I.P. Rubinstein

-+ [Private Data Leakage in Federated Human Activity Recognition for Wearable Healthcare Devices](https://arxiv.org//abs/2405.10979)
++ [Private Data Leakage in Federated Human Activity Recognition for Wearable Healthcare Devices](https://arxiv.org/abs/2405.10979)

Kongyang Chen, Dongping Zhang, Bing Mi

# 2024-05-13

-+ [GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation](https://arxiv.org//abs/2405.07562)
++ [GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation](https://arxiv.org/abs/2405.07562)

Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets

-+ [Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection](https://arxiv.org//abs/2405.07595)
++ [Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection](https://arxiv.org/abs/2405.07595)

Dehong Kong, Siyuan Liang, Wenqi Ren

-+ [CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models](https://arxiv.org//abs/2405.07668)
++ [CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models](https://arxiv.org/abs/2405.07668)

Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Bo Jiang, W.K. Chan
-+ [RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors](https://arxiv.org//abs/2405.07940)
++ [RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors](https://arxiv.org/abs/2405.07940)

Liam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch

-+ [Backdoor Removal for Generative Large Language Models](https://arxiv.org//abs/2405.07667)
++ [Backdoor Removal for Generative Large Language Models](https://arxiv.org/abs/2405.07667)

Haoran Li, Yulin Chen, Zihao Zheng, Qi Hu, Chunkit Chan, Heshan Liu, Yangqiu Song

-+ [Evaluating Google's Protected Audience Protocol](https://arxiv.org//abs/2405.08102)
++ [Evaluating Google's Protected Audience Protocol](https://arxiv.org/abs/2405.08102)

Minjun Long, David Evans

# 2024-05-12

-+ [VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence](https://arxiv.org//abs/2405.07316)
++ [VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence](https://arxiv.org/abs/2405.07316)

Mayank Bakshi, Sara Ghasvarianjahromi, Yauhen Yakimenka, Allison Beemer, Oliver Kosut, Joerg Kliewer

# 2024-05-11

-+ [Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought](https://arxiv.org//abs/2405.07018)
++ [Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought](https://arxiv.org/abs/2405.07018)

Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Lianyong Qi, Amin Beheshti, Xiaolong Xu, Kim-Kwang Raymond Choo, Shuo Wang, Hongsheng Hu

-+ [Disrupting Style Mimicry Attacks on Video Imagery](https://arxiv.org//abs/2405.06865)
++ [Disrupting Style Mimicry Attacks on Video Imagery](https://arxiv.org/abs/2405.06865)

Josephine Passananti, Stanley Wu, Shawn Shan, Haitao Zheng, Ben Y. Zhao

# 2024-05-10

-+ [Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems](https://arxiv.org//abs/2405.06624)
++ [Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems](https://arxiv.org/abs/2405.06624)

David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

-+ [Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning](https://arxiv.org//abs/2405.06206)
++ [Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning](https://arxiv.org/abs/2405.06206)

Yujie Zhang, Neil Gong, Michael K. Reiter
-+ [Disttack: Graph Adversarial Attacks Toward Distributed GNN Training](https://arxiv.org//abs/2405.06247)
++ [Disttack: Graph Adversarial Attacks Toward Distributed GNN Training](https://arxiv.org/abs/2405.06247)

Yuxiang Zhang, Xin Liu, Meng Wu, Wei Yan, Mingyu Yan, Xiaochun Ye, Dongrui Fan

-+ [Exploring the Interplay of Interpretability and Robustness in Deep Neural Networks: A Saliency-guided Approach](https://arxiv.org//abs/2405.06278)
++ [Exploring the Interplay of Interpretability and Robustness in Deep Neural Networks: A Saliency-guided Approach](https://arxiv.org/abs/2405.06278)

Amira Guesmi, Nishant Suresh Aswani, Muhammad Shafique

-+ [Improving Transferable Targeted Adversarial Attack via Normalized Logit Calibration and Truncated Feature Mixing](https://arxiv.org//abs/2405.06340)
++ [Improving Transferable Targeted Adversarial Attack via Normalized Logit Calibration and Truncated Feature Mixing](https://arxiv.org/abs/2405.06340)

Juanjuan Weng, Zhiming Luo, Shaozi Li

-+ [Evaluating Adversarial Robustness in the Spatial Frequency Domain](https://arxiv.org//abs/2405.06345)
++ [Evaluating Adversarial Robustness in the Spatial Frequency Domain](https://arxiv.org/abs/2405.06345)

Keng-Hsin Liao, Chin-Yuan Yeh, Hsi-Wen Chen, Ming-Syan Chen

-+ [Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions](https://arxiv.org//abs/2405.06361)
++ [Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions](https://arxiv.org/abs/2405.06361)

Fan Wang, Adams Wai-Kin Kong

-+ [Risks of Practicing Large Language Models in Smart Grid: Threat Modeling and Validation](https://arxiv.org//abs/2405.06237)
++ [Risks of Practicing Large Language Models in Smart Grid: Threat Modeling and Validation](https://arxiv.org/abs/2405.06237)

Jiangnan Li, Yingyuan Yang, Jinyuan Sun

-+ [PLeak: Prompt Leaking Attacks against Large Language Model Applications](https://arxiv.org//abs/2405.06823)
++ [PLeak: Prompt Leaking Attacks against Large Language Model Applications](https://arxiv.org/abs/2405.06823)

Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao

-+ [LLM-Generated Black-box Explanations Can Be Adversarially Helpful](https://arxiv.org//abs/2405.06800)
++ [LLM-Generated Black-box Explanations Can Be Adversarially Helpful](https://arxiv.org/abs/2405.06800)

Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, Zining Zhu

# 2024-05-09

-+ [Towards Robust Physical-world Backdoor Attacks on Lane Detection](https://arxiv.org//abs/2405.05553)
++ [Towards Robust Physical-world Backdoor Attacks on Lane Detection](https://arxiv.org/abs/2405.05553)

Xinwei Zhang, Aishan Liu, Tianyuan Zhang, Siyuan Liang, Xianglong Liu

-+ [Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness](https://arxiv.org//abs/2405.05930)
++ [Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness](https://arxiv.org/abs/2405.05930)

Siyuan Li, Xi Lin, Yaju Liu, Jianhua Li

-+ [Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM](https://arxiv.org//abs/2405.05610)
++ [Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM](https://arxiv.org/abs/2405.05610)

Xikang Yang, Xuehai Tang, Songlin Hu, Jizhong Han

-+ [Towards Accurate and Robust Architectures via Neural Architecture Search](https://arxiv.org//abs/2405.05502)
++ [Towards Accurate and Robust Architectures via Neural Architecture Search](https://arxiv.org/abs/2405.05502)

Yuwei Ou, Yuqi Feng, Yanan Sun

-+ [Universal Adversarial Perturbations for Vision-Language Pre-trained Models](https://arxiv.org//abs/2405.05524)
++ [Universal Adversarial Perturbations for Vision-Language Pre-trained Models](https://arxiv.org/abs/2405.05524)

Peng-Fei Zhang, Zi Huang, Guangdong Bai
-+ [Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers](https://arxiv.org//abs/2405.05573)
++ [Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers](https://arxiv.org/abs/2405.05573)

Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

-+ [Model Inversion Robustness: Can Transfer Learning Help?](https://arxiv.org//abs/2405.05588)
++ [Model Inversion Robustness: Can Transfer Learning Help?](https://arxiv.org/abs/2405.05588)

Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran, Ngoc-Bao Nguyen, Ngai-Man Cheung

-+ [Privacy-Preserving Edge Federated Learning for Intelligent Mobile-Health Systems](https://arxiv.org//abs/2405.05611)
++ [Privacy-Preserving Edge Federated Learning for Intelligent Mobile-Health Systems](https://arxiv.org/abs/2405.05611)

Amin Aminifar, Matin Shokri, Amir Aminifar

-+ [Link Stealing Attacks Against Inductive Graph Neural Networks](https://arxiv.org//abs/2405.05784)
++ [Link Stealing Attacks Against Inductive Graph Neural Networks](https://arxiv.org/abs/2405.05784)

Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

-+ [A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data](https://arxiv.org//abs/2301.10053)
++ [A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data](https://arxiv.org/abs/2301.10053)

Meenatchi Sundaram Muthu Selva Annamalai, Andrea Gadotti, Luc Rocher

-+ [High-Performance Privacy-Preserving Matrix Completion for Trajectory Recovery](https://arxiv.org//abs/2405.05789)
++ [High-Performance Privacy-Preserving Matrix Completion for Trajectory Recovery](https://arxiv.org/abs/2405.05789)

Jiahao Guo, An-Bao Xu

-+ [Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models](https://arxiv.org//abs/2405.05990)
++ [Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models](https://arxiv.org/abs/2405.05990)

Yang Bai, Ge Pei, Jindong Gu, Yong Yang, Xingjun Ma

-+ [Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models](https://arxiv.org//abs/2405.06134)
++ [Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models](https://arxiv.org/abs/2405.06134)

Vyas Raina, Rao Ma, Charles McGhee, Kate Knill, Mark Gales

-+ [BB-Patch: BlackBox Adversarial Patch-Attack using Zeroth-Order Optimization](https://arxiv.org//abs/2405.06049)
++ [BB-Patch: BlackBox Adversarial Patch-Attack using Zeroth-Order Optimization](https://arxiv.org/abs/2405.06049)

Satyadwyoom Kumar, Saurabh Gupta, Arun Balaji Buduru

-+ [Hard Work Does Not Always Pay Off: Poisoning Attacks on Neural Architecture Search](https://arxiv.org//abs/2405.06073)
++ [Hard Work Does Not Always Pay Off: Poisoning Attacks on Neural Architecture Search](https://arxiv.org/abs/2405.06073)

Zachary Coalson, Huazheng Wang, Qingyun Wu, Sanghyun Hong

# 2024-05-08

-+ [Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution](https://arxiv.org//abs/2405.04825)
++ [Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution](https://arxiv.org/abs/2405.04825)

Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, Kui Ren
-+ [The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio](https://arxiv.org//abs/2405.04880)
++ [The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio](https://arxiv.org/abs/2405.04880)

Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

-+ [BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models](https://arxiv.org//abs/2405.04756)
++ [BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models](https://arxiv.org/abs/2405.04756)

Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

-+ [Mitigating Bias Using Model-Agnostic Data Attribution](https://arxiv.org//abs/2405.05031)
++ [Mitigating Bias Using Model-Agnostic Data Attribution](https://arxiv.org/abs/2405.05031)

Sander De Coninck, Wei-Cheng Wang, Sam Leroux, Pieter Simoens

-+ [Espresso: Robust Concept Filtering in Text-to-Image Models](https://arxiv.org//abs/2404.19227)
++ [Espresso: Robust Concept Filtering in Text-to-Image Models](https://arxiv.org/abs/2404.19227)

Anudeep Das, Vasisht Duddu, Rui Zhang, N. Asokan

-+ [Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations](https://arxiv.org//abs/2405.05075)
++ [Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations](https://arxiv.org/abs/2405.05075)

Xuyang Zhong, Yixiao Huang, Chen Liu

-+ [Adversarial Threats to Automatic Modulation Open Set Recognition in Wireless Networks](https://arxiv.org//abs/2405.05022)
++ [Adversarial Threats to Automatic Modulation Open Set Recognition in Wireless Networks](https://arxiv.org/abs/2405.05022)

Yandie Yang, Sicheng Zhang, Kuixian Li, Qiao Tian, Yun Lin

-+ [HackCar: a test platform for attacks and defenses on a cost-contained automotive architecture](https://arxiv.org//abs/2405.05023)
++ [HackCar: a test platform for attacks and defenses on a cost-contained automotive architecture](https://arxiv.org/abs/2405.05023)

Dario Stabili, Filip Valgimigli, Edoardo Torrini, Mirco Marchetti

-+ [Systematic Use of Random Self-Reducibility against Physical Attacks](https://arxiv.org//abs/2405.05193)
++ [Systematic Use of Random Self-Reducibility against Physical Attacks](https://arxiv.org/abs/2405.05193)

Ferhat Erata, TingHung Chiu, Anthony Etim, Srilalith Nampally, Tejas Raju, Rajashree Ramu, Ruzica Piskac, Timos Antonopoulos, Wenjie Xiong, Jakub Szefer

-+ [Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals](https://arxiv.org//abs/2405.05466)
++ [Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals](https://arxiv.org/abs/2405.05466)

Joshua Clymer, Caden Juang, Severin Field

-+ [Adversary-Guided Motion Retargeting for Skeleton Anonymization](https://arxiv.org//abs/2405.05428)
++ [Adversary-Guided Motion Retargeting for Skeleton Anonymization](https://arxiv.org/abs/2405.05428)

Thomas Carr, Depeng Xu, Aidong Lu

-+ [Model Reconstruction Using Counterfactual Explanations: Mitigating the Decision Boundary Shift](https://arxiv.org//abs/2405.05369)
++ [Model Reconstruction Using Counterfactual Explanations: Mitigating the Decision Boundary Shift](https://arxiv.org/abs/2405.05369)

Pasan Dissanayake, Sanghamitra Dutta

-+ [Untargeted Adversarial Attack on Knowledge Graph Embeddings](https://arxiv.org//abs/2405.10970)
++ [Untargeted Adversarial Attack on Knowledge Graph Embeddings](https://arxiv.org/abs/2405.10970)
Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Qika Lin, Yuxia Geng, Jun Liu

# 2024-05-07

-+ [Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition](https://arxiv.org//abs/2405.04344)
++ [Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition](https://arxiv.org/abs/2405.04344)

Chenxi Qiu

-+ [Locally Differentially Private In-Context Learning](https://arxiv.org//abs/2405.04032)
++ [Locally Differentially Private In-Context Learning](https://arxiv.org/abs/2405.04032)

Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou

-+ [A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model](https://arxiv.org//abs/2405.04108)
++ [A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model](https://arxiv.org/abs/2405.04108)

Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo

-+ [Revisiting character-level adversarial attacks](https://arxiv.org//abs/2405.04346)
++ [Revisiting character-level adversarial attacks](https://arxiv.org/abs/2405.04346)

Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

-+ [IPFed: Identity protected federated learning for user authentication](https://arxiv.org//abs/2405.03955)
++ [IPFed: Identity protected federated learning for user authentication](https://arxiv.org/abs/2405.03955)

Yosuke Kaga, Yusei Suzuki, Kenta Takahashi

-+ [Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method](https://arxiv.org//abs/2405.04133)
++ [Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method](https://arxiv.org/abs/2405.04133)

Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li

-+ [Breast Histopathology Image Retrieval by Attention-based Adversarially Regularized Variational Graph Autoencoder with Contrastive Learning-Based Feature Extraction](https://arxiv.org//abs/2405.04211)
++ [Breast Histopathology Image Retrieval by Attention-based Adversarially Regularized Variational Graph Autoencoder with Contrastive Learning-Based Feature Extraction](https://arxiv.org/abs/2405.04211)

Nematollah Saeidi, Hossein Karshenas, Bijan Shoushtarian, Sepideh Hatamikia, Ramona Woitek, Amirreza Mahbod

-+ [Effective and Robust Adversarial Training against Data and Label Corruptions](https://arxiv.org//abs/2405.04191)
++ [Effective and Robust Adversarial Training against Data and Label Corruptions](https://arxiv.org/abs/2405.04191)

Peng-Fei Zhang, Zi Huang, Xin-Shun Xu, Guangdong Bai

-+ [Unlearning Backdoor Attacks through Gradient-Based Model Pruning](https://arxiv.org//abs/2405.03918)
++ [Unlearning Backdoor Attacks through Gradient-Based Model Pruning](https://arxiv.org/abs/2405.03918)

Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak

-+ [Explainability-Informed Targeted Malware Misclassification](https://arxiv.org//abs/2405.04010)
++ [Explainability-Informed Targeted Malware Misclassification](https://arxiv.org/abs/2405.04010)

Quincy Card, Kshitiz Aryal, Maanak Gupta

-+ [Enabling Privacy-Preserving and Publicly Auditable Federated Learning](https://arxiv.org//abs/2405.04029)
++ [Enabling Privacy-Preserving and Publicly Auditable Federated Learning](https://arxiv.org/abs/2405.04029)

Huang Zeng, Anjia Yang, Jian Weng, Min-Rong Chen, Fengjun Xiao, Yi Liu, Ye Yao

-+ [A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning](https://arxiv.org//abs/2405.04115)
++ [A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning](https://arxiv.org/abs/2405.04115)

Xiaoyang Xu, Mengda Yang, Wenzhe Yi, Ziang Li, Juan Wang, Hongxin Hu, Yong Zhuang, Yaxin Liu
# 2024-05-06

-+ [ To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models](https://arxiv.org//abs/2405.03097)
++ [ To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models](https://arxiv.org/abs/2405.03097)

George-Octavian Barbulescu, Peter Triantafillou

-+ [ Assessing Adversarial Robustness of Large Language Models: An Empirical Study](https://arxiv.org//abs/2405.02764)
++ [ Assessing Adversarial Robustness of Large Language Models: An Empirical Study](https://arxiv.org/abs/2405.02764)

Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer

-+ [ Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability](https://arxiv.org//abs/2405.03193)
++ [ Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability](https://arxiv.org/abs/2405.03193)

Juanjuan Weng, Zhiming Luo, Shaozi Li

-+ [ UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images](https://arxiv.org//abs/2405.03486)
++ [ UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images](https://arxiv.org/abs/2405.03486)

Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang

-+ [ Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation](https://arxiv.org//abs/2405.03649)
++ [ Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation](https://arxiv.org/abs/2405.03649)

Guangtao Zheng, Wenqian Ye, Aidong Zhang

-+ [ Provably Unlearnable Examples](https://arxiv.org//abs/2405.03316)
++ [ Provably Unlearnable Examples](https://arxiv.org/abs/2405.03316)

Derui Wang, Minhui Xue, Bo Li, Seyit Camtepe, Liming Zhu

-+ [ GI-SMN: Gradient Inversion Attack against Federated Learning without Prior Knowledge](https://arxiv.org//abs/2405.03516)
++ [ GI-SMN: Gradient Inversion Attack against Federated Learning without Prior Knowledge](https://arxiv.org/abs/2405.03516)

Jin Qian, Kaimin Wei, Yongdong Wu, Jilian Zhang, Jipeng Chen, Huan Bao

-+ [ Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey](https://arxiv.org//abs/2405.03636)
++ [ Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey](https://arxiv.org/abs/2405.03636)

Joshua C. Zhao, Saurabh Bagchi, Salman Avestimehr, Kevin S. Chan, Somali Chaterji, Dimitris Dimitriadis, Jiacheng Li, Ninghui Li, Arash Nourian, Holger R. Roth
-+ [ Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre](https://arxiv.org//abs/2405.03672)

++ [ Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre](https://arxiv.org/abs/2405.03672)

Nicholas Carlini

-+ [ FOBNN: Fast Oblivious Binarized Neural Network Inference](https://arxiv.org//abs/2405.03136)

++ [ FOBNN: Fast Oblivious Binarized Neural Network Inference](https://arxiv.org/abs/2405.03136)

Xin Chen, Zhili Chen, Benchang Dong, Shiwen Wei, Lin Chen, Daojing He

-+ [ DarkFed: A Data-Free Backdoor Attack in Federated Learning](https://arxiv.org//abs/2405.03299)

++ [ DarkFed: A Data-Free Backdoor Attack in Federated Learning](https://arxiv.org/abs/2405.03299)

Minghui Li, Wei Wan, Yuxuan Ning, Shengshan Hu, Lulu Xue, Leo Yu Zhang, Yichen Wang

-+ [ LaserEscape: Detecting and Mitigating Optical Probing Attacks](https://arxiv.org//abs/2405.03632)

++ [ LaserEscape: Detecting and Mitigating Optical Probing Attacks](https://arxiv.org/abs/2405.03632)

Saleh Khalaj Monfared, Kyle Mitard, Andrew Cannon, Domenic Forte, Shahin Tajik

-+ [Is ReLU Adversarially Robust?](https://arxiv.org//abs/2405.03777)

++ [Is ReLU Adversarially Robust?](https://arxiv.org/abs/2405.03777)

Korn Sooksatra, Greg Hamerly, Pablo Rivas

-+ [On Adversarial Examples for Text Classification by Perturbing Latent Representations](https://arxiv.org//abs/2405.03789)

++ [On Adversarial Examples for Text Classification by Perturbing Latent Representations](https://arxiv.org/abs/2405.03789)

Korn Sooksatra, Bikram Khanal, Pablo Rivas

-+ [Enhancing O-RAN Security: Evasion Attacks and Robust Defenses for Graph Reinforcement Learning-based Connection Management](https://arxiv.org//abs/2405.03891)

++ [Enhancing O-RAN Security: Evasion Attacks and Robust Defenses for Graph Reinforcement Learning-based Connection Management](https://arxiv.org/abs/2405.03891)

Ravikumar Balakrishnan, Marius Arvinte, Nageen Himayat, Hosein Nikopour, Hassnaa Moustafa

-+ [Generative adversarial learning with optimal input dimension and its adaptive generator architecture](https://arxiv.org//abs/2405.03723)

++ [Generative adversarial learning with optimal input dimension and its adaptive generator architecture](https://arxiv.org/abs/2405.03723)

Zhiyao Tan, Ling Zhou, Huazhen Lin

-+ [Secure Inference for Vertically Partitioned Data Using Multiparty Homomorphic Encryption](https://arxiv.org//abs/2405.03775)

++ [Secure Inference for Vertically Partitioned Data Using Multiparty Homomorphic Encryption](https://arxiv.org/abs/2405.03775)

Shuangyi Chen, Yue Ju, Zhongwen Zhu, Ashish Khisti
-+ [Differentially Private Federated Learning without Noise Addition: When is it Possible?](https://arxiv.org//abs/2405.04551)

++ [Differentially Private Federated Learning without Noise Addition: When is it Possible?](https://arxiv.org/abs/2405.04551)

Jiang Zhang, Yahya H Ezzeldin, Ahmed Roushdy Elkordy, Konstantinos Psounis, Salman Avestimehr

-+ [Differentially Private Synthetic Data with Private Density Estimation](https://arxiv.org//abs/2405.04554)

++ [Differentially Private Synthetic Data with Private Density Estimation](https://arxiv.org/abs/2405.04554)

Nikolija Bojkovic, Po-Ling Loh

# 2024-05-05

-+ [ Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints](https://arxiv.org//abs/2405.03005)

++ [ Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints](https://arxiv.org/abs/2405.03005)

Siow Meng Low, Akshat Kumar

-+ [ AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection](https://arxiv.org//abs/2405.03075)

++ [ AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection](https://arxiv.org/abs/2405.03075)

Aditya Singh, Pavan Reddy

-+ [ Confidential and Protected Disease Classifier using Fully Homomorphic Encryption](https://arxiv.org//abs/2405.02790)

++ [ Confidential and Protected Disease Classifier using Fully Homomorphic Encryption](https://arxiv.org/abs/2405.02790)

Aditya Malik, Nalini Ratha, Bharat Yalavarthi, Tilak Sharma, Arjun Kaushik, Charanjit Jutla

-+ [ Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy](https://arxiv.org//abs/2405.02828)

++ [ Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy](https://arxiv.org/abs/2405.02828)

Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour

-+ [ Defense against Joint Poison and Evasion Attacks: A Case Study of DERMS](https://arxiv.org//abs/2405.02989)

++ [ Defense against Joint Poison and Evasion Attacks: A Case Study of DERMS](https://arxiv.org/abs/2405.02989)

Zain ul Abdeen, Padmaksha Roy, Ahmad Al-Tawaha, Rouxi Jia, Laura Freeman, Peter Beling, Chen-Ching Liu, Alberto Sangiovanni-Vincentelli, Ming Jin

# 2024-05-04

-+ [ Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness](https://arxiv.org//abs/2405.02564)

++ [ Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness](https://arxiv.org/abs/2405.02564)

Zhenan Shao, Linjian Ma, Bo Li, Diane M. Beck

-+ [ Detecting Edited Knowledge in Language Models](https://arxiv.org//abs/2405.02765)

++ [ Detecting Edited Knowledge in Language Models](https://arxiv.org/abs/2405.02765)

Paul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert

-+ [ Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent](https://arxiv.org//abs/2405.03654)

++ [ Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent](https://arxiv.org/abs/2405.03654)
Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

-+ [ PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds](https://arxiv.org//abs/2405.02638)

++ [ PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds](https://arxiv.org/abs/2405.02638)

Zehan Zhu, Yan Huang, Xin Wang, Jinming Xu

-+ [ Updating Windows Malware Detectors: Balancing Robustness and Regression against Adversarial EXEmples](https://arxiv.org//abs/2405.02646)

++ [ Updating Windows Malware Detectors: Balancing Robustness and Regression against Adversarial EXEmples](https://arxiv.org/abs/2405.02646)

Matous Kozak, Luca Demetrio, Dmitrijs Trizna, Fabio Roli

-+ [ Metric Differential Privacy at the User-Level](https://arxiv.org//abs/2405.02665)

++ [ Metric Differential Privacy at the User-Level](https://arxiv.org/abs/2405.02665)

Jacob Imola, Amrita Roy Chowdhury, Kamalika Chaudhuri

-+ [Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition](https://arxiv.org//abs/2405.03712)

++ [Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition](https://arxiv.org/abs/2405.03712)

Xiaoyan Su, Yinghao Zhu, Run Li

# 2024-05-03

-+ [ Impact of Architectural Modifications on Deep Learning Adversarial Robustness](https://arxiv.org//abs/2405.01934)

++ [ Impact of Architectural Modifications on Deep Learning Adversarial Robustness](https://arxiv.org/abs/2405.01934)

Firuz Juraev, Mohammed Abuhamad, Simon S. Woo, George K Thiruvathukal, Tamer Abuhmed

-+ [ From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings](https://arxiv.org//abs/2405.01963)

++ [ From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings](https://arxiv.org/abs/2405.01963)

Firuz Juraev, Mohammed Abuhamad, Eric Chan-Tin, George K. Thiruvathukal, Tamer Abuhmed
-+ [ Adversarial Botometer: Adversarial Analysis for Social Bot Detection](https://arxiv.org//abs/2405.02016)

++ [ Adversarial Botometer: Adversarial Analysis for Social Bot Detection](https://arxiv.org/abs/2405.02016)

Shaghayegh Najari, Davood Rafiee, Mostafa Salehi, Reza Farahbakhsh

-+ [ Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach](https://arxiv.org//abs/2405.02044)

++ [ Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach](https://arxiv.org/abs/2405.02044)

Anton Plaksin, Vitaly Kalev

-+ [ Uniformly Stable Algorithms for Adversarial Training and Beyond](https://arxiv.org//abs/2405.01817)

++ [ Uniformly Stable Algorithms for Adversarial Training and Beyond](https://arxiv.org/abs/2405.01817)

Jiancong Xiao, Jiawei Zhang, Zhi-Quan Luo, Asuman Ozdaglar

-+ [ A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion](https://arxiv.org//abs/2405.01838)

++ [ A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion](https://arxiv.org/abs/2405.01838)

Trinath Sai Subhash Reddy Pittala, Uma Maheswara Rao Meleti, Geethakrishna Puligundla

-+ [ Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes](https://arxiv.org//abs/2405.02188)

++ [ Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes](https://arxiv.org/abs/2405.02188)

Sang Bin Moon, Abolfazl Hashemi

-+ [ ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models](https://arxiv.org//abs/2405.02466)

++ [ ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models](https://arxiv.org/abs/2405.02466)

Heng Jin, Chaoyu Zhang, Shanghao Shi, Wenjing Lou, Y. Thomas Hou

-+ [ Adaptive and robust watermark against model extraction attack](https://arxiv.org//abs/2405.02365)

++ [ Adaptive and robust watermark against model extraction attack](https://arxiv.org/abs/2405.02365)

Kaiyi Pang, Tao Qi, Chuhan Wu, Minhao Bai

# 2024-05-02

-+ [ PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions](https://arxiv.org//abs/2405.01741)

++ [ PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions](https://arxiv.org/abs/2405.01741)

Xun Jiao, Fred Lin, Harish D. Dixit, Joel Coburn, Abhinav Pandey, Han Wang, Jianyu Huang, Venkat Ramesh, Wang Xu, Daniel Moore, Sriram Sankar

-+ [ Privacy-aware Berrut Approximated Coded Computing for Federated Learning](https://arxiv.org//abs/2405.01704)

++ [ Privacy-aware Berrut Approximated Coded Computing for Federated Learning](https://arxiv.org/abs/2405.01704)

Xavier Martínez Luaña, Rebeca P. Díaz Redondo, Manuel Fernández Veiga

-+ [ Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk](https://arxiv.org//abs/2405.01718)

++ [ Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk](https://arxiv.org/abs/2405.01718)

Xinyi Ni, Lifeng Lai

-+ [ Adversarial Attacks on Reinforcement Learning Agents for Command and Control](https://arxiv.org//abs/2405.01693)

++ [ Adversarial Attacks on Reinforcement Learning Agents for Command and Control](https://arxiv.org/abs/2405.01693)

Ahaan Dabholkar, James Z. Hare, Mark Mittrick, John Richardson, Nicholas Waytowich, Priya Narayanan, Saurabh Bagchi
-+ [ ATTAXONOMY: Unpacking Differential Privacy Guarantees Against Practical Adversaries](https://arxiv.org//abs/2405.01716)

++ [ ATTAXONOMY: Unpacking Differential Privacy Guarantees Against Practical Adversaries](https://arxiv.org/abs/2405.01716)

Rachel Cummings, Shlomi Hod, Jayshree Sarathy, Marika Swanberg

-+ [ Explainability Guided Adversarial Evasion Attacks on Malware Detectors](https://arxiv.org//abs/2405.01728)

++ [ Explainability Guided Adversarial Evasion Attacks on Malware Detectors](https://arxiv.org/abs/2405.01728)

Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh

-+ [ Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods](https://arxiv.org//abs/2405.02344)

++ [ Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods](https://arxiv.org/abs/2405.02344)

Peiyu Yang, Naveed Akhtar, Jiantong Jiang, Ajmal Mian

-+ [ Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy](https://arxiv.org//abs/2405.02341)

++ [ Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy](https://arxiv.org/abs/2405.02341)

Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu

-+ [ Temporal assessment of malicious behaviors: application to turnout field data monitoring](https://arxiv.org//abs/2405.02346)

++ [ Temporal assessment of malicious behaviors: application to turnout field data monitoring](https://arxiv.org/abs/2405.02346)

Sara Abdellaoui, Emil Dumitrescu, Cédric Escudero, Eric Zamaï

@@ -25330,1184 +25330,1184 @@ It appears that the [List of All Adversarial Example Papers](https://nicholas.ca

# 2024-04-29

-+ [ Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models](https://arxiv.org//abs/2404.18353)

++ [ Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models](https://arxiv.org/abs/2404.18353)

Norbert Tihanyi, Tamas Bisztray, Mohamed Amine Ferrag, Ridhi Jain, Lucas C. Cordeiro

-+ [ Certification of Speaker Recognition Models to Additive Perturbations](https://arxiv.org//abs/2404.18791)

++ [ Certification of Speaker Recognition Models to Additive Perturbations](https://arxiv.org/abs/2404.18791)

Dmitrii Korzh, Elvir Karimov, Mikhail Pautov, Oleg Y. Rogov, Ivan Oseledets

-+ [ Harmonic Machine Learning Models are Robust](https://arxiv.org//abs/2404.18825)

++ [ Harmonic Machine Learning Models are Robust](https://arxiv.org/abs/2404.18825)

Nicholas S. Kersting, Yi Li, Aman Mohanty, Oyindamola Obisesan, Raphael Okochu
-+ [ Uncertainty-boosted Robust Video Activity Anticipation](https://arxiv.org//abs/2404.18648)

++ [ Uncertainty-boosted Robust Video Activity Anticipation](https://arxiv.org/abs/2404.18648)

Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang

-+ [ Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots](https://arxiv.org//abs/2404.18702)

++ [ Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots](https://arxiv.org/abs/2404.18702)

Xi Xin, Fei Huang, Giles Hooker

-+ [ A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models](https://arxiv.org//abs/2404.18514)

++ [ A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models](https://arxiv.org/abs/2404.18514)

Nicolas Facchinetti, Federico Simonetta, Stavros Ntalampiras

-+ [ Assessing Cybersecurity Vulnerabilities in Code Large Language Models](https://arxiv.org//abs/2404.18567)

++ [ Assessing Cybersecurity Vulnerabilities in Code Large Language Models](https://arxiv.org/abs/2404.18567)

Md Imran Hossen, Jianyi Zhang, Yinzhi Cao, Xiali Hei

# 2024-04-27

-+ [ Adversarial Examples: Generation Proposal in the Context of Facial Recognition Systems](https://arxiv.org//abs/2404.17760)

++ [ Adversarial Examples: Generation Proposal in the Context of Facial Recognition Systems](https://arxiv.org/abs/2404.17760)

Marina Fuster, Ignacio Vidaurreta

-+ [ Bounding the Expected Robustness of Graph Neural Networks Subject to Node Feature Attacks](https://arxiv.org//abs/2404.17947)

++ [ Bounding the Expected Robustness of Graph Neural Networks Subject to Node Feature Attacks](https://arxiv.org/abs/2404.17947)

Yassine Abbahaddou, Sofiane Ennadir, Johannes F. Lutzeyer, Michalis Vazirgiannis, Henrik Boström

-+ [ Privacy-Preserving Aggregation for Decentralized Learning with Byzantine-Robustness](https://arxiv.org//abs/2404.17970)

++ [ Privacy-Preserving Aggregation for Decentralized Learning with Byzantine-Robustness](https://arxiv.org/abs/2404.17970)

Ali Reza Ghavamipour, Benjamin Zi Hao Zhao, Oguzhan Ersoy, Fatih Turkmen

-+ [ Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics](https://arxiv.org//abs/2404.17867)

++ [ Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics](https://arxiv.org/abs/2404.17867)
Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

-+ [ Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection](https://arxiv.org//abs/2404.17839)

++ [ Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection](https://arxiv.org/abs/2404.17839)

Yizhou Chen, Zeyu Sun, Zhihao Gong, Dan Hao

# 2024-04-26

-+ [ Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs](https://arxiv.org//abs/2404.17120)

++ [ Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs](https://arxiv.org/abs/2404.17120)

Valeriia Cherepanova, James Zou

-+ [ Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications](https://arxiv.org//abs/2404.17196)

++ [ Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications](https://arxiv.org/abs/2404.17196)

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang

-+ [ Enhancing Privacy and Security of Autonomous UAV Navigation](https://arxiv.org//abs/2404.17225)

++ [ Enhancing Privacy and Security of Autonomous UAV Navigation](https://arxiv.org/abs/2404.17225)

Vatsal Aggarwal, Arjun Ramesh Kaushik, Charanjit Jutla, Nalini Ratha

-+ [ M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training](https://arxiv.org//abs/2404.17391)

++ [ M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training](https://arxiv.org/abs/2404.17391)

Lakmal Meegahapola, Hamza Hassoune, Daniel Gatica-Perez

-+ [ Defending Spiking Neural Networks against Adversarial Attacks through Image Purification](https://arxiv.org//abs/2404.17092)

++ [ Defending Spiking Neural Networks against Adversarial Attacks through Image Purification](https://arxiv.org/abs/2404.17092)

Weiran Chen, Qi Sun, Qi Xu

-+ [ Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation](https://arxiv.org//abs/2404.17275)

++ [ Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation](https://arxiv.org/abs/2404.17275)

Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

-+ [ Estimating the Robustness Radius for Randomized Smoothing with 100$\times$ Sample Efficiency](https://arxiv.org//abs/2404.17371)

++ [ Estimating the Robustness Radius for Randomized Smoothing with 100$\times$ Sample Efficiency](https://arxiv.org/abs/2404.17371)

Emmanouil Seferis, Stefanos Kollias, Chih-Hong Cheng

-+ [ Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier](https://arxiv.org//abs/2404.17358)

++ [ Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier](https://arxiv.org/abs/2404.17358)

Natalie S. Frank
-+ [ Evaluations of Machine Learning Privacy Defenses are Misleading](https://arxiv.org//abs/2404.17399)

++ [ Evaluations of Machine Learning Privacy Defenses are Misleading](https://arxiv.org/abs/2404.17399)

Michael Aerni, Jie Zhang, Florian Tramèr

-+ [ Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning](https://arxiv.org//abs/2404.17617)

++ [ Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning](https://arxiv.org/abs/2404.17617)

Tao Liu, Yuhang Zhang, Zhu Feng, Zhiqin Yang, Chen Xu, Dapeng Man, Wu Yang

-+ [ Center-Based Relaxed Learning Against Membership Inference Attacks](https://arxiv.org//abs/2404.17674)

++ [ Center-Based Relaxed Learning Against Membership Inference Attacks](https://arxiv.org/abs/2404.17674)

Xingli Fang, Jung-Eun Kim

-+ [ Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models](https://arxiv.org//abs/2405.02332)

++ [ Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models](https://arxiv.org/abs/2405.02332)

Adrien Le Coz, Houssem Ouertatani, Stéphane Herbin, Faouzi Adjed

# 2024-04-25

-+ [ Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning](https://arxiv.org//abs/2404.16417)

++ [ Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning](https://arxiv.org/abs/2404.16417)

David Winderl, Nicola Franco, Jeanette Miriam Lorenz

-+ [ Towards Precise Observations of Neural Model Robustness in Classification](https://arxiv.org//abs/2404.16457)

++ [ Towards Precise Observations of Neural Model Robustness in Classification](https://arxiv.org/abs/2404.16457)

Wenchuan Mu, Kwan Hui Lim

-+ [ Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples](https://arxiv.org//abs/2404.16557)

++ [ Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples](https://arxiv.org/abs/2404.16557)

Kuofeng Gao, Jindong Gu, Yang Bai, Shu-Tao Xia, Philip Torr, Wei Liu, Zhifeng Li

-+ [ Understanding Privacy Risks of Embeddings Induced by Large Language Models](https://arxiv.org//abs/2404.16587)

++ [ Understanding Privacy Risks of Embeddings Induced by Large Language Models](https://arxiv.org/abs/2404.16587)

Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

-+ [ Don't Say No: Jailbreaking LLM by Suppressing Refusal](https://arxiv.org//abs/2404.16369)

++ [ Don't Say No: Jailbreaking LLM by Suppressing Refusal](https://arxiv.org/abs/2404.16369)

Yukai Zhou, Wenjie Wang

-+ [ PAD: Patch-Agnostic Defense against Adversarial Patch Attacks](https://arxiv.org//abs/2404.16452)

++ [ PAD: Patch-Agnostic Defense against Adversarial Patch Attacks](https://arxiv.org/abs/2404.16452)

Lihua Jing, Rui Wang, Wenqi Ren, Xin Dong, Cong Zou

-+ [ Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference](https://arxiv.org//abs/2404.16287)

++ [ Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference](https://arxiv.org/abs/2404.16287)

Zhe Zhang, Ryumei Nakada, Linjun Zhang

-+ [ Boosting Model Resilience via Implicit Adversarial Data Augmentation](https://arxiv.org//abs/2404.16307)

++ [ Boosting Model Resilience via Implicit Adversarial Data Augmentation](https://arxiv.org/abs/2404.16307)

Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

-+ [ Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach](https://arxiv.org//abs/2404.17020)
++ [ Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach](https://arxiv.org/abs/2404.17020)

Cristopher McIntyre-Garcia, Adrien Heymans, Beril Borali, Won-Sook Lee, Shiva Nejati

-+ [ A Notion of Uniqueness for the Adversarial Bayes Classifier](https://arxiv.org//abs/2404.16956)

++ [ A Notion of Uniqueness for the Adversarial Bayes Classifier](https://arxiv.org/abs/2404.16956)

Natalie S. Frank

# 2024-04-24

-+ [ A General Black-box Adversarial Attack on Graph-based Fake News Detectors](https://arxiv.org//abs/2404.15744)

++ [ A General Black-box Adversarial Attack on Graph-based Fake News Detectors](https://arxiv.org/abs/2404.15744)

Peican Zhu, Zechen Pan, Yang Liu, Jiwei Tian, Keke Tang, Zhen Wang

-+ [ Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks](https://arxiv.org//abs/2404.15881)

++ [ Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks](https://arxiv.org/abs/2404.15881)

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

-+ [ Universal Adversarial Triggers Are Not Universal](https://arxiv.org//abs/2404.16020)

++ [ Universal Adversarial Triggers Are Not Universal](https://arxiv.org/abs/2404.16020)

Nicholas Meade, Arkil Patel, Siva Reddy

-+ [ 3D Face Morphing Attack Generation using Non-Rigid Registration](https://arxiv.org//abs/2404.15765)

++ [ 3D Face Morphing Attack Generation using Non-Rigid Registration](https://arxiv.org/abs/2404.15765)

Jag Mohan Singh, Raghavendra Ramachandra

-+ [ Vision Transformer-based Adversarial Domain Adaptation](https://arxiv.org//abs/2404.15817)

++ [ Vision Transformer-based Adversarial Domain Adaptation](https://arxiv.org/abs/2404.15817)

Yahan Li, Yuan Wu

-+ [ Beyond Deepfake Images: Detecting AI-Generated Videos](https://arxiv.org//abs/2404.15955)

++ [ Beyond Deepfake Images: Detecting AI-Generated Videos](https://arxiv.org/abs/2404.15955)

Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm
-+ [ MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception](https://arxiv.org//abs/2404.15656)

++ [ MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception](https://arxiv.org/abs/2404.15656)

Vidit Khazanchi, Pavan Kulkarni, Yuvaraj Govindarajulu, Manojkumar Parmar

-+ [ CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning](https://arxiv.org//abs/2404.15854)

++ [ CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning](https://arxiv.org/abs/2404.15854)

Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

-+ [ Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks](https://arxiv.org//abs/2404.15587)

++ [ Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks](https://arxiv.org/abs/2404.15587)

Hangcheng Cao, Wenbin Huang, Guowen Xu, Xianhao Chen, Ziyang He, Jingyang Hu, Hongbo Jiang, Yuguang Fang

-+ [ PoisonedFL: Model Poisoning Attacks to Federated Learning via Multi-Round Consistency](https://arxiv.org//abs/2404.15611)

++ [ PoisonedFL: Model Poisoning Attacks to Federated Learning via Multi-Round Consistency](https://arxiv.org/abs/2404.15611)

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

-+ [ Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy](https://arxiv.org//abs/2404.15686)

++ [ Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy](https://arxiv.org/abs/2404.15686)

Sehyun Ryu, Jonggyu Jang, Hyun Jong Yang

-+ [ Advancing Recommender Systems by mitigating Shilling attacks](https://arxiv.org//abs/2404.16177)

++ [ Advancing Recommender Systems by mitigating Shilling attacks](https://arxiv.org/abs/2404.16177)

Aditya Chichani, Juzer Golwala, Tejas Gundecha, Kiran Gawande

-+ [ Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions](https://arxiv.org//abs/2404.16251)

++ [ Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions](https://arxiv.org/abs/2404.16251)

Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu
-+ [ An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape](https://arxiv.org//abs/2404.16212)

++ [ An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape](https://arxiv.org/abs/2404.16212)

Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath

-+ [ Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption](https://arxiv.org//abs/2404.16255)

++ [ Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption](https://arxiv.org/abs/2404.16255)

Bharat Yalavarthi, Arjun Ramesh Kaushik, Arun Ross, Vishnu Boddeti, Nalini Ratha

-+ [ A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models](https://arxiv.org//abs/2404.16154)

++ [ A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models](https://arxiv.org/abs/2404.16154)

Maximilian Wendlinger, Kilian Tscharke, Pascal Debus

-+ [ Attacks on Third-Party APIs of Large Language Models](https://arxiv.org//abs/2404.16891)

++ [ Attacks on Third-Party APIs of Large Language Models](https://arxiv.org/abs/2404.16891)

Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane

# 2024-04-23

-+ [ Talk Too Much: Poisoning Large Language Models under Token Limit](https://arxiv.org//abs/2404.14795)

++ [ Talk Too Much: Poisoning Large Language Models under Token Limit](https://arxiv.org/abs/2404.14795)

Jiaming He, Wenbo Jiang, Guanyu Hou, Wenshu Fan, Rui Zhang, Hongwei Li

-+ [ A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation](https://arxiv.org//abs/2404.14746)

++ [ A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation](https://arxiv.org/abs/2404.14746)

Phoebe Jing, Yijing Gao, Xianlong Zeng

-+ [ Leverage Variational Graph Representation For Model Poisoning on Federated Learning](https://arxiv.org//abs/2404.15042)

++ [ Leverage Variational Graph Representation For Model Poisoning on Federated Learning](https://arxiv.org/abs/2404.15042)

Kai Li, Xin Yuan, Jingjing Zheng, Wei Ni, Falko Dressler, Abbas Jamalipour

-+ [ Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph Structure](https://arxiv.org//abs/2404.15065)

++ [ Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph Structure](https://arxiv.org/abs/2404.15065)

Tobias Ladner, Michael Eichelbeck, Matthias Althoff

-+ [ Manipulating Recommender Systems: A Survey of Poisoning Attacks and Countermeasures](https://arxiv.org//abs/2404.14942)

++ [ Manipulating Recommender Systems: A Survey of Poisoning Attacks and Countermeasures](https://arxiv.org/abs/2404.14942)

Thanh Toan Nguyen, Quoc Viet Hung Nguyen, Thanh Tam Nguyen, Thanh Trung Huynh, Thanh Thi Nguyen, Matthias Weidlich, Hongzhi Yin

-+ [ Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition](https://arxiv.org//abs/2404.14693)

++ [ Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition](https://arxiv.org/abs/2404.14693)

Yunming Zhang, Dengpan Ye, Sipeng Shen, Caiyun Xie, Ziyi Liu, Jiacheng Deng, Long Tang

-+ [ Every Breath You Don't Take: Deepfake Speech Detection Using Breath](https://arxiv.org//abs/2404.15143)

++ [ Every Breath You Don't Take: Deepfake Speech Detection Using Breath](https://arxiv.org/abs/2404.15143)
Seth Layton, Thiago De Andrade, Daniel Olszewski, Kevin Warren, Carrie Gates, Kevin Butler, Patrick Traynor

-+ [ Rethinking LLM Memorization through the Lens of Adversarial Compression](https://arxiv.org//abs/2404.15146)

++ [ Rethinking LLM Memorization through the Lens of Adversarial Compression](https://arxiv.org/abs/2404.15146)

Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

-+ [ Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs](https://arxiv.org//abs/2404.14461)

++ [ Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs](https://arxiv.org/abs/2404.14461)

Javier Rando, Francesco Croce, Kryštof Mitka, Stepan Shabalin, Maksym Andriushchenko, Nicolas Flammarion, Florian Tramèr

-+ [ The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking](https://arxiv.org//abs/2404.14581)

++ [ The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking](https://arxiv.org/abs/2404.14581)

Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo

-+ [ Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares](https://arxiv.org//abs/2404.15409)

++ [ Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares](https://arxiv.org/abs/2404.15409)

Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C. Perdomo, Adam Smith

-+ [LaneCorrect: Self-supervised Lane Detection](https://arxiv.org//abs/2404.14671)

++ [LaneCorrect: Self-supervised Lane Detection](https://arxiv.org/abs/2404.14671)

Ming Nie, Xinyue Cai, Hang Xu, Li Zhang

# 2024-04-22

-+ [ Protecting Your LLMs with Information Bottleneck](https://arxiv.org//abs/2404.13968)

++ [ Protecting Your LLMs with Information Bottleneck](https://arxiv.org/abs/2404.13968)

Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian

-+ [ Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback](https://arxiv.org//abs/2404.14233)

++ [ Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback](https://arxiv.org/abs/2404.14233)

Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Hao Jiang, Fei Wu, Linchao Zhu

-+ [ Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations](https://arxiv.org//abs/2404.13948)

++ [ Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations](https://arxiv.org/abs/2404.13948)

Sukmin Cho, Soyeong Jeong, Jeongyeon Seo, Taeho Hwang, Jong C. Park

-+ [ Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation](https://arxiv.org//abs/2404.14339)

++ [ Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation](https://arxiv.org/abs/2404.14339)

Bharathi A, Arkaitz Zubiaga

-+ [ Swap It Like Its Hot: Segmentation-based spoof attacks on eye-tracking images](https://arxiv.org//abs/2404.13827)

++ [ Swap It Like Its Hot: Segmentation-based spoof attacks on eye-tracking images](https://arxiv.org/abs/2404.13827)

Anish S. Narkar, Brendan David-John
-+ [ FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge](https://arxiv.org//abs/2404.13872)

++ [ FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge](https://arxiv.org/abs/2404.13872)

Hanzhe Li, Jiaran Zhou, Bin Li, Junyu Dong, Yuezun Li

-+ [ CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction](https://arxiv.org//abs/2404.14042)

++ [ CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction](https://arxiv.org/abs/2404.14042)

Wenhao Lan, Yijun Yang, Haihua Shen, Shan Li

-+ [ Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training](https://arxiv.org//abs/2404.14309)

++ [ Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training](https://arxiv.org/abs/2404.14309)

Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin

-+ [ Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference](https://arxiv.org//abs/2404.13815)

++ [ Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference](https://arxiv.org/abs/2404.13815)

Yujin Han, Difan Zou

-+ [ Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning](https://arxiv.org//abs/2404.13860)

++ [ Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2404.13860)

Huan Bao, Kaimin Wei, Yongdong Wu, Jin Qian, Robert H. Deng

-+ [ Explicit Lipschitz Value Estimation Enhances Policy Robustness Against Perturbation](https://arxiv.org//abs/2404.13879)

++ [ Explicit Lipschitz Value Estimation Enhances Policy Robustness Against Perturbation](https://arxiv.org/abs/2404.13879)

Xulin Chen, Ruipeng Liu, Garrett E. Katz
-+ [ Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning](https://arxiv.org//abs/2404.13946)

++ [ Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning](https://arxiv.org/abs/2404.13946)

Rong Wang, Guichen Zhou, Mingjun Gao, Yunpeng Xiao

-+ [ Poisoning Attacks on Federated Learning-based Wireless Traffic Prediction](https://arxiv.org//abs/2404.14389)

++ [ Poisoning Attacks on Federated Learning-based Wireless Traffic Prediction](https://arxiv.org/abs/2404.14389)

Zifan Zhang, Minghong Fang, Jiayuan Huang, Yuchen Liu

-+ [ A mean curvature flow arising in adversarial training](https://arxiv.org//abs/2404.14402)

++ [ A mean curvature flow arising in adversarial training](https://arxiv.org/abs/2404.14402)

Leon Bungert, Tim Laux, Kerrek Stinson

-+ [ Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models](https://arxiv.org//abs/2404.14138)

++ [ Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models](https://arxiv.org/abs/2404.14138)

Alberto Castagnaro, Mauro Conti, Luca Pajola

-+ [A Survey on Speech Deepfake Detection](https://arxiv.org//abs/2404.13914)

++ [A Survey on Speech Deepfake Detection](https://arxiv.org/abs/2404.13914)

Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang

-+ [Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion](https://arxiv.org//abs/2404.14161)

++ [Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion](https://arxiv.org/abs/2404.14161)

Dohoon Lee, Jaehyun Park, Hyunwoo J. Kim, Kyogu Lee

# 2024-04-21

-+ [ Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion](https://arxiv.org//abs/2404.13518)

++ [ Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion](https://arxiv.org/abs/2404.13518)

Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shilin Wang

-+ [ FedMPQ: Secure and Communication-Efficient Federated Learning with Multi-codebook Product Quantization](https://arxiv.org//abs/2404.13575)

++ [ FedMPQ: Secure and Communication-Efficient Federated Learning with Multi-codebook Product Quantization](https://arxiv.org/abs/2404.13575)

Xu Yang, Jiapeng Zhang, Qifeng Zhang, Zhuo Tang

-+ [ Interval Abstractions for Robust Counterfactual Explanations](https://arxiv.org//abs/2404.13736)

++ [ Interval Abstractions for Robust Counterfactual Explanations](https://arxiv.org/abs/2404.13736)

Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni

-+ [ Towards General Conceptual Model Editing via Adversarial Representation Engineering](https://arxiv.org//abs/2404.13752)

++ [ Towards General Conceptual Model Editing via Adversarial Representation Engineering](https://arxiv.org/abs/2404.13752)

Yihao Zhang, Zeming Wei, Jun Sun, Meng Sun

-+ [ Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge](https://arxiv.org//abs/2404.13660)

++ [ Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge](https://arxiv.org/abs/2404.13660)

Narek Maloyan, Ekansh Verma, Bulat Nutfullin, Bislan Ashinov

-+ [ Attack on Scene Flow using Point Clouds](https://arxiv.org//abs/2404.13621)

++ [ Attack on Scene Flow using Point Clouds](https://arxiv.org/abs/2404.13621)

Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei

-+ [ Mean Aggregator Is More Robust Than Robust Aggregators Under Label Poisoning Attacks](https://arxiv.org//abs/2404.13647)

++ [ Mean Aggregator Is More Robust Than Robust Aggregators Under Label Poisoning Attacks](https://arxiv.org/abs/2404.13647)
Jie Peng, Weiyu Li, Qing Ling

-+ [ LLMs in Web-Development: Evaluating LLM-Generated PHP code unveiling vulnerabilities and limitations](https://arxiv.org//abs/2404.14459)

++ [ LLMs in Web-Development: Evaluating LLM-Generated PHP code unveiling vulnerabilities and limitations](https://arxiv.org/abs/2404.14459)

Rebeka Tóth, Tamas Bisztray, László Erdodi

-+ [ Robust EEG-based Emotion Recognition Using an Inception and Two-sided Perturbation Model](https://arxiv.org//abs/2404.15373)

++ [ Robust EEG-based Emotion Recognition Using an Inception and Two-sided Perturbation Model](https://arxiv.org/abs/2404.15373)

Shadi Sartipi, Mujdat Cetin

-+ [ AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs](https://arxiv.org//abs/2404.16873)

++ [ AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs](https://arxiv.org/abs/2404.16873)

Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian

# 2024-04-20

-+ [ Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think](https://arxiv.org//abs/2404.13320)

++ [ Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think](https://arxiv.org/abs/2404.13320)

Haotian Xue, Yongxin Chen

-+ [ AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models](https://arxiv.org//abs/2404.13425)

++ [ AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models](https://arxiv.org/abs/2404.13425)

Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Gang Zhou, Xingwei Zhang, Xinwang Liu, Xiaolong Zheng

-+ [ PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the Cloud](https://arxiv.org//abs/2404.13475)

++ [ PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the Cloud](https://arxiv.org/abs/2404.13475)

Zhepeng Wang, Yi Sheng, Nirajan Koirala, Kanad Basu, Taeho Jung, Cheng-Chang Lu, Weiwen Jiang

-+ [ Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives](https://arxiv.org//abs/2404.13277)

++ [ Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives](https://arxiv.org/abs/2404.13277)

Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang

-+ [ Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications](https://arxiv.org//abs/2404.13279)

++ [ Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications](https://arxiv.org/abs/2404.13279)

Yuan Zhou, Rose Qingyang Hu, Yi Qian

-+ [ Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data](https://arxiv.org//abs/2404.14451)

++ [ Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data](https://arxiv.org/abs/2404.14451)

Jose Cribeiro-Ramallo, Vadim Arzamasov, Federico Matteucci, Denis Wambold, Klemens Böhm

-+ [A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models](https://arxiv.org//abs/2404.14445)

++ [A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models](https://arxiv.org/abs/2404.14445)

Yefeng Yuan, Yuhong Liu, Liang Cheng

# 2024-04-19

-+ [ How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples](https://arxiv.org//abs/2404.12653)

++ [ How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples](https://arxiv.org/abs/2404.12653)
Dren Fazlija, Arkadij Orlov, Johanna Schrader, Monty-Maximilian Zühlke, Michael Rohs, Daniel Kudenko

-+ [ A Clean-graph Backdoor Attack against Graph Convolutional Networks with Poisoned Label Only](https://arxiv.org//abs/2404.12704)

++ [ A Clean-graph Backdoor Attack against Graph Convolutional Networks with Poisoned Label Only](https://arxiv.org/abs/2404.12704)

Jiazhu Dai, Haoyu Sun

-+ [ AED-PADA:Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation](https://arxiv.org//abs/2404.12635)

++ [ AED-PADA:Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation](https://arxiv.org/abs/2404.12635)

Heqi Peng, Yunhong Wang, Ruijie Yang, Beichen Li, Rui Wang, Yuanfang Guo

-+ [ MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement](https://arxiv.org//abs/2404.12679)

++ [ MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement](https://arxiv.org/abs/2404.12679)

Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra

-+ [ PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy](https://arxiv.org//abs/2404.12730)

++ [ PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy](https://arxiv.org/abs/2404.12730)

Zepeng Jiang, Weiwei Ni, Yifan Zhang

-+ [ Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images](https://arxiv.org//abs/2404.12908)

++ [ Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images](https://arxiv.org/abs/2404.12908)

Santosh, Li Lin, Irene Amerini, Xin Wang, Shu Hu

-+ [ SA-Attack: Speed-adaptive stealthy adversarial attack on trajectory prediction](https://arxiv.org//abs/2404.12612)

++ [ SA-Attack: Speed-adaptive stealthy adversarial attack on trajectory prediction](https://arxiv.org/abs/2404.12612)

Huilin Yin, Jiaxiang Li, Pengju Zhen, Jun Yan

-+ [ LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning](https://arxiv.org//abs/2404.12852)

++ [ LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning](https://arxiv.org/abs/2404.12852)

Beichen Li, Yuanfang Guo, Heqi Peng, Yangxi Li, Yunhong Wang

-+ [ Defending against Data Poisoning Attacks in Federated Learning via User Elimination](https://arxiv.org//abs/2404.12778)

++ [ Defending against Data Poisoning Attacks in Federated Learning via User Elimination](https://arxiv.org/abs/2404.12778)

Nick Galanis

-+ [ The Power of Words: Generating PowerShell Attacks from Natural Language](https://arxiv.org//abs/2404.12893)

++ [ The Power of Words: Generating PowerShell Attacks from Natural Language](https://arxiv.org/abs/2404.12893)

Pietro Liguori, Christian Marescalco, Roberto Natella, Vittorio Orbinato, Luciano Pianese

-+ [ Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models](https://arxiv.org//abs/2404.12916)

++ [ Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models](https://arxiv.org/abs/2404.12916)

Zhenyang Ni, Rui Ye, Yuxi Wei, Zhen Xiang, Yanfeng Wang, Siheng Chen

-+ [ Privacy-Preserving Debiasing using Data Augmentation and Machine Unlearning](https://arxiv.org//abs/2404.13194)

++ [ Privacy-Preserving Debiasing using Data Augmentation and Machine Unlearning](https://arxiv.org/abs/2404.13194)
Zhixin Pan, Emma Andrews, Laura Chang, Prabhat Mishra

-+ [ DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection](https://arxiv.org//abs/2404.13146)

++ [ DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection](https://arxiv.org/abs/2404.13146)

Shuwei Hou, Yan Ju, Chengzhe Sun, Shan Jia, Lipeng Ke, Riky Zhou, Anita Nikolich, Siwei Lyu

-+ [ CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models](https://arxiv.org//abs/2404.13161)

++ [ CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models](https://arxiv.org/abs/2404.13161)

Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe

# 2024-04-18

-+ [ Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors](https://arxiv.org//abs/2404.12120)

++ [ Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors](https://arxiv.org/abs/2404.12120)

Raz Lapid, Almog Dubin, Moshe Sipper

-+ [ Proteus: Preserving Model Confidentiality during Graph Optimizations](https://arxiv.org//abs/2404.12512)

++ [ Proteus: Preserving Model Confidentiality during Graph Optimizations](https://arxiv.org/abs/2404.12512)

Yubo Gao, Maryam Haghifam, Christina Giannoula, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar

-+ [ Introducing v0.5 of the AI Safety Benchmark from MLCommons](https://arxiv.org//abs/2404.12241)

++ [ Introducing v0.5 of the AI Safety Benchmark from MLCommons](https://arxiv.org/abs/2404.12241)

Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, et al. (46 additional authors not shown)
-+ [ Advancing the Robustness of Large Language Models through Self-Denoised Smoothing](https://arxiv.org//abs/2404.12274)

++ [ Advancing the Robustness of Large Language Models through Self-Denoised Smoothing](https://arxiv.org/abs/2404.12274)

Jiabao Ji, Bairu Hou, Zhen Zhang, Guanhua Zhang, Wenqi Fan, Qing Li, Yang Zhang, Gaowen Liu, Sijia Liu, Shiyu Chang

-+ [ Enhance Robustness of Language Models Against Variation Attack through Graph Integration](https://arxiv.org//abs/2404.12014)

++ [ Enhance Robustness of Language Models Against Variation Attack through Graph Integration](https://arxiv.org/abs/2404.12014)

Zi Xiong, Lizhi Qing, Yangyang Kang, Jiawei Liu, Hongsong Li, Changlong Sun, Xiaozhong Liu, Wei Lu

-+ [ Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector](https://arxiv.org//abs/2404.12038)

++ [ Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector](https://arxiv.org/abs/2404.12038)

Zhihao Xu, Ruixuan Huang, Xiting Wang, Fangzhao Wu, Jing Yao, Xing Xie

-+ [ Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement](https://arxiv.org//abs/2404.11819)

++ [ Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement](https://arxiv.org/abs/2404.11819)

Pushkar Shukla, Dhruv Srikanth, Lee Cohen, Matthew Turk

-+ [ FedMID: A Data-Free Method for Using Intermediate Outputs as a Defense Mechanism Against Poisoning Attacks in Federated Learning](https://arxiv.org//abs/2404.11905)

++ [ FedMID: A Data-Free Method for Using Intermediate Outputs as a Defense Mechanism Against Poisoning Attacks in Federated Learning](https://arxiv.org/abs/2404.11905)

Sungwon Han, Hyeonho Song, Sungwon Park, Meeyoung Cha

-+ [ A Mean-Field Analysis of Neural Gradient Descent-Ascent: Applications to Functional Conditional Moment Equations](https://arxiv.org//abs/2404.12312)

++ [ A Mean-Field Analysis of Neural Gradient Descent-Ascent: Applications to Functional Conditional Moment Equations](https://arxiv.org/abs/2404.12312)

Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen

-+ [ KDk: A Defense Mechanism Against Label Inference Attacks in Vertical Federated Learning](https://arxiv.org//abs/2404.12369)

++ [ KDk: A Defense Mechanism Against Label Inference Attacks in Vertical Federated Learning](https://arxiv.org/abs/2404.12369)

Marco Arazzi, Serena Nicolazzo, Antonino Nocera

# 2024-04-17

-+ [ TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment](https://arxiv.org//abs/2404.11121)

++ [ TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment](https://arxiv.org/abs/2404.11121)

Qinfeng Li, Zhiqiang Shen, Zhenghan Qin, Yangfan Xie, Xuhong Zhang, Tianyu Du, Jianwei Yin

-+ [ Sampling-based Pseudo-Likelihood for Membership Inference Attacks](https://arxiv.org//abs/2404.11262)

++ [ Sampling-based Pseudo-Likelihood for Membership Inference Attacks](https://arxiv.org/abs/2404.11262)

Masahiro Kaneko, Youmi Ma, Yuki Wata, Naoaki Okazaki

-+ [ A Federated Learning Approach to Privacy Preserving Offensive Language Identification](https://arxiv.org//abs/2404.11470)

++ [ A Federated Learning Approach to Privacy Preserving Offensive Language Identification](https://arxiv.org/abs/2404.11470)

Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe

-+ [ GenFighter: A Generative and Evolutive Textual Attack Removal](https://arxiv.org//abs/2404.11538)

++ [ GenFighter: A Generative and Evolutive Textual Attack Removal](https://arxiv.org/abs/2404.11538)
Md Athikul Islam, Edoardo Serra, Sushil Jajodia

-+ [ The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data](https://arxiv.org//abs/2404.11265)

++ [ The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data](https://arxiv.org/abs/2404.11265)

Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing

-+ [ Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness](https://arxiv.org//abs/2404.11357)

++ [ Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness](https://arxiv.org/abs/2404.11357)

Hangtao Zhang, Shengshan Hu, Yichen Wang, Leo Yu Zhang, Ziqi Zhou, Xianlong Wang, Yanjun Zhang, Chao Chen

-+ [ Clipped SGD Algorithms for Privacy Preserving Performative Prediction: Bias Amplification and Remedies](https://arxiv.org//abs/2404.10995)

++ [ Clipped SGD Algorithms for Privacy Preserving Performative Prediction: Bias Amplification and Remedies](https://arxiv.org/abs/2404.10995)

Qiang Li, Michal Yemini, Hoi-To Wai

-+ [ Exploring DNN Robustness Against Adversarial Attacks Using Approximate Multipliers](https://arxiv.org//abs/2404.11665)

++ [ Exploring DNN Robustness Against Adversarial Attacks Using Approximate Multipliers](https://arxiv.org/abs/2404.11665)

Mohammad Javad Askarizadeh, Ebrahim Farahmand, Jorge Castro-Godinez, Ali Mahani, Laura Cabrera-Quiros, Carlos Salazar-Garcia

-+ [ A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications](https://arxiv.org//abs/2404.11698)

++ [ A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications](https://arxiv.org/abs/2404.11698)

Antonio Boiano, Marco Di Gennaro, Luca Barbieri, Michele Carminati, Monica Nicoli, Alessandro Redondi, Stefano Savazzi, Albert Sund Aillet, Diogo Reis Santos, Luigi Serio

# 2024-04-16

-+ [ Private Attribute Inference from Images with Vision-Language Models](https://arxiv.org//abs/2404.10618)

++ [ Private Attribute Inference from Images with Vision-Language Models](https://arxiv.org/abs/2404.10618)

Batuhan Tömekçe, Mark Vero, Robin Staab, Martin Vechev

-+ [ Towards a Novel Perspective on Adversarial Examples Driven by Frequency](https://arxiv.org//abs/2404.10202)

++ [ Towards a Novel Perspective on Adversarial Examples Driven by Frequency](https://arxiv.org/abs/2404.10202)

Zhun Zhang, Yi Zeng, Qihe Liu, Shijie Zhou

-+ [ Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning](https://arxiv.org//abs/2404.10552)

++ [ Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning](https://arxiv.org/abs/2404.10552)

Xiao Wang, Tianze Chen, Xianjun Yang, Qi Zhang, Xun Zhao, Dahua Lin

-+ [ Self-playing Adversarial Language Game Enhances LLM Reasoning](https://arxiv.org//abs/2404.10642)

++ [ Self-playing Adversarial Language Game Enhances LLM Reasoning](https://arxiv.org/abs/2404.10642)

Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du

-+ [ Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models](https://arxiv.org//abs/2404.10335)

++ [ Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models](https://arxiv.org/abs/2404.10335)

Qi Guo, Shanmin Pang, Xiaojun Jia, Qing Guo

-+ [ Adversarial Identity Injection for Semantic Face Image Synthesis](https://arxiv.org//abs/2404.10408)

++ [ Adversarial Identity Injection for Semantic Face Image Synthesis](https://arxiv.org/abs/2404.10408)
Giuseppe Tarollo, Tomaso Fontanini, Claudio Ferrari, Guido Borghi, Andrea Prati

-+ [ Do Counterfactual Examples Complicate Adversarial Training?](https://arxiv.org//abs/2404.10588)

++ [ Do Counterfactual Examples Complicate Adversarial Training?](https://arxiv.org/abs/2404.10588)

Eric Yeats, Cameron Darwin, Eduardo Ortega, Frank Liu, Hai Li

-+ [ Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback](https://arxiv.org//abs/2404.10776)

++ [ Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback](https://arxiv.org/abs/2404.10776)

Qiwei Di, Jiafan He, Quanquan Gu

-+ [ Differentially Private Optimization with Sparse Gradients](https://arxiv.org//abs/2404.10881)

++ [ Differentially Private Optimization with Sparse Gradients](https://arxiv.org/abs/2404.10881)

Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

# 2024-04-15

-+ [ Privacy at a Price: Exploring its Dual Impact on AI Fairness](https://arxiv.org//abs/2404.09391)

++ [ Privacy at a Price: Exploring its Dual Impact on AI Fairness](https://arxiv.org/abs/2404.09391)

Mengmeng Yang, Ming Ding, Youyang Qu, Wei Ni, David Smith, Thierry Rakotoarivelo

-+ [ Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models](https://arxiv.org//abs/2404.09401)

++ [ Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models](https://arxiv.org/abs/2404.09401)

Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

-+ [ Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label](https://arxiv.org//abs/2404.09475)

++ [ Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label](https://arxiv.org/abs/2404.09475)

Byeongkeun Kang, Sinhae Cha, Yeejin Lee

-+ [ Beyond Noise: Privacy-Preserving Decentralized Learning with Virtual Nodes](https://arxiv.org//abs/2404.09536)

++ [ Beyond Noise: Privacy-Preserving Decentralized Learning with Virtual Nodes](https://arxiv.org/abs/2404.09536)

Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos

-+ [ Privacy-Preserving Intrusion Detection using Convolutional Neural Networks](https://arxiv.org//abs/2404.09625)

++ [ Privacy-Preserving Intrusion Detection using Convolutional Neural Networks](https://arxiv.org/abs/2404.09625)

Martin Kodys, Zhongmin Dai, Vrizlynn L. L. Thing
Thing -+ [ Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing](https://arxiv.org//abs/2404.09586) ++ [ Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing](https://arxiv.org/abs/2404.09586) Song Xia, Yu Yi, Xudong Jiang, Henghui Ding -+ [ Ti-Patch: Tiled Physical Adversarial Patch for no-reference video quality metrics](https://arxiv.org//abs/2404.09961) ++ [ Ti-Patch: Tiled Physical Adversarial Patch for no-reference video quality metrics](https://arxiv.org/abs/2404.09961) Victoria Leonenkova, Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin -+ [ On the Efficiency of Privacy Attacks in Federated Learning](https://arxiv.org//abs/2404.09430) ++ [ On the Efficiency of Privacy Attacks in Federated Learning](https://arxiv.org/abs/2404.09430) Nawrin Tabassum, Ka-Ho Chow, Xuyu Wang, Wenbin Zhang, Yanzhao Wu -+ [ Privacy-Preserving Federated Unlearning with Certified Client Removal](https://arxiv.org//abs/2404.09724) ++ [ Privacy-Preserving Federated Unlearning with Certified Client Removal](https://arxiv.org/abs/2404.09724) Ziyao Liu, Huanyi Ye, Yu Jiang, Jiyuan Shen, Jiale Guo, Ivan Tjuawinata, Kwok-Yan Lam -+ [ Deceiving to Enlighten: Coaxing LLMs to Self-Reflection for Enhanced Bias Detection and Mitigation](https://arxiv.org//abs/2404.10160) ++ [ Deceiving to Enlighten: Coaxing LLMs to Self-Reflection for Enhanced Bias Detection and Mitigation](https://arxiv.org/abs/2404.10160) Ruoxi Cheng, Haoxuan Ma, Shuirong Cao -+ [ AIGeN: An Adversarial Approach for Instruction Generation in VLN](https://arxiv.org//abs/2404.10054) ++ [ AIGeN: An Adversarial Approach for Instruction Generation in VLN](https://arxiv.org/abs/2404.10054) Niyati Rawal, Roberto Bigazzi, Lorenzo Baraldi, Rita Cucchiara -+ [ Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective](https://arxiv.org//abs/2404.10796) ++ [ Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective](https://arxiv.org/abs/2404.10796) Khushnaseeb Roshan, Aasim Zafar # 2024-04-14 -+ [ Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning](https://arxiv.org//abs/2404.09265) ++ [ Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning](https://arxiv.org/abs/2404.09265) Tanveer Khan, Mindaugas Budzys, Antonis Michalas -+ [ FaceCat: Enhancing Face Recognition Security with a Unified Generative Model Framework](https://arxiv.org//abs/2404.09193) ++ [ FaceCat: Enhancing Face Recognition Security with a Unified Generative Model Framework](https://arxiv.org/abs/2404.09193) Jiawei Chen, Xiao Yang, Yinpeng Dong, Hang Su, Jianteng Peng, Zhaoxia Yin -+ [ Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies](https://arxiv.org//abs/2404.09349) ++ [ Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies](https://arxiv.org/abs/2404.09349) Brian R. 
# 2024-04-13

-+ [ Proof-of-Learning with Incentive Security](https://arxiv.org//abs/2404.09005)

++ [ Proof-of-Learning with Incentive Security](https://arxiv.org/abs/2404.09005)

Zishuo Zhao, Zhixuan Fang, Xuechao Wang, Yuan Zhou

-+ [ CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants](https://arxiv.org//abs/2404.09066)

++ [ CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants](https://arxiv.org/abs/2404.09066)

Amit Finkman, Eden Bar-Kochva, Avishag Shapira, Dudu Mimran, Yuval Elovici, Asaf Shabtai

-+ [ Stability and Generalization in Free Adversarial Training](https://arxiv.org//abs/2404.08980)

++ [ Stability and Generalization in Free Adversarial Training](https://arxiv.org/abs/2404.08980)

Xiwei Cheng, Kexin Fu, Farzan Farnia

-+ [ Multimodal Attack Detection for Action Recognition Models](https://arxiv.org//abs/2404.10790)

++ [ Multimodal Attack Detection for Action Recognition Models](https://arxiv.org/abs/2404.10790)

Furkan Mumcu, Yasin Yilmaz

# 2024-04-12

-+ [ A Survey of Neural Network Robustness Assessment in Image Recognition](https://arxiv.org//abs/2404.08285)

++ [ A Survey of Neural Network Robustness Assessment in Image Recognition](https://arxiv.org/abs/2404.08285)

Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

-+ [ Adversarial Imitation Learning via Boosting](https://arxiv.org//abs/2404.08513)

++ [ Adversarial Imitation Learning via Boosting](https://arxiv.org/abs/2404.08513)

Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

-+ [ VertAttack: Taking advantage of Text Classifiers' horizontal vision](https://arxiv.org//abs/2404.08538)

++ [ VertAttack: Taking advantage of Text Classifiers' horizontal vision](https://arxiv.org/abs/2404.08538)

Jonathan Rusert

-+ [ Practical Region-level Attack against Segment Anything Models](https://arxiv.org//abs/2404.08255)

++ [ Practical Region-level Attack against Segment Anything Models](https://arxiv.org/abs/2404.08255)

Yifan Shen, Zhengyuan Li, Gang Wang

-+ [ Struggle with Adversarial Defense? Try Diffusion](https://arxiv.org//abs/2404.08273)

++ [ Struggle with Adversarial Defense? Try Diffusion](https://arxiv.org/abs/2404.08273)

Yujie Li, Yanbin Wang, Haitao xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma

-+ [ Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts](https://arxiv.org//abs/2404.08341)

++ [ Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts](https://arxiv.org/abs/2404.08341)

Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong

-+ [ Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues](https://arxiv.org//abs/2404.08450)

++ [ Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues](https://arxiv.org/abs/2404.08450)

Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

-+ [ On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation](https://arxiv.org//abs/2404.08540)

++ [ On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation](https://arxiv.org/abs/2404.08540)

Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang

-+ [ Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing](https://arxiv.org//abs/2404.08444)

++ [ Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing](https://arxiv.org/abs/2404.08444)

Cui Zhang, Xiao Xu, Qiong Wu, Pingyi Fan, Qiang Fan, Huiling Zhu, Jiangzhou Wang

-+ [ FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models](https://arxiv.org//abs/2404.08631)

++ [ FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models](https://arxiv.org/abs/2404.08631)

Yanting Wang, Wei Zou, Jinyuan Jia

-+ [ LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models](https://arxiv.org//abs/2404.08847)

++ [ LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models](https://arxiv.org/abs/2404.08847)

Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, G. Edward Suh, Minsoo Rhu

-+ [ PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis](https://arxiv.org//abs/2404.10789)

++ [ PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis](https://arxiv.org/abs/2404.10789)

Dipkamal Bhusal, Md Tanvirul Alam, Monish K. Veerabhadran, Michael Clifford, Sara Rampazzi, Nidhi Rastogi

-+ [JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models](https://arxiv.org//abs/2404.08793)

++ [JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models](https://arxiv.org/abs/2404.08793)

Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Haoyu Tian, Wei Zhang, Minfeng Zhu, Wei Chen

# 2024-04-11

-+ [ Differentially Private GANs for Generating Synthetic Indoor Location Data](https://arxiv.org//abs/2404.07366)

++ [ Differentially Private GANs for Generating Synthetic Indoor Location Data](https://arxiv.org/abs/2404.07366)

Vahideh Moghtadaiee, Mina Alishahi, Milad Rabiei

-+ [ Differentially Private Reinforcement Learning with Self-Play](https://arxiv.org//abs/2404.07559)

++ [ Differentially Private Reinforcement Learning with Self-Play](https://arxiv.org/abs/2404.07559)

Dan Qiao, Yu-Xiang Wang

-+ [ Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing](https://arxiv.org//abs/2404.07572)

++ [ Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing](https://arxiv.org/abs/2404.07572)

ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu

-+ [ AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs](https://arxiv.org//abs/2404.07921)

++ [ AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs](https://arxiv.org/abs/2404.07921)

Zeyi Liao, Huan Sun

-+ [ Privacy preserving layer partitioning for Deep Neural Network models](https://arxiv.org//abs/2404.07437)

++ [ Privacy preserving layer partitioning for Deep Neural Network models](https://arxiv.org/abs/2404.07437)

Kishore Rajasekar, Randolph Loh, Kar Wai Fok, Vrizlynn L. L. Thing

-+ [ Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks](https://arxiv.org//abs/2404.07464)

++ [ Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks](https://arxiv.org/abs/2404.07464)

Xinxing Zhao, Kar Wai Fok, Vrizlynn L. L. Thing
-+ [ Backdoor Contrastive Learning via Bi-level Trigger Optimization](https://arxiv.org//abs/2404.07863)

++ [ Backdoor Contrastive Learning via Bi-level Trigger Optimization](https://arxiv.org/abs/2404.07863)

Weiyu Sun, Xinyu Zhang, Hao Lu, Yingcong Chen, Ting Wang, Jinghui Chen, Lu Lin

-+ [ Latent Guard: a Safety Framework for Text-to-image Generation](https://arxiv.org//abs/2404.08031)

++ [ Latent Guard: a Safety Framework for Text-to-image Generation](https://arxiv.org/abs/2404.08031)

Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati

-+ [ LLM Agents can Autonomously Exploit One-day Vulnerabilities](https://arxiv.org//abs/2404.08144)

++ [ LLM Agents can Autonomously Exploit One-day Vulnerabilities](https://arxiv.org/abs/2404.08144)

Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

-+ [ Persistent Classification: A New Approach to Stability of Data and Adversarial Examples](https://arxiv.org//abs/2404.08069)

++ [ Persistent Classification: A New Approach to Stability of Data and Adversarial Examples](https://arxiv.org/abs/2404.08069)

Brian Bell, Michael Geyer, David Glickenstein, Keaton Hamm, Carlos Scheidegger, Amanda Fernandez, Juston Moore

-+ [ Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization](https://arxiv.org//abs/2404.08154)

++ [ Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization](https://arxiv.org/abs/2404.08154)

Runqi Lin, Chaojian Yu, Tongliang Liu

-+ [ CodeFort: Robust Training for Code Generation Models](https://arxiv.org//abs/2405.01567)

++ [ CodeFort: Robust Training for Code Generation Models](https://arxiv.org/abs/2405.01567)

Yuhao Zhang, Shiqi Wang, Haifeng Qian, Zijian Wang, Mingyue Shang, Linbo Liu, Sanjay Krishna Gouda, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras

# 2024-04-10

-+ [ Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks](https://arxiv.org//abs/2404.07139)

++ [ Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks](https://arxiv.org/abs/2404.07139)

Kavita Kumari, Murtuza Jadliwala, Sumit Kumar Jha, Anindya Maiti

-+ [ SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models](https://arxiv.org//abs/2404.06666)

++ [ SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models](https://arxiv.org/abs/2404.06666)

Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu

-+ [ How to Craft Backdoors with Unlabeled Data Alone?](https://arxiv.org//abs/2404.06694)

++ [ How to Craft Backdoors with Unlabeled Data Alone?](https://arxiv.org/abs/2404.06694)

Yifei Wang, Wenhan Ma, Yisen Wang

-+ [ Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data](https://arxiv.org//abs/2404.06776)

++ [ Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data](https://arxiv.org/abs/2404.06776)

Yu Qiao, Chaoning Zhang, Apurba Adhikary, Choong Seon Hong

-+ [ Adversarial purification for no-reference image-quality metrics: applicability study and new methods](https://arxiv.org//abs/2404.06957)

++ [ Adversarial purification for no-reference image-quality metrics: applicability study and new methods](https://arxiv.org/abs/2404.06957)

Aleksandr Gushchin, Anna Chistyakova, Vladislav Minashkin, Anastasia Antsiferova, Dmitriy Vatolin

-+ [ Simpler becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?](https://arxiv.org//abs/2404.06838)

++ [ Simpler becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?](https://arxiv.org/abs/2404.06838)

Miriam Anschütz, Edoardo Mosca, Georg Groh

-+ [ Poisoning Prevention in Federated Learning and Differential Privacy via Stateful Proofs of Execution](https://arxiv.org//abs/2404.06721)

++ [ Poisoning Prevention in Federated Learning and Differential Privacy via Stateful Proofs of Execution](https://arxiv.org/abs/2404.06721)

Norrathep Rattanavipanon, Ivan de Oliviera Nunes

-+ [A Survey and Future Outlook on Indoor Location Fingerprinting Privacy Preservation](https://arxiv.org//abs/2404.07345)

++ [A Survey and Future Outlook on Indoor Location Fingerprinting Privacy Preservation](https://arxiv.org/abs/2404.07345)

Amir Fathalizadeh, Vahideh Moghtadaiee, Mina Alishahi

# 2024-04-09

-+ [ Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability](https://arxiv.org//abs/2404.06144)

++ [ Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability](https://arxiv.org/abs/2404.06144)

Fatima Ezzeddine, Mirna Saad, Omran Ayoub, Davide Andreoletti, Martin Gjoreski, Ihab Sbeity, Marc Langheinrich, Silvia Giordano

-+ [ LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks](https://arxiv.org//abs/2404.06247)

++ [ LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks](https://arxiv.org/abs/2404.06247)

Jianlang Chen, Xuhong Ren, Qing Guo, Felix Juefei-Xu, Di Lin, Wei Feng, Lei Ma, Jianjun Zhao

-+ [ On adversarial training and the 1 Nearest Neighbor classifier](https://arxiv.org//abs/2404.06313)

++ [ On adversarial training and the 1 Nearest Neighbor classifier](https://arxiv.org/abs/2404.06313)

Amir Hagai, Yair Weiss

-+ [ Towards Robust Domain Generation Algorithm Classification](https://arxiv.org//abs/2404.06236)

++ [ Towards Robust Domain Generation Algorithm Classification](https://arxiv.org/abs/2404.06236)

Arthur Drichel, Marc Meyer, Ulrike Meyer

-+ [ Privacy-preserving Scanpath Comparison for Pervasive Eye Tracking](https://arxiv.org//abs/2404.06216)

++ [ Privacy-preserving Scanpath Comparison for Pervasive Eye Tracking](https://arxiv.org/abs/2404.06216)

Suleyman Ozdel, Efe Bozkir, Enkelejda Kasneci

-+ [ Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs](https://arxiv.org//abs/2404.07242)

++ [ Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs](https://arxiv.org/abs/2404.07242)

Bibek Upadhayay, Vahid Behzadan

-+ [ Towards Building a Robust Toxicity Predictor](https://arxiv.org//abs/2404.08690)

++ [ Towards Building a Robust Toxicity Predictor](https://arxiv.org/abs/2404.08690)

Dmitriy Bespalov, Sourav Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi

# 2024-04-08

-+ [ SoK: Gradient Leakage in Federated Learning](https://arxiv.org//abs/2404.05403)

++ [ SoK: Gradient Leakage in Federated Learning](https://arxiv.org/abs/2404.05403)

Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

-+ [ Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data](https://arxiv.org//abs/2404.05530)

++ [ Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data](https://arxiv.org/abs/2404.05530)

Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler

-+ [ Investigating the Impact of Quantization on Adversarial Robustness](https://arxiv.org//abs/2404.05639)

++ [ Investigating the Impact of Quantization on Adversarial Robustness](https://arxiv.org/abs/2404.05639)

Qun Li, Yuan Meng, Chen Tang, Jiacheng Jiang, Zhi Wang

-+ [ David and Goliath: An Empirical Evaluation of Attacks and Defenses for QNNs at the Deep Edge](https://arxiv.org//abs/2404.05688)

++ [ David and Goliath: An Empirical Evaluation of Attacks and Defenses for QNNs at the Deep Edge](https://arxiv.org/abs/2404.05688)

Miguel Costa, Sandro Pinto

-+ [ Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods](https://arxiv.org//abs/2404.05159)

++ [ Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods](https://arxiv.org/abs/2404.05159)

Roopkatha Dey, Aivy Debnath, Sayak Kumar Dutta, Kaustav Ghosh, Arijit Mitra, Arghya Roy Chowdhury, Jaydip Sen

-+ [ Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey](https://arxiv.org//abs/2404.05219)

++ [ Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey](https://arxiv.org/abs/2404.05219)

Naveen Karunanayake, Ravin Gunawardena, Suranga Seneviratne, Sanjay Chawla

-+ [ BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack](https://arxiv.org//abs/2404.05311)

++ [ BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack](https://arxiv.org/abs/2404.05311)

Viet Quoc Vo, Ehsan Abbasnejad, Damith C. Ranasinghe

-+ [ Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing](https://arxiv.org//abs/2404.05350)

++ [ Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing](https://arxiv.org/abs/2404.05350)

Chengyan Fu, Wenjie Wang

-+ [ Flexible Fairness Learning via Inverse Conditional Permutation](https://arxiv.org//abs/2404.05678)

++ [ Flexible Fairness Learning via Inverse Conditional Permutation](https://arxiv.org/abs/2404.05678)

Yuheng Lai, Leying Guan

-+ [ Enabling Privacy-Preserving Cyber Threat Detection with Federated Learning](https://arxiv.org//abs/2404.05130)

++ [ Enabling Privacy-Preserving Cyber Threat Detection with Federated Learning](https://arxiv.org/abs/2404.05130)

Yu Bi, Yekai Li, Xuan Feng, Xianghang Mi

-+ [ Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning](https://arxiv.org//abs/2404.05868)

++ [ Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning](https://arxiv.org/abs/2404.05868)

Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

-+ [ Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge](https://arxiv.org//abs/2404.05880)

++ [ Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge](https://arxiv.org/abs/2404.05880)

Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, Cen Chen

-+ [ Privacy-Preserving Deep Learning Using Deformable Operators for Secure Task Learning](https://arxiv.org//abs/2404.05828)

++ [ Privacy-Preserving Deep Learning Using Deformable Operators for Secure Task Learning](https://arxiv.org/abs/2404.05828)

Fabian Perez, Jhon Lopez, Henry Arguello

-+ [ Quantum Adversarial Learning for Kernel Methods](https://arxiv.org//abs/2404.05824)

++ [ Quantum Adversarial Learning for Kernel Methods](https://arxiv.org/abs/2404.05824)

Giuseppe Montalbano, Leonardo Banchi

# 2024-04-07

-+ [ Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models](https://arxiv.org//abs/2404.04814)

++ [ Inference-Time Rule Eraser: Distilling and Removing Bias Rules to Mitigate Bias in Deployed Models](https://arxiv.org/abs/2404.04814)

Yi Zhang, Jitao Sang

-+ [ Hidden You Malicious Goal Into Benigh Narratives: Jailbreak Large Language Models through Logic Chain Injection](https://arxiv.org//abs/2404.04849)

++ [ Hidden You Malicious Goal Into Benigh Narratives: Jailbreak Large Language Models through Logic Chain Injection](https://arxiv.org/abs/2404.04849)

Zhilong Wang, Yebo Cao, Peng Liu
-+ [ SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials](https://arxiv.org//abs/2404.04963)

++ [ SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials](https://arxiv.org/abs/2404.04963)

Mael Jullien, Marco Valentino, André Freitas

-+ [ How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?](https://arxiv.org//abs/2404.05088)

++ [ How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?](https://arxiv.org/abs/2404.05088)

Ishani Mondal, Abhilasha Sancheti

-+ [ Privacy-Preserving Traceable Functional Encryption for Inner Product](https://arxiv.org//abs/2404.04861)

++ [ Privacy-Preserving Traceable Functional Encryption for Inner Product](https://arxiv.org/abs/2404.04861)

Muyao Qiu, Jinguang Han

# 2024-04-06

-+ [ Trustless Audits without Revealing Data or Models](https://arxiv.org//abs/2404.04500)

++ [ Trustless Audits without Revealing Data or Models](https://arxiv.org/abs/2404.04500)

Suppakit Waiwitlikhit, Ion Stoica, Yi Sun, Tatsunori Hashimoto, Daniel Kang

-+ [ Data Poisoning Attacks on Off-Policy Policy Evaluation Methods](https://arxiv.org//abs/2404.04714)

++ [ Data Poisoning Attacks on Off-Policy Policy Evaluation Methods](https://arxiv.org/abs/2404.04714)

Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

-+ [ D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy](https://arxiv.org//abs/2404.04584)

++ [ D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy](https://arxiv.org/abs/2404.04584)

Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu

-+ [ Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training](https://arxiv.org//abs/2404.04647)

++ [ Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training](https://arxiv.org/abs/2404.04647)

Shizhan Gong, Qi Dou, Farzan Farnia

-+ [ CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems](https://arxiv.org//abs/2404.04648)

++ [ CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems](https://arxiv.org/abs/2404.04648)

Francesco Marchiori, Mauro Conti

-+ [ Goal-guided Generative Prompt Injection Attack on Large Language Models](https://arxiv.org//abs/2404.07234)

++ [ Goal-guided Generative Prompt Injection Attack on Large Language Models](https://arxiv.org/abs/2404.07234)

Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin

-+ [ ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming](https://arxiv.org//abs/2404.08676)

++ [ ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming](https://arxiv.org/abs/2404.08676)

Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li

## 2024-04-05

-+ [ Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning](https://arxiv.org//abs/2404.04139)

++ [ Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning](https://arxiv.org/abs/2404.04139)

K Naveen Kumar, C Krishna Mohan, Aravind Machiry

-+ [ Watermark-based Detection and Attribution of AI-Generated Content](https://arxiv.org//abs/2404.04254)

++ [ Watermark-based Detection and Attribution of AI-Generated Content](https://arxiv.org/abs/2404.04254)

Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong

-+ [ Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism](https://arxiv.org//abs/2404.04245)

++ [ Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism](https://arxiv.org/abs/2404.04245)

Trilokesh Ranjan Sarkar, Nilanjan Das, Pralay Sankar Maitra, Bijoy Some, Ritwik Saha, Orijita Adhikary, Bishal Bose, Jaydip Sen

-+ [ Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks](https://arxiv.org//abs/2404.03948)

++ [ Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks](https://arxiv.org/abs/2404.03948)

Ana-Maria Cretu, Miruna Rusu, Yves-Alexandre de Montjoye

-+ [ You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks](https://arxiv.org//abs/2404.04098)

++ [ You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks](https://arxiv.org/abs/2404.04098)

Qiushi Li, Yan Zhang, Ju Ren, Qi Li, Yaoxue Zhang

-+ [ Increased LLM Vulnerabilities from Fine-tuning and Quantization](https://arxiv.org//abs/2404.04392)

++ [ Increased LLM Vulnerabilities from Fine-tuning and Quantization](https://arxiv.org/abs/2404.04392)

Divyanshu Kumar, Anurakt Kumar, Sahil Agarwal, Prashanth Harshangi

@@ -33704,36 +33704,36 @@ Xiaosen Wang, Zeyuan Yin

Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, Preslav Nakov

# 2024-03-28

-+ [Imperceptible Protection against Style Imitation from Diffusion Models](https://arxiv.org//abs/2403.19254)

++ [Imperceptible Protection against Style Imitation from Diffusion Models](https://arxiv.org/abs/2403.19254)

Namhyuk Ahn, Wonhyuk Ahn, KiYoon Yoo, Daesik Kim, Seung-Hun Nam

# 2024-03-25

-+ [Bridging Privacy and Robustness for Trustworthy Machine Learning](https://arxiv.org//abs/2403.16591)

++ [Bridging Privacy and Robustness for Trustworthy Machine Learning](https://arxiv.org/abs/2403.16591)

Xiaojin Zhang, Wei Chen

# 2024-03-24

-+ [Is The Watermarking Of LLM-Generated Code Robust?](https://arxiv.org//abs/2403.17983)

++ [Is The Watermarking Of LLM-Generated Code Robust?](https://arxiv.org/abs/2403.17983)

Tarun Suresh, Shubham Ugare, Gagandeep Singh, Sasa Misailovic

# 2024-03-23

-+ [Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training](https://arxiv.org//abs/2403.15740)

++ [Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training](https://arxiv.org/abs/2403.15740)

Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang

# 2024-03-22

-+ [Robust Utility Optimization via a GAN Approach](https://arxiv.org//abs/2403.15243)

++ [Robust Utility Optimization via a GAN Approach](https://arxiv.org/abs/2403.15243)

Florian Krach, Josef Teichmann, Hanna Wutte

# 2024-03-20

-+ [Certified Human Trajectory Prediction](https://arxiv.org//abs/2403.13778)

++ [Certified Human Trajectory Prediction](https://arxiv.org/abs/2403.13778)

Mohammadhossein Bahari, Saeed Saadatnejad, Amirhossein Askari Farsangi, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi

-+ [Federated Learning Resilient to Byzantine Attacks and Data Heterogeneity](https://arxiv.org//abs/2403.13374)

++ [Federated Learning Resilient to Byzantine Attacks and Data Heterogeneity](https://arxiv.org/abs/2403.13374)

Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Han Hu, Hangguan Shan, Tony Q. S. Quek, Puning Zhao

@@ -33743,60 +33743,60 @@ Xiaosen Wang, Zeyuan Yin

Sara Abdali, Richard Anarfi, CJ Barberan, Jia He, Erfan Shayegani

# 2024-03-17

-+ [Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention](https://arxiv.org//abs/2403.11052)

++ [Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention](https://arxiv.org/abs/2403.11052)

Jie Ren, Yaxin Li, Shenglai Zeng, Han Xu, Lingjuan Lyu, Yue Xing, Jiliang Tang

# 2024-03-15

-+ [Evasive Active Hypothesis Testing with Deep Neuroevolution: The Single- and Multi-Agent Cases](https://arxiv.org//abs/2403.10112)

++ [Evasive Active Hypothesis Testing with Deep Neuroevolution: The Single- and Multi-Agent Cases](https://arxiv.org/abs/2403.10112)

George Stamatelis, Angelos-Nikolaos Kanatas, Ioannis Asprogerakas, George C. Alexandropoulos

# 2024-03-06

-+ [Do You Trust Your Model? Emerging Malware Threats in the Deep Learning Ecosystem](https://arxiv.org//abs/2403.03593)

++ [Do You Trust Your Model? Emerging Malware Threats in the Deep Learning Ecosystem](https://arxiv.org/abs/2403.03593)

Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Sediola Ruko, Briland Hitaj, Luigi V. Mancini, Fernando Perez-Cruz

# 2024-03-04

-+ [Enhancing Object Detection Robustness: Detecting and Restoring Confidence in the Presence of Adversarial Patch Attacks](https://arxiv.org//abs/2403.12988)

++ [Enhancing Object Detection Robustness: Detecting and Restoring Confidence in the Presence of Adversarial Patch Attacks](https://arxiv.org/abs/2403.12988)

Roie Kazoom, Raz Birman, Ofer Hadar

# 2024-02-29

-+ [LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem](https://arxiv.org//abs/2403.00108)

++ [LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem](https://arxiv.org/abs/2403.00108)

Hongyi Liu, Shaochen Zhong, Xintong Sun, Minghao Tian, Mohsen Hariri, Zirui Liu, Ruixiang Tang, Zhimeng Jiang, Jiayi Yuan, Yu-Neng Chuang, Li Li, Soo-Hyun Choi, Rui Chen, Vipin Chaudhary, Xia Hu

# 2024-02-27

-+ [Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates](https://arxiv.org//abs/2402.17390)

++ [Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates](https://arxiv.org/abs/2402.17390)

Daniele Angioni, Luca Demetrio, Maura Pintor, Luca Oneto, Davide Anguita, Battista Biggio, Fabio Roli

# 2024-02-26

-+ [A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection](https://arxiv.org//abs/2402.17018)

++ [A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection](https://arxiv.org/abs/2402.17018)

Leonid Boytsov, Ameya Joshi, Filipe Condessa

# 2024-02-21

-+ [Round Trip Translation Defence against Large Language Model Jailbreaking Attacks](https://arxiv.org//abs/2402.13517)

++ [Round Trip Translation Defence against Large Language Model Jailbreaking Attacks](https://arxiv.org/abs/2402.13517)

Canaan Yung, Hadi Mohaghegh Dolatabadi, Sarah Erfani, Christopher Leckie

# 2024-02-20

-+ [VGMShield: Mitigating Misuse of Video Generative Models](https://arxiv.org//abs/2402.13126)

++ [VGMShield: Mitigating Misuse of Video Generative Models](https://arxiv.org/abs/2402.13126)

Yan Pang, Baicheng Chen, Yang Zhang, Tianhao Wang

-+ [Revisiting Differentially Private Hyper-parameter Tuning](https://arxiv.org//abs/2402.13087)

++ [Revisiting Differentially Private Hyper-parameter Tuning](https://arxiv.org/abs/2402.13087)

Zihang Xiang, Tianhao Wang, Chenglong Wang, Di Wang
# 2024-02-14

-+ [How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?](https://arxiv.org//abs/2402.09546)

++ [How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?](https://arxiv.org/abs/2402.09546)

Congcong Wen, Jiazhao Liang, Shuaihang Yuan, Hao Huang, Geeta Chandra Raju Bethala, Yu-Shen Liu, Mengyu Wang, Anthony Tzes, Yi Fang

-+ [Is my Data in your AI Model? Membership Inference Test with Application to Face Images](https://arxiv.org//abs/2402.09225)

++ [Is my Data in your AI Model? Membership Inference Test with Application to Face Images](https://arxiv.org/abs/2402.09225)

Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia

@@ -33815,179 +33815,179 @@ Xiaosen Wang, Zeyuan Yin

Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H.S. Torr, Lewis Hammond, Christian Schroeder de Witt

# 2024-02-08

-+ [Is Adversarial Training with Compressed Datasets Effective?](https://arxiv.org//abs/2402.05675)

++ [Is Adversarial Training with Compressed Datasets Effective?](https://arxiv.org/abs/2402.05675)

Tong Chen, Raghavendra Selvan

# 2024-02-01

-+ [Benchmarking Spiking Neural Network Learning Methods with Varying Locality](https://arxiv.org//abs/2402.01782)

++ [Benchmarking Spiking Neural Network Learning Methods with Varying Locality](https://arxiv.org/abs/2402.01782)

Jiaqi Lin, Sen Lu, Malyaban Bal, Abhronil Sengupta

# 2024-01-31

-+ [Semantic-Syntactic Discrepancy in Images (SSDI): Learning Meaning and Order of Features from Natural Images](https://arxiv.org//abs/2401.17515)

++ [Semantic-Syntactic Discrepancy in Images (SSDI): Learning Meaning and Order of Features from Natural Images](https://arxiv.org/abs/2401.17515)

Chun Tao, Timur Ibrayev, Kaushik Roy

# 2024-01-30

-+ [Weak-to-Strong Jailbreaking on Large Language Models](https://arxiv.org//abs/2401.17256)

++ [Weak-to-Strong Jailbreaking on Large Language Models](https://arxiv.org/abs/2401.17256)

Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Yang Wang

-+ [Single Word Change is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers](https://arxiv.org//abs/2401.17196)

++ [Single Word Change is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers](https://arxiv.org/abs/2401.17196)

Lei Xu, Sarah Alnegheimish, Laure Berti-Equille, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

# 2024-01-24

-+ [Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning](https://arxiv.org//abs/2401.13796)

++ [Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning](https://arxiv.org/abs/2401.13796)

Andrea Apicella, Francesco Isgrò, Roberto Prevete

# 2024-01-21

-+ [Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts](https://arxiv.org//abs/2401.11406)

++ [Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts](https://arxiv.org/abs/2401.11406)

Kiyoon Kim, Shreyank N Gowda, Panagiotis Eustratiadis, Antreas Antoniou, Robert B Fisher

# 2024-01-19

-+ [PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks](https://arxiv.org//abs/2401.10586)

++ [PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks](https://arxiv.org/abs/2401.10586)

Ping Guo, Xiang Li, Zhiyuan Yang, Xi Lin, Qingchuan Zhao, Qingfu Zhang

# 2024-01-18

-+ [Towards Robust Graph Structural Learning Beyond Homophily via Preserving Neighbor Similarity](https://arxiv.org//abs/2401.09754)

++ [Towards Robust Graph Structural Learning Beyond Homophily via Preserving Neighbor Similarity](https://arxiv.org/abs/2401.09754)

Yulin Zhu, Yuni Lai, Xing Ai, Wai Lun LO, Gaolei Li, Jianhua Li, Di Tang, Xingxing Zhang, Mengpei Yang, Kai Zhou

# 2024-01-17

-+ [Privacy Engineering in Smart Home (SH) Systems: A Comprehensive Privacy Threat Analysis and Risk Management Approach](https://arxiv.org//abs/2401.09519)

++ [Privacy Engineering in Smart Home (SH) Systems: A Comprehensive Privacy Threat Analysis and Risk Management Approach](https://arxiv.org/abs/2401.09519)

Emmanuel Dare Alalade, Mohammed Mahyoub, Ashraf Matrawy

# 2024-01-16

-+ [Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models](https://arxiv.org//abs/2401.08491)

++ [Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models](https://arxiv.org/abs/2401.08491)

Tassilo Klein, Moin Nabi

-+ [X Hacking: The Threat of Misguided AutoML](https://arxiv.org//abs/2401.08513)

++ [X Hacking: The Threat of Misguided AutoML](https://arxiv.org/abs/2401.08513)

Rahul Sharma, Sergey Redyuk, Sumantrak Mukherjee, Andrea Šipka, Eyke Hüllermeier, Sebastian Vollmer, David Selby

# 2024-01-11

-+ [Manipulating Feature Visualizations with Gradient Slingshots](https://arxiv.org//abs/2401.06122)

++ [Manipulating Feature Visualizations with Gradient Slingshots](https://arxiv.org/abs/2401.06122)

Dilyara Bareeva, Marina M.-C. Höhne, Alexander Warnecke, Lukas Pirch, Klaus-Robert Müller, Konrad Rieck, Sebastian Lapuschkin, Kirill Bykov

# 2024-01-05

-+ [Effective backdoor attack on graph neural networks in link prediction tasks](https://arxiv.org//abs/2401.02663)

++ [Effective backdoor attack on graph neural networks in link prediction tasks](https://arxiv.org/abs/2401.02663)

Jiazhu Dai, Haoyu Sun

# 2024-01-03

-+ [FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers](https://arxiv.org//abs/2401.01752)

++ [FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers](https://arxiv.org/abs/2401.01752)

Zheng Yuan, Jie Zhang, Shiguang Shan, Xilin Chen

# 2024-01-02

-+ [JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example](https://arxiv.org//abs/2401.01199)

++ [JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example](https://arxiv.org/abs/2401.01199)

Benedetta Tondi, Wei Guo, Niccolò Pancino, Mauro Barni

# 2023-12-22

-+ [Balancing Privacy, Robustness, and Efficiency in Machine Learning](https://arxiv.org//abs/2312.14712)

++ [Balancing Privacy, Robustness, and Efficiency in Machine Learning](https://arxiv.org/abs/2312.14712)

Youssef Allouah, Rachid Guerraoui, John Stephan

# 2023-12-18

-+ [A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report](https://arxiv.org//abs/2312.11283)

++ [A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report](https://arxiv.org/abs/2312.11283)

John M. Abowd, Tamara Adams, Robert Ashmead, David Darais, Sourya Dey, Simson L. Garfinkel, Nathan Goldschlag, Daniel Kifer, Philip Leclerc, Ethan Lew, Scott Moore, Rolando A. Rodríguez, Ramy N. Tadros, Lars Vilhuber

# 2023-12-11

-+ [MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks](https://arxiv.org//abs/2312.06423)

++ [MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks](https://arxiv.org/abs/2312.06423)

Yuyang Zhou, Guang Cheng, Zongyao Chen, Shui Yu

# 2023-12-07

-+ [GaitGuard: Towards Private Gait in Mixed Reality](https://arxiv.org//abs/2312.04470)

++ [GaitGuard: Towards Private Gait in Mixed Reality](https://arxiv.org/abs/2312.04470)

Diana Romero, Ruchi Jagdish Patel, Athina Markopoulou, Salma Elmalaki

# 2023-12-03

-+ [Breaking XOR Arbiter PUFs with Chosen Challenge Attack](https://arxiv.org//abs/2312.01256)

++ [Breaking XOR Arbiter PUFs with Chosen Challenge Attack](https://arxiv.org/abs/2312.01256)

Niloufar Sayadi, Phuong Ha Nguyen, Marten van Dijk, Chenglu Jin

# 2023-11-28

-+ [On the Robustness of Decision-Focused Learning](https://arxiv.org//abs/2311.16487)

++ [On the Robustness of Decision-Focused Learning](https://arxiv.org/abs/2311.16487)

Yehya Farhat

# 2023-11-15

-+ [On the Foundation of Distributionally Robust Reinforcement Learning](https://arxiv.org//abs/2311.09018)

++ [On the Foundation of Distributionally Robust Reinforcement Learning](https://arxiv.org/abs/2311.09018)

Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

# 2023-11-13

-+ [Backdoor Attacks on Transformers for Tabular Data: An Empirical Study](https://arxiv.org//abs/2311.07550)

++ [Backdoor Attacks on Transformers for Tabular Data: An Empirical Study](https://arxiv.org/abs/2311.07550)

Bart Pleiter, Behrad Tajalli, Stefanos Koffas, Gorka Abad, Jing Xu, Martha Larson, Stjepan Picek

# 2023-11-12

-+ [ Preserving Node-level Privacy in Graph Neural Networks](https://arxiv.org//abs/2311.06888)

++ [ Preserving Node-level Privacy in Graph Neural Networks](https://arxiv.org/abs/2311.06888)

Zihang Xiang, Tianhao Wang, Di Wang
# 2023-11-07

-+ [Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models](https://arxiv.org//abs/2311.04378)

++ [Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models](https://arxiv.org/abs/2311.04378)

Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak

# 2023-11-03

-+ [GNNBleed: Inference Attacks to Unveil Private Edges in Graphs with Realistic Access to GNN Models](https://arxiv.org//abs/2311.16139)

++ [GNNBleed: Inference Attacks to Unveil Private Edges in Graphs with Realistic Access to GNN Models](https://arxiv.org/abs/2311.16139)

Zeyu Song, Ehsanul Kabir, Shagufta Mehnaz

# 2023-11-02

-+ [Upper and lower bounds for the Lipschitz constant of random neural networks](https://arxiv.org//abs/2311.01356)

++ [Upper and lower bounds for the Lipschitz constant of random neural networks](https://arxiv.org/abs/2311.01356)

Paul Geuchen, Dominik Stöger, Thomas Telaar, Felix Voigtlaender

# 2023-10-24

-+ [Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles](https://arxiv.org//abs/2310.15952)

++ [Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles](https://arxiv.org/abs/2310.15952)

Xing Shen, Hengguan Huang, Brennan Nichyporuk, Tal Arbel

# 2023-10-20

-+ [Competitive Advantage Attacks to Decentralized Federated Learning](https://arxiv.org//abs/2310.13862)

++ [Competitive Advantage Attacks to Decentralized Federated Learning](https://arxiv.org/abs/2310.13862)

Yuqi Jia, Minghong Fang, Neil Zhenqiang Gong

# 2023-10-18

-+ [Revisiting Transferable Adversarial Images: Systemization, Evaluation, and New Insights](https://arxiv.org//abs/2310.11850)

++ [Revisiting Transferable Adversarial Images: Systemization, Evaluation, and New Insights](https://arxiv.org/abs/2310.11850)

Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Qian Wang, Chao Shen

# 2023-10-17

-+ [Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning](https://arxiv.org//abs/2310.11594)

++ [Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning](https://arxiv.org/abs/2310.11594)

Taejin Kim, Jiarui Li, Shubhranshu Singh, Nikhil Madaan, Carlee Joe-Wong

# 2023-09-29

-+ [Adversarial Attacks to Latent Representations of Distributed Neural Networks in Split Computing](https://arxiv.org//abs/2309.17401)

++ [Adversarial Attacks to Latent Representations of Distributed Neural Networks in Split Computing](https://arxiv.org/abs/2309.17401)

Milin Zhang, Mohammad Abdi, Jonathan Ashdown, Francesco Restuccia

-+ [On Continuity of Robust and Accurate Classifiers](https://arxiv.org//abs/2309.17048)

++ [On Continuity of Robust and Accurate Classifiers](https://arxiv.org/abs/2309.17048)

Ramin Barati, Reza Safabakhsh, Mohammad Rahmati

# 2023-09-26

-+ [TroLL: Exploiting Structural Similarities between Logic Locking and Hardware Trojans](https://arxiv.org//abs/2309.15067)

++ [TroLL: Exploiting Structural Similarities between Logic Locking and Hardware Trojans](https://arxiv.org/abs/2309.15067)

Yuntao Liu, Aruna Jayasena, Prabhat Mishra, Ankur Srivastava

# 2023-08-23

-+ [A Survey of Graph Unlearning](https://arxiv.org//abs/2310.02164)

++ [A Survey of Graph Unlearning](https://arxiv.org/abs/2310.02164)

Anwar Said, Ngoc N. Tran, Yuying Zhao, Tyler Derr, Mudassir Shabbir, Waseem Abbas, Xenofon Koutsoukos

@@ -33997,27 +33997,27 @@ Xiaosen Wang, Zeyuan Yin

Muhammad Irfan Khan, Esa Alhoniemi, Elina Kontio, Suleiman A. Khan, Mojtaba Jafaritadi

# 2023-07-25

-+ [Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis](https://arxiv.org//abs/2307.14364)

++ [Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis](https://arxiv.org/abs/2307.14364)

Yang Jiao, Kai Yang, Dongjin Song

# 2023-07-24

-+ [Homophily-Driven Sanitation View for Robust Graph Contrastive Learning](https://arxiv.org//abs/2307.12555)

++ [Homophily-Driven Sanitation View for Robust Graph Contrastive Learning](https://arxiv.org/abs/2307.12555)

Yulin Zhu, Xing Ai, Yevgeniy Vorobeychik, Kai Zhou

# 2023-07-21

-+ [Improving Transferability of Adversarial Examples via Bayesian Attacks](https://arxiv.org//abs/2307.11334)

++ [Improving Transferability of Adversarial Examples via Bayesian Attacks](https://arxiv.org/abs/2307.11334)

Qizhang Li, Yiwen Guo, Xiaochen Yang, Wangmeng Zuo, Hao Chen

# 2023-07-17

-+ [Analyzing the Impact of Adversarial Examples on Explainable Machine Learning](https://arxiv.org//abs/2307.08327)

++ [Analyzing the Impact of Adversarial Examples on Explainable Machine Learning](https://arxiv.org/abs/2307.08327)

Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak, Tapadhir Das

# 2023-07-03

-+ [Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)](https://arxiv.org//abs/2307.01225)

++ [Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)](https://arxiv.org/abs/2307.01225)

Bushra Sabir, M. Ali Babar, Sharif Abuadbba

@@ -34027,17 +34027,17 @@ Xiaosen Wang, Zeyuan Yin

Nicholas Boucher, Jenny Blessing, Ilia Shumailov, Ross Anderson, Nicolas Papernot

# 2023-06-11

-+ [Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework](https://arxiv.org//abs/2306.07992)

++ [Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework](https://arxiv.org/abs/2306.07992)

Minglei Yin, Bin Liu, Neil Zhenqiang Gong, Xin Li

# 2023-06-06

-+ [SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving](https://arxiv.org//abs/2306.03538)

++ [SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving](https://arxiv.org/abs/2306.03538)

Honghao Fu, Yongli Gu, Yidong Yan, Yilang Shen, Yiwen Wu, Libo Sun

# 2023-05-27

-+ [Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection](https://arxiv.org//abs/2305.17528)

++ [Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection](https://arxiv.org/abs/2305.17528)

Nils Palumbo, Yang Guo, Xi Wu, Jiefeng Chen, Yingyu Liang, Somesh Jha

@@ -34047,32 +34047,32 @@ Xiaosen Wang, Zeyuan Yin

Rui Tuo, Haoyuan Chen, Raktim Bhattacharya

# 2023-05-24

-+ [Robust Sparse Mean Estimation via Incremental Learning](https://arxiv.org//abs/2305.15276)

++ [Robust Sparse Mean Estimation via Incremental Learning](https://arxiv.org/abs/2305.15276)

Jianhao Ma, Rui Ray Chen, Yinghui He, Salar Fattahi, Wei Hu

# 2023-05-23

-+ [Adversarial Defenses via Vector Quantization](https://arxiv.org//abs/2305.13651)

++ [Adversarial Defenses via Vector Quantization](https://arxiv.org/abs/2305.13651)

Zhiyi Dong, Yongyi Mao

# 2023-05-16

-+ [Releasing Inequality Phenomenon in $\ell_{\infty}$-norm Adversarial Training via Input Gradient Distillation](https://arxiv.org//abs/2305.09305)

++ [Releasing Inequality Phenomenon in $\ell_{\infty}$-norm Adversarial Training via Input Gradient Distillation](https://arxiv.org/abs/2305.09305)

Junxi Chen, Junhao Dong, Xiaohua Xie, Jianhuang Lai

# 2023-05-09

-+ [Privacy in Speech Technology](https://arxiv.org//abs/2305.05227)

++ [Privacy in Speech Technology](https://arxiv.org/abs/2305.05227)

Tom Bäckström

# 2023-05-06

-+ [Gradient Leakage Defense with Key-Lock Module for Federated Learning](https://arxiv.org//abs/2305.04095)

++ [Gradient Leakage Defense with Key-Lock Module for Federated Learning](https://arxiv.org/abs/2305.04095)

Hanchi Ren, Jingjing Deng, Xianghua Xie, Xiaoke Ma, Jianfeng Ma

# 2023-04-21

-+ [Interpretable and Robust AI in EEG Systems: A Survey](https://arxiv.org//abs/2304.10755)

++ [Interpretable and Robust AI in EEG Systems: A Survey](https://arxiv.org/abs/2304.10755)

Xinliang Zhou, Chenyu Liu, Jinan Zhou, Zhongruo Wang, Liming Zhai, Ziyu Jia, Cuntai Guan, Yang Liu

@@ -34081,109 +34081,109 @@ Xiaosen Wang, Zeyuan Yin

Dongjie Wang, Chang-Tien Lu, Xinyue Ye, Tan Yigitcanlar, Yanjie Fu

-+ [Generative AI Meets Future Cities: Towards an Era of Autonomous Urban Intelligence](https://arxiv.org//abs/2304.03892)

++ [Generative AI Meets Future Cities: Towards an Era of Autonomous Urban Intelligence](https://arxiv.org/abs/2304.03892)

Dongjie Wang, Chang-Tien Lu, Xinyue Ye, Tan Yigitcanlar, Yanjie Fu

# 2023-03-27

-+ [Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder](https://arxiv.org//abs/2303.15564)

++ [Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder](https://arxiv.org/abs/2303.15564)

Tao Sun, Lu Pang, Weimin Lyu, Chao Chen, Haibin Ling
# 2023-03-18

-+ [NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models](https://arxiv.org//abs/2303.10430)

++ [NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models](https://arxiv.org/abs/2303.10430)

Yiran Ye, Thai Le, Dongwon Lee

# 2023-03-17

-+ [Bridging Models to Defend: A Population-Based Strategy for Robust Adversarial Defense](https://arxiv.org//abs/2303.10225)

++ [Bridging Models to Defend: A Population-Based Strategy for Robust Adversarial Defense](https://arxiv.org/abs/2303.10225)

Ren Wang, Yuxuan Li, Can Chen, Dakuo Wang, Jinjun Xiong, Pin-Yu Chen, Sijia Liu, Mohammad Shahidehpour, Alfred Hero

# 2023-03-14

-+ [Constrained Adversarial Learning for Automated Software Testing: a literature review](https://arxiv.org//abs/2303.07546)

++ [Constrained Adversarial Learning for Automated Software Testing: a literature review](https://arxiv.org/abs/2303.07546)

João Vitorino, Tiago Dias, Tiago Fonseca, Eva Maia, Isabel Praça

-+ [Eliciting Latent Predictions from Transformers with the Tuned Lens](https://arxiv.org//abs/2303.08112)

++ [Eliciting Latent Predictions from Transformers with the Tuned Lens](https://arxiv.org/abs/2303.08112)

Nora Belrose, Igor Ostrovsky, Lev McKinney, Zach Furman, Logan Smith, Danny Halawi, Stella Biderman, Jacob Steinhardt

# 2023-03-07

-+ [Nash Equilibria, Regularization and Computation in Optimal Transport-Based Distributionally Robust Optimization](https://arxiv.org//abs/2303.03900)

++ [Nash Equilibria, Regularization and Computation in Optimal Transport-Based Distributionally Robust Optimization](https://arxiv.org/abs/2303.03900)

Soroosh Shafiee, Liviu Aolaritei, Florian Dörfler, Daniel Kuhn

-+ [DR-VIDAL -- Doubly Robust Variational Information-theoretic Deep Adversarial Learning for Counterfactual Prediction and Treatment Effect Estimation on Real World Data](https://arxiv.org//abs/2303.04201)

++ [DR-VIDAL -- Doubly Robust Variational Information-theoretic Deep Adversarial Learning for Counterfactual Prediction and Treatment Effect Estimation on Real World Data](https://arxiv.org/abs/2303.04201)

Shantanu Ghosh, Zheng Feng, Jiang Bian, Kevin Butler, Mattia Prosperi

# 2023-03-02

-+ [Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance](https://arxiv.org//abs/2303.01256)

++ [Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance](https://arxiv.org/abs/2303.01256)

Xin Gu, Gautam Kamath, Zhiwei Steven Wu

# 2023-02-20

-+ [ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment](https://arxiv.org//abs/2302.09913)

++ [ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment](https://arxiv.org/abs/2302.09913)

Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, Giuseppe Caire

# 2023-02-18

-+ [Digital Privacy Under Attack: Challenges and Enablers](https://arxiv.org//abs/2302.09258)

++ [Digital Privacy Under Attack: Challenges and Enablers](https://arxiv.org/abs/2302.09258)

Baobao Song, Mengyue Deng, Shiva Raj Pokhrel, Qiujun Lan, Robin Doss, Gang Li

# 2023-02-10

-+ [Privacy Against Agnostic Inference Attacks in Vertical Federated Learning](https://arxiv.org//abs/2302.05545)

++ [Privacy Against Agnostic Inference Attacks in Vertical Federated Learning](https://arxiv.org/abs/2302.05545)

Morteza Varasteh

# 2022-12-20

-+ [Learned-Database Systems Security](https://arxiv.org//abs/2212.10318)

++ [Learned-Database Systems Security](https://arxiv.org/abs/2212.10318)

Roei Schuster, Jin Peng Zhou, Thorsten Eisenhofer, Paul Grubbs, Nicolas Papernot

# 2022-12-12

-+ [Security of Deep Reinforcement Learning for Autonomous Driving: A Survey](https://arxiv.org//abs/2212.06123)

++ [Security of Deep Reinforcement Learning for Autonomous Driving: A Survey](https://arxiv.org/abs/2212.06123)

Ambra Demontis, Srishti Gupta, Maura Pintor, Luca Demetrio, Kathrin Grosse, Hsiao-Ying Lin, Chengfang Fang, Battista Biggio, Fabio Roli

# 2022-12-09

-+ [Iterative Minimax Games with Coupled Linear Constraints](https://arxiv.org//abs/2212.04672)

++ [Iterative Minimax Games with Coupled Linear Constraints](https://arxiv.org/abs/2212.04672)

Huiling Zhang, Zi Xu, Yu-Hong Dai

# 2022-12-05

-+ [Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning](https://arxiv.org//abs/2212.02042)

++ [Refiner: Data Refining against Gradient Leakage Attacks in Federated Learning](https://arxiv.org/abs/2212.02042)

Mingyuan Fan, Cen Chen, Chengyu Wang, Xiaodan Li, Wenmeng Zhou

# 2022-12-02

-+ [Safe machine learning model release from Trusted Research Environments: The SACRO-ML package](https://arxiv.org//abs/2212.01233)

++ [Safe machine learning model release from Trusted Research Environments: The SACRO-ML package](https://arxiv.org/abs/2212.01233)

Jim Smith, Richard J. Preen, Andrew McCarthy, Maha Albashir, Alba Crespi-Boixader, Shahzad Mumtaz, Christian Cole, James Liley, Jost Migenda, Simon Rogers, Yola Jones

# 2022-10-26

-+ [Privacy Analysis of Samsung's Crowd-Sourced Bluetooth Location Tracking System](https://arxiv.org//abs/2210.14702)

++ [Privacy Analysis of Samsung's Crowd-Sourced Bluetooth Location Tracking System](https://arxiv.org/abs/2210.14702)

Tingfeng Yu, James Henderson, Alwen Tiu, Thomas Haines

# 2022-10-25

-+ [Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation](https://arxiv.org//abs/2210.14275)

++ [Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation](https://arxiv.org/abs/2210.14275)

Wenchuan Mu

-+ [Robustness of Locally Differentially Private Graph Analysis Against Poisoning](https://arxiv.org//abs/2210.14376)

++ [Robustness of Locally Differentially Private Graph Analysis Against Poisoning](https://arxiv.org/abs/2210.14376)

Jacob Imola, Amrita Roy Chowdhury, Kamalika Chaudhuri

# 2022-10-13

-+ [LEAVES: Learning Views for Time-Series Biobehavioral Data in Contrastive Learning](https://arxiv.org//abs/2210.07340)

++ [LEAVES: Learning Views for Time-Series Biobehavioral Data in Contrastive Learning](https://arxiv.org/abs/2210.07340)

Han Yu, Huiyuan Yang, Akane Sano

# 2022-10-12

-+ [Differentially private multivariate medians](https://arxiv.org//abs/2210.06459)

++ [Differentially private multivariate medians](https://arxiv.org/abs/2210.06459)

Kelly Ramsay, Aukosh Jagannath, Shoja'eddin Chenouri

@@ -34193,17 +34193,17 @@ Xiaosen Wang, Zeyuan Yin

Nuo Xu, Kaleel Mahmood, Haowen Fang, Ethan Rathbun, Caiwen Ding, Wujie Wen

# 2022-07-07

-+ [ On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks](https://arxiv.org//abs/2207.03400)

++ [ On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Networks](https://arxiv.org/abs/2207.03400)

Seongjin Park, Haedong Jeong, Tair Djanibekov, Giyoung Jeon, Jinseok Seol, Jaesik Choi

# 2022-06-01

-+ [ Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines](https://arxiv.org//abs/2206.00535)

++ [ Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines](https://arxiv.org/abs/2206.00535)

Camilo Fosco, Emilie Josephs, Alex Andonian, Aude Oliva

# 2022-01-10

-+ [FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks](https://arxiv.org//abs/2201.03169)

++ [FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks](https://arxiv.org/abs/2201.03169)

Lingzhi Gao, Zhenyuan Zhang, Chao Wu

@@ -34213,37 +34213,37 @@ Xiaosen Wang, Zeyuan Yin

Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling

# 2021-09-21

-+ [Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles](https://arxiv.org//abs/2109.10432)

++ [Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles](https://arxiv.org/abs/2109.10432)

Xin Du, Subramanian Ramamoorthy, Wouter Duivesteijn, Jin Tian, Mykola Pechenizkiy

# 2021-03-30

-+ [PointBA: Towards Backdoor Attacks in 3D Point Cloud](https://arxiv.org//abs/2103.16074)

++ [PointBA: Towards Backdoor Attacks in 3D Point Cloud](https://arxiv.org/abs/2103.16074)

Xinke Li, Zhirui Chen, Yue Zhao, Zekun Tong, Yabang Zhao, Andrew Lim, Joey Tianyi Zhou

# 2019-11-19

-+ [Defective Convolutional Networks](https://arxiv.org//abs/1911.08432)

++ [Defective Convolutional Networks](https://arxiv.org/abs/1911.08432)

Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Liwei Wang

# 2019-11-05

-+ [Federated Adversarial Domain Adaptation](https://arxiv.org//abs/1911.02054)

++ [Federated Adversarial Domain Adaptation](https://arxiv.org/abs/1911.02054)

Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko

# 2019-06-04

-+ [Architecture Selection via the Trade-off Between Accuracy and Robustness](https://arxiv.org//abs/1906.01354)

++ [Architecture Selection via the Trade-off Between Accuracy and Robustness](https://arxiv.org/abs/1906.01354)

Zhun Deng, Cynthia Dwork, Jialiang Wang, Yao Zhao

# 2019-05-28

-+ [Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks](https://arxiv.org//abs/1905.12032)

++ [Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks](https://arxiv.org/abs/1905.12032)

Pu Zhao, Siyue Wang, Cheng Gongye, Yanzhi Wang, Yunsi Fei, Xue Lin

# 2018-07-11

-+ [The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization](https://arxiv.org//abs/1807.03907)

++ [The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization](https://arxiv.org/abs/1807.03907)

Constantinos Daskalakis, Ioannis Panageas