tyskill/llm-arxiv-daily

Updated on 2025.09.09

Usage instructions: here

Table of Contents
  1. Model Security
  2. Prompt Injection
  3. Code Embedding
  4. Model Context Protocol
  5. Supply Chain Attacks

Model Security

Publish Date Title Authors PDF Code
2025-07-23 HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study Mandar Pitale et.al. 2507.17118 null
2025-07-22 Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs H M Mohaimanul Islam et.al. 2507.17010 null
2025-07-22 Depth Gives a False Sense of Privacy: LLM Internal States Inversion Tian Dong et.al. 2507.16372 null
2025-07-19 Combining Cost-Constrained Runtime Monitors for AI Safety Tim Tian Hua et.al. 2507.15886 null
2025-07-19 When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems Qibing Ren et.al. 2507.14660 null
2025-07-22 Mapping the Parasocial AI Market: User Trends, Engagement and Risks Zilan Qian et.al. 2507.14226 null
2025-07-15 Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design Richard M. Charles et.al. 2507.14207 null
2025-07-23 Fake or Real: The Impostor Hunt in Texts for Space Operations Agata Kaczmarek et.al. 2507.13508 null
2025-07-17 Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework Rishane Dassanayake et.al. 2507.12872 null
2025-07-16 LLMs Encode Harmfulness and Refusal Separately Jiachen Zhao et.al. 2507.11878 null
2025-07-09 The AI Shadow War: SaaS vs. Edge Computing Architectures Rhea Pritham Marpu et.al. 2507.11545 null
2025-07-15 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety Tomek Korbak et.al. 2507.11473 null
2025-07-14 3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models Jianyao Yin et.al. 2507.10733 null
2025-07-16 From Semantic Web and MAS to Agentic AI: A Unified Narrative of the Web of Agents Tatiana Petrova et.al. 2507.10644 null
2025-07-14 Can You Detect the Difference? İsmail Tarım et.al. 2507.10475 null
2025-07-14 BlueGlass: A Framework for Composite AI Safety Harshal Nandigramwar et.al. 2507.10106 null
2025-07-13 Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Jia Yi Goh et.al. 2507.09820 null
2025-07-12 Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers Santhosh Kumar Ravindran et.al. 2507.09406 null
2025-07-06 Mass-Scale Analysis of In-the-Wild Conversations Reveals Complexity Bounds on LLM Jailbreaking Aldan Creo et.al. 2507.08014 null
2025-07-15 Secure Cooperative Gradient Coding: Optimality, Reliability, and Global Privacy Shudi Weng et.al. 2507.07565 null
2025-07-09 Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models Aaron Dharna et.al. 2507.06466 null
2025-07-08 Humans overrely on overconfident language models, across languages Neil Rathi et.al. 2507.06306 null
2025-07-07 Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework Satyapriya Krishna et.al. 2507.06260 null
2025-07-08 CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations Xiaohu Li et.al. 2507.06043 null
2025-07-08 Domain adaptation of large language models for geotechnical applications Lei Fan et.al. 2507.05613 null
2025-07-07 When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors Scott Emmons et.al. 2507.05246 null
2025-07-07 Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message Wei Duan et.al. 2507.04673 null
2025-07-03 From Turing to Tomorrow: The UK's Approach to AI Regulation Oliver Ritchie et.al. 2507.03050 null
2025-07-01 'For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts Annika M Schoene et.al. 2507.02990 null
2025-07-01 GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models Seshu Tirupathi et.al. 2507.02986 null
2025-07-03 Moral Responsibility or Obedience: What Do We Want from AI? Joseph Boland et.al. 2507.02788 null
2025-07-03 Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks Sizhe Chen et.al. 2507.02735 null
2025-07-02 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Rahul Ramachandran et.al. 2507.01955 null
2025-07-02 Out-of-Distribution Detection Methods Answer the Wrong Questions Yucen Lily Li et.al. 2507.01831 null
2025-07-01 SAFER: Probing Safety in Reward Models with Sparse Autoencoder Sihang Li et.al. 2507.00665 null
2025-06-30 Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models Basab Jha et.al. 2507.00092 null
2025-06-30 Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments Christoph Schnabl et.al. 2506.23706 null
2025-06-30 A New Perspective On AI Safety Through Control Theory Methodologies Lars Ullrich et.al. 2506.23703 null
2025-06-29 Securing AI Systems: A Guide to Known Attacks and Impacts Naoto Kiribuchi et.al. 2506.23296 null
2025-06-28 MPC in the Quantum Head (or: Superposition-Secure (Quantum) Zero-Knowledge) Andrea Coladangelo et.al. 2506.22961 null
2025-06-25 Mitigating Gambling-Like Risk-Taking Behaviors in Large Language Models: A Behavioral Economics Approach to AI Safety Y. Du et.al. 2506.22496 null
2025-06-24 Report on NSF Workshop on Science of Safe AI Rajeev Alur et.al. 2506.22492 null
2025-06-27 A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety Camille François et.al. 2506.22183 null
2025-06-27 SODA: Out-of-Distribution Detection in Domain-Shifted Point Clouds via Neighborhood Propagation Adam Goodge et.al. 2506.21892 null
2025-06-30 The Singapore Consensus on Global AI Safety Research Priorities Yoshua Bengio et.al. 2506.20702 null
2025-06-25 Probing AI Safety with Source Code Ujwal Narayan et.al. 2506.20471 null
2025-06-24 Persona Features Control Emergent Misalignment Miles Wang et.al. 2506.19823 null
2025-06-21 AI Safety vs. AI Security: Demystifying the Distinction and Boundaries Zhiqiang Lin et.al. 2506.18932 null
2025-06-23 How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models Feng He et.al. 2506.18428 null
2025-06-23 LLM-Integrated Digital Twins for Hierarchical Resource Allocation in 6G Networks Majumder Haider et.al. 2506.18293 null
2025-06-22 AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology Akash Kundu et.al. 2506.18156 null
2025-06-22 $φ^{\infty}$ : Clause Purification, Embedding Realignment, and the Total Suppression of the Em Dash in Autoregressive Language Models Bugra Kilictas et.al. 2506.18129 null
2025-06-21 Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack) Elija Perrier et.al. 2506.17846 null
2025-06-20 SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification Zhenglin Lai et.al. 2506.17368 null
2025-06-19 PL-Guard: Benchmarking Language Model Safety for Polish Aleksandra Krasnodębska et.al. 2506.16322 null
2025-06-19 Probing the Robustness of Large Language Models Safety to Latent Perturbations Tianle Gu et.al. 2506.16078 link
2025-06-18 LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning Gabriel J. Perin et.al. 2506.15606 link
2025-06-17 TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift Dipesh Tharu Mahato et.al. 2506.14217 link
2025-06-17 The Ethics of Generative AI in Anonymous Spaces: A Case Study of 4chan's /pol/ Board Parth Gaba et.al. 2506.14191 null
2025-06-17 Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions Junfeng Jiao et.al. 2506.13510 link
2025-06-16 Position: Certified Robustness Does Not (Yet) Imply Model Security Andrew C. Cullen et.al. 2506.13024 null
2025-06-15 Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs Lu Chen et.al. 2506.12875 null
2025-06-14 OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics Vineeth Dorna et.al. 2506.12618 link
2025-06-14 Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare Yubin Kim et.al. 2506.12482 null
2025-06-13 InfoFlood: Jailbreaking Large Language Models with Information Overload Advait Yadav et.al. 2506.12274 null
2025-06-13 Hatevolution: What Static Benchmarks Don't Tell Us Chiara Di Bonaventura et.al. 2506.12148 null
2025-06-13 Improving Large Language Model Safety with Contrastive Representation Learning Samuel Simko et.al. 2506.11938 link
2025-06-13 Model Organisms for Emergent Misalignment Edward Turner et.al. 2506.11613 null
2025-06-12 The Alignment Trap: Complexity Barriers Jasper Yao et.al. 2506.10304 null
2025-06-11 Data-Centric Safety and Ethical Measures for Data and AI Governance Srija Chakraborty et.al. 2506.10217 null
2025-06-09 LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges Haoyang Li et.al. 2506.10022 link
2025-06-08 Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations Zhiyu Xue et.al. 2506.09067 null
2025-06-11 Societal AI Research Has Become Less Interdisciplinary Dror Kris Markus et.al. 2506.08738 null
2025-06-11 AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin Shuo Yang et.al. 2506.08473 link
2025-06-06 Benchmarking Misuse Mitigation Against Covert Adversaries Davis Brown et.al. 2506.06414 link
2025-06-03 Rational Superautotrophic Diplomacy (SupraAD); A Conceptual Framework for Alignment Based on Interdisciplinary Findings on the Fundamentals of Cognition Andrea Morris et.al. 2506.05389 null
2025-06-05 Normative Conflicts and Shallow AI Alignment Raphaël Millière et.al. 2506.04679 null
2025-06-04 Watermarking Degrades Alignment in Language Models: Analysis and Mitigation Apurv Verma et.al. 2506.04462 link
2025-06-04 Misalignment or misuse? The AGI alignment tradeoff Max Hellrigel-Holderbaum et.al. 2506.03755 null
2025-06-04 Bridging the Artificial Intelligence Governance Gap: The United States' and China's Divergent Approaches to Governing General-Purpose Artificial Intelligence Oliver Guest et.al. 2506.03497 null
2025-06-03 MAEBE: Multi-Agent Emergent Behavior Framework Sinem Erisken et.al. 2506.03053 null
2025-06-02 Trojan Horse Hunt in Time Series Forecasting for Space Operations Krzysztof Kotowski et.al. 2506.01849 null
2025-06-02 ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs Zeming Wei et.al. 2506.01770 link
2025-06-02 Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation Yuan Gan et.al. 2506.01591 link
2025-05-31 Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety Matthew Brophy et.al. 2506.00415 null
2025-05-30 Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences Mingqian Zheng et.al. 2506.00195 null
2025-05-30 Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment Kundan Krishna et.al. 2506.00166 null
2025-05-30 TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis Xiaorui Wu et.al. 2505.24672 link
2025-05-30 Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization Utsav Maskey et.al. 2505.24621 null
2025-05-30 The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It Zheng-Xin Yong et.al. 2505.24119 null
2025-05-29 OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities Sahil Verma et.al. 2505.23856 link
2025-05-27 Watermarking Without Standards Is Not AI Governance Alexander Nemecek et.al. 2505.23814 null
2025-05-29 SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents Kunlun Zhu et.al. 2505.23559 link
2025-05-29 Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models Mingyu Yu et.al. 2505.23404 null
2025-05-28 Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies Chenruo Liu et.al. 2505.22829 null
2025-05-28 TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE Tong Sun et.al. 2505.22735 link
2025-05-27 Expert Survey: AI Reliability & Security Research Priorities Joe O'Brien et.al. 2505.21664 null
2025-05-27 Preventing Adversarial AI Attacks Against Autonomous Situational Awareness: A Maritime Case Study Mathew J. Walter et.al. 2505.21609 null
2025-05-27 SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge Fengqing Jiang et.al. 2505.21605 null
2025-05-26 Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts Hee-Seon Kim et.al. 2505.21556 null
2025-05-27 The Multilingual Divide and Its Impact on Global AI Safety Aidan Peppin et.al. 2505.21344 null
2025-05-27 Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling Yichuan Cao et.al. 2505.21074 null
2025-05-26 VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration Jiahui Geng et.al. 2505.20362 link
2025-05-26 What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs Sangyeop Kim et.al. 2505.19773 null
2025-05-25 When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas Steffen Backmann et.al. 2505.19212 link
2025-05-25 GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization Zixuan Chen et.al. 2505.18979 null
2025-05-24 Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning Maeva Guerrier et.al. 2505.18858 null
2025-05-24 Safety Alignment via Constrained Knowledge Unlearning Zesheng Shi et.al. 2505.18588 null
2025-05-23 Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary Licheng Pan et.al. 2505.18325 null
2025-05-23 Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis Jonathan Bennion et.al. 2505.17636 null
2025-05-23 Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models Jiawei Kong et.al. 2505.17601 null
2025-05-20 From nuclear safety to LLM security: Applying non-probabilistic risk management strategies to build safe and secure LLM-powered systems Alexander Gutfraind et.al. 2505.17084 null
2025-05-22 When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques Jianing Geng et.al. 2505.16765 null
2025-05-22 Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization Chengcan Wu et.al. 2505.16737 link
2025-05-21 Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack Silvia Cappelletti et.al. 2505.15323 null
2025-05-20 Foundations of Unknown-aware Machine Learning Xuefeng Du et.al. 2505.14933 null
2025-05-20 Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas Yu Ying Chiu et.al. 2505.14633 link
2025-05-19 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations Li Ji-An et.al. 2505.13763 null
2025-05-16 Noise Injection Systemically Degrades Large Language Model Safety Guardrails Prithviraj Singh Shahani et.al. 2505.13500 null
2025-05-19 Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities Lili Zhang et.al. 2505.13195 null
2025-05-19 Bullying the Machine: How Personas Increase LLM Vulnerability Ziwei Xu et.al. 2505.12692 null
2025-05-18 Persuasion and Safety in the Era of Generative AI Haein Kong et.al. 2505.12248 null
2025-05-17 Position Paper: Bounded Alignment: What (Not) To Expect From AGI Agents Ali A. Minai et.al. 2505.11866 null
2025-05-16 Probing the Vulnerability of Large Language Models to Polysemantic Interventions Bofan Gong et.al. 2505.11611 null
2025-05-16 Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning Jingcheng Niu et.al. 2505.11004 link
2025-05-15 Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility Maurice Chiodo et.al. 2505.10426 null
2025-05-15 Dark LLMs: The Growing Threat of Unaligned AI Models Michael Fire et.al. 2505.10066 null
2025-05-15 Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data Adel ElZemity et.al. 2505.09974 null
2025-05-14 Access Controls Will Solve the Dual-Use Dilemma Evžen Wybitul et.al. 2505.09341 null
2025-05-16 SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models Huining Cui et.al. 2505.07584 null
2025-05-09 Offensive Security for AI Systems: Concepts, Practices, and Applications Josh Harguess et.al. 2505.06380 null
2025-05-08 Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods Markov Grey et.al. 2505.05541 null
2025-05-08 Reasoning Models Don't Always Say What They Think Yanda Chen et.al. 2505.05410 null
2025-05-08 Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation Luca Marzari et.al. 2505.05235 null
2025-05-08 Belief Filtering for Epistemic Control in Linguistic State Space Sebastian Dumbrava et.al. 2505.04927 null
2025-05-07 The Aloe Family Recipe for Open and Specialized Healthcare LLMs Dario Garcia-Gasulla et.al. 2505.04388 null
2025-05-07 Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety Variath Madhupal Gautham Nair et.al. 2505.04146 null
2025-05-08 An alignment safety case sketch based on debate Marie Davidsen Buhl et.al. 2505.03989 null
2025-05-05 What Is AI Safety? What Do We Want It to Be? Jacqueline Harding et.al. 2505.02313 null
2025-05-04 Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents Christian Schroeder de Witt et.al. 2505.02077 null
2025-05-03 Third-party compliance reviews for frontier AI safety frameworks Aidan Homewood et.al. 2505.01643 null
2025-05-02 Securing the Future of IVR: AI-Driven Innovation with Agile Security, Data Regulation, and Ethical AI Integration Khushbu Mehboob Shaikh et.al. 2505.01514 null
2025-04-30 A Domain-Agnostic Scalable AI Safety Ensuring Framework Beomjun Kim et.al. 2504.20924 null
2025-04-29 When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines Sachin R. Pendse et.al. 2504.20910 null
2025-04-25 AI Awareness Xiaojian Li et.al. 2504.20084 null
2025-04-28 Mitigating Societal Cognitive Overload in the Age of AI: Challenges and Directions Salem Lahlou et.al. 2504.19990 null
2025-05-02 Securing Agentic AI: A Comprehensive Threat Model and Mitigation Framework for Generative AI Agents Vineeth Sai Narajala et.al. 2504.19956 null
2025-04-28 AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis Haroui Ma et.al. 2504.19621 link
2025-04-26 Latent Adversarial Training Improves the Representation of Refusal Alexandra Abbas et.al. 2504.18872 null
2025-04-25 AI Safety Assurance for Automated Vehicles: A Survey on Research, Standardization, Regulation Lars Ullrich et.al. 2504.18328 null
2025-04-25 RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models Bang An et.al. 2504.18041 null
2025-04-17 Security-First AI: Foundations for Robust and Trustworthy Systems Krti Tallam et.al. 2504.16110 null
2025-04-21 Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds Heidy Khlaaf et.al. 2504.15088 null
2025-04-20 A Byzantine Fault Tolerance Approach towards AI Safety John deVadoss et.al. 2504.14668 null
2025-04-20 Seeing Through Risk: A Symbolic Approximation of Prospect Theory Ali Arslan Yousaf et.al. 2504.14448 null
2025-04-16 AI Safety Should Prioritize the Future of Work Sanchaita Hazra et.al. 2504.13959 null
2025-04-17 In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate? Ben Bucknall et.al. 2504.12914 null
2025-04-16 Secure Transfer Learning: Training Clean Models Against Backdoor in (Both) Pre-trained Encoders and Downstream Datasets Yechao Zhang et.al. 2504.11990 null
2025-04-14 The Jailbreak Tax: How Useful are Your Jailbreak Outputs? Kristina Nikolić et.al. 2504.10694 link
2025-04-14 Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? Yanbo Wang et.al. 2504.10000 null
2025-04-13 The Structural Safety Generalization Problem Julius Broomfield et.al. 2504.09712 link
2025-04-13 Mitigating Many-Shot Jailbreaking Christopher M. Ackerman et.al. 2504.09604 null
2025-04-10 Geneshift: Impact of different scenario shift on Jailbreaking LLM Tianyi Wu et.al. 2504.08104 null
2025-04-10 The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search Yutaro Yamada et.al. 2504.08066 link
2025-04-10 Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge Riccardo Cantini et.al. 2504.07887 link
2025-04-07 Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs Ling Hu et.al. 2504.04994 null
2025-04-05 Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability Vishnu Kabir Chhabra et.al. 2504.04215 null
2025-04-05 Among Us: A Sandbox for Agentic Deception Satvik Golechha et.al. 2504.04072 null
2025-04-03 Improving Harmful Text Detection with Joint Retrieval and External Knowledge Zidong Yu et.al. 2504.02310 null
2025-04-02 Reinsuring AI: Energy, Agriculture, Finance & Medicine as Precedents for Scalable Governance of Frontier Artificial Intelligence Nicholas Stetler et.al. 2504.02127 null
2025-03-28 A Framework for Cryptographic Verifiability of End-to-End AI Pipelines Kar Balan et.al. 2503.22573 null
2025-03-28 Effective Automation to Support the Human Infrastructure in AI Red Teaming Alice Qian Zhang et.al. 2503.22116 null
2025-03-28 Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories Yazhou Zhang et.al. 2503.22115 null
2025-03-31 MAD Chairs: A new tool to evaluate AI Chris Santos-Lang et.al. 2503.20986 null
2025-03-26 The Backfiring Effect of Weak AI Safety Regulation Benjamin Laufer et.al. 2503.20848 null
2025-03-26 AI Safety in the Eyes of the Downstream Developer: A First Look at Concerns, Practices, and Challenges Haoyu Gao et.al. 2503.19444 null
2025-03-18 International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty Rebecca Scholefield et.al. 2503.18956 null
2025-03-22 Intelligence Sequencing and the Path-Dependence of Intelligence Evolution: AGI-First vs. DCI-First as Irreversible Attractors Andy E. Williams et.al. 2503.17688 null
2025-03-17 AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations Dillon Bowen et.al. 2503.17388 null
2025-03-18 Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models Prashant Kulkarni et.al. 2503.15560 link
2025-03-19 A Peek Behind the Curtain: Using Step-Around Prompt Engineering to Identify Bias and Misinformation in GenAI Models Don Hickerson et.al. 2503.15205 null
2025-03-17 ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction Tong Zhou et.al. 2503.13224 null
2025-03-17 Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering Kenneth J. K. Ong et.al. 2503.12722 null

(back to top)

Prompt Injection

Publish Date Title Authors PDF Code
2025-07-21 Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems Andrii Balashov et.al. 2507.15613 null
2025-07-21 QSAF: A Novel Mitigation Framework for Cognitive Degradation in Agentic AI Hammad Atta et.al. 2507.15330 null
2025-07-21 PromptArmor: Simple yet Effective Prompt Injection Defenses Tianneng Shi et.al. 2507.15219 null
2025-07-20 DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection Jerry Wang et.al. 2507.15042 null
2025-07-20 AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning Yi Zhang et.al. 2507.14987 null
2025-07-20 Hierarchical Cross-modal Prompt Learning for Vision-Language Models Hao Zheng et.al. 2507.14976 null
2025-07-20 Strategic Integration of AI Chatbots in Physics Teacher Preparation: A TPACK-SWOT Analysis of Pedagogical, Epistemic, and Cybersecurity Dimensions N. Mohammadipour et.al. 2507.14860 null
2025-07-20 Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree Sam Johnson et.al. 2507.14799 null
2025-07-18 Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models Palash Nandi et.al. 2507.13761 null
2025-07-18 TopicAttack: An Indirect Prompt Injection Attack via Topic Transition Yulin Chen et.al. 2507.13686 null
2025-07-17 Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers Liang Lin et.al. 2507.13474 null
2025-07-17 Prompt Injection 2.0: Hybrid AI Threats Jeremy McHugh et.al. 2507.13169 null
2025-07-17 MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems Yu Cui et.al. 2507.13038 null
2025-07-16 Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks Rina Mishra et.al. 2507.12185 null
2025-07-16 LLMs Encode Harmfulness and Refusal Separately Jiachen Zhao et.al. 2507.11878 null
2025-07-15 Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility Brendan Murphy et.al. 2507.11630 null
2025-07-14 ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning Zhengyue Zhao et.al. 2507.11500 null
2025-07-15 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Zichen Wen et.al. 2507.11097 null
2025-07-17 SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems Wenliang Shan et.al. 2507.08898 null
2025-07-10 A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking Zhengye Han et.al. 2507.08207 null
2025-07-10 Defending Against Prompt Injection With a Few DefensiveTokens Sizhe Chen et.al. 2507.07974 null
2025-07-10 GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing Peiyan Zhang et.al. 2507.07735 null
2025-07-10 May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks Nishit V. Pandya et.al. 2507.07417 null
2025-07-09 An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs Zixuan Huang et.al. 2507.07146 null
2025-07-11 The Dark Side of LLMs Agent-based Attacks for Complete Computer Takeover Matteo Lupinacci et.al. 2507.06850 null
2025-07-09 On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks Stephen Obadinma et.al. 2507.06489 null
2025-07-09 Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models Aaron Dharna et.al. 2507.06466 null
2025-07-08 Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms Tarek Gasmi et.al. 2507.06323 null
2025-07-08 The bitter lesson of misuse detection Hadrien Mariaccia et.al. 2507.06282 null
2025-07-08 Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review Zhicheng Lin et.al. 2507.06185 null
2025-07-08 CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations Xiaohu Li et.al. 2507.06043 null
2025-07-08 TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data Aravind Cheruvu et.al. 2507.05660 null
2025-07-08 How Not to Detect Prompt Injections with an LLM Sarthak Choudhary et.al. 2507.05630 null
2025-07-07 A Systematization of Security Vulnerabilities in Computer Use Agents Daniel Jones et.al. 2507.05445 null
2025-07-07 Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models Ziqi Miao et.al. 2507.05248 null
2025-07-07 Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message Wei Duan et.al. 2507.04673 null
2025-07-06 Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking Tim Beyer et.al. 2507.04446 null
2025-07-06 Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs Xiaomeng Hu et.al. 2507.04365 null
2025-07-04 On Jailbreaking Quantized Language Models Through Fault Injection Attacks Noureldin Zahran et.al. 2507.03236 null
2025-07-03 Adversarial Manipulation of Reasoning Models using Internal Representations Kureha Yamaguchi et.al. 2507.03167 null
2025-07-03 LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users Almog Hilel et.al. 2507.02850 null
2025-07-03 Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection Ziqi Miao et.al. 2507.02844 null
2025-07-03 Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models Riccardo Cantini et.al. 2507.02799 null
2025-07-03 Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks Sizhe Chen et.al. 2507.02735 null
2025-07-03 PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage Krishna Kanth Nakka et.al. 2507.02332 null
2025-07-02 MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation Lu Yan et.al. 2507.02057 null
2025-07-02 SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism Beitao Chen et.al. 2507.01513 null
2025-07-01 Reasoning as an Adaptive Defense for Safety Taeyoun Kim et.al. 2507.00971 null
2025-07-01 SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents Siyuan Liang et.al. 2507.00841 null
2025-07-02 Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based Approach Shuangquan Lyu et.al. 2507.00601 null
2025-06-30 Linearly Decoding Refused Knowledge in Aligned Language Models Aryan Shrivastava et.al. 2507.00239 null
2025-06-30 Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models Tung-Ling Li et.al. 2506.24056 null
2025-06-30 Leveraging the Potential of Prompt Engineering for Hate Speech Detection in Low-Resource Languages Ruhina Tabasshum Prome et.al. 2506.23930 null
2025-06-30 Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models Maria Carolina Cornelia Wit et.al. 2506.23576 null
2025-06-29 From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows Mohamed Amine Ferrag et.al. 2506.23260 null
2025-06-28 Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models Younwoo Choi et.al. 2506.22957 null
2025-06-27 VERA: Variational Inference Framework for Jailbreaking Large Language Models Anamika Lochab et.al. 2506.22666 null
2025-06-27 MetaCipher: A General and Extensible Reinforcement Learning Framework for Obfuscation-Based Jailbreak Attacks on Black-Box LLMs Boyuan Chen et.al. 2506.22557 null
2025-07-01 Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center James Wen et.al. 2506.22523 null
2025-06-27 A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety Camille François et.al. 2506.22183 null
2025-06-27 Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses Mohamed Ahmed et.al. 2506.21972 null
2025-06-24 PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty Jinwen He et.al. 2506.19563 null
2025-06-24 MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models Yinan Xia et.al. 2506.19257 null
2025-06-23 Command-V: Pasting LLM Behaviors via Activation Profiles Barry Wang et.al. 2506.19140 null
2025-06-23 Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems Valerii Gakh et.al. 2506.19109 null
2025-06-23 Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks Xiaodong Wu et.al. 2506.18543 null
2025-06-23 NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation Yu Xie et.al. 2506.18325 null
2025-06-22 Multi-turn Jailbreaking via Global Refinement and Active Fabrication Hua Tang et.al. 2506.17881 null
2025-06-20 Semantic-Aware Parsing for Security Logs Julien Piet et.al. 2506.17512 null
2025-06-20 From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers Jingtong Su et.al. 2506.17052 null
2025-06-20 MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning Muyang Zheng et.al. 2506.16792 null
2025-06-20 Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models Lei Jiang et.al. 2506.16760 null
2025-06-19 Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models Biao Yi et.al. 2506.16447 null
2025-06-19 Probing the Robustness of Large Language Models Safety to Latent Perturbations Tianle Gu et.al. 2506.16078 link
2025-06-18 Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts Kartik Sharma et.al. 2506.15751 null
2025-06-18 Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers Tommaso Green et.al. 2506.15674 link
2025-06-18 From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem Yanxu Mao et.al. 2506.15170 null
2025-06-17 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents Thomas Kuntz et.al. 2506.14866 link
2025-06-17 AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models Ads Dawson et.al. 2506.14682 link
2025-06-16 Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations Abhilekh Borah et.al. 2506.13901 null
2025-06-17 Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions Junfeng Jiao et.al. 2506.13510 link
2025-06-15 Jailbreak Strength and Model Similarity Predict Transferability Rico Angell et.al. 2506.12913 null
2025-06-15 Universal Jailbreak Suffixes Are Strong Attention Hijackers Matan Ben-Tov et.al. 2506.12880 link
2025-06-15 SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression Yucheng Li et.al. 2506.12707 null
2025-06-15 Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity Bilal Saleh Husain et.al. 2506.12685 null
2025-06-14 Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025 Zonghao Ying et.al. 2506.12430 link
2025-06-14 Exploring the Secondary Risks of Large Language Models Jiawei Chen et.al. 2506.12382 null
2025-06-14 QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety Taegyeong Lee et.al. 2506.12299 null
2025-06-13 InfoFlood: Jailbreaking Large Language Models with Information Overload Advait Yadav et.al. 2506.12274 null
2025-06-13 Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models Jinming Wen et.al. 2506.11521 null
2025-06-12 How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts? Sohee Yang et.al. 2506.10979 null
2025-06-12 SoK: Evaluating Jailbreak Guardrails for Large Language Models Xunguang Wang et.al. 2506.10597 link
2025-06-10 Evaluation empirique de la sécurisation et de l'alignement de ChatGPT et Gemini: analyse comparative des vulnérabilités par expérimentations de jailbreaks Rafaël Nouailles et.al. 2506.10029 null
2025-06-09 LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges Haoyang Li et.al. 2506.10022 link
2025-06-11 LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge Sahar Abdelnabi et.al. 2506.09956 link
2025-06-11 Effective Red-Teaming of Policy-Adherent Agents Itay Nakash et.al. 2506.09600 null
2025-06-11 AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI) Danush Khanna et.al. 2506.08885 null
2025-06-11 Design Patterns for Securing LLM Agents against Prompt Injections Luca Beurer-Kellner et.al. 2506.08837 null
2025-06-09 TokenBreak: Bypassing Text Classification Models Through Token Manipulation Kasimir Schulz et.al. 2506.07948 null
2025-06-11 RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards Jingnan Zheng et.al. 2506.07736 null
2025-06-09 Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models Maciej Chrabąszcz et.al. 2506.07645 null
2025-06-09 TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts Torsten Krauß et.al. 2506.07596 null
2025-06-09 When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment Yuxin Xiao et.al. 2506.07452 link
2025-06-09 Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures Yukai Zhou et.al. 2506.07402 null
2025-06-08 AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint Leheng Sheng et.al. 2506.07022 link
2025-06-10 Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test Xiaoyuan Zhu et.al. 2506.06975 null
2025-06-06 Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Ruizhong Qiu et.al. 2506.06444 link
2025-06-06 Small Models, Big Support: A Local LLM Framework for Teacher-Centric Content Creation and Assessment using RAG and CAG Zarreen Reza et.al. 2506.05925 null
2025-06-06 To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt Zhilong Wang et.al. 2506.05739 null
2025-06-05 Sentinel: SOTA model to protect against prompt injections Dror Ivry et.al. 2506.05446 null
2025-06-05 Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets Lei Hsiung et.al. 2506.05346 null
2025-06-05 HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model Youngwan Lee et.al. 2506.04704 null
2025-06-06 TracLLM: A Generic Framework for Attributing Long Context LLMs Yanting Wang et.al. 2506.04202 link
2025-06-03 Adversarial Attacks on Robotic Vision Language Action Models Eliot Krzysztof Jones et.al. 2506.03350 link
2025-06-03 It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics Matthew Kowal et.al. 2506.02873 null
2025-06-03 ATAG: AI-Agent Application Threat Assessment with Attack Graphs Parth Atulbhai Gandhi et.al. 2506.02859 null
2025-06-03 From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV Yousef Emami et.al. 2506.02649 null
2025-06-03 BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage Kalyan Nakka et.al. 2506.02479 link
2025-06-03 VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents Tri Cao et.al. 2506.02456 link
2025-06-02 ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs Zeming Wei et.al. 2506.01770 link
2025-06-02 Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models Youze Wang et.al. 2506.01307 null
2025-06-01 Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution Meysam Alizadeh et.al. 2506.01055 null
2025-06-01 Predicting Empirical AI Research Outcomes with Language Models Jiaxin Wen et.al. 2506.00794 null
2025-06-01 Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning Weiyang Guo et.al. 2506.00782 null
2025-05-30 TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis Xiaorui Wu et.al. 2505.24672 link
2025-05-30 Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization Utsav Maskey et.al. 2505.24621 null
2025-05-30 AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders Yuqi Zhang et.al. 2505.24519 null
2025-05-30 Model Unlearning via Sparse Autoencoder Subspace Guided Projections Xu Wang et.al. 2505.24428 null
2025-05-30 From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models Haibo Jin et.al. 2505.24232 null
2025-05-30 SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems Xu He et.al. 2505.24201 null
2025-05-29 LLM Agents Should Employ Security Principles Kaiyuan Zhang et.al. 2505.24019 null
2025-05-29 Securing AI Agents with Information-Flow Control Manuel Costa et.al. 2505.23643 link
2025-05-29 Understanding Refusal in Language Models with Sparse Autoencoders Wei Jie Yeo et.al. 2505.23556 link
2025-05-29 Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models Mingyu Yu et.al. 2505.23404 null
2025-05-28 Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment Krti Tallam et.al. 2505.22852 null
2025-05-28 Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing Yifan Lu et.al. 2505.22298 null
2025-05-28 Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models Yongcan Yu et.al. 2505.22271 null
2025-05-28 Jailbreak Distillation: Renewable Safety Benchmarking Jingyu Zhang et.al. 2505.22037 null
2025-05-28 RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments Zeyi Liao et.al. 2505.21936 link
2025-05-27 Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation Tharindu Kumarage et.al. 2505.21784 null
2025-05-26 Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts Hee-Seon Kim et.al. 2505.21556 null
2025-05-28 Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space Yao Huang et.al. 2505.21277 link
2025-05-27 Improved Representation Steering for Language Models Zhengxuan Wu et.al. 2505.20809 link
2025-05-26 Holes in Latent Space: Topological Signatures Under Adversarial Influence Aideen Fay et.al. 2505.20435 null
2025-05-26 Lifelong Safety Alignment for Language Models Haoyu Wang et.al. 2505.20259 link
2025-05-26 Capability-Based Scaling Laws for LLM Red-Teaming Alexander Panfilov et.al. 2505.20162 link
2025-05-26 Attention! You Vision Language Model Could Be Maliciously Manipulated Xiaosen Wang et.al. 2505.19911 null
2025-05-26 What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs Sangyeop Kim et.al. 2505.19773 null
2025-05-26 SGM: A Framework for Building Specification-Guided Moderation Filters Masoomali Fatehkia et.al. 2505.19766 null
2025-05-26 VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models Bingrui Sima et.al. 2505.19684 null
2025-05-26 JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models Jiaxin Song et.al. 2505.19610 null
2025-05-25 GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization Zixuan Chen et.al. 2505.18979 null
2025-05-25 Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations Sanjay Kariyappa et.al. 2505.18907 null
2025-05-24 Security Concerns for Large Language Models: A Survey Miles Q. Li et.al. 2505.18889 null
2025-05-24 Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework Binhao Ma et.al. 2505.18864 link
2025-05-23 Survival Games: Human-LLM Strategic Showdowns under Severe Resource Scarcity Zhihong Chen et.al. 2505.17937 link
2025-05-23 Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking? Chengda Lu et.al. 2505.17650 null
2025-05-23 Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models Jiawei Kong et.al. 2505.17601 null
2025-05-23 One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs Linbao Li et.al. 2505.17598 link
2025-05-23 JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models Zifan Peng et.al. 2505.17568 link
2025-05-23 Chain-of-Lure: A Synthetic Narrative-Driven Approach to Compromise Large Language Models Wenhan Chang et.al. 2505.17519 null
2025-05-22 Refusal Direction is Universal Across Safety-Aligned Languages Xinpeng Wang et.al. 2505.17306 null
2025-05-22 In-Context Watermarks for Large Language Models Yepeng Liu et.al. 2505.16934 null
2025-05-22 When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques Jianing Geng et.al. 2505.16765 null
2025-05-23 Finetuning-Activated Backdoors in LLMs Thibaud Gloaguen et.al. 2505.16567 link
2025-05-22 Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models Zhaoxin Wang et.al. 2505.16446 null
2025-05-22 Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers Viet-Anh Nguyen et.al. 2505.16241 null
2025-05-22 SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning Kaiwen Zhou et.al. 2505.16186 null
2025-05-21 Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval Taiye Chen et.al. 2505.15753 null
2025-05-21 Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses Xiaoxue Yang et.al. 2505.15738 link
2025-05-21 Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries Yuhao Wang et.al. 2505.15420 null
2025-05-21 Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models Zirui Song et.al. 2505.15406 link
2025-05-20 SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment Wonje Jeung et.al. 2505.14667 null
2025-05-20 sudoLLM : On Multi-role Alignment of Language Models Soumadeep Saha et.al. 2505.14607 null
2025-05-20 Can Large Language Models Really Recognize Your Name? Dzung Pham et.al. 2505.14549 link
2025-05-20 Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders Agam Goyal et.al. 2505.14536 null
2025-05-20 Lessons from Defending Gemini Against Indirect Prompt Injections Chongyang Shi et.al. 2505.14534 null
2025-05-20 Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs Jiawen Wang et.al. 2505.14368 null
2025-05-20 Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion Tiehan Cui et.al. 2505.14316 null
2025-05-20 EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection Yijie Lu et.al. 2505.14289 null
2025-05-20 "Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs Darpan Aswal et.al. 2505.14226 null
2025-05-20 AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models Guangke Chen et.al. 2505.14103 null
2025-05-19 Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks Narek Maloyan et.al. 2505.13348 null
2025-05-19 I'll believe it when I see it: Images increase misinformation sharing in Vision-Language Models Alice Plebe et.al. 2505.13302 link
2025-05-19 The Hidden Dangers of Browsing AI Agents Mykyta Mudryi et.al. 2505.13076 null
2025-05-18 BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation Wenqi Lyu et.al. 2505.12443 null
2025-05-18 CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement Gauri Kholkar et.al. 2505.12368 null
2025-05-18 The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models Linghan Huang et.al. 2505.12287 null
2025-05-17 Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement Peng Ding et.al. 2505.12060 link
2025-05-17 Multilingual Collaborative Defense for Large Language Models Hongliang Li et.al. 2505.11835 link
2025-05-17 JULI: Jailbreak Large Language Models by Self-Introspection Jesson Wang et.al. 2505.11790 null
2025-05-16 EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents Xilong Wang et.al. 2505.11717 null
2025-05-16 ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks Zhixiong Zhuang et.al. 2505.11459 null
2025-05-16 CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs Sijia Chen et.al. 2505.11413 null
2025-05-16 AutoRAN: Weak-to-Strong Jailbreaking of Large Reasoning Models Jiacheng Liang et.al. 2505.10846 link
2025-05-16 LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs Ran Li et.al. 2505.10838 null
2025-05-15 Dark LLMs: The Growing Threat of Unaligned AI Models Michael Fire et.al. 2505.10066 null
2025-05-15 Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data Adel ElZemity et.al. 2505.09974 null
2025-05-16 PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization Yidan Wang et.al. 2505.09921 link
2025-05-14 Adversarial Attack on Large Language Models using Exponentiated Gradient Descent Sajib Biswas et.al. 2505.09820 link
2025-05-14 Adversarial Suffix Filtering: a Defense Pipeline for LLMs David Khachaturov et.al. 2505.09602 null
2025-05-11 TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis Longtian Wang et.al. 2505.08804 null
2025-05-13 A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem Sunday Oyinlola Ogundoyin et.al. 2505.08148 link
2025-05-12 Concept-Level Explainability for Auditing & Steering LLM Responses Kenza Amara et.al. 2505.07610 link
2025-05-12 One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models Haoran Gu et.al. 2505.07167 null
2025-05-10 Jailbreaking the Text-to-Video Generative Models Jiayang Liu et.al. 2505.06679 null
2025-05-10 Practical Reasoning Interruption Attacks on Reasoning Large Language Models Yu Cui et.al. 2505.06643 null
2025-05-10 Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model Xinyue Lou et.al. 2505.06538 link
2025-05-10 System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection Jiawei Guo et.al. 2505.06493 null
2025-05-08 Defending against Indirect Prompt Injection by Instruction Detection Tongyu Wen et.al. 2505.06311 link
2025-05-09 AgentXploit: End-to-End Redteaming of Black-Box AI Agents Zhun Wang et.al. 2505.05849 null
2025-05-12 LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities Kalyan Nakka et.al. 2505.05619 link
2025-05-07 Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs Chetan Pathade et.al. 2505.04806 null
2025-05-07 Safeguard-by-Development: A Privacy-Enhanced Development Paradigm for Multi-Agent Collaboration Systems Jian Cui et.al. 2505.04799 null
2025-05-07 A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models Pedro Pinacho-Davidson et.al. 2505.04784 null
2025-05-07 The Aloe Family Recipe for Open and Specialized Healthcare LLMs Dario Garcia-Gasulla et.al. 2505.04388 null
2025-05-07 Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety Variath Madhupal Gautham Nair et.al. 2505.04146 null
2025-05-06 LlamaFirewall: An open source guardrail system for building secure AI agents Sahana Chennabasappa et.al. 2505.03574 null
2025-05-03 Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs Haoming Yang et.al. 2505.02862 null
2025-05-04 Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents Christian Schroeder de Witt et.al. 2505.02077 null
2025-05-05 Helping Large Language Models Protect Themselves: An Enhanced Filtering and Summarization System Sheikh Samit Muhaimin et.al. 2505.01315 null
2025-05-01 OET: Optimization-based prompt injection Evaluation Toolkit Jinsheng Pan et.al. 2505.00843 link
2025-05-05 The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) Zihao Wang et.al. 2505.00626 null
2025-04-29 HyPerAlign: Hypotheses-driven Personalized Alignment Cristina Garbacea et.al. 2505.00038 null
2025-04-30 XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs Marco Arazzi et.al. 2504.21700 null
2025-04-30 Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs Pan Suo et.al. 2504.21680 null
2025-04-30 The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning Siyi Chen et.al. 2504.21307 null
2025-04-29 CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks Rui Wang et.al. 2504.21228 null
2025-04-29 ACE: A Security Architecture for LLM-Integrated App Systems Evan Li et.al. 2504.20984 null
2025-04-29 AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security Zikui Cai et.al. 2504.20965 link
2025-04-29 Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption Wenxiao Wang et.al. 2504.20769 null
2025-04-29 Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression Yu Cui et.al. 2504.20493 null
2025-04-29 Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction Yulin Chen et.al. 2504.20472 null
2025-04-29 Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems Shiqian Zhao et.al. 2504.20376 null
2025-04-28 Prompt Injection Attack to Tool Selection in LLM Agents Jiawen Shi et.al. 2504.19793 null
2025-04-29 Security Steerability is All You Need Itay Hazan et.al. 2504.19521 null
2025-04-28 JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift Julien Piet et.al. 2504.19440 link
2025-04-27 Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling Ishan Kavathekar et.al. 2504.19277 link
2025-04-26 Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs Mohammad Akbar-Tajari et.al. 2504.19019 link
2025-04-22 WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks Ivan Evtimov et.al. 2504.18575 link
2025-04-25 Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections Narek Maloyan et.al. 2504.18333 null
2025-04-23 Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate Senmao Qi et.al. 2504.16489 null
2025-04-20 Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection Xiangyu Chang et.al. 2504.16125 null
2025-04-26 T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models Siyuan Liang et.al. 2504.15512 null
2025-04-21 MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning Yahan Yang et.al. 2504.15241 null
2025-04-20 Prompt-Hacking: The New p-Hacking? Thomas Kosch et.al. 2504.14571 null
2025-04-20 LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks Yousef Emami et.al. 2504.14556 null
2025-04-25 Manipulating Multimodal Agents via Cross-Modal Prompt Injection Le Wang et.al. 2504.14348 null
2025-04-18 DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification Yu Li et.al. 2504.13562 null
2025-04-15 X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Salman Rahman et.al. 2504.13203 null
2025-04-15 Concept Enhancement Engineering: A Lightweight and Efficient Robust Defense Against Jailbreak Attacks in Embodied AI Jirui Yang et.al. 2504.13201 null
2025-04-17 GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms Sinan He et.al. 2504.13052 null
2025-04-17 ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition Haidar Khan et.al. 2504.12562 link
2025-04-14 You've Changed: Detecting Modification of Black-Box Large Language Models Alden Dima et.al. 2504.12335 null
2025-04-15 DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks Yupei Liu et.al. 2504.11358 link
2025-04-16 Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails William Hackett et.al. 2504.11168 null
2025-04-15 Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models Jiangtao Liu et.al. 2504.11106 null
2025-04-14 The Jailbreak Tax: How Useful are Your Jailbreak Outputs? Kristina Nikolić et.al. 2504.10694 link
2025-04-14 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Tao Zhang et.al. 2504.10465 link
2025-04-16 LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks Soumyadeep Pal et.al. 2504.10185 link
2025-04-14 RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability Yichi Zhang et.al. 2504.10081 null
2025-04-14 StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models Yang Feng et.al. 2504.09841 null
2025-04-13 The Structural Safety Generalization Problem Julius Broomfield et.al. 2504.09712 link
2025-04-13 Mitigating Many-Shot Jailbreaking Christopher M. Ackerman et.al. 2504.09604 null
2025-04-13 ControlNET: A Firewall for RAG-based LLM System Hongwei Yao et.al. 2504.09593 null
2025-04-13 AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender Weixiang Zhao et.al. 2504.09466 null
2025-04-13 SaRO: Enhancing LLM Safety through Reasoning-based Alignment Yutao Mou et.al. 2504.09420 null
2025-04-12 Feature-Aware Malicious Output Detection and Mitigation Weilong Dong et.al. 2504.09191 null
2025-04-10 Geneshift: Impact of different scenario shift on Jailbreaking LLM Tianyi Wu et.al. 2504.08104 null
2025-04-10 Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge Riccardo Cantini et.al. 2504.07887 link
2025-04-10 Defense against Prompt Injection Attacks via Mixture of Encodings Ruiyi Zhang et.al. 2504.07467 link
2025-04-09 Bypassing Safety Guardrails in LLMs Using Humor Pedro Cisneros-Velarde et.al. 2504.06577 null
2025-04-08 Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking Junxi Chen et.al. 2504.05838 link
2025-04-08 Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators Xitao Li et.al. 2504.05689 null
2025-04-08 Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking Yu-Hang Wu et.al. 2504.05652 link
2025-04-07 A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models Carlos Peláez-González et.al. 2504.04976 null
2025-04-08 Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models Yubo Li et.al. 2504.04717 link
2025-04-06 StyleRec: A Benchmark Dataset for Prompt Recovery in Writing Style Transformation Shenyang Liu et.al. 2504.04373 null
2025-04-08 JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model Yi Nian et.al. 2504.03770 link
2025-04-03 More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment Yifan Wang et.al. 2504.02193 null
2025-04-02 Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses Zhengchun Shang et.al. 2504.02080 null
2025-04-02 Representation Bending for Large Language Model Safety Ashkan Yousefpour et.al. 2504.01550 link
2025-04-02 LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution Zhuoran Yang et.al. 2504.01533 null
2025-04-07 PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization Aofan Liu et.al. 2504.01444 null
2025-04-07 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks Jiawei Wang et.al. 2504.01308 link
2025-04-02 Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning Si Chen et.al. 2504.01278 null
2025-04-01 Multilingual and Multi-Accent Jailbreaking of Audio LLMs Jaechul Roh et.al. 2504.01094 null
2025-04-01 Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics Shide Zhou et.al. 2504.00446 null
2025-03-31 Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms Shuoming Zhang et.al. 2503.24191 null
2025-03-29 Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions Shih-Han Chan et.al. 2503.23250 null
2025-03-27 Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing Johan Wahréus et.al. 2503.21598 null
2025-03-27 Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection Ryan Marinelli et.al. 2503.21464 link
2025-03-26 Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy Joonhyun Jeong et.al. 2503.20823 link
2025-03-26 Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models Shih-Wen Ke et.al. 2503.20320 null
2025-03-26 sudo rm -rf agentic_security Sejin Lee et.al. 2503.20279 link
2025-03-24 MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks Wenhao You et.al. 2503.19134 null
2025-03-23 SRMIR: Shadow Reward Models Based on Introspective Reasoning for LLM Alignment Ruoxi Cheng et.al. 2503.18991 null
2025-03-24 Defeating Prompt Injections by Design Edoardo Debenedetti et.al. 2503.18813 null
2025-03-23 Metaphor-based Jailbreaking Attacks on Text-to-Image Models Chenyu Zhang et.al. 2503.17987 null
2025-03-23 Smoke and Mirrors: Jailbreaking LLM-based Code Generation via Implicit Malicious Prompts Sheng Ouyang et.al. 2503.17953 null

(back to top)

Code Embedding

Publish Date Title Authors PDF Code
2025-07-21 AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming Jierui Li et.al. 2507.15378 null
2025-07-16 When Retriever Meets Generator: A Joint Model for Code Comment Generation Tien P. T. Le et.al. 2507.12558 null
2025-07-07 Unified Framework for Quantum Code Embedding Andrew C. Yuan et.al. 2507.05361 null
2025-05-27 Semi-supervised Clustering Through Representation Learning of Large-scale EHR Data Linshanshan Wang et.al. 2505.20731 null
2025-05-19 Towards A Generalist Code Embedding Model Based On Massive Data Synthesis Chaofan Li et.al. 2505.12697 link
2025-05-31 Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes Xueqing Liu et.al. 2503.22935 null
2025-07-17 OASIS: Order-Augmented Strategy for Improved Code Search Zuchen Gao et.al. 2503.08161 null
2025-03-10 Assessing Uncertainty in Stock Returns: A Gaussian Mixture Distribution-Based Method Yanlong Wang et.al. 2503.06929 null
2025-06-02 LoRACode: LoRA Adapters for Code Embeddings Saumya Chaturvedi et.al. 2503.05315 null
2025-03-07 Extended Controllability Tests for Quantum Decoherence-Free Subspaces Eric B. Kopp et.al. 2503.05155 null
2025-02-21 GNN-Coder: Boosting Semantic Code Retrieval with Combined GNNs and Transformer Yufan Ye et.al. 2502.15202 null
2025-03-16 Poisoned Source Code Detection in Code Models Ehab Ghannoum et.al. 2502.13459 null
2025-02-07 EnseSmells: Deep ensemble and programming language models for automated code smells detection Anh Ho et.al. 2502.05012 link
2025-03-26 Intelligent Code Embedding Framework for High-Precision Ransomware Detection via Multimodal Execution Path Analysis Levi Gareth et.al. 2501.15836 null
2024-12-18 Transducer Tuning: Efficient Model Adaptation for Software Tasks Using Code Property Graphs Imam Nur Bani Yusuf et.al. 2412.13467 link

(back to top)

Model Context Protocol

Publish Date Title Authors PDF Code
2025-07-08 Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms Tarek Gasmi et.al. 2507.06323 null
2025-07-05 We Urgently Need Privilege Management in MCP: A Measurement of API Usage in MCP Ecosystems Zhihao Li et.al. 2507.06250 null
2025-06-27 Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis Rafi Al Attrach et.al. 2507.01053 null
2025-07-01 VTS-Guided AI Interaction Workflow for Business Insights Sun Ding et.al. 2507.00347 null
2025-06-30 A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis Zhiwei Lin et.al. 2506.23474 null
2025-06-29 From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows Mohamed Amine Ferrag et.al. 2506.23260 null
2025-06-18 RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments Yuchuan Fu et.al. 2506.15253 link
2025-06-08 Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values Nell Watson et.al. 2506.13774 null
2025-06-20 Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers Mohammed Mehedi Hasan et.al. 2506.13538 link
2025-06-12 QuantMCP: Grounding Large Language Models in Verifiable Financial Reality Yifan Zeng et.al. 2506.06622 null
2025-05-26 Survey of LLM Agent Communication with MCP: A Software Design Pattern Centric Review Anjana Sarkar et.al. 2506.05364 null
2025-06-05 Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem Hao Song et.al. 2506.02040 link
2025-06-02 ETDI: Mitigating Tool Squatting and Rug Pull Attacks in Model Context Protocol (MCP) by using OAuth-Enhanced Tool Definitions and Policy-Based Access Control Manish Bhatt et.al. 2506.01333 null
2025-05-30 Chances and Challenges of the Model Context Protocol in Digital Forensics and Incident Response Jan-Niclas Hilgert et.al. 2506.00274 null
2025-05-27 ADA: Automated Moving Target Defense for AI Workloads via Ephemeral Infrastructure-Native Rotation in Kubernetes Akram Sheriff et.al. 2505.23805 null
2025-05-29 MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment John Halloran et.al. 2505.23634 null
2025-05-28 AgentDNS: A Root Domain Naming System for LLM Agents Enfang Cui et.al. 2505.22368 null
2025-05-23 Gaming Tool Preferences in Agentic LLMs Kazem Faghih et.al. 2505.18135 link
2025-05-22 Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models Junjie Xiong et.al. 2505.16957 null
2025-05-16 MPMA: Preference Manipulation Attack Against Model Context Protocol Zihan Wang et.al. 2505.11154 null
2025-05-06 From Glue-Code to Protocols: A Critical Analysis of A2A and MCP Integration for Scalable Agent Systems Qiaomu Li et.al. 2505.03864 null
2025-05-23 A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP) Abul Ehtesham et.al. 2505.02279 null
2025-04-28 Simplified and Secure MCP Gateways for Enterprise AI Integration Ivo Brett et.al. 2504.19997 link
2025-04-28 Securing GenAI Multi-Agent Systems Against Tool Squatting: A Zero Trust Registry-Based Approach Vineeth Sai Narajala et.al. 2504.19951 null
2025-04-28 From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review Mohamed Amine Ferrag et.al. 2504.19678 null
2025-05-02 Building A Secure Agentic AI Application Leveraging A2A Protocol Idan Habler et.al. 2504.16902 null
2025-05-19 MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System Sonu Kumar et.al. 2504.12757 null
2025-04-11 MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers Arash Ahmadi et.al. 2504.08999 null
2025-05-02 Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies Vineeth Sai Narajala et.al. 2504.08623 null
2025-04-11 MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits Brandon Radosevich et.al. 2504.03767 link
2025-04-06 Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions Xinyi Hou et.al. 2503.23278 null

(back to top)

Supply Chain Attacks

Publish Date Title Authors PDF Code
2025-06-24 FuncVul: An Effective Function Level Vulnerability Detection Model using LLM and Code Chunk Sajal Halder et.al. 2506.19453 null
2025-05-30 When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs Xinyue Shen et.al. 2506.00197 null
2025-07-15 Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems Ronny Ko et.al. 2505.23847 null
2025-05-27 JavaSith: A Client-Side Framework for Analyzing Potentially Malicious Extensions in Browsers, VS Code, and NPM Packages Avihay Cohen et.al. 2505.21263 null
2025-06-30 LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries Zekun Wu et.al. 2505.08842 null
2025-05-07 Safeguard-by-Development: A Privacy-Enhanced Development Paradigm for Multi-Agent Collaboration Systems Jian Cui et.al. 2505.04799 null
2025-05-02 A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories Ziqi Ding et.al. 2505.01067 null
2025-04-29 Understanding Large Language Model Supply Chain: Structure, Domain, and Vulnerabilities Yanzhe Hu et.al. 2504.20763 null
2025-04-24 Automatically Generating Rules of Malicious Software Packages via Large Language Model XiangRui Zhang et.al. 2504.17198 null
2025-03-27 Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation Rafiqul Rabin et.al. 2503.22760 null
2025-05-26 The CodeInverter Suite: Control-Flow and Data-Mapping Augmented Binary Decompilation with LLMs Peipei Liu et.al. 2503.07215 null
2025-02-18 SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain Shenao Wang et.al. 2502.12497 null
2025-01-31 Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities Arjun Krishna et.al. 2501.19012 null
2024-12-26 Integrating Artificial Open Generative Artificial Intelligence into Software Supply Chain Security Vasileios Alevizos et.al. 2412.19088 null
2024-12-23 Emerging Security Challenges of Large Language Models Herve Debar et.al. 2412.17614 null
2024-12-22 Enhancing Supply Chain Transparency in Emerging Economies Using Online Contents and LLMs Bohan Jin et.al. 2412.16922 null
2024-12-18 RAG for Effective Supply Chain Security Questionnaire Automation Zaynab Batool Reza et.al. 2412.13988 null
2025-03-30 Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors Yuefeng Peng et.al. 2411.01705 null
2024-11-03 Large Language Model Supply Chain: Open Problems From the Security Perspective Qiang Hu et.al. 2411.01604 null

(back to top)

About

🎓 Automatically update LLM security papers daily using GitHub Actions
