A collection of papers on LLM/LVLM hallucination evaluation benchmarks, detection, and mitigation.
We will continue to update this list with the latest resources. If you find any missing resources (paper/code) or errors, please feel free to open an issue or make a pull request.
-
[Li2023] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models in EMNLP, 2023. [paper]
-
[Chen2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection in IJCAI, 2024. [paper][code]
-
[Su2024] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models in Arxiv, 2024. [paper][code]
-
[Kossen2024] Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs in Arxiv, 2024. [paper][code]
-
[Ji2024] ANAH: Analytical Annotation of Hallucinations in Large Language Models in ACL, 2024. [paper]
-
[Simhi2024] Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs in Arxiv, 2024. [paper][code]
-
[Li2024] The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models in Arxiv, 2024. [paper][code]
-
[Liu2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models in Arxiv, 2025. [paper][code]
-
[Niu2024] RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models in ACL, 2024. [paper][code]
-
[Chen2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection in IJCAI, 2024. [paper][code]
-
[Zhang2024] KnowHalu: Hallucination Detection via Multi-Form Knowledge-Based Factual Checking in Arxiv, 2024. [paper][code]
-
[Rawte2024] FACTOID: FACtual enTailment fOr hallucInation Detection in Arxiv, 2024. [paper][code]
-
[Es2024] RAGAs: Automated Evaluation of Retrieval Augmented Generation in EACL, 2024. [paper][code]
-
[Hu2024] RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models in Arxiv, 2024. [paper][code]
-
[Zhang2025] CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking in NAACL, 2025. [paper][code]
-
[Lee2025] Enhancing Hallucination Detection via Future Context in Arxiv, 2025. [paper][code]
-
[Zhang2023] Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus in EMNLP, 2023. [paper][code]
-
[Snyder2024] On Early Detection of Hallucinations in Factual Question Answering in KDD, 2024. [paper][code]
-
[Chuang2024] Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps in EMNLP, 2024. [paper][code]
-
[Ji2024] LLM Internal States Reveal Hallucination Risk Faced With a Query in Arxiv, 2024. [paper][code]
-
[Bouchard2025] Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers in Arxiv, 2025. [paper][code]
-
[Ma2025] Semantic Energy: Detecting LLM Hallucination Beyond Entropy in Arxiv, 2025. [paper][code]
-
[Cohen2023] LM vs LM: Detecting Factual Errors via Cross Examination in Arxiv, 2023. [paper][code]
-
[Manakul2023] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models in EMNLP, 2023. [paper][code]
-
[Chen2023] Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models in CIKM, 2023. [paper][code]
-
[Su2024] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models in Arxiv, 2024. [paper][code]
-
[Mündler2024] Self-Contradictory Hallucinations of LLMs: Evaluation, Detection and Mitigation in ICLR, 2024. [paper][code]
-
[Kossen2024] Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs in ICML, 2024. [paper][code]
-
[Xu2024] Hallucination is Inevitable: An Innate Limitation of Large Language Models in Arxiv, 2024. [paper][code]
-
[Niu2025] Robust Hallucination Detection in LLMs via Adaptive Token Selection in NeurIPS, 2025. [paper][code]
-
[Sun2025] Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations in Arxiv, 2025. [paper][code]
-
[Islam2025] How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild in Arxiv, 2025. [paper][code]
-
[Muhammed2025] SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models in Arxiv, 2025. [paper][code]
-
[Yang2025] Hallucination Detection in Large Language Models with Metamorphic Relations in FSE, 2025. [paper][code]
Hidden States Analysis
-
[Azaria2023] The Internal State of an LLM Knows When It's Lying in EMNLP Findings, 2023. [paper][code]
-
[Chen2024] INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection in ICLR, 2024. [paper][code]
-
[Kuhn2023] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation in ICLR, 2023. [paper][code]
-
[Farquhar2024] Detecting Hallucinations in Large Language Models Using Semantic Entropy in Nature, 2024. [paper][code]
-
[Sriramanan2024] LLM-Check: Investigating Detection of Hallucinations in Large Language Models in NeurIPS, 2024. [paper][code]
-
[Wang2025] Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation in ICLR, 2025. [paper][code]
-
[Zhang2025] ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs in ACL, 2025. [paper][code]
-
[Cheang2025] Large Language Models Do NOT Really Know What They Don't Know in Arxiv, 2025. [paper][code]
-
[Li2023] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model in NeurIPS, 2023. [paper][code]
-
[Liu2023] LitCab: Lightweight Language Model Calibration over Short- and Long-form Responses in ICLR, 2023. [paper][code]
-
[Ji2023] Towards Mitigating Hallucination in Large Language Models via Self-Reflection in EMNLP Findings, 2023. [paper]
-
[Chen2023] PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions in Arxiv, 2023. [paper]
-
[Campbell2023] Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching in Arxiv, 2023. [paper]
-
[Wan2023] Faithfulness-Aware Decoding Strategies for Abstractive Summarization in EACL, 2023. [paper][code]
-
[Shi2023] Trusting Your Evidence: Hallucinate Less with Context-aware Decoding in Arxiv, 2023. [paper]
-
[Chen2024] Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning in AAAI, 2024. [paper][code]
-
[Zhang2024] R-Tuning: Instructing Large Language Models to Say 'I Don't Know' in NAACL, 2024. [paper][code]
-
[Chuang2024] DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models in ICLR, 2024. [paper][code]
-
[Kapoor2024] Calibration-Tuning: Teaching Large Language Models to Know What They Don't Know in UncertaiNLP, 2024. [paper]
-
[Zhang2024] TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space in ACL, 2024. [paper][code]
-
[Zhou2025] HaDeMiF: Hallucination Detection and Mitigation in Large Language Models in ICLR, 2025. [paper][code]
-
[Zhang2025] The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination in ACL, 2025. [paper][code]
-
[Wu2025] Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization in Arxiv, 2025. [paper][code]
-
[Cheng2025] Integrative Decoding: Improving Factuality via Implicit Self-consistency in ICLR, 2025. [paper][code]
-
[Yang2025] Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection in CVPR, 2025. [paper][code]
-
[Wan2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Model in ICCV, 2025. [paper][code]
-
[Chang2025] Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation in Arxiv, 2025. [paper][code]
-
[Wang2025] Image Tokens Matter: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing in Arxiv, 2025. [paper][code]
-
[Ji2022] RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding in ACL Findings, 2022. [paper]
-
[Kang2024] Unfamiliar Finetuning Examples Control How Language Models Hallucinate in Arxiv, 2024. [paper]
-
[Gekhman2024] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? in EMNLP, 2024. [paper]
-
[Sun2025] ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability in ICLR, 2025. [paper][code]
-
[Dey2025] Uncertainty-Aware Fusion: An Ensemble Framework for Mitigating Hallucinations in Large Language Models in WebConf, 2025. [paper][code]
-
[Lavrinovics2025] MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations in Arxiv, 2025. [paper][code]
-
[Sui2025] Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge in Arxiv, 2025. [paper][code]
-
[Ferrando2025] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models in ICLR, 2025. [paper][code]
-
[Xue2025] UALIGN: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models in ACL, 2025. [paper][code]
-
[Cheng2024] Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector in EMNLP, 2024. [paper][code]
-
[Lee2022] Factuality Enhanced Language Models for Open-Ended Text Generation in NeurIPS, 2022. [paper][code]
-
[Tian2023] Fine-tuning Language Models for Factuality in ICLR, 2023. [paper][code]
-
[Lin2024] FLAME: Factuality-Aware Alignment for Large Language Models in NeurIPS, 2024. [paper]
-
[Yang2024] V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization in EMNLP Findings, 2024. [paper][code]
-
[Kang2024] Unfamiliar Finetuning Examples Control How Language Models Hallucinate in NAACL, 2024. [paper][code]
-
[Yang2025] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key in CVPR, 2025. [paper][code]
-
[Gu2025] Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs in ICLR, 2025. [paper][code]
-
[Wang2023] Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity in Arxiv, 2023. [paper]
-
[Ye2023] Cognitive Mirage: A Review of Hallucinations in Large Language Models in Arxiv, 2023. [paper]
-
[Zhang2023] Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models in Arxiv, 2023. [paper]
-
[Gao2023] Retrieval-Augmented Generation for Large Language Models: A Survey in Arxiv, 2023. [paper]
-
[Huang2024] A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions in TOIS, 2024. [paper]
-
[Ji2024] Survey of Hallucination in Natural Language Generation in CSUR, 2024. [paper]
-
[Bai2024] Hallucination of Multimodal Large Language Models: A Survey in Arxiv, 2024. [paper]
-
[Chen2025] A Survey of Multimodal Hallucination Evaluation and Detection in Arxiv, 2025. [paper]
-
[Lin2025] LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions in Arxiv, 2025. [paper]