Awesome-Hallucination-Detection-and-Mitigation

A collection of papers on LLM/LVLM hallucination evaluation benchmarks, detection, and mitigation.

We will continue to update this list with the latest resources. If you notice a missing resource (paper or code) or an error, please feel free to open an issue or submit a pull request.

Hallucinations Evaluation Benchmark

[Li2023] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models in EMNLP, 2023. [paper]
[Chen2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection in IJCAI, 2024. [paper][code]
[Su2024] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models in Arxiv, 2024. [paper][code]
[Kossen2024] Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs in Arxiv, 2024. [paper][code]
[Ji2024] ANAH: Analytical Annotation of Hallucinations in Large Language Models in ACL, 2024. [paper]
[Simhi2024] Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs in Arxiv, 2024. [paper][code]

Causes of Hallucination

[Li2024] The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models in Arxiv, 2024. [paper][code]
[Liu2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models in Arxiv, 2025. [paper][code]

Hallucination Detection

Fact-checking

[Niu2024] RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models in ACL, 2024. [paper][code]
[Chen2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection in IJCAI, 2024. [paper][code]
[Zhang2024] KnowHalu: Hallucination Detection via Multi-Form Knowledge-Based Factual Checking in Arxiv, 2024. [paper][code]
[Rawte2024] FACTOID: FACtual enTailment fOr hallucInation Detection in Arxiv, 2024. [paper][code]
[Es2024] RAGAs: Automated Evaluation of Retrieval Augmented Generation in EACL, 2024. [paper][code]
[Hu2024] RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models in Arxiv, 2024. [paper][code]
[Zhang2025] CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking in NAACL, 2025. [paper][code]
[Lee2025] Enhancing Hallucination Detection via Future Context in Arxiv, 2025. [paper][code]

Uncertainty Analysis

[Zhang2023] Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus in EMNLP, 2023. [paper][code]
[Snyder2024] On Early Detection of Hallucinations in Factual Question Answering in KDD, 2024. [paper][code]
[Chuang2024] Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps in EMNLP, 2024. [paper][code]
[Ji2024] LLM Internal States Reveal Hallucination Risk Faced With a Query in Arxiv, 2024. [paper][code]
[Bouchard2025] Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers in Arxiv, 2025. [paper][code]
[Ma2025] Semantic Energy: Detecting LLM Hallucination Beyond Entropy in Arxiv, 2025. [paper][code]

Consistency Measure

[Cohen2023] LM vs LM: Detecting Factual Errors via Cross Examination in Arxiv, 2023. [paper][code]
[Manakul2023] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models in EMNLP, 2023. [paper][code]
[Chen2023] Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models in CIKM, 2023. [paper][code]
[Su2024] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models in Arxiv, 2024. [paper][code]
[Mündler2024] Self-Contradictory Hallucinations of LLMs: Evaluation, Detection and Mitigation in ICLR, 2024. [paper][code]
[Kossen2024] Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs in ICML, 2024. [paper][code]
[Xu2024] Hallucination is Inevitable: An Innate Limitation of Large Language Models in Arxiv, 2024. [paper][code]
[Niu2025] Robust Hallucination Detection in LLMs via Adaptive Token Selection in NeurIPS, 2025. [paper][code]
[Sun2025] Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations in Arxiv, 2025. [paper][code]
[Islam2025] How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild in Arxiv, 2025. [paper][code]
[Muhammed2025] SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models in Arxiv, 2025. [paper][code]
[Yang2025] Hallucination Detection in Large Language Models with Metamorphic Relations in FSE, 2025. [paper][code]

Hidden States Analysis

[Azaria2023] The Internal State of an LLM Knows When It’s Lying in EMNLP Findings, 2023. [paper][code]
[Chen2024] INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection in ICLR, 2024. [paper][code]
[Kuhn2023] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation in ICLR, 2023. [paper][code]
[Farquhar2024] Detecting Hallucinations in Large Language Models Using Semantic Entropy in Nature, 2024. [paper][code]
[Sriramanan2024] LLM-Check: Investigating Detection of Hallucinations in Large Language Models in NeurIPS, 2024. [paper][code]
[Wang2025] Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation in ICLR, 2025. [paper][code]
[Zhang2025] ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs in ACL, 2025. [paper][code]
[Cheang2025] Large Language Models Do NOT Really Know What They Don't Know in Arxiv, 2025. [paper][code]

RL Reasoning

[Su2025] Learning to Reason for Hallucination Span Detection in Arxiv, 2025. [paper][code]

Hallucination Mitigation

Model Calibration

[Li2023] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model in NeurIPS, 2023. [paper][code]
[Liu2023] LitCab: Lightweight Language Model Calibration over Short- and Long-form Responses in ICLR, 2023. [paper][code]
[Ji2023] Towards Mitigating Hallucination in Large Language Models via Self-Reflection in EMNLP Findings, 2023. [paper]
[Chen2023] PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions in Arxiv, 2023. [paper]
[Campbell2023] Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching in Arxiv, 2023. [paper]
[Wan2023] Faithfulness-Aware Decoding Strategies for Abstractive Summarization in EACL, 2023. [paper][code]
[Shi2023] Trusting Your Evidence: Hallucinate Less with Context-aware Decoding in Arxiv, 2023. [paper]
[Chen2024] Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning in AAAI, 2024. [paper][code]
[Zhang2024] R-Tuning: Instructing Large Language Models to Say 'I Don't Know' in NAACL, 2024. [paper][code]
[Chuang2024] DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models in ICLR, 2024. [paper][code]
[Kapoor2024] Calibration-Tuning: Teaching Large Language Models to Know What They Don’t Know in UncertaiNLP, 2024. [paper]
[Zhang2024] TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space in ACL, 2024. [paper][code]
[Zhou2025] HaDeMiF: Hallucination Detection and Mitigation in Large Language Models in ICLR, 2025. [paper][code]
[Zhang2025] The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination in ACL, 2025. [paper][code]
[Wu2025] Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization in Arxiv, 2025. [paper][code]
[Cheng2025] Integrative Decoding: Improving Factuality via Implicit Self-consistency in ICLR, 2025. [paper][code]
[Yang2025] Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection in CVPR, 2025. [paper][code]
[Wan2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Model in ICCV, 2025. [paper][code]
[Chang2025] Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation in Arxiv, 2025. [paper][code]
[Wang2025] Image Tokens Matter: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing in Arxiv, 2025. [paper][code]

External Knowledge

[Ji2022] RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding in ACL Findings, 2022. [paper]
[Kang2024] Unfamiliar Finetuning Examples Control How Language Models Hallucinate in Arxiv, 2024. [paper]
[Gekhman2024] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? in EMNLP, 2024. [paper]
[Sun2025] ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability in ICLR, 2025. [paper][code]
[Dey2025] Uncertainty-Aware Fusion: An Ensemble Framework for Mitigating Hallucinations in Large Language Models in WebConf, 2025. [paper][code]
[Lavrinovics2025] MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations in Arxiv, 2025. [paper][code]
[Sui2025] Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge in Arxiv, 2025. [paper][code]
[Ferrando2025] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models in ICLR, 2025. [paper][code]
[Xue2025] UALIGN: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models in ACL, 2025. [paper][code]
[Cheng2024] Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector in EMNLP, 2024. [paper][code]

Alignment-Fine-tuning

[Lee2022] Factuality Enhanced Language Models for Open-Ended Text Generation in NeurIPS, 2022. [paper][code]
[Tian2023] Fine-tuning Language Models for Factuality in ICLR, 2023. [paper][code]
[Lin2024] FLAME: Factuality-Aware Alignment for Large Language Models in NeurIPS, 2024. [paper]
[Yang2024] V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization in EMNLP Findings, 2024. [paper][code]
[Kang2024] Unfamiliar Finetuning Examples Control How Language Models Hallucinate in NAACL, 2024. [paper][code]
[Yang2025] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key in CVPR, 2025. [paper][code]
[Gu2025] Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs in ICLR, 2025. [paper][code]

Related Survey

[Wang2023] Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity in Arxiv, 2023. [paper]
[Ye2023] Cognitive Mirage: A Review of Hallucinations in Large Language Models in Arxiv, 2023. [paper]
[Zhang2023] Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models in Arxiv, 2023. [paper]
[Gao2023] Retrieval-Augmented Generation for Large Language Models: A Survey in Arxiv, 2023. [paper]
[Huang2024] A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions in TOIS, 2024. [paper]
[Ji2024] Survey of Hallucination in Natural Language Generation in CSUR, 2024. [paper]
[Bai2024] Hallucination of Multimodal Large Language Models: A Survey in Arxiv, 2024. [paper]
[Chen2025] A Survey of Multimodal Hallucination Evaluation and Detection in Arxiv, 2025. [paper]
[Lin2025] LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions in Arxiv, 2025. [paper]
