This project implements an AI-powered hallucination detector for multimodal models.
It checks whether an image-caption pair is consistent or potentially hallucinated, combining BLIP (image captioning), CLIP (vision-language similarity), and Sentence Transformers (semantic similarity).
- 🖼️ Upload an image and test AI-generated or custom captions.
- 🤖 Generates captions automatically using BLIP.
- 🔗 Measures similarity between user/AI captions and the image using CLIP and Sentence Transformers.
- 📊 Outputs a confidence score for consistency.
- ⚠️ Flags possible hallucinations when captions do not align with the image.
- 🌐 Streamlit-based interactive web app.
- Streamlit – UI for interaction
- PyTorch – Deep learning framework
- CLIP – Vision-language model
- BLIP – Image captioning
- Sentence Transformers – Semantic similarity
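Both CLIP and Sentence Transformers ultimately compare embeddings with cosine similarity. A minimal sketch of that metric, using toy vectors rather than real model embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Returns a value in [-1, 1]; closer to 1 means more similar.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for caption/image embeddings.
caption_vec = [1.0, 0.0, 1.0]
image_vec = [1.0, 0.0, 0.9]
print(cosine_similarity(caption_vec, image_vec))
```

In the real app, the vectors come from the models themselves (e.g. CLIP's image and text encoders); this sketch only shows the comparison step.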
- Clone the repository

  ```bash
  git clone https://github.com/SamyukthaaAnand/Hallucination-Detection-in-Multimodal-LLMs.git
  cd Hallucination-Detection-in-Multimodal-LLMs
  ```

- Set up a virtual environment (recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate   # Mac/Linux
  venv\Scripts\activate      # Windows
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
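The dependency list itself is not shown in this README; a plausible `requirements.txt` for the stack above would look like the following (exact package names and pinned versions are assumptions, check the repository's actual file):

```
streamlit
torch
transformers
sentence-transformers
Pillow
```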
Run the Streamlit app:
```bash
streamlit run hallucination.py
```

- Upload an image (JPG/PNG).
- Choose caption type:
- AI Generated (via BLIP)
- Custom (enter your own)
- The system compares caption vs. image using CLIP + semantic similarity.
- A confidence score is displayed:
- ✅ High score → Caption likely matches image.
  - ⚠️ Low score → Possible hallucination.
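One way to turn the two similarity signals into a single confidence score is a weighted blend with a flagging threshold. The weights and threshold below are illustrative assumptions, not the app's actual values:

```python
def consistency_score(clip_sim, semantic_sim, clip_weight=0.6, threshold=0.5):
    """Blend CLIP image-caption similarity with caption-caption semantic
    similarity into one confidence score, and flag possible hallucination.

    clip_weight and threshold are hypothetical; the repository may use
    different values or a different fusion rule entirely.
    """
    score = clip_weight * clip_sim + (1 - clip_weight) * semantic_sim
    return score, score < threshold  # (confidence, possible_hallucination)

score, flagged = consistency_score(0.31, 0.28)
print(f"confidence={score:.2f}, hallucination={flagged}")
```

A low blended score flags the caption as a possible hallucination; a high score indicates the caption likely matches the image.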
MIT License – feel free to use and modify for research purposes.