A simple Streamlit app that uses the BLIP/BLIP2 models to generate captions for any uploaded image.
- Upload an image (jpg/png)
- The AI model will analyze it
- You'll get a descriptive caption like:
"A cat wearing sunglasses while sitting on a couch."
- Python
- Streamlit
- Transformers (BLIP model from Salesforce)
- Hugging Face
pip install streamlit transformers torch torchvision Pillow
streamlit run ai_image_captioning_app.py

We tested the same image using 3 different models. Here's how their descriptions differ:

model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xl")
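The loading calls above only create the model objects. A minimal sketch of how one of them might be turned into a caption is shown here. The helper names `load_blip` and `generate_caption` and the `max_new_tokens` default are illustrative assumptions, not taken from the app's source, and the real app may be structured differently.

```python
def load_blip(model_id="Salesforce/blip-image-captioning-base"):
    """Load a BLIP processor and model (weights are downloaded on first use)."""
    # Imported inside the function so generate_caption can be imported
    # and exercised without loading transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def generate_caption(image, processor, model, max_new_tokens=30):
    """Return a caption string for an RGB PIL image.

    Hypothetical helper: it relies only on the processor being callable and
    on the generate/decode methods, so a Blip2Processor and
    Blip2ForConditionalGeneration pair should work here as well.
    """
    # Convert the image into the tensors the model expects.
    inputs = processor(images=image, return_tensors="pt")
    # Generate caption token ids, capped at max_new_tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (only) sequence back into text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

In the Streamlit script, the uploaded file could be opened with `Image.open(uploaded).convert("RGB")` from Pillow and passed to `generate_caption` together with the pair returned by `load_blip()`. Passing the `large` or BLIP2 checkpoints instead is one way to compare the three models on the same image.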

