This project trains an image captioning model on precomputed image features and evaluates it with two complementary metrics: BLEU score and cosine similarity.
- Use the COCO dataset for training and evaluation.
- Split the dataset into training and validation sets (a split sketch appears after this list).
- Extract image features with a pre-trained CNN encoder (sketched below).
- Save the extracted features to disk so they can be reloaded during training.
- Tokenize the captions and build a vocabulary (sketched below).
- Design a recurrent neural network (RNN) decoder for generating captions (sketched below).
- Train the decoder on the precomputed image features (sketched below).
- Display sample test images with model-generated and reference captions (greedy decoding is sketched below).
- Evaluate the model's performance on the test set.
- Calculate the average BLEU score over the entire test set (sketched below).
- Display a histogram of the BLEU score distribution.
- Display high- and low-BLEU examples with model predictions and reference captions.
- Calculate cosine similarity scores between generated and reference captions (sketched below).
- Display a histogram of the cosine similarity distribution.
- Display high- and low-cosine-similarity examples with model predictions and reference captions.
- Compare model performance under BLEU and cosine similarity (a comparison sketch closes the section).
- Discuss strengths and weaknesses of each method.
- Display examples where BLEU and cosine similarity agree and where they diverge.
- Discuss the findings and implications.
- Summarize the project, highlighting key findings and insights.
- Discuss potential improvements and future work.
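
The sketches below flesh out the steps above. They are minimal illustrations under stated assumptions (PyTorch/torchvision for the model, NLTK and scikit-learn for the metrics), not the project's exact implementation, and the helper names in them are placeholders. First, the train/validation split; `annotations` stands in for the parsed COCO caption records.

```python
# Minimal sketch of the train/validation split. `annotations` is a
# hypothetical list of (image_id, caption) records parsed from COCO.
from sklearn.model_selection import train_test_split

def split_dataset(annotations, val_fraction=0.1, seed=42):
    # Split on unique image ids so all captions of an image stay on one side.
    image_ids = sorted({img_id for img_id, _ in annotations})
    train_ids, val_ids = train_test_split(
        image_ids, test_size=val_fraction, random_state=seed)
    train_ids, val_ids = set(train_ids), set(val_ids)
    train = [a for a in annotations if a[0] in train_ids]
    val = [a for a in annotations if a[0] in val_ids]
    return train, val
```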
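Feature extraction might look like the following, assuming a ResNet-50 from torchvision with its classification head removed; the backbone the project actually uses may differ.

```python
# Sketch: extract and cache per-image features with a pre-trained CNN.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ResNet-50 with the final classification layer removed, so the output
# is a 2048-d feature vector per image.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1]).to(device).eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(image_path: str) -> torch.Tensor:
    img = preprocess(Image.open(image_path).convert("RGB"))
    feats = encoder(img.unsqueeze(0).to(device))  # (1, 2048, 1, 1)
    return feats.flatten(1).squeeze(0).cpu()      # (2048,)

# features = {img_id: extract_features(path) for img_id, path in image_index}
# torch.save(features, "coco_features.pt")  # cached for the training step
```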
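A simple whitespace tokenizer with a frequency cutoff is one way to build the vocabulary; the `min_count` threshold and the special tokens here are illustrative choices.

```python
# Sketch: tokenize captions and build a word-to-index vocabulary.
from collections import Counter

SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]

def build_vocab(captions, min_count=5):
    # Count lowercased whitespace tokens across all training captions.
    counts = Counter()
    for cap in captions:
        counts.update(cap.lower().split())
    words = [w for w, c in counts.items() if c >= min_count]
    return {w: i for i, w in enumerate(SPECIALS + sorted(words))}

def encode(caption, word2idx):
    # Map a caption to <start> w1 ... wn <end>, with <unk> for rare words.
    unk = word2idx["<unk>"]
    tokens = [word2idx["<start>"]]
    tokens += [word2idx.get(w, unk) for w in caption.lower().split()]
    tokens.append(word2idx["<end>"])
    return tokens
```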
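One plausible decoder is an LSTM that receives the projected image feature as its first input step; the embedding and hidden sizes below are placeholders, and the conditioning scheme is an assumption.

```python
# Sketch of an LSTM decoder conditioned on a precomputed image feature.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, embed_dim)  # image feature -> first "token"
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # Prepend the projected image feature to the caption embeddings,
        # then predict the next token at every step.
        feat = self.feat_proj(features).unsqueeze(1)         # (B, 1, E)
        embeds = torch.cat([feat, self.embed(captions)], 1)  # (B, T+1, E)
        out, _ = self.lstm(embeds)
        return self.fc(out)                                  # (B, T+1, V)
```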
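Training with teacher forcing on the cached features could then look like this; the pad index, optimizer, and hyperparameters are assumptions, and `train_loader` is expected to yield (feature, padded caption) batches.

```python
# Sketch of the training loop over precomputed features.
import torch.nn as nn
import torch.optim as optim

PAD_IDX = 0  # index of <pad> in the vocabulary sketch above

def train(decoder, train_loader, num_epochs=10, lr=1e-3, device="cpu"):
    criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
    optimizer = optim.Adam(decoder.parameters(), lr=lr)
    decoder.to(device).train()
    for epoch in range(num_epochs):
        total = 0.0
        for features, captions in train_loader:
            features, captions = features.to(device), captions.to(device)
            # Teacher forcing: feed captions[:, :-1], predict captions[:, 1:]
            # (plus the <start> token at step 0, after the image feature).
            logits = decoder(features, captions[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: loss {total / len(train_loader):.4f}")
```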
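To produce captions for the qualitative displays and for both metrics, greedy decoding is the simplest option (beam search is a common alternative); `word2idx` and `idx2word` come from the vocabulary sketch above.

```python
# Sketch: greedy decoding with the decoder above, one token at a time.
import torch

@torch.no_grad()
def generate_caption(decoder, features, word2idx, idx2word, max_len=20):
    decoder.eval()
    end_idx = word2idx["<end>"]
    tokens = [word2idx["<start>"]]
    for _ in range(max_len):
        inp = torch.tensor([tokens])                  # (1, t)
        logits = decoder(features.unsqueeze(0), inp)  # (1, t+1, V)
        next_idx = int(logits[0, -1].argmax())        # most likely next word
        if next_idx == end_idx:
            break
        tokens.append(next_idx)
    return " ".join(idx2word[i] for i in tokens[1:])  # drop <start>
```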
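BLEU can be computed per image against all of that image's reference captions, with smoothing so short captions do not score exactly zero; `results` is a hypothetical mapping from image id to the generated caption and its references.

```python
# Sketch: per-image BLEU, test-set average, and score histogram.
import matplotlib.pyplot as plt
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoids zero scores on short captions

def bleu_report(results, bins=20):
    """results maps image id -> (generated caption, list of references)."""
    scores = {}
    for img_id, (hyp, refs) in results.items():
        refs_tok = [r.lower().split() for r in refs]
        scores[img_id] = sentence_bleu(refs_tok, hyp.lower().split(),
                                       smoothing_function=smooth)
    print("average BLEU:", sum(scores.values()) / len(scores))
    plt.hist(list(scores.values()), bins=bins)
    plt.xlabel("BLEU score")
    plt.ylabel("number of test images")
    plt.show()
    return scores
```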
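For cosine similarity the captions must first be vectorized; TF-IDF via scikit-learn is one reasonable choice, though the project could equally use sentence embeddings. Taking the maximum over the references rewards matching any one of them.

```python
# Sketch: cosine similarity between a generated caption and its references.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def caption_cosine(hyp: str, refs: list[str]) -> float:
    # Fit the vectorizer on the hypothesis plus its references so that
    # all captions share one vocabulary, then take the best match.
    vecs = TfidfVectorizer().fit_transform([hyp] + refs)
    sims = cosine_similarity(vecs[0], vecs[1:])
    return float(sims.max())

# caption_cosine("a dog runs on the beach",
#                ["a dog running along the shore", "two cats sleeping"])
```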
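Finally, putting the two metrics side by side: a scatter plot shows their overall agreement, and sorting by the per-image gap surfaces the examples where they agree most and diverge most. `bleu` and `cos` are the score dictionaries produced by the sketches above.

```python
# Sketch: compare the two metrics per image and pick illustrative cases.
import matplotlib.pyplot as plt

def compare_metrics(bleu, cos, k=5):
    ids = sorted(bleu)
    plt.scatter([bleu[i] for i in ids], [cos[i] for i in ids], s=8)
    plt.xlabel("BLEU score")
    plt.ylabel("cosine similarity")
    plt.show()
    # Images where the metrics disagree most are the interesting cases.
    by_gap = sorted(ids, key=lambda i: abs(bleu[i] - cos[i]))
    return by_gap[:k], by_gap[-k:]  # most agreeing, most divergent ids
```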