Extension of a group project for the DL course at IIT KGP.
Make sure to set the dataset path correctly in the cell where the dataset is loaded, then run the cells in the order they appear. You will need to fine-tune the model yourself, since the file in which the weights are stored is large (~1 GB). I have also provided two Excel files containing the captions generated by two models.
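As a rough illustration of the dataset path step, the check below could go in the first cell so that a wrong path fails early with a clear message. The names `DATASET_DIR` and `resolve_dataset_dir`, and the example path itself, are hypothetical and not taken from the notebook; adjust them to match your setup.

```python
from pathlib import Path

# Hypothetical location: replace with wherever the dataset lives on your machine.
DATASET_DIR = Path("data/dataset")


def resolve_dataset_dir(path):
    """Return the absolute dataset path, or raise a clear error if it is missing."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {resolved}")
    return resolved
```

Calling `resolve_dataset_dir(DATASET_DIR)` before the loading cell raises `FileNotFoundError` right away if the path is wrong, instead of failing somewhere later in the run.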
I am currently trying to improve my custom model using the BLIP-2 architecture. The custom model is currently a ViT encoder with a GPT decoder; I plan to add a Q-Former between them to improve its performance.
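To give an idea of what a Q-Former does between the encoder and the decoder, here is a minimal PyTorch sketch of the core idea: a fixed set of learnable query vectors cross-attends to the ViT patch features and produces a short, fixed-length sequence for the decoder. This is a simplified single block under my own assumptions, not the actual BLIP-2 implementation (which stacks several BERT-style layers and has its own pre-training objectives). The class name `QueryBridge` and all default sizes are hypothetical.

```python
import torch
import torch.nn as nn


class QueryBridge(nn.Module):
    """Simplified Q-Former-style block (illustrative, not the BLIP-2 code)."""

    def __init__(self, num_queries=32, dim=768, num_heads=8):
        super().__init__()
        # Learnable queries, shared across all images in a batch.
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, image_feats):
        # image_feats: (batch, num_patches, dim) from the vision encoder.
        q = self.queries.expand(image_feats.size(0), -1, -1)
        # Queries attend to each other.
        q = self.norm1(q + self.self_attn(q, q, q, need_weights=False)[0])
        # Queries attend to the image features (keys and values).
        q = self.norm2(
            q + self.cross_attn(q, image_feats, image_feats, need_weights=False)[0]
        )
        # Position-wise feed-forward layer.
        q = self.norm3(q + self.ffn(q))
        return q  # (batch, num_queries, dim)
```

For example, passing a tensor of shape `(2, 197, 768)` through `QueryBridge()` returns a tensor of shape `(2, 32, 768)`, so the decoder always sees the same number of visual tokens regardless of how many patches the encoder produces.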