RSICD multimodal image captioning

Remote sensing image captioning project for UNSW COMP9444.

Original Repo

Model	BLEU-1	BLEU-2	BLEU-3	BLEU-4	METEOR	ROUGE_L	CIDEr	SPICE
VLAD + RNN	0.493	0.3091	0.2209	0.1677	0.1996	0.4242	1.0392	-
VIT + GPT2	0.5832	0.3456	0.2118	0.1371	0.3413	0.3306	0.3846	0.2124
BLIP1-Base	0.6809	0.5071	0.3865	0.3027	0.2579	0.4794	0.5864	0.2671
BLIP1-Large	0.7387	0.5773	0.4584	0.3719	0.2999	0.5397	0.8822	0.2917
SkyEye GPT	0.8773	0.777	0.689	0.6199	0.3623	0.6354	0.8937	-
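
The README does not say which tooling produced the scores above. As a rough illustration of what the BLEU-1 to BLEU-4 columns measure, the sketch below computes cumulative corpus-level BLEU-n from clipped n-gram precision and a brevity penalty. The function names `ngrams` and `corpus_bleu` are hypothetical, the input is assumed to be pre-tokenized, and standard evaluation toolkits may differ in tokenization and smoothing, so this is not expected to reproduce the table exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Return cumulative BLEU-1..BLEU-max_n (illustrative sketch, no smoothing).

    candidates: list of token lists, one generated caption per image.
    references: list of lists of token lists, several references per image.
    """
    matched = [0] * max_n
    total = [0] * max_n
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Use the reference length closest to the candidate (ties: shorter).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # Clip each n-gram count by its maximum count in any reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())

    # Brevity penalty: penalize candidates shorter than the references.
    if cand_len > ref_len:
        bp = 1.0
    else:
        bp = math.exp(1 - ref_len / max(cand_len, 1))

    scores = []
    log_sum = 0.0
    for n in range(max_n):
        if matched[n] == 0:
            # Without smoothing, a zero precision zeroes all higher orders.
            scores.extend([0.0] * (max_n - n))
            break
        log_sum += math.log(matched[n] / total[n])
        # Geometric mean of the precisions up to order n + 1.
        scores.append(bp * math.exp(log_sum / (n + 1)))
    return scores


if __name__ == "__main__":
    # Made-up example caption, not taken from the RSICD dataset.
    cand = ["many planes are parked near a terminal".split()]
    refs = [["many planes are parked next to a terminal".split()]]
    print(corpus_bleu(cand, refs))
```

Because the scores are cumulative geometric means, each column can only stay level or drop as n grows, which matches the pattern in every row of the table.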
