This repository contains code for implementing models and evaluating them on the Visual Storytelling task (using the VWP dataset), as described in:
On the Challenges in Evaluating Visually Grounded Stories, in Proceedings of the Text2Story workshop (ECIR 2025).
Note: Despite being proposed specifically for visual storytelling, this method is generalizable and can be extended to any task involving model-generated outputs with corresponding references.
The VWP dataset is constructed from movie scenes. Compared to the popular VIST dataset:
- Visual sequences in VWP are well-connected and centered around recurring characters
- Stories are longer with diverse entities
In this work, we use v2.1 of the dataset. We discuss the performance of several story-generation models, underline the challenges in evaluating visually grounded stories, and argue for considering more dimensions important for automatic narrative generation.
To generate stories using VLMs, run:

```shell
pip install -r requirements.txt
python -u generate_stories.py --model qwen-vl
```

Run `python generate_stories.py --help` for more options.
For training & generating stories using the TAPM (+LLAMA 2) model and for evaluating stories using the
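As a rough illustration of the reference-based evaluation setting discussed above (this is a minimal sketch, not the repository's evaluation code, which relies on standard metrics such as BLEU, METEOR, or CIDEr), the following computes a clipped n-gram precision between a generated story and a human reference. The function name and example strings are hypothetical.

```python
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision of a generated story against one reference.

    Illustrative only: real visual-storytelling evaluations use dedicated
    metric libraries rather than this toy function.
    """
    def ngrams(tokens, n):
        # Count each n-gram occurring in the token sequence.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not hyp:
        return 0.0
    # Clip each hypothesis n-gram count by its count in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / sum(hyp.values())

generated = "the two friends walked into the old house"
reference = "two friends slowly walked into an old house"
print(round(ngram_precision(generated, reference), 3))  # → 0.75
```

A score like this captures surface overlap only, which is precisely why the paper argues for additional evaluation dimensions for visually grounded stories.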
🔗 If you find this work useful, please consider citing it:
```bibtex
@inproceedings{
}
```