
On the Challenges in Evaluating Visually Grounded Stories – Text2Story workshop (ECIR 2025)


akskuchi/vwp-visual-storytelling


CC BY license Python PyTorch HuggingFace

👀 What?

This repository contains code for implementing and evaluating models on the Visual Storytelling task (using the VWP1 dataset), accompanying the paper:
On the Challenges in Evaluating Visually Grounded Stories, in the proceedings of the Text2Story workshop (ECIR 2025).

Note: Despite being proposed specifically for visual storytelling, this method is generalizable and can be extended to any task involving model-generated outputs with corresponding references.

🤔 Why?

The VWP dataset is constructed from movie scenes. Compared to the popular VIST dataset:

  • Visual sequences in VWP are well-connected and centered around recurring characters
  • Stories are longer with diverse entities

The recently proposed $d_{HM}$2 metric evaluates model-generated stories by measuring their closeness to human-written stories along three dimensions: Coherence, Visual grounding, and Repetition.

In this work, we use the $d_{HM}$ metric to compare several general-purpose foundation vision-language models (VLMs) with models trained specifically on the VWPv2.1 dataset. We discuss their performance, underline the challenges in evaluating visually grounded stories, and argue for considering additional dimensions important for automatic narrative generation.
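As a loose illustration of the idea of "closeness to human stories along three dimensions", the sketch below aggregates per-dimension gaps between a model-generated story and a human reference. The dimension names follow the text; the scoring scale and the averaging scheme are assumptions for illustration only, not the actual $d_{HM}$ definition (see footnote 2 for the metric paper):

```python
# Hypothetical sketch: aggregating per-dimension gaps between model-generated
# and human-written story scores. The mean-absolute-gap scheme below is
# illustrative only; the real d_HM formulation is in the cited paper.

def story_distance(model_scores: dict, human_scores: dict) -> float:
    """Mean absolute gap across the three dimensions named in the text."""
    dims = ("coherence", "visual_grounding", "repetition")
    return sum(abs(model_scores[d] - human_scores[d]) for d in dims) / len(dims)

# Made-up scores on a 0-1 scale (lower repetition is better here).
human = {"coherence": 0.80, "visual_grounding": 0.75, "repetition": 0.10}
model = {"coherence": 0.70, "visual_grounding": 0.60, "repetition": 0.25}

print(story_distance(model, human))  # smaller = closer to the human stories
```

Under this toy aggregation, a model is "good" when its per-dimension profile matches the human profile, which is the intuition the text describes.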

🤖 How?

To generate stories using VLMs:

```shell
pip install -r requirements.txt
python -u generate_stories.py --model qwen-vl
```

Run `python generate_stories.py --help` for more options.
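Conceptually, the command above prompts a VLM with an image sequence and asks for one grounded story. A minimal, hypothetical sketch of the prompt-assembly step is shown below; the prompt wording, the per-image placeholder token, and the function name are all assumptions for illustration, not the repository's actual `generate_stories.py` code:

```python
# Hypothetical sketch of the prompt-assembly step behind a script like
# generate_stories.py. Placeholder tokens and wording are assumptions;
# real VLMs (e.g. Qwen-VL) each use their own image-token conventions.

def build_story_prompt(num_images: int) -> str:
    """Assemble a visual-storytelling prompt with one placeholder per image."""
    image_tokens = "\n".join(f"<image {i + 1}>" for i in range(num_images))
    return (
        "Write a coherent five-sentence story grounded in this image sequence, "
        "keeping the recurring characters consistent:\n" + image_tokens
    )

print(build_story_prompt(3))
```

The model-specific processor would then replace each placeholder with the corresponding image features before generation.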

For training & generating stories using the TAPM (+LLAMA 2) model and for evaluating stories using the $d_{HM}$ metric, we followed the instructions in this repository.


🔗 If you find this work useful, please consider citing it:

@inproceedings{
}

Footnotes

  1. https://aclanthology.org/2023.tacl-1.33

  2. https://aclanthology.org/2024.findings-emnlp.679
