Implementation of and experimentation with image captioning models based on the Show, Attend and Tell paper. 🏞️

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

A PyTorch implementation of the model, built upon AaronCCWong's implementation.

For a trained model to load into the decoder, use one of the checkpoints from the W&B runs referenced under To Generate Captions below.

Some training statistics are available via the W&B link under To Train.
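
The heart of the model is soft visual attention: at every decoding step, the decoder scores each spatial location of the CNN feature map and forms a weighted context vector that conditions the next word. The snippet below is a minimal, illustrative sketch of that mechanism (additive attention in the style of Bahdanau et al.), not the code used in this repository; the dimension choices are assumptions.

import torch
import torch.nn as nn


class SoftAttention(nn.Module):
    """Illustrative additive attention over CNN feature-map locations.

    The dimensions are assumptions for this sketch, not the repository's
    actual hyperparameters.
    """

    def __init__(self, feature_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feature_proj = nn.Linear(feature_dim, attn_dim)  # project image features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)    # project decoder state
        self.score = nn.Linear(attn_dim, 1)                   # scalar score per location

    def forward(self, features: torch.Tensor, hidden: torch.Tensor):
        # features: (batch, num_locations, feature_dim), e.g. a 14x14 feature map
        # hidden:   (batch, hidden_dim), the decoder LSTM state
        scores = self.score(torch.tanh(
            self.feature_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                        # (batch, num_locations)
        alpha = torch.softmax(scores, dim=1)                  # attention weights
        context = (features * alpha.unsqueeze(-1)).sum(dim=1) # (batch, feature_dim)
        return context, alpha


if __name__ == '__main__':
    attn = SoftAttention(feature_dim=512, hidden_dim=512, attn_dim=256)
    feats = torch.randn(2, 196, 512)   # e.g. a 14x14x512 VGG-19 feature map
    h = torch.randn(2, 512)
    context, alpha = attn(feats, h)
    print(context.shape, alpha.shape)  # torch.Size([2, 512]) torch.Size([2, 196])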

TODO

Setup before training

Follow the instructions for the dataset you choose to work with.

First, download Karpathy's data splits here.

Flickr8k

Download the Flickr8k images from here. Put the images in data/flickr8k/imgs/. Place the Flickr8k data split JSON file in data/flickr8k/. It should be named dataset.json.

Run python generate_json_data.py --split-path='data/flickr8k/dataset.json' --data-path='data/flickr8k' to generate the JSON files needed for training.
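
For reference, dataset.json follows Karpathy's standard split format: an "images" list in which every entry carries its split assignment, filename, and tokenized captions. The sketch below only assumes that format and is not part of the repository's scripts; it shows how the captions could be grouped per split.

import json
from collections import defaultdict

# Minimal sketch of reading Karpathy's split file; assumes the standard
# format with an "images" list containing "split", "filename" and "sentences".
with open('data/flickr8k/dataset.json') as f:
    dataset = json.load(f)

captions_by_split = defaultdict(list)
for image in dataset['images']:
    for sentence in image['sentences']:
        captions_by_split[image['split']].append(
            {'filename': image['filename'], 'tokens': sentence['tokens']}
        )

print({split: len(items) for split, items in captions_by_split.items()})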

If you want to use pre-trained BERT embeddings (bert=True), additionally run python generate_json_data_bert.py --split-path='data/flickr8k/dataset.json' --data-path='data/flickr8k' to generate the BERT-tokenized caption JSON files.
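
The BERT variant stores WordPiece token ids instead of plain vocabulary indices. As a hedged illustration of that tokenization step (the actual generate_json_data_bert.py may differ in detail), a caption can be encoded with the Hugging Face bert-base-uncased tokenizer like this:

from transformers import BertTokenizer

# Illustrative BERT tokenization of a single caption; the repository's
# generate_json_data_bert.py may handle details differently.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
caption = 'a dog runs through the grass'
encoded = tokenizer(caption, add_special_tokens=True)
print(encoded['input_ids'])                                   # WordPiece token ids
print(tokenizer.convert_ids_to_tokens(encoded['input_ids']))  # [CLS] ... [SEP]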

COCO

Download the COCO dataset training and validation images. Put them in data/coco/imgs/train2014 and data/coco/imgs/val2014 respectively. Put the COCO dataset split JSON file from Karpathy in data/coco/. It should be named dataset.json.

Run python generate_json_data.py --split-path='data/coco/dataset.json' --data-path='data/coco' to generate the JSON files needed for training.

To Train

Start the training by running:

python train.py --data=data/flickr8k

or to make a small test run:

python train.py --data=data/flickr8k --tf --ado --attention --epochs=1 --frac=0.02 --log-interval=2

Models are saved in model/ and training statistics are uploaded to your W&B account.

My training statistics are available here: W&B
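
Conceptually, the W&B logging amounts to initializing a run and logging scalars at each log interval. A minimal sketch, with project and metric names that are assumptions rather than what train.py actually uses:

import wandb

# Minimal W&B logging sketch; project, config, and metric names are assumptions.
wandb.init(project='show-attend-and-tell', config={'epochs': 1, 'frac': 0.02})
for step in range(10):
    wandb.log({'train/loss': 1.0 / (step + 1)}, step=step)
wandb.finish()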

To Generate Captions

Note that a model_config.json is saved together with the model parameters; generate_caption.py requires it to load the model properly.

python generate_caption.py --img-path <PATH_TO_IMG> --model <PATH_TO_MODEL_PARAMETERS>

An example:

python generate_caption.py --img-path data/flickr8k/imgs/667626_18933d713e.jpg --model model/model_vgg19_5.pth
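
For orientation, loading a checkpoint comes down to reading model_config.json to rebuild the decoder with matching hyperparameters and then loading the saved weights. The sketch below only inspects the two files; the real loading logic lives in generate_caption.py, and the config keys and checkpoint layout are not assumed here.

import json
import torch

# Inspect a saved checkpoint and its model_config.json side by side.
# Paths are the example paths from above. The exact config keys, and whether
# the checkpoint is a bare state_dict or a wrapping dict, depend on train.py
# and are not assumed by this sketch.
with open('model/model_config.json') as f:
    config = json.load(f)
print(config)                                # decoder hyperparameters

checkpoint = torch.load('model/model_vgg19_5.pth', map_location='cpu')
print(list(checkpoint)[:5])                  # first few keys / parameter names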

You also have the option to generate captions based on models saved on W&B:

python generate_caption.py --img-path data/flickr8k/imgs/667626_18933d713e.jpg --wandb-run yvokeller/show-attend-and-tell/0v6sxo6t --wandb-model model/model_vgg19_1.pth
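
When a W&B run path is given, the model file can also be pulled down explicitly with the W&B public API before loading it locally. generate_caption.py may use a different mechanism internally, but a minimal sketch using the run and file from the example above looks like this:

import wandb

# Download a model file from a W&B run via the public API.
api = wandb.Api()
run = api.run('yvokeller/show-attend-and-tell/0v6sxo6t')
run.file('model/model_vgg19_1.pth').download(replace=True)  # saved locally under model/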

Captioned Examples

Correctly Captioned Images

TODO

Incorrectly Captioned Images

TODO

References

Show, Attend and Tell

Original Theano Implementation

Neural Machine Translation by Jointly Learning to Align and Translate

Karpathy's data splits
