A Comparative Evaluation of Transformer-Based Vision Encoder-Decoder Models for Brazilian Portuguese Image Captioning

Gabriel Bromonschenkel, Hilário Oliveira, Thiago M. Paixão

SIBGRAPI 2024


Our resources are openly available; see our publicly available collection.

🔧 To set up the environment, use:

$ chmod +x setup.sh
$ ./setup.sh
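
If you prefer a manual setup, the dependencies can also be installed directly from the provided requirements.txt (e.g., pip install -r requirements.txt).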

βš™οΈ To run the complete train and evaluate, use:

$ python train.py

βš™οΈ To run only the evaluation, use:

$ python eval.py
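
For illustration, here is a minimal sketch of what captioning with a trained checkpoint involves, using the Hugging Face VisionEncoderDecoderModel API. The checkpoint path and image file below are hypothetical placeholders, not the repository's actual code:

# Minimal inference sketch (illustrative; paths are hypothetical).
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

checkpoint = "models/deit-base-224_gpt2-small"  # hypothetical output of train.py
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Encode one image and generate a Portuguese caption.
pixel_values = image_processor(Image.open("example.jpg"), return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=25, num_beams=1)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))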

🔧 Don't forget to set up the training/model attributes in config.yml. An example (a sketch of how such a config can be consumed follows the listing):

config:
  encoder: "deit-base-224"
  decoder: "gpt2-small"
  dataset: "pracegover_63k"
  max_length: 25
  batch_size: 16
  evaluate_from_model: False
  turn_off_computer: False

generate_args:
  num_beams: 1
  no_repeat_ngram_size: 0
  early_stopping: False

training_args:
  predict_with_generate: True
  num_train_epochs: 1
  eval_steps: 200
  logging_steps: 200
  per_device_train_batch_size: 16
  per_device_eval_batch_size: 16
  learning_rate: 5.6e-5
  weight_decay: 0.01
  save_total_limit: 1
  logging_strategy: "epoch"
  evaluation_strategy: "epoch"
  save_strategy: "epoch"
  load_best_model_at_end: True
  metric_for_best_model: "rougeL"
  greater_is_better: True
  push_to_hub: False
  fp16: True

callbacks:
  early_stopping:
    patience: 1
    threshold: 0.0

encoder: # available options to select in config.encoder at the top of this file
  vit-base-224: "google/vit-base-patch16-224"
  vit-base-224-21k: "google/vit-base-patch16-224-in21k"
  vit-base-384: "google/vit-base-patch16-384"
  vit-large-384: "google/vit-large-patch16-384"
  vit-huge-224-21k: "google/vit-huge-patch14-224-in21k"
  swin-base-224: "microsoft/swin-base-patch4-window7-224"
  swin-base-224-22k: "microsoft/swin-base-patch4-window7-224-in22k"
  swin-base-384: "microsoft/swin-base-patch4-window12-384"
  swin-large-384-22k: "microsoft/swin-large-patch4-window12-384-in22k"
  beit-base-224: "microsoft/beit-base-patch16-224"
  beit-base-224-22k: "microsoft/beit-base-patch16-224-pt22k"
  beit-large-224-22k: "microsoft/beit-large-patch16-224-pt22k-ft22k"
  beit-large-512: "microsoft/beit-large-patch16-512"
  beit-large-640: "microsoft/beit-large-finetuned-ade-640-640"
  deit-base-224: "facebook/deit-base-patch16-224"
  deit-base-distil-224: "facebook/deit-base-distilled-patch16-224"
  deit-base-384: "facebook/deit-base-patch16-384"
  deit-base-distil-384: "facebook/deit-base-distilled-patch16-384"

decoder: # available options to select in config.decoder at the top of this file
  bert-base: "neuralmind/bert-base-portuguese-cased"
  bert-large: "neuralmind/bert-large-portuguese-cased"
  roberta-small: "josu/roberta-pt-br"
  distilbert-base: "adalbertojunior/distilbert-portuguese-cased"
  gpt2-small: "pierreguillou/gpt2-small-portuguese"
  bart-base: "adalbertojunior/bart-base-portuguese"
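
The shorthand names in the encoder and decoder sections above map to Hugging Face checkpoint identifiers. As a minimal sketch of how such a config could be consumed (illustrative, not the repository's actual loading code; assumes PyYAML and transformers are installed):

import yaml
from transformers import Seq2SeqTrainingArguments, VisionEncoderDecoderModel

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

# Resolve the shorthand names chosen in config to checkpoint identifiers,
# e.g. "deit-base-224" -> "facebook/deit-base-patch16-224".
encoder_ckpt = cfg["encoder"][cfg["config"]["encoder"]]
decoder_ckpt = cfg["decoder"][cfg["config"]["decoder"]]

# Tie a pretrained vision encoder to a pretrained Portuguese language decoder.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encoder_ckpt, decoder_ckpt)

# The training_args section maps directly onto Seq2SeqTrainingArguments.
training_args = Seq2SeqTrainingArguments(output_dir="models", **cfg["training_args"])

Note that from_encoder_decoder_pretrained initializes the cross-attention weights randomly, so the combined model must be fine-tuned (here, on #PraCegoVer) before it can caption.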

πŸ—ƒοΈ Directory structure:

├── README.md          <- The top-level README for developers using this project.
├── data
│   └── pracegover_63k     <- The #PraCegoVer 63k dataset.
│       ├── test.hf        <- Test split.
│       ├── train.hf       <- Training split.
│       └── validation.hf  <- Validation split.
│
├── docs               <- A default HTML page for the docs.
│
├── models             <- Trained models and their artifacts are saved here.
│
├── requirements.txt   <- The requirements file for reproducing the training and evaluation pipelines.
│
└── src                <- Source code for use in this project.
    ├── __init__.py    <- Makes src a Python module.
    │
    ├── utils          <- Modules for configuration, split processing, and evaluation metrics.
    │   ├── config.py
    │   ├── data_processing.py
    │   └── metrics.py
    │
    ├── eval.py
    └── train.py

📋 BibTeX entry and citation info

@inproceedings{bromonschenkel2024comparative,
  title={A Comparative Evaluation of Transformer-Based Vision Encoder-Decoder Models for Brazilian Portuguese Image Captioning},
  author={Bromonschenkel, Gabriel and Oliveira, Hil{\'a}rio and Paix{\~a}o, Thiago M},
  booktitle={2024 37th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)},
  pages={1--6},
  year={2024},
  organization={IEEE}
}
