Navigate to the root directory of this repository, then run
git submodule init
git submodule update
Please use the forked version of yoyodyne at https://github.com/michaelpginn/yoyodyne.
- You will need to use Python 3.10. You can use pyenv to manage versions.
- Create a virtual environment (as some of the dependencies are nonstandard) using venv.
- Navigate to the cloned repo and run
pip install .
- Navigate back to the americasnlp2024 repo (ensuring you're still using the right environment and Python version) and run scripts as needed.
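
Before running any scripts, it can help to confirm that the active interpreter matches the setup above. The following is a minimal sketch, not part of this repository: `check_environment` is a hypothetical helper that checks for Python 3.10 and for an active virtual environment (detected by comparing `sys.prefix` with `sys.base_prefix`).

```python
import sys

REQUIRED = (3, 10)  # Python version named in the setup steps


def check_environment(version=None, prefix=None, base_prefix=None):
    """Return a list of problems; an empty list means the environment looks right.

    The arguments default to the running interpreter's values and exist
    only so the check can be exercised with other values.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    problems = []
    if version != REQUIRED:
        problems.append(
            f"expected Python {REQUIRED[0]}.{REQUIRED[1]}, "
            f"found {version[0]}.{version[1]}"
        )
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    if prefix == base_prefix:
        problems.append("no virtual environment appears to be active")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the file directly prints one warning per problem and prints nothing when both checks pass.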
Training scripts are in scripts/
- train.sh is used to train using the shared task data across all the model architectures supported by yoyodyne
- train_aug.sh is used to train using an additional augmented dataset
- conjoin_dataframes.py is a helper script that joins two dataframes. You can use it to combine multiple augmented datasets.
- copy_preds.py is a helper script used to produce the final predictions tsv file.
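
To illustrate what combining augmented datasets can look like, here is a rough sketch that uses only the standard library. It is not the implementation of conjoin_dataframes.py, whose arguments and behavior are not described here. It assumes that "joining" means row-wise concatenation of tab-separated files without header rows, and it drops exact duplicate rows, which is a design choice of this sketch. The function names are hypothetical.

```python
def read_rows(path):
    """Read a headerless tab-separated file into a list of tuples, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                rows.append(tuple(line.split("\t")))
    return rows


def conjoin(paths, out_path):
    """Concatenate the rows of several files in order, dropping exact duplicates.

    Returns the number of rows written to out_path.
    """
    seen = set()
    combined = []
    for path in paths:
        for row in read_rows(path):
            if row not in seen:  # keep only the first occurrence of each row
                seen.add(row)
                combined.append(row)
    with open(out_path, "w", encoding="utf-8") as handle:
        for row in combined:
            handle.write("\t".join(row) + "\n")
    return len(combined)
```

Keeping the first occurrence preserves the order of the input files, so the output is deterministic for a given list of paths.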
Any augmented data files should be added to data/augmented/{language}-{strategy}.tsv. The file should not include column headers and should follow the columns Source, Target, Change.
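
The file layout described above can be checked before training. This is a minimal sketch with hypothetical helper names (`augmented_path`, `validate_augmented`) that are not part of the repository. It assumes fields are separated by tabs, as the .tsv extension suggests, and it flags a first line that looks like a Source/Target/Change header row.

```python
from pathlib import Path

COLUMNS = ("Source", "Target", "Change")  # column order stated in the text


def augmented_path(language, strategy, root="data/augmented"):
    """Build the expected location of an augmented data file."""
    return Path(root) / f"{language}-{strategy}.tsv"


def validate_augmented(path):
    """Return a list of problems found in a headerless three-column TSV file."""
    expected_header = tuple(name.lower() for name in COLUMNS)
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            # A first line that spells out the column names is a header row.
            if number == 1 and tuple(f.strip().lower() for f in fields) == expected_header:
                problems.append("line 1 looks like a header row; the file should not include one")
                continue
            if len(fields) != len(COLUMNS):
                problems.append(
                    f"line {number}: expected {len(COLUMNS)} tab-separated fields, "
                    f"found {len(fields)}"
                )
    return problems
```

For example, `validate_augmented(augmented_path("xx", "demo"))` would check a file for a placeholder language code `xx` and a placeholder strategy name `demo`; an empty list means no problems were found.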