coptic_char_generator

Repository for character-based generation of Coptic text, intended for the reconstruction of Coptic manuscripts.

Data Path

The data comes from the Coptic SCRIPTORIUM GitHub. The recommended setup is to clone the data repo onto your Desktop and unzip the sahidic.ot and sahidica.nt files; the default data path will then work without any updates or changes.

Command Line Arguments

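To run the project, use the command python main.py together with the arguments described in the sections that follow. The two masking arguments are required; the rest are optional.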
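
As a rough illustration of how the documented flags fit together, the sketch below builds an argparse parser with the same short and long option names listed in this README. It is not the project's actual main.py; the function name build_parser and the store_true behaviour of the optional switches are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags documented in this README (a sketch)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Required: how characters are chosen for masking.
    parser.add_argument("-m", "--masking", choices=["random", "smart"], required=True)
    # Required: mask once after loading, or re-mask in every training epoch.
    parser.add_argument("-ms", "--masking-strategy", choices=["once", "dynamic"], required=True)
    # Optional switches (assumed to be simple on/off flags).
    parser.add_argument("-sp", "--sentencepiece", action="store_true")
    parser.add_argument("-tr", "--train", action="store_true")
    parser.add_argument("-p", "--partion", action="store_true")  # spelled as in this README
    parser.add_argument("-e", "--eval", action="store_true")
    return parser


# Equivalent to: python main.py -m random -ms dynamic -tr
args = build_parser().parse_args(["-m", "random", "-ms", "dynamic", "-tr"])
print(args.masking, args.masking_strategy, args.train, args.eval)
```

With a parser like this, a training run with dynamic random masking would be invoked as python main.py -m random -ms dynamic -tr, and an evaluation-only run as python main.py -m smart -ms once -e.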

Masking

Mask Type (required)

Masking can be random (applied per character, according to a masking percentage) or smart (applied to sections of text, based on the distribution of the text).

Add -m <random, smart> or --masking <random, smart> to the command.
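
To make the difference between the two mask types concrete, here is a minimal sketch. It is one possible reading of the descriptions above, not the repository's implementation: the mask symbol "#", the function names, and the idea of drawing section lengths from a list of observed gap lengths are all assumptions.

```python
import random

MASK = "#"  # hypothetical mask symbol, chosen for this example only


def mask_random(text: str, mask_rate: float, rng: random.Random) -> str:
    """Mask each character independently with probability mask_rate."""
    return "".join(MASK if rng.random() < mask_rate else ch for ch in text)


def mask_spans(text: str, span_lengths: list[int], n_spans: int, rng: random.Random) -> str:
    """Mask n_spans contiguous sections whose lengths are sampled from span_lengths.

    span_lengths stands in for an empirical distribution of gap lengths
    (an assumption about what "smart" masking is based on).
    """
    chars = list(text)
    for _ in range(n_spans):
        length = min(rng.choice(span_lengths), len(chars))
        start = rng.randrange(len(chars) - length + 1)
        chars[start:start + length] = MASK * length  # same length, so positions are preserved
    return "".join(chars)


rng = random.Random(0)
sample = "ⲁⲩⲱ ⲡⲉϫⲁϥ ⲛⲁⲩ"
print(mask_random(sample, 0.3, rng))
print(mask_spans(sample, [1, 2, 2, 3, 5], n_spans=2, rng=rng))
```

Both functions keep the string length unchanged, so each masked position lines up with the character it hides in the original text.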

Masking Strategy (required)

Masking can be applied once (right after the data is read in) or dynamically (the data is re-masked in each training epoch).

Add -ms <once, dynamic> or --masking-strategy <once, dynamic> to the command.
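
The two strategies differ only in when the masking function is called. The sketch below shows that control flow under stated assumptions: the generator masked_epochs, the helper mask_random, and the "#" mask symbol are illustrative names, and the real training loop may be organised differently.

```python
import random

MASK = "#"  # hypothetical mask symbol


def mask_random(text: str, mask_rate: float, rng: random.Random) -> str:
    """Mask each character independently with probability mask_rate."""
    return "".join(MASK if rng.random() < mask_rate else ch for ch in text)


def masked_epochs(sentences, strategy, epochs, mask_rate=0.15, seed=0):
    """Yield (epoch, masked_sentences) for the 'once' or 'dynamic' strategy."""
    if strategy not in ("once", "dynamic"):
        raise ValueError(f"unknown masking strategy: {strategy!r}")
    rng = random.Random(seed)
    masked = None
    for epoch in range(epochs):
        # 'once': mask only before the first epoch and reuse the result.
        # 'dynamic': draw a fresh mask at the start of every epoch.
        if masked is None or strategy == "dynamic":
            masked = [mask_random(s, mask_rate, rng) for s in sentences]
        yield epoch, masked


data = ["ⲁⲩⲱ ⲡⲉϫⲁϥ ⲛⲁⲩ"]
for epoch, batch in masked_epochs(data, "dynamic", epochs=3, mask_rate=0.3):
    print(epoch, batch[0])
```

With "once" the model sees the same masked positions in every epoch; with "dynamic" it sees a new set of masked positions each time, at the cost of re-masking the data every epoch.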

SentencePiece (optional)

To train a SentencePiece model, add -sp or --sentencepiece to the command.

If you already have trained SentencePiece files named "coptic_sp.model" and "coptic_sp.vocab" and don't need to retrain, leave out the -sp flag.
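
The choice between retraining and reusing an existing model can be sketched as follows. The helper names, the default vocab_size, and the "missing" outcome are assumptions for illustration; only the file names coptic_sp.model and coptic_sp.vocab come from this README. The training call uses the sentencepiece Python package's SentencePieceTrainer.train, imported lazily so the file check runs without it installed.

```python
from pathlib import Path


def sp_files_exist(prefix: str = "coptic_sp", directory: str = ".") -> bool:
    """Return True if both <prefix>.model and <prefix>.vocab are present."""
    base = Path(directory)
    return all((base / f"{prefix}{ext}").is_file() for ext in (".model", ".vocab"))


def sp_action(sp_flag: bool, prefix: str = "coptic_sp", directory: str = ".") -> str:
    """Decide what to do: 'train' if -sp was given, else 'reuse' or 'missing'."""
    if sp_flag:
        return "train"
    return "reuse" if sp_files_exist(prefix, directory) else "missing"


def train_sp(corpus_path: str, prefix: str = "coptic_sp", vocab_size: int = 1000) -> None:
    """Train a SentencePiece model; writes <prefix>.model and <prefix>.vocab.

    vocab_size=1000 is a placeholder, not the project's setting.
    """
    import sentencepiece as spm  # third-party package, imported only when training

    spm.SentencePieceTrainer.train(
        input=corpus_path, model_prefix=prefix, vocab_size=vocab_size
    )


print(sp_action(sp_flag=False))
```

A caller would run train_sp only when sp_action returns "train", and could report a helpful error on "missing" instead of failing later when the model file is loaded.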

Model Training (optional)

To train the model, add -tr or --train to the command.

Partition (optional)

To create the data set partition, add -p or --partion to the command.

Evaluation (optional)

To evaluate on the lacuna test sets, add -e or --eval to the command.
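
This README does not state which metric the evaluation reports. As a hedged illustration only, the sketch below computes character accuracy over masked positions, a common way to score gap filling. The function name masked_accuracy and the "#" mask symbol are assumptions, not part of the project.

```python
def masked_accuracy(original: str, masked: str, predicted: str, mask: str = "#") -> float:
    """Fraction of masked positions where the prediction matches the original."""
    if not (len(original) == len(masked) == len(predicted)):
        raise ValueError("original, masked and predicted must have the same length")
    positions = [i for i, ch in enumerate(masked) if ch == mask]
    if not positions:
        return 0.0  # nothing was masked, so there is nothing to score
    correct = sum(original[i] == predicted[i] for i in positions)
    return correct / len(positions)


original = "ⲁⲩⲱ ⲡⲉϫⲁϥ"
masked = "ⲁ#ⲱ ⲡ#ϫⲁϥ"
predicted = "ⲁⲩⲱ ⲡⲁϫⲁϥ"
print(masked_accuracy(original, masked, predicted))  # 0.5: one of two gaps restored
```

Scoring only the masked positions keeps the unmasked context, which the model merely copies, from inflating the result.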

Demo

A demo for interacting with several of our models is available online here.

lauren-lizzy-levine/coptic_char_generator
