
malamatenia/learnable-handwriter


An Interpretable Deep Learning Approach for Morphological Script Type Analysis (IWCP 2024)

https://learnable-handwriter.github.io/

Malamatenia Vlachou Efstathiou, Ioannis Siglidis, Dominique Stutzmann and Mathieu Aubry

[Figure: LTW_graph.png]

  • For training without having to install, we provide a standalone Colab notebook.

  • For minimal inference on pre-trained and finetuned models without having to install, we provide a standalone Colab notebook.

  • A figures.ipynb notebook is provided to reproduce the paper's results and graphs. You'll need to download and extract datasets.zip and runs.zip into the base folder first, or run it directly in Colab.

Getting Started

Install

Note

macOS is not supported due to compatibility issues with the available PyTorch version (affine transforms are not fully implemented or optimized in the macOS build). We recommend running the code on a Linux system (locally or on a server) with CUDA support. For training, the use of a GPU is strongly advised.

After cloning the repository and entering the base folder:

  1. Create a conda environment:
    conda create --name lhr python=3.10
    conda activate lhr
  2. Install PyTorch, choosing the build that matches your CUDA version.
  3. Install the remaining dependencies with pip:
    python -m pip install -r requirements.txt
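After installing, a quick sanity check can confirm the environment is usable. This is a minimal sketch: the package names below are assumptions based on a typical requirements.txt for a PyTorch project, not a definitive list.

```python
import importlib.util

def available(name: str) -> bool:
    """Return True if the top-level package can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

# Assumed package names; adjust to match the actual requirements.txt.
for pkg in ("torch", "torchvision", "numpy"):
    print(f"{pkg}: {'found' if available(pkg) else 'MISSING'}")
```

`find_spec` only looks the package up without importing it, so the check stays fast and cannot fail with an import error.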

Run it from scratch on our dataset

Train

In this case you only need to download and extract datasets.zip.

Train our reference model with:

python scripts/train.py iwcp_south_north.yaml 

Finetune

1. Finetune the Northern and Southern Textualis script models with:

python scripts/finetune_scripts.py -i runs/iwcp_south_north/train/ -o runs/iwcp_south_north/finetune/ --mode g_theta --max_steps 2500 --invert_sprites --script Northern_Textualis Southern_Textualis -a datasets/iwcp_south_north/annotation.json -d datasets/iwcp_south_north/ --split train

2. Finetune per-document models with:

python scripts/finetune_docs.py -i runs/iwcp_south_north/train/ -o runs/iwcp_south_north/finetune/ --mode g_theta --max_steps 2500 --invert_sprites -a datasets/iwcp_south_north/annotation.json -d datasets/iwcp_south_north/ --split all

Run it on your data

Create your config files:

1. Create a config file for the dataset:

configs/dataset/<DATASET_ID>.yaml
...

DATASET-TAG:                 
  path: <DATASET-NAME>/      
  sep: ''                    # How the character separator is denoted in the annotation. 
  space: ' '                 # How the space is denoted in the annotation.

2. Then create a second config file setting the hyperparameters:

configs/<DATASET_ID>.yaml
...

For its structure, see the config file provided for our experiment.

Create your dataset folder:

3. Create the dataset folder:

datasets/<DATASET-NAME>
├── annotation.json
└── images
    ├── <image_id>.png
    └── ...

The annotation.json file should be a dictionary with entries of the form:

    "<image_id>": {
        "split": "train",                            # {"train", "val", "test"} - "val" is ignored in the unsupervised case.
        "label": "A beautiful calico cat.",          # The text that corresponds to this line.
        "script": "Times_New_Roman"                  # (optional) The script type of the image.
    },

You can completely ignore the annotation.json file in the case of unsupervised training without evaluation.
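As a sketch, the annotation file can also be generated programmatically. The image ids, labels, and script name below are illustrative placeholders only, not part of the released dataset.

```python
import json

# Illustrative entries: ids, labels, and script names are placeholders.
annotation = {
    "page1_line01": {
        "split": "train",
        "label": "A beautiful calico cat.",
        "script": "Times_New_Roman",   # optional field
    },
    "page1_line02": {
        "split": "test",
        "label": "Another line of text.",
    },
}

# Write it where the dataset folder expects it, e.g. datasets/<DATASET-NAME>/.
with open("annotation.json", "w", encoding="utf-8") as f:
    json.dump(annotation, f, ensure_ascii=False, indent=4)
```

`ensure_ascii=False` keeps non-ASCII characters (common in medieval transcriptions) readable in the file instead of escaping them.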

Train and finetune

4. Train with

   python scripts/train.py <CONFIG_NAME>.yaml

5. Finetune

  • On a group of documents defined by their "script" type with:
python scripts/finetune_scripts.py -i runs/<MODEL_PATH> -o <OUTPUT_PATH> --mode g_theta --max_steps <int> --invert_sprites --script '<SCRIPT_NAME>' -a <DATASET_PATH>/annotation.json -d <DATASET_PATH> --split <train or all>
  • On individual documents with:
python scripts/finetune_docs.py -i runs/<MODEL_PATH> -o <OUTPUT_PATH> --mode g_theta --max_steps <int> --invert_sprites -a <DATASET_PATH>/annotation.json -d <DATASET_PATH> --split <train or all>

Note

To ensure a consistent set of characters regardless of the annotation source, our analysis internally uses choco-mufin with a disambiguation-table.csv that normalizes or excludes characters from the annotations. The current configuration suppresses allographs and editorial signs (e.g., modern punctuation) to yield a graphetic result.
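The idea behind the disambiguation table can be sketched as follows. Note this is not choco-mufin's actual API, and the table rows are illustrative: each row maps a source character to a replacement, and an empty replacement drops the character from the annotation.

```python
import csv
import io

# Illustrative table: map long s (ſ) to s, and drop the modern period.
TABLE_CSV = "char,replacement\nſ,s\n.,\n"

def load_table(text: str) -> dict:
    """Parse a two-column disambiguation table into a char -> replacement dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["char"]: row["replacement"] for row in reader}

def normalize(label: str, table: dict) -> str:
    """Apply the table character by character; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in label)

table = load_table(TABLE_CSV)
print(normalize("cauſa.", table))  # prints "causa"
```

In the real pipeline the table lives in disambiguation-table.csv and is applied to every label in annotation.json before training and evaluation.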

Cite us

@misc{vlachou2024interpretable,
    title = {An Interpretable Deep Learning Approach for Morphological Script Type Analysis},
    author = {Vlachou-Efstathiou, Malamatenia and Siglidis, Ioannis and Stutzmann, Dominique and Aubry, Mathieu},
    publisher = {Document Analysis and Recognition--ICDAR 2021 Workshops: Athens, Greece, August 30--September 4, 2023, Proceedings},
    year = {2024},
    organization={Springer}, 
    url={https://arxiv.org/abs/2408.11150}}

See also: Siglidis, I., Gonthier, N., Gaubil, J., Monnier, T., & Aubry, M. (2023). The Learnable Typewriter: A Generative Approach to Text Analysis.

Acknowledgements

This study was supported by the CNRS through MITI and the 80|Prime program (CrEMe, Caractérisation des écritures médiévales), and by the European Research Council (ERC project DISCOVER, number 101076028). We thank Ségolène Albouy, Raphaël Baena, Sonat Baltacı, Syrine Kalleli, and Elliot Vincent for valuable feedback on the paper.
