detypstify/typic

A fully client-side web app that uses OCR to generate Typst code from images of math formulas.

Getting started

Using the model

The model is hosted here.

Installation

Obtaining data

We use oxen to version-control our data. To get the oxen executable, run nix develop. Then, from the root of this repository, clone the oxen data repository:

oxen clone https://hub.oxen.ai/DiracDelta/data

The datasets we use for this project will now be available in data/.

Training the model

Detypstify uses a custom dataset, generated by transpiling the im2latex-230k dataset with pandoc and cleaning the resulting data (see scraper/). The final dataset is available on Kaggle.
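
To make the transpiling step concrete, here is a minimal Python sketch of converting one LaTeX formula to Typst by calling pandoc. The exact invocation used in scraper/ is not described here, so the flags, the `$...$` wrapping, and the helper names `wrap_inline_math`, `pandoc_command`, and `latex_to_typst` are all assumptions for illustration. It also assumes a pandoc version that ships the Typst writer.

```python
import subprocess


def wrap_inline_math(formula: str) -> str:
    """Wrap a bare LaTeX formula in $...$ so pandoc parses it as math.

    Assumption: the input formulas come without math delimiters.
    """
    return f"${formula.strip()}$"


def pandoc_command() -> list[str]:
    """Build a pandoc command that reads LaTeX and writes Typst.

    Assumption: this is one plausible invocation, not necessarily the one
    the project uses.
    """
    return ["pandoc", "--from", "latex", "--to", "typst"]


def latex_to_typst(formula: str) -> str:
    """Pipe one formula through pandoc and return the Typst output.

    Raises subprocess.CalledProcessError if pandoc exits with an error,
    for example when it cannot parse the formula.
    """
    result = subprocess.run(
        pandoc_command(),
        input=wrap_inline_math(formula),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `latex_to_typst(r"\frac{a}{b}")` would return pandoc's Typst rendering of the fraction. Formulas that pandoc rejects raise an exception, which is one place a cleaning pass could filter them out.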

  1. Download the dataset and unzip it
  2. Run poetry run train_val_split to perform a train validation split
  3. Generate formulas.txt by running scripts/mk_formulas_txt.sh on the train and val directories
  4. Install pix2tex
    1. Follow its instructions to generate the tokenizer, train.pkl, and val.pkl
    2. Create a config.yaml based on the template
    3. Train the model with python -m pix2tex.train --config config.yaml
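
For readers unfamiliar with step 2, the following Python sketch shows what a train/validation split does in general. It is not the project's train_val_split script. The function name `split_train_val`, the 10% validation fraction, and the fixed seed are hypothetical choices made for illustration.

```python
import random


def split_train_val(items, val_fraction=0.1, seed=0):
    """Split items into (train, val) lists using a seeded shuffle.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")

    # Sort first so the result does not depend on the input order
    # (for example, the order in which a directory listing is returned).
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)

    # Hold out at least one item for validation when there is any data.
    n_val = max(1, round(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Hypothetical file names standing in for the unzipped dataset images.
    names = [f"{i}.png" for i in range(100)]
    train, val = split_train_val(names)
    print(len(train), len(val))
```

Fixing the seed keeps the split reproducible, so the train and val directories passed to scripts/mk_formulas_txt.sh stay the same across runs.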
