Hello,
Where is the data for the Tatar language coming from?
I see a lot of garbage in it; I could barely find any Tatar words.
I would like to improve this.
- Do you have a page with guidance on how to train the model?
- Once I train it, should I create a PR with just the model itself to that repo? Where do you store the raw data for training?
Follow-up to tesseract-ocr/langdata#305