This repo will contain a list of useful resources for Mongolian NLP, as well as my own experiments, mostly with PyTorch.
DATASET: LJSpeech-like male voice TTS dataset created from the Mongolian Bible
- used in tugstugi/pytorch-dc-tts
- use dl_and_preprop_dataset.py to download the audio files
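
Since the dataset is described as LJSpeech-like, a loader for the LJSpeech layout is a reasonable starting point. The sketch below assumes the usual LJSpeech convention: a pipe-separated metadata file with rows of `file_id|transcript|normalized transcript` and audio stored as `<file_id>.wav`. Whether the Mongolian Bible dataset follows this layout exactly is an assumption, so check the files produced by the download script. The names `Utterance` and `read_ljspeech_metadata` and the sample row are made up for illustration.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class Utterance(NamedTuple):
    wav_path: Path
    text: str


def read_ljspeech_metadata(lines: Iterable[str], wav_dir: Path) -> Iterator[Utterance]:
    """Yield (wav path, transcript) pairs from LJSpeech-style metadata rows.

    Assumed row format: file_id|transcript|normalized transcript.
    """
    # QUOTE_NONE: transcripts may contain quote characters that are not CSV quoting.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        file_id = row[0]
        # Use the last column: the normalized transcript when present,
        # otherwise the raw transcript.
        text = row[-1]
        yield Utterance(wav_dir / f"{file_id}.wav", text)


# Made-up sample row, not taken from the real dataset.
sample = io.StringIO("sample_0001|сайн байна уу|сайн байна уу\n")
utterances = list(read_ljspeech_metadata(sample, Path("wavs")))
```

For a real file, pass an open file handle instead of the `StringIO` sample, for example `open(path, encoding="utf-8", newline="")`.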
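DATASET: Eduge news classification dataset
- used to train the Eduge.mn production news classifier
- 75K news articles in 9 categories: урлаг соёл (arts and culture), эдийн засаг (economy), эрүүл мэнд (health), хууль (law), улс төр (politics), спорт (sports), технологи (technology), боловсрол (education) and байгал орчин (environment)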
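
When training a classifier on these categories, mapping each label string to an integer index is usually the first step. A minimal sketch, assuming the label strings in the dataset files match the spellings listed above exactly (verify this against the actual data). `CATEGORIES`, `label_to_id` and `id_to_label` are illustrative names, and the index order is arbitrary.

```python
# Category names as listed above; the spelling in the real files may differ.
CATEGORIES = [
    "урлаг соёл",
    "эдийн засаг",
    "эрүүл мэнд",
    "хууль",
    "улс төр",
    "спорт",
    "технологи",
    "боловсрол",
    "байгал орчин",
]

# Forward and reverse lookups for use as classifier targets.
label_to_id = {label: idx for idx, label in enumerate(CATEGORIES)}
id_to_label = {idx: label for label, idx in label_to_id.items()}
```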
DATASET: 11-11.mn government agency complaint dataset
- 80K entries in 5 categories: санал хүсэлт (suggestion or request), гомдол (complaint), шүүмжлэл (criticism), талархал (appreciation) and өргөдөл (petition)
DATASET: online news corpus
- 700 million words
DEMO: HMM TTS online demo of the National University of Mongolia
- 1 male and 2 female voices
PYTORCH: tugstugi/pytorch-dc-tts
DEMO: Colab online demo
DATASET: LJSpeech-like male voice dataset created from the Mongolian Bible
TF: tugstugi/Tacotron-2, a fork of Rayhane-mamah/Tacotron-2 adapted for the Mongolian Bible dataset
DEMO: Colab online demo
PYTORCH: tugstugi/mongolian-speech-recognition
- single voice demo
DEMO: Cyrillic to Mongolian script converter demo of Inner Mongolia University
PYTORCH: tugstugi/bichig2cyrillic, a converter from Mongolian script to Cyrillic and back
PYTORCH: Mongolian script OCR (to be released)