This is a Python library and command-line utility for Turkish deasciification, that is, diacritics restoration (also known as diacritics reconstruction). It takes a Turkish string containing only ASCII characters (that is, without proper diacritics) and replaces the relevant characters with their corresponding Turkish letters.
A web-based version of this system is available at:
http://turkceyap.appspot.com (I'm currently too busy to fix it; please use https://deasciifier.com/ instead.)
Keep in mind that diacritics restoration (deasciification) for Turkish doesn't work 100% of the time; it is an active research topic. Still, this library is good enough for many practical purposes and has served many people and projects over the last 15 years.
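To see why restoration cannot be perfect, it helps to look at the opposite direction, which is trivial but lossy. The following is a minimal sketch, not part of this library: the names `asciify`, `word_accuracy`, and `restore` are hypothetical and introduced here only for illustration. It strips Turkish diacritics and then measures how many words a given converter recovers exactly.

```python
# Illustrative sketch only; these helpers are NOT part of the library.

# Many-to-one mapping from Turkish-specific letters to ASCII look-alikes.
_ASCII_MAP = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def asciify(text: str) -> str:
    """Replace Turkish-specific letters with their ASCII counterparts."""
    return text.translate(_ASCII_MAP)


def word_accuracy(gold: str, restore) -> float:
    """Fraction of whitespace-separated words that `restore` recovers exactly.

    `gold` is correctly written Turkish text. `restore` is any callable
    (an assumption of this sketch) that maps an ASCII string to Turkish text.
    """
    gold_words = gold.split()
    restored_words = restore(asciify(gold)).split()
    if not gold_words or len(gold_words) != len(restored_words):
        raise ValueError("gold text is empty or word counts do not match")
    hits = sum(g == r for g, r in zip(gold_words, restored_words))
    return hits / len(gold_words)


# "açı" (angle) and "acı" (pain) collapse to the same ASCII string, so no
# converter can tell them apart without looking at the surrounding context.
print(asciify("açı"), asciify("acı"))  # prints: aci aci
```

Any converter can be passed as `restore`, for example a small function wrapping the `Deasciifier(...).convert_to_turkish()` call shown in the usage example in this README.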
This system is based on the turkish-mode for GNU Emacs by Prof. Deniz Yüret.
- Installation
- Example Python Library Usage
- Example CLI (Command Line Interface) Usage
- Other Programming Languages and Systems
- Advanced Research
Requires Python 3.11+. Works on all platforms (Linux, macOS Intel & Apple Silicon, Windows).
Install from the project root using pip (PEP 517 build):
pip install .
Or install directly from GitHub:
pip install git+https://github.com/emres/turkish-deasciifier.git
Editable install for development:
pip install -e .

from turkish.deasciifier import Deasciifier
my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
deasciifier = Deasciifier(my_ascii_turkish_txt)
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print(my_deasciified_turkish_txt)
Or use the package shorthand:
from turkish import Deasciifier

After installation, the turkish-deasciify command is available:
$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify
$ cat somefile.txt | turkish-deasciify
You can also run the module directly: python -m turkish
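If you want to call the command-line tool from another Python program instead of importing the library, the pipe usage above can be reproduced with `subprocess`. This is only a sketch under a few assumptions: the helper name `deasciify_via_cli` is hypothetical, and the tool is assumed to read UTF-8 text on standard input and write the result to standard output, as the shell examples suggest.

```python
import subprocess
from typing import Sequence


def deasciify_via_cli(
    text: str,
    command: Sequence[str] = ("turkish-deasciify",),
) -> str:
    """Pipe `text` to a stdin-to-stdout filter and return what it prints.

    The default `command` is the console script named in this README; it must
    be installed and on PATH. UTF-8 input and output are an assumption of this
    sketch, not documented behaviour.
    """
    result = subprocess.run(
        list(command),
        input=text,            # sent to the child's standard input
        capture_output=True,   # collect stdout and stderr
        encoding="utf-8",      # text mode with an explicit encoding
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `deasciify_via_cli("Opusmegi cagristiran catirtilar.")` mirrors the `echo ... | turkish-deasciify` example. Passing a different `command` makes the helper easy to exercise without the tool installed.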
Run the test suite with:
python -m unittest tests

Implementations for other programming languages and systems:
- Java: https://github.com/ahmetb/turkish-deasciifier-java
- Perl: https://metacpan.org/pod/release/BURAK/Lingua-TR-ASCII-0.13/lib/Lingua/TR/ASCII.pm
- Haskell: http://hackage.haskell.org/package/turkish-deasciifier
- Node.js: https://github.com/f/deasciifier/
- Vim: https://github.com/joom/turkish-deasciifier.vim
- Emacs Lisp: https://github.com/emres/turkish-mode (also available as a package in MELPA)
- Swift: https://github.com/armish/TurkishDeasciifier
For recent advanced scientific research articles, please see the following:
- The Deceptively Complex World of Turkish Diacritics: A Neural Network Journey
- Diacritic Restoration Using Recurrent Neural Network
- Diacritics Restoration Using Neural Networks
- Diacritic restoration of Turkish tweets with word2vec
- Vowel and Diacritic Restoration for Social Media Texts
  - Paper: https://www.aclweb.org/anthology/W14-1307/
  - Full text (PDF): https://www.aclweb.org/anthology/W14-1307.pdf
  - Web demo: http://tools.nlp.itu.edu.tr/Deasciifier