Arabic-Diacritization-Text

In this project, we take a deep learning approach to the problem of Arabic diacritization. Our first milestone is to reproduce the results of Fadel et al. [1], using an RNN model with bidirectional LSTM layers. Next, we will attempt to improve the model by tuning its hyperparameters.

Two measures are used to evaluate the accuracy of a diacritization system. The first is the Diacritic Error Rate (DER), the percentage of Arabic characters whose diacritics are misclassified, counting every character whether it carries 0, 1, or 2 diacritics. The second is the Word Error Rate (WER), the percentage of words that contain at least one misclassified Arabic character. We aim for a DER of at most 3% and a WER of at most 8%.
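The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation script used in [1]: the function names `split_diacritics` and `der_wer` are hypothetical, every non-diacritic character in a word is counted as a character (no filtering of punctuation or digits), and both texts are assumed to contain the same words and base letters.

```python
# Arabic diacritic code points U+064B..U+0652 (tanween, short vowels, shadda, sukun).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Split a word into [base_char, marks] pairs; each mark joins the preceding base."""
    pairs = []
    for ch in word:
        if ch not in DIACRITICS:
            pairs.append([ch, ""])
        elif pairs:
            pairs[-1][1] += ch
    return pairs


def der_wer(gold_text, pred_text):
    """Return (DER, WER) as percentages for two diacritized versions of one text."""
    gold_words, pred_words = gold_text.split(), pred_text.split()
    if len(gold_words) != len(pred_words):
        raise ValueError("texts must contain the same number of words")
    total_chars = wrong_chars = wrong_words = 0
    for gold_word, pred_word in zip(gold_words, pred_words):
        gold_pairs = split_diacritics(gold_word)
        pred_pairs = split_diacritics(pred_word)
        if [b for b, _ in gold_pairs] != [b for b, _ in pred_pairs]:
            raise ValueError("texts must share the same base characters")
        # Sort the marks so that shadda+vowel and vowel+shadda compare equal.
        errors = sum(
            sorted(g) != sorted(p)
            for (_, g), (_, p) in zip(gold_pairs, pred_pairs)
        )
        total_chars += len(gold_pairs)
        wrong_chars += errors
        wrong_words += errors > 0  # a word is wrong if any character is wrong
    if total_chars == 0:
        raise ValueError("texts must not be empty")
    return 100 * wrong_chars / total_chars, 100 * wrong_words / len(gold_words)


# Two words, 8 base characters in total; the prediction gets one diacritic wrong.
FATHA, DAMMA = "\u064E", "\u064F"
gold = (
    f"\u0643{FATHA}\u062A{FATHA}\u0628{FATHA} "
    f"\u0627\u0644\u0648{FATHA}\u0644{FATHA}\u062F{DAMMA}"
)
pred = gold[:-1] + FATHA
print(der_wer(gold, pred))  # (12.5, 50.0)
```

One wrong character out of 8 gives a DER of 12.5%, while one wrong word out of 2 gives a WER of 50%, which shows why WER is always the stricter of the two measures.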

Model

The model uses CuDNNLSTM layers wrapped as bidirectional LSTM layers. CuDNNLSTM is a fast LSTM implementation backed by cuDNN that runs on a GPU with the TensorFlow backend.

A bidirectional LSTM processes the input sequence in two directions, once from past to future and once from future to past, so that at every position the model can use information from both the past and the future.
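The two-direction idea can be shown without any deep learning library. The sketch below is only an illustration under stated assumptions: `run_direction`, `bidirectional`, and `running_sum` are hypothetical names, and a running sum stands in for the LSTM cell so that the result is easy to check by hand. In a real bidirectional LSTM, each direction is an LSTM with its own learned weights, and the two outputs at each position are typically concatenated.

```python
def run_direction(inputs, step, initial_state):
    """Apply a recurrent step function over inputs; return the state after each step."""
    states = []
    state = initial_state
    for x in inputs:
        state = step(state, x)
        states.append(state)
    return states


def bidirectional(inputs, forward_step, backward_step, initial_state):
    """Pair, for each position, the forward state with the backward state."""
    forward = run_direction(inputs, forward_step, initial_state)
    # Run over the reversed sequence, then flip back so positions line up.
    backward = run_direction(inputs[::-1], backward_step, initial_state)[::-1]
    return list(zip(forward, backward))


def running_sum(state, x):
    """Toy stand-in for an LSTM cell: the state is the sum of everything seen so far."""
    return state + x


outputs = bidirectional([1, 2, 3, 4], running_sum, running_sum, 0)
print(outputs)  # [(1, 10), (3, 9), (6, 7), (10, 4)]
```

At position 2 (value 3), the forward state 6 summarizes the inputs 1, 2, 3 and the backward state 7 summarizes 3, 4, so the pair carries context from both sides of the character being classified.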

References

[1] Fadel, Ali, Ibraheem Tuffaha, Bara' Al-Jawarneh, and M. Al-Ayyoub. “Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation.” WAT@EMNLP-IJCNLP, 2019.

