Commit c128a9a

Version 2.0.0: Rewrote the whole reshaper
1 parent 60b877d commit c128a9a

6 files changed

+1844
-343
lines changed

.gitignore

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
# Byte-compiled / optimized / DLL files
2+
__pycache__/
3+
*.py[cod]
4+
*$py.class
5+
6+
# C extensions
7+
*.so
8+
9+
# Distribution / packaging
10+
.Python
11+
env/
12+
build/
13+
develop-eggs/
14+
dist/
15+
downloads/
16+
eggs/
17+
.eggs/
18+
lib/
19+
lib64/
20+
parts/
21+
sdist/
22+
var/
23+
wheels/
24+
*.egg-info/
25+
.installed.cfg
26+
*.egg
27+
28+
# Installer logs
29+
pip-log.txt
30+
pip-delete-this-directory.txt
31+
32+
# Unit test / coverage reports
33+
htmlcov/
34+
.tox/
35+
.coverage
36+
.coverage.*
37+
.cache
38+
nosetests.xml
39+
coverage.xml
40+
*,cover
41+
.hypothesis/
42+
43+
# virtualenv
44+
venv/
45+
46+
# ignore
47+
.ignore/

README.md

Lines changed: 153 additions & 16 deletions
@@ -1,24 +1,37 @@
11
## Python Arabic Reshaper
2-
Reconstruct Arabic sentences to be used in applications that don't support Arabic
2+
Reconstruct Arabic sentences to be used in applications that don't support
3+
Arabic script.
34

4-
Based on [Better Arabic Reshaper](https://github.com/agawish/Better-Arabic-Reshaper/), ported to Python, tweaked a little bit.
5+
Works with Python 2.x and 3.x
56

6-
Arabic is a very special script language with two essential features:
7+
## Description
8+
9+
Arabic script is very special with two essential features:
710

811
1. It is written from right to left.
912
2. The characters change shape according to their surrounding characters.
1013

11-
So when you try to print Arabic text in an application – or a library – that doesn’t support Arabic you’re pretty likely to end up with something that looks like this:
14+
So when you try to print text written in Arabic script in an application
15+
– or a library – that doesn’t support Arabic you’re pretty likely to end up
16+
with something that looks like this:
1217

1318
![Arabic text written from left to right with no reshaping](http://mpcabd.xyz/wp-content/uploads/2012/05/arabic-1.png)
1419

15-
We have two problems here, first, the characters are in the isolated form, which means that every character is rendered regardless of its surroundings, and second is that the text is written from left to right.
20+
We have two problems here, first, the characters are in the isolated form,
21+
which means that every character is rendered regardless of its surroundings,
22+
and second is that the text is written from left to right.
1623

17-
To solve the latter issue all we have to do is to use the [Unicode bidirectional algorithm](http://unicode.org/reports/tr9/), which is implemented purely in Python in [python-bidi](https://github.com/MeirKriheli/python-bidi). If you use it you’ll end up with something that looks like this:
24+
To solve the latter issue all we have to do is to use the
25+
[Unicode bidirectional algorithm](http://unicode.org/reports/tr9/), which is
26+
implemented purely in Python in
27+
[python-bidi](https://github.com/MeirKriheli/python-bidi).
28+
If you use it you’ll end up with something that looks like this:
1829

1930
![Arabic text written from right to left with no reshaping](http://mpcabd.xyz/wp-content/uploads/2012/05/arabic-6.png)
2031

21-
The only issue left to solve is to reshape those characters and replace them with their correct shapes according to their surroundings. Using this library helps with the reshaping so we can get the proper result like this:
32+
The only issue left to solve is to reshape those characters and replace them
33+
with their correct shapes according to their surroundings. Using this library
34+
helps with the reshaping so we can get the proper result like this:
2235

2336
![Arabic text written from right to left with reshaping](http://mpcabd.xyz/wp-content/uploads/2012/05/arabic-3.png)
2437

@@ -30,26 +43,128 @@ The only issue left to solve is to reshape those characters and replace them wit
3043

3144
```
3245
import arabic_reshaper
46+
47+
text_to_be_reshaped = 'اللغة العربية رائعة'
48+
reshaped_text = arabic_reshaper.reshape(text_to_be_reshaped)
49+
50+
# At this stage the text is reshaped, all letters are in their correct form
51+
# based on their surroundings, but if you are going to print the text in a
52+
# left-to-right context, which usually happens in libraries/apps that do not
53+
# support Arabic and/or right-to-left text rendering, then you need to use
54+
# get_display from python-bidi.
55+
# Note that this is optional and depends on your usage of the reshaped text.
56+
3357
from bidi.algorithm import get_display
34-
35-
#...
36-
reshaped_text = arabic_reshaper.reshape(u'اللغة العربية رائعة')
3758
bidi_text = get_display(reshaped_text)
38-
pass_arabic_text_to_render(bidi_text)
39-
#...
59+
60+
# At this stage the text in bidi_text can be easily rendered in any library
61+
# that doesn't support Arabic and/or right-to-left, so use it as you'd use
62+
# any other string. For example if you're using PIL.ImageDraw.text to draw
63+
# text over an image you'd just use it like this...
64+
65+
image = Image.new('RGBA', base.size, (255,255,255,0))
66+
image_draw = ImageDraw.Draw(image)
67+
image_draw.text((10,10), bidi_text, fill=(255,255,255,128))
68+
69+
# See http://pillow.readthedocs.io/en/3.1.x/reference/ImageDraw.html?#PIL.ImageDraw.PIL.ImageDraw.Draw.text
70+
71+
```
72+
73+
## Settings
74+
75+
You can configure the reshaper to your preferences, it has the following
76+
settings defined:
77+
78+
* `language` (Default: `'Arabic'`): Currently it's ignored, but the reshaper
79+
might be extended in the future to work with other languages that use the
80+
Arabic script (Farsi, Urdu, etc.)
81+
* `support_ligatures` (Default: `True`): When this is set to `False`, the
82+
reshaper will not replace any ligatures, otherwise it will replace enabled
83+
ligatures.
84+
* `delete_harakat` (Default: `True`): When this is set to `False`, the reshaper
85+
will not delete the harakat from the text, so they will show up in the
86+
reshaped text. You should enable this option if you are going to pass
87+
the reshaped text to `bidi.algorithm.get_display`, because it will reverse the
88+
text and you'd end up with harakat applied to the next letter instead of the
89+
previous letter.
90+
91+
Besides the settings above, you can enable/disable supported ligatures. For a
92+
full list of supported ligatures and their default status check the file
93+
[default-config.ini](default-config.ini).
94+
95+
There are multiple ways to configure the reshaper; choose the one
96+
that suits your deployment:
97+
98+
### Via ArabicReshaper instance `configuration`
99+
100+
Instead of directly using `arabic_reshaper.reshape` function, define an
101+
instance of `arabic_reshaper.ArabicReshaper`, and pass your config dictionary
102+
to its constructor's `configuration` parameter like this:
103+
104+
```
105+
from arabic_reshaper import ArabicReshaper
106+
configuration = {
107+
'delete_harakat': False,
108+
'support_ligatures': True,
109+
'RIAL SIGN': True, # Replace ريال with ﷼
110+
}
111+
reshaper = ArabicReshaper(configuration=configuration)
112+
text_to_be_reshaped = 'سعر المنتج ١٥٠ ريال'
113+
reshaped_text = reshaper.reshape(text_to_be_reshaped)
40114
```
41115

42-
The `pass_arabic_text_to_render` function here is an imaginary function, it is just here to say that the variable `bidi_text` is the variable that you would need to use in your code afterwards, for example to print it in PDF, or to write it in an Image, etc.
116+
### Via ArabicReshaper instance `configuration_file`
43117

44-
For more info visit my blog [post here](http://mpcabd.xyz/python-arabic-text-reshaper/)
118+
You can separate the configuration from your code by copying the file
119+
[default-config.ini](default-config.ini) and changing its settings,
120+
then save it somewhere in your project, and then you can tell the reshaper
121+
to use your new config file, just pass the path to your config file to its
122+
constructor's `configuration_file` parameter like this:
123+
124+
```
125+
from arabic_reshaper import ArabicReshaper
126+
configuration = {
127+
'delete_harakat': False,
128+
'support_ligatures': True,
129+
'RIAL SIGN': True, # Replace ريال with ﷼
130+
}
131+
reshaper = ArabicReshaper(configuration_file='/path/to/your/config.ini')
132+
text_to_be_reshaped = 'سعر المنتج ١٥٠ ريال'
133+
reshaped_text = reshaper.reshape(text_to_be_reshaped)
134+
```
135+
136+
Where in your `config.ini` you can have something like this:
137+
138+
```
139+
[ArabicReshaper]
140+
delete_harakat = no
141+
support_ligatures = yes
142+
RIAL SIGN = yes
143+
```
144+
145+
### Via `PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE` environment variable
146+
147+
Instead of having to rewrite your old code to configure it like above, you can
148+
define an environment variable with the name
149+
`PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE` and in its value put the full path
150+
to the configuration file. This way the reshape function will pick it up
151+
automatically, and you won't have to change your old code.
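The `configuration_file` and environment-variable options described in the added README lines can be pictured with a small standard-library sketch. It is not the library's own loading code: `load_settings` is a hypothetical helper, and only the section name `[ArabicReshaper]`, the three keys, and the variable name `PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE` come from the README. It uses the Python 3 `configparser` module.

```python
import configparser
import os
import tempfile

# Name taken from the README text; everything else here is illustrative.
ENV_NAME = "PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE"


def load_settings(path=None):
    """Hypothetical helper: read the [ArabicReshaper] section of an ini file.

    Falls back to the path stored in the environment variable when no
    explicit path is given, and returns an empty dict if nothing is found.
    """
    path = path or os.environ.get(ENV_NAME)
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case so 'RIAL SIGN' is not lowercased
    if path:
        parser.read(path, encoding="utf-8")
    if not parser.has_section("ArabicReshaper"):
        return {}
    section = parser["ArabicReshaper"]
    settings = {}
    for key in section:
        try:
            settings[key] = section.getboolean(key)  # accepts yes/no, true/false, on/off, 1/0
        except ValueError:
            settings[key] = section[key]  # keep non-boolean values as plain strings
    return settings


# Write the sample ini from the README to a temporary file and load it
# through the environment variable, without passing a path explicitly.
sample = (
    "[ArabicReshaper]\n"
    "delete_harakat = no\n"
    "support_ligatures = yes\n"
    "RIAL SIGN = yes\n"
)
with tempfile.TemporaryDirectory() as folder:
    config_path = os.path.join(folder, "config.ini")
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    os.environ[ENV_NAME] = config_path
    settings = load_settings()
    del os.environ[ENV_NAME]

print(settings)
```

The `yes`/`no` values in the sample file map to Python booleans, which matches the dictionary form shown in the `configuration` example.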
45152

46153
## Known Issue
47154

48-
[Harakat or Tashkeel](http://en.wikipedia.org/wiki/Arabic_diacritics#Tashkil_.28marks_used_as_phonetic_guides.29) are not supported, and I think that they can't be supported as their unicode characters are non-spacing marks (i.e. they don't take space, they are rendered in the same space of the character before them), which means that when used in a reshaper, they will be rendered on the next character as the text is reversed.
155+
When using a library/app that doesn't support right-to-left text rendering,
156+
[Harakat or Tashkeel](http://en.wikipedia.org/wiki/Arabic_diacritics#Tashkil_.28marks_used_as_phonetic_guides.29)
157+
cannot be supported, because their unicode characters are non-spacing marks
158+
(i.e. they don't take space, they are rendered in the same space of the
159+
character before them), which means that when you keep them and pass the
160+
reshaped text to `bidi.algorithm.get_display`, they will end up being rendered
161+
on the next character not the character they should be on as the text is
162+
reversed.
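The Known Issue paragraph can be illustrated with a short sketch. It uses plain string reversal as a stand-in for the reordering step, which is an assumption and not the actual algorithm in python-bidi; `mark_owners` is a hypothetical helper that reports which letter a renderer would attach each non-spacing mark to if it always picks the preceding character.

```python
import unicodedata


def mark_owners(text):
    """Pair each combining mark with the nearest preceding non-mark character."""
    owners = []
    base = None
    for ch in text:
        if unicodedata.combining(ch):  # non-zero combining class: a mark such as FATHA
            owner = unicodedata.name(base) if base is not None else None
            owners.append((unicodedata.name(ch), owner))
        else:
            base = ch
    return owners


# BEH + FATHA + TEH in logical (typing) order: the fatha belongs to BEH.
logical = "\u0628\u064E\u062A"

# Naive stand-in for reordering: reverse the code points one by one.
reversed_text = logical[::-1]

before = mark_owners(logical)
after = mark_owners(reversed_text)

print(before)
print(after)
```

In logical order the fatha follows BEH, but after the reversal it follows TEH, so a renderer that attaches marks to the preceding character puts it on the wrong letter. This is the situation `delete_harakat` avoids by dropping the marks before reordering.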
49163

50164
## License
51165

52-
This work is licensed under [GNU General Public License v3](http://www.gnu.org/licenses/gpl.txt).
166+
This work is licensed under
167+
[GNU General Public License v3](http://www.gnu.org/licenses/gpl.txt).
53168

54169
## Demo
55170

@@ -59,8 +174,30 @@ Online Arabic Reshaper: http://pydj.mpczbd.xyz/arabic-reshaper/
59174

60175
https://github.com/mpcabd/python-arabic-reshaper/tarball/master
61176

177+
## Version History
178+
179+
### 2.0.0
180+
181+
* Totally rewrote the code;
182+
* Faster and better performance;
183+
* Added the ability to configure and customise the reshaper.
184+
185+
### 1.0.1
186+
187+
* New glyphs for Farsi;
188+
* Added setup.py;
189+
* Bugfixes.
190+
191+
### 1.0
192+
193+
* Ported [Better Arabic Reshaper](https://github.com/agawish/Better-Arabic-Reshaper/)
194+
to Python.
195+
62196
## Contact
63197

64198
Abdullah Diab (mpcabd)
65199
66200
Blog: http://mpcabd.xyz
201+
202+
For more info visit my blog
203+
[post here](http://mpcabd.xyz/python-arabic-text-reshaper/)

__init__.py

Lines changed: 1 addition & 4 deletions
@@ -1,4 +1 @@
1-
from .arabic_reshaper import reshape
2-
3-
__all__ = [reshape]
4-
1+
from .arabic_reshaper import reshape, default_reshaper, ArabicReshaper
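A side note on the removed `__all__ = [reshape]` line: it listed the function object rather than its name, and Python expects `__all__` to hold strings. The sketch below reproduces that with a throwaway module, since the real package is not imported here; `fake_reshaper_pkg` and `star_import_error` are hypothetical names used only for the demonstration.

```python
import sys
import types


def reshape(text):
    """Placeholder standing in for the real reshape function."""
    return text


def star_import_error(all_value):
    """Return the TypeError raised by 'from module import *', or None if it succeeds."""
    module = types.ModuleType("fake_reshaper_pkg")
    module.reshape = reshape
    module.__all__ = all_value
    sys.modules[module.__name__] = module
    try:
        exec("from fake_reshaper_pkg import *", {})
    except TypeError as error:
        return error
    finally:
        del sys.modules[module.__name__]
    return None


broken = star_import_error([reshape])     # function object, as in the removed line
working = star_import_error(["reshape"])  # name as a string

print(type(broken).__name__, working)
```

Dropping `__all__` altogether, as this commit does, also avoids the problem: a star import then exposes every public name the module imports.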
