Replies: 1 comment
After tremendous effort debugging and learning Paddle, here is the part of the code that might cause an issue when loading the labels:
Question
How does PaddleOCR load labels during recognition training? Does it compare the whole word with the image directly, or does it compare character by character? My hypothesis is that PaddleOCR compares the label character by character with the image from left to right, as suggested by the code here https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/tools/program.py#L593C1-L593, but could someone please confirm this?
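To make the character-by-character hypothesis concrete, here is a minimal sketch of how a recognition label could be turned into a sequence of indices. The names `build_char_index` and `encode_label` are hypothetical and are not taken from PaddleOCR; the sketch only assumes a character dictionary and a blank id of 0.

```python
def build_char_index(charset):
    """Map each character to an integer id; id 0 is assumed to be a blank token."""
    return {ch: i + 1 for i, ch in enumerate(charset)}


def encode_label(text, char_to_id):
    """Encode a label one character at a time, in the order stored in the string."""
    return [char_to_id[ch] for ch in text if ch in char_to_id]


# Hypothetical three-character dictionary, for illustration only.
charset = ["a", "b", "c"]
char_to_id = build_char_index(charset)
encoded = encode_label("cab", char_to_id)
print(encoded)
```

If labels are encoded like this, the order of the ids follows the order of the characters in the Python string, so the question becomes which order that is for Arabic text.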
Issue
I am asking this because I noticed the native behavior of Python
The result would be
What happens is that the word ;هخحت#, as displayed here on an English speaker's screen, should actually be displayed as

for an Arabic reader. Try copying and pasting it into Google Search: Google will recognize it as Arabic and display it right to left, which is the correct way.
This means that if PaddleOCR loads labels.txt character by character, then for Arabic, Python iterates over the string in its stored (logical) order, which corresponds to right to left on screen, just as a native Arabic speaker would read it.
So if PaddleOCR doesn't change the order in which it reads the label for Arabic, then, correct me if I am wrong, it will try to match the characters against the image in reverse order, which is wrong.
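The iteration order can be checked with the standard library alone. The sketch below builds the four Arabic letters from the example with escape codes, so that no editor or terminal reorders them, and prints each character in the order Python stores it. The variable names are just for illustration.

```python
import unicodedata

# heh, khah, hah, teh in logical (reading) order
word = "\u0647\u062E\u062D\u062A"

names = [unicodedata.name(ch) for ch in word]
bidi_classes = [unicodedata.bidirectional(ch) for ch in word]

for ch, name, cls in zip(word, names, bidi_classes):
    # "AL" is the Unicode bidirectional class for Arabic letters
    print(f"U+{ord(ch):04X} {name} bidi={cls}")
```

`word[0]` is the first letter a reader reads, which a bidi-aware display draws at the right edge. Python itself does no reordering; the right-to-left layout comes from the renderer.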
By the way, to make it even more complicated, whenever there is punctuation mixed with Arabic (as in the case above), simply reversing the string does not work, because the punctuation will move to the opposite end or get flipped. For example, if you copy and paste (هخحت# into Google Search from right to left, it gets flipped and becomes

for an Arabic reader.
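The punctuation behavior can also be inspected with `unicodedata`. As a rough sketch, the snippet below lists the bidirectional class and the mirrored flag of every character in the second example, and shows what a naive reversal does to the string. It only illustrates the Unicode properties involved and makes no claim about how PaddleOCR handles them.

```python
import unicodedata

# "(" + heh, khah, hah, teh + "#" in the order stored in the string
label = "(\u0647\u062E\u062D\u062A#"

info = [
    (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.mirrored(ch))
    for ch in label
]
for code_point, bidi_class, mirrored in info:
    print(code_point, bidi_class, mirrored)

# Naive reversal moves the punctuation to the other end but keeps the same
# code points, so "(" is not swapped for ")".
naive_reversed = label[::-1]
print([f"U+{ord(ch):04X}" for ch in naive_reversed])
```

The parenthesis has the mirrored flag set, so a renderer may draw it flipped in a right-to-left run, while `#` and the letters are not mirrored. A plain `[::-1]` does not account for either effect.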