Hey everybody! I've been trying to use PaddleOCR for Hungarian text detection and recognition. However, the model struggles with the accented characters of the language, for example á, é, ő, ö, ú. Sometimes it recognizes them correctly; other times it outputs ä instead, or a plain a without the accent.

I've looked at several parts of the implementation. Hungarian is supported, but the dictionary used during inference with the pretrained model is a Latin dictionary, which also contains those "other" characters, such as ä (a with two dots on top). I am using the quickstart code example from PaddleOCR.

I would like to ask for some advice on this. I've tried preprocessing the image, for example removing the background (this solved the problem for one specific example but decreased performance on other inputs), along with a few other preprocessing ideas. I've also thought of fine-tuning a model on a Hungarian dataset, but that would be a last resort for me, since it would be rather time-consuming.

Any advice is welcome. Thank you, and have a nice day!
Replies: 1 comment
Here are a few suggestions that might help improve the performance:
Custom Dictionary: Modify the dictionary file used by the model so that it includes all Hungarian characters. This way, the model treats these characters as valid outputs, which increases the likelihood of correct recognition. Keep in mind that the dictionary has to match the one the recognition model was trained with, so changing it usually goes together with fine-tuning.
Data Augmentation: Generate synthetic training data that includes Hungarian text with the accented characters. Fine-tuning on such data can help the model learn to recognize these characters better without the need for a full retraining.
Preprocessing: Since preprocessing worked for some cases, you could create a preprocessing pipeline that adapts based on the input image characteristics. For example, apply different preprocessing t…
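For the custom dictionary suggestion, it may help to first check what the dictionary in use actually contains. The snippet below is only a rough sketch in plain Python, not part of PaddleOCR: the function name `audit_dictionary` and the constant `HUNGARIAN_ACCENTED` are made up for illustration, and it assumes the dictionary is a UTF-8 text file with one character per line.

```python
import unicodedata

# Accented letters of the Hungarian alphabet, lower and upper case.
# (Illustrative constant, not something PaddleOCR provides.)
HUNGARIAN_ACCENTED = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"


def audit_dictionary(dict_lines):
    """Compare a one-character-per-line dictionary with the Hungarian letters.

    Returns a tuple (missing, confusable):
      missing    - Hungarian accented letters that are not in the dictionary
      confusable - accented dictionary entries that share a base letter with
                   a Hungarian accented letter but are not Hungarian ('ä')
    """
    # Normalize to NFC so precomposed and decomposed forms compare equal.
    entries = {unicodedata.normalize("NFC", line.rstrip("\r\n")) for line in dict_lines}
    entries.discard("")
    hungarian = set(unicodedata.normalize("NFC", HUNGARIAN_ACCENTED))

    def base(ch):
        # NFD splits 'á' into 'a' + combining accent; keep the base letter.
        return unicodedata.normalize("NFD", ch)[0]

    hungarian_bases = {base(ch) for ch in hungarian}
    missing = sorted(hungarian - entries)
    confusable = sorted(
        e for e in entries
        if len(e) == 1
        and e not in hungarian
        and base(e) != e              # skip plain letters such as 'a'
        and base(e) in hungarian_bases
    )
    return missing, confusable
```

To run it on a real dictionary, open the file with `encoding="utf-8"` and pass the file object to `audit_dictionary`. The `confusable` list shows which look-alike characters (such as ä) compete with the Hungarian ones. Removing them from the dictionary alone is not expected to fix a pretrained model, since its output classes are tied to the dictionary it was trained with.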