How to fine-tune PP-OCRv5 for the Japanese language? #15690
I'm interested in fine-tuning PaddleOCR for the Japanese language. I know that Japanese is officially supported, and that PP-OCRv3 has a dedicated Japanese recognition model. However, I noticed that PP-OCRv5 has no Japanese-specific recognition model. Is this because PP-OCRv5 uses a multilingual recognition model? If so, what exactly does the lang parameter do in PP-OCRv5? Additionally, if I want to fine-tune PP-OCRv5 for Japanese, should I just use the PP-OCRv5_server_rec.yml configuration along with PP-OCRv5_server_rec_pretrained.pdparams? Or is there a better way to fine-tune the model for improved Japanese text recognition?
Replies: 1 comment 3 replies
Yes. In real-world scenarios, multiple languages are often mixed together, so PP-OCRv5 integrates multiple languages into a single recognition model. It currently supports Simplified Chinese, Traditional Chinese, English, and Japanese, and more languages will be added to training in the future. As for the lang parameter in PP-OCRv5: specifying lang="japan" defaults to the PP-OCRv5 server model. If you wish to use the Japanese model from PP-OCRv3 instead, you also need to specify ocr_version="PP-OCRv3".
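The two settings mentioned in this reply can be sketched roughly as follows. This is a minimal illustration, not the maintainers' original snippet: it assumes a recent paddleocr package whose PaddleOCR constructor accepts lang and ocr_version, and the helper names build_ocr_kwargs, create_ocr, and demo are hypothetical, introduced only for this example.

```python
def build_ocr_kwargs(lang="japan", ocr_version=None):
    """Collect keyword arguments for the PaddleOCR constructor.

    Hypothetical helper: it only assembles the two options named in the
    reply (lang and ocr_version) and leaves everything else at defaults.
    """
    kwargs = {"lang": lang}
    if ocr_version is not None:
        # Only pass ocr_version when a specific model generation is wanted.
        kwargs["ocr_version"] = ocr_version
    return kwargs


def create_ocr(**kwargs):
    """Instantiate PaddleOCR with the given keyword arguments."""
    # Imported lazily so build_ocr_kwargs can be used without paddleocr installed.
    from paddleocr import PaddleOCR

    return PaddleOCR(**kwargs)


def demo():
    """Create both variants; model weights may be downloaded on first use."""
    # Per the reply, lang="japan" alone falls back to the PP-OCRv5 server model.
    ocr_v5 = create_ocr(**build_ocr_kwargs(lang="japan"))
    # Adding ocr_version="PP-OCRv3" requests the dedicated Japanese PP-OCRv3 model.
    ocr_v3 = create_ocr(**build_ocr_kwargs(lang="japan", ocr_version="PP-OCRv3"))
    return ocr_v5, ocr_v3
```

Calling demo() in an environment with paddleocr installed should yield one pipeline per configuration. Which weights are actually loaded depends on the installed PaddleOCR version, so it is worth checking the startup log rather than relying on this sketch.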