Consistently getting loss = nanxxx and accuracy 0.000000, even after running 1500 epochs on an Arabic dataset #12858
Replies: 28 comments
-
@WenmuZhou @dyning @LDOUBLEV @tink2123 @MissPenguin @Topdu @Evezerest @littletomatodonkey @andyjpaddle @weisy11 @D-DanielYang @Topdu @sdcb @ZeyuChen @bingooo please help! I've been stuck on PaddleOCR for the past two weeks, and I'd be grateful if someone could help me resolve this.
-
Which version of PaddleOCR are you using? I don't mean the version installed in your venv; I mean the version of the code you downloaded as a zip for training.
-
Train Data:
The validation directory uses the same format.
Please let me know if you need more information.
-
Train using this command, and set distributed to false in the config file.
-
Dear @hritikakolkar,
After that, I tested an image using the pretrained model downloaded from the PaddleOCR GitHub repo, and it recognized the text in the image accurately.
ls /usr/lib | grep lib
I have prepared my Arabic-language dataset and set its path in the YML file to train PaddleOCR on my custom Arabic dataset. Here is the YML file:
And here is the output I get; the loss is still nanxxx:
I'm a bit confused and wonder whether I'm doing something wrong. I've looked through several PaddleOCR GitHub issues, and most of them are about accuracy staying at 0. In those threads, the PaddleOCR contributors suggest that training for more epochs resolves the problem. The discussions share a common pattern: the loss starts decreasing from the very first epoch, and none of them report a loss of "nanxxx". In my case, the loss is nanxxx from the very first epoch. I would appreciate any help or advice from anyone who can shed light on this.
-
Use this version of PaddleOCR instead of cloning the current one: https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6.1. I also ran into some errors with the current version. Download the link I provided as a zip.
-
This time it should work. Since you are finetuning, give the model about 4 hours of training before you expect to see results.
-
Thank you so much for the response. I'll try it and update you soon.
-
@hritikakolkar I downloaded and unzipped PaddleOCR 2.6.1 and followed the same steps. I'm not finetuning right now, just training, but I get the same issue when I finetune that model. Lastly, I'm confused about the …, but I started training PaddleOCR, and here are the results so far (the issue is not resolved yet). Let's wait a few more epochs, and then I'll let you know how it goes.
-
If you change the dictionary file, then I think the last three layers change, so accuracy will start from 0.00; if you don't change it, accuracy shouldn't start at 0. One more thing: when finetuning a model, use a piecewise learning-rate schedule instead of cosine. Please refer to these docs (I know you are training from scratch; this is just so you know for finetuning): https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/finetune_en.md
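For reference, here is a minimal sketch of what the two schedule shapes look like, written as generic per-epoch formulas in plain Python. It is not PaddleOCR's scheduler implementation, and the boundary epoch and learning-rate values in the example are hypothetical; in PaddleOCR the schedule is selected in the YML config (under `Optimizer.lr`, as far as I know).

```python
import math


def piecewise_lr(epoch: int, boundaries: list[int], values: list[float]) -> float:
    """Return values[i] while epoch < boundaries[i]; values[-1] after the last boundary."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    for boundary, value in zip(boundaries, values):
        if epoch < boundary:
            return value
    return values[-1]


def cosine_lr(epoch: int, base_lr: float, total_epochs: int) -> float:
    """Cosine decay from base_lr at epoch 0 to 0 at total_epochs (no warmup)."""
    progress = min(epoch, total_epochs) / total_epochs
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical values: 1500 epochs, LR dropped 10x at epoch 700 in the piecewise case.
    for epoch in (0, 699, 700, 1499):
        print(epoch, piecewise_lr(epoch, [700], [1e-4, 1e-5]), cosine_lr(epoch, 1e-4, 1500))
```

The piecewise schedule holds each value constant until the next boundary, while the cosine schedule starts decaying immediately and reaches 0 at the last epoch.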
-
@hritikakolkar, I'm using PaddleOCR 2.6.1. Here is the output from an epoch that is still running; nothing has changed. I need your guidance and help with this.
What should I do to solve the problem?
-
I don't know what the problem is. You'll have to dig deeper and check whether the images are being read and whether the labels are being read (the labels might be the problem). Beyond that, I can't help.
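One way to act on this is a quick sanity check of the recognition label file before training. The sketch below assumes the usual PaddleOCR recognition label format of one `image_path<TAB>label` pair per line, image paths relative to a data directory, and a dictionary file with one character per line; `max_text_length` mirrors `Global.max_text_length` (assumed to be 25 here). The function and class names are hypothetical, not part of PaddleOCR.

```python
import os
from dataclasses import dataclass, field


@dataclass
class LabelReport:
    """Problems found in a recognition label file (hypothetical helper)."""
    total: int = 0
    malformed: list[int] = field(default_factory=list)       # lines without exactly one tab
    missing_images: list[int] = field(default_factory=list)  # image file not found
    empty_labels: list[int] = field(default_factory=list)
    too_long: list[int] = field(default_factory=list)        # label longer than max_text_length
    unknown_chars: set[str] = field(default_factory=set)     # characters absent from the dictionary


def load_dict(dict_path: str) -> set[str]:
    """Read a character dictionary with one character per line."""
    with open(dict_path, encoding="utf-8") as f:
        return {line.rstrip("\r\n") for line in f if line.rstrip("\r\n")}


def check_label_file(label_path: str, data_dir: str, charset: set[str],
                     max_text_length: int = 25, use_space_char: bool = False) -> LabelReport:
    """Check each 'image_path<TAB>label' line; line numbers in the report are 1-based."""
    allowed = charset | ({" "} if use_space_char else set())
    report = LabelReport()
    with open(label_path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            report.total += 1
            parts = line.split("\t")
            if len(parts) != 2:
                report.malformed.append(lineno)
                continue
            rel_path, label = parts
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                report.missing_images.append(lineno)
            if not label:
                report.empty_labels.append(lineno)
            if len(label) > max_text_length:
                report.too_long.append(lineno)
            report.unknown_chars.update(ch for ch in label if ch not in allowed)
    return report
```

Usage might look like `check_label_file("train_data/rec/train.txt", "train_data/rec/", load_dict("ppocr/utils/dict/arabic_dict.txt"), use_space_char=True)`, where every path is a placeholder for your own layout. A long list of unknown characters, missing images, or over-long labels would be a strong hint that the data is not being read the way you expect.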
-
Sad to hear this from you.
-
@hritikakolkar I have an update: I tried three versions of PaddleOCR (2.7, 2.6, and 2.5), but the issue is the same. Could it be due to the CUDA version?
-
Okay, I don't know whether that's the problem, but here is the version I use.
If you want to install it, use this command:
-
Thank you so much, I found it: it's because of the CUDA version.
-
Oh, good.
-
@hritikakolkar Bro, can you share your experience? Have you trained PaddleOCR on an Arabic dataset, and did you finetune it or train it from scratch? I'm asking because I trained PaddleOCR on an Arabic dataset and achieved an accuracy of 83.93%, which I find somewhat unsatisfactory. I used the following command to finetune PaddleOCR: python3 tools/train.py -c configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml -o Global.pretrained_model=/additional_drive/ibrar/PaddleOCR/pretrain_models/arabic/arabic_PP-OCRv3_rec_train/best_accuracy.pdparams I've already shared my YML file with you. I'm a bit confused, because when I test images with the pretrained model it performs very well, but my trained model doesn't perform as well. Do you have any insight into why this might be happening? Is this the right way to start finetuning the model?
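To help interpret numbers like 83.93%: as far as I understand PaddleOCR's recognition metric, `acc` is exact-match accuracy over whole strings (a prediction counts only if every character is right), and `norm_edit_dis` is 1 minus the average normalized edit distance. The sketch below reimplements that idea in plain Python with a textbook Levenshtein distance normalized by the longer string; it approximates the metric for illustration and is not the library's code (which, for example, can also be configured to ignore spaces).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def rec_metrics(preds: list[str], targets: list[str]) -> dict[str, float]:
    """Exact-match accuracy and 1 - mean normalized edit distance over a batch."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and the same length")
    correct = 0
    dist_sum = 0.0
    for pred, target in zip(preds, targets):
        correct += pred == target
        longest = max(len(pred), len(target))
        dist_sum += levenshtein(pred, target) / longest if longest else 0.0
    n = len(preds)
    return {"acc": correct / n, "norm_edit_dis": 1.0 - dist_sum / n}
```

Under these definitions, a log showing both `acc: 0.000000` and `norm_edit_dis: 0.000000`, as quoted in this thread, means every prediction was maximally far from its target, which is what you would see if, for example, the model output an empty string for every image.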
-
If you are not getting the desired accuracy, there are two likely causes: either the distribution of your training data differs from the real-world data you are running inference on, or the training hyperparameters (such as batch size and learning rate) need tuning. Also try running inference with checkpoints from several epochs, for example to compare how the model performs at epoch_10 and epoch_11. To do this, you have to save the model at every epoch during training.
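A minimal sketch of how evaluating several per-epoch checkpoints could be scripted. It assumes training was run with `Global.save_epoch_step: 1`, so that PaddleOCR writes checkpoints named like `iter_epoch_10.pdparams` into `Global.save_model_dir`, and that `tools/eval.py` accepts `-c <config>` plus `-o Global.checkpoints=<path prefix>`; please verify both against the PaddleOCR version you use. The output directory is hypothetical, and the helper only builds the command lines without running them.

```python
import re
from pathlib import Path

# Assumed naming for per-epoch checkpoints written when Global.save_epoch_step is set.
CHECKPOINT_RE = re.compile(r"^iter_epoch_(\d+)\.pdparams$")


def find_epoch_checkpoints(save_model_dir: str) -> list[tuple[int, str]]:
    """Return (epoch, checkpoint prefix without '.pdparams'), sorted by epoch."""
    root = Path(save_model_dir)
    if not root.is_dir():
        return []
    found = []
    for path in root.iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), str(path.with_suffix(""))))
    return sorted(found)


def eval_commands(config: str, checkpoints: list[tuple[int, str]]) -> list[list[str]]:
    """Build one tools/eval.py command per checkpoint; nothing is executed here."""
    return [
        ["python3", "tools/eval.py", "-c", config, "-o", f"Global.checkpoints={prefix}"]
        for _, prefix in checkpoints
    ]


if __name__ == "__main__":
    config = "configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml"
    save_dir = "./output/arabic_rec"  # hypothetical; use your Global.save_model_dir
    for cmd in eval_commands(config, find_epoch_checkpoints(save_dir)):
        print(" ".join(cmd))  # or pass cmd to subprocess.run(cmd, check=True)
```

Comparing the accuracy printed by each run shows whether later epochs actually help or whether an earlier checkpoint generalizes better to your real-world images.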
-
The real problem is the data: the PaddleOCR team hasn't shared the dataset they used to train the Arabic OCR model, so that's the issue.
-
Hmm, that makes sense. Thank you so much. I will let you know after conducting these experiments. |
-
I'm also on CUDA 12.1, and I installed paddle2.5.1post12.0; it works well. You can give it a try.
-
@data-ant Thank you so much. You are right. I was initially trying to install a lower version that was not compatible with CUDA 12.1. However, I have since downgraded my CUDA to version 11.7, and it works fine now. Subsequently, I realized that the issue was not related to the CUDA version but rather to the PaddleOCR version. |
-
I have received this, thank you.
Wishing you all the best!
-
@hritikakolkar @WenmuZhou @dyning @LDOUBLEV @tink2123 @MissPenguin @Topdu @Evezerest @littletomatodonkey @andyjpaddle @weisy11 @D-DanielYang @Topdu @sdcb @ZeyuChen @data-ant @bingooo please help! I am using PaddleOCR for Arabic text recognition and have achieved an accuracy of 97.76%. However, I'm seeing errors specifically on Arabic dates, like the example I attached. When I feed such images to my trained PaddleOCR recognition model, it occasionally misses 2, 3, or 4 words or adds extra digits. I tried training PaddleOCR separately on dates with a labeled dataset of roughly 60,000 images, each with exact and correct ground truth, but the accuracy does not seem to improve. I'd like to understand why PaddleOCR struggles with Arabic numbers and whether there is a way to improve its performance. I'm also open to alternative approaches. Can anyone suggest how to improve PaddleOCR's accuracy on Arabic numbers in date formats?
-
@IbrarBabar009, would you be able to share the finetuned Arabic model? Also, have you done any finetuning for Arabic detection? I am using Paddle for scene-text detection and recognition in Arabic.
-
It will work fine: just add general data along with the dates. You also need to add the Arabic 0 to the dictionary.
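To check the last point, here is a minimal sketch that reports which Arabic-Indic digits (U+0660 ٠ through U+0669 ٩) or ASCII digits are missing from a recognition dictionary with one character per line. Which digit set your date labels actually use is an assumption you should confirm from your own ground truth; the function names are hypothetical.

```python
# Arabic-Indic digits U+0660..U+0669 (٠ to ٩) and ASCII digits 0..9.
ARABIC_INDIC_DIGITS = [chr(cp) for cp in range(0x0660, 0x066A)]
WESTERN_DIGITS = [str(d) for d in range(10)]


def missing_digits(dict_lines: list[str], digits: list[str]) -> list[str]:
    """Return the digits that do not appear as their own line in a dictionary."""
    present = {line.rstrip("\r\n") for line in dict_lines}
    return [d for d in digits if d not in present]


def missing_digits_in_file(dict_path: str, digits: list[str]) -> list[str]:
    """Same check, reading a UTF-8 dictionary file with one character per line."""
    with open(dict_path, encoding="utf-8") as f:
        return missing_digits(f.readlines(), digits)
```

For example, `missing_digits_in_file(path_to_your_dict, ARABIC_INDIC_DIGITS)` returning `["٠"]` would mean every Arabic zero in your labels is unknown to the model's character set.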
-
I am trying to train PaddleOCR recognition on an Arabic dataset, and I keep getting a loss of nanxxx and an accuracy of 0.000000.
I am training with this command:
python -m paddle.distributed.launch --gpus '0' tools/train.py -c configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml
Number of training samples: 95998
Number of validation samples: 10428
Here is a sample of the epoch output:
[2023/10/02 10:05:52] ppocr INFO: epoch: [1352/1500], global_step: 2440, lr: 0.000162, acc: 0.000000, norm_edit_dis: 0.000000,CTCLoss: nanxxx, SARLoss: nanxxx, loss: nanxxx, avg_reader_cost: 0.00013 s, avg_batch_cost: 0.30052 s, avg_samples: 128.0,ips: 425.93432 samples/s, eta: 1 day, 8:19:23
This is my arabic_PP-OCRv3_rec.yml:
I'm attempting to train the model solely on my Arabic dataset without finetuning, and I get the same issue whether I finetune a pretrained model or train directly on my Arabic dataset.
I have attempted to resolve this issue by extensively searching through PaddleOCR's GitHub issues, and I discovered that the only suggested solution is to increase the number of epochs. Consequently, I increased the number of epochs from 200 to 1500, but unfortunately, I have not been able to resolve the issue.
Is there anyone here who can help? What am I missing?
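To confirm whether the loss is NaN from the very first logged step or only becomes NaN later, a saved training log can be scanned. A minimal sketch, assuming the log lines follow the sample quoted in the question (`epoch: [x/y], global_step: N, ..., loss: <value>, ...`); the regular expressions are written against that single line, so they may need adjusting for other PaddleOCR versions, and the function name is hypothetical.

```python
import re
from typing import Iterable, Optional

# Patterns written against the sample log line quoted in the question above.
STEP_RE = re.compile(r"epoch: \[(\d+)/\d+\], global_step: (\d+)")
LOSS_RE = re.compile(r"(?<![A-Za-z])loss: ([^,\s]+)")  # total loss, not CTCLoss/SARLoss


def first_non_finite_loss(lines: Iterable[str]) -> Optional[tuple[int, int]]:
    """Return (epoch, global_step) of the first line whose total loss is nan or inf."""
    for line in lines:
        step = STEP_RE.search(line)
        loss = LOSS_RE.search(line)
        if step and loss:
            value = loss.group(1).lower().lstrip("+-")
            if value.startswith(("nan", "inf")):
                return int(step.group(1)), int(step.group(2))
    return None
```

Usage might be `with open(path_to_train_log, encoding="utf-8") as f: print(first_non_finite_loss(f))`; a result pointing at the first logged step matches the "NaN from the start" symptom described in this thread.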