Empty page!! error + non-working models when training from existing box #5

@mnavasloro

We are trying to build an OCR model for an ancient Spanish language called Iberian. When using jTessBoxEditor, we consistently get the following error: "Empty page!!"

We managed to train a basic model for this language from a UTF-8 font. This model works, even though the same error appeared during its training. However, when we try to improve it using "Train with Existing Box", with boxes also created in jTessBoxEditor (when we open the box file, the characters are correctly placed in their boxes), the error appears again. Training still generates the output files, but the resulting model does not work, no matter what text or image is used, even with pictures that the original model recognizes correctly. We also tried "Train with Existing Box" with existing languages (e.g. English) and got the same error.
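One way to double-check that the box file really contains what the editor displays is to parse it outside the GUI. The sketch below assumes the legacy box format, with one `<symbol> <left> <bottom> <right> <top> <page>` entry per line. `summarize_box_lines` is a hypothetical helper written for this illustration; it is not part of Tesseract or jTessBoxEditor.

```python
from collections import Counter


def summarize_box_lines(lines):
    """Count symbols and flag suspicious entries in legacy-format box lines.

    Assumed line layout: <symbol> <left> <bottom> <right> <top> <page>
    Returns (Counter of valid symbols, list of (line_number, problem)).
    """
    symbols = Counter()
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split from the right so the symbol field is kept whole.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            problems.append((number, "expected 6 fields"))
            continue
        symbol, *coords = parts
        try:
            left, bottom, right, top, _page = map(int, coords)
        except ValueError:
            problems.append((number, "non-integer coordinate"))
            continue
        if not symbol.strip():
            problems.append((number, "empty symbol"))
        elif right <= left or top <= bottom:
            problems.append((number, "zero-area box"))
        else:
            symbols[symbol] += 1
    return symbols, problems
```

Reading `ibe.iberian.exp0.box` with `open(path, encoding="utf-8-sig")` and passing the file object to this function gives the number of distinct symbols, which can then be compared with the size reported by `unicharset_extractor`.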

Is there anything we are doing wrong? The full log is included below.

Thank you and best regards,

María


** Run Tesseract for Training **
[C:\Program Files\Tesseract-OCR/tesseract, ibe.iberian.exp0.tif, ibe.iberian.exp0, box.train]
Empty page!!

** Compute Character Set **
[C:\Program Files\Tesseract-OCR/unicharset_extractor, ibe.iberian.exp0.box]
Extracting unicharset from box file ibe.iberian.exp0.box
Wrote unicharset file unicharset

** Set Character Set Properties **
[C:\Program Files\Tesseract-OCR/set_unicharset_properties, -U, unicharset, -O, unicharset, --script_dir=C:\Users\sergi\Desktop]
Loaded unicharset of size 4 from file unicharset
Setting unichar properties
Setting script properties
Failed to load script unicharset from:C:\Users\sergi\Desktop/Latin.unicharset
Failed to load script unicharset from:C:\Users\sergi\Desktop/Unknown.unicharset
Warning: properties incomplete for index 3 = 
Writing unicharset to file unicharset

** Shape Clustering **
[C:\Program Files\Tesseract-OCR/shapeclustering, -F, ibe.font_properties, -U, unicharset, ibe.iberian.exp0.tr]
Reading ibe.iberian.exp0.tr ...
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Master shape_table:Number of shapes = 0 max unichars = 0 number with multiple unichars = 0

** MF Training **
[C:\Program Files\Tesseract-OCR/mftraining, -F, ibe.font_properties, -U, unicharset, -O, ibe.unicharset, ibe.iberian.exp0.tr]
Done!
Read shape table shapetable of 0 shapes
Reading ibe.iberian.exp0.tr ...
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Warning: no protos/configs for  in CreateIntTemplates()

** CN Training **
[C:\Program Files\Tesseract-OCR/cntraining, ibe.iberian.exp0.tr]
Reading ibe.iberian.exp0.tr ...
Clustering ...

Writing normproto ...

Successful rename of inttemp
Successful rename of pffmtable
Successful rename of normproto
Successful rename of shapetable
** Dictionary Data **
[C:\Program Files\Tesseract-OCR/wordlist2dawg, ibe.frequent_words_list, ibe.freq-dawg, ibe.unicharset]
Loading unicharset from 'ibe.unicharset'
Reading word list from 'ibe.frequent_words_list'
Reducing Trie to SquishedDawg
Dawg is empty, skip producing the output file

[C:\Program Files\Tesseract-OCR/wordlist2dawg, ibe.words_list, ibe.word-dawg, ibe.unicharset]
Loading unicharset from 'ibe.unicharset'
Reading word list from 'ibe.words_list'
Reducing Trie to SquishedDawg
Dawg is empty, skip producing the output file

** Combine Data Files **
[C:\Program Files\Tesseract-OCR/combine_tessdata, ibe.]
Combining tessdata files
Output ibe.traineddata created successfully.
Version:v5.4.0.20240606
1:unicharset:size=237, offset=192
3:inttemp:size=110687, offset=429
4:pffmtable:size=48, offset=111116
5:normproto:size=182, offset=111164
13:shapetable:size=4, offset=111346
23:version:size=15, offset=111350

** Moving generated traineddata file to tessdata folder **
** Training Completed **
