Description
Expected Behavior:
The API function PrintVariables prints current parameters to a file, and ReadConfigFile reads parameters from a file. Intuitively, ReadConfigFile should be able to read the files that PrintVariables writes. This is explicitly assumed within the ProcessPage function, where these functions are used together to "Save current config variables before switching modes" and then "Restore saved config variables".
Lines 1293 to 1306 in a873553
// Save current config variables before switching modes.
FILE *fp = fopen(kOldVarsFile, "wb");
if (fp == nullptr) {
  tprintf("Error, failed to open file \"%s\"\n", kOldVarsFile);
} else {
  PrintVariables(fp);
  fclose(fp);
}
// Switch to alternate mode for retry.
ReadConfigFile(retry_config);
SetImage(pix);
Recognize(nullptr);
// Restore saved config variables.
ReadConfigFile(kOldVarsFile);
Current Behavior:
Unfortunately, this round trip does not currently work. PrintVariables prints each parameter's description after its key/value pair (e.g. the line for chs_trailing_punct1 contains the value ).,;:?! followed by the description "1st Trailing punctuation"), and for string parameters ReadConfigFile reads the description as part of the value. An example showing this is below.
#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>

int main()
{
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  if (api->Init(nullptr, "eng")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    exit(1);
  }
  static const char *kOldVarsFile = "failed_vars.txt";
  // Print default value of chs_trailing_punct1
  printf("Initial value: %s\n", api->GetStringVariable("chs_trailing_punct1"));
  // Dump all variables to a file, then read the same file back in.
  FILE *fp = fopen(kOldVarsFile, "wb");
  if (fp == nullptr) {
    fprintf(stderr, "Could not open %s for writing.\n", kOldVarsFile);
    exit(1);
  }
  api->PrintVariables(fp);
  fclose(fp);
  api->ReadConfigFile(kOldVarsFile);
  // The value should be unchanged after the round trip.
  printf("After PrintVariables/ReadConfigFile: %s\n", api->GetStringVariable("chs_trailing_punct1"));
  api->End();
  delete api;
  return 0;
}
This returns the following:
Initial value: ).,;:?!
After PrintVariables/ReadConfigFile: ).,;:?! 1st Trailing punctuation
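The output above is consistent with a reader that treats everything after the first run of whitespace as the value. The sketch below is a simplified model of that behavior, not Tesseract's actual parsing code. The helper name SplitNameValue is hypothetical, and the tab-separated input line is an assumption about the dump format.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of Tesseract): splits a config line into
// (name, value), where the value is everything after the first run of
// spaces or tabs. A trailing description therefore ends up in the value.
std::pair<std::string, std::string> SplitNameValue(const std::string &line) {
  const std::string kBlank = " \t";
  std::size_t name_end = line.find_first_of(kBlank);
  if (name_end == std::string::npos) {
    return {line, ""};  // No separator: the whole line is the name.
  }
  std::size_t value_start = line.find_first_not_of(kBlank, name_end);
  if (value_start == std::string::npos) {
    return {line.substr(0, name_end), ""};  // Name followed only by blanks.
  }
  return {line.substr(0, name_end), line.substr(value_start)};
}
```

Under this model, a line holding a name, a value, and a description yields a value that still contains the description, which matches the string-parameter result shown above.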
The impact of this is:
- ProcessPage does not work correctly when used with retry_config
- There is no simple interface for generating a config file with the user's current settings
  - This is useful for saving/restoring configurations (as ProcessPage attempts to do)
Suggested Fix:
The simplest solution would be to remove the descriptions from the PrintVariables output (or at least hide that behavior behind an option). I can write a PR if others agree this makes sense. Editing ReadConfigFile to ignore the descriptions is likely also possible, but could be higher effort.
Environment
Tesseract Version: 5.2.0
Commit Number: 15200c6
Platform: Linux ubuntu 5.15.0-43-generic