Commit 9c84ae8

author
Kirill Makankov
committed
Fixed misspelling.
Squashed commits: [7dab0a3]
1. dataPath is now a readonly public property.
2. The designated initializer changed to
   - (id)initWithLanguage:(NSString *)language configDictionary:(NSDictionary *)configDictionary configFileNames:(NSArray *)configFileNames dataPath:(NSString *)dataPath engineMode:(G8OCREngineMode)engineMode NS_DESIGNATED_INITIALIZER;
   where configDictionary is a dictionary of config variables and configFileNames is an array of names of files containing key-value config pairs. The config variables may be both init-only and debug-time variables. The dictionary and the files can be specified together, in which case Tesseract takes the variables from every file and from the dictionary.
3. To avoid ambiguity with Apple naming conventions, the following methods were renamed:
   - setUpTesseractToSearchTrainedDataInTrainedDataFolderOfTheApplicatinBundle (only setters should start with "set")
   - initEngine (only initializers should start with "init", and such a method should return id)
   - copyDataToDocumentsDirectory (ARC applies special rules to methods whose names start with "copy")
4. If dataPath is specified, the whole content of tessdata from the application bundle is copied to Documents/dataPath/tessdata and Tesseract is initialized with that path.
5. _dataPath is replaced with self.dataPath everywhere except the init method.
2. Some unit tests print a message if the result of copyItemAtPath == NO
Small changes
Symlinking tessdata files from app bundle to the Documents folder.
updated lib
Check that destination file doesn't exist before symlinking
updated lib.
1 parent 48b3dbd commit 9c84ae8

6 files changed

+317
-49
lines changed

Products/TesseractOCR.framework/Versions/A/Headers/G8Tesseract.h

Lines changed: 29 additions & 0 deletions
Original file line number, diff line number, diff line change
@@ -40,6 +40,12 @@
4040
*/
4141
@property (nonatomic, copy) NSString* language;
4242

43+
/**
44+
* The path to the directory containing the tessdata folder, if it was specified in a call to
* initWithLanguage:configDictionary:configFileNames:dataPath:engineMode:.
45+
* Otherwise the tessdata folder is assumed to be located in the application bundle.
46+
*/
47+
@property (nonatomic, readonly, copy) NSString *dataPath;
48+
4349
/**
4450
* The recognition mode to use. See `G8OCREngineMode` in G8Constants.h for the
4551
* available recognition modes.
@@ -228,6 +234,29 @@
228234
- (id)initWithLanguage:(NSString*)language
229235
engineMode:(G8OCREngineMode)engineMode;
230236

237+
/**
238+
* Initialize Tesseract with the provided language and engine mode.
239+
*
240+
* @param language The language to use in recognition. See `language`.
241+
* @param configDictionary A dictionary of config variables.
242+
* @param configFileNames An array of names of files containing key-value config pairs. The config
243+
* variables may be both init-only and debug-time variables. The dictionary and
244+
* the files can be specified together, in which case Tesseract takes the
245+
* variables from every file and from the dictionary.
246+
* @param dataPath If the dataPath is specified, the whole content of the tessdata from the
247+
* application bundle is copied to the Documents/dataPath/tessdata
248+
* and tesseract is initialized with that path.
249+
* @param engineMode The engine mode to use in recognition. See `engineMode`.
250+
*
251+
* @return The initialized Tesseract object, or `nil` if there was an error.
252+
*/
253+
254+
- (id)initWithLanguage:(NSString *)language
255+
configDictionary:(NSDictionary *)configDictionary
256+
configFileNames:(NSArray *)configFileNames
257+
dataPath:(NSString *)dataPath
258+
engineMode:(G8OCREngineMode)engineMode NS_DESIGNATED_INITIALIZER;
259+
231260
/**
232261
* Set a Tesseract variable. See G8TesseractParameters.h for the available
233262
* options.
21.9 KB
Binary file not shown.

TesseractOCR/G8Tesseract.h

Lines changed: 29 additions & 0 deletions
@@ -40,6 +40,12 @@
4040
*/
4141
@property (nonatomic, copy) NSString* language;
4242

43+
/**
44+
* The path to the directory containing the tessdata folder, if it was specified in a call to
* initWithLanguage:configDictionary:configFileNames:dataPath:engineMode:.
45+
* Otherwise the tessdata folder is assumed to be located in the application bundle.
46+
*/
47+
@property (nonatomic, readonly, copy) NSString *dataPath;
48+
4349
/**
4450
* The recognition mode to use. See `G8OCREngineMode` in G8Constants.h for the
4551
* available recognition modes.
@@ -228,6 +234,29 @@
228234
- (id)initWithLanguage:(NSString*)language
229235
engineMode:(G8OCREngineMode)engineMode;
230236

237+
/**
238+
* Initialize Tesseract with the provided language and engine mode.
239+
*
240+
* @param language The language to use in recognition. See `language`.
241+
* @param configDictionary A dictionary of config variables.
242+
* @param configFileNames An array of names of files containing key-value config pairs. The config
243+
* variables may be both init-only and debug-time variables. The dictionary and
244+
* the files can be specified together, in which case Tesseract takes the
245+
* variables from every file and from the dictionary.
246+
* @param dataPath If the dataPath is specified, the whole content of the tessdata from the
247+
* application bundle is copied to the Documents/dataPath/tessdata
248+
* and tesseract is initialized with that path.
249+
* @param engineMode The engine mode to use in recognition. See `engineMode`.
250+
*
251+
* @return The initialized Tesseract object, or `nil` if there was an error.
252+
*/
253+
254+
- (id)initWithLanguage:(NSString *)language
255+
configDictionary:(NSDictionary *)configDictionary
256+
configFileNames:(NSArray *)configFileNames
257+
dataPath:(NSString *)dataPath
258+
engineMode:(G8OCREngineMode)engineMode NS_DESIGNATED_INITIALIZER;
259+
231260
/**
232261
* Set a Tesseract variable. See G8TesseractParameters.h for the available
233262
* options.
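
The `configFileNames` parameter documented above is an array of file names, and Tesseract's C++ `Init` overload takes such names as a `char **` plus a count. The following is a minimal C sketch of building that pointer array, sized by pointer width rather than by `int`. The helper name `build_config_array` is hypothetical and is not part of TesseractOCR or of this commit.

```c
#include <stdlib.h>

/* Hypothetical helper (an assumption for illustration): copies `count`
 * C-string pointers into a newly allocated array. Returns NULL when count
 * is 0 or when allocation fails. The caller frees the array itself, not
 * the strings it points to. */
const char **build_config_array(const char *const *names, size_t count)
{
    if (count == 0) {
        return NULL;
    }
    /* Each slot holds a pointer, so size it as one. On 64-bit targets a
     * pointer is 8 bytes while an int is typically 4. */
    const char **configs = malloc(count * sizeof *configs);
    if (configs == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        configs[i] = names[i];
    }
    return configs;
}
```

The resulting array can be passed together with `count` and released with `free` once the call returns; the strings themselves stay owned by the caller.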

TesseractOCR/G8Tesseract.mm

Lines changed: 97 additions & 49 deletions
@@ -19,6 +19,8 @@
1919
#import "pix.h"
2020
#import "ocrclass.h"
2121
#import "allheaders.h"
22+
#import "genericvector.h"
23+
#import "strngs.h"
2224

2325
namespace tesseract {
2426
class TessBaseAPI;
@@ -30,6 +32,8 @@ @interface G8Tesseract () {
3032
}
3133

3234
@property (nonatomic, copy) NSString *dataPath;
35+
@property (nonatomic, strong) NSDictionary *configDictionary;
36+
@property (nonatomic, strong) NSArray *configFileNames;
3337
@property (nonatomic, strong) NSMutableDictionary *variables;
3438

3539
@property (readwrite, assign) CGSize imageSize;
@@ -67,22 +71,26 @@ - (id)init
6771

6872
- (id)initWithLanguage:(NSString*)language
6973
{
70-
return [self initPrivateWithDataPath:nil language:language engineMode:G8OCREngineModeTesseractOnly];
74+
return [self initWithLanguage:language configDictionary:nil configFileNames:nil dataPath:nil engineMode:G8OCREngineModeTesseractOnly];
7175
}
7276

7377
- (id)initWithLanguage:(NSString *)language engineMode:(G8OCREngineMode)engineMode
7478
{
75-
return [self initPrivateWithDataPath:nil language:language engineMode:engineMode];
79+
return [self initWithLanguage:language configDictionary:nil configFileNames:nil dataPath:nil engineMode:engineMode];
7680
}
7781

78-
- (id)initPrivateWithDataPath:(NSString *)dataPath
79-
language:(NSString *)language
80-
engineMode:(G8OCREngineMode)engineMode
82+
- (id)initWithLanguage:(NSString *)language
83+
configDictionary:(NSDictionary *)configDictionary
84+
configFileNames:(NSArray *)configFileNames
85+
dataPath:(NSString *)dataPath
86+
engineMode:(G8OCREngineMode)engineMode
8187
{
8288
self = [super init];
8389
if (self != nil) {
8490
_dataPath = [dataPath copy];
8591
_language = [language copy];
92+
_configDictionary = configDictionary;
93+
_configFileNames = configFileNames;
8694
_engineMode = engineMode;
8795
_pageSegmentationMode = G8PageSegmentationModeSingleBlock;
8896
_variables = [NSMutableDictionary dictionary];
@@ -93,16 +101,25 @@ - (id)initPrivateWithDataPath:(NSString *)dataPath
93101
_monitor->cancel = (CANCEL_FUNC)[self methodForSelector:@selector(tesseractCancelCallbackFunction:)];
94102
_monitor->cancel_this = (__bridge void*)self;
95103

96-
if (dataPath != nil) {
97-
[self copyDataToDocumentsDirectory];
104+
if (self.dataPath != nil) {
105+
// Configure Tesseract to search for trained data in the tessdata folder of the Documents folder
106+
NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
107+
NSString *documentPath = documentPaths.firstObject;
108+
assert(documentPath);
109+
_dataPath = [documentPath stringByAppendingPathComponent:self.dataPath];
110+
111+
[self moveTessdataToDocumentsDirectoryIfNecessary];
98112
}
99113
else {
100-
[self setUpTesseractToSearchTrainedDataInTrainedDataFolderOfTheApplicatinBundle];
114+
// Configure Tesseract to search for trained data in the tessdata folder of the application bundle
115+
_dataPath = [NSString stringWithFormat:@"%@/", [NSString stringWithString:[NSBundle bundleForClass:self.class].bundlePath]];
101116
}
117+
118+
setenv("TESSDATA_PREFIX", [self.dataPath stringByAppendingString:@"/"].UTF8String, 1);
102119

103120
_tesseract = new tesseract::TessBaseAPI();
104121

105-
BOOL success = [self initEngine];
122+
BOOL success = [self configEngine];
106123
if (success == NO) {
107124
self = nil;
108125
}
@@ -125,17 +142,31 @@ - (void)dealloc
125142
}
126143
}
127144

128-
- (void)setUpTesseractToSearchTrainedDataInTrainedDataFolderOfTheApplicatinBundle
145+
- (BOOL)configEngine
129146
{
130-
NSString *datapath =
131-
[NSString stringWithFormat:@"%@/", [NSString stringWithString:[[NSBundle mainBundle] bundlePath]]];
132-
setenv("TESSDATA_PREFIX", datapath.UTF8String, 1);
133-
}
147+
GenericVector<STRING> tessKeys;
148+
for( NSString *key in self.configDictionary.allKeys ){
149+
tessKeys.push_back(STRING(key.UTF8String));
150+
}
134151

135-
- (BOOL)initEngine
136-
{
152+
GenericVector<STRING> tessValues;
153+
for( NSString *val in self.configDictionary.allValues ){
154+
tessValues.push_back(STRING(val.UTF8String));
155+
}
156+
157+
int count = (int)self.configFileNames.count;
158+
const char **configs = (const char **)malloc(sizeof(const char *) * count);
159+
for (int i = 0; i < count; i++) {
160+
configs[i] = ((NSString*)self.configFileNames[i]).UTF8String;
161+
}
137162
int returnCode = _tesseract->Init(self.dataPath.UTF8String, self.language.UTF8String,
138-
(tesseract::OcrEngineMode)self.engineMode);
163+
(tesseract::OcrEngineMode)self.engineMode,
164+
(char **)configs, count,
165+
&tessKeys, &tessValues,
166+
false);
167+
if (configs != nullptr) {
168+
free(configs);
169+
}
139170
return returnCode == 0;
140171
}
141172

@@ -147,7 +178,7 @@ - (void)resetFlags
147178

148179
- (BOOL)resetEngine
149180
{
150-
BOOL isInitDone = [self initEngine];
181+
BOOL isInitDone = [self configEngine];
151182
if (isInitDone) {
152183
[self loadVariables];
153184
[self resetFlags];
@@ -159,43 +190,60 @@ - (BOOL)resetEngine
159190
return isInitDone;
160191
}
161192

162-
- (void)copyDataToDocumentsDirectory
193+
- (BOOL)moveTessdataToDocumentsDirectoryIfNecessary
163194
{
164-
// Useful paths
165195
NSFileManager *fileManager = [NSFileManager defaultManager];
166-
NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
167-
NSString *documentPath = documentPaths.firstObject;
168-
NSString *dataPath = [documentPath stringByAppendingPathComponent:self.dataPath];
169-
170-
// NSString *dataPath = [[NSBundle mainBundle] pathForResource:@"grc" ofType:@"traineddata"];
171-
NSLog(@"DATAPATH %@", dataPath);
172-
173-
// Copy data in Doc Directory
174-
if ([fileManager fileExistsAtPath:dataPath] == NO) {
175-
[fileManager createDirectoryAtPath:dataPath withIntermediateDirectories:YES attributes:nil error:nil];
196+
197+
// Useful paths
198+
NSString *tessdataFolderName = @"tessdata";
199+
NSString *tessdataPath = [[NSBundle bundleForClass:self.class].resourcePath stringByAppendingPathComponent:tessdataFolderName];
200+
NSString *destinationPath = [self.dataPath stringByAppendingPathComponent:tessdataFolderName];
201+
NSLog(@"Tesseract destination path: %@", destinationPath);
202+
203+
if ([fileManager fileExistsAtPath:destinationPath] == NO) {
204+
NSError *error = nil;
205+
BOOL res = [fileManager createDirectoryAtPath:destinationPath withIntermediateDirectories:YES attributes:nil error:&error];
206+
if (error != nil) {
207+
NSLog(@"Error creating folder %@: %@", destinationPath, error);
208+
return NO;
209+
}
210+
if (res == NO) {
211+
NSLog(@"Error creating folder %@", destinationPath);
212+
return NO;
213+
}
176214
}
177-
178-
NSBundle *bundle = [NSBundle bundleForClass:[self class]];
179-
for (NSString *languageName in [self.language componentsSeparatedByString:@"+"]) {
180-
NSString *tessdataPath = [bundle pathForResource:languageName ofType:@"traineddata"];
181-
182-
if (tessdataPath != nil) {
183-
NSString *destinationPath = [dataPath stringByAppendingPathComponent:tessdataPath.lastPathComponent];
184-
185-
if([fileManager fileExistsAtPath:destinationPath] == NO) {
186-
NSError *error = nil;
187-
NSLog(@"found %@", tessdataPath);
188-
NSLog(@"coping in %@", destinationPath);
189-
[fileManager copyItemAtPath:tessdataPath toPath:destinationPath error:&error];
190-
191-
if(error != nil) {
192-
NSLog(@"ERROR! %@", error.description);
193-
}
215+
216+
BOOL result = YES;
217+
NSError *error = nil;
218+
NSArray *files = [fileManager contentsOfDirectoryAtPath:tessdataPath error:&error];
219+
if (error != nil) {
220+
NSLog(@"ERROR! %@", error.description);
221+
result = NO;
222+
}
223+
for (NSString *filename in files) {
224+
225+
NSString *destinationFileName = [destinationPath stringByAppendingPathComponent:filename];
226+
if (![fileManager fileExistsAtPath:destinationFileName]) {
227+
228+
NSString *filePath = [tessdataPath stringByAppendingPathComponent:filename];
229+
//NSLog(@"found %@", filePath);
230+
//NSLog(@"symlink in %@", destinationFileName);
231+
232+
BOOL res = [fileManager createSymbolicLinkAtPath:destinationFileName
233+
withDestinationPath:filePath
234+
error:&error];
235+
if (res == NO) {
236+
NSLog(@"The result of createSymbolicLinkAtPath == NO");
237+
result = NO;
238+
}
239+
if (error != nil) {
240+
NSLog(@"Error creating symlink %@: %@", filePath, error);
241+
result = NO;
194242
}
195243
}
196244
}
197-
198-
setenv("TESSDATA_PREFIX", [documentPath stringByAppendingString:@"/"].UTF8String, 1);
245+
246+
return result;
199247
}
200248

201249
- (void)setVariableValue:(NSString *)value forKey:(NSString *)key
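
The symlinking step in `moveTessdataToDocumentsDirectoryIfNecessary` above links each bundled tessdata file into the destination folder only when nothing is there yet. A minimal POSIX C sketch of that check-then-link pattern follows; `symlink_if_absent` is a hypothetical helper, not part of this commit. One deliberate difference, stated as an assumption of the sketch: it uses `lstat`, which inspects the link itself, whereas `-[NSFileManager fileExistsAtPath:]` follows symlinks and reports a dangling link as missing.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (an assumption for illustration): creates a symlink
 * at `link_path` pointing to `target` unless something already exists at
 * `link_path`. Returns true when an entry exists at `link_path` afterwards,
 * whether it was created now or was already present. */
bool symlink_if_absent(const char *target, const char *link_path)
{
    struct stat st;
    /* lstat does not follow the link, so a dangling symlink counts as present. */
    if (lstat(link_path, &st) == 0) {
        return true;
    }
    /* Any failure other than "no such entry" is reported to the caller. */
    if (errno != ENOENT) {
        return false;
    }
    return symlink(target, link_path) == 0;
}
```

Calling it twice with the same `link_path` is harmless: the second call finds the existing entry and returns `true` without modifying it.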

TestsProject/TestsProject.xcodeproj/project.pbxproj

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,7 @@
2626
41C68DAF1A41825500848AE1 /* UIImage+G8Equal.m in Sources */ = {isa = PBXBuildFile; fileRef = 41C68DAE1A41825500848AE1 /* UIImage+G8Equal.m */; };
2727
41C68DB31A41849100848AE1 /* image_sample.jpg in Resources */ = {isa = PBXBuildFile; fileRef = 41C68DB21A41849100848AE1 /* image_sample.jpg */; };
2828
41C68DB51A41854600848AE1 /* image_sample_tr.png in Resources */ = {isa = PBXBuildFile; fileRef = 41C68DB41A41854600848AE1 /* image_sample_tr.png */; };
29+
732C54761A514DA6000322DA /* InitializationTests.m in Sources */ = {isa = PBXBuildFile; fileRef = 732C54751A514DA5000322DA /* InitializationTests.m */; };
2930
8FA2F9CE23919BEC8C64A5EA /* libPods-TestsProjectTests.a in Frameworks */ = {isa = PBXBuildFile; fileRef = 5CD3C116A45C293ADAC81D1B /* libPods-TestsProjectTests.a */; };
3031
/* End PBXBuildFile section */
3132

@@ -70,6 +71,7 @@
7071
41C68DB21A41849100848AE1 /* image_sample.jpg */ = {isa = PBXFileReference; lastKnownFileType = image.jpeg; path = image_sample.jpg; sourceTree = "<group>"; };
7172
41C68DB41A41854600848AE1 /* image_sample_tr.png */ = {isa = PBXFileReference; lastKnownFileType = image.png; path = image_sample_tr.png; sourceTree = "<group>"; };
7273
5CD3C116A45C293ADAC81D1B /* libPods-TestsProjectTests.a */ = {isa = PBXFileReference; explicitFileType = archive.ar; includeInIndex = 0; path = "libPods-TestsProjectTests.a"; sourceTree = BUILT_PRODUCTS_DIR; };
74+
732C54751A514DA5000322DA /* InitializationTests.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = InitializationTests.m; sourceTree = "<group>"; };
7375
/* End PBXFileReference section */
7476

7577
/* Begin PBXFrameworksBuildPhase section */
@@ -144,6 +146,7 @@
144146
4115B9771A3EF8E90004EC0A /* TestsProjectTests */ = {
145147
isa = PBXGroup;
146148
children = (
149+
732C54751A514DA5000322DA /* InitializationTests.m */,
147150
4115B97A1A3EF8E90004EC0A /* RecognitionTests.m */,
148151
4115B9781A3EF8E90004EC0A /* Supporting Files */,
149152
);
@@ -348,6 +351,7 @@
348351
files = (
349352
414121231A4C5A5700583ED4 /* G8RecognitionTestsHelper.m in Sources */,
350353
4115B97B1A3EF8E90004EC0A /* RecognitionTests.m in Sources */,
354+
732C54761A514DA6000322DA /* InitializationTests.m in Sources */,
351355
41C68DAF1A41825500848AE1 /* UIImage+G8Equal.m in Sources */,
352356
);
353357
runOnlyForDeploymentPostprocessing = 0;
