
Commit ab958da

kevincon authored and Kirill Makankov committed
Removed setInputName in hOCR and cleaned docs.
I tested the hOCR method (`recognizedHOCRForPageNumber:`) without calling `SetInputName` and it works fine. This suggests that the upstream Tesseract issue requiring the input name to be set to the empty string has been resolved in the version of Tesseract we are using (3.03-rc1): https://code.google.com/p/tesseract-ocr/issues/detail?id=463

I also cleaned up the documentation a little, since several lines were longer than 80 characters and some docstrings did not use tags such as @param correctly.
1 parent 65a3d0c commit ab958da

2 files changed (+42, -26 lines)

TesseractOCR/G8Tesseract.h

Lines changed: 41 additions & 23 deletions
@@ -41,8 +41,9 @@
 @property (nonatomic, copy) NSString* language;
 
 /**
- * The path to the tessdata file, if it was specified in a call to initWithLanguage:configDictionary:configFileNames:cachesRelatedDataPath:engineMode: as a cachesRelatedDataPath
- * Otherwise it's supposed that the tessdata folder is located in the application bundle
+ * The absolute path to the tessdata folder, which may exist in either the
+ * application bundle or in the Caches directory depending on the argument to
+ * `cachesRelatedDataPath` in the designated initializer.
  */
 @property (nonatomic, readonly, copy) NSString *absoluteDataPath;
 
@@ -112,10 +113,17 @@
  */
 @property (nonatomic, readonly) NSString *recognizedText;
 
-/*
- * Make an HTML-formatted string with hOCR markup from the internal data structures.
- * page_number is 0-based but will appear in the output as 1-based.
- */
+/**
+ * Make an HTML-formatted string with hOCR markup from the internal Tesseract
+ * data structures.
+ * page_number is 0-based but will appear in the output as 1-based.
+ *
+ * @param pageNumber The page number within the image of interest. If you
+ * aren't using a multipage image or don't know what this
+ * means, use `0` for `pageNumber`.
+ *
+ * @return The HTML-formatted string with hOCR markup.
+ */
 - (NSString *)recognizedHOCRForPageNumber:(int)pageNumber;
 
 /**
@@ -174,7 +182,7 @@
 
 /**
  * Retrieve Tesseract's recognition result based on a provided resolution.
- * For, example for the pageIteratorLevel == G8PageIteratorLevelSymbol it returns
+ * For example, the pageIteratorLevel == G8PageIteratorLevelSymbol returns
  * an array of `G8RecognizedBlock`'s representing the characters recognized
  * in the target image, including the bounding boxes for each character.
  *
@@ -183,8 +191,9 @@
  * resolution options.
  *
  * @return An array of `G8RecognizedBlock`'s, each containing a confidence
- * value and a bounding box for the text it represents. See G8RecognizedBlock.h for more
- * information about the available fields for this data structure.
+ * value and a bounding box for the text it represents. See
+ * G8RecognizedBlock.h for more information about the available fields
+ * for this data structure.
  */
 - (NSArray *)recognizedBlocksByIteratorLevel:(G8PageIteratorLevel)pageIteratorLevel;
 
@@ -237,17 +246,25 @@
 /**
  * Initialize Tesseract with the provided language and engine mode.
  *
- * @param language The language to use in recognition. See `language`.
- * @param configDictionary A dictionary of the config variables
- * @param configFileNames An array of file names containing key-value config pairs. All the config
- * variables can be init only and debug time both. Furthermore they could be
- * specified at the same time, in such case tesseract will get variables from
- * every file and dictionary all together.
- * The files are searched into two folders, which are tessdata/tessconfigs and tessdata/configs
- * @param cachesRelatedPath If the cachesRelatedDataPath is specified, the whole content of the tessdata from the
- * application bundle is copied to the Library/Caches/cachesRelatedDataPath/tessdata
- * and tesseract is initialized with that path.
- * @param engineMode The engine mode to use in recognition. See `engineMode`.
+ * @param language The language to use in recognition. See
+ * `language`.
+ * @param configDictionary A dictionary of config variables to set.
+ * @param configFileNames An array of file names containing key-value
+ * config pairs. Config settings can be set at
+ * initialization or run-time. Furthermore, they
+ * could be specified at the same time, in which
+ * case Tesseract will get variables from every
+ * config file as well as the dictionary.
+ * The config files must exist in one of two
+ * possible folders: tessdata/tessconfigs or
+ * tessdata/configs.
+ * @param cachesRelatedPath If the cachesRelatedDataPath is specified, the
+ * whole contents of the tessdata folder in the
+ * application bundle will be copied to
+ * Library/Caches/cachesRelatedDataPath/tessdata
+ * and Tesseract will be set to use that path.
+ * @param engineMode The engine mode to use in recognition. See
+ * `engineMode`.
  *
  * @return The initialized Tesseract object, or `nil` if there was an error.
  */
@@ -268,12 +285,13 @@
 - (void)setVariableValue:(NSString *)value forKey:(NSString *)key;
 
 /**
- * Returns a Tesseract variable for the given key. See G8TesseractParameters.h for the available
- * options.
+ * Returns a Tesseract variable for the given key. See G8TesseractParameters.h
+ * for the available options.
  *
  * @param key The option to get.
  *
- * @return returns the variable value for the given key, if it's beeb set. nil otherwise.
+ * @return Returns the variable value for the given key, if it's been set.
+ * nil otherwise.
  */
 - (NSString*)variableValueForKey:(NSString *)key;
 
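The initializer documentation in this header describes two path rules: when a `cachesRelatedDataPath` is given, tessdata is used from Library/Caches/cachesRelatedDataPath/tessdata, otherwise from the application bundle; and config files are expected in tessdata/tessconfigs or tessdata/configs. The C++17 sketch below models those rules with `std::filesystem`. It is an illustration under assumptions, not G8Tesseract's or Tesseract's code: `resolveTessdataDir` and `findConfigFile` are hypothetical helpers, and the order in which the two config folders are checked is an assumption.

```cpp
#include <filesystem>
#include <initializer_list>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper modelling the documented rule for where tessdata lives:
// with no cachesRelatedDataPath, the bundled copy is used; otherwise the copy
// under <caches>/<cachesRelatedDataPath>/tessdata.
fs::path resolveTessdataDir(const fs::path& bundleDir,
                            const fs::path& cachesDir,
                            const std::string& cachesRelatedDataPath) {
    if (cachesRelatedDataPath.empty()) {
        return bundleDir / "tessdata";
    }
    return cachesDir / cachesRelatedDataPath / "tessdata";
}

// Hypothetical helper: return the first regular file named fileName in
// tessdata/configs or tessdata/tessconfigs. The search order is an assumption.
std::optional<fs::path> findConfigFile(const fs::path& tessdataDir,
                                       const std::string& fileName) {
    for (const char* folder : {"configs", "tessconfigs"}) {
        const fs::path candidate = tessdataDir / folder / fileName;
        std::error_code ec;  // treat unreadable paths as "not found"
        if (fs::is_regular_file(candidate, ec)) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

Returning `std::optional` keeps "no such config file" distinct from a valid path, and the `std::error_code` overload keeps the lookup from throwing on unreadable directories.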

TesseractOCR/G8Tesseract.mm

Lines changed: 1 addition & 3 deletions
@@ -599,9 +599,7 @@ - (NSArray *)recognizedBlocksByIteratorLevel:(G8PageIteratorLevel)pageIteratorLe
 }
 
 - (NSString *)recognizedHOCRForPageNumber:(int)pageNumber {
-
-    _tesseract->SetInputName("");
-    char* hocr = _tesseract->GetHOCRText(pageNumber);
+    char *hocr = _tesseract->GetHOCRText(pageNumber);
     if (hocr) {
         NSString *text = [NSString stringWithUTF8String:hocr];
         free(hocr);
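After this change the method simply calls `GetHOCRText()`, copies the result into an `NSString`, and releases the returned buffer. The C++ sketch below shows the same take-ownership-and-copy pattern using RAII. Because it cannot assume a Tesseract installation, it uses a hypothetical stand-in, `fakeGetHOCRText`, whose markup is illustrative only; `takeOwnedCString` is likewise a hypothetical helper. Tesseract's `baseapi.h` documents that text returned by `GetUTF8Text()` must be freed with the `delete []` operator; this sketch assumes `GetHOCRText()` follows the same convention, in which case `delete []` rather than `free()` would be the matching release call.

```cpp
#include <cstdio>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for TessBaseAPI::GetHOCRText(int), used only so this
// sketch runs without Tesseract. It assumes the real call returns a buffer
// allocated with new[] (caller owns it) and that the 0-based page number is
// written to the markup as 1-based. The markup itself is illustrative.
char* fakeGetHOCRText(int pageNumber) {
    if (pageNumber < 0) {
        return nullptr;  // mimic a failed call
    }
    char markup[64];
    std::snprintf(markup, sizeof markup,
                  "<div class='ocr_page' id='page_%d'></div>", pageNumber + 1);
    char* buffer = new char[std::strlen(markup) + 1];
    std::strcpy(buffer, markup);
    return buffer;
}

// Copy a caller-owned C string into a std::string and release it exactly once.
// unique_ptr<char[]> calls delete[], matching the assumed new[] allocation.
std::string takeOwnedCString(char* raw) {
    std::unique_ptr<char[]> owner(raw);
    return owner ? std::string(owner.get()) : std::string();
}
```

Wrapping the raw pointer in `std::unique_ptr<char[]>` releases the buffer on every path, including when copying it into the string throws.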
