[The review of API/ABI changes for Tesseract since 3.00 version](https://abi-laboratory.pro/tracker/timeline/tesseract/) has been created with the help of open-source [abi-tracker tool](https://github.com/lvc/abi-tracker). The tool checks all API symbols declared in header files (doesn't take docs into account), so there may be some false positives.

-## Doxygen (Tesseract latest from github)
+## Source Documentation generated by Doxygen
+
+###Tesseract latest from GitHub

 Documentation of tesseract generated on Jan 30 2020 from master branch (5.0.0-alpha-619-ge9db
 ) by [doxygen](http://www.doxygen.org) can be found at [tesseract-ocr.github.io](https://tesseract-ocr.github.io/tessapi/5.x/index.html)

-##Doxygen (Tesseract 4.1.1)
+### Tesseract 4.1.1

 Documentation of tesseract generated on 1.8.17 (4.1.1 release) by [doxygen](http://www.doxygen.org) can be found at [fossies.org](https://fossies.org/dox/tesseract-4.1.1/index.html)

-##Doxygen (Tesseract 4.00.00dev)
+### Tesseract 4.00.00dev

 Documentation of tesseract generated on Sat May 20, 2017 from master branch (4.0) by [doxygen](http://www.doxygen.org) can be found at [ub-mannheim.github.io](https://ub-mannheim.github.io/tesseract/)

-## Doxygen (3.05.02)
-
-Documentation of tesseract (3.05.02) generated on Oct 29 2018 by [doxygen](http://www.doxygen.org) can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/tessapi/3.05.02/)
-
-## Doxygen (3.04 from July 2015)
-
-Documentation of tesseract generated from source code as of July 2015 by [doxygen](http://www.doxygen.org) can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/tessapi/3.x/)
README.md: 33 additions & 32 deletions
@@ -2,14 +2,14 @@

 ## Introduction

-Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine, available under the [Apache 2.0 license.](http://www.apache.org/licenses/LICENSE-2.0).
-* The current official release is [4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1).
-* The [master branch on Github](https://github.com/tesseract-ocr/tesseract.git) can be used by those who want the latest 5.0.0.Alpha code for LSTM (--oem 1) and legacy (--oem 0) Tesseract.
-* The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for [3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02) release for legacy Tesseract.
+Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine, available under the [Apache 2.0 license.](http://www.apache.org/licenses/LICENSE-2.0).
+* The current official release is [4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1).
+* The [master branch on Github](https://github.com/tesseract-ocr/tesseract.git) can be used by those who want the latest 5.0.0.Alpha code for LSTM (--oem 1) and legacy (--oem 0) Tesseract.
+* The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for [3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02) release for legacy Tesseract.

 Tesseract can be used directly via command line, or (for programmers) using an [API](https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/baseapi.h) to extract printed text from images. It supports a [wide variety of languages](Data-Files-in-different-versions.md). Tesseract doesn't have a built-in GUI, but there are several available from the [3rdParty](User-Projects-–-3rdParty.md) page.

-Tesseract can be used in your own project, under the terms of the [Apache License 2.0.](http://www.apache.org/licenses/LICENSE-2.0) It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the [3rdParty](User-Projects-–-3rdParty.md) page for a sample of what has been done with it.
+Tesseract can be used in your own project, under the terms of the [Apache License 2.0.](http://www.apache.org/licenses/LICENSE-2.0) It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the [3rdParty](User-Projects-–-3rdParty.md) page for a sample of what has been done with it.

 If you have a question, first read the [documentation](https://tesseract-ocr.github.io/), particularly the [FAQ](FAQ.md) to see if your problem is addressed there. If not, search the [Tesseract user forum](http://groups.google.com/group/tesseract-ocr) or the
 [Tesseract developer forum](http://groups.google.com/group/tesseract-dev), and if you still can't find what you need, please ask us there.
@@ -29,25 +29,13 @@ This user manual is for Tesseract versions 4.x.x and 5.0.0.Alpha. For versions 3
[Tesseract **4.0x+**](4.0-with-LSTM.md) added a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for [100+ languages and 35+ scripts](Data-Files-in-different-versions.md) is available in [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.
have information about LSTM integration in Tesseract 4.0x.
--[TesseractOpenCL](TesseractOpenCL.md)
+Tesseract **4.0x+** added a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for [100+ languages and 35+ scripts](Data-Files-in-different-versions.md) is available in [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.

 ## 5.0.0.Alpha

-Tesseract **5.0.0.Alpha** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract).
+Tesseract **5.0.0.Alpha** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract).

 ### Compiling and Installation
 -[Compiling](Compiling.md)
@@ -70,21 +58,34 @@ Tesseract **5.0.0.Alpha** source code is available in the 'master' branch of the
 -[Common Errors and Resolutions](4.0x-Common-Errors-and-Resolutions.md)
There have been many changes made to LSTM training process.
+## Community Contributions for Finetune Training

 You can see the following links where there are modified training scripts created by Tesseract users:

@@ -9,4 +7,4 @@ You can see the following links where there are modified training scripts create
 *[wiki.wareya.moe - tesstrain.sh at pastebin](https://pastebin.com/cD5wctUG)
 *[wiki.wareya.moe - tesstrain_utils.sh at pastebin](https://pastebin.com/TfqJUxSR)

-To train from line images and its matching ground truth, please see the project [ocr-d/train](https://github.com/OCR-D/ocrd-train) which creates box files and lstmf files using the line images.
+Kristóf Horváth keeps a [LSTM (community) training guide](https://docs.google.com/document/d/1qDqbnlptcCPVIvMOHwfNws-CQat-llZLOTHC6S94Vec).