Commit 939c6b8

Shree authored and committed
Symlink multiple Training Tesseract pages
1 parent a34c2e6 commit 939c6b8

7 files changed: 46 additions and 75 deletions

Documentation.md

Lines changed: 6 additions & 25 deletions
@@ -1,12 +1,4 @@
-## Technical Documentation
-
-[Technical Papers and Presentations](Technical-Documentation.md)
-
-## Tesseract 4.0 with LSTM
-
-For information about the new LSTM based tesseract engine, please see the [documentation](4.0-with-LSTM.md).
-
-## Manual Pages (4.x)
+## Manual Pages
 
 The manual pages for tesseract and related training tools are available at following links:
 
@@ -22,36 +14,25 @@ The manual pages for tesseract and related training tools are available at follo
 * [unicharset\_extractor](https://github.com/tesseract-ocr/tesseract/blob/master/doc/unicharset_extractor.1.asc)
 * [wordlist2dawg](https://github.com/tesseract-ocr/tesseract/blob/master/doc/wordlist2dawg.1.asc)
 
-## Changes to Tesseract
-
-* [Release Notes](ReleaseNotes.md)
-* [Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)
-
 ## API/ABI Changes Review
 
 [The review of API/ABI changes for Tesseract since 3.00 version](https://abi-laboratory.pro/tracker/timeline/tesseract/) has been created with the help of open-source [abi-tracker tool](https://github.com/lvc/abi-tracker). The tool checks all API symbols declared in header files (doesn't take docs into account), so there may be some false positives.
 
-## Doxygen (Tesseract latest from github)
+## Source Documentation generated by Doxygen
+
+###Tesseract latest from GitHub
 
 Documentation of tesseract generated on Jan 30 2020 from master branch (5.0.0-alpha-619-ge9db
 ) by [doxygen](http://www.doxygen.org) can be found at [tesseract-ocr.github.io](https://tesseract-ocr.github.io/tessapi/5.x/index.html)
 
-## Doxygen (Tesseract 4.1.1)
+### Tesseract 4.1.1
 
 Documentation of tesseract generated on 1.8.17 (4.1.1 release) by [doxygen](http://www.doxygen.org) can be found at [fossies.org](https://fossies.org/dox/tesseract-4.1.1/index.html)
 
-## Doxygen (Tesseract 4.00.00dev)
+### Tesseract 4.00.00dev
 
 Documentation of tesseract generated on Sat May 20, 2017 from master branch (4.0) by [doxygen](http://www.doxygen.org) can be found at [ub-mannheim.github.io](https://ub-mannheim.github.io/tesseract/)
 
-## Doxygen (3.05.02)
-
-Documentation of tesseract (3.05.02) generated on Oct 29 2018 by [doxygen](http://www.doxygen.org) can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/tessapi/3.05.02/)
-
-## Doxygen (3.04 from July 2015)
-
-Documentation of tesseract generated from source code as of July 2015 by [doxygen](http://www.doxygen.org) can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/tessapi/3.x/)
-
 ## FAQ
 
 [Frequently Asked Questions](FAQ.md)

OldVersionDocs.md

Lines changed: 3 additions & 4 deletions
@@ -18,9 +18,9 @@
 
 ### Other pages for legacy Tesseract engine
 
-- [Traineddata files](tess3/Data-Files.md)
-- [FAQ - Old version](FAQ-Old.md)
-- [Technical Documentation](Technical-Documentation.md)
+- [Traineddata files for Tesseract 3.0x](tess3/Data-Files.md)
+- [FAQ - Old version](tess3/FAQ-Old.md)
+- [Technical Documentation](tess3/Technical-Documentation.md)
 - [API/ABI changes for Tesseract since 3.00 version](https://abi-laboratory.pro/tracker/timeline/tesseract/)
 - [Slides from Tutorial on Tesseract presented at DAS2014](https://drive.google.com/file/d/0B7l10Bj_LprhbUlIUFlCdGtDYkE/edit?usp=sharing)
 - [Source Code Documentation by Doxygen - 3.x](https://tesseract-ocr.github.io/tessapi/3.x/)
@@ -31,7 +31,6 @@
 - [Training Tesseract - 3.03–3.05](tess3/Training-Tesseract-3.03–3.05.md)
 - [Training Tesseract - 3.00–3.02](tess3/Training-Tesseract-3.00–3.02.md)
 
-
 ## Tesseract 2
 
 - [TrainingTesseract2](tess3/TrainingTesseract2.md)

README.md

Lines changed: 33 additions & 32 deletions
@@ -2,14 +2,14 @@
 
 ## Introduction
 
-Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine, available under the [Apache 2.0 license.](http://www.apache.org/licenses/LICENSE-2.0).
-* The current official release is [4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1).
-* The [master branch on Github](https://github.com/tesseract-ocr/tesseract.git) can be used by those who want the latest 5.0.0.Alpha code for LSTM (--oem 1) and legacy (--oem 0) Tesseract.
-* The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for [3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02) release for legacy Tesseract.
+Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine, available under the [Apache 2.0 license.](http://www.apache.org/licenses/LICENSE-2.0).
+* The current official release is [4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1).
+* The [master branch on Github](https://github.com/tesseract-ocr/tesseract.git) can be used by those who want the latest 5.0.0.Alpha code for LSTM (--oem 1) and legacy (--oem 0) Tesseract.
+* The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for [3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02) release for legacy Tesseract.
 
 Tesseract can be used directly via command line, or (for programmers) using an [API](https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/baseapi.h) to extract printed text from images. It supports a [wide variety of languages](Data-Files-in-different-versions.md). Tesseract doesn't have a built-in GUI, but there are several available from the [3rdParty](User-Projects-–-3rdParty.md) page.
 
-Tesseract can be used in your own project, under the terms of the [Apache License 2.0.](http://www.apache.org/licenses/LICENSE-2.0) It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the [3rdParty](User-Projects-–-3rdParty.md) page for a sample of what has been done with it.
+Tesseract can be used in your own project, under the terms of the [Apache License 2.0.](http://www.apache.org/licenses/LICENSE-2.0) It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the [3rdParty](User-Projects-–-3rdParty.md) page for a sample of what has been done with it.
 
 If you have a question, first read the [documentation](https://tesseract-ocr.github.io/), particularly the [FAQ](FAQ.md) to see if your problem is addressed there. If not, search the [Tesseract user forum](http://groups.google.com/group/tesseract-ocr) or the
 [Tesseract developer forum](http://groups.google.com/group/tesseract-dev), and if you still can't find what you need, please ask us there.
@@ -29,25 +29,13 @@ This user manual is for Tesseract versions 4.x.x and 5.0.0.Alpha. For versions 3
 - [Changelog](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)
 - [4.0x-Changelog](4.0x-Changelog.md)
 
-## [4.0 with LSTM](4.0-with-LSTM.md)
+## 4.0 with LSTM
 
-[Tesseract **4.0x+**](4.0-with-LSTM.md) added a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for [100+ languages and 35+ scripts](Data-Files-in-different-versions.md) is available in [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.
-
-### Technical Information
-- [NeuralNetsInTesseract4.00](NeuralNetsInTesseract4.00.md)
-- [VGSLSpecs](VGSLSpecs.md)
-- [VGSLSpecs info from Tensorflow](https://github.com/mldbai/tensorflow-models/blob/master/street/g3doc/vgslspecs.md)
-- [DAS 2016 tutorial slides](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016)
-Slides
-[#2](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf),
-[#6](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/6ModernizationEfforts.pdf),
-[#7](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf)
-have information about LSTM integration in Tesseract 4.0x.
-- [TesseractOpenCL](TesseractOpenCL.md)
+Tesseract **4.0x+** added a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for [100+ languages and 35+ scripts](Data-Files-in-different-versions.md) is available in [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.
 
 ## 5.0.0.Alpha
 
-Tesseract **5.0.0.Alpha** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract).
+Tesseract **5.0.0.Alpha** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract).
 
 ### Compiling and Installation
 - [Compiling](Compiling.md)
@@ -70,21 +58,34 @@ Tesseract **5.0.0.Alpha** source code is available in the 'master' branch of the
 - [Common Errors and Resolutions](4.0x-Common-Errors-and-Resolutions.md)
 - [Frequently Asked Qustions](FAQ.md)
 
-
+### Technical Information
+- [HistoricalTechnical Documentation](tess3/Technical-Documentation.md)
+- [API/ABI changes review for Tesseract](https://abi-laboratory.pro/?view=timeline&l=tesseract)
+- [Manual Pages](Documentation.md#manual-pages)
+- [Source Documentation generated by Doxygen](Documentation.md#source-documentation-generated-by-Doxygen)
+- [NeuralNetsInTesseract4.00](NeuralNetsInTesseract4.00.md)
+- [VGSLSpecs](VGSLSpecs.md)
+- [VGSLSpecs info from Tensorflow](https://github.com/mldbai/tensorflow-models/blob/master/street/g3doc/vgslspecs.md)
+- [DAS 2016 tutorial slides](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016)
+Slides
+[#2](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf),
+[#6](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/6ModernizationEfforts.pdf),
+[#7](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf)
+have information about LSTM integration in Tesseract 4.0x.
+- [4.0 Accuracy and Performance](4.0-Accuracy-and-Performance.md)
+- [Tesseract OpenCL - Experimental](TesseractOpenCL.md)
 
 ### Training
-- [4.0-Accuracy-and-Performance](4.0-Accuracy-and-Performance.md)
-- [4.0-with-LSTM](4.0-with-LSTM.md)
-- [Documentation](Documentation.md)
-- [Fonts](Fonts.md)
-- [LSTM Training from Images and Groundtruth Transcription](https://github.com/tesseract-ocr/tesstrain)
-- [Making-Box-Files---4.0](Making-Box-Files---4.0.md)
-- [Technical-Documentation](Technical-Documentation.md)
+- [Makefile based Training from Images and Groundtruth Transcription](https://github.com/tesseract-ocr/tesstrain)
+- [TrainingTesseract 4.00 - Detailed Guide](TrainingTesseract-4.00.md)
+-- [Hardware-Software Requirements](#hardware-software-requirements)
+-- [Training Text Requirements](TrainingTesseract-4.00.md#training-text-requirements)
+-- [Fonts](Fonts.md)
+-- [Making-Box-Files---4.0](Making-Box-Files---4.0.md)
+-- [LSTMTraining Command Line](TrainingTesseract-4.00.md#lstmtraining-command-line)
+-- [Error Messages From Training](TrainingTesseract-4.00.md#error-messages-from-training)
+- [Community Contributions for Finetune Training](TrainingTesseract-4.00---Finetune.md)
 - [The-Hallucination-Effect](The-Hallucination-Effect.md)
-- [Training-Tesseract](Training-Tesseract.md)
-- [TrainingTesseract-4.00---Finetune](TrainingTesseract-4.00---Finetune.md)
-- [TrainingTesseract-4.00](TrainingTesseract-4.00.md)
-
 
 ### Testing
 - [TestingTesseract](TestingTesseract.md)

Training-Tesseract.md

Lines changed: 0 additions & 7 deletions
This file was deleted.

Training-Tesseract.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+TrainingTesseract-4.00.md
Lines changed: 2 additions & 4 deletions
@@ -1,6 +1,4 @@
-Please read [TrainingTesseract 4.00](TrainingTesseract-4.00.md)
-
-There have been many changes made to LSTM training process.
+## Community Contributions for Finetune Training
 
 You can see the following links where there are modified training scripts created by Tesseract users:
 

@@ -9,4 +7,4 @@ You can see the following links where there are modified training scripts create
 * [wiki.wareya.moe - tesstrain.sh at pastebin](https://pastebin.com/cD5wctUG)
 * [wiki.wareya.moe - tesstrain_utils.sh at pastebin](https://pastebin.com/TfqJUxSR)
 
-To train from line images and its matching ground truth, please see the project [ocr-d/train](https://github.com/OCR-D/ocrd-train) which creates box files and lstmf files using the line images.
+Kristóf Horváth keeps a [LSTM (community) training guide](https://docs.google.com/document/d/1qDqbnlptcCPVIvMOHwfNws-CQat-llZLOTHC6S94Vec).

TrainingTesseract.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

TrainingTesseract.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+TrainingTesseract-4.00.md
File renamed without changes.
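
Note on the two one-line files above: `Training-Tesseract.md` and `TrainingTesseract.md` are each deleted and re-added with the single line `TrainingTesseract-4.00.md`. This is how a symbolic link appears in a diff, because Git stores a symlink as a blob whose content is the target path. The following is a minimal sketch of how such links could be created locally, not the command the author actually ran. The helper name `make_symlinked_pages` is hypothetical, and it assumes a platform where `os.symlink` is permitted.

```python
import os
import tempfile


def make_symlinked_pages(repo_dir, target, aliases):
    """Replace each alias file in repo_dir with a relative symlink to target.

    Hypothetical helper for illustration; it returns a mapping of
    alias name -> link target as reported by os.readlink.
    """
    created = {}
    for alias in aliases:
        alias_path = os.path.join(repo_dir, alias)
        # Remove the old regular file (or stale link) before linking.
        if os.path.lexists(alias_path):
            os.remove(alias_path)
        # A relative target keeps the link valid wherever the repo is cloned.
        os.symlink(target, alias_path)
        created[alias] = os.readlink(alias_path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real checkout.
    with tempfile.TemporaryDirectory() as repo:
        with open(os.path.join(repo, "TrainingTesseract-4.00.md"), "w") as f:
            f.write("# Training guide\n")
        links = make_symlinked_pages(
            repo,
            "TrainingTesseract-4.00.md",
            ["Training-Tesseract.md", "TrainingTesseract.md"],
        )
        for alias, dest in links.items():
            print(alias, "->", dest)
```

After staging such links with Git, each alias would be recorded as a symlink entry whose stored content is just the target path, which matches the one-line additions shown in this commit.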
