Compiling
Note: This wiki expects you to be familiar with compiling software on your operating system.
To install Tesseract 4.x you can simply run the following command on your Ubuntu 18.xx bionic:
sudo apt install tesseract-ocr
If you wish to install the Developer Tools which can be used for training, run the following command:
sudo apt install libtesseract-dev
The following instructions are for building on Linux; they can also be applied to other UNIX-like operating systems.
- A compiler for C and C++: GCC or Clang
- GNU Autotools: autoconf, automake, libtool
- pkg-config
- Leptonica
- libpng, libjpeg, libtiff
If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):
sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install pkg-config
sudo apt-get install libpng-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
If you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev
You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.
Tesseract versions and the minimum version of Leptonica required:
Tesseract | Leptonica | Ubuntu |
---|---|---|
4.00 | 1.74.2 | Ubuntu 18.04 |
3.05 | 1.74.0 | Must build from source |
3.04 | 1.71 | Ubuntu 16.04 |
3.03 | 1.70 | Ubuntu 14.04 |
3.02 | 1.69 | Ubuntu 12.04 |
3.01 | 1.67 | |
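The table can be turned into a quick pre-build check. The following is a minimal sketch, not part of the official instructions: it assumes GNU sort (for the -V version sort) and assumes that Leptonica's pkg-config module is named lept; version_ge is a hypothetical helper name.

```shell
# Return success when version $1 is at least version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Leptonica version for the Tesseract release being built
# (4.00 needs 1.74.2 according to the table above).
required="1.74.2"

# Ask pkg-config for the installed version. The module name "lept"
# is an assumption about how the Leptonica .pc file is named.
if command -v pkg-config >/dev/null 2>&1; then
    installed="$(pkg-config --modversion lept 2>/dev/null || true)"
else
    installed=""
fi

if [ -z "$installed" ]; then
    echo "Leptonica development files not found"
elif version_ge "$installed" "$required"; then
    echo "Leptonica $installed is new enough"
else
    echo "Leptonica $installed is older than $required; build it from source"
fi
```

Change the value of required to match the row for the Tesseract version you are building.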
One option is to install the distro's Leptonica package:
sudo apt-get install libleptonica-dev
but if you are using an older version of Linux, the packaged Leptonica may be too old, so you will need to build it from source.
The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in the Leptonica README.
Note that if you build Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stack Overflow is very helpful.
Please follow the instructions in https://github.com/tesseract-ocr/tesseract/wiki/Compiling--GitInstallation
Also read Install Instructions
Tesseract can be configured to install anywhere, which makes it possible to install it without root access.
To install it in $HOME/local:
./autogen.sh
./configure --prefix=$HOME/local/
make
make install
To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:
./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
--prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make
make install
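After installing into $HOME/local, the shell still needs to be told where the new binaries and shared libraries are. This is a minimal sketch under the assumption that the install used the prefix shown above and that the platform honours LD_LIBRARY_PATH; PREFIX is just a local variable introduced for illustration.

```shell
# Prefix used in the configure examples above.
PREFIX="$HOME/local"

# Make the installed binaries and shared libraries visible to this shell.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show which tesseract binary would now be picked up, if any.
command -v tesseract || echo "tesseract not found in PATH"
```

Adding the two export lines to your shell startup file makes the setting persistent.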
- Download the data file(s) for the language(s) you are interested in.
- Move them to the tessdata directory (e.g. 'mv tessdata $TESSDATA_PREFIX' if TESSDATA_PREFIX is defined).
You can also use:
export TESSDATA_PREFIX=/some/path/to/tessdata
to point to your tessdata directory (example: if your tessdata path is '/usr/local/share/tessdata', you have to use 'export TESSDATA_PREFIX=/usr/local/share/').
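The rule in the example (TESSDATA_PREFIX is the parent of the tessdata directory, with a trailing slash) can be sketched as a small helper. tessdata_prefix_for is a hypothetical name, and the convention is taken from the example above; check the behaviour of your Tesseract version, since it is not guaranteed to hold for every release.

```shell
# Print the TESSDATA_PREFIX value for a given tessdata directory:
# the parent directory, with a trailing slash as in the example above.
tessdata_prefix_for() {
    parent="$(dirname "$1")"
    # Avoid printing "//" when the parent is the filesystem root.
    case "$parent" in
        /) printf '/\n' ;;
        *) printf '%s/\n' "$parent" ;;
    esac
}

TESSDATA_PREFIX="$(tessdata_prefix_for /usr/local/share/tessdata)"
export TESSDATA_PREFIX
echo "$TESSDATA_PREFIX"
```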
!!! IMPORTANT !!! To use Tesseract in your application (to include tess or to link it into your app) see this very simple example https://github.com/tesseract-ocr/tesseract/wiki/User-App-Example.
- Download the latest CPPAN (C++ Archive Network, https://cppan.org/) client from https://cppan.org/client/.
- Run:
cppan --build pvt.cppan.demo.google.tesseract.tesseract-master
- Set up Vcpkg, the Visual C++ Package Manager.
- Run the following for the 64-bit version. Use --head for the master branch.
vcpkg install tesseract:x64-windows
Today it is possible to build a full set of tess training tools on Windows with Visual Studio. The latest versions (Win10, VS2015/VS2017) are preferable.
To do this:
- Download the latest CPPAN (C++ Archive Network, https://cppan.org/) client from https://cppan.org/client/.
- Run:
cppan --build pvt.cppan.demo.google.tesseract-master
To develop Tesseract itself, follow these steps:
- Download and install Git and CMake, and put them in PATH.
- Download the latest CPPAN (C++ Archive Network, https://cppan.org/) client from https://cppan.org/client/. CPPAN is a source package distribution system. Add the CPPAN client to PATH too. (VS2015 redist is required.)
- If you have a release archive, unpack it to the tesseract dir. If you're using the master branch (4.0), run
git clone https://github.com/tesseract-ocr/tesseract tesseract
- Run
cd tesseract
cppan
mkdir build && cd build
cmake ..
- Build the solution (tesseract.sln) in your Visual Studio version. If you want to build and install from the command line (e.g. a Release build), you can use this command:
cmake --build . --config Release --target install
If you want to install to a directory other than C:\Program Files (installing there requires admin rights), you need to specify the install path during configuration:
cmake .. -G "Visual Studio 15 2017 Win64" -DCMAKE_INSTALL_PREFIX=inst
To develop the training tools, clone the repo as described in the previous paragraph and then run
cppan --build .
A solution link will appear in the root directory of Tesseract.
If you're building with cppan+cmake, run cmake as follows:
mkdir win64 && cd win64
cppan ..
cmake .. -G "Visual Studio 14 2015 Win64"
If you're building with cppan, edit cppan.yml and uncomment this line:
#generator: Visual Studio 14 2015 Win64 -> generator: Visual Studio 14 2015 Win64
Then run the following command, which will create a solution link for you:
cppan --generate .
(For VS2017, use '15 2017' instead of '14 2015'.)
If you have Visual Studio 2015, check out the https://github.com/peirick/VS2015_Tesseract repository, which contains Visual Studio 2015 projects for Tesseract and its dependencies, and click on build_tesseract.bat. After that you still need to download the language packs.
Have a look at the blog post How to build Tesseract 3.03 with Visual Studio 2013.
For tesseract-ocr 3.02, please follow the instructions in Visual Studio 2008 Developer Notes for Tesseract-OCR.
Download these packages from the Downloads Archive on the SourceForge page:
- tesseract-3.01.tar.gz - Tesseract source
- tesseract-3.01-win_vs.zip - Visual Studio (2008 & 2010) solution with necessary libraries
- tesseract-ocr-3.01.eng.tar.gz - English language file for Tesseract (or download another language's training file)
Unpack them to one directory (e.g. tesseract-3.01). Note that tesseract-ocr-3.01.eng.tar.gz names the root directory 'tesseract-ocr' instead of 'tesseract-3.01'.
Windows relevant files are located in vs2008 directory (e.g. 'tesseract-3.01\vs2008'). The same build process as usual applies: Open tesseract.sln with VC++Express 2008 and build all (or just Tesseract.) It should compile (in at least release mode) without having to install anything further. The dll dependencies and Leptonica are included. Output will be in tesseract-3.01\vs2008\bin (or tesseract-3.01\vs2008\bin.rd or tesseract-3.01\vs2008\bin.dbg based on configuration build).
For Mingw+Msys, have a look at the blog post Compiling Leptonica and Tesseract-ocr with Mingw+Msys.
Download and install the MSYS2 installer from https://msys2.github.io/
The core package groups you need to install if you wish to build from PKGBUILDs are:
- base-devel for any building
- msys2-devel for building msys2 packages
- mingw-w64-i686-toolchain for building mingw32 packages
- mingw-w64-x86_64-toolchain for building mingw64 packages
To build the tesseract-ocr release package, use PKGBUILD from https://github.com/Alexpux/MINGW-packages/tree/master/mingw-w64-tesseract-ocr
To build on Cygwin, have a look at the blog post How to build Tesseract on Cygwin.
Tesseract as well as the training utilities for 3.04.00 onwards are available as Cygwin packages.
Tesseract specific packages to be installed:
tesseract-ocr 3.04.01-1
tesseract-ocr-eng 3.04-1
tesseract-training-core 3.04-1
tesseract-training-eng 3.04-1
tesseract-training-util 3.04.01-1
Mingw-w64 allows building 32- or 64-bit executables for Windows. It can be used for native compilations on Windows, but also for cross compilations on Linux (which are easier and faster than native compilations). Most large Linux distributions already contain packages with the tools needed for a cross build. Before building Tesseract, it is necessary to build some prerequisites.
For Debian and similar distributions (e.g. Ubuntu), the cross tools can be installed like this:
# Development environment targeting 32- and 64-bit Windows (required)
apt-get install mingw-w64
# Development tools for 32- and 64-bit Windows (optional)
apt-get install mingw-w64-tools
These prerequisites will be needed:
- libpng, libtiff, zlib (binaries for Mingw-w64 available as part of the GTK+ bundles)
- libicu
- liblcms2
- openjpeg
- leptonica
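As a rough illustration of the cross build setup, the sketch below picks the Mingw-w64 host triplet for a 32- or 64-bit target and checks for the matching cross compiler. It assumes the triplet names used by the Debian mingw-w64 toolchain; mingw_host and the install prefix in the last line are hypothetical and only there for illustration. The configure line is printed, not executed.

```shell
# Print the Mingw-w64 host triplet for a 32- or 64-bit Windows target.
mingw_host() {
    case "$1" in
        32) echo "i686-w64-mingw32" ;;
        64) echo "x86_64-w64-mingw32" ;;
        *)  echo "usage: mingw_host 32|64" >&2; return 1 ;;
    esac
}

host="$(mingw_host 64)"

# Report whether the matching cross compiler is installed.
if command -v "$host-g++" >/dev/null 2>&1; then
    echo "found $host-g++"
else
    echo "$host-g++ not found; install the mingw-w64 package first"
fi

# Each prerequisite, and then Tesseract, would be configured along these
# lines (the prefix is a made-up example location).
echo "./configure --host=$host --prefix=\$HOME/mingw-$host"
```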
Typically a package manager like Fink, Homebrew or MacPorts is needed in addition to Apple's Xcode.
Xcode and the related command line tools provide the compiler (llvm-gcc) and linker, but also libraries like zlib. The package manager provides free software packages which are not part of Xcode.
The Xcode Command Line Tools can be installed by running xcode-select --install
Note that Tesseract 4 can be built with OpenMP support, but that requires additional installations.
Fink (as of 2017-04) provides neither Leptonica nor the packages needed for the Tesseract training tools, so it cannot be recommended for building Tesseract.
# Install cmake if it is not available.
sudo port install cmake
git clone https://github.com/llvm-mirror/openmp.git
cd openmp
mkdir build
cd build
cmake ..
make
sudo make install
sudo port install autoconf \
automake \
libtool \
pkgconfig \
leptonica
Compilation itself relies on the Autotools suite:
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure
make
sudo make install
If compilation fails at the make command, with libtool erring on missing instructions, you may be building with MacPorts' g++ compiler, which has known issues. The community recommends using clang, but a workaround for g++ is to re-configure the build:
./configure CXXFLAGS=-Wa,-q
And then proceed with make.
The steps above do not install the training tools. To install the training tools as well as Tesseract, proceed as follows:
sudo port install cairo pango
sudo port install icu +devel
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
./configure
make training
sudo make install training-install
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
brew install gcc
The commands above do not install the training tool dependencies. You can install them as follows:
brew install pango
As of January 2017, the clang build works, but OpenMP will only use a single thread, reducing performance. For best results, use gcc.
The exact values of CPPFLAGS and LDFLAGS can be read from brew info icu4c
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
./configure CC=gcc-6 CXX=g++-6 CPPFLAGS=-I/usr/local/opt/icu4c/include LDFLAGS=-L/usr/local/opt/icu4c/lib
make -j
sudo make install # if desired
make training # if installed with training dependencies
- To fix this error
./configure: line 4237: syntax error near unexpected token `-mavx,'
./configure: line 4237: `AX_CHECK_COMPILE_FLAG(-mavx, avx=1, avx=0)'
ensure that autoconf-archive is installed. Don't forget to run ./autogen.sh after the installation of autoconf-archive. Note that this error happens often under CentOS, where autoconf-archive is missing and no package is available. Some projects help with installing.
The latest code from GitHub does not require autoconf-archive.
- If configure fails with an error such as "configure: error: Leptonica 1.74 or higher is required.", try installing the libleptonica-dev package.
- If you are sure you have installed Leptonica (for example in /usr/local), then pkg-config is probably not looking at your install folder (check with pkg-config --variable pc_path pkg-config). A solution is to set PKG_CONFIG_PATH, for example: PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
- On some systems autotools does not create the m4 directory automatically (giving the error: "configure: error: cannot find macro directory 'm4'"). In this case you must create the m4 directory (mkdir m4), and then rerun the above commands starting with ./configure.
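For the pkg-config item above, one way to set PKG_CONFIG_PATH without adding the same directory twice is sketched below. add_pkg_config_dir is a hypothetical helper name, and /usr/local/lib/pkgconfig is the example location from the list; your Leptonica install may place its .pc file elsewhere.

```shell
# Prepend directory $1 to PKG_CONFIG_PATH unless it is already listed.
add_pkg_config_dir() {
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
    esac
    export PKG_CONFIG_PATH
}

add_pkg_config_dir /usr/local/lib/pkgconfig
echo "$PKG_CONFIG_PATH"
```

Run this in the same shell session before rerunning ./configure so that the exported variable is visible to it.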
Old wiki - no longer maintained. The pages were moved, see the new documentation.
These wiki pages are no longer maintained.
All pages were moved to tesseract-ocr/tessdoc.
The latest documentation is available at https://tesseract-ocr.github.io/.