Commit 26b4f63

chore: update README.md to improve formatting and add missing system packages
1 parent 582b858 commit 26b4f63

1 file changed, +6 -2 lines changed


services/document-extractor/README.md

Lines changed: 6 additions & 2 deletions
@@ -7,16 +7,21 @@ The following endpoints are provided by the *documents_extractor*:
 # Requirements
 All required python libraries can be found in the [pyproject.toml](pyproject.toml) file.
 In addition to python libraries the following system packages are required:
-```
+
+```shell
 build-essential
 make
 ffmpeg
 poppler-utils
 tesseract-ocr
 tesseract-ocr-deu
 tesseract-ocr-eng
+libleptonica-dev
+pkg-config
 ```
 
+The Tesseract data path is set via `TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata` in both prod and dev images.
+
 # Endpoints
 
 ## `/extract`
@@ -31,4 +36,3 @@ The following types of information will be extracted:
 A detailed explanation of the deployment can be found in the [project README](../../README.md).
 The *helm-chart* used for the deployment can be found in the [infrastructure directory](../../infrastructure/).
 
-
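
The README change lists the system packages and the `TESSDATA_PREFIX` value but does not show how to confirm an image actually satisfies them. A minimal sketch of such a check follows. The helper names `missing_commands` and `tessdata_ok` are hypothetical and not part of the repository, and the check assumes the `tesseract-ocr-deu` and `tesseract-ocr-eng` packages place `deu.traineddata` and `eng.traineddata` in the tessdata directory.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of the repository.

# Print the space-separated subset of the given commands that is not on PATH.
missing_commands() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Drop the leading space left by the concatenation above.
    printf '%s\n' "${missing# }"
}

# Succeed only if the directory in $1 contains both language models.
# Assumption: the deu/eng packages install <lang>.traineddata files there.
tessdata_ok() {
    [ -f "$1/deu.traineddata" ] && [ -f "$1/eng.traineddata" ]
}
```

Inside an image, one might call `missing_commands gcc make ffmpeg pdftotext tesseract pkg-config` and `tessdata_ok "$TESSDATA_PREFIX"`. The command names are an assumption about which binaries the listed packages provide (`gcc` from `build-essential`, `pdftotext` from `poppler-utils`). `libleptonica-dev` ships headers and libraries rather than a command, so this sketch does not cover it.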
