Commit 778cca6

Merge branch 'refactor/prod-and-dev-Dockerfiles' into ci/refactor-ghas
2 parents 13abb2d + 26b4f63 commit 778cca6

3 files changed: +13 -6 lines changed
services/document-extractor/Dockerfile.dev

Lines changed: 4 additions & 1 deletion
@@ -9,11 +9,14 @@ WORKDIR /app
 
 RUN DEBIAN_FRONTEND=noninteractive apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
-build-essential make ffmpeg poppler-utils tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng \
+build-essential make ffmpeg poppler-utils \
+tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng \
+libleptonica-dev pkg-config \
 && python3 -m venv "${POETRY_VIRTUALENVS_PATH}" \
 && ${POETRY_VIRTUALENVS_PATH}/bin/pip install "poetry==${POETRY_VERSION}" \
 && rm -rf /var/lib/apt/lists/*
 ENV PATH="${POETRY_VIRTUALENVS_PATH}/bin:$PATH"
+ENV TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata
 
 # Copy lockfiles first
 COPY services/document-extractor/pyproject.toml services/document-extractor/poetry.lock /app/services/document-extractor/

services/document-extractor/README.md

Lines changed: 6 additions & 2 deletions
@@ -7,16 +7,21 @@ The following endpoints are provided by the *documents_extractor*:
 # Requirements
 All required python libraries can be found in the [pyproject.toml](pyproject.toml) file.
 In addition to python libraries the following system packages are required:
-```
+
+```shell
 build-essential
 make
 ffmpeg
 poppler-utils
 tesseract-ocr
 tesseract-ocr-deu
 tesseract-ocr-eng
+libleptonica-dev
+pkg-config
 ```
 
+The Tesseract data path is set via `TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata` in both prod and dev images.
+
 # Endpoints
 
 ## `/extract`
@@ -31,4 +36,3 @@ The following types of information will be extracted:
 A detailed explanation of the deployment can be found in the [project README](../../README.md).
 The *helm-chart* used for the deployment can be found in the [infrastructure directory](../../infrastructure/).
 
-

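Review note: the document-extractor changes pin `TESSDATA_PREFIX` to a versioned directory, so it may be worth checking inside the built image that the German and English language data actually live there. The following is a minimal sketch, not part of this commit; the helper name `missing_traineddata` is hypothetical, and it assumes only that Tesseract language data files are named `<lang>.traineddata`.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir, languages=("deu", "eng")):
    """Return the languages whose <lang>.traineddata file is absent.

    tessdata_dir is the directory TESSDATA_PREFIX points at. The default
    languages mirror the tesseract-ocr-deu and tesseract-ocr-eng packages
    installed in the Dockerfile; the helper itself is an illustration only.
    """
    base = Path(tessdata_dir)
    return [
        lang
        for lang in languages
        if not (base / f"{lang}.traineddata").is_file()
    ]
```

Inside a container this could be called as `missing_traineddata(os.environ["TESSDATA_PREFIX"])` after importing `os`; an empty list would suggest that the path set in the image matches where the language packages put their data.
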
services/mcp-server/Dockerfile.dev

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-FROM --platform=linux/amd64 python:3.11.7-bookworm
+FROM --platform=linux/amd64 python:3.13.7-bookworm
 
 # Dev image for mcp-server (no local libs)
 ENV POETRY_VIRTUALENVS_PATH=/app/services/mcp-server/.venv
@@ -23,8 +23,8 @@ RUN poetry config virtualenvs.create false \
 && cd /app/services/mcp-server \
 && poetry install --no-interaction --no-ansi --no-root --with dev
 
-# Create non-root user
-RUN adduser --disabled-password --gecos "" --uid 65532 nonroot
+# Create non-root user (align with prod UID for consistent file perms)
+RUN adduser --disabled-password --gecos "" --uid 10001 nonroot
 
 WORKDIR /app/services/mcp-server
 RUN mkdir -p log && chmod 700 log \
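
Review note: the mcp-server change moves the dev UID from 65532 to 10001 to match prod. One way to keep the two from drifting apart again is a small check over both Dockerfiles. This is a rough sketch under stated assumptions: the prod Dockerfile is not part of this diff, so how it declares its user is unknown, and the function names `adduser_uids` and `uids_aligned` are hypothetical.

```python
import re

# Matches "--uid 10001" or "--uid=10001" as passed to adduser/useradd.
_UID_PATTERN = re.compile(r"--uid[ =](\d+)")


def adduser_uids(dockerfile_text):
    """Return every numeric UID passed via --uid in the Dockerfile text."""
    return [int(uid) for uid in _UID_PATTERN.findall(dockerfile_text)]


def uids_aligned(dev_text, prod_text):
    """True when both files declare at least one UID and the sets match."""
    dev = set(adduser_uids(dev_text))
    prod = set(adduser_uids(prod_text))
    return bool(dev) and dev == prod
```

Fed the contents of `Dockerfile.dev` and the prod Dockerfile, `uids_aligned` would return `False` if someone changes only one of them, provided prod also sets its user with a `--uid` flag.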
