Commit 6aca289

feat: update to use new kv-cache UDS tokenizer (#609)
* feat: update to use new kv-cache UDS tokenizer
  - change preprocessing to use types from kv-cache
  - add new unit test cases (same tests as the old ones)
  - keep the old test cases but mark them as unused (can be removed later)
  - add new make target to build the UDS image
  - update the image in deploy to use the one from llm-d
  - remove parts of the Dockerfile so that it only builds Go code
* update: more changes for UDS in Makefile and docs
* update: add comments for download-tokenizer and remove it as a dependency of the build
* GHAction: remove lint-and-test, which was still using Python
* update: fix rebase
* fix: lint with 2.8.0
* update: code review
  - remove env variables LDFLAGS, PYTHON_CONFIG, CGO_CFLAGS, TOKENIZER_ARCH, PYTHON_VERSION, epp_* and sidecar_* for CGO
  - update documentation
  - remove make targets related to tokenizer and Python

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
1 parent 5c89823 commit 6aca289

17 files changed, +788 -321 lines changed


.github/workflows/ci-pr-checks.yaml

Lines changed: 3 additions & 12 deletions

@@ -52,21 +52,15 @@ jobs:
           go-version: "${{ env.GO_VERSION }}"
           cache-dependency-path: ./go.sum
 
-      - name: Configure CGO for Python
+      - name: Configure CGO for ZMQ
         run: |
-          PYTHON_INCLUDE=$(python3 -c "import sysconfig; print(sysconfig.get_path('include'))")
-          echo "CPATH=${PYTHON_INCLUDE}:${CPATH}" >> $GITHUB_ENV
           echo "CGO_ENABLED=1" >> $GITHUB_ENV
-          echo "CGO_CFLAGS=$(python3-config --cflags --embed)" >> $GITHUB_ENV
-          echo "CGO_LDFLAGS=$(python3-config --ldflags --embed)" >> $GITHUB_ENV
-
-      - name: Set PKG_CONFIG_PATH
-        run: echo "PKG_CONFIG_PATH=/usr/lib/pkgconfig" >> $GITHUB_ENV
+          echo "PKG_CONFIG_PATH=/usr/lib/pkgconfig" >> $GITHUB_ENV
 
       - name: Install dependencies
         run: |
           go mod tidy
-          sudo -E env "PATH=$PATH" make install-dependencies install-python-deps
+          sudo -E env "PATH=$PATH" make install-dependencies
 
       - name: Run lint checks
         uses: golangci/golangci-lint-action@v9
@@ -76,9 +70,6 @@ jobs:
           skip-cache: true
         env:
           CGO_ENABLED: ${{ env.CGO_ENABLED }}
-          CGO_CFLAGS: ${{ env.CGO_CFLAGS }}
-          CGO_LDFLAGS: ${{ env.CGO_LDFLAGS }}
-          CPATH: ${{ env.CPATH }}
           PKG_CONFIG_PATH: ${{ env.PKG_CONFIG_PATH }}
 
       - name: Run make build

DEVELOPMENT.md

Lines changed: 13 additions & 14 deletions

@@ -6,38 +6,37 @@ Documentation for developing the inference scheduler.
 
 - [Make] `v4`+
 - [Golang] `v1.24`+
-- [Python] `v3.12`
 - [Docker] (or [Podman])
 - [Kubernetes in Docker (KIND)]
 - [Kustomize]
+- [ZeroMQ]
 
 [Make]:https://www.gnu.org/software/make/
 [Golang]:https://go.dev/
-[Python]:https://www.python.org/
 [Docker]:https://www.docker.com/
 [Podman]:https://podman.io/
 [Kubernetes in Docker (KIND)]:https://github.com/kubernetes-sigs/kind
 [Kustomize]:https://kubectl.docs.kubernetes.io/installation/kustomize/
+[ZeroMQ]:https://zeromq.org/
 
-### Python Version Configuration
+> [!NOTE]
+> **Python is NOT required** as of v0.5.1. Tokenization is handled by a separate UDS (Unix Domain Socket) tokenizer sidecar container. Previous versions (< v0.5.1) used embedded Python tokenizers with daulet/tokenizers bindings, but these are now deprecated.
 
-The project uses Python 3.12 by default, but this can be configured:
+## Tokenization Architecture
 
-**For local development:**
-`PYTHON_VERSION` in the Makefile set which Python version is used.
+The project uses **UDS (Unix Domain Socket)** tokenization. Tokenization is handled by a separate UDS tokenizer sidecar container, not by the EPP container itself. Previous embedded tokenizer approaches (daulet/tokenizers, direct Python/vLLM linking) are deprecated and no longer used.
 
-**For Docker builds:**
-The Python version is parameterized in the Dockerfile via the `PYTHON_VERSION` build argument, which defaults to 3.12. To build with a different Python version:
+**Building the UDS tokenizer image:**
 
 ```bash
-PYTHON_VERSION=3.13 make image-build
-
-# Or directly with Docker
-docker build --build-arg PYTHON_VERSION=3.13 -f Dockerfile.epp .
+make image-build-uds-tokenizer
 ```
 
-**For CI/CD:**
-Workflow uses Python 3.12 by default. The version can be set by modifying the `python-version` input in workflow file.
+The image is tagged as `ghcr.io/llm-d/llm-d-uds-tokenizer:dev` by default. Override with:
+
+```bash
+UDS_TOKENIZER_TAG=v1.0.0 make image-build-uds-tokenizer
+```
 
 ## Kind Development Environment
 
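
The DEVELOPMENT.md change above describes tokenization moving to a sidecar that is reached over a Unix domain socket. As a rough illustration of that transport only, here is a minimal Go sketch. Everything in it is an assumption made for the example: the helper names (`serveFakeTokenizer`, `countTokens`, `roundTrip`), the socket file name, and the newline-delimited protocol that replies with a whitespace word count. It does not reproduce the real llm-d tokenizer's protocol or API.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
)

// serveFakeTokenizer is a hypothetical stand-in for the tokenizer sidecar.
// It answers each newline-terminated prompt with the number of
// whitespace-separated words, which is NOT how a real tokenizer works.
func serveFakeTokenizer(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener was closed
		}
		go func(c net.Conn) {
			defer c.Close()
			scanner := bufio.NewScanner(c)
			for scanner.Scan() {
				fmt.Fprintf(c, "%d\n", len(strings.Fields(scanner.Text())))
			}
		}(conn)
	}
}

// countTokens dials the socket, sends one prompt, and reads one reply line.
func countTokens(socketPath, prompt string) (string, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", prompt); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(reply), nil
}

// roundTrip starts the fake sidecar on a temporary socket and queries it once.
func roundTrip(prompt string) (string, error) {
	dir, err := os.MkdirTemp("", "uds-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	socketPath := filepath.Join(dir, "tokenizer.sock") // assumed file name
	ln, err := net.Listen("unix", socketPath)
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serveFakeTokenizer(ln)

	return countTokens(socketPath, prompt)
}

func main() {
	n, err := roundTrip("hello uds tokenizer")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("token count:", n)
}
```

Run on its own with `go run`, the sketch prints `token count: 3`. The point is only that the caller needs nothing beyond a socket path shared with the sidecar, which is why the EPP image no longer has to carry a Python runtime.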

Dockerfile.epp

Lines changed: 24 additions & 82 deletions

@@ -1,9 +1,23 @@
 ## Minimal runtime Dockerfile (microdnf-only, no torch, wrapper in site-packages)
-# Go dependencies stage: download go modules and extract kv-cache
-FROM quay.io/projectquay/golang:1.25 AS go-deps
+## Simplified EPP Dockerfile - UDS tokenizer only (no vLLM, no embedded tokenizer)
+## This build uses the default kv-cache pool (UDS-only, no embedded_tokenizers build tag)
+## Tokenization is handled by a separate UDS tokenizer sidecar container
+##
+## CGO is still required for ZMQ (kvevents) but Python/vLLM dependencies are removed
+# Go build stage
+FROM quay.io/projectquay/golang:1.25 AS go-builder
+
+ARG TARGETOS
+ARG TARGETARCH
 
 WORKDIR /workspace
 
+# Install ZMQ development libraries (required for CGO)
+# The builder is based on UBI8, so we need epel-release-8
+RUN dnf install -y 'https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm' && \
+    dnf install -y zeromq-devel pkgconfig && \
+    dnf clean all
+
 # Copy the Go Modules manifests
 COPY go.mod go.mod
 COPY go.sum go.sum
@@ -14,98 +28,26 @@ COPY pkg/ pkg/
 
 RUN go mod download
 
-# Copy Python wrapper and requirements from llm-d-kv-cache dependency
-# Extract version dynamically and copy to a known location
-RUN KV_CACHE_PKG=$(go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache) && \
-    mkdir -p /workspace/kv-cache && \
-    cp -r $KV_CACHE_PKG/* /workspace/kv-cache
-
-FROM python:3.12-slim AS python-builder
-
-ARG TARGETARCH
-
-COPY --from=go-deps /workspace/kv-cache /workspace/kv-cache
-WORKDIR /workspace/kv-cache
-
-# Create venv and install vLLM based on architecture using pre-built wheels
-RUN python3.12 -m venv /workspace/kv-cache/build/venv && \
-    . /workspace/kv-cache/build/venv/bin/activate && \
-    pip install --upgrade pip && \
-    VLLM_VERSION="0.14.0" && \
-    if [ "$TARGETARCH" = "arm64" ]; then \
-        pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cpu-cp38-abi3-manylinux_2_35_aarch64.whl; \
-    elif [ "$TARGETARCH" = "amd64" ]; then \
-        pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cpu-cp38-abi3-manylinux_2_35_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cpu; \
-    else \
-        echo "ERROR: Unsupported architecture: $TARGETARCH. Only arm64 and amd64 are supported." && exit 1; \
-    fi
-
-# Go build stage
-FROM quay.io/projectquay/golang:1.25 AS go-builder
-
-ARG TARGETOS
-ARG TARGETARCH
-ARG PYTHON_VERSION=3.12
-ENV PYTHON=python${PYTHON_VERSION}
-
-# Install build tools
-# The builder is based on UBI8, so we need epel-release-8.
-# ${PYTHON}-devel needed for CGO compilation (Python headers and ${PYTHON}-config for linker flags)
-RUN dnf install -y 'https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm' && \
-    dnf install -y gcc-c++ libstdc++ libstdc++-devel clang zeromq-devel pkgconfig ${PYTHON}-devel ${PYTHON}-pip git && \
-    dnf clean all
-
-COPY --from=go-deps /workspace /workspace
-COPY --from=go-deps /go/pkg/mod /go/pkg/mod
-
-WORKDIR /workspace
-
-COPY Makefile* ./
-
-COPY --from=python-builder /workspace/kv-cache/pkg/preprocessing/chat_completions /workspace/kv-cache/pkg/preprocessing/chat_completions
-RUN make setup-venv
-COPY --from=python-builder /workspace/kv-cache/build/venv/lib/python3.12/site-packages /workspace/build/venv/lib/python3.12/site-packages
-
-ENV PYTHONPATH=/workspace/kv-cache/pkg/preprocessing/chat_completions:/workspace/build/venv/lib/python3.12/site-packages
-RUN python3.12 -c "import tokenizer_wrapper" # verify tokenizer_wrapper is correctly installed
-
-ARG RELEASE_VERSION=v1.22.1
-RUN TOKENIZER_VERSION=${RELEASE_VERSION} make build-epp
+# Build EPP with CGO for ZMQ only (no Python, no embedded tokenizer)
+# The default kv-cache build uses UDS tokenizer (//go:build !embedded_tokenizers)
+RUN CGO_ENABLED=1 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o bin/epp cmd/epp/main.go
 
 # Runtime stage
 # Use ubi9 as a minimal base image to package the manager binary
 # Refer to https://catalog.redhat.com/software/containers/ubi9/ubi-minimal/615bd9b4075b022acc111bf5 for more details
 FROM registry.access.redhat.com/ubi9/ubi-minimal:9.7
-ARG PYTHON_VERSION=3.12
-WORKDIR /
-COPY --from=go-builder /workspace/bin/epp /app/epp
 
-USER root
+WORKDIR /
 
-ENV PYTHON=python${PYTHON_VERSION}
-# Install zeromq runtime library and Python runtime needed by the manager.
-# The final image is UBI9, so we need epel-release-9.
-# Using microdnf for minimal image size
+# Install ZMQ runtime library only (no Python needed)
 RUN curl -L -o /tmp/epel-release.rpm https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
     rpm -i /tmp/epel-release.rpm && \
     rm /tmp/epel-release.rpm && \
-    microdnf install -y --setopt=install_weak_deps=0 zeromq ${PYTHON} ${PYTHON}-libs ${PYTHON}-pip && \
+    microdnf install -y --setopt=install_weak_deps=0 zeromq && \
     microdnf clean all && \
-    rm -rf /var/cache/yum /var/lib/yum && \
-    # Note: ${PYTHON} package does not automatically create python3/python symlinks - they must be created manually
-    ln -sf /usr/bin/${PYTHON} /usr/bin/python3 && \
-    ln -sf /usr/bin/${PYTHON} /usr/bin/python
+    rm -rf /var/cache/yum /var/lib/yum
 
-# Copy Python kv-cache package and site-packages from the python-builder stage
-COPY --from=python-builder /workspace/kv-cache /workspace/kv-cache
-ENV PYTHONPATH=/workspace/kv-cache/pkg/preprocessing/chat_completions:/workspace/kv-cache/build/venv/lib/python3.12/site-packages
-RUN ${PYTHON} -c "import tokenizer_wrapper" # verify tokenizer_wrapper is correctly installed
-
-ENV HF_HOME="/tmp/.cache"
-# used by kv-cache-manager
-ENV LOCAL_TOKENIZER_DIR="/tmp/.cache"
-# Create cache directory and set permissions for non-root user
-RUN mkdir -p /tmp/.cache && chown -R 65532:65532 ${HF_HOME}
+COPY --from=go-builder /workspace/bin/epp /app/epp
 
 USER 65532:65532
 
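
The new Dockerfile comment refers to a `//go:build !embedded_tokenizers` constraint. As a small sketch of how such a constraint behaves, the file below is compiled only when the `embedded_tokenizers` tag is absent, which is the case for a plain `go build` like the one in the Dockerfile. The constant `tokenizerBackend` and the function `describeBackend` are hypothetical names invented for this illustration and are not taken from kv-cache.

```go
//go:build !embedded_tokenizers

package main

import "fmt"

// tokenizerBackend is a hypothetical marker for the implementation that is
// selected when the embedded_tokenizers build tag is NOT set.
const tokenizerBackend = "uds"

// describeBackend reports which (illustrative) backend was compiled in.
func describeBackend() string {
	return "tokenizer backend: " + tokenizerBackend
}

func main() {
	fmt.Println(describeBackend())
}
```

Building without tags includes this file. Passing `-tags embedded_tokenizers` excludes it, so a package that supports both modes would normally pair it with a second file carrying `//go:build embedded_tokenizers` that supplies the alternative definitions.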
