
Commit f5943eb

Update documents for supporting other platforms
1 parent c753840 commit f5943eb

File tree: 9 files changed (+226 −149 lines)


.github/workflows/build-publish.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -37,7 +37,7 @@ jobs:
           IMAGE_ID=ghcr.io/${{ github.repository_owner }}/homl-vllm-cpu-base:latest
           docker buildx build \
             -t ${IMAGE_ID} \
-            -f server/Dockerfile.cpu \
+            -f server/Dockerfile.cpu.base \
             ./vllm-src \
             --push
 
@@ -67,7 +67,7 @@ jobs:
           docker buildx build \
             --build-arg HOML_SERVER_VERSION=$VERSION \
             -t ghcr.io/${{ github.repository_owner }}/homl/server:latest-cpu \
-            -f Dockerfile.cpu.app \
+            -f Dockerfile.cpu \
             . \
             --push
 
```
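After this rename, `Dockerfile.cpu.base` builds the vLLM CPU base image (published as `ghcr.io/.../homl-vllm-cpu-base`) and `Dockerfile.cpu` layers the HoML server on top of it. A minimal local sketch of the second stage, assuming the published base image is pullable; the `dev-cpu` tag is illustrative, not one the workflow produces:

```shell
# Build only the server layer; Dockerfile.cpu pulls the published base image.
cd server
docker buildx build \
  --build-arg HOML_SERVER_VERSION=dev \
  -t homl/server:dev-cpu \
  -f Dockerfile.cpu \
  . \
  --load
```

Using `--load` instead of the workflow's `--push` keeps the image in the local Docker daemon for testing.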

README.md

Lines changed: 22 additions & 6 deletions
```diff
@@ -52,13 +52,14 @@ For detailed information on how to use the HoML CLI, please refer to our officia
 
 [**HoML Documentation**](https://homl.dev/docs/cli.html)
 
+
 ## TODO / Roadmap
-* [v] Improve vLLM startup time to support faster switching between models.
-* MultiGPU support: Enable multiple models running at the same time on different GPUs.
-* Enable multiple models running at the same time on the same GPU, this means we need to be able to estimate the vRAM usage of each model and manage the memory accordingly.
-* Add support for ROCm, Apple Silicon, and other architectures.
-* Add support for loading adapter layers.
-* Add support for endpoints other than chat/completion, such as embeddings and text generation.
+- [x] Improve vLLM startup time to support faster switching between models.
+- [ ] MultiGPU support: Enable multiple models running at the same time on different GPUs.
+- [ ] Enable multiple models running at the same time on the same GPU; this means we need to be able to estimate the vRAM usage of each model and manage the memory accordingly.
+- [ ] Add support for ROCm, Apple Silicon, and other architectures.
+- [ ] Add support for loading adapter layers.
+- [ ] Add support for endpoints other than chat/completion, such as embeddings and text generation.
 
 ## Contributing
 
@@ -69,6 +70,21 @@ We are particularly looking for help with:
 * Testing and verifying models for the curated list.
 * Improving the CLI experience.
 
+## Contribute / Build from Source
+Currently only the CUDA version is officially supported, but other platforms that vLLM can run on are possible if you build from source.
+
+### CLI
+See [cli/README.md](cli/README.md)
+
+### Server
+See [server/README.md](server/README.md)
+
+### If you want to add support for a new platform
+
+1. [Create the server for the new platform](server/README.md#other-platforms)
+2. [Update the CLI](cli/README.md#adding-support-for-other-platforms)
+3. Follow the guide there to start the new server.
+
 ## Community
 
 Join our community to stay updated, ask questions, and contribute to the project:
```

cli/README.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@ (new file; content shown without `+` markers)

# BUILD CLI from source

Build the CLI from source by following these steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/homl-dev/homl.git
   cd homl/cli
   ```
2. Create a venv:

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
3. Run the build command:

   ```bash
   bash build.sh
   ```
4. The CLI binary will be at `dist/homl`.

# Adding support for other platforms

1. Modify the following functions inside [install_utils](homl_cli/utils/install_utils.py) to add support for other platforms:
   1. `detect_platform`: detect the platform correctly.
   2. `get_platform_config`: return the correct image, and add the correct hardware resource assignments for Docker.
   3. `install`: add platform-specific installation steps.
2. When running with a locally built image, use the `HOML_DOCKER_IMAGE_OVERRIDE` environment variable to specify the image when running the `homl server install` command.
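The override in step 2 can be sketched as follows; the image tag is illustrative, use whatever tag you gave your local build:

```shell
# Point the installer at a locally built server image instead of the
# published ghcr.io image, then install as usual.
export HOML_DOCKER_IMAGE_OVERRIDE=homl/server:dev-myplatform
homl server install
```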

cli/homl_cli/utils/install_utils.py

Lines changed: 9 additions & 5 deletions
```diff
@@ -85,11 +85,9 @@ def check_and_install_docker():
 
 
 def get_platform_config(accelerator: str, gptoss: bool) -> Dict[str, Any]:
-    """Returns the docker image and other config for a given platform."""
-    # In the future, these images would be hosted on a public registry.
-    # For now, they are conceptual names.
+    """Returns the docker image and other config for a given platform."""
     if accelerator == "cuda":
-        return {
+        cfg = {
             "image": "ghcr.io/wsmlby/homl/server:latest-cuda" if not gptoss else "ghcr.io/wsmlby/homl/server:latest-cuda-gptoss",
             "deploy_resources": """
       resources:
@@ -102,10 +100,15 @@ def get_platform_config(accelerator: str, gptoss: bool) -> Dict[str, Any]:
         }
     # TODO: Add support for ROCm and XPU in the future
     else:  # cpu
-        return {
+        cfg = {
             "image": "ghcr.io/wsmlby/homl/server:latest-cpu",
             "deploy_resources": "",
         }
+
+    if os.environ.get("HOML_DOCKER_IMAGE_OVERRIDE"):
+        cfg["image"] = os.environ["HOML_DOCKER_IMAGE_OVERRIDE"]
+
+    return cfg
 
 
 def check_and_install_nvidia_runtime():
@@ -155,6 +158,7 @@ def install(insecure_socket: bool, upgrade: bool, gptoss: bool, install_webui: b
     if accelerator == "cuda":
         if not check_and_install_nvidia_runtime():
             return
+        # add other platform checks here
     else:
         click.secho("No NVIDIA runtime found. Currently only support NVIDIA GPU. Abort.", fg="red")
         return
```
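The diff switches from early `return` statements to a `cfg`-then-`return` shape so the `HOML_DOCKER_IMAGE_OVERRIDE` check applies to every platform branch. A self-contained sketch of that pattern (the `rocm` branch and its image tag are hypothetical, shown only to illustrate where a new platform would slot in):

```python
import os
from typing import Any, Dict


def get_platform_config(accelerator: str, gptoss: bool = False) -> Dict[str, Any]:
    """Pick a per-platform config, then apply the image override last."""
    if accelerator == "cuda":
        cfg = {
            "image": "ghcr.io/wsmlby/homl/server:latest-cuda-gptoss"
            if gptoss else "ghcr.io/wsmlby/homl/server:latest-cuda",
            "deploy_resources": "...gpu reservations...",
        }
    elif accelerator == "rocm":  # hypothetical branch a new platform would add
        cfg = {"image": "ghcr.io/wsmlby/homl/server:latest-rocm", "deploy_resources": ""}
    else:  # cpu fallback
        cfg = {"image": "ghcr.io/wsmlby/homl/server:latest-cpu", "deploy_resources": ""}

    # The override always wins, so a locally built image can be tested on
    # any platform before it is published to a registry.
    if os.environ.get("HOML_DOCKER_IMAGE_OVERRIDE"):
        cfg["image"] = os.environ["HOML_DOCKER_IMAGE_OVERRIDE"]
    return cfg
```

Because the override is applied after the branching, adding a platform only requires a new `elif`; the local-image workflow comes for free.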

server/Dockerfile.cpu

Lines changed: 18 additions & 101 deletions
```diff
@@ -1,111 +1,28 @@
-# This Dockerfile is sourced from the official vLLM project:
-# https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.cpu
-#
-# To build the CPU base image for HoML, you should:
-# 1. Clone the vLLM repository: git clone https://github.com/vllm-project/vllm.git
-# 2. Navigate to the vLLM repository root.
-# 3. Place this file at the root of the vLLM project checkout.
-# 4. Run the build command, e.g.:
-#    docker buildx build -t homl/vllm-cpu:latest -f Dockerfile.cpu .
-#
-# The resulting `homl/vllm-cpu:latest` image can then be used as a base
-# for the main HoML server image.
+# This Dockerfile builds the final HoML Server image for CPU.
+# It layers the HoML server code on top of a pre-built vLLM CPU base image.
+FROM ghcr.io/wsmlby/homl-vllm-cpu-base:latest
 
-######################### COMMON BASE IMAGE #########################
-FROM ubuntu:22.04 AS base-common
 
-WORKDIR /workspace/
+# Set the working directory to homl_server
+WORKDIR /app/homl_server
 
-ARG PYTHON_VERSION=3.12
-ARG PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
 
-# Install minimal dependencies and uv
-RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    --mount=type=cache,target=/var/lib/apt,sharing=locked \
-    apt-get update -y \
-    && apt-get install -y --no-install-recommends ccache git curl wget ca-certificates \
-    gcc-12 g++-12 libtcmalloc-minimal4 libnuma-dev ffmpeg libsm6 libxext6 libgl1 jq lsof \
-    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12 \
-    && curl -LsSf https://astral.sh/uv/install.sh | sh
+# Copy requirements.txt and install dependencies
+COPY requirements.txt ./
+RUN pip install -r requirements.txt
 
-ENV CCACHE_DIR=/root/.cache/ccache
-ENV CMAKE_CXX_COMPILER_LAUNCHER=ccache
+# ENV ACCELERATOR=CPU
 
-ENV PATH="/root/.local/bin:$PATH"
-ENV VIRTUAL_ENV="/opt/venv"
-ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
-RUN uv venv --python ${PYTHON_VERSION} --seed ${VIRTUAL_ENV}
-ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+# Copy our application source code
+COPY ./homl_server ./
+COPY ./homl_server ./homl_server
+COPY ./vllm_patches ./patches
 
-ENV UV_HTTP_TIMEOUT=500
+RUN cd /usr/local/lib/python3.12/dist-packages/vllm && patch -p1 < /app/patches/registry.patch
 
-# Install Python dependencies
-ENV PIP_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}
-ENV UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}
-ENV UV_INDEX_STRATEGY="unsafe-best-match"
-ENV UV_LINK_MODE="copy"
-RUN --mount=type=cache,target=/root/.cache/uv \
-    --mount=type=bind,src=requirements/common.txt,target=requirements/common.txt \
-    --mount=type=bind,src=requirements/cpu.txt,target=requirements/cpu.txt \
-    uv pip install --upgrade pip && \
-    uv pip install -r requirements/cpu.txt
 
-ARG TARGETARCH
-ENV TARGETARCH=${TARGETARCH}
+ARG HOML_SERVER_VERSION=dev
+ENV HOML_SERVER_VERSION=$HOML_SERVER_VERSION
 
-######################### x86_64 BASE IMAGE #########################
-FROM base-common AS base-amd64
-
-ENV LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4:/opt/venv/lib/libiomp5.so"
-
-######################### arm64 BASE IMAGE #########################
-FROM base-common AS base-arm64
-
-ENV LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc_minimal.so.4"
-
-######################### BASE IMAGE #########################
-FROM base-${TARGETARCH} AS base
-
-RUN echo 'ulimit -c 0' >> ~/.bashrc
-
-######################### BUILD IMAGE #########################
-FROM base AS vllm-build
-
-ARG GIT_REPO_CHECK=0
-# Support for building with non-AVX512 vLLM: docker build --build-arg VLLM_CPU_DISABLE_AVX512="true" ...
-ARG VLLM_CPU_DISABLE_AVX512=true
-ENV VLLM_CPU_DISABLE_AVX512=${VLLM_CPU_DISABLE_AVX512}
-# Support for building with AVX512BF16 ISA: docker build --build-arg VLLM_CPU_AVX512BF16="true" ...
-ARG VLLM_CPU_AVX512BF16=false
-ENV VLLM_CPU_AVX512BF16=${VLLM_CPU_AVX512BF16}
-# Support for building with AVX512VNNI ISA: docker build --build-arg VLLM_CPU_AVX512VNNI="true" ...
-ARG VLLM_CPU_AVX512VNNI=false
-ENV VLLM_CPU_AVX512VNNI=${VLLM_CPU_AVX512VNNI}
-
-WORKDIR /workspace/vllm
-
-RUN --mount=type=cache,target=/root/.cache/uv \
-    --mount=type=bind,src=requirements/cpu-build.txt,target=requirements/build.txt \
-    uv pip install -r requirements/build.txt
-
-COPY . .
-RUN --mount=type=bind,source=.git,target=.git \
-    if [ "$GIT_REPO_CHECK" != 0 ]; then bash tools/check_repo.sh ; fi
-
-RUN --mount=type=cache,target=/root/.cache/uv \
-    --mount=type=cache,target=/root/.cache/ccache \
-    --mount=type=cache,target=/workspace/vllm/.deps,sharing=locked \
-    --mount=type=bind,source=.git,target=.git \
-    VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel
-
-######################### RELEASE IMAGE #########################
-FROM base AS vllm-openai
-
-WORKDIR /workspace/
-
-RUN --mount=type=cache,target=/root/.cache/uv \
-    --mount=type=cache,target=/root/.cache/ccache \
-    --mount=type=bind,from=vllm-build,src=/workspace/vllm/dist,target=dist \
-    uv pip install dist/*.whl
-
-ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
+# Start the server directly from main.py
+ENTRYPOINT ["python3", "-u", "main.py"]
```
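Since the vLLM registry patch is now baked into the image at build time, a quick smoke test is to confirm vLLM imports inside the built image; the `dev-cpu` tag below is illustrative:

```shell
# Bypass the HoML entrypoint and check that the patched vLLM install loads.
docker run --rm --entrypoint python3 homl/server:dev-cpu \
  -c "import vllm; print(vllm.__version__)"
```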

server/Dockerfile.cpu.app

Lines changed: 0 additions & 26 deletions
This file was deleted.
