The HoML server is designed to be run as a Docker container. It provides an OpenAI-compatible API for running inference on various models and a gRPC server for control by the homl CLI.
The server container is intended to be managed directly by the homl command-line tool. To install and run the server, please use the following command:
```
homl server install
```

This command will handle the creation of a `docker-compose.yml` file, configure the necessary volumes for the model cache and gRPC socket, and start the server.
For advanced users who wish to manage the server manually, the `homl server install` command generates a `docker-compose.yml` file in `~/.homl/`. This file can be inspected and modified to suit your needs.
Key configuration points managed by the installer include:
- User Permissions: The container is run with the host user's UID/GID to ensure correct ownership of the socket and cache files.
- Volume Mounts: The socket is shared at `~/.homl/run` and the model cache is persisted at `~/.homl/models`.
- Insecure Socket: An `--insecure-socket` flag on the `install` command allows for a world-writable socket as a fallback, which is controlled by the `HOML_INSECURE_SOCKET` environment variable passed to the container.
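To make these configuration points concrete, here is a hypothetical sketch of what the generated Compose file might contain. The service name, image tag, UID/GID, and container-side paths are illustrative assumptions, not the literal installer output; inspect the real file in `~/.homl/` for the actual contents.

```yaml
# Hypothetical sketch of a generated ~/.homl/docker-compose.yml.
# All values here are assumptions for illustration.
services:
  homl-server:
    image: homl/server:latest-cuda
    # Run as the host user's UID/GID so socket and cache files
    # end up with correct ownership on the host.
    user: "1000:1000"
    volumes:
      - ~/.homl/run:/run/homl      # shared gRPC socket
      - ~/.homl/models:/models     # persisted model cache
    environment:
      # Set by the --insecure-socket install flag as a fallback.
      - HOML_INSECURE_SOCKET=0
```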
For developers, the platform-specific HoML server images can be built from source.
To build the CUDA image:

```
docker build -f Dockerfile.cuda -t homl/server:latest-cuda .
```

Building the CPU image is a two-step process:
1. Build the vLLM CPU base image: This requires a clone of the vLLM repository. See the comments in `Dockerfile.cpu.base` for detailed instructions on building the `homl/vllm-cpu:latest` base image.
2. Build the HoML server image:

```
docker build -f Dockerfile.cpu -t homl/server:latest-cpu .
```
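As an illustrative sketch of the two-step process (the clone URL and the Dockerfile path inside the vLLM repository are assumptions; the comments in `Dockerfile.cpu.base` are the authoritative instructions):

```shell
# Step 1: build the vLLM CPU base image from a clone of the vLLM repo.
# The in-repo Dockerfile path is an assumption; see Dockerfile.cpu.base.
git clone https://github.com/vllm-project/vllm.git
cd vllm
docker build -f docker/Dockerfile.cpu -t homl/vllm-cpu:latest .
cd ..

# Step 2: build the HoML server image on top of that base image.
docker build -f Dockerfile.cpu -t homl/server:latest-cpu .
```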
To test this on other platforms:

1. Find or build the vLLM base image. You can find them here. For example:
   - ROCm: `rocm/vllm:latest`
   - Intel GPU: `intel/vllm:latest`
   - TPU: `vllm/vllm-tpu:nightly`
   - You can also build base images from https://github.com/vllm-project/vllm/tree/main/docker
   - Or, of course, create a new image for vLLM on that platform.
2. Create a Dockerfile for your platform similar to this:
   - Make sure you use the correct base image from step 1.
   - Set `ENV ACCELERATOR=<your-accelerator-name>`.
   - Make the proper modifications to the platform-specific commands where `ACCELERATOR` is used.
3. Build the Docker image using the Dockerfile you created.
4. To use the image, set the `HOML_DOCKER_IMAGE_OVERRIDE` environment variable to specify the image when running the `homl server install` command. For example:

   ```
   docker build -f Dockerfile.xpu -t homl/server:latest-xpu .
   HOML_DOCKER_IMAGE_OVERRIDE=homl/server:latest-xpu homl server install
   ```

5. Use `homl server log` to see the logs.
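A per-platform Dockerfile along these lines might be sketched as follows. The base image, accelerator name, and build steps are placeholder assumptions, not a tested configuration; adapt from the real `Dockerfile.cpu` and `Dockerfile.cuda` in this repository.

```dockerfile
# Hypothetical Dockerfile.rocm sketch -- values are illustrative only.

# Step 1's base image: the vLLM image for your platform.
FROM rocm/vllm:latest

# Tell the HoML server which accelerator it is running on
# (placeholder value; use your platform's name).
ENV ACCELERATOR=rocm

# Copy the HoML server sources and adjust any commands that branch
# on ACCELERATOR, mirroring the existing platform Dockerfiles.
```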