Deploy LiteLLM behind Coolify with a pooled set of NVIDIA API keys and a single
OpenAI-compatible model alias: `glm-5-nvidia`.
This pack was validated against a real Coolify host, including duplicate end-to-end deployments, health checks, and live inference requests.
- `docker-compose.coolify.yml`: Static reference compose for the current verified deployment shape.
  - Pins LiteLLM to a known working image digest.
  - Generates `/app/config.yaml` at container start.
- `render-compose.sh`: Renders the deployable compose from `.env`.
  - Supports any number of `NVIDIA_API_KEY_POOL_N` entries.
- `upsert-via-api.sh`: Creates or updates the Coolify service via the public API.
  - Preflights host-port collisions before mutation.
  - Waits for `running:healthy` before returning success.
- `smoke-test.sh`: Verifies `/v1/models` and `/v1/chat/completions`.
- `.env.example`: Required runtime variables with placeholder values.
- `factory-custom-model.example.json`: Example Factory Droid custom model entry for the deployed LiteLLM service.
- Upstream provider: NVIDIA OpenAI-compatible API
- Upstream model: `z-ai/glm5`
- Exposed LiteLLM alias: `glm-5-nvidia`
- Routing strategy: `simple-shuffle`
- LiteLLM image: `ghcr.io/berriai/litellm@sha256:d6580beba82a69e4cfb6598c300b7c524d9ea6f67592226fdec7f6a9aba34eb2`
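To make the alias-to-upstream mapping concrete, here is a hedged sketch of the kind of `config.yaml` the container could generate at start. This is not the pack's actual entrypoint: it writes `./config.yaml` instead of `/app/config.yaml` for illustration, shows only two pool keys, and the NVIDIA base URL fallback is an assumption; the config keys follow LiteLLM's published router config format (`model_list`, `router_settings`).

```shell
# Hedged sketch of config generation: each pooled key becomes one model_list
# entry under the same alias, and simple-shuffle spreads requests across them.
# Writes ./config.yaml for illustration; the container writes /app/config.yaml.
cat > ./config.yaml <<EOF
model_list:
  - model_name: glm-5-nvidia
    litellm_params:
      model: openai/z-ai/glm5
      api_base: ${NVIDIA_API_BASE:-https://integrate.api.nvidia.com/v1}
      api_key: ${NVIDIA_API_KEY_POOL_1:-placeholder-key-1}
  - model_name: glm-5-nvidia
    litellm_params:
      model: openai/z-ai/glm5
      api_base: ${NVIDIA_API_BASE:-https://integrate.api.nvidia.com/v1}
      api_key: ${NVIDIA_API_KEY_POOL_2:-placeholder-key-2}
router_settings:
  routing_strategy: simple-shuffle
EOF
echo "wrote $(grep -c 'model_name: glm-5-nvidia' ./config.yaml) pool entries"
```

Because every entry shares the `glm-5-nvidia` alias, clients see one model while LiteLLM rotates keys underneath.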
- Copy `.env.example` to `.env`.
- Fill in `LITELLM_MASTER_KEY`, `NVIDIA_API_BASE`, and one or more `NVIDIA_API_KEY_POOL_N` values.
- Export the Coolify control-plane variables: `COOLIFY_BASE_URL`, `COOLIFY_API_TOKEN`, `PROJECT_UUID`, `ENVIRONMENT_UUID`, `SERVER_UUID`.
- Run:

```sh
sh ./upsert-via-api.sh
```

If `SERVICE_UUID` is set, the script updates that service. If `SERVICE_UUID` is
unset, it searches by `SERVICE_NAME` and updates that match; otherwise it creates
a new service.
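The precedence above can be sketched as a small shell function. This is illustrative only: `resolve_action` and `MATCHED_UUID` are hypothetical names, and the real script's Coolify API lookups are elided.

```shell
# Hedged sketch of the upsert decision order: explicit SERVICE_UUID wins,
# then a single SERVICE_NAME match, otherwise a new service is created.
resolve_action() {
  if [ -n "${SERVICE_UUID:-}" ]; then
    echo "update:${SERVICE_UUID}"       # explicit UUID wins
  elif [ -n "${MATCHED_UUID:-}" ]; then # uuid found by SERVICE_NAME lookup
    echo "update:${MATCHED_UUID}"
  else
    echo "create"                       # no UUID and no name match
  fi
}

SERVICE_UUID="" MATCHED_UUID=""
resolve_action   # -> create
MATCHED_UUID="abc123"
resolve_action   # -> update:abc123
SERVICE_UUID="svc-9"
resolve_action   # -> update:svc-9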
For a second LiteLLM instance on the same server, use a unique service name and host port:

```sh
SERVICE_NAME=litellm-glm5-nvidia-pool-duplicate \
HOST_PORT=4010 \
sh ./upsert-via-api.sh
```

To run the smoke test against the deployed service:

```sh
BASE_URL=http://127.0.0.1:4000/v1 \
API_KEY="$LITELLM_MASTER_KEY" \
sh ./smoke-test.sh
```

Expected result:

- `/v1/models` returns only `glm-5-nvidia`
- `/v1/chat/completions` returns `smoke test ok`
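The `/v1/models` check can be reproduced by hand. The sketch below is a hedged stand-in for what the smoke test might do: the JSON payload is hand-written here rather than fetched live, so only the matching logic is illustrated.

```shell
# Hedged sketch: assert that a /v1/models payload exposes only the alias.
# A live check would fetch this with: curl -H "Authorization: Bearer $API_KEY" \
#   "$BASE_URL/models" -- the response below is a hand-written stand-in.
response='{"object":"list","data":[{"id":"glm-5-nvidia","object":"model"}]}'
if printf '%s' "$response" | grep -q '"id":"glm-5-nvidia"'; then
  echo "models check ok"
else
  echo "models check failed" >&2
  exit 1
fi
```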
- The scripts discover `NVIDIA_API_KEY_POOL_N` entries from the env file itself, not from ambient shell variables.
- The deploy script fails loudly if multiple Coolify services match the same `SERVICE_NAME`.
- The deploy script fails fast if `HOST_PORT` is already bound by another Coolify service.
- The deploy script tolerates transient Coolify `exited` states during startup and only fails fast on an explicit `:unhealthy` status.
- Coolify can momentarily report stale health around start and restart transitions; the script includes a short post-action settle delay before health polling.
- The smoke test defaults to a 120-second timeout because valid NVIDIA keys can still have variable latency.
- Do not treat the generated runtime compose under `/data/coolify/services/*` as source of truth. It contains instance-specific labels, UUIDs, network names, and container names.
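The first note (env-file discovery) can be demonstrated in isolation. This is a hedged sketch of the assumed mechanism, not the scripts' exact code; the demo env file and its values are placeholders.

```shell
# Hedged sketch of env-file pool discovery: keys come from the file itself,
# so an exported NVIDIA_API_KEY_POOL_9 in the shell would not be picked up.
env_file="./demo.env"
cat > "$env_file" <<'EOF'
LITELLM_MASTER_KEY=sk-demo
NVIDIA_API_KEY_POOL_1=nvapi-first
NVIDIA_API_KEY_POOL_2=nvapi-second
EOF
# Enumerate pool variable names present in the file, in file order.
grep -E '^NVIDIA_API_KEY_POOL_[0-9]+=' "$env_file" | cut -d= -f1
```

Reading from the file keeps a rendered compose reproducible: the same `.env` always yields the same pool, regardless of what happens to be exported in the deploying shell.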