jroth1111/litellm-coolify-nvidia-multi-key

LiteLLM on Coolify for NVIDIA Multi-Key Inference

Deploy LiteLLM behind Coolify with a pooled set of NVIDIA API keys and a single OpenAI-compatible model alias: glm-5-nvidia.

This pack was validated end-to-end against a real Coolify host, including a duplicate second-instance deployment, health checks, and live inference requests.

What it includes

  • docker-compose.coolify.yml
    • Static reference compose for the current verified deployment shape.
    • Pins LiteLLM to a known working image digest.
    • Generates /app/config.yaml at container start.
  • render-compose.sh
    • Renders the deployable compose from .env.
    • Supports any number of NVIDIA_API_KEY_POOL_N entries.
  • upsert-via-api.sh
    • Creates or updates the Coolify service via the public API.
    • Preflights host-port collisions before mutation.
    • Waits for running:healthy before returning success.
  • smoke-test.sh
    • Verifies /v1/models and /v1/chat/completions.
  • .env.example
    • Required runtime variables with placeholder values.
  • factory-custom-model.example.json
    • Example Factory Droid custom model entry for the deployed LiteLLM service.
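The runtime variables in .env.example follow this general shape. Every value below is a placeholder assumption for illustration, including the NVIDIA_API_BASE URL; use the actual values from your copy of .env.example.

```dotenv
# Placeholder values -- substitute your own before deploying.
LITELLM_MASTER_KEY=sk-change-me
NVIDIA_API_BASE=https://integrate.api.nvidia.com/v1
NVIDIA_API_KEY_POOL_1=nvapi-first-key
NVIDIA_API_KEY_POOL_2=nvapi-second-key
```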

Runtime shape

  • Upstream provider: NVIDIA OpenAI-compatible API
  • Upstream model: z-ai/glm5
  • Exposed LiteLLM alias: glm-5-nvidia
  • Routing strategy: simple-shuffle
  • LiteLLM image: ghcr.io/berriai/litellm@sha256:d6580beba82a69e4cfb6598c300b7c524d9ea6f67592226fdec7f6a9aba34eb2
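Since the compose file generates /app/config.yaml at container start, the rendered config plausibly looks like the sketch below, assuming the standard LiteLLM proxy schema (one model_list entry per pooled key, all sharing the glm-5-nvidia alias, with simple-shuffle routing). The exact generated file may differ; this is illustrative only.

```yaml
model_list:
  - model_name: glm-5-nvidia
    litellm_params:
      model: openai/z-ai/glm5
      api_base: os.environ/NVIDIA_API_BASE
      api_key: os.environ/NVIDIA_API_KEY_POOL_1
  - model_name: glm-5-nvidia
    litellm_params:
      model: openai/z-ai/glm5
      api_base: os.environ/NVIDIA_API_BASE
      api_key: os.environ/NVIDIA_API_KEY_POOL_2
router_settings:
  routing_strategy: simple-shuffle
```

With duplicate model_name entries, LiteLLM treats the pooled keys as interchangeable deployments of one alias and shuffles requests across them.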

Deploy

  1. Copy .env.example to .env.
  2. Fill in LITELLM_MASTER_KEY, NVIDIA_API_BASE, and one or more NVIDIA_API_KEY_POOL_N values.
  3. Export the Coolify control-plane variables:
    • COOLIFY_BASE_URL
    • COOLIFY_API_TOKEN
    • PROJECT_UUID
    • ENVIRONMENT_UUID
    • SERVER_UUID
  4. Run:
sh ./upsert-via-api.sh

If SERVICE_UUID is set, the script updates that service. If SERVICE_UUID is unset, it searches by SERVICE_NAME and updates the match if one exists; if no match is found, it creates a new service.
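The resolution order can be sketched as plain shell. Here lookup_by_name is a hypothetical stand-in for the script's Coolify API name search, not a real helper from this repo:

```shell
#!/bin/sh
# Illustrative upsert resolution order (not the actual script).
# lookup_by_name: hypothetical stand-in for the Coolify name search.
lookup_by_name() { printf '%s' ""; }  # simulate: no existing service matches

SERVICE_NAME=litellm-demo
if [ -n "${SERVICE_UUID:-}" ]; then
  action="update-by-uuid $SERVICE_UUID"
elif match=$(lookup_by_name "$SERVICE_NAME") && [ -n "$match" ]; then
  action="update-by-name $match"
else
  action="create $SERVICE_NAME"
fi
echo "$action"
```

With SERVICE_UUID unset and no name match, the sketch falls through to the create branch.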

For a second LiteLLM instance on the same server, use a unique service name and host port:

SERVICE_NAME=litellm-glm5-nvidia-pool-duplicate \
HOST_PORT=4010 \
sh ./upsert-via-api.sh

Verify

BASE_URL=http://127.0.0.1:4000/v1 \
API_KEY="$LITELLM_MASTER_KEY" \
sh ./smoke-test.sh

Expected result:

  • /v1/models returns only glm-5-nvidia
  • /v1/chat/completions returns smoke test ok
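For a manual check outside smoke-test.sh, the request body presumably follows the standard OpenAI chat-completions shape; the prompt text below is an assumption, not taken from the repo's scripts:

```shell
#!/bin/sh
# Standard OpenAI-compatible chat request body for the deployed alias.
# The prompt text is an illustrative assumption.
BODY='{"model":"glm-5-nvidia","messages":[{"role":"user","content":"reply with: smoke test ok"}]}'
printf '%s\n' "$BODY"
# To send it against a running instance:
#   curl -sS "$BASE_URL/chat/completions" \
#     -H "Authorization: Bearer $API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$BODY"
```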

Operational notes

  • The scripts discover NVIDIA_API_KEY_POOL_N entries from the env file itself, not from ambient shell variables.
  • The deploy script fails loudly if multiple Coolify services match the same SERVICE_NAME.
  • The deploy script fails fast if HOST_PORT is already bound by another Coolify service.
  • The deploy script tolerates transient Coolify exited states during startup and only fails fast on explicit :unhealthy status.
  • Coolify can momentarily report stale health around start and restart transitions; the script includes a short post-action settle delay before health polling.
  • The smoke test defaults to a 120-second timeout because even valid NVIDIA keys can have variable latency.
  • Do not treat the generated runtime compose under /data/coolify/services/* as source of truth. It contains instance-specific labels, UUIDs, network names, and container names.
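Discovering NVIDIA_API_KEY_POOL_N entries from the env file itself (rather than ambient shell variables) can be sketched like this; the repo's scripts may parse differently, and the demo file contents here are placeholders:

```shell
#!/bin/sh
# Sketch: read pooled keys straight from the env file, ignoring the shell
# environment. Demo file with placeholder values, written to /tmp.
cat > /tmp/demo.env <<'EOF'
LITELLM_MASTER_KEY=sk-change-me
NVIDIA_API_KEY_POOL_1=nvapi-aaa
NVIDIA_API_KEY_POOL_2=nvapi-bbb
EOF
# Match only NVIDIA_API_KEY_POOL_<digits>= lines, then take the value part.
keys=$(grep -E '^NVIDIA_API_KEY_POOL_[0-9]+=' /tmp/demo.env | cut -d= -f2-)
echo "$keys"
```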
