This image contains Ollama with the following properties and enhancements:
- Flash attention enabled by default.
- Default context length of 32000.
- Preloading of models via environment variable PRELOAD_MODELS
- Pulling of models at startup via environment variable PULL_MODELS
- Option to delete models not specified for preloading or pulling by setting environment variable DELETE_MODELS=true
- Option to change the process priority using environment variables SCHED_POLICY and NICENESS_ADJUSTMENT
- CUDA support
- Vulkan support
Usage is the same as in the official Ollama Docker image.
- Pull from Docker Hub, download the package from Releases, or build using builder/build.sh
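
As a rough illustration of how the environment variables listed above might be combined, here is a minimal Compose sketch. The image name is a placeholder, the model name is only an example, and the value format (a single model name per variable) is an assumption that this README does not confirm. The port and volume path follow the official Ollama Docker image. SCHED_POLICY and NICENESS_ADJUSTMENT are left out because their accepted values are not stated here.

```yaml
services:
  ollama:
    # Placeholder image name (assumption): use the tag you pulled or built.
    image: example/ollama-enhanced:latest
    ports:
      - "11434:11434"          # default Ollama API port
    volumes:
      - ollama:/root/.ollama   # model storage path used by the official image
    environment:
      # Value formats are assumptions; the model name is only an example.
      PULL_MODELS: "llama3.2"      # pulled at startup
      PRELOAD_MODELS: "llama3.2"   # loaded into memory at startup
      DELETE_MODELS: "true"        # remove models not listed above

volumes:
  ollama:
```

If the variables accept several models, the separator (space, comma, or something else) would need to be checked against the image's entrypoint script before relying on it.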
To run for development, execute:
docker compose --file docker-compose-dev.yaml up --build