Summary
Implement an Ollama provider (pkg/provider/ollama) for local model inference, including full model lifecycle management (download, load, unload, delete).
Requirements
Core Provider
- Implement the `llm.Client` interface for the Ollama API (`http://localhost:11434` by default)
- Endpoint configurable via the `OLLAMA_ENDPOINT` environment variable or a CLI flag
- No API key required (local service)
- Support streaming and non-streaming chat completions
- Support tool/function calling (Ollama supports this for compatible models)
- Support embeddings
Model Management
- Pull/Download: Pull models from the Ollama registry (equivalent to `ollama pull`)
- List: List locally available models with size, quantization, and modification date
- Load: Load a model into memory (warm up for faster inference)
- Unload: Unload a model from memory to free GPU/RAM
- Delete: Remove a model from local storage
- Show: Get model details (parameters, template, license, system prompt)
- Expose model management operations via the API (new endpoints or extend existing model API)
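The lifecycle operations above map onto Ollama's native HTTP API. A sketch of that routing (the `Route` helper is illustrative, not an existing function): note that Ollama has no dedicated load/unload endpoints, so load and unload are expressed as an empty `/api/generate` call with a `keep_alive` of `-1` (keep resident) or `0` (evict from GPU/RAM):

```go
package main

import "fmt"

// Op is a model-lifecycle operation the provider would expose.
type Op string

const (
	OpPull   Op = "pull"
	OpLoad   Op = "load"
	OpUnload Op = "unload"
	OpDelete Op = "delete"
	OpShow   Op = "show"
)

// Route maps a lifecycle operation onto Ollama's native API.
func Route(op Op, model string) (method, path string, body map[string]any, err error) {
	switch op {
	case OpPull:
		// Streams newline-delimited JSON progress objects.
		return "POST", "/api/pull", map[string]any{"name": model, "stream": true}, nil
	case OpLoad:
		// Empty generate call loads the model; keep_alive -1 keeps it resident.
		return "POST", "/api/generate", map[string]any{"model": model, "keep_alive": -1}, nil
	case OpUnload:
		// keep_alive 0 evicts the model, freeing GPU/RAM.
		return "POST", "/api/generate", map[string]any{"model": model, "keep_alive": 0}, nil
	case OpDelete:
		return "DELETE", "/api/delete", map[string]any{"name": model}, nil
	case OpShow:
		// Returns parameters, template, license, and system prompt.
		return "POST", "/api/show", map[string]any{"name": model}, nil
	}
	return "", "", nil, fmt.Errorf("unknown op %q", op)
}
```

List is the remaining operation: a plain `GET /api/tags`, which returns each local model's size and modification date.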
API Endpoints
- `POST /api/model/pull` — pull a model by name/tag
- `POST /api/model/{name}/load` — load model into memory
- `POST /api/model/{name}/unload` — unload model from memory
- `DELETE /api/model/{name}` — delete a model
- Pull progress should be streamable (SSE or chunked response) for UI feedback
Models
- Any model available in the Ollama registry (llama, mistral, gemma, phi, qwen, etc.)
- Model names use the Ollama format: `model:tag` (e.g. `llama3.2:latest`, `mistral:7b-instruct-q4_0`)
Notes
- Ollama API docs: https://github.com/ollama/ollama/blob/main/docs/api.md
- The existing `ollama.tf` deployment in tf-modules shows Ollama is already running in the infrastructure
- Consider how model management interacts with the CLI (`llm model pull`, `llm model delete`, etc.)
Motivation
Ollama enables local/private model inference with no API costs. Model management is a key differentiator — users need to download, load, and manage models on their GPU servers. This complements the cloud providers (Gemini, Anthropic, Mistral) with self-hosted options.