Summary
Implement an Ollama provider (pkg/provider/ollama) for local model inference, including full model lifecycle management (download, load, unload, delete).
Requirements
Core Provider
- Implement the `llm.Client` interface for the Ollama API (`http://localhost:11434` by default)
- Endpoint configurable via the `OLLAMA_ENDPOINT` environment variable or a CLI flag
- No API key required (local service)
- Support streaming and non-streaming chat completions
- Support tool/function calling (Ollama supports this for compatible models)
- Support embeddings
Model Management
- Pull/Download: Pull models from the Ollama registry (equivalent to `ollama pull`)
- List: List locally available models with size, quantization, and modification date
- Load: Load a model into memory (warm up for faster inference)
- Unload: Unload a model from memory to free GPU/RAM
- Delete: Remove a model from local storage
- Show: Get model details (parameters, template, license, system prompt)
- Expose model management operations via the API (new endpoints or extend existing model API)
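The lifecycle operations above map onto Ollama's native HTTP API. A sketch of that routing (the `Route` helper is illustrative, not an existing function): note that Ollama has no dedicated load/unload endpoints, so load and unload are expressed as an empty `/api/generate` call with a `keep_alive` of `-1` (keep resident) or `0` (evict from GPU/RAM):

```go
package main

import "fmt"

// Op is a model-lifecycle operation the provider would expose.
type Op string

const (
	OpPull   Op = "pull"
	OpLoad   Op = "load"
	OpUnload Op = "unload"
	OpDelete Op = "delete"
	OpShow   Op = "show"
)

// Route maps a lifecycle operation onto Ollama's native API.
func Route(op Op, model string) (method, path string, body map[string]any, err error) {
	switch op {
	case OpPull:
		// Streams newline-delimited JSON progress objects.
		return "POST", "/api/pull", map[string]any{"name": model, "stream": true}, nil
	case OpLoad:
		// Empty generate call loads the model; keep_alive -1 keeps it resident.
		return "POST", "/api/generate", map[string]any{"model": model, "keep_alive": -1}, nil
	case OpUnload:
		// keep_alive 0 evicts the model, freeing GPU/RAM.
		return "POST", "/api/generate", map[string]any{"model": model, "keep_alive": 0}, nil
	case OpDelete:
		return "DELETE", "/api/delete", map[string]any{"name": model}, nil
	case OpShow:
		// Returns parameters, template, license, and system prompt.
		return "POST", "/api/show", map[string]any{"name": model}, nil
	}
	return "", "", nil, fmt.Errorf("unknown op %q", op)
}
```

List is the remaining operation: a plain `GET /api/tags`, which returns each local model's size and modification date.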
API Endpoints
- `POST /api/model/pull` — pull a model by name/tag
- `POST /api/model/{name}/load` — load model into memory
- `POST /api/model/{name}/unload` — unload model from memory
- `DELETE /api/model/{name}` — delete a model
- Pull progress should be streamable (SSE or chunked response) for UI feedback
Models
- Any model available in the Ollama registry (llama, mistral, gemma, phi, qwen, etc.)
- Model names use the Ollama format: `model:tag` (e.g. `llama3.2:latest`, `mistral:7b-instruct-q4_0`)
Notes
- Ollama API docs: https://github.com/ollama/ollama/blob/main/docs/api.md
- The existing `ollama.tf` deployment in tf-modules shows Ollama is already running in the infrastructure
- Consider how model management interacts with the CLI (`llm model pull`, `llm model delete`, etc.)
Motivation
Ollama enables local/private model inference with no API costs. Model management is a key differentiator — users need to download, load, and manage models on their GPU servers. This complements the cloud providers (Gemini, Anthropic, Mistral) with self-hosted options.