Ollama provides free, local AI models that run on your own hardware. This guide covers everything you need to know about setting up Ollama with Open Notebook, including different deployment scenarios and network configurations.
- 🆓 Completely Free: No API costs after initial setup
- 🔒 Full Privacy: Your data never leaves your local network
- 📱 Offline Capable: Works without internet connection
- 🚀 Fast: Local inference with no network latency
- 🧠 Reasoning Models: Support for advanced reasoning models like DeepSeek-R1
- 💾 Model Variety: Access to hundreds of open-source models
Linux/macOS:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download and install from ollama.ai
# Language models (choose one or more)
ollama pull qwen3 # Excellent general purpose, 7B parameters
ollama pull gemma3 # Google's model, good performance
ollama pull deepseek-r1 # Advanced reasoning model
ollama pull phi4 # Microsoft's efficient model
# Embedding model (required for search)
ollama pull mxbai-embed-large # Best embedding model for Ollama
For local installation:
export OLLAMA_API_BASE=http://localhost:11434
For Docker installation:
export OLLAMA_API_BASE=http://host.docker.internal:11434
The OLLAMA_API_BASE environment variable tells Open Notebook where to find your Ollama server. The correct value depends on your deployment scenario:
When both Open Notebook and Ollama run directly on your machine:
export OLLAMA_API_BASE=http://localhost:11434
# or
export OLLAMA_API_BASE=http://127.0.0.1:11434
localhost vs 127.0.0.1:
- localhost: Recommended, works with most configurations
- 127.0.0.1: Use if you have DNS resolution issues with localhost
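If you're unsure whether either form resolves and connects on your system, a small TCP probe using only Python's standard library can tell you. This `can_connect` helper is a diagnostic sketch, not part of Open Notebook or Ollama:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the default Ollama port under both names; if one works and the
# other doesn't, you have a DNS-resolution issue with localhost
for host in ("localhost", "127.0.0.1"):
    print(host, "reachable:", can_connect(host, 11434))
```

If both print `False`, Ollama isn't running or is bound to a different interface; see the troubleshooting section below.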
When Open Notebook runs in Docker but Ollama runs on your host machine:
export OLLAMA_API_BASE=http://host.docker.internal:11434
# Start Ollama with external access enabled
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
Why host.docker.internal?
- Docker containers can't reach localhost on the host
- host.docker.internal is Docker's special hostname for the host machine
- Available out of the box on Docker Desktop for Mac/Windows; on Linux it requires Docker 20.10+ and the --add-host=host.docker.internal:host-gateway flag (or the equivalent extra_hosts entry in Docker Compose)
Why OLLAMA_HOST=0.0.0.0:11434?
- By default, Ollama only binds to localhost and rejects external connections
- Docker containers are considered "external" even when running on the same machine
- Setting OLLAMA_HOST=0.0.0.0:11434 allows connections from Docker containers
When both Open Notebook and Ollama run in the same Docker Compose stack:
export OLLAMA_API_BASE=http://ollama:11434
Docker Compose Example:
version: '3.8'
services:
  open-notebook:
    image: lfnovo/open_notebook:v1-latest-single
    ports:
      - "8502:8502"
      - "5055:5055"
    environment:
      - OLLAMA_API_BASE=http://ollama:11434
    volumes:
      - ./notebook_data:/app/data
      - ./surreal_data:/mydata
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Optional: GPU support
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  ollama_data:
When Ollama runs on a different machine in your network:
export OLLAMA_API_BASE=http://192.168.1.100:11434
# Replace 192.168.1.100 with your Ollama server's IP address
Security Note: Only use this in trusted networks. Ollama doesn't have built-in authentication.
If you've configured Ollama to use a different port:
# Start Ollama on custom port
OLLAMA_HOST=0.0.0.0:8080 ollama serve
# Configure Open Notebook
export OLLAMA_API_BASE=http://localhost:8080
| Model | Size | Best For | Quality | Speed |
|---|---|---|---|---|
| qwen3 | 7B | General purpose, coding | Excellent | Fast |
| deepseek-r1 | 7B | Reasoning, problem-solving | Exceptional | Medium |
| gemma3 | 7B | Balanced performance | Very Good | Fast |
| phi4 | 14B | Efficiency on small hardware | Good | Very Fast |
| llama3 | 8B | General purpose | Very Good | Medium |
| Model | Best For | Performance |
|---|---|---|
| mxbai-embed-large | General search | Excellent |
| nomic-embed-text | Document similarity | Good |
| all-minilm | Lightweight option | Fair |
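Open Notebook uses these embeddings for semantic search, which ultimately ranks documents by vector similarity. As a rough illustration of what happens under the hood, here is cosine-similarity ranking over toy vectors (real embeddings from mxbai-embed-large are 1024-dimensional; the tiny vectors below are made up for the example):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional vectors standing in for real document embeddings
query = [0.9, 0.1, 0.0]
docs = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.0, 0.9, 0.4]}

# Rank documents by similarity to the query embedding
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # doc_a is closest to the query
```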
# Essential models
ollama pull qwen3 # Primary language model
ollama pull mxbai-embed-large # Search embeddings
# Optional reasoning model
ollama pull deepseek-r1 # Advanced reasoning
# Alternative language models
ollama pull gemma3 # Google's model
ollama pull phi4 # Microsoft's efficient model
Minimum:
- RAM: 8GB (for 7B models)
- Storage: 10GB free space per model
- CPU: Modern multi-core processor
Recommended:
- RAM: 16GB+ (for multiple models)
- Storage: SSD with 50GB+ free space
- GPU: NVIDIA GPU with 8GB+ VRAM (optional but faster)
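A quick way to sanity-check these numbers: a model's memory footprint is roughly bytes-per-parameter times parameter count, plus runtime overhead. The rule of thumb below is an approximation of ours, not an official Ollama formula:

```python
# Approximate bytes per parameter at common quantization levels
BYTES_PER_PARAM = {"q4": 0.5, "q8": 1.0, "f16": 2.0}

def estimated_ram_gb(params_billion: float, quant: str = "q4") -> float:
    """Rough RAM estimate: weights plus ~20% overhead for KV cache and runtime."""
    base = params_billion * BYTES_PER_PARAM[quant]  # ~GB, since 1B params x 1 byte ~ 1 GB
    return round(base * 1.2, 1)

# Ollama pulls 4-bit quantized variants by default for most models
for name, size in [("qwen3", 7), ("llama3", 8), ("phi4", 14)]:
    print(f"{name}: ~{estimated_ram_gb(size)} GB RAM at 4-bit quantization")
```

This is why the 8GB minimum comfortably fits one 7B model at 4-bit, while running multiple models or higher-precision variants pushes you toward the 16GB+ recommendation.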
NVIDIA GPU (CUDA):
# Install NVIDIA Container Toolkit for Docker
# Then use the Docker Compose example above with GPU support
# For local installation, Ollama auto-detects CUDA
ollama pull qwen3
Apple Silicon (M1/M2/M3):
# Ollama automatically uses Metal acceleration
# No additional setup required
ollama pull qwen3
AMD GPUs:
# ROCm support varies by model and system
# Check Ollama documentation for latest compatibility
1. "Ollama unavailable" in Open Notebook
Check Ollama is running:
curl http://localhost:11434/api/tags
Verify environment variable:
echo $OLLAMA_API_BASE
# If Open Notebook runs in Docker or on a different machine,
# Ollama must bind to all interfaces, not just localhost
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
Why this is needed: By default, Ollama only accepts connections from localhost (127.0.0.1). When Open Notebook runs in Docker or on a different machine, it can't reach Ollama unless you configure OLLAMA_HOST=0.0.0.0:11434 to accept external connections.
Restart Ollama:
# Linux/macOS
sudo systemctl restart ollama
# or
ollama serve
# Windows
# Restart from system tray or Services
2. Docker networking issues
From inside Open Notebook container, test Ollama:
# Get into container
docker exec -it open-notebook bash
# Test connection
curl http://host.docker.internal:11434/api/tags
3. Models not downloading
Check disk space:
df -h
Manual model pull:
ollama pull qwen3
Clear failed downloads:
ollama rm qwen3
ollama pull qwen3
4. Slow performance
Check model size vs available RAM:
ollama ps # Show running models
free -h # Check available memory
Use smaller models:
ollama pull phi4 # Instead of larger models
ollama pull gemma3:1b # 1B parameter variant
5. Port conflicts
Check what's using port 11434:
lsof -i :11434
netstat -tulpn | grep 11434
Use custom port:
OLLAMA_HOST=0.0.0.0:8080 ollama serve
export OLLAMA_API_BASE=http://localhost:8080
1. Host networking on Linux:
# Use host networking if host.docker.internal doesn't work
docker run --network host lfnovo/open_notebook:v1-latest-single
export OLLAMA_API_BASE=http://localhost:11434
2. Custom bridge network:
version: '3.8'
networks:
  ollama_network:
    driver: bridge
services:
  open-notebook:
    networks:
      - ollama_network
    environment:
      - OLLAMA_API_BASE=http://ollama:11434
  ollama:
    networks:
      - ollama_network
3. Firewall issues:
# Allow Ollama port through firewall
sudo ufw allow 11434
# or
sudo firewall-cmd --add-port=11434/tcp --permanent
List installed models:
ollama list
Remove unused models:
ollama rm model_name
Show running models:
ollama ps
Preload models for faster startup:
# Keep model in memory
curl http://localhost:11434/api/generate -d '{
"model": "qwen3",
"prompt": "test",
"keep_alive": -1
}'
Linux: Increase file limits:
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
macOS: Increase memory limits:
# Add to ~/.zshrc or ~/.bash_profile
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_NUM_PARALLEL=4
Docker: Resource allocation:
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 8G
          cpus: '4'
# Ollama server configuration
export OLLAMA_HOST=0.0.0.0:11434 # Bind to all interfaces
export OLLAMA_KEEP_ALIVE=5m # Keep models in memory
export OLLAMA_MAX_LOADED_MODELS=3 # Max concurrent models
export OLLAMA_MAX_QUEUE=512 # Request queue size
export OLLAMA_NUM_PARALLEL=4 # Parallel request handling
export OLLAMA_FLASH_ATTENTION=1 # Enable flash attention (if supported)
# Open Notebook configuration
export OLLAMA_API_BASE=http://localhost:11434
If you're running Ollama behind a reverse proxy with self-signed SSL certificates (e.g., Caddy, nginx with custom certs), you may encounter SSL verification errors:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate
Solutions:
Option 1: Use a custom CA bundle (recommended)
# Point to your CA certificate file
export ESPERANTO_SSL_CA_BUNDLE=/path/to/your/ca-bundle.pem
Option 2: Disable SSL verification (development only)
# WARNING: Only use in trusted development environments
export ESPERANTO_SSL_VERIFY=false
Docker Compose example with SSL configuration:
services:
  open-notebook:
    image: lfnovo/open_notebook:v1-latest-single
    environment:
      - OLLAMA_API_BASE=https://ollama.local:11434
      # Option 1: Custom CA bundle
      - ESPERANTO_SSL_CA_BUNDLE=/certs/ca-bundle.pem
      # Option 2: Disable verification (dev only)
      # - ESPERANTO_SSL_VERIFY=false
    volumes:
      - /path/to/your/ca-bundle.pem:/certs/ca-bundle.pem:ro
Security Note: Disabling SSL verification exposes you to man-in-the-middle attacks. Always prefer using a custom CA bundle in production environments.
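For intuition, the two options map onto the standard `verify` argument of an HTTP client such as Python's requests, which accepts True (system CAs), False (no verification), or a path to a CA bundle. The helper below is purely illustrative; it is not Open Notebook's actual code:

```python
from typing import Optional, Union

def ssl_verify_setting(ca_bundle: Optional[str], disable: bool) -> Union[bool, str]:
    """Translate the two env options into an HTTP client's `verify` argument."""
    if disable:
        return False      # ESPERANTO_SSL_VERIFY=false (dev only!)
    if ca_bundle:
        return ca_bundle  # ESPERANTO_SSL_CA_BUNDLE=/path/to/ca-bundle.pem
    return True           # default: verify against the system CA store

print(ssl_verify_setting("/certs/ca-bundle.pem", disable=False))  # → /certs/ca-bundle.pem
```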
Import custom models:
# Create Modelfile
cat > Modelfile << EOF
FROM qwen3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are a helpful research assistant."
EOF
# Create custom model
ollama create my-research-model -f Modelfile
Use in Open Notebook:
- Go to Models
- Add new model: my-research-model
- Set as default for specific tasks
Monitor Ollama logs:
# Linux (systemd)
journalctl -u ollama -f
# Docker
docker logs -f ollama
# Manual run with verbose logging
OLLAMA_DEBUG=1 ollama serve
Resource monitoring:
# CPU and memory usage
htop
# GPU usage (NVIDIA)
nvidia-smi -l 1
# Model-specific metrics
ollama ps
import requests
import os
# Test Ollama connection
ollama_base = os.environ.get('OLLAMA_API_BASE', 'http://localhost:11434')
response = requests.get(f'{ollama_base}/api/tags')
print(f"Available models: {response.json()}")
# Generate text
payload = {
    "model": "qwen3",
    "prompt": "Explain quantum computing",
    "stream": False
}
response = requests.post(f'{ollama_base}/api/generate', json=payload)
print(response.json()['response'])
#!/bin/bash
# ollama-health-check.sh
OLLAMA_API_BASE=${OLLAMA_API_BASE:-"http://localhost:11434"}
echo "Checking Ollama health..."
if curl -s "${OLLAMA_API_BASE}/api/tags" > /dev/null; then
echo "✅ Ollama is running"
echo "Available models:"
curl -s "${OLLAMA_API_BASE}/api/tags" | jq -r '.models[].name'
else
echo "❌ Ollama is not accessible at ${OLLAMA_API_BASE}"
exit 1
fi
Similar performance models:
- GPT-4 → qwen3 or deepseek-r1
- GPT-3.5 → gemma3 or phi4
- text-embedding-ada-002 → mxbai-embed-large
Cost comparison:
- OpenAI: $0.01-0.06 per 1K tokens
- Ollama: $0 after hardware investment
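Whether $0 marginal cost actually pays off depends on your usage volume. A back-of-the-envelope break-even estimate, where the hardware price and per-token rate are illustrative assumptions:

```python
# Break-even point: how many tokens until local hardware beats API pricing?
hardware_cost_usd = 1500.0        # assumed GPU/workstation budget
api_cost_per_1k_tokens = 0.03     # mid-range of the $0.01-0.06 figure above

break_even_tokens = hardware_cost_usd / api_cost_per_1k_tokens * 1000
print(f"Break-even at ~{break_even_tokens / 1e6:.0f} million tokens")
```

At these assumed prices the hardware pays for itself after roughly 50 million tokens; adjust the two inputs for your own budget and provider rates.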
Claude replacement suggestions:
- Claude 3.5 Sonnet → deepseek-r1 (reasoning)
- Claude 3 Haiku → phi4 (speed)
Network Security:
- Run Ollama only on trusted networks
- Use firewall rules to limit access
- Consider VPN for remote access
Model Verification:
- Only pull models from trusted sources
- Verify model checksums when possible
Resource Limits:
- Set memory and CPU limits in production
- Monitor resource usage regularly
Model Selection:
- Use appropriate model size for your hardware
- Smaller models for simple tasks
- Reasoning models only when needed
Resource Management:
- Preload frequently used models
- Remove unused models regularly
- Monitor system resources
Network Optimization:
- Use local networks for better latency
- Consider SSD storage for faster model loading
Community Resources:
- Ollama GitHub - Official repository
- Ollama Discord - Community support
- Open Notebook Discord - Integration help
Debugging Resources:
- Check Ollama logs for error messages
- Test connection with curl commands
- Verify environment variables
- Monitor system resources
This comprehensive guide should help you successfully deploy and optimize Ollama with Open Notebook. Start with the Quick Start section and refer to specific scenarios as needed.