
Using Ollama


Ollama allows you to run large language models locally on your machine, providing privacy, offline capability, and no API costs. This guide covers everything you need to know about using Ollama with coco.

What is Ollama?

Ollama is a tool that makes it easy to run large language models locally. It provides:

  • Privacy: Your code never leaves your machine
  • No API Costs: Run models without paying per request
  • Offline Capability: Work without internet connection
  • Performance: Direct access to your hardware (CPU/GPU)
  • Model Variety: Access to many open-source models

Installation

1. Install Ollama

macOS:

# Using Homebrew
brew install ollama

# Or download from https://ollama.ai/

Linux:

# Install script (recommended; Ollama is not in the standard Ubuntu/Debian repositories)
curl -fsSL https://ollama.ai/install.sh | sh

# Or via a community package, e.g. on Arch Linux
yay -S ollama

Windows:

# Download installer from https://ollama.ai/
# Or use Windows Subsystem for Linux (WSL)

2. Start Ollama Service

# Start the Ollama service
ollama serve

# Or run as a systemd service (Linux)
sudo systemctl start ollama
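
Either way, a quick check that the server is listening:

# The version endpoint returns a small JSON payload if the server is up
curl http://localhost:11434/api/version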

3. Pull a Model

# Recommended models for code generation
ollama pull qwen2.5-coder:7b      # Best balance of speed/quality
ollama pull llama3.1:8b           # Good general purpose model
ollama pull codellama:13b         # Specialized for code

# Smaller models for faster responses
ollama pull qwen2.5-coder:1.5b    # Very fast, good for simple commits
ollama pull llama3.2:3b           # Fast and capable

# Larger models for better quality (requires more RAM)
ollama pull qwen2.5-coder:32b     # Highest quality code model
ollama pull llama3.1:70b          # Excellent but requires 40GB+ RAM
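
To confirm what was downloaded and how much disk each model uses:

ollama list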

Quick Setup with Coco

Using coco init

The easiest way to configure Ollama with coco:

# Run the setup wizard
coco init

# Select "ollama" when prompted for provider
# Choose from your installed models
# Wizard will configure everything automatically

Manual Configuration

Create or update your .coco.config.json:

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://localhost:11434",
    "authentication": {
      "type": "None"
    }
  }
}
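
Before running coco, it's worth a quick sanity check that the endpoint answers and that the model name in the config matches an installed model exactly:

# The tags endpoint lists installed models; the configured name must match exactly
curl -s http://localhost:11434/api/tags | grep -o '"qwen2.5-coder:7b"'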

Recommended Models

For Code Generation (Recommended)

Qwen2.5-Coder Series (Best for coco):

ollama pull qwen2.5-coder:1.5b    # 1.5B params - Very fast, 2GB RAM
ollama pull qwen2.5-coder:3b      # 3B params - Fast, 4GB RAM  
ollama pull qwen2.5-coder:7b      # 7B params - Balanced, 8GB RAM ⭐ Recommended
ollama pull qwen2.5-coder:14b     # 14B params - High quality, 16GB RAM
ollama pull qwen2.5-coder:32b     # 32B params - Highest quality, 32GB RAM

CodeLlama Series:

ollama pull codellama:7b          # 7B params - Good for code, 8GB RAM
ollama pull codellama:13b         # 13B params - Better quality, 16GB RAM
ollama pull codellama:34b         # 34B params - High quality, 32GB RAM

For General Use

Llama 3.1/3.2 Series:

ollama pull llama3.2:1b          # 1B params - Very fast, 2GB RAM
ollama pull llama3.2:3b          # 3B params - Fast and capable, 4GB RAM
ollama pull llama3.1:8b          # 8B params - Excellent balance, 8GB RAM ⭐ Recommended
ollama pull llama3.1:70b         # 70B params - Top quality, 40GB+ RAM

DeepSeek R1 Series (Latest):

ollama pull deepseek-r1:1.5b     # 1.5B params - Very fast reasoning
ollama pull deepseek-r1:8b       # 8B params - Good reasoning, 8GB RAM
ollama pull deepseek-r1:32b      # 32B params - Excellent reasoning, 32GB RAM
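
Once a model is pulled, ollama show reports its parameter count, quantization, and context length, which helps match it against the RAM figures above:

ollama show qwen2.5-coder:7b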

Configuration Options

Basic Configuration

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://localhost:11434",
    "tokenLimit": 2048,
    "temperature": 0.4,
    "maxConcurrent": 1,
    "authentication": {
      "type": "None"
    }
  }
}

Advanced Configuration

{
  "service": {
    "provider": "ollama", 
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://localhost:11434",
    "tokenLimit": 4096,
    "temperature": 0.3,
    "maxConcurrent": 1,
    "maxParsingAttempts": 5,
    "requestOptions": {
      "timeout": 120000,
      "maxRetries": 3
    },
    "authentication": {
      "type": "None"
    },
    "fields": {
      "numCtx": 4096,
      "numPredict": 2048,
      "repeatPenalty": 1.1,
      "topK": 40,
      "topP": 0.9,
      "seed": -1,
      "stop": ["\n\n", "```"]
    }
  }
}
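
For reference, the camelCase entries under fields presumably correspond to Ollama's snake_case option names (num_ctx, num_predict, and so on); the same settings expressed as a raw Ollama API request look roughly like this (a sketch for comparison, not a coco internal):

# Equivalent options sent directly to the Ollama API (assumes the
# camelCase fields above map to these snake_case option names)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a conventional commit message for the staged diff",
  "stream": false,
  "options": {
    "num_ctx": 4096,
    "num_predict": 2048,
    "repeat_penalty": 1.1,
    "top_k": 40,
    "top_p": 0.9,
    "seed": -1,
    "stop": ["\n\n", "```"]
  }
}'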

Configuration Parameters Explained

Parameter        Description                Default                   Recommended
model            Ollama model name          -                         qwen2.5-coder:7b
endpoint         Ollama server URL          http://localhost:11434    Default
tokenLimit       Max tokens per request     2048                      2048-4096
temperature      Randomness (0.0-1.0)       0.4                       0.3-0.4
maxConcurrent    Concurrent requests        1                         1 (Ollama limitation)
numCtx           Context window size        2048                      4096
numPredict       Max tokens to generate     128                       1024-2048
repeatPenalty    Repetition penalty         1.1                       1.1
topK             Top-K sampling             40                        40
topP             Top-P sampling             0.9                       0.9

Performance Optimization

Hardware Requirements

Minimum Requirements:

  • RAM: 8GB (for 7B models)
  • Storage: 10GB free space
  • CPU: Modern multi-core processor

Recommended Setup:

  • RAM: 16GB+ (for better performance)
  • GPU: NVIDIA GPU with 8GB+ VRAM (optional but faster)
  • Storage: SSD for faster model loading
  • CPU: 8+ cores for better inference speed

GPU Acceleration

NVIDIA GPU (CUDA):

# Ollama automatically uses GPU if available
# Verify GPU usage
ollama ps

# Check GPU memory usage
nvidia-smi

Apple Silicon (M1/M2/M3):

# Ollama automatically uses Metal acceleration
# Monitor with Activity Monitor

AMD GPU (ROCm - Linux only):

# Install ROCm drivers first
# Ollama will detect and use AMD GPU

Model Selection by Hardware

8GB RAM:

{
  "service": {
    "model": "qwen2.5-coder:3b"  // or llama3.2:3b
  }
}

16GB RAM:

{
  "service": {
    "model": "qwen2.5-coder:7b"  // or llama3.1:8b
  }
}

32GB+ RAM:

{
  "service": {
    "model": "qwen2.5-coder:14b"  // or codellama:13b
  }
}

Remote Ollama Setup

Running Ollama on Another Machine

Server Setup:

# On the server machine
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Or set environment variable permanently
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
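
If the server machine runs Ollama as a systemd service, the variable has to be set on the service itself; a drop-in override is one common approach:

# sudo systemctl edit ollama, then add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart to apply
sudo systemctl daemon-reload
sudo systemctl restart ollama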

Client Configuration:

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://192.168.1.100:11434",
    "authentication": {
      "type": "None"
    }
  }
}
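
Before pointing coco at the remote endpoint, confirm it is reachable from the client machine (the IP below mirrors the example config):

# Lists the models the remote server has pulled
curl http://192.168.1.100:11434/api/tags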

Docker Setup

Run Ollama in Docker:

# CPU only
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# With GPU support
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# Pull models
docker exec -it ollama ollama pull qwen2.5-coder:7b
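
If you prefer Compose, a minimal sketch of the CPU-only setup above (same image, volume, and port; GPU wiring omitted):

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
volumes:
  ollama: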

Troubleshooting

Common Issues

1. Ollama Service Not Running

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama service
ollama serve

# Or as system service (Linux)
sudo systemctl start ollama
sudo systemctl enable ollama

2. Model Not Found

# List installed models
ollama list

# Pull the model if missing
ollama pull qwen2.5-coder:7b

# Check model name in coco config matches exactly

3. Connection Refused

# Check Ollama endpoint
curl http://localhost:11434/api/version

# Verify endpoint in config
{
  "service": {
    "endpoint": "http://localhost:11434"  // Check port
  }
}

4. Out of Memory Errors

# Use smaller model
ollama pull qwen2.5-coder:3b

# Or reduce context size
{
  "service": {
    "fields": {
      "numCtx": 2048  // Reduce from 4096
    }
  }
}

5. Slow Performance

# Check system resources
htop  # or Activity Monitor on macOS

# Use smaller model for faster responses
ollama pull qwen2.5-coder:1.5b

# Reduce token limits
{
  "service": {
    "tokenLimit": 1024,
    "fields": {
      "numPredict": 512
    }
  }
}

Debugging Commands

# Check Ollama status
ollama ps

# Test model directly
ollama run qwen2.5-coder:7b "Write a commit message for adding authentication"

# Check Ollama logs (Linux)
journalctl -u ollama -f

# Verbose coco output
coco --verbose commit
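
For raw timing data (model load time, prompt evaluation, tokens generated), the generate endpoint reports it in its response, which helps separate slow models from slow hardware:

# The response includes total_duration, load_duration, prompt_eval_count,
# eval_count, and eval_duration (durations are in nanoseconds)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a one-line commit message for a README typo fix",
  "stream": false
}'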

Best Practices

1. Model Selection

  • Start small: Begin with 3B-7B models, upgrade if needed
  • Code-specific models: Use qwen2.5-coder or codellama for better code understanding
  • Match hardware: Don't use models larger than your RAM can handle

2. Configuration Tuning

{
  "service": {
    "temperature": 0.3,        // Lower for more consistent commits
    "maxParsingAttempts": 5,   // Higher for Ollama (less reliable parsing)
    "tokenLimit": 2048,        // Balance context vs speed
    "fields": {
      "numCtx": 4096,          // Larger context for better understanding
      "repeatPenalty": 1.1     // Reduce repetitive output
    }
  }
}

3. Performance Tips

  • Keep models loaded: Ollama keeps recently used models in memory (see the keep-alive example after this list)
  • Use SSD storage: Faster model loading and inference
  • Monitor resources: Watch RAM/CPU usage during inference
  • Batch operations: Process multiple commits together when possible
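
The keep-alive behavior can be made explicit: the server honors an OLLAMA_KEEP_ALIVE environment variable, and individual API requests accept a keep_alive value. A rough sketch:

# Keep models resident for an hour after last use (server-wide)
OLLAMA_KEEP_ALIVE=1h ollama serve

# Or per request via the API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "warm up",
  "keep_alive": "1h",
  "stream": false
}'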

4. Privacy and Security

  • Local processing: All data stays on your machine
  • No internet required: Works completely offline
  • Secure by default: No API keys or external services
  • Audit trail: Full control over model and data

Integration Examples

Team Setup

Shared Model Configuration:

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "endpoint": "http://team-ollama-server:11434",
    "temperature": 0.3,
    "tokenLimit": 2048
  },
  "conventionalCommits": true,
  "mode": "interactive"
}

CI/CD Integration

GitHub Actions Example:

name: Generate Commit Messages
on: [push]

jobs:
  commit-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Setup Ollama
        run: |
          curl -fsSL https://ollama.ai/install.sh | sh
          ollama serve &
          sleep 10
          ollama pull qwen2.5-coder:3b
          
      - name: Install Coco
        run: npm install -g git-coco
        
      - name: Generate Commit Message
        run: coco --verbose commit

Development Workflow

Pre-commit Hook:

#!/bin/sh
# .git/hooks/pre-commit

# Generate commit message suggestion
echo "Suggested commit message:"
coco commit

# Git hooks run without a terminal on stdin; reattach it so read works
exec < /dev/tty

echo "Continue with commit? (y/n)"
read -r response
if [ "$response" != "y" ]; then
    exit 1
fi

Comparison with Cloud APIs

Feature    Ollama                  OpenAI API        Anthropic API
Privacy    ✅ Local                ❌ Cloud           ❌ Cloud
Cost       ✅ Free                 💰 Pay per use     💰 Pay per use
Offline    ✅ Yes                  ❌ No              ❌ No
Speed      ⚡ Hardware dependent    ⚡ Fast            ⚡ Fast
Quality    📊 Model dependent       📊 Excellent       📊 Excellent
Setup      🔧 More complex          🔧 Simple          🔧 Simple
Updates    🔄 Manual                🔄 Automatic       🔄 Automatic

Advanced Use Cases

Custom Models with a Modelfile

# Modelfile example
FROM qwen2.5-coder:7b
PARAMETER temperature 0.3
PARAMETER top_p 0.9
SYSTEM "You are a commit message generator for a TypeScript React project. Focus on conventional commits format."

# Create the custom model from the Modelfile, then reference "my-coco-model" in .coco.config.json
ollama create my-coco-model -f Modelfile

Multi-Model Setup

{
  "service": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}

Switch models based on context:

# For complex changes, use larger model
COCO_SERVICE_MODEL=qwen2.5-coder:14b coco commit

# For simple changes, use faster model  
COCO_SERVICE_MODEL=qwen2.5-coder:1.5b coco commit
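
One way to automate the switch is a small wrapper that picks the model from the size of the staged diff; the script name and threshold below are arbitrary, and COCO_SERVICE_MODEL is used exactly as shown above:

#!/bin/sh
# coco-smart (hypothetical wrapper) - choose a model based on staged diff size
LINES=$(git diff --cached | wc -l)

if [ "$LINES" -gt 300 ]; then
  MODEL="qwen2.5-coder:14b"    # larger model for complex changes
else
  MODEL="qwen2.5-coder:1.5b"   # fast model for simple changes
fi

COCO_SERVICE_MODEL="$MODEL" coco commit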

This guide covers using Ollama with coco end to end, from basic setup through advanced configuration and troubleshooting.
