Using Ollama
Ollama lets you run large language models locally on your machine. This guide covers everything you need to know about using Ollama with coco.
Running models locally with Ollama provides:
- Privacy: Your code never leaves your machine
- No API Costs: Run models without paying per request
- Offline Capability: Work without internet connection
- Performance: Direct access to your hardware (CPU/GPU)
- Model Variety: Access to many open-source models
macOS:
# Using Homebrew
brew install ollama
# Or download from https://ollama.ai/

Linux:
# Install script
curl -fsSL https://ollama.ai/install.sh | sh
# Or using package managers
# Ubuntu/Debian
sudo apt install ollama
# Arch Linux
yay -S ollama

Windows:
# Download installer from https://ollama.ai/
# Or use Windows Subsystem for Linux (WSL)

# Start the Ollama service
ollama serve
# Or run as a background service (Linux, via systemd)
sudo systemctl start ollama

# Recommended models for code generation
ollama pull qwen2.5-coder:7b # Best balance of speed/quality
ollama pull llama3.1:8b # Good general purpose model
ollama pull codellama:13b # Specialized for code
# Smaller models for faster responses
ollama pull qwen2.5-coder:1.5b # Very fast, good for simple commits
ollama pull llama3.2:3b # Fast and capable
# Larger models for better quality (requires more RAM)
ollama pull qwen2.5-coder:32b # Highest quality code model
ollama pull llama3.1:70b # Excellent but requires 40GB+ RAM

The easiest way to configure Ollama with coco:
# Run the setup wizard
coco init
# Select "ollama" when prompted for provider
# Choose from your installed models
# Wizard will configure everything automatically

Create or update your .coco.config.json:
{
"service": {
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"endpoint": "http://localhost:11434",
"authentication": {
"type": "None"
}
}
}

Qwen2.5-Coder Series (Best for coco):
ollama pull qwen2.5-coder:1.5b # 1.5B params - Very fast, 2GB RAM
ollama pull qwen2.5-coder:3b # 3B params - Fast, 4GB RAM
ollama pull qwen2.5-coder:7b # 7B params - Balanced, 8GB RAM ⭐ Recommended
ollama pull qwen2.5-coder:14b # 14B params - High quality, 16GB RAM
ollama pull qwen2.5-coder:32b # 32B params - Highest quality, 32GB RAM

CodeLlama Series:
ollama pull codellama:7b # 7B params - Good for code, 8GB RAM
ollama pull codellama:13b # 13B params - Better quality, 16GB RAM
ollama pull codellama:34b # 34B params - High quality, 32GB RAM

Llama 3.1/3.2 Series:
ollama pull llama3.2:1b # 1B params - Very fast, 2GB RAM
ollama pull llama3.2:3b # 3B params - Fast and capable, 4GB RAM
ollama pull llama3.1:8b # 8B params - Excellent balance, 8GB RAM ⭐ Recommended
ollama pull llama3.1:70b # 70B params - Top quality, 40GB+ RAM

DeepSeek R1 Series (Latest):
ollama pull deepseek-r1:1.5b # 1.5B params - Very fast reasoning
ollama pull deepseek-r1:8b # 8B params - Good reasoning, 8GB RAM
ollama pull deepseek-r1:32b # 32B params - Excellent reasoning, 32GB RAM

A basic configuration with common tuning options:

{
"service": {
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"endpoint": "http://localhost:11434",
"tokenLimit": 2048,
"temperature": 0.4,
"maxConcurrent": 1,
"authentication": {
"type": "None"
}
}
}

A full configuration with Ollama-specific generation options:

{
"service": {
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"endpoint": "http://localhost:11434",
"tokenLimit": 4096,
"temperature": 0.3,
"maxConcurrent": 1,
"maxParsingAttempts": 5,
"requestOptions": {
"timeout": 120000,
"maxRetries": 3
},
"authentication": {
"type": "None"
},
"fields": {
"numCtx": 4096,
"numPredict": 2048,
"repeatPenalty": 1.1,
"topK": 40,
"topP": 0.9,
"seed": -1,
"stop": ["\n\n", "```"]
}
}
}

| Parameter | Description | Default | Recommended |
|---|---|---|---|
| `model` | Ollama model name | - | `qwen2.5-coder:7b` |
| `endpoint` | Ollama server URL | `http://localhost:11434` | Default |
| `tokenLimit` | Max tokens per request | 2048 | 2048-4096 |
| `temperature` | Randomness (0.0-1.0) | 0.4 | 0.3-0.4 |
| `maxConcurrent` | Concurrent requests | 1 | 1 (Ollama limitation) |
| `numCtx` | Context window size | 2048 | 4096 |
| `numPredict` | Max tokens to generate | 128 | 1024-2048 |
| `repeatPenalty` | Repetition penalty | 1.1 | 1.1 |
| `topK` | Top-K sampling | 40 | 40 |
| `topP` | Top-P sampling | 0.9 | 0.9 |
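The `fields` values are handed to Ollama as generation options; Ollama's own API uses snake_case names (num_ctx, num_predict, repeat_penalty, and so on), and the camelCase-to-snake_case pass-through is an assumption about coco's behavior rather than something documented here. A quick way to sanity-check parameter values independently of coco is to call Ollama's /api/generate endpoint directly:

# Send the same generation options straight to Ollama (values are examples)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a one-line conventional commit message for adding user authentication.",
  "stream": false,
  "options": {
    "num_ctx": 4096,
    "num_predict": 256,
    "temperature": 0.3,
    "top_k": 40,
    "top_p": 0.9,
    "repeat_penalty": 1.1
  }
}'

The JSON response also includes timing fields such as eval_count and eval_duration, which are handy when tuning numPredict and numCtx.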
Minimum Requirements:
- RAM: 8GB (for 7B models)
- Storage: 10GB free space
- CPU: Modern multi-core processor
Recommended Setup:
- RAM: 16GB+ (for better performance)
- GPU: NVIDIA GPU with 8GB+ VRAM (optional but faster)
- Storage: SSD for faster model loading
- CPU: 8+ cores for better inference speed
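Before pulling a large model, it is worth confirming what the machine actually has. A minimal check, assuming model files live in the default location under your home directory (the path differs when Ollama runs as a system service):

# Linux: total RAM, CPU cores, and free disk space on the home filesystem
free -h
nproc
df -h ~

# macOS equivalents
sysctl -n hw.memsize   # total RAM in bytes
sysctl -n hw.ncpu      # number of CPU cores
df -h ~

Compare the reported RAM against the per-model figures in the model lists above before pulling anything larger than 7B.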
NVIDIA GPU (CUDA):
# Ollama automatically uses GPU if available
# Verify GPU usage
ollama ps
# Check GPU memory usage
nvidia-smi

Apple Silicon (M1/M2/M3):
# Ollama automatically uses Metal acceleration
# Monitor with Activity Monitor

AMD GPU (ROCm - Linux only):
# Install ROCm drivers first
# Ollama will detect and use AMD GPU

8GB RAM:
{
"service": {
"model": "qwen2.5-coder:3b" // or llama3.2:3b
}
}

16GB RAM:
{
"service": {
"model": "qwen2.5-coder:7b" // or llama3.1:8b
}
}

32GB+ RAM:
{
"service": {
"model": "qwen2.5-coder:14b" // or codellama:13b
}
}

Server Setup:
# On the server machine
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# Or set environment variable permanently
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

Client Configuration:
{
"service": {
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"endpoint": "http://192.168.1.100:11434",
"authentication": {
"type": "None"
}
}
}

Run Ollama in Docker:
# CPU only
docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
# With GPU support
docker run -d \
--gpus=all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
# Pull models
docker exec -it ollama ollama pull qwen2.5-coder:7b

1. Ollama Service Not Running
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Start Ollama service
ollama serve
# Or as system service (Linux)
sudo systemctl start ollama
sudo systemctl enable ollama

2. Model Not Found
# List installed models
ollama list
# Pull the model if missing
ollama pull qwen2.5-coder:7b
# Check model name in coco config matches exactly

3. Connection Refused
# Check Ollama endpoint
curl http://localhost:11434/api/version
# Verify endpoint in config
{
"service": {
"endpoint": "http://localhost:11434" // Check port
}
}

4. Out of Memory Errors
# Use smaller model
ollama pull qwen2.5-coder:3b
# Or reduce context size
{
"service": {
"fields": {
"numCtx": 2048 // Reduce from 4096
}
}
}

5. Slow Performance
# Check system resources
htop # or Activity Monitor on macOS
# Use smaller model for faster responses
ollama pull qwen2.5-coder:1.5b
# Reduce token limits
{
"service": {
"tokenLimit": 1024,
"fields": {
"numPredict": 512
}
}
}

# Check Ollama status
ollama ps
# Test model directly
ollama run qwen2.5-coder:7b "Write a commit message for adding authentication"
# Check Ollama logs (Linux)
journalctl -u ollama -f
# Verbose coco output
coco --verbose commit

- Start small: Begin with 3B-7B models and upgrade if needed (see the timing sketch after this list)
- Code-specific models: Use `qwen2.5-coder` or `codellama` for better code understanding
- Match hardware: Don't use models larger than your RAM can handle
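A rough way to choose between model sizes is to time the same prompt on each candidate; the models and prompt below are only examples, so substitute whatever you have installed:

# Compare a small and a mid-size model on identical input
time ollama run qwen2.5-coder:1.5b "Write a conventional commit message for renaming a config file."
time ollama run qwen2.5-coder:7b "Write a conventional commit message for renaming a config file."

If the smaller model's output is already acceptable for routine commits, keep it as the default and switch up only for large or complex changes. The configuration below tunes coco's Ollama settings for more consistent commit messages: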
{
"service": {
"temperature": 0.3, // Lower for more consistent commits
"maxParsingAttempts": 5, // Higher for Ollama (less reliable parsing)
"tokenLimit": 2048, // Balance context vs speed
"fields": {
"numCtx": 4096, // Larger context for better understanding
"repeatPenalty": 1.1 // Reduce repetitive output
}
}
}

- Keep models loaded: Ollama keeps recently used models in memory (see the sketch after this list)
- Use SSD storage: Faster model loading and inference
- Monitor resources: Watch RAM/CPU usage during inference
- Batch operations: Process multiple commits together when possible
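The first request after a model has been unloaded pays the full load time again. Ollama's OLLAMA_KEEP_ALIVE environment variable controls how long a model stays in memory after its last request; a minimal sketch:

# Keep models loaded for an hour of inactivity (the default is only a few minutes)
export OLLAMA_KEEP_ALIVE=1h
ollama serve

# See which models are currently loaded and when they will be unloaded
ollama ps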
- Local processing: All data stays on your machine (a quick check is sketched after this list)
- No internet required: Works completely offline
- Secure by default: No API keys or external services
- Audit trail: Full control over model and data
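To confirm nothing is exposed beyond your machine, check that the Ollama port is bound to localhost only (it is by default; the OLLAMA_HOST=0.0.0.0 setting from the remote-server setup above changes that):

# Linux: the listener should show 127.0.0.1:11434, not 0.0.0.0:11434
ss -tlnp | grep 11434

# macOS
lsof -iTCP:11434 -sTCP:LISTEN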
Shared Model Configuration:
{
"service": {
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"endpoint": "http://team-ollama-server:11434",
"temperature": 0.3,
"tokenLimit": 2048
},
"conventionalCommits": true,
"mode": "interactive"
}

GitHub Actions Example:
name: Generate Commit Messages
on: [push]
jobs:
  commit-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Ollama
        run: |
          curl -fsSL https://ollama.ai/install.sh | sh
          ollama serve &
          sleep 10
          ollama pull qwen2.5-coder:3b
      - name: Install Coco
        run: npm install -g git-coco
      - name: Generate Commit Message
        run: coco --verbose commit

Pre-commit Hook:
#!/bin/sh
# .git/hooks/pre-commit
# Generate commit message suggestion
echo "Suggested commit message:"
coco commit
echo "Continue with commit? (y/n)"
read -r response
if [ "$response" != "y" ]; then
exit 1
fi

| Feature | Ollama | OpenAI API | Anthropic API |
|---|---|---|---|
| Privacy | ✅ Local | ❌ Cloud | ❌ Cloud |
| Cost | ✅ Free | 💰 Pay per use | 💰 Pay per use |
| Offline | ✅ Yes | ❌ No | ❌ No |
| Speed | ⚡ Hardware dependent | ⚡ Fast | ⚡ Fast |
| Quality | 📊 Model dependent | 📊 Excellent | 📊 Excellent |
| Setup | 🔧 More complex | 🔧 Simple | 🔧 Simple |
| Updates | 🔄 Manual | 🔄 Automatic | 🔄 Automatic |
# Create custom model for your codebase
ollama create my-coco-model -f Modelfile
# Modelfile example
FROM qwen2.5-coder:7b
PARAMETER temperature 0.3
PARAMETER top_p 0.9
SYSTEM "You are a commit message generator for a TypeScript React project. Focus on conventional commits format."{
"service": {
"provider": "ollama",
"model": "qwen2.5-coder:7b"
}
}

Switch models based on context:
# For complex changes, use larger model
COCO_SERVICE_MODEL=qwen2.5-coder:14b coco commit
# For simple changes, use faster model
COCO_SERVICE_MODEL=qwen2.5-coder:1.5b coco commit

This guide provides everything needed to use Ollama with coco, from basic setup to advanced configuration and troubleshooting.