Configuration Guide

This guide covers all configuration options for Gosper, including environment variables, config files, model management, and advanced tuning.

Environment Variables
Configuration File
Model Management
Audio Format Specifications
Performance Tuning
Server Configuration

Environment Variables

Gosper can be configured via environment variables. These take precedence over config file settings.

Core Settings

Variable	Type	Default	Description
`GOSPER_MODEL`	string	`ggml-tiny.en.bin`	Model name or absolute path
`GOSPER_LANG`	string	`auto`	Language code (en, es, fr, etc.) or `auto`
`GOSPER_THREADS`	int	CPU cores	Number of threads for inference
`GOSPER_CACHE`	string	OS cache dir	Model cache directory
`GOSPER_LOG`	string	`info`	Log level: `debug`, `info`, `warn`, `error`

Audio Settings (CLI)

Variable	Type	Default	Description
`GOSPER_AUDIO_FEEDBACK`	bool	`false`	Enable beep on recording start/stop
`GOSPER_OUTPUT_DEVICE`	string	`default`	Audio output device ID for beeps
`GOSPER_BEEP_VOLUME`	float	`0.5`	Beep volume (0.0 - 1.0)

Server Settings

Variable	Type	Default	Description
`PORT`	int	`8080`	HTTP server port
`HOST`	string	`0.0.0.0`	HTTP server bind address
`MODEL_BASE_URL`	string	Hugging Face	Base URL for model downloads

Examples

Basic CLI Usage:

export GOSPER_MODEL=ggml-base.en.bin
export GOSPER_LANG=en
export GOSPER_THREADS=4

gosper transcribe meeting.wav

Server Configuration:

export PORT=9000
export GOSPER_MODEL=ggml-medium.en.bin
export GOSPER_CACHE=/var/cache/gosper/models
export GOSPER_LOG=debug

./server

Development Setup:

export GOSPER_LOG=debug
export GOSPER_MODEL_PATH=/path/to/models/ggml-tiny.en.bin
export GOSPER_INTEGRATION=1  # Enable integration tests

make test
make itest

Configuration File

Gosper reads configuration from ~/.config/gosper/config.json (or $XDG_CONFIG_HOME/gosper/config.json).

File Location

Linux/macOS:

~/.config/gosper/config.json

Windows:

%APPDATA%\gosper\config.json

Format

{
  "model": "ggml-base.en.bin",
  "lang": "en",
  "threads": 4,
  "cache_dir": "/path/to/models",
  "log_level": "info",
  "LastDeviceID": "default",
  "AudioFeedback": true,
  "OutputDeviceID": "speakers",
  "BeepVolume": 0.5
}

Fields

Field	Type	Description
`model`	string	Default model name or path
`lang`	string	Default language code
`threads`	int	Thread count for inference
`cache_dir`	string	Model cache directory
`log_level`	string	Logging level
`LastDeviceID`	string	Last used audio input device
`AudioFeedback`	bool	Enable recording beeps
`OutputDeviceID`	string	Audio output device for beeps
`BeepVolume`	float	Beep volume (0.0 - 1.0)

Precedence Order

Configuration is loaded in this order (later overrides earlier):

Default values (hardcoded in binary)
Config file (~/.config/gosper/config.json)
Environment variables (GOSPER_*)
Command-line flags (--model, --lang, etc.)

Example:

# Config file says: model=ggml-tiny.en.bin
# Environment says: GOSPER_MODEL=ggml-base.en.bin
# Command-line says: --model ggml-medium.en.bin

# Result: ggml-medium.en.bin (command-line wins)

Model Management

Model Selection

Option 1: Model Name (auto-download from Hugging Face):

gosper transcribe audio.mp3 --model ggml-base.en.bin
# Downloads to cache if not found

Option 2: Absolute Path:

gosper transcribe audio.mp3 --model /path/to/ggml-large-v3.bin
# Uses local file directly

Option 3: Environment Variable:

export GOSPER_MODEL=ggml-medium.en.bin
gosper transcribe audio.mp3

Model Cache Directory

Default Locations:

Linux: ~/.cache/gosper/
macOS: ~/Library/Caches/gosper/
Windows: %LOCALAPPDATA%\gosper\cache\

Custom Cache:

export GOSPER_CACHE=/var/cache/gosper
gosper transcribe audio.mp3

Model Download Behavior

Check if model path is absolute → use directly
Check cache directory → use if found
Download from MODEL_BASE_URL → save to cache
Verify SHA256 checksum (optional)
Retry with exponential backoff on failure

Model Sources

Default Source (Hugging Face):

https://huggingface.co/ggerganov/whisper.cpp/resolve/main/

Custom Source:

export MODEL_BASE_URL=https://your-cdn.com/models/
gosper transcribe audio.mp3 --model ggml-base.en.bin
# Downloads from: https://your-cdn.com/models/ggml-base.en.bin

Available Models

Model	Size	English-Only	Multilingual	RAM Required	Typical Speed
`ggml-tiny.en.bin`	75 MB	✅	❌	500 MB	5x real-time
`ggml-tiny.bin`	75 MB	❌	✅	500 MB	5x real-time
`ggml-base.en.bin`	142 MB	✅	❌	800 MB	3x real-time
`ggml-base.bin`	142 MB	❌	✅	800 MB	3x real-time
`ggml-small.en.bin`	466 MB	✅	❌	1.5 GB	1.5x real-time
`ggml-small.bin`	466 MB	❌	✅	1.5 GB	1.5x real-time
`ggml-medium.en.bin`	1.5 GB	✅	❌	3 GB	0.5x real-time
`ggml-medium.bin`	1.5 GB	❌	✅	3 GB	0.5x real-time
`ggml-large-v3.bin`	3.1 GB	❌	✅	6 GB	0.25x real-time

Notes:

.en.bin models are English-only (faster, more accurate for English)
Non-.en models support 100+ languages
Speed estimates are approximate (varies by hardware)
RAM includes model + decoded audio buffer

Manual Model Download

# Create cache directory
mkdir -p ~/.cache/gosper

# Download tiny model (fast, English only)
curl -L -o ~/.cache/gosper/ggml-tiny.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin

# Download base model (balanced, English only)
curl -L -o ~/.cache/gosper/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

# Download large model (best accuracy, multilingual)
curl -L -o ~/.cache/gosper/ggml-large-v3.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin

SHA256 Verification

Gosper optionally verifies model checksums during download.

Expected Checksums:

# ggml-tiny.en.bin
sha256sum ~/.cache/gosper/ggml-tiny.en.bin
# Expected: (check whisper.cpp repository for latest)

# Verify manually
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin.sha256

Audio Format Specifications

WAV Format

Supported:

Extensions: .wav, .Wave, .WAV (case-insensitive)
Sample Rates: 8000 - 96000 Hz
Channels: Mono (1) or Stereo (2)
Bit Depth: 16-bit PCM or 32-bit IEEE float
File Size: No limit

Processing:

Decode PCM samples to float32
Downmix stereo to mono (if stereo): (L + R) / 2
Resample to 16000 Hz (Whisper requirement)

Example:

# 44.1kHz stereo WAV → 16kHz mono for Whisper
gosper transcribe music.wav --lang en

MP3 Format

Supported:

Extensions: .mp3, .MP3 (case-insensitive)
Sample Rates: 8000 - 96000 Hz
Channels: Mono or Stereo
Bitrate: All bitrates (CBR, VBR, ABR)
File Size: Maximum 200 MB compressed

Limitations:

200 MB limit to prevent memory exhaustion
Decoded audio requires ~3x compressed size in memory
VBR MP3s may not report accurate duration until fully decoded

Processing:

Validate file size (reject if > 200 MB)
Decode MP3 to 16-bit stereo PCM
Convert to float32: value / 32768.0
Downmix stereo to mono
Resample to 16000 Hz

For Large Files:

# If MP3 > 200MB, convert to WAV first
ffmpeg -i large-podcast.mp3 large-podcast.wav
gosper transcribe large-podcast.wav

Format Detection

Format is detected by file extension (case-insensitive):

switch ext {
case ".wav", ".Wave", ".WAV":
    return NewWAV(path)
case ".mp3", ".MP3":
    return NewMP3(path)
default:
    return nil, fmt.Errorf("unsupported format: %s", ext)
}

Unsupported Formats: Convert using ffmpeg:

# M4A → MP3
ffmpeg -i audio.m4a audio.mp3

# FLAC → WAV
ffmpeg -i audio.flac audio.wav

# OGG → MP3
ffmpeg -i audio.ogg audio.mp3

Performance Tuning

Thread Count

Automatic (default):

# Uses all available CPU cores
gosper transcribe audio.mp3

Manual:

# Use 4 threads
gosper transcribe audio.mp3 --threads 4

# Use 8 threads
export GOSPER_THREADS=8
gosper transcribe audio.mp3

Guidelines:

2-4 threads: Typical for small models (tiny, base)
4-8 threads: Optimal for medium models
8-16 threads: Large models on high-end CPUs
More threads ≠ always faster (diminishing returns after 8)

Model Selection Strategy

Use Case	Model	Reason
Quick testing	`ggml-tiny.en.bin`	Fastest, good enough for demos
Production (English)	`ggml-base.en.bin`	Balanced speed/accuracy
High accuracy (English)	`ggml-medium.en.bin`	Best for English
Multilingual	`ggml-small.bin`	Good balance for 100+ languages
Maximum accuracy	`ggml-large-v3.bin`	Slowest but most accurate

Memory Optimization

Estimated Memory Usage:

Total RAM = Model Size + Audio Buffer + Overhead

Examples:
- tiny.en + 10 min audio ≈ 75 MB + 200 MB + 50 MB = 325 MB
- medium.en + 1 hour audio ≈ 1.5 GB + 1.2 GB + 300 MB ≈ 3 GB

For Large Audio Files:

# Process in chunks (future feature)
# Currently: use smaller model or add more RAM

Language Detection

Automatic (default):

gosper transcribe audio.mp3 --lang auto
# Whisper detects language (adds ~1s overhead)

Explicit (faster):

gosper transcribe audio.mp3 --lang en
# Skips detection, 10-20% faster

Supported Languages (multilingual models only): English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and 80+ more.

Server Configuration

Basic Setup

# Default (port 8080, all interfaces)
./server

# Custom port
export PORT=9000
./server

# Bind to localhost only
export HOST=127.0.0.1
./server

Docker Configuration

docker run -p 8080:8080 \
  -e GOSPER_MODEL=ggml-base.en.bin \
  -e GOSPER_THREADS=4 \
  -e GOSPER_LOG=info \
  -v gosper-models:/root/.cache/gosper \
  gosper/server:latest

Kubernetes Configuration

ConfigMap (config.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: gosper-config
data:
  GOSPER_MODEL: "ggml-base.en.bin"
  GOSPER_THREADS: "4"
  GOSPER_LOG: "info"
  PORT: "8080"

Deployment (reference ConfigMap):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gosper-be
spec:
  template:
    spec:
      containers:
      - name: server
        image: gosper/server:local
        envFrom:
        - configMapRef:
            name: gosper-config
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"

Resource Limits

Kubernetes Resource Requests (recommended):

Model	Memory Request	Memory Limit	CPU Request	CPU Limit
`tiny`	1 GB	2 GB	500m	1000m
`base`	2 GB	4 GB	1000m	2000m
`small`	4 GB	8 GB	2000m	4000m
`medium`	8 GB	16 GB	2000m	4000m
`large`	16 GB	32 GB	4000m	8000m

Docker Memory Limits:

docker run -p 8080:8080 \
  --memory=4g \
  --cpus=2 \
  -e GOSPER_MODEL=ggml-base.en.bin \
  gosper/server:latest

Advanced Topics

Custom Model Base URL

Host models on your own CDN or file server:

export MODEL_BASE_URL=https://models.yourcompany.com/whisper/

# Downloads from: https://models.yourcompany.com/whisper/ggml-base.en.bin
gosper transcribe audio.mp3 --model ggml-base.en.bin

Logging Configuration

Log Levels:

export GOSPER_LOG=debug   # Verbose debugging
export GOSPER_LOG=info    # Normal operation (default)
export GOSPER_LOG=warn    # Warnings only
export GOSPER_LOG=error   # Errors only

JSON Logging (for structured logging):

# Future feature - currently plain text
export GOSPER_LOG_FORMAT=json

Audio Device Management (CLI)

List Devices:

gosper devices list

Select Device:

# By ID
gosper record --device "hw:0,0"

# By name (fuzzy match)
gosper record --device "USB Microphone"

Device Selection Algorithm:

Exact ID match
Exact name match (case-insensitive)
Prefix match
Substring match
Fuzzy match (Levenshtein distance)

Persist Selection: Last used device is saved to ~/.config/gosper/config.json

Next Steps

API Reference - HTTP API endpoints and examples
Quick Start - Get started quickly
Deployment - Production k8s deployment
Troubleshooting - Common issues and solutions

FilesExpand file tree

CONFIGURATION.md

Latest commit

History