Batchman - High-Performance Ollama Batch Processor

A high-performance Python tool for processing large batches of inputs through an LLM served by Ollama, with concurrent workers, real-time progress tracking, and performance metrics.

Features

  • Concurrent Processing: Process multiple requests in parallel with configurable worker count
  • Real-time Progress: Live progress bar with ETA, percentage, and average response time
  • Line Preservation: Output file maintains exact line correspondence with input file
  • Error Handling: Automatic logging of JSON parsing errors to a separate error file
  • Performance Metrics: Detailed statistics including throughput, average response time, and total time
  • Benchmark Mode: Test different worker counts to find optimal performance
  • Robust JSON Extraction: Handles LLM responses that wrap the JSON in extra text

Installation

  1. Install Ollama (if not already installed):
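
    # Official install script for Linux/macOS (other platforms: https://ollama.com/download):
    curl -fsSL https://ollama.com/install.sh | sh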

  2. Pull the model (example with gemma3:1b):

    ollama pull gemma3:1b
  3. Install Python dependencies:

    pip install -r requirements.txt

Configuration

Edit config.py to customize settings:

# Ollama Configuration
OLLAMA_MODEL = "gemma3:1b"          # Change to your preferred model
OLLAMA_BASE_URL = "http://localhost:11434"
OLLAMA_CONTEXT = 4096
OLLAMA_KEEP_ALIVE = 30              # Minutes to keep model in memory

# File Paths
PROMPT_FILE = "prompt.txt"
INPUT_FILE = "input.txt"
OUTPUT_FILE = "output.jsonl"
ERROR_FILE = "errors.log"

# Performance Settings
PARALLEL_WORKERS = 5                # Adjust based on your system
REQUEST_TIMEOUT = 120               # Seconds per request
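
A minimal sketch of how these settings might plug into Ollama's HTTP API and a worker pool (main.py holds the actual implementation; treat this as an illustration):

from concurrent.futures import ThreadPoolExecutor

import requests

import config

def generate(prompt: str) -> str:
    # /api/generate is Ollama's standard non-streaming completion endpoint.
    resp = requests.post(
        f"{config.OLLAMA_BASE_URL}/api/generate",
        json={
            "model": config.OLLAMA_MODEL,
            "prompt": prompt,
            "stream": False,
            "keep_alive": f"{config.OLLAMA_KEEP_ALIVE}m",  # minutes to stay in memory
            "options": {"num_ctx": config.OLLAMA_CONTEXT},
        },
        timeout=config.REQUEST_TIMEOUT,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def process_all(prompts):
    # One in-flight request per worker, capped at PARALLEL_WORKERS.
    with ThreadPoolExecutor(max_workers=config.PARALLEL_WORKERS) as pool:
        return list(pool.map(generate, prompts))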

Usage

Basic Usage

  1. Prepare your input file (input.txt):

    • One input item per line
    • Each line will be processed separately
  2. Customize your prompt (prompt.txt):

    • Use {INPUT} as a placeholder for each line of the input file (a substitution sketch follows these steps)
    • Example:
      You are an expert classifier.
      Analyze this input: {INPUT}
      Return valid JSON with your analysis.
      
  3. Run the processor:

    python main.py
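
The substitution itself is plain string replacement; a minimal sketch, assuming main.py follows this pattern:

from pathlib import Path

# Load the template once, then fill {INPUT} for each line of input.txt.
prompt_template = Path("prompt.txt").read_text(encoding="utf-8")

def build_prompt(line: str) -> str:
    return prompt_template.replace("{INPUT}", line.strip())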

Output

  • output.jsonl: JSONL file with one JSON object per line (matching input line numbers)
  • errors.log: JSON log entries for any failed parsing attempts

Real-time Progress Display

While running, you'll see:

[12/17] 70.6% | Avg: 2.34s | ETA: 0:00:12 | Elapsed: 0:00:28
  • [12/17]: Current item / Total items
  • 70.6%: Completion percentage
  • Avg: 2.34s: Average response time per item
  • ETA: 0:00:12: Estimated time to completion
  • Elapsed: 0:00:28: Total time elapsed
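
The ETA is simple arithmetic: remaining items times the running average seconds per item. A sketch of composing such a line (illustrative, not the exact code in main.py):

import datetime

def progress_line(done: int, total: int, elapsed: float) -> str:
    avg = elapsed / done if done else 0.0
    eta = datetime.timedelta(seconds=round(avg * (total - done)))
    return (f"[{done}/{total}] {done / total:.1%} | Avg: {avg:.2f}s"
            f" | ETA: {eta} | Elapsed: {datetime.timedelta(seconds=round(elapsed))}")

print(progress_line(12, 17, 28.0))  # [12/17] 70.6% | Avg: 2.33s | ETA: 0:00:12 | Elapsed: 0:00:28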

Performance Optimization

Finding Optimal Worker Count

Use the benchmark script to test different worker counts:

python benchmark.py

This will:

  1. Test with multiple worker counts (1, 3, 5, 10, 15, 20)
  2. Measure throughput and response times
  3. Recommend the optimal worker count for your system
  4. Save detailed results to benchmark_results.json
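
Conceptually, the benchmark times the same batch at each worker count and compares throughput; a hypothetical outline (benchmark.py is the authoritative source, and run_batch here is a stand-in for the real processing call):

import time

def benchmark(run_batch, items, worker_counts=(1, 3, 5, 10, 15, 20)):
    # Time the same batch at each worker count and record items/second.
    throughput = {}
    for n in worker_counts:
        start = time.perf_counter()
        run_batch(items, workers=n)
        throughput[n] = len(items) / (time.perf_counter() - start)
    best = max(throughput, key=throughput.get)
    return best, throughput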

Factors Affecting Performance

  1. System Resources:

    • CPU cores available
    • RAM (models need memory)
    • Disk I/O speed
  2. Model Size:

    • Smaller models (1b-7b): Can handle more workers
    • Larger models (13b+): Need fewer workers due to memory/CPU constraints
  3. Ollama Configuration:

    • Ensure Ollama has sufficient resources allocated
    • Consider running multiple Ollama instances for extreme parallelism

Performance Tips

  • Start with 5 workers and adjust based on benchmark results
  • Monitor system resources during processing
  • Larger models may benefit from fewer workers (3-5)
  • Smaller models can handle more workers (10-20+)
  • SSD vs HDD: Faster storage helps with model loading

Example Use Case: Music File Classification

The included example classifies music file paths:

Input (input.txt):

C:\Users\Sam\Music\10cc 20th Anniversary\CD14\12 24 Hours (Edit).opus

Prompt (prompt.txt):

You are an expert music classifier.
Extract metadata from this file path: {INPUT}
Return JSON: {"artist": "", "album": "", "year": "", "track_number": "", "track_name": ""}

Output (output.jsonl):

{"artist": "10cc", "album": "20th Anniversary - CD14", "year": "", "track_number": "12", "track_name": "24 Hours (Edit)"}

Error Handling

Errors are logged to errors.log in JSON format:

{
  "timestamp": "2025-11-13 14:23:45",
  "line_number": 5,
  "input": "problematic input line",
  "error": "JSON parse error: Expecting value: line 1 column 1"
}

Failed items leave an empty line in output.jsonl to maintain line correspondence.
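
A minimal sketch of appending one such entry (log_error is an assumed helper shown for illustration; the project's own logging lives in main.py):

import json
import time

def log_error(line_number: int, input_line: str, error: Exception,
              path: str = "errors.log") -> None:
    entry = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "line_number": line_number,
        "input": input_line,
        "error": str(error),
    }
    # Append the entry as a JSON record.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")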

Troubleshooting

Connection Errors

  • Ensure Ollama is running: ollama serve
  • Check Ollama URL in config matches your setup
  • Verify model is pulled: ollama list
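
To check connectivity from Python, a quick probe of Ollama's standard /api/tags endpoint (which lists pulled models) can help:

import requests

import config

# Raises on connection failure or a non-2xx status; otherwise prints the pulled models.
resp = requests.get(f"{config.OLLAMA_BASE_URL}/api/tags", timeout=5)
resp.raise_for_status()
print([model["name"] for model in resp.json()["models"]])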

Performance Issues

  • Run benchmark.py to find optimal worker count
  • Reduce PARALLEL_WORKERS if system is overloaded
  • Increase REQUEST_TIMEOUT for slow responses
  • Check system resources (CPU, RAM, disk)

JSON Parsing Errors

  • Check errors.log for specific failures
  • Improve prompt to ensure LLM returns valid JSON
  • The system automatically extracts JSON from surrounding text

Advanced Usage

Custom JSON Extraction

The system automatically finds JSON in LLM responses:

  • Searches for the first { and the last }
  • Extracts and parses the JSON portion
  • Handles extra text before/after JSON
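
In sketch form, that strategy looks like this (illustrative; the shipped extractor may handle more edge cases):

import json

def extract_json(response: str) -> dict:
    # Take the span from the first '{' to the last '}' and parse it.
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    return json.loads(response[start:end + 1])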

Line Number Preservation

The system guarantees:

  • Output line N corresponds to input line N
  • Failed items produce empty lines (errors logged separately)
  • JSONL format for easy line-by-line processing
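
A sketch of the line-preserving write, assuming results is a list ordered by input line where failures are None:

import json

def write_output(results, path: str = "output.jsonl") -> None:
    with open(path, "w", encoding="utf-8") as f:
        for result in results:
            # A failed item writes an empty line so output line N still maps to input line N.
            f.write(json.dumps(result, ensure_ascii=False) if result is not None else "")
            f.write("\n")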

Performance Expectations

With the example configuration (gemma3:1b, 5 workers):

  • Throughput: 2-5 items/second (depending on system)
  • Response time: 0.5-2 seconds per item
  • Scaling: Near-linear up to CPU core count

License

This project is provided as-is for batch processing tasks with Ollama.

Contributing

Feel free to submit issues or pull requests for improvements!
