terramentis-ai/HyPie-LM
Qore-AI

An offline AI productivity agent optimized for low-resource hardware, built with Go and llama.cpp.

Overview

Qore-AI is designed to run efficiently on constrained hardware (HP Folio 830 G3: Core i5, 8GB RAM, 4GB VRAM) with a focus on:

  • Offline-first operation with optional MCP connectivity
  • Low memory footprint (<6GB total usage)
  • Power efficiency with battery awareness
  • Local tool execution and document search
  • Minimal dependencies (single binary)

Hardware Requirements

  • CPU: Core i5 or equivalent (2-4 threads recommended)
  • RAM: 8GB (model uses ~4GB with Q4 quantization, app uses 1-2GB)
  • Storage: 10GB minimum (for models and data)
  • OS: Windows, Linux, or macOS

Project Structure

QoreAI/
├── main.go          # Entry point and agent loop
├── tools.go         # Local tool implementations
├── search.go        # Document indexing and search
├── mcp.go           # Optional MCP server integration
├── utils.go         # Utility functions (battery check, etc.)
├── config.json      # Configuration file
├── go.mod           # Go module definition
├── models/          # Directory for GGUF model files
└── data/            # Directory for indexed documents

Setup Instructions

1. Install Prerequisites

Go Installation

Download Go 1.21+ from golang.org/dl

# Verify installation
go version

llama.cpp Setup

# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build (Windows with MinGW or Linux/Mac)
make -j

# Optional: Build with OpenBLAS for faster math
make LLAMA_OPENBLAS=1 -j

2. Download Model

Download a quantized GGUF model optimized for CPU inference:

Recommended Models:

  • Llama-3-8B-Q4_K_M.gguf (~4.5GB) - Best balance
  • Phi-3-mini-4k-Q4_0.gguf (~2.3GB) - Faster, smaller
  • Mistral-7B-Q4_K_M.gguf (~4GB) - Alternative

Sources:

Place downloaded model in QoreAI/models/ directory.

3. Install Dependencies

cd QoreAI
go mod download

4. Build

# Standard build
go build -o qore-ai

# Optimized build (smaller binary)
go build -ldflags="-s -w" -o qore-ai

5. Configure

Edit config.json to customize:

  • Model path
  • MCP server settings (if using)
  • Thread count (2-4 recommended for your CPU)
  • Context size and token limits
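
For reference, a minimal config.json sketch covering the settings above. The key names here are assumptions based on this list, not taken from the actual file, so check them against your copy of config.json:

```json
{
  "model_path": "models/Phi-3-mini-4k-Q4_0.gguf",
  "threads": 4,
  "context_size": 2048,
  "max_tokens": 512,
  "mcp_enabled": false,
  "mcp_server_url": "http://localhost:8080",
  "battery_threshold": 20
}
```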

6. Run

./qore-ai

# Or on Windows
qore-ai.exe

Usage

Basic Commands

You: What time is it?
Qore-AI: [Uses get_time tool] It's 2:45 PM on January 28, 2026

You: Calculate 45 * 23
Qore-AI: [Uses calculate tool] 1035.00

You: Read myfile.txt
Qore-AI: [Uses read_file tool] [displays file content]

Local Search

  1. Place documents (.txt, .md files) in the data/ directory
  2. Use search tool:
You: Search for machine learning concepts
Qore-AI: [Indexes documents and returns relevant chunks]

Available Tools

  • read_file|path - Read file contents
  • write_file|path|content - Write to file
  • get_time - Get current timestamp
  • calculate|expression - Evaluate math expressions
  • search_local|query - Search indexed documents

MCP Integration (Optional)

The Model Context Protocol (MCP) extends Qore-AI's capabilities via external services when an internet connection is available.

Enable MCP

  1. Set mcp_enabled: true in config.json
  2. Configure mcp_server_url to your MCP server
  3. Restart Qore-AI

MCP Tool Format

Tools prefixed with external_, mcp_, or web_ are routed to the MCP server:

You: Fetch web_search|AI news
Qore-AI: [Routes to MCP server if connected]
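
The local-vs-MCP decision is a simple prefix check on the tool name. A sketch (function name assumed, not the repo's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// isMCPTool reports whether a tool should be routed to the MCP
// server rather than executed locally, based on its name prefix.
func isMCPTool(name string) bool {
	for _, p := range []string{"external_", "mcp_", "web_"} {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isMCPTool("web_search")) // true
	fmt.Println(isMCPTool("read_file")) // false
}
```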

Performance Optimization

Memory Management

  • Uses Q4 quantization for 4GB model footprint
  • Limits context to 2048 tokens
  • History truncated to last 10 exchanges
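
Truncating history to the last 10 exchanges is a one-line slice operation. A sketch, assuming history is kept as a slice of exchange strings (the actual representation in main.go may differ):

```go
package main

import "fmt"

// truncateHistory keeps only the last n exchanges so the prompt
// stays within the 2048-token context window.
func truncateHistory(history []string, n int) []string {
	if len(history) <= n {
		return history
	}
	return history[len(history)-n:]
}

func main() {
	h := make([]string, 0, 15)
	for i := 0; i < 15; i++ {
		h = append(h, fmt.Sprintf("exchange %d", i))
	}
	fmt.Println(len(truncateHistory(h, 10))) // 10
}
```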

Power Efficiency

  • Auto-detects low battery (<20%)
  • Pauses inference and saves state
  • Thread count optimized for Core i5

Speed Expectations

  • ~10-20 tokens/second on Core i5
  • Faster responses for tool calls
  • Search operations are near-instant

Development Tips

Adding New Tools

Edit tools.go:

case "your_tool":
    // Your tool logic
    return "result"

Update the system prompt in main.go to include the new tool.
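
In context, that case lives inside a tool-dispatch switch. A self-contained sketch of the pattern, where executeTool and its signature are assumptions rather than the repo's actual code, and echo stands in for your new tool:

```go
package main

import (
	"fmt"
	"strings"
)

// executeTool is a trimmed-down stand-in for the dispatch switch
// in tools.go. Add a case per tool; return its result as a string.
func executeTool(name string, args []string) string {
	switch name {
	case "echo": // example of a newly added tool
		return strings.Join(args, " ")
	default:
		return fmt.Sprintf("unknown tool: %s", name)
	}
}

func main() {
	fmt.Println(executeTool("echo", []string{"hello", "world"})) // hello world
}
```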

Improving Search

Current implementation uses simple keyword matching. Enhance with:

  • TF-IDF scoring
  • Embedding-based search (requires additional library)
  • BM25 ranking
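
As a sketch of the first option: TF-IDF weights a term's count in each document by how rare the term is across the corpus, so common words stop dominating the ranking. A minimal single-term version (not the code in search.go):

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidf scores each document for a single query term:
// tf (term count in the doc) times idf = log(N / docs containing term).
func tfidf(term string, docs []string) []float64 {
	counts := make([]int, len(docs))
	containing := 0
	for i, d := range docs {
		for _, w := range strings.Fields(strings.ToLower(d)) {
			if w == term {
				counts[i]++
			}
		}
		if counts[i] > 0 {
			containing++
		}
	}
	scores := make([]float64, len(docs))
	if containing == 0 {
		return scores
	}
	idf := math.Log(float64(len(docs)) / float64(containing))
	for i, c := range counts {
		scores[i] = float64(c) * idf
	}
	return scores
}

func main() {
	docs := []string{
		"machine learning basics",
		"cooking recipes",
		"advanced machine learning and machine vision",
	}
	fmt.Println(tfidf("machine", docs))
}
```

A real implementation would sum scores over all query terms and rank chunks by the total; BM25 refines the same idea with term-frequency saturation and document-length normalization.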

Extending MCP

The MCP client in mcp.go is basic. Enhance with:

  • Capability caching
  • Retry logic for failed connections
  • Support for streaming responses

Troubleshooting

Model Loading Fails

  • Check model path in config.json
  • Verify GGUF format compatibility
  • Ensure sufficient RAM (need 4GB+ free)

Slow Inference

  • Reduce thread count (2 threads often optimal)
  • Try smaller model (Phi-3-mini)
  • Check CPU usage in task manager

MCP Connection Issues

  • Verify internet connectivity
  • Check MCP server URL
  • Review server logs for errors
  • Disable MCP if not needed

High Memory Usage

  • Use Q4 quantization (not Q5/Q6)
  • Reduce context size in config
  • Close other applications

Architecture

Agent Loop

  1. User input received
  2. History appended to prompt
  3. Model inference (llama.cpp)
  4. Tool detection in response
  5. Tool execution (local or MCP)
  6. Result fed back to model
  7. Final response to user
  8. State saved to history.txt

Design Philosophy

Inspired by Bytefrost optimization principles:

  • Native compilation (Go → machine code)
  • Zero runtime overhead
  • Minimal dependencies
  • Adaptive resource usage (goroutines only when needed)
  • Static binary for portability

Comparison with Alternatives

Feature            Qore-AI (Go)     Python Alternative
Binary Size        10-20MB          50-100MB+
Memory Overhead    1-2GB            2-3GB
Startup Time       <1s              2-5s
Inference Speed    ~15 tok/s        ~12 tok/s
Dependencies       None (static)    Python runtime + libs
Battery Impact     Low              Medium

Future Enhancements

  • Web UI (optional Tailwind + HTMX frontend)
  • Vector embeddings for better search
  • Multi-model support with hot-swapping
  • Plugin system for custom tools
  • Conversation branching
  • Export conversations to markdown
  • Voice input/output integration

Contributing

Qore-AI is tuned for the target hardware listed above. To adapt it to your machine:

  1. Adjust thread count in config.json
  2. Choose appropriate model size for your RAM
  3. Modify battery threshold for your use case

License

MIT License - Free to use and modify

Acknowledgments

Built with inspiration from:

  • Bytefrost optimization principles
  • llama.cpp project by Georgi Gerganov
  • Model Context Protocol specification

Status: Ready for development and testing
Target Hardware: HP Folio 830 G3 (Core i5, 8GB RAM, 4GB VRAM)
Estimated Performance: 10-20 tokens/sec, <6GB RAM usage
