An offline AI productivity agent optimized for low-resource hardware, built with Go and llama.cpp.
Qore-AI is designed to run efficiently on constrained hardware (HP Folio 830 G3: Core i5, 8GB RAM, 4GB VRAM) with a focus on:
- Offline-first operation with optional MCP connectivity
- Low memory footprint (<6GB total usage)
- Power efficiency with battery awareness
- Local tool execution and document search
- Minimal dependencies (single binary)
- CPU: Core i5 or equivalent (2-4 threads recommended)
- RAM: 8GB (model uses ~4GB with Q4 quantization, app uses 1-2GB)
- Storage: 10GB minimum (for models and data)
- OS: Windows, Linux, or macOS
QoreAI/
├── main.go # Entry point and agent loop
├── tools.go # Local tool implementations
├── search.go # Document indexing and search
├── mcp.go # Optional MCP server integration
├── utils.go # Utility functions (battery check, etc.)
├── config.json # Configuration file
├── go.mod # Go module definition
├── models/ # Directory for GGUF model files
└── data/ # Directory for indexed documents
Download Go 1.21+ from golang.org/dl.

```bash
# Verify installation
go version
```

```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build (Windows with MinGW or Linux/Mac)
make -j

# Optional: Build with OpenBLAS for faster math
make LLAMA_OPENBLAS=1 -j
```

Download a quantized GGUF model optimized for CPU inference:
Recommended Models:
- Llama-3-8B-Q4_K_M.gguf (~4.5GB) - Best balance
- Phi-3-mini-4k-Q4_0.gguf (~2.3GB) - Faster, smaller
- Mistral-7B-Q4_K_M.gguf (~4GB) - Alternative
Sources:
Place the downloaded model in the QoreAI/models/ directory.
```bash
cd QoreAI
go mod download

# Standard build
go build -o qore-ai

# Optimized build (smaller binary)
go build -ldflags="-s -w" -o qore-ai
```

Edit config.json to customize:
- Model path
- MCP server settings (if using)
- Thread count (2-4 recommended for your CPU)
- Context size and token limits
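A `config.json` for this hardware profile might look like the sketch below. The exact key names are illustrative assumptions (only `mcp_enabled` and `mcp_server_url` appear elsewhere in this README); match them to what your build actually reads:

```json
{
  "model_path": "models/Phi-3-mini-4k-Q4_0.gguf",
  "threads": 4,
  "context_size": 2048,
  "max_tokens": 512,
  "history_limit": 10,
  "battery_threshold": 20,
  "mcp_enabled": false,
  "mcp_server_url": ""
}
```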
```bash
./qore-ai
# Or on Windows
qore-ai.exe
```

```
You: What time is it?
Qore-AI: [Uses get_time tool] It's 2:45 PM on January 28, 2026
You: Calculate 45 * 23
Qore-AI: [Uses calculate tool] 1035.00
You: Read myfile.txt
Qore-AI: [Uses read_file tool] [displays file content]
```
- Place documents (`.txt`, `.md` files) in the `data/` directory
- Use the search tool:

```
You: Search for machine learning concepts
Qore-AI: [Indexes documents and returns relevant chunks]
```
- `read_file|path` - Read file contents
- `write_file|path|content` - Write to file
- `get_time` - Get current timestamp
- `calculate|expression` - Evaluate math expressions
- `search_local|query` - Search indexed documents
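The pipe-delimited syntax above can be parsed with a small helper. This is a sketch of the idea, not the project's actual parser; `parseToolCall` is an illustrative name:

```go
package main

import (
	"fmt"
	"strings"
)

// parseToolCall splits a pipe-delimited tool invocation such as
// "write_file|notes.txt|hello" into the tool name and its arguments.
func parseToolCall(call string) (name string, args []string) {
	parts := strings.Split(call, "|")
	return parts[0], parts[1:]
}

func main() {
	name, args := parseToolCall("write_file|notes.txt|hello")
	fmt.Println(name, args) // write_file [notes.txt hello]
}
```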
The Model Context Protocol (MCP) extends Qore-AI's capabilities via external services when internet connectivity is available.
- Set `mcp_enabled: true` in `config.json`
- Configure `mcp_server_url` to point at your MCP server
- Restart Qore-AI
Tools prefixed with `external_`, `mcp_`, or `web_` are routed to the MCP server:

```
You: Fetch web_search|AI news
Qore-AI: [Routes to MCP server if connected]
```
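The prefix-based routing decision can be expressed as a small predicate; a minimal sketch assuming exactly the three prefixes listed above:

```go
package main

import (
	"fmt"
	"strings"
)

// isMCPTool reports whether a tool name should be routed to the
// external MCP server rather than executed locally.
func isMCPTool(name string) bool {
	for _, prefix := range []string{"external_", "mcp_", "web_"} {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isMCPTool("web_search")) // true
	fmt.Println(isMCPTool("read_file"))  // false
}
```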
- Uses Q4 quantization for 4GB model footprint
- Limits context to 2048 tokens
- History truncated to last 10 exchanges
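Truncating history to the last 10 exchanges is a simple slice operation; a sketch of the technique (the `[]string` representation is an assumption, not the project's actual history type):

```go
package main

import "fmt"

// truncateHistory keeps only the most recent max exchanges so the
// prompt stays within the 2048-token context budget.
func truncateHistory(history []string, max int) []string {
	if len(history) <= max {
		return history
	}
	return history[len(history)-max:]
}

func main() {
	h := make([]string, 15)
	for i := range h {
		h[i] = fmt.Sprintf("exchange %d", i)
	}
	h = truncateHistory(h, 10)
	fmt.Println(len(h), h[0]) // 10 exchange 5
}
```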
- Auto-detects low battery (<20%)
- Pauses inference and saves state
- Thread count optimized for Core i5
- ~10-20 tokens/second on Core i5
- Faster responses for tool calls
- Search operations are near-instant
Edit `tools.go` and add a case to the tool dispatch switch:

```go
case "your_tool":
    // Your tool logic goes here
    return "result"
```

Then update the system prompt in `main.go` to include the new tool.
Current implementation uses simple keyword matching. Enhance with:
- TF-IDF scoring
- Embedding-based search (requires additional library)
- BM25 ranking
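As one illustration of these options, a bare-bones TF-IDF scorer over in-memory document strings might look like this. It is a sketch of the scoring idea only, not a drop-in replacement for `search.go`:

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidfScore ranks each document against a single query term using
// term frequency * inverse document frequency.
func tfidfScore(docs []string, term string) []float64 {
	term = strings.ToLower(term)
	counts := make([]int, len(docs))
	docsWithTerm := 0
	for i, doc := range docs {
		for _, w := range strings.Fields(strings.ToLower(doc)) {
			if w == term {
				counts[i]++
			}
		}
		if counts[i] > 0 {
			docsWithTerm++
		}
	}
	// Smoothed IDF avoids division by zero when no document matches.
	idf := math.Log(float64(len(docs)+1) / float64(docsWithTerm+1))
	scores := make([]float64, len(docs))
	for i, c := range counts {
		scores[i] = float64(c) * idf
	}
	return scores
}

func main() {
	docs := []string{
		"machine learning basics",
		"learning go step by step",
		"cooking recipes",
	}
	fmt.Println(tfidfScore(docs, "learning"))
}
```

For multi-term queries you would sum the per-term scores; BM25 refines this further with document-length normalization.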
The MCP client in mcp.go is basic. Enhance with:
- Capability caching
- Retry logic for failed connections
- Support for streaming responses
**Model won't load:**
- Check model path in config.json
- Verify GGUF format compatibility
- Ensure sufficient RAM (need 4GB+ free)

**Slow inference:**
- Reduce thread count (2 threads often optimal)
- Try a smaller model (Phi-3-mini)
- Check CPU usage in task manager

**MCP connection problems:**
- Verify internet connectivity
- Check the MCP server URL
- Review server logs for errors
- Disable MCP if not needed

**High memory usage:**
- Use Q4 quantization (not Q5/Q6)
- Reduce context size in config
- Close other applications
1. User input received
2. History appended to prompt
3. Model inference (llama.cpp)
4. Tool detection in response
5. Tool execution (local or MCP)
6. Result fed back to model
7. Final response to user
8. State saved to history.txt
Inspired by Bytefrost optimization principles:
- Native compilation (Go → machine code)
- Zero runtime overhead
- Minimal dependencies
- Adaptive resource usage (goroutines only when needed)
- Static binary for portability
| Feature | Qore-AI (Go) | Python Alternative |
|---|---|---|
| Binary Size | 10-20MB | 50-100MB+ |
| Memory Overhead | 1-2GB | 2-3GB |
| Startup Time | <1s | 2-5s |
| Inference Speed | ~15 tok/s | ~12 tok/s |
| Dependencies | None (static) | Python runtime + libs |
| Battery Impact | Low | Medium |
- Web UI (optional Tailwind + HTMX frontend)
- Vector embeddings for better search
- Multi-model support with hot-swapping
- Plugin system for custom tools
- Conversation branching
- Export conversations to markdown
- Voice input/output integration
Qore-AI is tuned for the target hardware above. To adapt it to other machines:
- Adjust thread count in config.json
- Choose appropriate model size for your RAM
- Modify battery threshold for your use case
MIT License - Free to use and modify
- llama.cpp GitHub
- Go Documentation
- Model Context Protocol
- GGUF Models on HuggingFace
- Terramentis-AI Projects
Built with inspiration from:
- Bytefrost optimization principles
- llama.cpp project by Georgi Gerganov
- Model Context Protocol specification
**Status:** Ready for development and testing
**Target Hardware:** HP Folio 830 G3 (Core i5, 8GB RAM, 4GB VRAM)
**Estimated Performance:** 10-20 tokens/sec, <6GB RAM usage