An offline AI productivity agent optimized for low-resource hardware, built with Go and llama.cpp.
Qore-AI is designed to run efficiently on constrained hardware (HP Folio 830 G3: Core i5, 8GB RAM, 4GB VRAM) with a focus on:
- Offline-first operation with optional MCP connectivity
- Low memory footprint (<6GB total usage)
- Power efficiency with battery awareness
- Local tool execution and document search
- Minimal dependencies (single binary)
- CPU: Core i5 or equivalent (2-4 threads recommended)
- RAM: 8GB (model uses ~4GB with Q4 quantization, app uses 1-2GB)
- Storage: 10GB minimum (for models and data)
- OS: Windows, Linux, or macOS
QoreAI/
├── main.go # Entry point and agent loop
├── tools.go # Local tool implementations
├── search.go # Document indexing and search
├── mcp.go # Optional MCP server integration
├── utils.go # Utility functions (battery check, etc.)
├── config.json # Configuration file
├── go.mod # Go module definition
├── models/ # Directory for GGUF model files
└── data/ # Directory for indexed documents
Download Go 1.21+ from golang.org/dl.

```bash
# Verify installation
go version
```

```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build (Windows with MinGW or Linux/Mac)
make -j

# Optional: Build with OpenBLAS for faster math
make LLAMA_OPENBLAS=1 -j
```

Download a quantized GGUF model optimized for CPU inference:
Recommended Models:
- Llama-3-8B-Q4_K_M.gguf (~4.5GB) - Best balance
- Phi-3-mini-4k-Q4_0.gguf (~2.3GB) - Faster, smaller
- Mistral-7B-Q4_K_M.gguf (~4GB) - Alternative
Sources:
Place the downloaded model in the QoreAI/models/ directory.
```bash
cd QoreAI
go mod download

# Standard build
go build -o qore-ai

# Optimized build (smaller binary)
go build -ldflags="-s -w" -o qore-ai
```

Edit config.json to customize:
- Model path
- MCP server settings (if using)
- Thread count (2-4 recommended for your CPU)
- Context size and token limits
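A `config.json` for this hardware profile might look like the sketch below. The exact key names are illustrative assumptions (only `mcp_enabled` and `mcp_server_url` appear elsewhere in this README); match them to what your build actually reads:

```json
{
  "model_path": "models/Phi-3-mini-4k-Q4_0.gguf",
  "threads": 4,
  "context_size": 2048,
  "max_tokens": 512,
  "history_limit": 10,
  "battery_threshold": 20,
  "mcp_enabled": false,
  "mcp_server_url": ""
}
```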
```bash
./qore-ai
# Or on Windows
qore-ai.exe
```

```
You: What time is it?
Qore-AI: [Uses get_time tool] It's 2:45 PM on January 28, 2026
You: Calculate 45 * 23
Qore-AI: [Uses calculate tool] 1035.00
You: Read myfile.txt
Qore-AI: [Uses read_file tool] [displays file content]
```
- Place documents (`.txt`, `.md` files) in the `data/` directory
- Use the search tool:

```
You: Search for machine learning concepts
Qore-AI: [Indexes documents and returns relevant chunks]
```
- `read_file|path` - Read file contents
- `write_file|path|content` - Write to file
- `get_time` - Get current timestamp
- `calculate|expression` - Evaluate math expressions
- `search_local|query` - Search indexed documents
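The pipe-delimited syntax above can be parsed with a small helper. This is a sketch of the idea, not the project's actual parser; `parseToolCall` is an illustrative name:

```go
package main

import (
	"fmt"
	"strings"
)

// parseToolCall splits a pipe-delimited tool invocation such as
// "write_file|notes.txt|hello" into the tool name and its arguments.
func parseToolCall(call string) (name string, args []string) {
	parts := strings.Split(call, "|")
	return parts[0], parts[1:]
}

func main() {
	name, args := parseToolCall("write_file|notes.txt|hello")
	fmt.Println(name, args) // write_file [notes.txt hello]
}
```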
The Model Context Protocol (MCP) extends Qore-AI's capabilities via external services when internet connectivity is available.
- Set `mcp_enabled: true` in `config.json`
- Configure `mcp_server_url` to point at your MCP server
- Restart Qore-AI
Tools prefixed with `external_`, `mcp_`, or `web_` are routed to the MCP server:

```
You: Fetch web_search|AI news
Qore-AI: [Routes to MCP server if connected]
```
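The prefix-based routing decision can be expressed as a small predicate; a minimal sketch assuming exactly the three prefixes listed above:

```go
package main

import (
	"fmt"
	"strings"
)

// isMCPTool reports whether a tool name should be routed to the
// external MCP server rather than executed locally.
func isMCPTool(name string) bool {
	for _, prefix := range []string{"external_", "mcp_", "web_"} {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isMCPTool("web_search")) // true
	fmt.Println(isMCPTool("read_file"))  // false
}
```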
- Uses Q4 quantization for 4GB model footprint
- Limits context to 2048 tokens
- History truncated to last 10 exchanges
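Truncating history to the last 10 exchanges is a simple slice operation; a sketch of the technique (the `[]string` representation is an assumption, not the project's actual history type):

```go
package main

import "fmt"

// truncateHistory keeps only the most recent max exchanges so the
// prompt stays within the 2048-token context budget.
func truncateHistory(history []string, max int) []string {
	if len(history) <= max {
		return history
	}
	return history[len(history)-max:]
}

func main() {
	h := make([]string, 15)
	for i := range h {
		h[i] = fmt.Sprintf("exchange %d", i)
	}
	h = truncateHistory(h, 10)
	fmt.Println(len(h), h[0]) // 10 exchange 5
}
```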
- Auto-detects low battery (<20%)
- Pauses inference and saves state
- Thread count optimized for Core i5
- ~10-20 tokens/second on Core i5
- Faster responses for tool calls
- Search operations are near-instant
Edit `tools.go` and add a case to the tool dispatch switch:

```go
case "your_tool":
    // Your tool logic goes here
    return "result"
```

Then update the system prompt in `main.go` to include the new tool.
Current implementation uses simple keyword matching. Enhance with:
- TF-IDF scoring
- Embedding-based search (requires additional library)
- BM25 ranking
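As one illustration of these options, a bare-bones TF-IDF scorer over in-memory document strings might look like this. It is a sketch of the scoring idea only, not a drop-in replacement for `search.go`:

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidfScore ranks each document against a single query term using
// term frequency * inverse document frequency.
func tfidfScore(docs []string, term string) []float64 {
	term = strings.ToLower(term)
	counts := make([]int, len(docs))
	docsWithTerm := 0
	for i, doc := range docs {
		for _, w := range strings.Fields(strings.ToLower(doc)) {
			if w == term {
				counts[i]++
			}
		}
		if counts[i] > 0 {
			docsWithTerm++
		}
	}
	// Smoothed IDF avoids division by zero when no document matches.
	idf := math.Log(float64(len(docs)+1) / float64(docsWithTerm+1))
	scores := make([]float64, len(docs))
	for i, c := range counts {
		scores[i] = float64(c) * idf
	}
	return scores
}

func main() {
	docs := []string{
		"machine learning basics",
		"learning go step by step",
		"cooking recipes",
	}
	fmt.Println(tfidfScore(docs, "learning"))
}
```

For multi-term queries you would sum the per-term scores; BM25 refines this further with document-length normalization.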
The MCP client in mcp.go is basic. Enhance with:
- Capability caching
- Retry logic for failed connections
- Support for streaming responses
**Model won't load:**
- Check model path in config.json
- Verify GGUF format compatibility
- Ensure sufficient RAM (need 4GB+ free)

**Slow inference:**
- Reduce thread count (2 threads often optimal)
- Try a smaller model (Phi-3-mini)
- Check CPU usage in task manager

**MCP connection problems:**
- Verify internet connectivity
- Check the MCP server URL
- Review server logs for errors
- Disable MCP if not needed

**High memory usage:**
- Use Q4 quantization (not Q5/Q6)
- Reduce context size in config
- Close other applications
1. User input received
2. History appended to prompt
3. Model inference (llama.cpp)
4. Tool detection in response
5. Tool execution (local or MCP)
6. Result fed back to model
7. Final response to user
8. State saved to history.txt
Inspired by Bytefrost optimization principles:
- Native compilation (Go → machine code)
- Zero runtime overhead
- Minimal dependencies
- Adaptive resource usage (goroutines only when needed)
- Static binary for portability
| Feature | Qore-AI (Go) | Python Alternative |
|---|---|---|
| Binary Size | 10-20MB | 50-100MB+ |
| Memory Overhead | 1-2GB | 2-3GB |
| Startup Time | <1s | 2-5s |
| Inference Speed | ~15 tok/s | ~12 tok/s |
| Dependencies | None (static) | Python runtime + libs |
| Battery Impact | Low | Medium |
- Web UI (optional Tailwind + HTMX frontend)
- Vector embeddings for better search
- Multi-model support with hot-swapping
- Plugin system for custom tools
- Conversation branching
- Export conversations to markdown
- Voice input/output integration
Qore-AI is tuned for the target hardware above. To adapt it to other machines:
- Adjust thread count in config.json
- Choose appropriate model size for your RAM
- Modify battery threshold for your use case
MIT License - Free to use and modify
- llama.cpp GitHub
- Go Documentation
- Model Context Protocol
- GGUF Models on HuggingFace
- Terramentis-AI Projects
Built with inspiration from:
- Bytefrost optimization principles
- llama.cpp project by Georgi Gerganov
- Model Context Protocol specification
**Status:** Ready for development and testing
**Target Hardware:** HP Folio 830 G3 (Core i5, 8GB RAM, 4GB VRAM)
**Estimated Performance:** 10-20 tokens/sec, <6GB RAM usage