Description
MVP: Local chat model access for MCP servers
Status: API Foundation Complete - Ready for Model Integration
Current Understanding
Create a Rust library that provides a simple, unified interface for MCP servers to access local instruction-tuned language models. The MVP focuses on chat-style interactions with small, quantized models (1-3B parameters) that can be downloaded and cached locally.
✅ COMPLETED - API Foundation:
- Chat-style conversation builder API (system/user/assistant messages)
- Comprehensive error handling with a SmartsError enum
- Async-compatible design using tokio
- Conversation validation and message formatting
- Integration tests and working examples
- Updated mdbook documentation structure
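The completed builder API above can be sketched roughly as follows. The ConversationBuilder and Message names come from this issue; the exact method signatures, the Role enum, and the "at least one user message" validation rule are assumptions for illustration, not the library's confirmed API:

```rust
// Illustrative sketch of a chat conversation builder. Method names and
// the validation rule are assumptions; only ConversationBuilder and
// Message are names taken from the issue.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    content: String,
}

#[derive(Default)]
struct ConversationBuilder {
    messages: Vec<Message>,
}

impl ConversationBuilder {
    fn system(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::System, content: content.into() });
        self
    }

    fn user(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::User, content: content.into() });
        self
    }

    fn assistant(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::Assistant, content: content.into() });
        self
    }

    // Example validation: require at least one user message before building.
    fn build(self) -> Result<Vec<Message>, String> {
        if !self.messages.iter().any(|m| m.role == Role::User) {
            return Err("conversation must contain at least one user message".into());
        }
        Ok(self.messages)
    }
}

fn main() {
    let conv = ConversationBuilder::default()
        .system("You are a concise summarizer.")
        .user("Summarize: the meeting covered Q3 goals.")
        .build()
        .unwrap();
    assert_eq!(conv.len(), 2);
}
```

The consuming-builder style (each method takes and returns Self) keeps call sites to a single chained expression, which fits the stateless, one-shot usage described below.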
Key design decisions:
- Self-contained (no external runtime such as Ollama required)
- Stateless operations (callers manage their own conversation state)
- Target models: TinyLlama-1.1B-Chat, Qwen2-1.5B-Instruct (~600-800MB quantized)
- Use candle-rs for pure Rust model execution
Primary use case: Context tracking and summarization for chat conversations.
Next Steps
- ✅ Set up basic module structure and core types (Smarts, ConversationBuilder, Message)
- ✅ Implement conversation builder API with chat message formatting
- ✅ Add basic error handling and async support
- Integrate candle-rs for local model loading and inference
- Add model-specific prompt formatting (Llama-2-Chat, Qwen2, etc.)
- Create model download and caching system
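For the model-specific prompt formatting step, the Llama-2-Chat family uses Meta's documented [INST]/<<SYS>> template, which can be produced with plain string formatting. The function name here is illustrative, not part of the library's API:

```rust
// Sketch of model-specific prompt formatting for the Llama-2-Chat
// template (the [INST]/<<SYS>> markers documented by Meta). The
// function name is an assumption for illustration.
fn format_llama2_chat(system: &str, user: &str) -> String {
    format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]")
}

fn main() {
    let prompt = format_llama2_chat(
        "You are a helpful assistant.",
        "Summarize the chat so far.",
    );
    assert!(prompt.starts_with("<s>[INST] <<SYS>>"));
    assert!(prompt.ends_with("[/INST]"));
}
```

Each supported model family (Qwen2 uses a different chat template) would get its own formatter behind a common trait, so the conversation builder stays template-agnostic.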
Open Questions
- Exact candle-rs integration approach and model loading patterns
- Model caching location and management strategy
- How to handle different tokenizers across model families
- Performance considerations for concurrent requests
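On the caching question, one candidate strategy (an assumption, not a settled design) is an environment-variable override falling back to an XDG-style directory; the SMARTS_CACHE_DIR name is hypothetical:

```rust
use std::env;
use std::path::PathBuf;

// One possible cache-location strategy: honor a SMARTS_CACHE_DIR
// override (hypothetical variable name), else fall back to
// ~/.cache/smarts/models in the XDG style.
fn model_cache_dir() -> PathBuf {
    if let Ok(dir) = env::var("SMARTS_CACHE_DIR") {
        return PathBuf::from(dir);
    }
    let home = env::var("HOME").unwrap_or_else(|_| ".".into());
    PathBuf::from(home).join(".cache").join("smarts").join("models")
}

fn main() {
    env::set_var("SMARTS_CACHE_DIR", "/tmp/smarts-cache");
    assert_eq!(model_cache_dir(), PathBuf::from("/tmp/smarts-cache"));
}
```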
Context
This library will be used across multiple MCP servers in the Socratic Shell ecosystem. The goal is to provide local AI capabilities without requiring external services, while maintaining a clean abstraction that can later support additional backends (MCP sampling, remote APIs, etc.).
The stateless design allows MCP servers to implement their own conversation management strategies while using the library for the actual model inference.
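The stateless calling pattern can be sketched as follows: the MCP server owns the history and replays it into the library on every call. All names here are illustrative:

```rust
// Hypothetical sketch of the stateless pattern: the server keeps the
// conversation history; the library only sees the messages it is
// handed on each call and retains nothing between calls.
#[derive(Clone)]
struct Message {
    role: String,
    content: String,
}

struct ServerSession {
    history: Vec<Message>,
}

impl ServerSession {
    fn ask(&mut self, user_text: &str) -> Vec<Message> {
        self.history.push(Message { role: "user".into(), content: user_text.into() });
        // The full history would be passed to the inference call here;
        // returning a clone stands in for that hand-off.
        self.history.clone()
    }
}

fn main() {
    let mut session = ServerSession { history: Vec::new() };
    let first = session.ask("hello");
    let second = session.ask("follow-up");
    assert_eq!(first.len(), 1);
    assert_eq!(second.len(), 2);
}
```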