Description
MVP: Local chat model access for MCP servers
Status: API Foundation Complete - Ready for Model Integration
Current Understanding
Create a Rust library that provides a simple, unified interface for MCP servers to access local instruction-tuned language models. The MVP focuses on chat-style interactions with small, quantized models (1-3B parameters) that can be downloaded and cached locally.
✅ COMPLETED - API Foundation:
- Chat-style conversation builder API (system/user/assistant messages)
- Comprehensive error handling with a SmartsError enum
- Async-compatible design using tokio
- Conversation validation and message formatting
- Integration tests and working examples
- Updated mdbook documentation structure
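The completed builder API above can be sketched roughly as follows. The ConversationBuilder and Message names come from this issue; the exact method signatures, the Role enum, and the "at least one user message" validation rule are assumptions for illustration, not the library's confirmed API:

```rust
// Illustrative sketch of a chat conversation builder. Method names and
// the validation rule are assumptions; only ConversationBuilder and
// Message are names taken from the issue.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    content: String,
}

#[derive(Default)]
struct ConversationBuilder {
    messages: Vec<Message>,
}

impl ConversationBuilder {
    fn system(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::System, content: content.into() });
        self
    }

    fn user(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::User, content: content.into() });
        self
    }

    fn assistant(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::Assistant, content: content.into() });
        self
    }

    // Example validation: require at least one user message before building.
    fn build(self) -> Result<Vec<Message>, String> {
        if !self.messages.iter().any(|m| m.role == Role::User) {
            return Err("conversation must contain at least one user message".into());
        }
        Ok(self.messages)
    }
}

fn main() {
    let conv = ConversationBuilder::default()
        .system("You are a concise summarizer.")
        .user("Summarize: the meeting covered Q3 goals.")
        .build()
        .unwrap();
    assert_eq!(conv.len(), 2);
}
```

The consuming-builder style (each method takes and returns Self) keeps call sites to a single chained expression, which fits the stateless, one-shot usage described below.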
Key design decisions:
- Self-contained (no external runtime such as Ollama required)
- Stateless operations (callers manage their own conversation state)
- Target models: TinyLlama-1.1B-Chat, Qwen2-1.5B-Instruct (~600-800MB quantized)
- Use candle-rs for pure Rust model execution
Primary use case: Context tracking and summarization for chat conversations.
Next Steps
- ✅ Set up basic module structure and core types (Smarts, ConversationBuilder, Message)
- ✅ Implement conversation builder API with chat message formatting
- ✅ Add basic error handling and async support
- Integrate candle-rs for local model loading and inference
- Add model-specific prompt formatting (Llama-2-Chat, Qwen2, etc.)
- Create model download and caching system
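For the model-specific prompt formatting step, the Llama-2-Chat family uses Meta's documented [INST]/<<SYS>> template, which can be produced with plain string formatting. The function name here is illustrative, not part of the library's API:

```rust
// Sketch of model-specific prompt formatting for the Llama-2-Chat
// template (the [INST]/<<SYS>> markers documented by Meta). The
// function name is an assumption for illustration.
fn format_llama2_chat(system: &str, user: &str) -> String {
    format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]")
}

fn main() {
    let prompt = format_llama2_chat(
        "You are a helpful assistant.",
        "Summarize the chat so far.",
    );
    assert!(prompt.starts_with("<s>[INST] <<SYS>>"));
    assert!(prompt.ends_with("[/INST]"));
}
```

Each supported model family (Qwen2 uses a different chat template) would get its own formatter behind a common trait, so the conversation builder stays template-agnostic.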
Open Questions
- Exact candle-rs integration approach and model loading patterns
- Model caching location and management strategy
- How to handle different tokenizers across model families
- Performance considerations for concurrent requests
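On the caching question, one candidate strategy (an assumption, not a settled design) is an environment-variable override falling back to an XDG-style directory; the SMARTS_CACHE_DIR name is hypothetical:

```rust
use std::env;
use std::path::PathBuf;

// One possible cache-location strategy: honor a SMARTS_CACHE_DIR
// override (hypothetical variable name), else fall back to
// ~/.cache/smarts/models in the XDG style.
fn model_cache_dir() -> PathBuf {
    if let Ok(dir) = env::var("SMARTS_CACHE_DIR") {
        return PathBuf::from(dir);
    }
    let home = env::var("HOME").unwrap_or_else(|_| ".".into());
    PathBuf::from(home).join(".cache").join("smarts").join("models")
}

fn main() {
    env::set_var("SMARTS_CACHE_DIR", "/tmp/smarts-cache");
    assert_eq!(model_cache_dir(), PathBuf::from("/tmp/smarts-cache"));
}
```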
Context
This library will be used across multiple MCP servers in the Socratic Shell ecosystem. The goal is to provide local AI capabilities without requiring external services, while maintaining a clean abstraction that can later support additional backends (MCP sampling, remote APIs, etc.).
The stateless design allows MCP servers to implement their own conversation management strategies while using the library for the actual model inference.
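The stateless calling pattern can be sketched as follows: the MCP server owns the history and replays it into the library on every call. All names here are illustrative:

```rust
// Hypothetical sketch of the stateless pattern: the server keeps the
// conversation history; the library only sees the messages it is
// handed on each call and retains nothing between calls.
#[derive(Clone)]
struct Message {
    role: String,
    content: String,
}

struct ServerSession {
    history: Vec<Message>,
}

impl ServerSession {
    fn ask(&mut self, user_text: &str) -> Vec<Message> {
        self.history.push(Message { role: "user".into(), content: user_text.into() });
        // The full history would be passed to the inference call here;
        // returning a clone stands in for that hand-off.
        self.history.clone()
    }
}

fn main() {
    let mut session = ServerSession { history: Vec::new() };
    let first = session.ask("hello");
    let second = session.ask("follow-up");
    assert_eq!(first.len(), 1);
    assert_eq!(second.len(), 2);
}
```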