MVP: Local chat model access for MCP servers #1

@nikomatsakis

Description

Status: API Foundation Complete - Ready for Model Integration

Current Understanding

Create a Rust library that provides a simple, unified interface for MCP servers to access local instruction-tuned language models. The MVP focuses on chat-style interactions with small, quantized models (1-3B parameters) that can be downloaded and cached locally.

✅ COMPLETED - API Foundation:

  • Chat-style conversation builder API (system/user/assistant messages)
  • Comprehensive error handling with SmartsError enum
  • Async-compatible design using tokio
  • Conversation validation and message formatting
  • Integration tests and working examples
  • Updated mdbook documentation structure
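
The completed builder API might look like the following sketch. The type names (`ConversationBuilder`, `Message`) come from this issue, but the signatures and the validation rule (at least one user message) are illustrative assumptions, not the actual implementation:

```rust
// Hypothetical sketch of the conversation builder API; signatures and
// the validation rule are assumptions, not the real crate's API.

#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    content: String,
}

#[derive(Debug, Default)]
struct ConversationBuilder {
    messages: Vec<Message>,
}

impl ConversationBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn system(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::System, content: content.to_string() });
        self
    }

    fn user(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::User, content: content.to_string() });
        self
    }

    fn assistant(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::Assistant, content: content.to_string() });
        self
    }

    /// Example validation: a conversation must contain at least one
    /// user message before it can be sent to the model.
    fn build(self) -> Result<Vec<Message>, String> {
        if !self.messages.iter().any(|m| m.role == Role::User) {
            return Err("conversation must contain at least one user message".into());
        }
        Ok(self.messages)
    }
}

fn main() {
    let conversation = ConversationBuilder::new()
        .system("You are a concise summarizer.")
        .user("Summarize our discussion so far.")
        .build()
        .expect("valid conversation");
    assert_eq!(conversation.len(), 2);
    println!("{} messages", conversation.len());
}
```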

Key design decisions:

  • Self-contained (no external services such as Ollama)
  • Stateless operations (calling tools manage conversation state)
  • Target models: TinyLlama-1.1B-Chat, Qwen2-1.5B-Instruct (~600-800MB quantized)
  • Use candle-rs for pure Rust model execution

Primary use case: Context tracking and summarization for chat conversations.

Next Steps

  • ✅ Set up basic module structure and core types (Smarts, ConversationBuilder, Message)
  • ✅ Implement conversation builder API with chat message formatting
  • ✅ Add basic error handling and async support
  • Integrate candle-rs for local model loading and inference
  • Add model-specific prompt formatting (Llama-2-Chat, Qwen2, etc.)
  • Create model download and caching system
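
The model-specific formatting step could look like the sketch below. The template strings follow the published Llama-2-Chat and ChatML (used by Qwen2) conventions; the function names are hypothetical:

```rust
// Illustrative prompt templates for two model families named in this
// issue. Function names are hypothetical; template strings follow the
// published Llama-2-Chat and ChatML conventions.

fn format_llama2_chat(system: &str, user: &str) -> String {
    format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]")
}

fn format_chatml(system: &str, user: &str) -> String {
    format!(
        "<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n"
    )
}

fn main() {
    let p = format_llama2_chat("You summarize chats.", "Summarize this conversation.");
    assert!(p.starts_with("<s>[INST]"));
    let q = format_chatml("You summarize chats.", "Summarize this conversation.");
    assert!(q.ends_with("<|im_start|>assistant\n"));
    println!("{p}\n---\n{q}");
}
```

Keeping these as per-family functions behind one trait would let the tokenizer question below be answered per model family as well.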

Open Questions

  • Exact candle-rs integration approach and model loading patterns
  • Model caching location and management strategy
  • How to handle different tokenizers across model families
  • Performance considerations for concurrent requests
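
For the caching-location question, one possible layout is sketched below using only the standard library: honor an override environment variable (`SMARTS_CACHE_DIR`, a hypothetical name), otherwise fall back to a dot-directory under `$HOME`. A real implementation would more likely use the `dirs` or `hf-hub` crates:

```rust
// Sketch of one possible cache-directory policy. SMARTS_CACHE_DIR is a
// hypothetical override variable, not an agreed-upon name.

use std::env;
use std::path::PathBuf;

fn model_cache_dir() -> PathBuf {
    if let Ok(dir) = env::var("SMARTS_CACHE_DIR") {
        return PathBuf::from(dir);
    }
    // Fall back to ~/.cache/smarts/models (or ./ if HOME is unset).
    let home = env::var("HOME").unwrap_or_else(|_| ".".into());
    PathBuf::from(home).join(".cache").join("smarts").join("models")
}

fn main() {
    let dir = model_cache_dir();
    println!("models cached under: {}", dir.display());
}
```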

Context

This library will be used across multiple MCP servers in the Socratic Shell ecosystem. The goal is to provide local AI capabilities without requiring external services, while maintaining a clean abstraction that can later support additional backends (MCP sampling, remote APIs, etc.).

The stateless design allows MCP servers to implement their own conversation management strategies while using the library for the actual model inference.
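
That ownership pattern can be sketched as follows: the caller owns the history and passes the full message list on every call, and the library keeps nothing between calls. The real API is async (tokio) and runs a candle-rs model; this stub is synchronous and echoes a count, purely to show the stateless shape:

```rust
// Stateless call shape (sketch). `Smarts` would hold the loaded
// candle-rs model; here it holds nothing and the "inference" is a stub.

#[derive(Debug)]
struct Message {
    role: String,
    content: String,
}

struct Smarts;

impl Smarts {
    /// Stateless: nothing about the conversation is stored in `self`;
    /// the caller supplies the full history each time.
    fn chat(&self, history: &[Message]) -> String {
        // Stub "inference": report how many turns were received.
        format!("({} messages received)", history.len())
    }
}

fn main() {
    let smarts = Smarts;
    let mut history = vec![Message { role: "user".into(), content: "hi".into() }];
    let reply = smarts.chat(&history);
    // The caller, not the library, appends the reply to its own history.
    history.push(Message { role: "assistant".into(), content: reply });
    assert_eq!(history.len(), 2);
}
```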

Metadata

Assignees

No one assigned

Labels

  • ai-managed: AI assistant can update this issue
  • enhancement: New feature or request
  • tracking-issue: Ongoing work item tracked across multiple sessions
