Everything you need to build reliable applications with Shimmy as your foundation
Whether you're forking Shimmy for your application or integrating it as a service, this guide provides the tools and specifications you need to build systematically and reliably.
Perfect for: Adding local AI capabilities to existing applications
```bash
# Start Shimmy server
shimmy serve --bind 127.0.0.1:11435

# Use the OpenAI-compatible API
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```

Documentation: See `templates/integration_template.md` for the complete integration guide.
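Because the endpoint is OpenAI-compatible, the response body follows the standard chat-completions shape, so extracting the reply works the same as with any OpenAI-style server. A minimal Python sketch (the sample response below is illustrative, not actual Shimmy output):

```python
def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]

# Illustrative response body in the OpenAI-compatible format:
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}
print(extract_reply(sample))  # Hello! How can I help?
```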
Perfect for: Building specialized AI inference tools tailored to your needs
```bash
# Fork and clone
git clone https://github.com/YOUR_USERNAME/shimmy.git
cd shimmy

# Review architectural principles
cat memory/constitution.md

# Plan your features with Spec-Kit methodology
# See "Feature Development Workflow" below
```

Shimmy uses the GitHub Spec-Kit methodology for systematic feature development. Here's how to plan and implement features:
Create a detailed specification that focuses on WHAT and WHY, not HOW.
Template: Use templates/spec-template.md
Example: Planning MLX support for Apple Silicon
```markdown
# Feature Specification: MLX GPU Acceleration

**Feature Branch**: `041-mlx-support`
**Created**: 2025-09-17
**Status**: Draft

## User Scenarios & Testing
- **Primary User**: Developer with Apple Silicon Mac running Shimmy locally
- **Scenario**: User runs `shimmy serve` and expects automatic GPU acceleration
- **Success Criteria**: Model inference uses Metal GPU instead of CPU

## Functional Requirements
- FR-001: Shimmy shall auto-detect Apple Silicon architecture
- FR-002: Shimmy shall prefer the MLX backend when available
- FR-003: Shimmy shall fall back to CPU if MLX fails
```

Generate a technical implementation plan from your specification.
Template: Use templates/plan-template.md
Constitutional Check: Ensure your plan complies with Shimmy's principles:
- ✅ Maintains 5MB binary size limit
- ✅ Preserves sub-2-second startup
- ✅ No new Python dependencies
- ✅ Maintains OpenAI API compatibility
Create actionable task list for implementation.
Template: Use templates/tasks-template.md
Example Task Breakdown:
```markdown
## Tasks: MLX GPU Acceleration
- T001: Add MLX feature flag to Cargo.toml
- T002: Create MLX detection module in src/gpu/
- T003: [P] Write integration tests for MLX backend
- T004: [P] Write unit tests for GPU detection
- T005: Implement MLX model loading
- T006: Add MLX to engine adapter selection logic
- T007: Update documentation and examples
```

Every feature must comply with Shimmy's architectural principles:
- 5MB Binary Limit: Core binary cannot exceed 5MB
- Sub-2-Second Startup: Performance must be maintained
- Zero Python Dependencies: Pure Rust implementation only
- Library-First: Features start as standalone libraries
- CLI Interface: All functionality accessible via command line
- Test-First: Comprehensive tests before implementation
- API Compatibility: Maintain OpenAI API compatibility
Before any feature is merged:
- Constitutional compliance verified
- All tests pass: `cargo test --all-features`
- Integration tests pass
- Startup time < 2 seconds
- Binary size < 5MB
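The size and startup gates can be checked mechanically before merging. A minimal sketch of such a release-gate script (Python is used only for the CI check here; Shimmy itself stays pure Rust, and the binary path and probe command are assumptions you would point at your release build):

```python
import os
import subprocess
import time

BINARY_LIMIT_BYTES = 5 * 1024 * 1024   # 5MB constitutional limit
STARTUP_LIMIT_SECS = 2.0               # sub-2-second startup gate

def binary_within_limit(path: str, limit: int = BINARY_LIMIT_BYTES) -> bool:
    """True if the compiled binary respects the size gate."""
    return os.path.getsize(path) <= limit

def startup_within_limit(cmd: list, limit: float = STARTUP_LIMIT_SECS) -> bool:
    """Time a short-lived invocation (e.g. a --version probe) as a startup proxy."""
    start = time.monotonic()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.monotonic() - start <= limit
```

In CI you might call these as `binary_within_limit("target/release/shimmy")` and `startup_within_limit(["target/release/shimmy", "--version"])` and fail the build if either returns `False`.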
```typescript
import OpenAI from "openai";

const shimmy = new OpenAI({
  baseURL: "http://localhost:11435/v1",
  apiKey: "sk-local", // placeholder
});

const response = await shimmy.chat.completions.create({
  model: "your-model",
  messages: [{ role: "user", content: "Hello!" }],
  max_tokens: 100,
});
```

```bash
# Programmatic model listing
MODELS=$(shimmy list --short)

# Health check
if curl -f http://localhost:11435/health; then
  echo "Shimmy is running"
fi

# Generation with error handling
shimmy generate --name "model" --prompt "test" --max-tokens 50 || {
  echo "Generation failed"
  exit 1
}
```

```dockerfile
FROM rust:1.89 AS builder
COPY . /app
WORKDIR /app
RUN cargo build --release --features huggingface

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/shimmy /usr/local/bin/
EXPOSE 11435
CMD ["shimmy", "serve", "--bind", "0.0.0.0:11435"]
```

- Startup Time: Should be < 2 seconds
- Memory Usage: Base 5MB + model size
- Request Latency: Time to first token
- Error Rate: Failed requests percentage
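These metrics can be computed from a simple request log. A hedged Python sketch, assuming each record is a `(latency_seconds, succeeded)` pair (the record format is an illustration, not a Shimmy API):

```python
def error_rate(records: list) -> float:
    """Fraction of failed requests; each record is (latency_secs, ok_bool)."""
    if not records:
        return 0.0
    return sum(1 for _, ok in records if not ok) / len(records)

def p95_latency(records: list) -> float:
    """95th-percentile latency over successful requests."""
    latencies = sorted(lat for lat, ok in records if ok)
    if not latencies:
        return 0.0
    idx = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return latencies[idx]

requests_log = [(0.12, True), (0.30, True), (0.08, True), (1.90, False)]
print(error_rate(requests_log))   # 0.25
print(p95_latency(requests_log))  # 0.3
```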
```bash
# Basic health check
curl -f http://localhost:11435/health

# Detailed monitoring
curl http://localhost:11435/v1/models | jq '.data | length'
```

```bash
shimmy serve --bind 127.0.0.1:11435
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  shimmy-1:
    image: shimmy:latest
    ports: ["11435:11435"]
  shimmy-2:
    image: shimmy:latest
    ports: ["11436:11435"]
  nginx:
    image: nginx
    # Load balance between instances
```

```bash
# Package for Lambda deployment
cargo lambda build --release
```

- Health Checks: Always verify Shimmy is running before requests
- Error Handling: Implement graceful degradation
- Resource Limits: Monitor memory usage with large models
- Security: Bind to localhost for local-only access
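The health-check and graceful-degradation practices above can be sketched in a client like this (Python, standard library only; the base URL matches the local bind used in this guide, and the fallback text is a placeholder):

```python
import urllib.request
import urllib.error

SHIMMY_URL = "http://localhost:11435"  # local bind from this guide

def shimmy_healthy(base_url: str = SHIMMY_URL, timeout: float = 2.0) -> bool:
    """Probe the /health endpoint before sending real traffic."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def generate_or_fallback(call_shimmy, fallback_text: str = "(local AI unavailable)"):
    """Graceful degradation: return the fallback if Shimmy is down or the call fails."""
    if not shimmy_healthy():
        return fallback_text
    try:
        return call_shimmy()
    except Exception:
        return fallback_text
```

`call_shimmy` here is any zero-argument callable that performs the actual request, so the degradation logic stays independent of which client library you use.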
- Read Constitution: Understand architectural principles first
- Spec-First: Use the `/specify` → `/plan` → `/tasks` workflow
- Test Coverage: Write tests before implementation
- Performance: Validate startup time and binary size
- Stay Updated: Regularly sync with upstream
- Document Changes: Maintain clear changelog
- Constitutional Respect: Preserve core architectural principles
"The OpenAI API compatibility meant zero code changes. Just pointed my existing client to localhost:11435 and it worked perfectly."
"The constitutional principles gave us confidence the architecture wouldn't drift. We added our custom auth layer while preserving the 5MB advantage."
"The Spec-Kit workflow made it easy to plan the feature systematically. The constitutional checks caught potential performance issues early."
- Integration Templates: `templates/integration_template.md`
- Constitutional Principles: `memory/constitution.md`
- Spec-Kit Templates: `.internal/spec-template.md`
- GitHub Issues: Report bugs or request features
- Discussions: Community Q&A
Building something cool with Shimmy? We'd love to hear about it! Share your project in GitHub Discussions.
Shimmy: Free forever, built to be your reliable foundation for local AI.