Everything you need to build reliable applications with Shimmy as your foundation
Whether you're forking Shimmy for your application or integrating it as a service, this guide provides the tools and specifications you need to build systematically and reliably.
Perfect for: Adding local AI capabilities to existing applications
```bash
# Start Shimmy server
shimmy serve --bind 127.0.0.1:11435

# Use the OpenAI-compatible API
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```

Documentation: See `templates/integration_template.md` for the complete integration guide.
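Because the endpoint is OpenAI-compatible, the response body follows the standard chat-completions shape, so extracting the reply works the same as with any OpenAI-style server. A minimal Python sketch (the sample response below is illustrative, not actual Shimmy output):

```python
def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]

# Illustrative response body in the OpenAI-compatible format:
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}
print(extract_reply(sample))  # Hello! How can I help?
```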
Perfect for: Building specialized AI inference tools tailored to your needs
```bash
# Fork and clone
git clone https://github.com/YOUR_USERNAME/shimmy.git
cd shimmy

# Review architectural principles
cat memory/constitution.md

# Plan your features with Spec-Kit methodology
# See "Feature Development Workflow" below
```

Shimmy uses the GitHub Spec-Kit methodology for systematic feature development. Here's how to plan and implement features:
Create a detailed specification that focuses on WHAT and WHY, not HOW.
Template: Use templates/spec-template.md
Example: Planning MLX support for Apple Silicon
```markdown
# Feature Specification: MLX GPU Acceleration

**Feature Branch**: `041-mlx-support`
**Created**: 2025-09-17
**Status**: Draft

## User Scenarios & Testing
- **Primary User**: Developer with Apple Silicon Mac running Shimmy locally
- **Scenario**: User runs `shimmy serve` and expects automatic GPU acceleration
- **Success Criteria**: Model inference uses Metal GPU instead of CPU

## Functional Requirements
- FR-001: Shimmy shall auto-detect Apple Silicon architecture
- FR-002: Shimmy shall prefer the MLX backend when available
- FR-003: Shimmy shall fall back to CPU if MLX fails
```

Generate a technical implementation plan from your specification.
Template: Use templates/plan-template.md
Constitutional Check: Ensure your plan complies with Shimmy's principles:
- ✅ Maintains 5MB binary size limit
- ✅ Preserves sub-2-second startup
- ✅ No new Python dependencies
- ✅ Maintains OpenAI API compatibility
Create actionable task list for implementation.
Template: Use templates/tasks-template.md
Example Task Breakdown:
```markdown
## Tasks: MLX GPU Acceleration
- T001: Add MLX feature flag to Cargo.toml
- T002: Create MLX detection module in src/gpu/
- T003: [P] Write integration tests for MLX backend
- T004: [P] Write unit tests for GPU detection
- T005: Implement MLX model loading
- T006: Add MLX to engine adapter selection logic
- T007: Update documentation and examples
```

Every feature must comply with Shimmy's architectural principles:
- 5MB Binary Limit: Core binary cannot exceed 5MB
- Sub-2-Second Startup: Performance must be maintained
- Zero Python Dependencies: Pure Rust implementation only
- Library-First: Features start as standalone libraries
- CLI Interface: All functionality accessible via command line
- Test-First: Comprehensive tests before implementation
- API Compatibility: Maintain OpenAI API compatibility
Before any feature is merged:
- Constitutional compliance verified
- All tests pass: `cargo test --all-features`
- Integration tests pass
- Startup time < 2 seconds
- Binary size < 5MB
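The size and startup gates can be checked mechanically before merging. A minimal sketch of such a release-gate script (Python is used only for the CI check here; Shimmy itself stays pure Rust, and the binary path and probe command are assumptions you would point at your release build):

```python
import os
import subprocess
import time

BINARY_LIMIT_BYTES = 5 * 1024 * 1024   # 5MB constitutional limit
STARTUP_LIMIT_SECS = 2.0               # sub-2-second startup gate

def binary_within_limit(path: str, limit: int = BINARY_LIMIT_BYTES) -> bool:
    """True if the compiled binary respects the size gate."""
    return os.path.getsize(path) <= limit

def startup_within_limit(cmd: list, limit: float = STARTUP_LIMIT_SECS) -> bool:
    """Time a short-lived invocation (e.g. a --version probe) as a startup proxy."""
    start = time.monotonic()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.monotonic() - start <= limit
```

In CI you might call these as `binary_within_limit("target/release/shimmy")` and `startup_within_limit(["target/release/shimmy", "--version"])` and fail the build if either returns `False`.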
```typescript
import OpenAI from "openai";

const shimmy = new OpenAI({
  baseURL: "http://localhost:11435/v1",
  apiKey: "sk-local", // placeholder
});

const response = await shimmy.chat.completions.create({
  model: "your-model",
  messages: [{ role: "user", content: "Hello!" }],
  max_tokens: 100,
});
```

```bash
# Programmatic model listing
MODELS=$(shimmy list --short)

# Health check
if curl -f http://localhost:11435/health; then
  echo "Shimmy is running"
fi

# Generation with error handling
shimmy generate --name "model" --prompt "test" --max-tokens 50 || {
  echo "Generation failed"
  exit 1
}
```

```dockerfile
FROM rust:1.89 AS builder
COPY . /app
WORKDIR /app
RUN cargo build --release --features huggingface

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/shimmy /usr/local/bin/
EXPOSE 11435
CMD ["shimmy", "serve", "--bind", "0.0.0.0:11435"]
```

- Startup Time: Should be < 2 seconds
- Memory Usage: Base 5MB + model size
- Request Latency: Time to first token
- Error Rate: Failed requests percentage
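These metrics can be computed from a simple request log. A hedged Python sketch, assuming each record is a `(latency_seconds, succeeded)` pair (the record format is an illustration, not a Shimmy API):

```python
def error_rate(records: list) -> float:
    """Fraction of failed requests; each record is (latency_secs, ok_bool)."""
    if not records:
        return 0.0
    return sum(1 for _, ok in records if not ok) / len(records)

def p95_latency(records: list) -> float:
    """95th-percentile latency over successful requests."""
    latencies = sorted(lat for lat, ok in records if ok)
    if not latencies:
        return 0.0
    idx = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return latencies[idx]

requests_log = [(0.12, True), (0.30, True), (0.08, True), (1.90, False)]
print(error_rate(requests_log))   # 0.25
print(p95_latency(requests_log))  # 0.3
```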
```bash
# Basic health check
curl -f http://localhost:11435/health

# Detailed monitoring
curl http://localhost:11435/v1/models | jq '.data | length'
```

```bash
shimmy serve --bind 127.0.0.1:11435
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  shimmy-1:
    image: shimmy:latest
    ports: ["11435:11435"]
  shimmy-2:
    image: shimmy:latest
    ports: ["11436:11435"]
  nginx:
    image: nginx
    # Load balance between instances
```

```bash
# Package for Lambda deployment
cargo lambda build --release
```

- Health Checks: Always verify Shimmy is running before requests
- Error Handling: Implement graceful degradation
- Resource Limits: Monitor memory usage with large models
- Security: Bind to localhost for local-only access
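The health-check and graceful-degradation practices above can be sketched in a client like this (Python, standard library only; the base URL matches the local bind used in this guide, and the fallback text is a placeholder):

```python
import urllib.request
import urllib.error

SHIMMY_URL = "http://localhost:11435"  # local bind from this guide

def shimmy_healthy(base_url: str = SHIMMY_URL, timeout: float = 2.0) -> bool:
    """Probe the /health endpoint before sending real traffic."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def generate_or_fallback(call_shimmy, fallback_text: str = "(local AI unavailable)"):
    """Graceful degradation: return the fallback if Shimmy is down or the call fails."""
    if not shimmy_healthy():
        return fallback_text
    try:
        return call_shimmy()
    except Exception:
        return fallback_text
```

`call_shimmy` here is any zero-argument callable that performs the actual request, so the degradation logic stays independent of which client library you use.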
- Read Constitution: Understand architectural principles first
- Spec-First: Use the `/specify` → `/plan` → `/tasks` workflow
- Test Coverage: Write tests before implementation
- Performance: Validate startup time and binary size
- Stay Updated: Regularly sync with upstream
- Document Changes: Maintain clear changelog
- Constitutional Respect: Preserve core architectural principles
"The OpenAI API compatibility meant zero code changes. Just pointed my existing client to localhost:11435 and it worked perfectly."
"The constitutional principles gave us confidence the architecture wouldn't drift. We added our custom auth layer while preserving the 5MB advantage."
"The Spec-Kit workflow made it easy to plan the feature systematically. The constitutional checks caught potential performance issues early."
- Integration Templates: `templates/integration_template.md`
- Constitutional Principles: `memory/constitution.md`
- Spec-Kit Templates: `.internal/spec-template.md`
- GitHub Issues: Report bugs or request features
- Discussions: Community Q&A
Building something cool with Shimmy? We'd love to hear about it! Share your project in GitHub Discussions.
Shimmy: Free forever, built to be your reliable foundation for local AI.