SAM now supports per-request proxy mode for external tools like CLIO (Command Line Intelligence Orchestrator). This allows CLIO to use SAM as a pure LLM proxy without SAM's tools, prompts, or session management.
There are two ways to enable proxy mode:

- Global: enable in SAM UI (Preferences → API Server → "Proxy Mode")
  - Affects all API requests when enabled
  - Not ideal for mixed usage scenarios
- Per-request: set `sam_config.bypass_processing = true` in your request
  - Only affects that specific request
  - Perfect for tools that manage their own context/tools
Example per-request proxy call:

```bash
curl -X POST http://127.0.0.1:8080/api/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "model": "github_copilot/gpt-4.1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": true,
    "sam_config": {
      "bypass_processing": true
    }
  }'
```

In proxy mode:

- ✅ NO SAM system prompts - Your messages go directly to the LLM
- ✅ NO MCP tools - SAM's 14 tools with 46+ operations are bypassed
- ✅ NO memory/context injection - No SAM conversation history added
- ✅ NO session management - No SAM conversation tracking
- ✅ Pure 1:1 passthrough - Exactly like calling OpenAI API directly
- ✅ All LLM providers - OpenAI, Anthropic, GitHub Copilot, DeepSeek, local models
- ✅ Streaming responses - Real-time token streaming
- ✅ Standard OpenAI format - Compatible with any OpenAI client library
- ✅ CLIO's own tools - CLIO can send its own tools in the request
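Since SAM's own tools are bypassed, a client can send its own `tools` array in the standard OpenAI function-calling format and SAM forwards it untouched. A minimal sketch; the `read_file` tool below is a hypothetical example, not part of SAM or CLIO:

```python
import json

# Hypothetical client-defined tool in OpenAI function-calling format.
# In proxy mode SAM passes this through to the underlying provider as-is.
clio_tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",  # example name; CLIO supplies its own tools
            "description": "Read a file from the local workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]

payload = {
    "model": "github_copilot/gpt-4.1",
    "messages": [{"role": "user", "content": "Read config.toml"}],
    "tools": clio_tools,  # passed through untouched in proxy mode
    "sam_config": {"bypass_processing": True},
}

print(json.dumps(payload, indent=2))
```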
```python
import json
import requests

API_TOKEN = "YOUR_API_TOKEN"  # set in SAM Preferences → API Server

def call_sam_proxy(messages, model="github_copilot/gpt-4.1", stream=True):
    """
    Call SAM in proxy mode - pure LLM passthrough without SAM processing.
    Yields content chunks when streaming, or the full response text
    as a single chunk when stream=False.
    """
    url = "http://127.0.0.1:8080/api/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_TOKEN}",
    }
    payload = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "sam_config": {
            "bypass_processing": True  # THIS IS THE KEY FIELD
        },
    }
    response = requests.post(url, headers=headers, json=payload, stream=stream)
    if stream:
        for line in response.iter_lines():
            if line:
                line_str = line.decode('utf-8')
                if line_str.startswith('data: '):
                    data = line_str[6:]  # Remove 'data: ' prefix
                    if data != '[DONE]':
                        chunk = json.loads(data)
                        if chunk.get('choices'):
                            delta = chunk['choices'][0].get('delta', {})
                            if 'content' in delta:
                                yield delta['content']
    else:
        result = response.json()
        yield result['choices'][0]['message']['content']

# Usage example
messages = [
    {"role": "user", "content": "Explain Python decorators"}
]
for token in call_sam_proxy(messages):
    print(token, end='', flush=True)
```

SAM supports multiple providers. Use these model identifiers:
Get all available models with capabilities:
```bash
curl http://localhost:8080/v1/models
```

Response includes:

```json
{
  "object": "list",
  "data": [
    {
      "id": "github_copilot/gpt-4.1",
      "object": "model",
      "created": 1705557600,
      "owned_by": "github",
      "context_window": 128000,
      "max_completion_tokens": 32000,
      "max_request_tokens": 96000
    }
  ]
}
```

Get specific model details:
```bash
curl http://localhost:8080/v1/models/github_copilot/gpt-4.1
```

Capability fields:

- `context_window`: Maximum total tokens (input + output)
- `max_completion_tokens`: Maximum tokens for the model's response
- `max_request_tokens`: Maximum tokens for input messages
- `is_premium`: Whether this is a premium model (GitHub Copilot billing tier)
- `premium_multiplier`: Billing multiplier for premium models (e.g., 1.5x)

Use these to right-size your requests:

- Query model capabilities before making requests
- Calculate the token count of your messages
- Ensure `message_tokens + desired_output_tokens <= context_window`
- Track premium model usage for billing purposes
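The right-sizing steps above can be sketched as a small helper. The four-characters-per-token estimate is a rough heuristic, not any provider's real tokenizer, and the capability numbers are taken from the example `/v1/models` response:

```python
def estimate_tokens(messages):
    """Rough token estimate: ~4 characters per token plus a small
    per-message overhead. Heuristic only; use the provider's real
    tokenizer for exact budgeting."""
    text = "".join(m["content"] for m in messages)
    return len(text) // 4 + 4 * len(messages)

def fits_context(messages, desired_output_tokens, caps):
    """Check a request against model capabilities from /v1/models."""
    message_tokens = estimate_tokens(messages)
    return (
        message_tokens <= caps["max_request_tokens"]
        and desired_output_tokens <= caps["max_completion_tokens"]
        and message_tokens + desired_output_tokens <= caps["context_window"]
    )

caps = {"context_window": 128000, "max_completion_tokens": 32000,
        "max_request_tokens": 96000}
messages = [{"role": "user", "content": "Explain Python decorators"}]
print(fits_context(messages, desired_output_tokens=2000, caps=caps))  # True
```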
- GitHub Copilot: `github_copilot/gpt-4.1`, `github_copilot/gpt-4o`, `github_copilot/o1-preview`
- OpenAI: `openai/gpt-4`, `openai/gpt-4-turbo`, `openai/gpt-3.5-turbo`
- Anthropic: `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-opus`
- DeepSeek: `deepseek/deepseek-coder`, `deepseek/deepseek-chat`
- Google: `gemini/gemini-2.5-pro`, `gemini/gemini-1.5-flash`
- MLX: `mlx/mlx-community/Llama-3.2-3B-Instruct-4bit`
- GGUF: `lmstudio-community/Llama-3.2-3B-Instruct-GGUF`
SAM requires API authentication for external requests:

- Set an API token in SAM UI: Preferences → API Server → "API Token"
- Include it in requests: `Authorization: Bearer YOUR_TOKEN`
- Internal bypass: CLIO can use the `X-SAM-Internal: true` header if running on the same machine
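A small helper for choosing between the two authentication paths. The header names follow the list above; whether the server accepts `X-SAM-Internal` from your process is a deployment detail to verify:

```python
def sam_headers(api_token=None, internal=False):
    """Build request headers for SAM's API server.

    Uses the X-SAM-Internal bypass for same-machine clients,
    otherwise a Bearer token set in SAM Preferences.
    """
    headers = {"Content-Type": "application/json"}
    if internal:
        headers["X-SAM-Internal"] = "true"
    elif api_token:
        headers["Authorization"] = f"Bearer {api_token}"
    else:
        raise ValueError("Provide an API token or set internal=True")
    return headers

print(sam_headers(api_token="YOUR_API_TOKEN"))
```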
| Feature | Standard Mode | Proxy Mode (bypass_processing: true) |
|---|---|---|
| System prompts | ✅ Applied | ❌ Bypassed |
| MCP tools | ✅ Available | ❌ Bypassed |
| Memory/RAG | ✅ Injected | ❌ Bypassed |
| Session tracking | ✅ Tracked | ❌ Bypassed |
| Response format | OpenAI-compatible | OpenAI-compatible |
| Streaming | ✅ Supported | ✅ Supported |
| Provider routing | ✅ Automatic | ✅ Automatic |
- No interference - SAM won't inject prompts or context CLIO doesn't want
- CLIO manages tools - CLIO can send its own tools in the request
- CLIO manages sessions - No SAM conversation state to manage
- Pure LLM responses - Exactly what CLIO expects from an LLM API
- Multi-provider access - Use SAM's configured providers without configuring them yourself
If proxy mode doesn't seem to take effect:

- Verify `sam_config.bypass_processing` is `true` (a boolean, not the string `"true"`)
- Check SAM logs: `tail -f ~/Library/Logs/SAM/sam.log`
- Ensure you're sending the field in snake_case (`bypass_processing`), not camelCase (`bypassProcessing`)

If requests are rejected for authentication:

- Set an API token in SAM Preferences
- Include the `Authorization: Bearer TOKEN` header
- Or use `X-SAM-Internal: true` for local requests
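The boolean and snake_case checks above can be automated before sending. A quick sketch, assuming the payload dict shape from the earlier examples:

```python
def validate_proxy_payload(payload):
    """Raise if bypass_processing is mis-cased or not a real boolean."""
    sam_config = payload.get("sam_config", {})
    if "bypassProcessing" in sam_config:
        raise ValueError("Use snake_case bypass_processing, not bypassProcessing")
    value = sam_config.get("bypass_processing")
    if not isinstance(value, bool):
        raise ValueError(f"bypass_processing must be a boolean, got {value!r}")
    return True

payload = {"model": "github_copilot/gpt-4.1",
           "messages": [{"role": "user", "content": "hi"}],
           "sam_config": {"bypass_processing": True}}
print(validate_proxy_payload(payload))  # True
```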
- Added in: SAM v20260118.2
- Field: `sam_config.bypass_processing` (boolean, optional)
- Code: `Sources/APIFramework/SAMAPIServer.swift:handleChatCompletion()`
- Fallback: if not specified, the global `serverProxyMode` setting is used
See the SAM documentation or open an issue at: https://github.com/SyntheticAutonomicMind/SAM