Osprey is a lightweight Bash library for interacting with the DMR (Docker Model Runner) API. It provides simple functions for chat completions, streaming responses, and conversation memory management with LLM models through OpenAI-compatible APIs.
- Chat Completions: Send messages to LLM models and receive responses
- Streaming Support: Real-time streaming responses for interactive applications
- Conversation Memory: Built-in functions to manage chat history and context
- Simple Integration: Easy-to-use Bash functions that work with any OpenAI-compatible API
Osprey relies on a few common command-line tools:

- jq: A lightweight and flexible command-line JSON processor.
- curl: A command-line tool for transferring data with URLs.
- bash: A Unix shell and command language.
- gum: A tool for creating interactive command-line applications.
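Before sourcing the library, you can fail fast if one of these tools is missing. This is just a convenience sketch, not part of Osprey itself (bash is assumed to be the shell already running the script):

# Check that the required command-line tools are on the PATH.
for cmd in jq curl gum; do
  command -v "$cmd" >/dev/null 2>&1 || { echo "missing dependency: $cmd" >&2; exit 1; }
done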
Download the library and make it executable:

curl -fsSL https://github.com/k33g/osprey/releases/download/v0.0.6/osprey.sh -o ./osprey.sh
chmod +x ./osprey.sh

Source the library in your script:
. "./osprey.sh"DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="ai/qwen2.5:latest"
read -r -d '' DATA <<- EOM
{
"model":"'${MODEL}'",
"messages": [
{"role":"user", "content": "Hello, how are you?"}
],
"stream": false
}
EOM
response=$(osprey_chat ${DMR_BASE_URL} "${DATA}")
echo "${response}"DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="ai/qwen2.5:latest"
read -r -d '' DATA <<- EOM
{
"model":"'${MODEL}'",
"messages": [
{"role":"user", "content": "Hello, how are you?"}
],
"stream": true
}
EOM
function callback() {
echo -ne "$1"
}
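# The callback receives each streamed content chunk as its first argument, so it
# can print tokens as they arrive (or accumulate them, as the agent example later does).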
osprey_chat_stream ${DMR_BASE_URL} "${DATA}" callback

Function calling (tool calls) example:

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"
# Define your tools in JSON format
read -r -d '' TOOLS <<- EOM
[
{
"type": "function",
"function": {
"name": "calculate_sum",
"description": "Calculate the sum of two numbers",
"parameters": {
"type": "object",
"properties": {
"a": {"type": "number", "description": "The first number"},
"b": {"type": "number", "description": "The second number"}
},
"required": ["a", "b"]
}
}
}
]
EOM
read -r -d '' DATA <<- EOM
{
"model":"'${MODEL}'",
"messages": [
{"role":"user", "content": "Calculate the sum of 5 and 10"}
],
"tools": '${TOOLS}',
"tool_choice": "auto"
}
EOM
# Make the function call request
response=$(osprey_tool_calls ${DMR_BASE_URL} "${DATA}")
# Extract and process tool calls
TOOL_CALLS=$(get_tool_calls "${response}")
for tool_call in $TOOL_CALLS; do
FUNCTION_NAME=$(get_function_name "$tool_call")
FUNCTION_ARGS=$(get_function_args "$tool_call")
CALL_ID=$(get_call_id "$tool_call")
# Execute your function logic here
case "$FUNCTION_NAME" in
"calculate_sum")
A=$(echo "$FUNCTION_ARGS" | jq -r '.a')
B=$(echo "$FUNCTION_ARGS" | jq -r '.b')
SUM=$((A + B))
echo "Result: $SUM"
;;
esac
done

Note on Parallel Tool Calls: The parallel_tool_calls parameter enables models to make multiple function calls simultaneously. However, only a few local models support this feature effectively:
- hf.co/salesforce/llama-xlam-2-8b-fc-r-gguf:q4_k_m
- hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_m
- hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s
- hf.co/salesforce/xlam-2-3b-fc-r-gguf:q3_k_l
Example with parallel tool calls:
read -r -d '' DATA <<- EOM
{
"model": "${MODEL}",
"options": {
"temperature": 0.0
},
"messages": [
{
"role": "user",
"content": "Say hello to Bob and to Sam, make the sum of 5 and 37"
}
],
"tools": ${TOOLS},
"parallel_tool_calls": true,
"tool_choice": "auto"
}
EOM

See the examples/ directory for more detailed usage examples, including conversation memory management.
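To let the model turn a tool result into a final natural-language answer, the result has to be sent back in a follow-up request. Here is a minimal sketch building on the calculate_sum example above; it assumes the standard OpenAI-compatible tool-message format (some servers also expect the preceding assistant message carrying the tool_calls), and uses only osprey_chat from the library plus jq:

# Hypothetical follow-up: hand the computed SUM back to the model as a "tool"
# message so it can phrase a final answer. CALL_ID and SUM come from the loop
# above; exact message requirements can vary between servers.
TOOL_MESSAGE=$(jq -cn --arg id "${CALL_ID}" --arg content "${SUM}" \
  '{role: "tool", tool_call_id: $id, content: $content}')
read -r -d '' FOLLOW_UP <<- EOM
{
  "model": "${MODEL}",
  "messages": [
    {"role": "user", "content": "Calculate the sum of 5 and 10"},
    ${TOOL_MESSAGE}
  ],
  "stream": false
}
EOM
response=$(osprey_chat ${DMR_BASE_URL} "${FOLLOW_UP}")
echo "${response}"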
Osprey supports Model Context Protocol (MCP) servers with STDIO transport for extended function calling capabilities. You can use custom MCP servers that communicate via standard input/output to provide additional tools and functionalities.
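Under the hood, a STDIO MCP server speaks newline-delimited JSON-RPC over stdin/stdout. Purely as an illustration (Osprey's get_mcp_tools handles this exchange for you), you can probe a server by hand once the demo image from the next step is built:

# Illustrative manual probe: initialize the MCP session, acknowledge it, then
# request the tool list. The protocolVersion and clientInfo values are example
# values for this sketch.
{
  echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.0"}}}'
  echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
} | docker run --rm -i osprey-mcp-server:demo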
First, build your MCP server Docker image:
cd examples/07-use-mcp/mcp-server
docker build -t osprey-mcp-server:demo .

Then use the MCP server from your script:

#!/bin/bash
. "./osprey.sh"
DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"
# Define the MCP server command
SERVER_CMD="docker run --rm -i osprey-mcp-server:demo"
# Get available tools from MCP server
MCP_TOOLS=$(get_mcp_tools "$SERVER_CMD")
TOOLS=$(transform_to_openai_format "$MCP_TOOLS")
read -r -d '' DATA <<- EOM
{
"model": "${MODEL}",
"options": {
"temperature": 0.0
},
"messages": [
{
"role": "user",
"content": "Say hello to Bob and calculate the sum of 5 and 37"
}
],
"tools": ${TOOLS},
"parallel_tool_calls": true,
"tool_choice": "auto"
}
EOM
# Make function call request
RESULT=$(osprey_tool_calls ${DMR_BASE_URL} "${DATA}")
TOOL_CALLS=$(get_tool_calls "${RESULT}")
# Process tool calls
for tool_call in $TOOL_CALLS; do
FUNCTION_NAME=$(get_function_name "$tool_call")
FUNCTION_ARGS=$(get_function_args "$tool_call")
# Execute function via MCP
MCP_RESPONSE=$(call_mcp_tool "$SERVER_CMD" "$FUNCTION_NAME" "$FUNCTION_ARGS")
RESULT_CONTENT=$(get_tool_content "$MCP_RESPONSE")
echo "Function result: $RESULT_CONTENT"
done

Osprey supports MCP servers with streamable HTTP transport for real-time tool execution and response streaming. This allows for more interactive experiences with MCP tools that can provide streaming responses.
First, build your streamable HTTP MCP server Docker image:
cd examples/10-use-streamable-mcp/mcp-server
docker build -t osprey-streamable-mcp-server:demo .

Start the server:
docker run --rm -p 8080:8080 osprey-streamable-mcp-server:demo

Then call it from your script:

#!/bin/bash
. "./osprey.sh"
DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"
# Define the streamable HTTP MCP server endpoint
MCP_SERVER="http://localhost:9090"
# Get available tools from streamable MCP server
MCP_TOOLS=$(get_mcp_http_tools "$MCP_SERVER")
TOOLS=$(transform_to_openai_format "$MCP_TOOLS")
read -r -d '' DATA <<- EOM
{
"model": "${MODEL}",
"options": {
"temperature": 0.0
},
"messages": [
{
"role": "user",
"content": "Say hello to Bob and to Sam, make the sum of 5 and 37"
}
],
"tools": ${TOOLS},
"parallel_tool_calls": true,
"tool_choice": "auto"
}
EOM
# Make function call request
RESULT=$(osprey_tool_calls ${DMR_BASE_URL} "${DATA}")
TOOL_CALLS=$(get_tool_calls "${RESULT}")
# Process tool calls with streaming support
for tool_call in $TOOL_CALLS; do
FUNCTION_NAME=$(get_function_name "$tool_call")
FUNCTION_ARGS=$(get_function_args "$tool_call")
# Execute function via MCP
MCP_RESPONSE=$(call_mcp_http_tool "$MCP_SERVER" "$FUNCTION_NAME" "$FUNCTION_ARGS")
RESULT_CONTENT=$(get_tool_content_http "$MCP_RESPONSE")
echo "Function result: $RESULT_CONTENT"
done

The streamable HTTP transport has a few advantages over STDIO:

- HTTP Standards: Leverages standard HTTP streaming protocols
- Scalability: Easier to deploy and scale than STDIO servers
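Whichever transport you use, it can be useful to double-check which tools the model will actually receive. A small sketch, assuming TOOLS holds the OpenAI-format array produced by transform_to_openai_format (or its filtered variant):

# Print the names of the tools that will be attached to the request.
echo "${TOOLS}" | jq -r '.[].function.name'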
The Docker MCP Gateway provides access to a collection of pre-built MCP tools through Docker's MCP integration. This allows you to leverage existing MCP tools without setting up individual servers.
#!/bin/bash
. "./osprey.sh"
DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"
# Use Docker MCP Gateway
SERVER_CMD="docker mcp gateway run"
# Get available tools and filter specific ones
MCP_TOOLS=$(get_mcp_tools "$SERVER_CMD")
TOOLS=$(transform_to_openai_format_with_filter "${MCP_TOOLS}" "search" "fetch")
read -r -d '' DATA <<- EOM
{
"model": "${MODEL}",
"options": {
"temperature": 0.0
},
"messages": [
{
"role": "user",
"content": "fetch https://raw.githubusercontent.com/k33g/osprey/refs/heads/main/README.md"
}
],
"tools": ${TOOLS},
"tool_choice": "auto"
}
EOM
# Execute the request
RESULT=$(osprey_tool_calls ${DMR_BASE_URL} "${DATA}")
TOOL_CALLS=$(get_tool_calls "${RESULT}")
# Process tool calls
for tool_call in $TOOL_CALLS; do
FUNCTION_NAME=$(get_function_name "$tool_call")
FUNCTION_ARGS=$(get_function_args "$tool_call")
# Execute function via MCP Gateway
MCP_RESPONSE=$(call_mcp_tool "$SERVER_CMD" "$FUNCTION_NAME" "$FUNCTION_ARGS")
RESULT_CONTENT=$(get_tool_content "$MCP_RESPONSE")
echo "Function result: $RESULT_CONTENT"
done

You can filter available tools using the transform_to_openai_format_with_filter function to only include tools that match specific criteria:
# Filter tools containing "search" or "fetch"
TOOLS=$(transform_to_openai_format_with_filter "${MCP_TOOLS}" "search" "fetch")

You can create containerized AI agents using Docker Compose for easy deployment and management. The examples/05-compose-agent/ directory demonstrates how to build a complete agentic system. To try it:
cd examples/05-compose-agent/
docker compose up --build -d
docker attach $(docker compose ps -q seven-of-nine-agent)

The agentic compose setup includes:
- Containerized Environment: Complete isolation with all dependencies
- Interactive Interface: Uses gum for enhanced command-line interactions
- Conversation Memory: Persistent chat history throughout sessions
- Streaming Responses: Real-time token generation
- Character Personas: Configurable system instructions for roleplay
Configure your agent through compose.yml:
services:
  your-agent:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - OSPREY_VERSION=v0.0.1
    tty: true
    stdin_open: true
    environment:
      SYSTEM_INSTRUCTION: |
        You are a helpful AI assistant.
        Your role is to...
    models:
      chat_model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_CHAT_MODEL

models:
  chat_model:
    model: ai/qwen2.5:latest

The agent script sources Osprey and combines its conversation memory helpers with streaming:

#!/bin/bash
. "./osprey.sh"
# Initialize conversation history array
CONVERSATION_HISTORY=()
function callback() {
echo -ne "$1"
ASSISTANT_RESPONSE+="$1"
}
while true; do
USER_CONTENT=$(gum write --placeholder "How can I help you?")
if [[ "$USER_CONTENT" == "/bye" ]]; then
break
fi
# Add user message to conversation history
add_user_message CONVERSATION_HISTORY "${USER_CONTENT}"
# Build messages array with conversation history
MESSAGES=$(build_messages_array CONVERSATION_HISTORY)
# Create API request with conversation history
read -r -d '' DATA <<- EOM
{
"model":"${MODEL}",
"options": {
"temperature": 0.5,
"repeat_last_n": 2
},
"messages": [${MESSAGES}],
"stream": true
}
EOM
ASSISTANT_RESPONSE=""
osprey_chat_stream ${DMR_BASE_URL} "${DATA}" callback
# Add assistant response to conversation history
add_assistant_message CONVERSATION_HISTORY "${ASSISTANT_RESPONSE}"
echo -e "\n"
done

This creates a fully interactive, containerized AI agent with conversation memory and streaming responses.
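When you are finished, detaching and tearing down the stack uses standard Compose commands (nothing Osprey-specific):

# Detach from the interactive session with Ctrl-p Ctrl-q, then remove the stack:
docker compose down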