Osprey

Osprey is a lightweight Bash library for interacting with the DMR (Docker Model Runner) API. It provides simple functions for chat completions, streaming responses, and conversation memory management with LLMs through OpenAI-compatible APIs.

Features

  • Chat Completions: Send messages to an LLM and receive responses
  • Streaming Support: Real-time streaming responses for interactive applications
  • Conversation Memory: Built-in functions to manage chat history and context
  • Simple Integration: Easy-to-use Bash functions that work with any OpenAI-compatible API

Requirements

  • jq - A lightweight and flexible command-line JSON processor.
  • curl - A command-line tool for transferring data with URLs.
  • bash - A Unix shell and command language.
  • gum - A tool for creating interactive command-line applications.
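Before sourcing the library, you can confirm that these dependencies are on your PATH. A minimal sketch (the tool list simply mirrors the requirements above):

```shell
# Report which of the required tools are installed.
for tool in jq curl bash gum; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
```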

Install the library

curl -fsSL https://github.com/k33g/osprey/releases/download/v0.0.6/osprey.sh -o ./osprey.sh
chmod +x ./osprey.sh

Usage

Source the library in your script:

. "./osprey.sh"

Basic Chat Completion

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="ai/qwen2.5:latest"

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "messages": [
    {"role":"user", "content": "Hello, how are you?"}
  ],
  "stream": false
}
EOM

response=$(osprey_chat "${DMR_BASE_URL}" "${DATA}")
echo "${response}"

Streaming Chat

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="ai/qwen2.5:latest"

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "messages": [
    {"role":"user", "content": "Hello, how are you?"}
  ],
  "stream": true
}
EOM

function callback() {
  echo -ne "$1" 
}

osprey_chat_stream "${DMR_BASE_URL}" "${DATA}" callback

Function Calling

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"

# Define your tools in JSON format
read -r -d '' TOOLS <<- EOM
[
  {
    "type": "function",
    "function": {
      "name": "calculate_sum",
      "description": "Calculate the sum of two numbers",
      "parameters": {
        "type": "object",
        "properties": {
          "a": {"type": "number", "description": "The first number"},
          "b": {"type": "number", "description": "The second number"}
        },
        "required": ["a", "b"]
      }
    }
  }
]
EOM

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "messages": [
    {"role":"user", "content": "Calculate the sum of 5 and 10"}
  ],
  "tools": ${TOOLS},
  "tool_choice": "auto"
}
EOM

# Make the function call request
response=$(osprey_tool_calls "${DMR_BASE_URL}" "${DATA}")

# Extract and process tool calls
TOOL_CALLS=$(get_tool_calls "${response}")
for tool_call in $TOOL_CALLS; do
    FUNCTION_NAME=$(get_function_name "$tool_call")
    FUNCTION_ARGS=$(get_function_args "$tool_call")
    CALL_ID=$(get_call_id "$tool_call")
    
    # Execute your function logic here
    case "$FUNCTION_NAME" in
        "calculate_sum")
            A=$(echo "$FUNCTION_ARGS" | jq -r '.a')
            B=$(echo "$FUNCTION_ARGS" | jq -r '.b')
            SUM=$((A + B))
            echo "Result: $SUM"
            ;;
    esac
done
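After a function has run, OpenAI-compatible APIs expect the result to be sent back as a message with role "tool" that references the original call id, before the model produces its final answer. A minimal sketch of building that message (the literal CALL_ID and SUM values here are illustrative placeholders for the variables set in the loop above):

```shell
# Build an OpenAI-style "tool" result message for a completed call.
# In a real script, CALL_ID and SUM come from the processing loop.
CALL_ID="call_0"
SUM=15
TOOL_MESSAGE=$(printf '{"role":"tool","tool_call_id":"%s","content":"%s"}' "$CALL_ID" "$SUM")
echo "$TOOL_MESSAGE"
```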

Note on Parallel Tool Calls: The parallel_tool_calls parameter enables models to make multiple function calls simultaneously. However, only a few local models support this feature effectively:

  • hf.co/salesforce/llama-xlam-2-8b-fc-r-gguf:q4_k_m
  • hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_m
  • hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s
  • hf.co/salesforce/xlam-2-3b-fc-r-gguf:q3_k_l

Example with parallel tool calls:

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "options": {
    "temperature": 0.0
  },
  "messages": [
    {
      "role": "user",
      "content": "Say hello to Bob and to Sam, make the sum of 5 and 37"
    }
  ],
  "tools": ${TOOLS},
  "parallel_tool_calls": true,
  "tool_choice": "auto"
}
EOM

See the examples/ directory for more detailed usage examples including conversation memory management.
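The conversation-memory helpers boil down to keeping an array of JSON message objects and joining them into the messages field of each request. A self-contained sketch of that pattern (add_turn is a hypothetical stand-in for the library's add_user_message and add_assistant_message helpers, which also take the history array name as their first argument):

```shell
# Minimal conversation-memory sketch: keep each turn as a JSON object,
# then join the history into the "messages" array of the next request.
CONVERSATION_HISTORY=()

add_turn() {  # role, content -> append one JSON message object
  local role="$1" content="$2"
  CONVERSATION_HISTORY+=("{\"role\":\"${role}\",\"content\":\"${content}\"}")
}

add_turn "user" "Hello, how are you?"
add_turn "assistant" "I am fine, thank you!"
add_turn "user" "What did I just ask you?"

# Join history entries with commas to form the messages array body.
MESSAGES=$(IFS=,; echo "${CONVERSATION_HISTORY[*]}")
echo "[${MESSAGES}]"
```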

Using STDIO MCP Server

Osprey supports Model Context Protocol (MCP) servers with STDIO transport for extended function calling capabilities. You can use custom MCP servers that communicate via standard input/output to provide additional tools and functionalities.

Setting up an MCP Server

First, build your MCP server Docker image:

cd examples/07-use-mcp/mcp-server
docker build -t osprey-mcp-server:demo .

Using MCP Tools

#!/bin/bash
. "./osprey.sh"

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"

# Define the MCP server command
SERVER_CMD="docker run --rm -i osprey-mcp-server:demo"

# Get available tools from MCP server
MCP_TOOLS=$(get_mcp_tools "$SERVER_CMD")
TOOLS=$(transform_to_openai_format "$MCP_TOOLS")

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "options": {
    "temperature": 0.0
  },
  "messages": [
    {
      "role": "user",
      "content": "Say hello to Bob and calculate the sum of 5 and 37"
    }
  ],
  "tools": ${TOOLS},
  "parallel_tool_calls": true,
  "tool_choice": "auto"
}
EOM

# Make function call request
RESULT=$(osprey_tool_calls "${DMR_BASE_URL}" "${DATA}")
TOOL_CALLS=$(get_tool_calls "${RESULT}")

# Process tool calls
for tool_call in $TOOL_CALLS; do
    FUNCTION_NAME=$(get_function_name "$tool_call")
    FUNCTION_ARGS=$(get_function_args "$tool_call")
    
    # Execute function via MCP
    MCP_RESPONSE=$(call_mcp_tool "$SERVER_CMD" "$FUNCTION_NAME" "$FUNCTION_ARGS")
    RESULT_CONTENT=$(get_tool_content "$MCP_RESPONSE")
    
    echo "Function result: $RESULT_CONTENT"
done

Using Streamable HTTP MCP Server

Osprey supports MCP servers with streamable HTTP transport for real-time tool execution and response streaming. This allows for more interactive experiences with MCP tools that can provide streaming responses.

Setting up a Streamable HTTP MCP Server

First, build your streamable HTTP MCP server Docker image:

cd examples/10-use-streamable-mcp/mcp-server
docker build -t osprey-streamable-mcp-server:demo .

Start the server:

docker run --rm -p 8080:8080 osprey-streamable-mcp-server:demo

Using Streamable MCP Tools

#!/bin/bash
. "./osprey.sh"

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"

# Define the streamable HTTP MCP server endpoint
MCP_SERVER="http://localhost:8080"

# Get available tools from streamable MCP server
MCP_TOOLS=$(get_mcp_http_tools "$MCP_SERVER")
TOOLS=$(transform_to_openai_format "$MCP_TOOLS")

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "options": {
    "temperature": 0.0
  },
  "messages": [
    {
      "role": "user",
      "content": "Say hello to Bob and to Sam, make the sum of 5 and 37"
    }
  ],
  "tools": ${TOOLS},
  "parallel_tool_calls": true,
  "tool_choice": "auto"
}
EOM

# Make function call request
RESULT=$(osprey_tool_calls "${DMR_BASE_URL}" "${DATA}")
TOOL_CALLS=$(get_tool_calls "${RESULT}")

# Process tool calls with streaming support
for tool_call in $TOOL_CALLS; do
    FUNCTION_NAME=$(get_function_name "$tool_call")
    FUNCTION_ARGS=$(get_function_args "$tool_call")
    
    # Execute function via MCP
    MCP_RESPONSE=$(call_mcp_http_tool "$MCP_SERVER" "$FUNCTION_NAME" "$FUNCTION_ARGS")
    RESULT_CONTENT=$(get_tool_content_http "$MCP_RESPONSE")
    
    echo "Function result: $RESULT_CONTENT"
done

Benefits of Streamable HTTP Transport

  • HTTP Standards: Leverages standard HTTP streaming protocols
  • Scalability: Easier to deploy and scale than STDIO servers

Using Docker MCP Gateway

The Docker MCP Gateway provides access to a collection of pre-built MCP tools through Docker's MCP integration. This allows you to leverage existing MCP tools without setting up individual servers.

Basic Usage

#!/bin/bash
. "./osprey.sh"

DMR_BASE_URL="http://localhost:12434/engines/llama.cpp/v1"
MODEL="hf.co/salesforce/xlam-2-3b-fc-r-gguf:q4_k_s"

# Use Docker MCP Gateway
SERVER_CMD="docker mcp gateway run"

# Get available tools and filter specific ones
MCP_TOOLS=$(get_mcp_tools "$SERVER_CMD")
TOOLS=$(transform_to_openai_format_with_filter "${MCP_TOOLS}" "search" "fetch")

read -r -d '' DATA <<- EOM
{
  "model": "${MODEL}",
  "options": {
    "temperature": 0.0
  },
  "messages": [
    {
      "role": "user",
      "content": "fetch https://raw.githubusercontent.com/k33g/osprey/refs/heads/main/README.md"
    }
  ],
  "tools": ${TOOLS},
  "tool_choice": "auto"
}
EOM

# Execute the request
RESULT=$(osprey_tool_calls "${DMR_BASE_URL}" "${DATA}")
TOOL_CALLS=$(get_tool_calls "${RESULT}")

# Process tool calls
for tool_call in $TOOL_CALLS; do
    FUNCTION_NAME=$(get_function_name "$tool_call")
    FUNCTION_ARGS=$(get_function_args "$tool_call")
    
    # Execute function via MCP Gateway
    MCP_RESPONSE=$(call_mcp_tool "$SERVER_CMD" "$FUNCTION_NAME" "$FUNCTION_ARGS")
    RESULT_CONTENT=$(get_tool_content "$MCP_RESPONSE")
    
    echo "Function result: $RESULT_CONTENT"
done

Tool Filtering

You can filter the available tools with the transform_to_openai_format_with_filter function so that only tools matching specific keywords are exposed to the model:

# Filter tools containing "search" or "fetch"
TOOLS=$(transform_to_openai_format_with_filter "${MCP_TOOLS}" "search" "fetch")

Creating an Agent with Agentic Compose

You can create containerized AI agents using Docker Compose for easy deployment and management. The examples/05-compose-agent/ directory demonstrates how to build a complete agentic system.

Quick Start

cd examples/05-compose-agent/
docker compose up --build -d
docker attach $(docker compose ps -q seven-of-nine-agent)

Agent Architecture

The agentic compose setup includes:

  • Containerized Environment: Complete isolation with all dependencies
  • Interactive Interface: Uses gum for enhanced command-line interactions
  • Conversation Memory: Persistent chat history throughout sessions
  • Streaming Responses: Real-time token generation
  • Character Personas: Configurable system instructions for roleplay

Configuration

Configure your agent through compose.yml:

services:
  your-agent:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - OSPREY_VERSION=v0.0.1
    tty: true
    stdin_open: true
    environment:
      SYSTEM_INSTRUCTION: |
        You are a helpful AI assistant.
        Your role is to...
    models:
      chat_model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_CHAT_MODEL

models:
  chat_model:
    model: ai/qwen2.5:latest

Agent Script Structure

#!/bin/bash
. "./osprey.sh"

# Initialize conversation history array
CONVERSATION_HISTORY=()

function callback() {
  echo -ne "$1"
  ASSISTANT_RESPONSE+="$1"
}

while true; do
  USER_CONTENT=$(gum write --placeholder "How can I help you?")
  
  if [[ "$USER_CONTENT" == "/bye" ]]; then
    break
  fi

  # Add user message to conversation history
  add_user_message CONVERSATION_HISTORY "${USER_CONTENT}"

  # Build messages array with conversation history
  MESSAGES=$(build_messages_array CONVERSATION_HISTORY)
  
  # Create API request with conversation history
  read -r -d '' DATA <<- EOM
{
  "model":"${MODEL}",
  "options": {
    "temperature": 0.5,
    "repeat_last_n": 2
  },
  "messages": [${MESSAGES}],
  "stream": true
}
EOM
  
  ASSISTANT_RESPONSE=""
  osprey_chat_stream "${DMR_BASE_URL}" "${DATA}" callback
  
  # Add assistant response to conversation history
  add_assistant_message CONVERSATION_HISTORY "${ASSISTANT_RESPONSE}"
  
  echo -e "\n"
done

This creates a fully interactive, containerized AI agent with conversation memory and streaming responses.
