Modal Agents

A minimal, open-source template for building agentic workflows with PocketFlow, Ollama, and Modal. You can start experimenting at no cost: Modal's free tier includes $30 of compute credit per month.

Overview

This template demonstrates how to build production-ready agents that:

  • Use PocketFlow for workflow orchestration
  • Deploy Ollama models on Modal with GPU acceleration
  • Communicate via i6pn private networking for fast, secure connections
  • Support Langfuse tracing for observability
  • Include mock mode for testing without GPU costs
  • Expose HTTP APIs for integration

Example Agent: Task Breakdown

The included example agent takes a task description and:

  1. AnalyzeNode: Breaks the task into subtasks (1 LLM call)
  2. EstimateNode: Estimates time/complexity for each subtask (1 LLM call)

It's simple enough to understand at a glance, yet it demonstrates the full pattern.
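
Conceptually, the two nodes pass results through a shared store. Here is an illustrative sketch of that data flow using stand-in data; the function names mirror the real nodes, but the logic is hypothetical (the real nodes each make one LLM call):

```python
# Hypothetical sketch of the two-step pipeline's data flow (no LLM calls).
def analyze(task: str) -> dict:
    # AnalyzeNode: break the task into subtasks
    return {"subtasks": [f"design {task}", f"implement {task}", f"test {task}"]}

def estimate(analysis: dict) -> dict:
    # EstimateNode: attach a time estimate to each subtask
    estimates = [{"subtask": s, "hours": 4.0} for s in analysis["subtasks"]]
    return {"estimates": estimates,
            "total_hours": sum(e["hours"] for e in estimates)}

shared = {"task": "a REST API"}
shared["analysis"] = analyze(shared["task"])
shared["estimate"] = estimate(shared["analysis"])
```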

Quick Start

1. Install Dependencies

# Clone the repository
git clone https://github.com/Bollwerkio/modal-agents.git
cd modal-agents

# Install dependencies with uv (recommended)
uv sync

# (Optional) Set up environment variables
cp .env.example .env
# Edit .env and add your Langfuse keys if you want tracing

# Authenticate with Modal
uv run modal setup

2. Deploy Ollama Service

# Deploy Ollama GPU service to Modal
uv run modal deploy -m ollama_service

# Pull a model to the service
uv run modal run -m ollama_service::OllamaService.pull_model --model-name llama3.2:3b

# Verify service health
uv run modal run -m ollama_service::OllamaService.health_check

3. Deploy the Agent

# Deploy the agent workflow to Modal
uv run modal deploy -m modal_agents.main

# The deployment will output the API endpoint URL
# Example: https://your-workspace--task-agent-analyze.modal.run

4. Run Locally (Development)

# Run locally (uses localhost:11434 if Ollama is running locally)
uv run task-agent analyze "Build a REST API for user management"

# Use a different model
uv run task-agent analyze "Create a web app" --model mistral

# Output as JSON
uv run task-agent analyze "Build a CLI tool" --json

# Save to file
uv run task-agent analyze "Design a database schema" -o result.json

5. Run on Modal (Production)

# Run via Modal CLI (uses @app.local_entrypoint)
uv run modal run -m modal_agents.main --task "Build a REST API"

# Or call the function directly
uv run modal run -m modal_agents.main::analyze_task --task "Build a REST API"

# Or call via HTTP API (after deploy)
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Content-Type: application/json" \
  -d '{"task": "Build a REST API"}'

6. Test with Mock Mode

# Test without GPU costs (no Ollama needed)
MOCK_LLM_MODE=true uv run task-agent analyze "Build a REST API"

Project Structure

modal-agents/
├── pyproject.toml              # Package config + CLI script entry point
├── ollama_service.py           # Ollama Modal deployment (separate app)
├── modal_agents/
│   ├── __init__.py             # Package exports
│   ├── main.py                 # Modal app + CLI (Typer)
│   ├── pocketflow.py           # PocketFlow framework (~200 lines)
│   ├── ollama.py               # call_ollama() with Langfuse tracing
│   ├── mock.py                 # Mock mode for testing
│   ├── flow.py                 # TaskBreakdownFlow
│   ├── nodes.py                # AnalyzeNode, EstimateNode
│   └── schemas.py              # Pydantic models
└── README.md                   # This file

Local vs Modal Execution

Local Execution

When running locally (via uv run task-agent analyze ...), the agent:

  • Connects to Ollama at localhost:11434 (requires local Ollama installation)
  • Runs in your local environment
  • Good for development and testing

Prerequisites for local execution:

  • Ollama installed and running locally
  • Model pulled locally: ollama pull llama3.2:3b

Modal Execution

When running on Modal, the agent:

  • Connects to Ollama via i6pn private networking (fast, secure)
  • Runs in Modal's cloud infrastructure
  • Good for production workloads

Prerequisites for Modal execution:

  • Ollama service deployed: uv run modal deploy -m ollama_service
  • Model pulled to Modal service: uv run modal run -m ollama_service::OllamaService.pull_model --model-name llama3.2:3b

HTTP API

The deployed Modal app provides REST endpoints using @modal.fastapi_endpoint.

Setup

After deploying with uv run modal deploy -m modal_agents.main, Modal will output the endpoint URLs. The URL pattern is:

https://{workspace}--task-agent-{function}.modal.run

Important: Replace with your actual Modal workspace name from the deployment output.

Endpoints

POST /analyze - Analyze a task

# Replace with your actual endpoint URL from deployment output
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Build a REST API for user management",
    "model": "llama3.2:3b"
  }'

Response:

{
  "status": "success",
  "task": "Build a REST API for user management",
  "analysis": {
    "subtasks": [...],
    "reasoning": "..."
  },
  "estimate": {
    "estimates": [...],
    "total_hours": 12.0,
    "reasoning": "..."
  },
  "execution_time_seconds": 5.2
}
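
The same endpoint can be called from Python with the standard library. A minimal client sketch (the URL is a placeholder from your own deployment output):

```python
import json
import urllib.request

# Placeholder URL: substitute the endpoint printed by `modal deploy`.
ENDPOINT = "https://your-workspace--task-agent-analyze.modal.run"

def build_request(task: str, model: str = "llama3.2:3b") -> urllib.request.Request:
    # Assemble the same JSON body the curl example sends
    body = json.dumps({"task": task, "model": model}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# response = urllib.request.urlopen(build_request("Build a REST API"))
# result = json.load(response)
# print(result["estimate"]["total_hours"])
```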

GET /health - Health check

curl https://your-workspace--task-agent-health.modal.run

API Authentication

Note: The example endpoint has NO authentication for demo purposes. For production, add Modal's proxy auth.

Adding Authentication

In modal_agents/main.py, add requires_proxy_auth=True:

@app.function(image=image, ...)
@modal.fastapi_endpoint(method="POST", docs=True, requires_proxy_auth=True)
def analyze(request: AnalyzeRequest) -> dict[str, Any]:
    ...

Creating API Tokens

To create an API token, follow the instructions at: Modal Webhook Proxy Auth

You'll receive:

  • Token ID - use as Modal-Key header
  • Token Secret - use as Modal-Secret header

Important: Use proxy auth tokens, not your Modal account keys.

Calling with Authentication

# Replace with your actual endpoint URL from deployment output
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Modal-Key: <your-token-id>" \
  -H "Modal-Secret: <your-token-secret>" \
  -H "Content-Type: application/json" \
  -d '{"task": "Build a REST API"}'

Mock Mode

Mock mode allows you to test the full pipeline without making actual API calls or using GPU resources.

Enable Mock Mode

# Via environment variable
MOCK_LLM_MODE=true uv run task-agent analyze "Build a REST API"

# Or set in your shell
export MOCK_LLM_MODE=true
uv run task-agent analyze "Build a REST API"

Use Cases

  • Development: Test workflow logic without GPU costs
  • CI/CD: Run tests without API keys or GPU access
  • Demos: Show the pipeline without needing real infrastructure
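
A minimal sketch of how an environment-variable switch like this is typically wired (the template's mock.py may differ in detail):

```python
import os

def call_llm(prompt: str) -> str:
    # When MOCK_LLM_MODE is set, return a canned response instead of
    # contacting Ollama; the response shape here is illustrative only.
    if os.environ.get("MOCK_LLM_MODE", "").lower() == "true":
        return f"[mock] echo of prompt: {prompt[:40]}"
    raise RuntimeError("Ollama backend not wired up in this sketch")
```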

Langfuse Tracing

Optional observability with Langfuse. All LLM calls are automatically traced when configured.

Setup

  1. Create a Langfuse account at langfuse.com
  2. Get your API keys
  3. Set environment variables:
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com

For Modal Deployment

Create a Modal secret:

uv run modal secret create langfuse-secrets \
  LANGFUSE_SECRET_KEY=sk-lf-... \
  LANGFUSE_PUBLIC_KEY=pk-lf-... \
  LANGFUSE_HOST=https://cloud.langfuse.com

Then update main.py to include the secret:

secrets = [modal.Secret.from_name("langfuse-secrets")]

@app.function(
    image=image,
    secrets=secrets,  # Add this
    ...
)

What Gets Traced

  • All LLM calls (prompt, response, tokens, cost)
  • Execution time
  • Model parameters
  • Mock mode calls (marked with mock_mode: true)

Modal Volumes for Caching

Modal Volumes provide persistent storage for caching results and other data.

How It Works

The example agent uses Modal Volumes to cache task breakdown results:

import hashlib
import json
from pathlib import Path

# Create (or look up) the volume
cache_volume = modal.Volume.from_name("task-cache", create_if_missing=True)

# Mount it in the function
@app.function(volumes={"/cache": cache_volume})
def analyze_task(task: str):
    # Use a stable digest as the cache key; Python's built-in hash() is
    # salted per process, so it would miss the cache in every new container
    key = hashlib.sha256(task.encode()).hexdigest()
    cache_path = Path("/cache") / f"{key}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text())

    # Run workflow
    result = run_task_breakdown(task)

    # Save to cache
    cache_path.write_text(json.dumps(result))
    cache_volume.commit()  # Persist to durable storage

    return result

Accessing Cached Data

# Browse volume contents
uv run modal shell --volume task-cache

# Download cached results (paths are relative to the volume root, not the container mount point)
uv run modal volume get task-cache / ./cache_backup

Benefits

  • Cost savings: Avoid redundant LLM calls
  • Faster responses: Instant results for cached tasks
  • Persistent storage: Data survives container restarts

Customizing the Agent

This template is designed to be customized for your use case.

1. Modify the Workflow

Edit modal_agents/flow.py to change the workflow structure:

def create_custom_flow() -> Flow:
    node1 = CustomNode1()
    node2 = CustomNode2()
    node3 = CustomNode3()

    # Connect nodes
    node1 >> node2
    node2 - "action1" >> node3
    node2 - "action2" >> node1  # Loop back

    return Flow(start=node1)
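
The `>>` and `- "action" >>` chaining works through Python operator overloading. A hedged sketch of how such syntax can be implemented (the bundled pocketflow.py may differ in detail):

```python
# Hypothetical implementation of PocketFlow-style chaining operators.
class Node:
    def __init__(self, name: str):
        self.name = name
        self.successors = {}          # action name -> next node

    def __rshift__(self, other):      # node1 >> node2: default transition
        self.successors["default"] = other
        return other

    def __sub__(self, action: str):   # node - "action" returns a connector
        return _Transition(self, action)

class _Transition:
    def __init__(self, node, action):
        self.node, self.action = node, action

    def __rshift__(self, other):      # (node - "action") >> target
        self.node.successors[self.action] = other
        return other
```

Because `-` binds tighter than `>>`, `node2 - "action1" >> node3` builds the connector first, then registers the target.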

2. Create New Nodes

Create new nodes in modal_agents/nodes.py:

class CustomNode:
    def prep(self, shared: dict) -> dict:
        # Gather context
        return {"data": shared.get("data")}

    def exec(self, prep_res: dict) -> Any:
        # Call LLM (or do pure computation) using only what prep gathered
        prompt = f"Process this data: {prep_res['data']}"
        result = call_ollama(prompt, model="llama3.2")
        return result

    def post(self, shared: dict, prep_res: dict, exec_res: Any) -> str:
        # Store results and return next action
        shared["result"] = exec_res
        return "next_action"
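
To make the three-phase lifecycle concrete, here is a self-contained toy node and a minimal runner. This is illustrative only; the real Flow in pocketflow.py does more, such as following the returned action to the next node:

```python
from typing import Any

class UpperCaseNode:
    """Toy node: no LLM call, just uppercases text from the shared store."""

    def prep(self, shared: dict) -> dict:
        # Gather context from the shared store
        return {"text": shared["text"]}

    def exec(self, prep_res: dict) -> Any:
        # Normally an LLM call; here a pure computation
        return prep_res["text"].upper()

    def post(self, shared: dict, prep_res: dict, exec_res: Any) -> str:
        # Store the result and name the next action
        shared["result"] = exec_res
        return "done"

def run_node(node, shared: dict) -> str:
    # The prep -> exec -> post sequence a flow applies to each node
    prep_res = node.prep(shared)
    exec_res = node.exec(prep_res)
    return node.post(shared, prep_res, exec_res)
```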

3. Update Schemas

Add new Pydantic models in modal_agents/schemas.py:

class CustomData(BaseModel):
    field1: str
    field2: int
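
Because nodes parse model output into these schemas, malformed LLM replies fail loudly instead of silently. For example (assuming Pydantic v2's model_validate_json):

```python
from pydantic import BaseModel, ValidationError

class CustomData(BaseModel):
    field1: str
    field2: int

# Well-formed output parses into a typed object
data = CustomData.model_validate_json('{"field1": "api design", "field2": 3}')

# A malformed reply raises ValidationError instead of propagating bad data
try:
    CustomData.model_validate_json('{"field1": "api design", "field2": "lots"}')
except ValidationError:
    pass  # handle or retry the LLM call here
```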

4. Add New Endpoints

Add new HTTP endpoints in modal_agents/main.py:

@app.function(image=image)
@modal.fastapi_endpoint(method="POST", docs=True)
def custom_endpoint(request: dict) -> dict:
    # Your logic here
    return {"status": "success"}

Note: For complex apps with many related endpoints, consider using @modal.asgi_app() to serve a full FastAPI application under a single URL base. See Modal's web endpoint docs for details.

License

MIT License - feel free to use this template for your own projects!

Contributing

Contributions welcome! This is a minimal template designed to be a starting point. Feel free to:

  • Add more example agents
  • Improve documentation
  • Add tests
  • Enhance the PocketFlow framework
