A minimal, open-source template for building agentic workflows with PocketFlow, Ollama, and Modal. You can start experimenting at no cost, since Modal provides $30 of free compute per month.
This template demonstrates how to build production-ready agents that:
- Use PocketFlow for workflow orchestration
- Deploy Ollama models on Modal with GPU acceleration
- Communicate via i6pn private networking for fast, secure connections
- Support Langfuse tracing for observability
- Include mock mode for testing without GPU costs
- Expose HTTP APIs for integration
The included example agent takes a task description and:
- AnalyzeNode: Breaks the task into subtasks (1 LLM call)
- EstimateNode: Estimates time/complexity for each subtask (1 LLM call)
The flow is simple enough to understand at a glance, yet it demonstrates the full pattern.
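The two-node pipeline can be sketched as a minimal, self-contained Python program. Node names match `modal_agents/nodes.py`, but this is a simplification: the real implementation runs through PocketFlow's `Flow` and makes actual LLM calls in `exec()`, which are stubbed out here.

```python
# Sketch of the AnalyzeNode -> EstimateNode pipeline (LLM calls stubbed).

class AnalyzeNode:
    def prep(self, shared):
        return {"task": shared["task"]}

    def exec(self, prep_res):
        # Real version: one call_ollama() call that breaks the task down.
        return {"subtasks": ["design the schema", "implement the endpoints"]}

    def post(self, shared, prep_res, exec_res):
        shared["analysis"] = exec_res
        return "default"

class EstimateNode:
    def prep(self, shared):
        return {"subtasks": shared["analysis"]["subtasks"]}

    def exec(self, prep_res):
        # Real version: one call_ollama() call that estimates each subtask.
        estimates = [{"subtask": s, "hours": 4.0} for s in prep_res["subtasks"]]
        return {"estimates": estimates,
                "total_hours": sum(e["hours"] for e in estimates)}

    def post(self, shared, prep_res, exec_res):
        shared["estimate"] = exec_res
        return "default"

def run_task_breakdown(task: str) -> dict:
    # Each node reads from and writes to a shared dict, in sequence.
    shared = {"task": task}
    for node in (AnalyzeNode(), EstimateNode()):
        prep_res = node.prep(shared)
        node.post(shared, prep_res, node.exec(prep_res))
    return shared
```

The `prep`/`exec`/`post` split is the core PocketFlow idea: context gathering, the (retryable) LLM call, and state updates stay separate.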
```bash
# Clone the repository
git clone https://github.com/Bollwerkio/modal-agents.git
cd modal-agents

# Install dependencies with uv (recommended)
uv sync

# (Optional) Set up environment variables
cp .env.example .env
# Edit .env and add your Langfuse keys if you want tracing

# Authenticate with Modal
uv run modal setup
```

```bash
# Deploy the Ollama GPU service to Modal
uv run modal deploy -m ollama_service

# Pull a model to the service
uv run modal run -m ollama_service::OllamaService.pull_model --model-name llama3.2:3b

# Verify service health
uv run modal run -m ollama_service::OllamaService.health_check
```

```bash
# Deploy the agent workflow to Modal
uv run modal deploy -m modal_agents.main

# The deployment will output the API endpoint URL
# Example: https://your-workspace--task-agent-analyze.modal.run
```

```bash
# Run locally (uses localhost:11434 if Ollama is running locally)
uv run task-agent analyze "Build a REST API for user management"

# Use a different model
uv run task-agent analyze "Create a web app" --model mistral

# Output as JSON
uv run task-agent analyze "Build a CLI tool" --json

# Save to a file
uv run task-agent analyze "Design a database schema" -o result.json
```

```bash
# Run via the Modal CLI (uses @app.local_entrypoint)
uv run modal run -m modal_agents.main --task "Build a REST API"

# Or call the function directly
uv run modal run -m modal_agents.main::analyze_task --task "Build a REST API"

# Or call via HTTP API (after deploy)
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Content-Type: application/json" \
  -d '{"task": "Build a REST API"}'
```

```bash
# Test without GPU costs (no Ollama needed)
MOCK_LLM_MODE=true uv run task-agent analyze "Build a REST API"
```

```
modal-agents/
├── pyproject.toml        # Package config + CLI script entry point
├── ollama_service.py     # Ollama Modal deployment (separate app)
├── modal_agents/
│   ├── __init__.py       # Package exports
│   ├── main.py           # Modal app + CLI (Typer)
│   ├── pocketflow.py     # PocketFlow framework (~200 lines)
│   ├── ollama.py         # call_ollama() with Langfuse tracing
│   ├── mock.py           # Mock mode for testing
│   ├── flow.py           # TaskBreakdownFlow
│   ├── nodes.py          # AnalyzeNode, EstimateNode
│   └── schemas.py        # Pydantic models
└── README.md             # This file
```
When running locally (via `uv run task-agent analyze ...`), the agent:

- Connects to Ollama at `localhost:11434` (requires a local Ollama installation)
- Runs in your local environment
- Good for development and testing

Prerequisites for local execution:

- Ollama installed and running locally
- Model pulled locally: `ollama pull llama3.2:3b`
When running on Modal, the agent:

- Connects to Ollama via i6pn private networking (fast, secure)
- Runs in Modal's cloud infrastructure
- Good for production workloads

Prerequisites for Modal execution:

- Ollama service deployed: `uv run modal deploy -m ollama_service`
- Model pulled to the Modal service: `uv run modal run -m ollama_service::OllamaService.pull_model --model-name llama3.2`
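The local/Modal split above boils down to choosing an Ollama base URL at runtime. A sketch of what that selection could look like — the env var names and the i6pn hostname here are assumptions for illustration, not the template's actual values (the real logic lives in `modal_agents/ollama.py`):

```python
import os

def resolve_ollama_base_url() -> str:
    # Explicit override wins (OLLAMA_BASE_URL is this sketch's assumption,
    # not necessarily an env var the template reads).
    override = os.environ.get("OLLAMA_BASE_URL")
    if override:
        return override
    # Hypothetical "am I running inside Modal?" check; the template's real
    # detection mechanism may differ.
    if os.environ.get("MODAL_TASK_ID"):
        # Placeholder i6pn hostname of the deployed Ollama service.
        return "http://ollama-server:11434"
    # Default: a locally running Ollama instance.
    return "http://localhost:11434"
```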
The deployed Modal app provides REST endpoints using `@modal.fastapi_endpoint`.
After deploying with `uv run modal deploy -m modal_agents.main`, Modal will output the endpoint URLs. The URL pattern is:

```
https://{workspace}--task-agent-{function}.modal.run
```
Important: Replace with your actual Modal workspace name from the deployment output.
```bash
# Replace with your actual endpoint URL from deployment output
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Build a REST API for user management",
    "model": "llama3.2:3b"
  }'
```

Response:

```json
{
  "status": "success",
  "task": "Build a REST API for user management",
  "analysis": {
    "subtasks": [...],
    "reasoning": "..."
  },
  "estimate": {
    "estimates": [...],
    "total_hours": 12.0,
    "reasoning": "..."
  },
  "execution_time_seconds": 5.2
}
```

Health check:

```bash
curl https://your-workspace--task-agent-health.modal.run
```

Note: The example endpoint has NO authentication for demo purposes. For production, add Modal's proxy auth.
In `modal_agents/main.py`, add `requires_proxy_auth=True`:

```python
@app.function(image=image, ...)
@modal.fastapi_endpoint(method="POST", docs=True, requires_proxy_auth=True)
def analyze(request: AnalyzeRequest) -> dict[str, Any]:
    ...
```

To create an API token, follow the instructions at: Modal Webhook Proxy Auth
You'll receive:

- Token ID: use as the `Modal-Key` header
- Token Secret: use as the `Modal-Secret` header

Important: Use proxy auth tokens, not your Modal account keys.
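For a non-curl client, the two headers are attached like any other HTTP headers. A stdlib-only Python sketch (the URL, function name, and token values are placeholders):

```python
import json
import urllib.request

def build_authed_request(url: str, task: str,
                         token_id: str, token_secret: str) -> urllib.request.Request:
    # Attach Modal's proxy-auth headers alongside the JSON body.
    return urllib.request.Request(
        url,
        data=json.dumps({"task": task}).encode("utf-8"),
        headers={
            "Modal-Key": token_id,
            "Modal-Secret": token_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# urllib.request.urlopen(build_authed_request(...)) would then send it.
```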
```bash
# Replace with your actual endpoint URL from deployment output
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Modal-Key: <your-token-id>" \
  -H "Modal-Secret: <your-token-secret>" \
  -H "Content-Type: application/json" \
  -d '{"task": "Build a REST API"}'
```

Mock mode allows you to test the full pipeline without making actual API calls or using GPU resources.
```bash
# Via environment variable
MOCK_LLM_MODE=true uv run task-agent analyze "Build a REST API"

# Or set in your shell
export MOCK_LLM_MODE=true
uv run task-agent analyze "Build a REST API"
```

Mock mode is useful for:

- Development: Test workflow logic without GPU costs
- CI/CD: Run tests without API keys or GPU access
- Demos: Show the pipeline without needing real infrastructure
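Conceptually, mock mode is just an environment-variable gate in front of the LLM call. A rough sketch of the idea (the actual implementation lives in `modal_agents/mock.py` and may differ; the function name here is illustrative):

```python
import os

def call_llm(prompt: str, model: str = "llama3.2:3b") -> dict:
    """Gate the real Ollama call behind MOCK_LLM_MODE."""
    if os.environ.get("MOCK_LLM_MODE", "").lower() in ("1", "true", "yes"):
        # Canned, deterministic reply; when Langfuse is configured these
        # calls are traced with mock_mode: true.
        return {"response": f"mock reply from {model}", "mock_mode": True}
    # The real path would call call_ollama(); omitted in this sketch.
    raise NotImplementedError("real call_ollama() path omitted in this sketch")
```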
Optional observability with Langfuse. All LLM calls are automatically traced when configured.
- Create a Langfuse account at langfuse.com
- Get your API keys
- Set environment variables:
```bash
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com
```

Create a Modal secret:

```bash
uv run modal secret create langfuse-secrets \
  LANGFUSE_SECRET_KEY=sk-lf-... \
  LANGFUSE_PUBLIC_KEY=pk-lf-... \
  LANGFUSE_HOST=https://cloud.langfuse.com
```

Then update `main.py` to include the secret:
```python
secrets = [modal.Secret.from_name("langfuse-secrets")]

@app.function(
    image=image,
    secrets=secrets,  # Add this
    ...
)
```

What gets traced:

- All LLM calls (prompt, response, tokens, cost)
- Execution time
- Model parameters
- Mock mode calls (marked with `mock_mode: true`)
Modal Volumes provide persistent storage for caching results and other data.
The example agent uses Modal Volumes to cache task breakdown results:
```python
import hashlib
import json
from pathlib import Path

# Create volume
cache_volume = modal.Volume.from_name("task-cache", create_if_missing=True)

# Mount in function
@app.function(volumes={"/cache": cache_volume})
def analyze_task(task: str):
    # Derive a stable cache key. Python's built-in hash() is salted per
    # process, so a deterministic digest is used instead.
    key = hashlib.sha256(task.encode("utf-8")).hexdigest()

    # Check cache
    cache_path = Path("/cache") / f"{key}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text())

    # Run workflow
    result = run_task_breakdown(task)

    # Save to cache
    cache_path.write_text(json.dumps(result))
    cache_volume.commit()  # Persist to durable storage
    return result
```

```bash
# Browse volume contents
uv run modal shell --volume task-cache

# Download cached results
uv run modal volume get task-cache /cache ./cache_backup
```

- Cost savings: Avoid redundant LLM calls
- Faster responses: Instant results for cached tasks
- Persistent storage: Data survives container restarts
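One detail worth keeping in mind for cache keys: Python's built-in `hash()` is salted per interpreter run, so identical tasks could map to different files across containers. A deterministic digest avoids this (stdlib-only sketch; the helper name is illustrative):

```python
import hashlib

def cache_key(task: str) -> str:
    # SHA-256 of the task text: identical tasks map to the same filename
    # in every process and container.
    return hashlib.sha256(task.encode("utf-8")).hexdigest() + ".json"
```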
This template is designed to be customized for your use case.
Edit `modal_agents/flow.py` to change the workflow structure:
```python
def create_custom_flow() -> Flow:
    node1 = CustomNode1()
    node2 = CustomNode2()
    node3 = CustomNode3()

    # Connect nodes
    node1 >> node2
    node2 - "action1" >> node3
    node2 - "action2" >> node1  # Loop back

    return Flow(start=node1)
```

Create new nodes in `modal_agents/nodes.py`:
```python
class CustomNode:
    def prep(self, shared: dict) -> dict:
        # Gather context
        return {"data": shared.get("data")}

    def exec(self, prep_res: dict) -> Any:
        # Call the LLM or do computation
        result = call_ollama(prompt, model="llama3.2")
        return result

    def post(self, shared: dict, prep_res: dict, exec_res: Any) -> str:
        # Store results and return the next action
        shared["result"] = exec_res
        return "next_action"
```

Add new Pydantic models in `modal_agents/schemas.py`:
```python
class CustomData(BaseModel):
    field1: str
    field2: int
```

Add new HTTP endpoints in `modal_agents/main.py`:
```python
@app.function(image=image)
@modal.fastapi_endpoint(method="POST", docs=True)
def custom_endpoint(request: dict) -> dict:
    # Your logic here
    return {"status": "success"}
```

Note: For complex apps with many related endpoints, consider using `@modal.asgi_app()` to serve a full FastAPI application under a single URL base. See Modal's web endpoint docs for details.
MIT License - feel free to use this template for your own projects!
Contributions welcome! This is a minimal template designed to be a starting point. Feel free to:
- Add more example agents
- Improve documentation
- Add tests
- Enhance the PocketFlow framework