A minimal, open-source template for building agentic workflows with PocketFlow, Ollama, and Modal. You can start experimenting at no cost, since Modal provides $30 of free compute per month.
This template demonstrates how to build production-ready agents that:
- Use PocketFlow for workflow orchestration
- Deploy Ollama models on Modal with GPU acceleration
- Communicate via i6pn private networking for fast, secure connections
- Support Langfuse tracing for observability
- Include mock mode for testing without GPU costs
- Expose HTTP APIs for integration
The included example agent takes a task description and:
- AnalyzeNode: Breaks the task into subtasks (1 LLM call)
- EstimateNode: Estimates time/complexity for each subtask (1 LLM call)
The flow is simple enough to understand at a glance, yet it demonstrates the full pattern.
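The two-node pipeline can be sketched as a minimal, self-contained Python program. Node names match `modal_agents/nodes.py`, but this is a simplification: the real implementation runs through PocketFlow's `Flow` and makes actual LLM calls in `exec()`, which are stubbed out here.

```python
# Sketch of the AnalyzeNode -> EstimateNode pipeline (LLM calls stubbed).

class AnalyzeNode:
    def prep(self, shared):
        return {"task": shared["task"]}

    def exec(self, prep_res):
        # Real version: one call_ollama() call that breaks the task down.
        return {"subtasks": ["design the schema", "implement the endpoints"]}

    def post(self, shared, prep_res, exec_res):
        shared["analysis"] = exec_res
        return "default"

class EstimateNode:
    def prep(self, shared):
        return {"subtasks": shared["analysis"]["subtasks"]}

    def exec(self, prep_res):
        # Real version: one call_ollama() call that estimates each subtask.
        estimates = [{"subtask": s, "hours": 4.0} for s in prep_res["subtasks"]]
        return {"estimates": estimates,
                "total_hours": sum(e["hours"] for e in estimates)}

    def post(self, shared, prep_res, exec_res):
        shared["estimate"] = exec_res
        return "default"

def run_task_breakdown(task: str) -> dict:
    # Each node reads from and writes to a shared dict, in sequence.
    shared = {"task": task}
    for node in (AnalyzeNode(), EstimateNode()):
        prep_res = node.prep(shared)
        node.post(shared, prep_res, node.exec(prep_res))
    return shared
```

The `prep`/`exec`/`post` split is the core PocketFlow idea: context gathering, the (retryable) LLM call, and state updates stay separate.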
```bash
# Clone the repository
git clone https://github.com/Bollwerkio/modal-agents.git
cd modal-agents

# Install dependencies with uv (recommended)
uv sync

# (Optional) Set up environment variables
cp .env.example .env
# Edit .env and add your Langfuse keys if you want tracing

# Authenticate with Modal
uv run modal setup
```

```bash
# Deploy the Ollama GPU service to Modal
uv run modal deploy -m ollama_service

# Pull a model to the service
uv run modal run -m ollama_service::OllamaService.pull_model --model-name llama3.2:3b

# Verify service health
uv run modal run -m ollama_service::OllamaService.health_check
```

```bash
# Deploy the agent workflow to Modal
uv run modal deploy -m modal_agents.main

# The deployment will output the API endpoint URL
# Example: https://your-workspace--task-agent-analyze.modal.run
```

```bash
# Run locally (uses localhost:11434 if Ollama is running locally)
uv run task-agent analyze "Build a REST API for user management"

# Use a different model
uv run task-agent analyze "Create a web app" --model mistral

# Output as JSON
uv run task-agent analyze "Build a CLI tool" --json

# Save to a file
uv run task-agent analyze "Design a database schema" -o result.json
```

```bash
# Run via the Modal CLI (uses @app.local_entrypoint)
uv run modal run -m modal_agents.main --task "Build a REST API"

# Or call the function directly
uv run modal run -m modal_agents.main::analyze_task --task "Build a REST API"

# Or call via HTTP API (after deploy)
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Content-Type: application/json" \
  -d '{"task": "Build a REST API"}'
```

```bash
# Test without GPU costs (no Ollama needed)
MOCK_LLM_MODE=true uv run task-agent analyze "Build a REST API"
```

```
modal-agents/
├── pyproject.toml        # Package config + CLI script entry point
├── ollama_service.py     # Ollama Modal deployment (separate app)
├── modal_agents/
│   ├── __init__.py       # Package exports
│   ├── main.py           # Modal app + CLI (Typer)
│   ├── pocketflow.py     # PocketFlow framework (~200 lines)
│   ├── ollama.py         # call_ollama() with Langfuse tracing
│   ├── mock.py           # Mock mode for testing
│   ├── flow.py           # TaskBreakdownFlow
│   ├── nodes.py          # AnalyzeNode, EstimateNode
│   └── schemas.py        # Pydantic models
└── README.md             # This file
```
When running locally (via `uv run task-agent analyze ...`), the agent:

- Connects to Ollama at `localhost:11434` (requires a local Ollama installation)
- Runs in your local environment
- Good for development and testing

Prerequisites for local execution:

- Ollama installed and running locally
- Model pulled locally: `ollama pull llama3.2:3b`
When running on Modal, the agent:

- Connects to Ollama via i6pn private networking (fast, secure)
- Runs in Modal's cloud infrastructure
- Good for production workloads

Prerequisites for Modal execution:

- Ollama service deployed: `uv run modal deploy -m ollama_service`
- Model pulled to the Modal service: `uv run modal run -m ollama_service::OllamaService.pull_model --model-name llama3.2`
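The local/Modal split above boils down to choosing an Ollama base URL at runtime. A sketch of what that selection could look like — the env var names and the i6pn hostname here are assumptions for illustration, not the template's actual values (the real logic lives in `modal_agents/ollama.py`):

```python
import os

def resolve_ollama_base_url() -> str:
    # Explicit override wins (OLLAMA_BASE_URL is this sketch's assumption,
    # not necessarily an env var the template reads).
    override = os.environ.get("OLLAMA_BASE_URL")
    if override:
        return override
    # Hypothetical "am I running inside Modal?" check; the template's real
    # detection mechanism may differ.
    if os.environ.get("MODAL_TASK_ID"):
        # Placeholder i6pn hostname of the deployed Ollama service.
        return "http://ollama-server:11434"
    # Default: a locally running Ollama instance.
    return "http://localhost:11434"
```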
The deployed Modal app provides REST endpoints using `@modal.fastapi_endpoint`.
After deploying with `uv run modal deploy -m modal_agents.main`, Modal will output the endpoint URLs. The URL pattern is:

```
https://{workspace}--task-agent-{function}.modal.run
```
Important: Replace with your actual Modal workspace name from the deployment output.
```bash
# Replace with your actual endpoint URL from deployment output
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Build a REST API for user management",
    "model": "llama3.2:3b"
  }'
```

Response:

```json
{
  "status": "success",
  "task": "Build a REST API for user management",
  "analysis": {
    "subtasks": [...],
    "reasoning": "..."
  },
  "estimate": {
    "estimates": [...],
    "total_hours": 12.0,
    "reasoning": "..."
  },
  "execution_time_seconds": 5.2
}
```

Health check:

```bash
curl https://your-workspace--task-agent-health.modal.run
```

Note: The example endpoint has NO authentication for demo purposes. For production, add Modal's proxy auth.
In `modal_agents/main.py`, add `requires_proxy_auth=True`:

```python
@app.function(image=image, ...)
@modal.fastapi_endpoint(method="POST", docs=True, requires_proxy_auth=True)
def analyze(request: AnalyzeRequest) -> dict[str, Any]:
    ...
```

To create an API token, follow the instructions at: Modal Webhook Proxy Auth
You'll receive:

- Token ID: use as the `Modal-Key` header
- Token Secret: use as the `Modal-Secret` header

Important: Use proxy auth tokens, not your Modal account keys.
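For a non-curl client, the two headers are attached like any other HTTP headers. A stdlib-only Python sketch (the URL, function name, and token values are placeholders):

```python
import json
import urllib.request

def build_authed_request(url: str, task: str,
                         token_id: str, token_secret: str) -> urllib.request.Request:
    # Attach Modal's proxy-auth headers alongside the JSON body.
    return urllib.request.Request(
        url,
        data=json.dumps({"task": task}).encode("utf-8"),
        headers={
            "Modal-Key": token_id,
            "Modal-Secret": token_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# urllib.request.urlopen(build_authed_request(...)) would then send it.
```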
```bash
# Replace with your actual endpoint URL from deployment output
curl -X POST https://your-workspace--task-agent-analyze.modal.run \
  -H "Modal-Key: <your-token-id>" \
  -H "Modal-Secret: <your-token-secret>" \
  -H "Content-Type: application/json" \
  -d '{"task": "Build a REST API"}'
```

Mock mode allows you to test the full pipeline without making actual API calls or using GPU resources.
```bash
# Via environment variable
MOCK_LLM_MODE=true uv run task-agent analyze "Build a REST API"

# Or set in your shell
export MOCK_LLM_MODE=true
uv run task-agent analyze "Build a REST API"
```

Mock mode is useful for:

- Development: Test workflow logic without GPU costs
- CI/CD: Run tests without API keys or GPU access
- Demos: Show the pipeline without needing real infrastructure
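Conceptually, mock mode is just an environment-variable gate in front of the LLM call. A rough sketch of the idea (the actual implementation lives in `modal_agents/mock.py` and may differ; the function name here is illustrative):

```python
import os

def call_llm(prompt: str, model: str = "llama3.2:3b") -> dict:
    """Gate the real Ollama call behind MOCK_LLM_MODE."""
    if os.environ.get("MOCK_LLM_MODE", "").lower() in ("1", "true", "yes"):
        # Canned, deterministic reply; when Langfuse is configured these
        # calls are traced with mock_mode: true.
        return {"response": f"mock reply from {model}", "mock_mode": True}
    # The real path would call call_ollama(); omitted in this sketch.
    raise NotImplementedError("real call_ollama() path omitted in this sketch")
```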
Optional observability with Langfuse. All LLM calls are automatically traced when configured.
- Create a Langfuse account at langfuse.com
- Get your API keys
- Set environment variables:
```bash
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com
```

Create a Modal secret:

```bash
uv run modal secret create langfuse-secrets \
  LANGFUSE_SECRET_KEY=sk-lf-... \
  LANGFUSE_PUBLIC_KEY=pk-lf-... \
  LANGFUSE_HOST=https://cloud.langfuse.com
```

Then update `main.py` to include the secret:
```python
secrets = [modal.Secret.from_name("langfuse-secrets")]

@app.function(
    image=image,
    secrets=secrets,  # Add this
    ...
)
```

What gets traced:

- All LLM calls (prompt, response, tokens, cost)
- Execution time
- Model parameters
- Mock mode calls (marked with `mock_mode: true`)
Modal Volumes provide persistent storage for caching results and other data.
The example agent uses Modal Volumes to cache task breakdown results:
```python
import hashlib
import json
from pathlib import Path

# Create volume
cache_volume = modal.Volume.from_name("task-cache", create_if_missing=True)

# Mount in function
@app.function(volumes={"/cache": cache_volume})
def analyze_task(task: str):
    # Derive a stable cache key. Python's built-in hash() is salted per
    # process, so a deterministic digest is used instead.
    key = hashlib.sha256(task.encode("utf-8")).hexdigest()

    # Check cache
    cache_path = Path("/cache") / f"{key}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text())

    # Run workflow
    result = run_task_breakdown(task)

    # Save to cache
    cache_path.write_text(json.dumps(result))
    cache_volume.commit()  # Persist to durable storage
    return result
```

```bash
# Browse volume contents
uv run modal shell --volume task-cache

# Download cached results
uv run modal volume get task-cache /cache ./cache_backup
```

- Cost savings: Avoid redundant LLM calls
- Faster responses: Instant results for cached tasks
- Persistent storage: Data survives container restarts
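One detail worth keeping in mind for cache keys: Python's built-in `hash()` is salted per interpreter run, so identical tasks could map to different files across containers. A deterministic digest avoids this (stdlib-only sketch; the helper name is illustrative):

```python
import hashlib

def cache_key(task: str) -> str:
    # SHA-256 of the task text: identical tasks map to the same filename
    # in every process and container.
    return hashlib.sha256(task.encode("utf-8")).hexdigest() + ".json"
```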
This template is designed to be customized for your use case.
Edit `modal_agents/flow.py` to change the workflow structure:
```python
def create_custom_flow() -> Flow:
    node1 = CustomNode1()
    node2 = CustomNode2()
    node3 = CustomNode3()

    # Connect nodes
    node1 >> node2
    node2 - "action1" >> node3
    node2 - "action2" >> node1  # Loop back

    return Flow(start=node1)
```

Create new nodes in `modal_agents/nodes.py`:
```python
class CustomNode:
    def prep(self, shared: dict) -> dict:
        # Gather context
        return {"data": shared.get("data")}

    def exec(self, prep_res: dict) -> Any:
        # Call the LLM or do computation
        result = call_ollama(prompt, model="llama3.2")
        return result

    def post(self, shared: dict, prep_res: dict, exec_res: Any) -> str:
        # Store results and return the next action
        shared["result"] = exec_res
        return "next_action"
```

Add new Pydantic models in `modal_agents/schemas.py`:
```python
class CustomData(BaseModel):
    field1: str
    field2: int
```

Add new HTTP endpoints in `modal_agents/main.py`:
```python
@app.function(image=image)
@modal.fastapi_endpoint(method="POST", docs=True)
def custom_endpoint(request: dict) -> dict:
    # Your logic here
    return {"status": "success"}
```

Note: For complex apps with many related endpoints, consider using `@modal.asgi_app()` to serve a full FastAPI application under a single URL base. See Modal's web endpoint docs for details.
MIT License - feel free to use this template for your own projects!
Contributions welcome! This is a minimal template designed to be a starting point. Feel free to:
- Add more example agents
- Improve documentation
- Add tests
- Enhance the PocketFlow framework