Commit 321ff3f

Merge pull request #16 from Octane0411/feat/harbor-local-test
feat(benchmark): add Harbor local development testing support
2 parents fb8e9b5 + 9dbf70c commit 321ff3f

10 files changed

+498 -0 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -51,3 +51,6 @@ RELEASE_NOTES.md
 
 # Worktrees
 .worktrees/
+
+# Harbor benchmark results
+jobs/

benchmark/terminalbench/harbor/README.md

Lines changed: 123 additions & 0 deletions
@@ -140,6 +140,129 @@ Agent code (create_run_agent_commands)
 - alex@laude.org
 - mikeam@cs.stanford.edu
 
+## Local Development Testing
+
+Before publishing to npm, you can test the Harbor integration against source code built from the GitHub repository instead of the published npm packages.
+
+### Prerequisites
+
+1. **Colima** (Docker-compatible runtime)
+   ```bash
+   colima start
+   ```
+
+2. **Harbor** (Python >= 3.12)
+   ```bash
+   pip install harbor
+   ```
+
+3. **MiniMax API credentials**
+   ```bash
+   export ANTHROPIC_AUTH_TOKEN="sk-api-..."
+   export ANTHROPIC_BASE_URL="https://api.minimaxi.com/anthropic/v1"
+   ```
+
+### Quick Test
+
+Use the automated test script:
+
+```bash
+# Set MiniMax API credentials
+export ANTHROPIC_AUTH_TOKEN="sk-api-..."
+export ANTHROPIC_BASE_URL="https://api.minimaxi.com/anthropic/v1"
+
+# Run test
+./scripts/test-harbor-local.sh
+```
+
+This script will:
+1. Check all prerequisites (Colima, Harbor, API keys)
+2. Register the local development agent
+3. Run the hello-world test task
+4. Display results (reward: 1 = success, 0 = failure)
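
The prerequisite check in step 1 could look roughly like the Python sketch below. The script itself (`scripts/test-harbor-local.sh`) is not shown in this diff, so the function name `missing_prerequisites` and the exact checks are assumptions for illustration, not the script's actual logic. The sketch only verifies that the `colima` and `harbor` commands are on `PATH` and that the two environment variables are non-empty; it does not check that Colima is actually running.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The README invokes both tools by these command names.
    for tool in ("colima", "harbor"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # MiniMax credentials are read from these two variables.
    for var in ("ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the real environment; the `env` and `which` parameters exist only so the check can be exercised without touching the host.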
+
+### Manual Testing
+
+For more control, you can run tests manually:
+
+```bash
+# 1. Register local development agent
+ln -sf "$(pwd)/benchmark/terminalbench/harbor/agent_local.py" \
+  "$(python -c "import harbor; print(harbor.__path__[0])")/agents/installed/open_agent_sdk_local.py"
+
+# 2. Set MiniMax credentials
+export ANTHROPIC_AUTH_TOKEN="sk-api-..."
+export ANTHROPIC_BASE_URL="https://api.minimaxi.com/anthropic/v1"
+
+# 3. Run single task
+harbor jobs start \
+  --path benchmark/terminalbench/test-tasks/hello-world \
+  --agent-import-path "harbor.agents.installed.open_agent_sdk_local:OpenAgentSDKAgentLocal" \
+  --model MiniMax-M2.5
+
+# 4. Check results in logs
+# Look for "reward: 1" in verifier output
+```
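
Step 1 can also be done from Python. The following is a minimal sketch that mirrors `ln -sf`; the helper name `register_agent` is hypothetical, and the `agents/installed` layout is taken from the command above rather than from Harbor's documentation.

```python
import os
from pathlib import Path


def register_agent(source: Path, harbor_pkg_dir: Path) -> Path:
    """Symlink the local agent module into Harbor's installed-agents directory."""
    target = harbor_pkg_dir / "agents" / "installed" / "open_agent_sdk_local.py"
    # Emulate `ln -sf`: drop any existing file or (possibly dangling) link first.
    if target.is_symlink() or target.exists():
        target.unlink()
    os.symlink(source.resolve(), target)
    return target
```

In practice `harbor_pkg_dir` would be `Path(harbor.__path__[0])` after `import harbor`, and `source` the path to `benchmark/terminalbench/harbor/agent_local.py` in the checkout.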
+
+### How It Works
+
+**Local Development Flow:**
+```
+Host Machine
+  ↓ Set ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL
+Harbor Agent (agent_local.py)
+  ↓ Use install-open-agent-sdk-local.sh.j2
+Sandbox Container
+  ↓ git clone --depth 1 --branch feat/harbor-local-test https://github.com/Octane0411/open-agent-sdk.git
+  ↓ bun install, then bun run build in packages/core
+  ↓ cd packages/cli && bun link
+  ↓ Run: oas --provider anthropic --base-url https://api.minimaxi.com/anthropic/v1 -p "..." --model MiniMax-M2.5
+  ↓ Call MiniMax API via Anthropic-compatible endpoint
+  ↓ Generate output and save to files
+Verifier
+  ↓ Check output files and write reward (1 or 0) to /logs/verifier/reward.txt
+```
+
+**Key differences from production:**
+- Installs from GitHub repository (not npm)
+- Builds packages locally in container
+- Uses `bun link` for global CLI access
+- Requires code to be pushed to GitHub before testing (the install script clones the `feat/harbor-local-test` branch)
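
The "Run" step of the flow amounts to composing one shell command line. The sketch below shows one way to do that with `shlex.quote` from the Python standard library. It is not the adapter's code (which escapes the instruction by hand), and the helper name `build_oas_command` is made up for illustration; the flags are the ones that appear in this commit.

```python
import shlex
from typing import Optional


def build_oas_command(instruction: str, model: str, base_url: Optional[str] = None) -> str:
    """Compose the `oas` invocation as a single shell-safe string."""
    args = ["oas"]
    if base_url:
        # MiniMax is reached through the Anthropic-compatible endpoint.
        args += ["--provider", "anthropic", "--base-url", base_url]
    args += ["-p", instruction, "--model", model,
             "--cwd", "/workspace", "--output-format", "json"]
    # shlex.quote leaves safe tokens untouched and single-quotes the rest.
    return " ".join(shlex.quote(a) for a in args)
```

Quoting every argument this way means an instruction containing quotes, `$`, or backticks reaches the CLI unchanged instead of being interpreted by the shell.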
+
+### Troubleshooting
+
+**Issue: "Colima is not running"**
+```bash
+colima start
+```
+
+**Issue: "Harbor is not installed"**
+```bash
+pip install harbor
+```
+
+**Issue: "ANTHROPIC_AUTH_TOKEN is not set"**
+```bash
+export ANTHROPIC_AUTH_TOKEN="sk-api-..."
+export ANTHROPIC_BASE_URL="https://api.minimaxi.com/anthropic/v1"
+```
+
+**Issue: "git clone failed in container"**
+- Ensure your changes are pushed to GitHub on the branch the install script clones
+- Check network connectivity in the container
+- Try `docker pull ubuntu:24.04` to test Docker networking
+
+**Issue: "bun install failed"**
+- Check that the container has internet access
+- Verify that package.json is valid
+- Check the Bun installation logs
+
+**Issue: "reward: 0" (test failed)**
+- Check container logs for errors
+- Verify that the API credentials are correct
+- Check whether `greeting.txt` was created
+- Verify that the greeting has at least 10 words and contains one of the keywords the verifier matches (`welcome`, `greeting`, `hello`, `harbor`)
+
 ## References
 
 - [Terminal-bench](https://www.tbench.ai/leaderboard/terminal-bench/2.0)

benchmark/terminalbench/harbor/agent_local.py

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
+"""
+open-agent-sdk Harbor Agent Adapter (Local Development)
+
+This is a standalone variant that installs from local source code instead of npm packages.
+Use this for testing before publishing to npm.
+
+Usage:
+    # Register agent
+    ln -sf $(pwd)/benchmark/terminalbench/harbor/agent_local.py \
+        $(python -c "import harbor; print(harbor.__path__[0])")/agents/installed/open_agent_sdk_local.py
+
+    # Run with MiniMax
+    export ANTHROPIC_AUTH_TOKEN=your_token
+    export ANTHROPIC_BASE_URL=https://api.minimaxi.com/anthropic/v1
+    harbor jobs start \
+        --path benchmark/terminalbench/test-tasks/hello-world \
+        --agent-import-path "harbor.agents.installed.open_agent_sdk_local:OpenAgentSDKAgentLocal" \
+        --model MiniMax-M2.5
+"""
+
+import os
+from pathlib import Path
+
+from harbor.agents.installed.base import BaseInstalledAgent, ExecInput
+from harbor.models.agent.context import AgentContext
+
+
+# CLI command (installed globally by install script)
+CLI_COMMAND = "oas"
+
+
+def is_minimax_model(model_name: str) -> bool:
+    """Check if the model is a MiniMax model."""
+    return model_name.lower().startswith("minimax")
+
+
+def get_required_env_vars(model_name: str) -> dict[str, str]:
+    """
+    Determine required environment variables based on model name.
+    Returns a dict of {env_var_name: env_var_value}.
+    """
+    env_vars = {}
+    model_lower = model_name.lower()
+
+    # MiniMax uses Anthropic compatible endpoint with custom auth
+    if is_minimax_model(model_name):
+        auth_token = os.environ.get("ANTHROPIC_AUTH_TOKEN")
+        base_url = os.environ.get("ANTHROPIC_BASE_URL")
+
+        if not auth_token:
+            raise ValueError(
+                "MiniMax model requires ANTHROPIC_AUTH_TOKEN environment variable. "
+                "This is used for Bearer token authentication."
+            )
+        if not base_url:
+            raise ValueError(
+                "MiniMax model requires ANTHROPIC_BASE_URL environment variable. "
+                "Example: https://api.minimaxi.com/anthropic/v1"
+            )
+
+        env_vars["ANTHROPIC_AUTH_TOKEN"] = auth_token
+        env_vars["ANTHROPIC_BASE_URL"] = base_url
+        return env_vars
+
+    # Standard providers
+    if model_lower.startswith("gemini") or model_lower.startswith("google"):
+        api_key = os.environ.get("GEMINI_API_KEY")
+        if not api_key:
+            raise ValueError("Gemini model requires GEMINI_API_KEY environment variable.")
+        env_vars["GEMINI_API_KEY"] = api_key
+    elif model_lower.startswith("claude"):
+        api_key = os.environ.get("ANTHROPIC_API_KEY")
+        if not api_key:
+            raise ValueError("Claude model requires ANTHROPIC_API_KEY environment variable.")
+        env_vars["ANTHROPIC_API_KEY"] = api_key
+    elif model_lower.startswith("gpt") or model_lower.startswith("openai"):
+        api_key = os.environ.get("OPENAI_API_KEY")
+        if not api_key:
+            raise ValueError("OpenAI model requires OPENAI_API_KEY environment variable.")
+        env_vars["OPENAI_API_KEY"] = api_key
+    else:
+        # Default to Gemini for unknown models
+        api_key = os.environ.get("GEMINI_API_KEY")
+        if not api_key:
+            raise ValueError(f"Unknown model '{model_name}'. Please set GEMINI_API_KEY for default Gemini provider.")
+        env_vars["GEMINI_API_KEY"] = api_key
+
+    return env_vars
+
+
+class OpenAgentSDKAgentLocal(BaseInstalledAgent):
+    """
+    Local development variant of OpenAgentSDKAgent.
+    Installs from GitHub repository instead of npm.
+    """
+
+    @staticmethod
+    def name() -> str:
+        return "open-agent-sdk-local"
+
+    def version(self) -> str | None:
+        return "local-dev"
+
+    @property
+    def _install_agent_template_path(self) -> Path:
+        """Override to use local installation script."""
+        return Path(__file__).resolve().parent / "install-open-agent-sdk-local.sh.j2"  # resolve(): this module is loaded through a symlink
+
+    def create_run_agent_commands(self, instruction: str) -> list[ExecInput]:
+        model = self.model_name or "gemini-2.0-flash"
+
+        # Get required environment variables
+        env_vars = get_required_env_vars(model)
+
+        # Escape instruction for use inside double quotes (backslash first, then ", $ and `)
+        escaped = instruction.replace('\\', '\\\\').replace('"', '\\"').replace('$', '\\$').replace('`', '\\`')
+
+        # Build CLI command with env vars inline (for Daytona compatibility)
+        # Daytona uses shlex.quote on env values which breaks shell variable assignment
+        env_exports = " && ".join([f'export {k}="{v}"' for k, v in env_vars.items()])
+
+        # Build base command (always use /workspace as cwd for Harbor compatibility)
+        cmd_parts = [
+            'export PATH="$HOME/.bun/bin:$PATH"',
+            env_exports,
+            f'{CLI_COMMAND} -p "{escaped}" --model {model} --cwd /workspace --output-format json'
+        ]
+
+        # For MiniMax, add --provider and --base-url flags
+        if is_minimax_model(model) and "ANTHROPIC_BASE_URL" in env_vars:
+            base_url = env_vars["ANTHROPIC_BASE_URL"]
+            # Use anthropic provider with custom base URL
+            cmd_parts[2] = f'{CLI_COMMAND} --provider anthropic --base-url {base_url} -p "{escaped}" --model {model} --cwd /workspace --output-format json'
+
+        return [
+            ExecInput(
+                command=" && ".join(cmd_parts),
+                timeout_sec=600,
+            )
+        ]
+
+    def populate_context_post_run(self, context: AgentContext) -> None:
+        # Harbor reads stdout from create_run_agent_commands automatically
+        pass

benchmark/terminalbench/harbor/install-open-agent-sdk-local.sh.j2

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#!/bin/bash
+set -euo pipefail
+
+# Install dependencies if not available (the Bun installer needs unzip)
+if command -v apk &> /dev/null; then
+    apk add --no-cache curl bash git unzip
+elif command -v apt-get &> /dev/null; then
+    apt-get update
+    apt-get install -y curl git unzip
+fi
+
+# Install Bun if not present
+if ! command -v bun &> /dev/null; then
+    curl -fsSL https://bun.sh/install | bash
+fi
+export PATH="$HOME/.bun/bin:$PATH"
+
+# Clone repository (shallow clone for faster installation)
+echo "Cloning open-agent-sdk from GitHub..."
+git clone --depth 1 --branch feat/harbor-local-test https://github.com/Octane0411/open-agent-sdk.git /tmp/open-agent-sdk
+
+# Build packages locally
+echo "Building packages..."
+cd /tmp/open-agent-sdk
+bun install
+
+# Build core package
+cd packages/core
+bun run build
+cd ../..
+
+# Link CLI globally
+cd packages/cli
+bun link
+cd ../..
+
+# Ensure bun bin is in PATH
+export PATH="$HOME/.bun/bin:$PATH"
+
+echo "Bun ready: $(bun --version)"
+echo "CLI ready: $(which oas)"
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+FROM ubuntu:24.04
+
+# Install basic utilities
+RUN apt-get update && apt-get install -y \
+    curl \
+    bash \
+    git \
+    unzip \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set working directory
+WORKDIR /workspace
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Create a file named greeting.txt that contains a friendly greeting message welcoming someone to Harbor framework testing. The greeting must be at least 10 words long.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/bin/bash
+set -e
+
+# Generate greeting using oas CLI
+oas chat "Generate a friendly greeting message (at least 10 words) welcoming someone to Harbor framework testing. Output only the greeting text, no markdown formatting." > greeting.txt
+
+echo "Greeting generated and saved to greeting.txt"
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+[metadata]
+name = "hello-world"
+description = "Simple test task to verify Harbor + open-agent-sdk integration"
+version = "1.0.0"
+
+[timeouts]
+agent = 180 # Allow time for LLM API call
+verifier = 180
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+# Verifier script for hello-world task
+# Checks if greeting.txt exists and contains valid content
+
+REWARD_FILE="/logs/verifier/reward.txt"
+mkdir -p "$(dirname "$REWARD_FILE")"
+
+# Check if greeting.txt exists
+if [ ! -f "greeting.txt" ]; then
+    echo "0" > "$REWARD_FILE"
+    echo "FAIL: greeting.txt not found"
+    exit 0
+fi
+
+# Read content
+CONTENT=$(cat greeting.txt)
+
+# Check if content has at least 10 words
+WORD_COUNT=$(echo "$CONTENT" | wc -w | tr -d ' ')
+if [ "$WORD_COUNT" -lt 10 ]; then
+    echo "0" > "$REWARD_FILE"
+    echo "FAIL: greeting.txt has only $WORD_COUNT words (need at least 10)"
+    exit 0
+fi
+
+# Check if content contains greeting-related keywords (case-insensitive)
+if echo "$CONTENT" | grep -iE "(welcome|greeting|hello|harbor)" > /dev/null; then
+    echo "1" > "$REWARD_FILE"
+    echo "PASS: greeting.txt contains valid greeting ($WORD_COUNT words)"
+    exit 0
+else
+    echo "0" > "$REWARD_FILE"
+    echo "FAIL: greeting.txt doesn't contain greeting-related keywords"
+    exit 0
+fi
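
The two content checks in the verifier script above can be mirrored in Python to sanity-check a candidate greeting before running a full Harbor job. This is only an approximation for local use, not part of the commit: the function name `score_greeting` is hypothetical, the file-existence check is left out, and `str.split()` is assumed to count words closely enough to `wc -w` for ordinary text.

```python
import re

# Same alternatives as the grep pattern in the verifier, matched case-insensitively.
KEYWORDS = re.compile(r"welcome|greeting|hello|harbor", re.IGNORECASE)


def score_greeting(text: str) -> int:
    """Return 1 if the text would pass both content checks, else 0."""
    # `wc -w` counts whitespace-separated tokens; str.split() does much the same.
    if len(text.split()) < 10:
        return 0
    # Like grep, this matches the keywords anywhere, including inside longer words.
    return 1 if KEYWORDS.search(text) else 0
```

For example, a twelve-word sentence that mentions "Harbor" scores 1, while "Hello world" scores 0 because it is too short.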
