Training Data Export

Vibe Browse can now export your browser automation sessions as LLM training data.

Why This Matters

Training web agents requires high-quality examples of:

  • Human intent (what the user wants)
  • Agent actions (tool calls with parameters)
  • Browser results (what actually happened)

Every conversation you have with Vibe Browse generates this data automatically.

Usage

During a Session

Type export or /export at any time:

You: export

📊 Export training data
   1. OpenAI fine-tuning format (JSONL)
   2. Anthropic fine-tuning format (JSON)
   3. Raw trajectory (JSON)
   4. All formats

Choice: 1

At Session End

When you type exit, you'll be prompted to export:

You: exit

📊 Export training data?
   1. OpenAI fine-tuning format (JSONL)
   2. Anthropic fine-tuning format (JSON)
   3. Raw trajectory (JSON)
   4. All formats
   N. Skip

Choice: 4

Auto-Export Mode

Use the --export flag to automatically export all formats when your session ends:

npm start "scrape Hacker News" --export

Export Formats

1. OpenAI Fine-Tuning Format (JSONL)

Ready to upload to OpenAI's fine-tuning API. Each conversation is a single JSON object on its own line (pretty-printed here for readability):

{"messages": [
  {"role": "system", "content": "You are a browser automation agent..."},
  {"role": "user", "content": "Go to Hacker News and get the top post"},
  {"role": "assistant", "content": null, "tool_calls": [
    {"id": "call_0", "type": "function", "function": {
      "name": "navigate",
      "arguments": "{\"url\":\"https://news.ycombinator.com\"}"
    }}
  ]},
  {"role": "tool", "tool_call_id": "call_0", "name": "navigate", "content": "Navigated successfully..."},
  {"role": "assistant", "content": null, "tool_calls": [
    {"id": "call_1", "type": "function", "function": {
      "name": "extract",
      "arguments": "{\"instruction\":\"get top post title\"}"
    }}
  ]},
  {"role": "tool", "tool_call_id": "call_1", "name": "extract", "content": "Building AGI in my basement"},
  {"role": "assistant", "content": "The top post on Hacker News is 'Building AGI in my basement'"}
]}

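Use with (the fine-tuning API expects the ID of an uploaded file rather than a local path; flag spellings can vary between CLI versions, so check --help):

# Upload the file, then pass the returned ID in place of <FILE_ID>
openai api files.create -f openai_2026-01-19_12-30.jsonl -p fine-tune
openai api fine_tuning.jobs.create \
  --training-file <FILE_ID> \
  --model gpt-4o-mini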
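
Before uploading, it can help to confirm that the file really is valid JSONL and that every tool result refers to a tool call issued earlier in the same conversation. The following is a minimal sketch, not part of Vibe Browse; the function name `validate_openai_jsonl` is hypothetical, and it assumes the export writes one conversation per line with the `messages`, `tool_calls`, and `tool_call_id` fields shown above.

```python
import json


def validate_openai_jsonl(text: str) -> list[str]:
    """Return a list of problems found in JSONL text; an empty list means none were found."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append(f"line {lineno}: missing 'messages' list")
            continue
        pending = set()  # tool call ids that have been issued but not yet answered
        for msg in messages:
            if not isinstance(msg, dict):
                problems.append(f"line {lineno}: message is not an object")
                continue
            for call in msg.get("tool_calls") or []:
                pending.add(call.get("id"))
            if msg.get("role") == "tool":
                call_id = msg.get("tool_call_id")
                if call_id in pending:
                    pending.remove(call_id)
                else:
                    problems.append(
                        f"line {lineno}: tool result for unknown call id {call_id!r}"
                    )
    return problems


# Small in-memory demo: one well-formed conversation on a single line.
sample = json.dumps({"messages": [
    {"role": "user", "content": "Go to Hacker News"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_0", "type": "function",
         "function": {"name": "navigate", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_0", "content": "Navigated successfully..."},
]})
print(validate_openai_jsonl(sample))
```

To check a real export, read the file's contents and pass them to the function. The check is deliberately shallow and does not replace the validation the fine-tuning API performs on upload.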

2. Anthropic Fine-Tuning Format (JSON)

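Structured, turn-based format aimed at Anthropic fine-tuning workflows. The layout differs from the Anthropic Messages API shape, so it may need converting before use: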

{
  "system": "You are a browser automation agent...",
  "turns": [
    {
      "user": "Go to Hacker News and get the top post",
      "assistant": "I'll navigate to Hacker News and extract the top post.",
      "tool_calls": [
        {
          "name": "navigate",
          "input": {"url": "https://news.ycombinator.com"}
        }
      ],
      "tool_results": [
        {
          "name": "navigate",
          "content": "Navigated successfully...",
          "isError": false
        }
      ],
      "timestamp": 1737312000000
    }
  ]
}
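
If the turns need to be reshaped into Anthropic Messages-style content blocks (`tool_use` and `tool_result`), a conversion could look roughly like the sketch below. It is not part of Vibe Browse: `turn_to_messages` is a hypothetical helper, the tool-use ids are synthetic because the export shown above carries none, and it assumes `tool_calls` and `tool_results` pair up by position.

```python
def turn_to_messages(turn: dict, turn_index: int = 0) -> list[dict]:
    """Convert one exported turn into a list of Messages-style dicts."""
    messages = [{"role": "user", "content": turn["user"]}]

    assistant_blocks = []
    if turn.get("assistant"):
        assistant_blocks.append({"type": "text", "text": turn["assistant"]})

    results = turn.get("tool_results", [])
    result_blocks = []
    for i, call in enumerate(turn.get("tool_calls", [])):
        tool_id = f"toolu_{turn_index}_{i}"  # synthetic id (assumption)
        assistant_blocks.append({
            "type": "tool_use",
            "id": tool_id,
            "name": call["name"],
            "input": call["input"],
        })
        if i < len(results):  # pair call and result by position (assumption)
            result_blocks.append({
                "type": "tool_result",
                "tool_use_id": tool_id,
                "content": results[i]["content"],
                "is_error": results[i].get("isError", False),
            })

    if assistant_blocks:
        messages.append({"role": "assistant", "content": assistant_blocks})
    if result_blocks:
        # Tool results go back to the model in a user-role message.
        messages.append({"role": "user", "content": result_blocks})
    return messages


example_turn = {
    "user": "Go to Hacker News and get the top post",
    "assistant": "I'll navigate to Hacker News and extract the top post.",
    "tool_calls": [{"name": "navigate", "input": {"url": "https://news.ycombinator.com"}}],
    "tool_results": [{"name": "navigate", "content": "Navigated successfully...", "isError": False}],
    "timestamp": 1737312000000,
}
for message in turn_to_messages(example_turn):
    print(message["role"], message["content"])
```

Whether this shape matches what a particular fine-tuning provider accepts should be checked against that provider's documentation.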

3. Raw Trajectory (JSON)

Full session data for custom processing:

{
  "session_id": "2026-01-19_12-30",
  "total_turns": 5,
  "conversations": [...],
  "metadata": {
    "exported_at": "2026-01-19T12:35:00.000Z",
    "tool": "vibe-browse",
    "version": "2.0.0"
  }
}

Session Statistics

Each export includes stats:

Session stats:
  • Total turns: 5
  • Tool calls: 8
  • Success rate: 87.5%
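
For reference, the numbers above are consistent with 7 of 8 tool calls succeeding (7 / 8 = 87.5%). A sketch of how such stats could be recomputed from exported turns follows. It assumes the turn layout of the Anthropic-format export (`tool_results` entries with an `isError` flag) and one result per tool call; `session_stats` is a hypothetical name, not a Vibe Browse function.

```python
def session_stats(turns: list[dict]) -> dict:
    """Summarize turns: count tool results and the share that did not error."""
    results = [r for turn in turns for r in turn.get("tool_results", [])]
    total = len(results)
    succeeded = sum(1 for r in results if not r.get("isError", False))
    rate = round(100 * succeeded / total, 1) if total else 0.0
    return {"total_turns": len(turns), "tool_calls": total, "success_rate": rate}


# Five turns, eight tool results, one of which failed.
ok = {"name": "navigate", "content": "Navigated successfully...", "isError": False}
failed = {"name": "extract", "content": "Timed out", "isError": True}
demo_turns = [
    {"tool_results": [ok, ok]},
    {"tool_results": [ok, failed]},
    {"tool_results": [ok, ok]},
    {"tool_results": [ok]},
    {"tool_results": [ok]},
]
print(session_stats(demo_turns))
```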

Output Location

All exports are saved to ./training_data/ in your project directory:

training_data/
├── openai_2026-01-19_12-30.jsonl
├── anthropic_2026-01-19_12-30.json
└── trajectory_2026-01-19_12-30.json
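
Because the file names follow a `<format>_<session id>.<extension>` pattern in the listing above, exports can be grouped by session with a few lines of code. This is a sketch that assumes the naming pattern holds for every file; `group_exports` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def group_exports(filenames) -> dict:
    """Map each session id to a {format: filename} dict."""
    sessions = defaultdict(dict)
    for name in filenames:
        # "openai_2026-01-19_12-30" -> ("openai", "_", "2026-01-19_12-30")
        fmt, sep, session_id = Path(name).stem.partition("_")
        if not sep:
            continue  # skip files that do not match the pattern
        sessions[session_id][fmt] = name
    return dict(sessions)


listing = [
    "openai_2026-01-19_12-30.jsonl",
    "anthropic_2026-01-19_12-30.json",
    "trajectory_2026-01-19_12-30.json",
]
print(group_exports(listing))
```

For a real directory, the names could come from `(p.name for p in Path("training_data").iterdir())`.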

Use Cases

1. Fine-Tune Your Own Web Agent

Collect 50-100 examples of specific tasks, then fine-tune:

# Collect examples
npm start "check competitor pricing on example.com" --export
npm start "scrape product reviews from site.com" --export
npm start "extract contact info from company page" --export

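# Fine-tune (a job takes one uploaded file ID, so merge the JSONL files first;
# plain concatenation assumes each file ends with a newline)
cat training_data/openai_*.jsonl > merged.jsonl
openai api files.create -f merged.jsonl -p fine-tune
openai api fine_tuning.jobs.create \
  --training-file <FILE_ID> \
  --model gpt-4o-mini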
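
A fine-tuning job takes a single training file, so sessions exported separately have to be combined into one JSONL file. A small script can do this while checking that every line parses. The sketch below assumes the `openai_*.jsonl` naming used in the output directory and one JSON object per line; `merge_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path


def merge_jsonl(input_dir: str, output_path: str) -> int:
    """Merge all openai_*.jsonl files in input_dir; return the number of examples written."""
    out = Path(output_path)
    lines = []
    for path in sorted(Path(input_dir).glob("openai_*.jsonl")):
        if path.resolve() == out.resolve():
            continue  # never read the output file back in
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line:
                continue  # drop blank lines
            json.loads(line)  # raises json.JSONDecodeError on a malformed line
            lines.append(line)
    out.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

Calling `merge_jsonl("training_data", "merged.jsonl")` would write the combined file to the current directory and return the example count, which is a quick way to see how close the dataset is to the 50-100 example target.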

2. Build Training Datasets

Create datasets for specific domains:

  • E-commerce automation
  • Research data collection
  • Competitive intelligence
  • QA testing scenarios

3. Benchmark Your Agents

Use raw trajectories to:

  • Measure agent performance
  • Compare different models
  • Test new prompting strategies
  • Evaluate tool usage patterns

Best Practices

  1. Quality over quantity - Prefer a small set of clean, successful sessions over a large, noisy one
  2. Diverse scenarios - Cover different websites and tasks
  3. Error handling - Include a few failure examples alongside the successful ones, for robustness
  4. Privacy - Never export sessions with sensitive data (passwords, tokens, etc.)
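
For the privacy point, a quick heuristic scan of an export before sharing or uploading it can catch obvious leaks. The sketch below is only a rough filter: the patterns are illustrative assumptions, `find_sensitive` is a hypothetical helper, and a clean result does not prove a file is free of sensitive data.

```python
import re

# Optional escaped or plain quote, then ':' or '=' (also covers JSON-escaped keys).
_ASSIGN = r"""\s*\\?["']?\s*[:=]"""

# Illustrative patterns only (assumptions, not an exhaustive secret detector).
SENSITIVE_PATTERNS = {
    "password": re.compile(r"pass(?:word|wd)" + _ASSIGN, re.IGNORECASE),
    "api key or token": re.compile(r"(?:api[_-]?key|secret|token)" + _ASSIGN, re.IGNORECASE),
    "bearer credential": re.compile(r"bearer\s+[A-Za-z0-9._-]{16,}", re.IGNORECASE),
}


def find_sensitive(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) pairs for lines that look sensitive."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


export_text = "\n".join([
    '{"role": "tool", "content": "Navigated successfully..."}',
    r'{"role": "assistant", "arguments": "{\"password\": \"hunter2\"}"}',
])
for lineno, label in find_sensitive(export_text):
    print(f"line {lineno}: possible {label}")
```

Any flagged session is best excluded from the dataset or reviewed by hand.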

Examples

Example 1: Training a Hacker News Bot

# Session 1
npm start "get top 5 posts from Hacker News" --export

# Session 2
npm start "find all Show HN posts on the front page" --export

# Session 3
npm start "get the most upvoted comment from the top story" --export

# You now have 3 training examples for HN automation

Example 2: Product Research Agent

# Collect examples of product research
npm start "go to amazon.com and find the price of the top laptop" --export
npm start "compare prices of iPhone on Best Buy vs Amazon" --export
npm start "find reviews for Sony headphones on multiple sites" --export

# Fine-tune a specialized product research agent

Roadmap

Future enhancements:

  • Screenshot embeddings in training data
  • WebArena benchmark format export
  • Automatic data quality scoring
  • Dataset merging and deduplication
  • Privacy-aware PII scrubbing

Built with Hyperbrowser

Follow @hyperbrowser for updates.