Commit 246892c

Merge pull request #3 from MattMagg/003-agent-orchestration-ui
feat: Agent Orchestration UI

2 parents 79695bf + f4444ff

79 files changed: +13271 additions, -619 deletions

.agent/rules/specify-rules.md

Lines changed: 29 additions & 0 deletions
# claw-dash Development Guidelines

Auto-generated from all feature plans. Last updated: 2026-02-22

## Active Technologies

- TypeScript / Node.js + Next.js (App Router), React, `lucide-react`, `shadcn/ui`, Tailwind CSS, `zod`, `trpc`, `@xyflow/react` (003-agent-orchestration-ui)

## Project Structure

```text
src/
tests/
```

## Commands

npm test && npm run lint

## Code Style

TypeScript / Node.js: Follow standard conventions

## Recent Changes

- 003-agent-orchestration-ui: Added TypeScript / Node.js + Next.js (App Router), React, `lucide-react`, `shadcn/ui`, Tailwind CSS, `zod`, `trpc`, `@xyflow/react`

<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->

.agent/skills/crawl/SKILL.md

Lines changed: 242 additions & 0 deletions
---
name: crawl
description: "Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL."
---

# Crawl Skill

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

## Authentication

The script uses OAuth via the Tavily MCP server. **No manual setup required** - on first run, it will:

1. Check for existing tokens in `~/.mcp-auth/`
2. If none found, automatically open your browser for OAuth authentication

> **Note:** You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. [Sign up at tavily.com](https://tavily.com) first if you don't have an account.

### Alternative: API Key

If you prefer using an API key, get one at https://tavily.com and add it to `~/.claude/settings.json`:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```

## Quick Start

### Using the Script

```bash
./scripts/crawl.sh '<json>' [output_dir]
```

**Examples:**

```bash
# Basic crawl
./scripts/crawl.sh '{"url": "https://docs.example.com"}'

# Deeper crawl with limits
./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2, "limit": 50}'

# Save to files
./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2}' ./docs

# Focused crawl with path filters
./scripts/crawl.sh '{"url": "https://example.com", "max_depth": 2, "select_paths": ["/docs/.*", "/api/.*"], "exclude_paths": ["/blog/.*"]}'

# With semantic instructions (for agentic use)
./scripts/crawl.sh '{"url": "https://docs.example.com", "instructions": "Find API documentation", "chunks_per_source": 3}'
```

When `output_dir` is provided, each crawled page is saved as a separate markdown file.
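The saving step can be sketched without the wrapper script. This is a minimal sketch, assuming a response shaped like the documented Response Format and `jq` installed; the sample data and the URL-to-filename slug rule here are illustrative, not necessarily what `crawl.sh` itself does:

```shell
# Sample response in the documented shape (stand-in for a real API call)
cat > response.json <<'EOF'
{
  "base_url": "https://docs.example.com",
  "results": [
    {"url": "https://docs.example.com/page", "raw_content": "# Page Title\n\nContent..."}
  ]
}
EOF

mkdir -p ./docs
# One markdown file per crawled page, filename derived from the URL
jq -c '.results[]' response.json | while read -r page; do
  url=$(printf '%s' "$page" | jq -r '.url')
  slug=$(printf '%s' "$url" | sed 's|^[a-z]*://||; s|[/?&=]|_|g')
  printf '%s' "$page" | jq -r '.raw_content' > "./docs/${slug}.md"
done
```

With the sample above this writes `./docs/docs.example.com_page.md` containing the page's markdown.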
### Basic Crawl

```bash
curl --request POST \
  --url https://api.tavily.com/crawl \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 1,
    "limit": 20
  }'
```

### Focused Crawl with Instructions

```bash
curl --request POST \
  --url https://api.tavily.com/crawl \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "instructions": "Find API documentation and code examples",
    "chunks_per_source": 3,
    "select_paths": ["/docs/.*", "/api/.*"]
  }'
```

## API Reference

### Endpoint

```
POST https://api.tavily.com/crawl
```

### Headers

| Header | Value |
|--------|-------|
| `Authorization` | `Bearer <TAVILY_API_KEY>` |
| `Content-Type` | `application/json` |

### Request Body

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | Required | Root URL to begin crawling |
| `max_depth` | integer | 1 | Levels deep to crawl (1-5) |
| `max_breadth` | integer | 20 | Links per page |
| `limit` | integer | 50 | Total pages cap |
| `instructions` | string | null | Natural language guidance for focus |
| `chunks_per_source` | integer | 3 | Chunks per page (1-5, requires instructions) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `select_paths` | array | null | Regex patterns to include |
| `exclude_paths` | array | null | Regex patterns to exclude |
| `allow_external` | boolean | true | Include external domain links |
| `timeout` | float | 150 | Max wait (10-150 seconds) |

### Response Format

```json
{
  "base_url": "https://docs.example.com",
  "results": [
    {
      "url": "https://docs.example.com/page",
      "raw_content": "# Page Title\n\nContent..."
    }
  ],
  "response_time": 12.5
}
```
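For a quick look at a saved response in this shape, `jq` is enough. A sketch, assuming `jq` is installed; the response file here is the documented sample, not live API output:

```shell
# Sample response in the documented shape
cat > crawl-response.json <<'EOF'
{
  "base_url": "https://docs.example.com",
  "results": [
    {"url": "https://docs.example.com/page", "raw_content": "# Page Title\n\nContent..."}
  ],
  "response_time": 12.5
}
EOF

jq -r '.results[].url' crawl-response.json   # list crawled URLs
jq '.results | length' crawl-response.json   # page count
```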
## Depth vs Performance

| Depth | Typical Pages | Time |
|-------|---------------|------|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5000 | Many minutes |

**Start with `max_depth=1`** and increase only if needed.

## Crawl for Context vs Data Collection

**For agentic use (feeding results into context):** Always use `instructions` + `chunks_per_source`. This returns only relevant chunks instead of full pages, preventing context window explosion.

**For data collection (saving to files):** Omit `chunks_per_source` to get full page content.

## Examples

### For Context: Agentic Research (Recommended)

Use when feeding crawl results into an LLM context:

```bash
curl --request POST \
  --url https://api.tavily.com/crawl \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "instructions": "Find API documentation and authentication guides",
    "chunks_per_source": 3
  }'
```

Returns only the most relevant chunks (max 500 chars each) per page - fits in context without overwhelming it.

### For Context: Targeted Technical Docs

```bash
curl --request POST \
  --url https://api.tavily.com/crawl \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "max_depth": 2,
    "instructions": "Find all documentation about authentication and security",
    "chunks_per_source": 3,
    "select_paths": ["/docs/.*", "/api/.*"]
  }'
```

### For Data Collection: Full Page Archive

Use when saving content to files for later processing:

```bash
curl --request POST \
  --url https://api.tavily.com/crawl \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/blog",
    "max_depth": 2,
    "max_breadth": 50,
    "select_paths": ["/blog/.*"],
    "exclude_paths": ["/blog/tag/.*", "/blog/category/.*"]
  }'
```

Returns full page content - use the script with `output_dir` to save as markdown files.

## Map API (URL Discovery)

Use `map` instead of `crawl` when you only need URLs, not content:

```bash
curl --request POST \
  --url https://api.tavily.com/map \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "instructions": "Find all API docs and guides"
  }'
```

Returns URLs only (faster than crawl):

```json
{
  "base_url": "https://docs.example.com",
  "results": [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/guides/quickstart"
  ]
}
```
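A common pattern is to map first, skim the URL list, then promote the interesting paths to `select_paths` for a focused crawl. A sketch of the filtering step on a sample map response (assumes `jq`; the regex and sample URLs are illustrative):

```shell
# Sample map response in the documented shape
cat > map.json <<'EOF'
{
  "base_url": "https://docs.example.com",
  "results": [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/guides/quickstart",
    "https://docs.example.com/blog/announcement"
  ]
}
EOF

# Keep only /api/ and /guides/ URLs as candidates for a follow-up crawl
jq -r '.results[] | select(test("/(api|guides)/"))' map.json
```

With the sample above, the `/blog/` URL is dropped and the two documentation URLs remain.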
## Tips

- **Always use `chunks_per_source` for agentic workflows** - prevents context explosion when feeding results to LLMs
- **Omit `chunks_per_source` only for data collection** - when saving full pages to files
- **Start conservative** (`max_depth=1`, `limit=20`) and scale up
- **Use path patterns** to focus on relevant sections
- **Use Map first** to understand site structure before a full crawl
- **Always set a `limit`** to prevent runaway crawls