MattMagg
diff --git a/‎.agent/rules/specify-rules.md‎
Lines changed: 29 additions & 0 deletions b/‎.agent/rules/specify-rules.md‎
Lines changed: 29 additions & 0 deletions
diff --git a/‎.agent/skills/crawl/SKILL.md‎
Lines changed: 242 additions & 0 deletions b/‎.agent/skills/crawl/SKILL.md‎
Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,29 @@
+# claw-dash Development Guidelines
+
+Auto-generated from all feature plans. Last updated: 2026-02-22
+
+## Active Technologies
+
+- TypeScript / Node.js + Next.js (App Router), React, `lucide-react`, `shadcn/ui`, Tailwind CSS, `zod`, `trpc`, `@xyflow/react` (003-agent-orchestration-ui)
+
+## Project Structure
+
+```text
+src/
+tests/
+```
+
+## Commands
+
+npm test && npm run lint
+
+## Code Style
+
+TypeScript / Node.js: Follow standard conventions
+
+## Recent Changes
+
+- 003-agent-orchestration-ui: Added TypeScript / Node.js + Next.js (App Router), React, `lucide-react`, `shadcn/ui`, Tailwind CSS, `zod`, `trpc`, `@xyflow/react`
+
+<!-- MANUAL ADDITIONS START -->
+<!-- MANUAL ADDITIONS END -->
@@ -0,0 +1,242 @@
+---
+name: crawl
+description: "Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL."
+---
+
+# Crawl Skill
+
+Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.
+
+## Authentication
+
+The script uses OAuth via the Tavily MCP server. **No manual setup required** - on first run, it will:
+1. Check for existing tokens in `~/.mcp-auth/`
+2. If none found, automatically open your browser for OAuth authentication
+
+> **Note:** You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. [Sign up at tavily.com](https://tavily.com) first if you don't have an account.
+
+### Alternative: API Key
+
+If you prefer using an API key, get one at https://tavily.com and add to `~/.claude/settings.json`:
+```json
+{
+  "env": {
+    "TAVILY_API_KEY": "tvly-your-api-key-here"
+  }
+}
+```
+
+## Quick Start
+
+### Using the Script
+
+```bash
+./scripts/crawl.sh '<json>' [output_dir]
+```
+
+**Examples:**
+```bash
+# Basic crawl
+./scripts/crawl.sh '{"url": "https://docs.example.com"}'
+
+# Deeper crawl with limits
+./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2, "limit": 50}'
+
+# Save to files
+./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2}' ./docs
+
+# Focused crawl with path filters
+./scripts/crawl.sh '{"url": "https://example.com", "max_depth": 2, "select_paths": ["/docs/.*", "/api/.*"], "exclude_paths": ["/blog/.*"]}'
+
+# With semantic instructions (for agentic use)
+./scripts/crawl.sh '{"url": "https://docs.example.com", "instructions": "Find API documentation", "chunks_per_source": 3}'
+```
+
+When `output_dir` is provided, each crawled page is saved as a separate markdown file.
+
+### Basic Crawl
+
+```bash
+curl --request POST \
+  --url https://api.tavily.com/crawl \
+  --header "Authorization: Bearer $TAVILY_API_KEY" \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "url": "https://docs.example.com",
+    "max_depth": 1,
+    "limit": 20
+  }'
+```
+
+### Focused Crawl with Instructions
+
+```bash
+curl --request POST \
+  --url https://api.tavily.com/crawl \
+  --header "Authorization: Bearer $TAVILY_API_KEY" \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "url": "https://docs.example.com",
+    "max_depth": 2,
+    "instructions": "Find API documentation and code examples",
+    "chunks_per_source": 3,
+    "select_paths": ["/docs/.*", "/api/.*"]
+  }'
+```
+
+## API Reference
+
+### Endpoint
+
+```
+POST https://api.tavily.com/crawl
+```
+
+### Headers
+
+| Header | Value |
+|--------|-------|
+| `Authorization` | `Bearer <TAVILY_API_KEY>` |
+| `Content-Type` | `application/json` |
+
+### Request Body
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `url` | string | Required | Root URL to begin crawling |
+| `max_depth` | integer | 1 | Levels deep to crawl (1-5) |
+| `max_breadth` | integer | 20 | Links per page |
+| `limit` | integer | 50 | Total pages cap |
+| `instructions` | string | null | Natural language guidance for focus |
+| `chunks_per_source` | integer | 3 | Chunks per page (1-5, requires instructions) |
+| `extract_depth` | string | `"basic"` | `basic` or `advanced` |
+| `format` | string | `"markdown"` | `markdown` or `text` |
+| `select_paths` | array | null | Regex patterns to include |
+| `exclude_paths` | array | null | Regex patterns to exclude |
+| `allow_external` | boolean | true | Include external domain links |
+| `timeout` | float | 150 | Max wait (10-150 seconds) |
+
+### Response Format
+
+```json
+{
+  "base_url": "https://docs.example.com",
+  "results": [
+    {
+      "url": "https://docs.example.com/page",
+      "raw_content": "# Page Title\n\nContent..."
+    }
+  ],
+  "response_time": 12.5
+}
+```
+
+## Depth vs Performance
+
+| Depth | Typical Pages | Time |
+|-------|---------------|------|
+| 1 | 10-50 | Seconds |
+| 2 | 50-500 | Minutes |
+| 3 | 500-5000 | Many minutes |
+
+**Start with `max_depth=1`** and increase only if needed.
+
+## Crawl for Context vs Data Collection
+
+**For agentic use (feeding results into context):** Always use `instructions` + `chunks_per_source`. This returns only relevant chunks instead of full pages, preventing context window explosion.
+
+**For data collection (saving to files):** Omit `chunks_per_source` to get full page content.
+
+## Examples
+
+### For Context: Agentic Research (Recommended)
+
+Use when feeding crawl results into an LLM context:
+
+```bash
+curl --request POST \
+  --url https://api.tavily.com/crawl \
+  --header "Authorization: Bearer $TAVILY_API_KEY" \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "url": "https://docs.example.com",
+    "max_depth": 2,
+    "instructions": "Find API documentation and authentication guides",
+    "chunks_per_source": 3
+  }'
+```
+
+Returns only the most relevant chunks (max 500 chars each) per page - fits in context without overwhelming it.
+
+### For Context: Targeted Technical Docs
+
+```bash
+curl --request POST \
+  --url https://api.tavily.com/crawl \
+  --header "Authorization: Bearer $TAVILY_API_KEY" \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "url": "https://example.com",
+    "max_depth": 2,
+    "instructions": "Find all documentation about authentication and security",
+    "chunks_per_source": 3,
+    "select_paths": ["/docs/.*", "/api/.*"]
+  }'
+```
+
+### For Data Collection: Full Page Archive
+
+Use when saving content to files for later processing:
+
+```bash
+curl --request POST \
+  --url https://api.tavily.com/crawl \
+  --header "Authorization: Bearer $TAVILY_API_KEY" \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "url": "https://example.com/blog",
+    "max_depth": 2,
+    "max_breadth": 50,
+    "select_paths": ["/blog/.*"],
+    "exclude_paths": ["/blog/tag/.*", "/blog/category/.*"]
+  }'
+```
+
+Returns full page content - use the script with `output_dir` to save as markdown files.
+
+## Map API (URL Discovery)
+
+Use `map` instead of `crawl` when you only need URLs, not content:
+
+```bash
+curl --request POST \
+  --url https://api.tavily.com/map \
+  --header "Authorization: Bearer $TAVILY_API_KEY" \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "url": "https://docs.example.com",
+    "max_depth": 2,
+    "instructions": "Find all API docs and guides"
+  }'
+```
+
+Returns URLs only (faster than crawl):
+
+```json
+{
+  "base_url": "https://docs.example.com",
+  "results": [
+    "https://docs.example.com/api/auth",
+    "https://docs.example.com/guides/quickstart"
+  ]
+}
+```
+
+## Tips
+
+- **Always use `chunks_per_source` for agentic workflows** - prevents context explosion when feeding results to LLMs
+- **Omit `chunks_per_source` only for data collection** - when saving full pages to files
+- **Start conservative** (`max_depth=1`, `limit=20`) and scale up
+- **Use path patterns** to focus on relevant sections
+- **Use Map first** to understand site structure before full crawl
+- **Always set a `limit`** to prevent runaway crawls