Build a Chrome extension that opens a bottom command bar with a keyboard shortcut, accepts a natural-language instruction, and executes it on the current website through fast, human-like browser actions.
This is a hackathon MVP optimized for a live demo in 12 hours.
Primary demo scenarios:
- Hacker News: “Summarize the top 5 Hacker News articles.”
- Gmail: “Give me a summary of my last 5 unread emails.”
Priority order:
- Human-like but extremely fast navigation (most important for demo wow factor)
- Agent speed
- Reliable output
- Keyboard shortcut (default):
Cmd/Ctrl + Shift + Space - Opens a bottom floating command bar (Spotlight-style).
- User types prompt in command bar.
- Extension shows immediate state:
Planning...->Executing...->Summarizing.... - Agent performs visible browser actions (click/open/back/scroll/read).
- Final answer appears in the command bar panel with:
- concise summary
- sources visited (titles/links)
- optional “show steps” toggle
- Fast enough to feel magical.
- Human-like motion so actions feel understandable and trustworthy.
- Bottom command bar UI in content script.
- Keyboard shortcut to toggle UI.
- Task execution loop for two domains:
news.ycombinator.commail.google.com
- Claude-based planner + step generator.
- Deterministic action executor (click, type, open, back, extract text).
- Final summarization response.
- Basic guardrails + timeout + retry.
- General autonomous browsing across all websites.
- Multi-tab orchestration beyond what is needed for demo.
- Long-term memory, user profiles, auth vaults.
- Complex computer vision interaction.
-
Content Script
- Renders bottom command bar UI.
- Captures page state and DOM candidates.
- Runs action executor directly in page context.
-
Background Service Worker (MV3)
- Orchestrates workflow state.
- Calls Claude API.
- Handles message passing with content script.
-
Claude Agent Layer
- Produces strict JSON plans/actions.
- Uses a small action vocabulary for reliability.
-
Summarizer
- Claude pass over collected text chunks.
- Returns concise final output.
- Keeps execution local in content script for speed.
- Keeps model calls centralized for simpler control.
- Restricts model output format for robust execution.
Use a constrained action schema instead of free-form instructions.
LIST_ITEMS(find candidate list rows with selector hints)CLICKOPEN_IN_SAME_TABBACKWAIT_FOREXTRACT_TEXT(main article/email body)SCROLLDONE
- Claude outputs valid JSON only.
- Executor validates each step before running.
- On failure:
- quick fallback selector attempt
- one retry
- if still failing, return partial result with explicit warning
- Use “micro-animated cursor” overlay:
- very short curved movement to click target
- 80–180ms move + 40ms click pulse
- Real action still executes through direct DOM events for speed.
- Tiny randomized delays (20–80ms) to avoid robotic feel while staying fast.
Prompt example: “Summarize the top 5 hackernews articles.”
Plan:
- Ensure on
news.ycombinator.com(navigate if needed). - Get top 5 story links from
.athingrows. - For each story:
- click/open
- wait for readable content container
- extract article text (first N chars/tokens)
- go back
- Summarize all 5 with bullet points + one-line takeaways.
Fallback:
- If article extraction is weak, use
document.body.innerTexttruncation.
Prompt example: “Give me a summary of my last 5 unread emails.”
Plan:
- Ensure on
mail.google.com. - Filter/select unread threads from inbox list (
is:unreadbehavior via UI or unread row selectors). - For each of first 5 unread:
- open thread
- extract sender + subject + visible body text
- go back to inbox
- Return concise email-by-email summary + suggested priorities.
Gmail reliability notes:
- Prefer robust selectors based on roles/aria-labels over fragile class names.
- Keep extraction tolerant to Gmail DOM changes by using multiple fallback selectors.
- Use Claude for:
- planning next actions
- final summarization
System instruction should enforce:
- output strict JSON with allowed actions only
- max steps per iteration (e.g., 3–5)
- never invent data
- if unsure, request additional extraction step
- Send task + compact page context + current progress.
- Receive next action block.
- Execute actions.
- Feed execution results back.
- Repeat until
DONE. - Run final summarize pass.
- Keep context compact:
- page title
- URL
- top candidate elements
- extracted text snippets only
- Hard limits:
- max pages (5 for demo)
- max extraction chars per page
- max loops + hard timeout
- Extension: Chrome Extension Manifest V3
- Language: TypeScript
- UI: Lightweight HTML/CSS + minimal animation (no heavy framework needed)
- LLM: Claude API (Anthropic)
- Build: Vite (or simple esbuild) for fast iteration
Recommended file layout:
src/background.tssrc/content/ui.tssrc/content/executor.tssrc/content/extractors/hackernews.tssrc/content/extractors/gmail.tssrc/agent/claude.tssrc/agent/schema.tssrc/shared/types.ts
- Set up MV3 extension with background + content script.
- Add keyboard shortcut and bottom bar shell.
- Basic message passing.
- Implement action schema + validator.
- Implement click/open/back/wait/extract primitives.
- Add visible cursor animation layer.
- Implement HN selectors/extraction.
- End-to-end run for top 5 summaries.
- Handle failures + retries.
- Implement unread thread navigation and extraction.
- Add selector fallbacks.
- Stabilize back-navigation and thread indexing.
- Integrate Claude calls.
- Build iterative plan/execute loop.
- Add strict JSON parsing and recovery.
- Reduce waits/timeouts.
- Parallelize where safe (pre-fetch lightweight metadata only).
- Cache prompt scaffolding.
- Add hard limits, timeout messaging, partial-success response.
- Test both demo scenarios repeatedly.
- Refine UI copy/status transitions.
- Add “show steps” panel.
- Prepare fixed demo script and backup prompts.
- Open Hacker News front page.
- Press shortcut, type: “Summarize the top 5 hackernews articles.”
- Let agent visibly click/open/read/back fast.
- Show final summary with 5 bullets.
- Open Gmail inbox.
- Press shortcut, type: “Give me a summary of my last 5 unread emails.”
- Agent opens threads one by one, extracts context, returns to inbox.
- Show concise summary + priority suggestions.
Backup prompt options:
- “Summarize top 3 instead of 5.”
- “Give me only urgent unread emails summary.”
- Gmail DOM instability
- Mitigation: multi-selector fallback, role/aria-first strategy, partial result mode.
- LLM outputs invalid JSON
- Mitigation: schema validation + repair pass + retry with strict correction prompt.
- Slow run time
- Mitigation: hard cap on extracted text, short waits, limited loops, optimized selector queries.
- Navigation errors during live demo
- Mitigation: pre-demo smoke test checklist + fallback prompts (
top 3).
Must-have to win the demo:
- Command bar opens instantly with shortcut.
- Both HN and Gmail tasks complete end-to-end in a single run.
- Agent movement looks human but feels very fast.
- Final output is clear, structured, and grounded in extracted content.
Nice-to-have:
- Step replay panel.
- “Why this summary?” trace with snippet references.
- Optimize for a stable, impressive demo, not for full generalization.
- Prefer deterministic selectors + constrained agent actions over open-ended autonomy.
- If a tradeoff appears, choose what improves:
- perceived speed
- visible competence
- reliability on the two scripted scenarios