- Product Name: Stun
- Category: UI Navigator – Multimodal Spatial AI Agent
- Hackathon: Gemini Live Agent Challenge
Stun is an infinite multimodal canvas where AI visually understands, reorganizes, and navigates knowledge directly within the workspace. Instead of responding inside chat boxes, AI executes structured UI actions such as moving, connecting, highlighting, and transforming content spatially.
Modern knowledge workflows are fragmented across:
- Tabs
- Documents
- Whiteboards
- Videos
- Notes
- Chat-based AI systems
Current AI systems operate in isolated text interfaces. Whiteboards are static. Automation tools lack contextual awareness.
There is no unified system where:
- AI visually interprets the workspace
- Knowledge is spatially organized
- Interaction is multimodal
- AI directly executes UI-level actions
Stun transforms AI from a conversational assistant into a spatial UI navigator.
AI does not reply with text blocks. AI reshapes the workspace itself.
Every insight becomes a node. Every relationship becomes a connection. Every command results in visible transformation.
- Infinite canvas (pan & zoom)
- Text and image nodes
- Node connections
- Voice command input (press-to-speak)
- Screenshot-based multimodal reasoning
- Structured JSON action execution
- Firebase Authentication
- Firestore board persistence
- Cloud Run backend deployment
- Vertex AI integration
- Mind-map transformation mode
- Roadmap transformation layout
- Highlight animations
- AI-generated summary nodes
- Media upload support

Out of scope for the initial build:
- Real-time collaboration
- Mobile-first optimization
- OS-level automation
- Advanced semantic embedding clustering
- Plugin marketplace
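For the roadmap transformation layout, one option is to express the result with the same `move` action the structured action plan already uses. In the design described here Gemini returns the actions itself, so a deterministic client-side layout like this is an assumption, as are the names `CanvasNode`, `roadmapLayout`, `ROADMAP_ORIGIN`, and `STEP_X`. The sketch orders nodes by their current x position and places them left to right on one row.

```typescript
// Hypothetical minimal node shape; the real store types are not specified.
interface CanvasNode {
  id: string;
  position: { x: number; y: number };
}

// Same shape as the `move` action in the documented action plan.
interface MoveAction {
  type: "move";
  nodeId: string;
  to: { x: number; y: number };
}

// Assumed layout constants for the sketch.
const ROADMAP_ORIGIN = { x: 0, y: 200 };
const STEP_X = 300;

// Orders nodes by current x (ties broken by y) and emits one move action
// per node, placing them left to right on a single row.
function roadmapLayout(nodes: CanvasNode[]): MoveAction[] {
  const ordered = [...nodes].sort(
    (a, b) => a.position.x - b.position.x || a.position.y - b.position.y,
  );
  return ordered.map((node, index): MoveAction => ({
    type: "move",
    nodeId: node.id,
    to: { x: ROADMAP_ORIGIN.x + index * STEP_X, y: ROADMAP_ORIGIN.y },
  }));
}

const plan = roadmapLayout([
  { id: "node-2", position: { x: 500, y: 40 } },
  { id: "node-1", position: { x: 120, y: 300 } },
]);
console.log(plan.map((a) => `${a.nodeId}@${a.to.x}`).join(", ")); // node-1@0, node-2@300
```

Emitting actions instead of mutating positions directly means a layout produced this way could pass through the same validation and executor path as a Gemini-generated plan.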

Frontend:

| Category | Technology |
|---|---|
| Framework | Next.js (App Router) |
| Language | TypeScript |
| Canvas Engine | React Flow |
| State Management | Zustand |
| Voice Input | Web Speech API |
| Screenshot Capture | html2canvas |
| Styling | SCSS |
| HTTP Client | Axios |
| Auth Client | Firebase SDK |

Backend:

| Category | Technology |
|---|---|
| Runtime | Node.js |
| Framework | Express |
| AI SDK | Google GenAI SDK |
| AI Platform | Vertex AI |
| Validation | Zod |
| Deployment | Cloud Run |
| Containerization | Docker |
| Logging | Cloud Logging |

Data and infrastructure:

| Component | Technology |
|---|---|
| Database | Firestore |
| Media Storage | Cloud Storage |
| Authentication | Firebase Authentication |
| Access Control | Firestore Security Rules |

```
User
  ↓
Next.js Frontend (React Flow Canvas)
  ↓  Authenticated HTTPS + JWT
Cloud Run Backend (Express)
  ↓
Vertex AI (Gemini Multimodal Model)
  ↓
Firestore / Cloud Storage
  ↓
Frontend Executes Structured Action Plan
```

- User issues voice command.
- Frontend captures canvas screenshot.
- Node metadata collected from state.
- Request sent to backend with JWT.
- Backend verifies authentication.
- Backend calls Gemini (multimodal input).
- Gemini returns structured JSON action plan.
- Backend validates action safety.
- Frontend executes spatial transformations.
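The last step, where the frontend executes the plan, could be written as a pure function over board state, which keeps `action-executor.ts` easy to unit-test. This is a sketch under assumptions: `BoardState`, `applyAction`, and `executePlan` are hypothetical names, and only the `move` payload comes from the documented example; the fields for `connect`, `highlight`, `zoom`, and `group` are illustrative. React Flow and Zustand wiring is omitted.

```typescript
type Point = { x: number; y: number };

// Hypothetical action shapes: only `move` follows the documented example;
// the payload fields of the other four types are assumptions.
type Action =
  | { type: "move"; nodeId: string; to: Point }
  | { type: "connect"; source: string; target: string }
  | { type: "highlight"; nodeId: string }
  | { type: "zoom"; zoom: number; center?: Point }
  | { type: "group"; nodeIds: string[] };

// Hypothetical board state; the real store shape is not specified.
interface BoardState {
  nodes: Record<string, { id: string; position: Point }>;
  edges: { id: string; source: string; target: string }[];
  highlighted: string[];
  groups: string[][];
  viewport: { zoom: number; center: Point };
}

// Applies a single action immutably and returns the next state.
function applyAction(state: BoardState, action: Action): BoardState {
  switch (action.type) {
    case "move": {
      const node = state.nodes[action.nodeId];
      if (!node) return state; // unknown node: leave state unchanged
      const moved = { ...node, position: action.to };
      return { ...state, nodes: { ...state.nodes, [node.id]: moved } };
    }
    case "connect": {
      const edge = {
        id: `${action.source}->${action.target}`,
        source: action.source,
        target: action.target,
      };
      return { ...state, edges: [...state.edges, edge] };
    }
    case "highlight":
      if (state.highlighted.includes(action.nodeId)) return state;
      return { ...state, highlighted: [...state.highlighted, action.nodeId] };
    case "zoom":
      return {
        ...state,
        viewport: { zoom: action.zoom, center: action.center ?? state.viewport.center },
      };
    case "group":
      return { ...state, groups: [...state.groups, [...action.nodeIds]] };
    default:
      return state;
  }
}

// Executes the whole plan in order; each action sees the previous result.
function executePlan(state: BoardState, actions: Action[]): BoardState {
  return actions.reduce(applyAction, state);
}
```

Because the original state is never mutated, the result can be compared against the previous board or persisted as a whole without extra copying.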

```
Stun/
│
├── web/                              # Next.js Frontend
│   ├── app/
│   │   └── board/[id]/page.tsx
│   │
│   ├── components/
│   │   ├── canvas/
│   │   │   ├── CanvasRoot.tsx
│   │   │   ├── NodeRenderer.tsx
│   │   │   ├── EdgeRenderer.tsx
│   │   │   └── CameraController.tsx
│   │   │
│   │   ├── nodes/
│   │   │   ├── TextNode.tsx
│   │   │   └── ImageNode.tsx
│   │   │
│   │   ├── voice/
│   │   │   └── VoiceOrb.tsx
│   │   │
│   │   └── layout/
│   │       ├── TopBar.tsx
│   │       └── SidePanel.tsx
│   │
│   ├── hooks/
│   │   ├── useBoard.ts
│   │   ├── useVoice.ts
│   │   └── useScreenshot.ts
│   │
│   ├── store/
│   │   └── board.store.ts
│   │
│   └── lib/
│       ├── api.ts
│       ├── firebase.ts
│       └── action-executor.ts
│
├── backend/
│   ├── src/
│   │   ├── index.ts
│   │   │
│   │   ├── routes/
│   │   │   ├── ai.route.ts
│   │   │   ├── board.route.ts
│   │   │   ├── auth.route.ts
│   │   │   └── health.route.ts
│   │   │
│   │   ├── services/
│   │   │   ├── gemini.service.ts
│   │   │   └── board.service.ts
│   │   │
│   │   ├── prompts/
│   │   │   └── planner.prompt.ts
│   │   │
│   │   ├── middleware/
│   │   │   └── auth.middleware.ts
│   │   │
│   │   ├── validators/
│   │   │   └── action.validator.ts
│   │   │
│   │   └── config/
│   │       ├── vertex.ts
│   │       └── firestore.ts
│   │
│   ├── Dockerfile
│   └── package.json
│
├── infra/
│   ├── deploy.sh
│   └── cloud-run.yaml
│
└── PRD.md
```

- Returns authenticated user information. Auth: Required.
- Returns service health status. Auth: Not Required.
- Creates a new board. Auth: Required. Service: Firestore.
- Retrieves board data. Auth: Required. Security: Owner validation.
- Updates board state. Auth: Required. Service: Firestore.
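Owner validation on the board endpoints could reduce to two small checks once the Firebase ID token has been verified (for example with the Firebase Admin SDK's `verifyIdToken`, which resolves to a decoded token carrying the caller's `uid`): extract the bearer token from the `Authorization` header, then compare the verified `uid` with the board's owner. The sketch assumes a hypothetical `ownerId` field on the board document and hypothetical helper names, and leaves out Express and Firebase Admin so it runs standalone.

```typescript
// Hypothetical board document shape; only `ownerId` is used here.
interface BoardDoc {
  id: string;
  ownerId: string;
}

type HttpError = Error & { status: number };

function httpError(status: number, message: string): HttpError {
  const err = new Error(message) as HttpError;
  err.status = status;
  return err;
}

// Extracts the token from an "Authorization: Bearer <token>" header value.
// Returns null if the header is missing or uses another scheme.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}

// Runs after the token is verified: 404 if the board is missing,
// 403 if the verified uid does not own it.
function assertBoardOwner(board: BoardDoc | undefined, uid: string): BoardDoc {
  if (!board) throw httpError(404, "Board not found");
  if (board.ownerId !== uid) throw httpError(403, "Not the board owner");
  return board;
}
```

Keeping these checks free of framework types lets the same functions back both the Express middleware and unit tests.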

Generates a structured spatial action plan. Auth: Required. External Service: Vertex AI.

Example request body:

```json
{
  "boardId": "board123",
  "command": "Turn this into a roadmap",
  "screenshot": "base64",
  "nodes": []
}
```

Example response:

```json
{
  "actions": [
    {
      "type": "move",
      "nodeId": "node-1",
      "to": { "x": 400, "y": 200 }
    }
  ]
}
```

Allowed action types:
- move
- connect
- highlight
- zoom
- group
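The backend lists Zod for validation, where this would typically be a `z.discriminatedUnion("type", [...])` schema. The dependency-free sketch below shows the same whitelist idea by hand so it runs standalone: it parses the model's raw JSON output and rejects any plan containing an action type outside the five allowed values. `parseActionPlan` and `PlannedAction` are hypothetical names, and only the `move` payload is checked because it is the only one the example response defines.

```typescript
// The five action types the spec allows; anything else is rejected.
const ALLOWED_ACTION_TYPES = ["move", "connect", "highlight", "zoom", "group"] as const;
type ActionType = (typeof ALLOWED_ACTION_TYPES)[number];

interface PlannedAction {
  type: ActionType;
  [field: string]: unknown;
}

type ParseResult =
  | { ok: true; actions: PlannedAction[] }
  | { ok: false; error: string };

function isAllowedType(value: unknown): value is ActionType {
  return typeof value === "string" && (ALLOWED_ACTION_TYPES as readonly string[]).includes(value);
}

function isPoint(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const p = value as Record<string, unknown>;
  return typeof p.x === "number" && Number.isFinite(p.x) && typeof p.y === "number" && Number.isFinite(p.y);
}

// Parses untrusted model output into a whitelisted list of actions.
function parseActionPlan(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "model output is not valid JSON" };
  }
  const actions = (data as { actions?: unknown } | null)?.actions;
  if (!Array.isArray(actions)) {
    return { ok: false, error: "missing 'actions' array" };
  }
  for (let i = 0; i < actions.length; i++) {
    const a: unknown = actions[i];
    if (typeof a !== "object" || a === null) {
      return { ok: false, error: `action ${i} is not an object` };
    }
    const fields = a as Record<string, unknown>;
    if (!isAllowedType(fields.type)) {
      return { ok: false, error: `action ${i} has disallowed type "${String(fields.type)}"` };
    }
    // `move` is the only payload the spec defines: a node id plus a numeric target.
    if (fields.type === "move" && (typeof fields.nodeId !== "string" || !isPoint(fields.to))) {
      return { ok: false, error: `action ${i} is a malformed move` };
    }
  }
  return { ok: true, actions: actions as PlannedAction[] };
}
```

Rejecting the whole plan on the first bad action, rather than silently dropping it, avoids leaving the canvas half-transformed.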

Security:
- Firebase Authentication (JWT)
- JWT verification middleware
- Firestore ownership validation
- Strict action type whitelist
- Node existence validation
- Coordinate boundary checks
- Service account isolation for Vertex AI
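The node existence validation and coordinate boundary checks listed above could run on the backend after the whitelist check, against the node list sent with the request. The bounds of ±100000 canvas units and the names `checkPlanAgainstBoard`, `referencedNodeIds`, and `targetPoints` are assumptions (the document gives no concrete limits), as are the payload fields for every action type except `move`.

```typescript
type Point = { x: number; y: number };

// Hypothetical action shapes; only `move` follows the documented example.
type Action =
  | { type: "move"; nodeId: string; to: Point }
  | { type: "connect"; source: string; target: string }
  | { type: "highlight"; nodeId: string }
  | { type: "zoom"; zoom: number; center?: Point }
  | { type: "group"; nodeIds: string[] };

// Assumed canvas limits in canvas units; the document gives no numbers.
const BOUNDS = { min: -100000, max: 100000 };

function inBounds(p: Point): boolean {
  return [p.x, p.y].every((v) => Number.isFinite(v) && v >= BOUNDS.min && v <= BOUNDS.max);
}

// Node ids an action refers to (zoom refers to none).
function referencedNodeIds(action: Action): string[] {
  switch (action.type) {
    case "move":
    case "highlight":
      return [action.nodeId];
    case "connect":
      return [action.source, action.target];
    case "group":
      return action.nodeIds;
    default:
      return [];
  }
}

// Coordinates an action would place on the canvas.
function targetPoints(action: Action): Point[] {
  if (action.type === "move") return [action.to];
  if (action.type === "zoom" && action.center) return [action.center];
  return [];
}

// Returns every problem found; an empty array means the plan passed.
function checkPlanAgainstBoard(actions: Action[], existingNodeIds: string[]): string[] {
  const known = new Set(existingNodeIds);
  const problems: string[] = [];
  actions.forEach((action, i) => {
    for (const id of referencedNodeIds(action)) {
      if (!known.has(id)) problems.push(`action ${i}: unknown node "${id}"`);
    }
    for (const p of targetPoints(action)) {
      if (!inBounds(p)) problems.push(`action ${i}: coordinates out of bounds`);
    }
  });
  return problems;
}
```

Collecting all problems instead of stopping at the first one makes the rejection easier to log and debug when the model produces a bad plan.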

Success criteria:
- AI response under 5 seconds
- Safe JSON validation before execution
- Cloud-native deployment
- Secure middleware enforcement
- Stable real-time demo performance

Google Cloud services:
- Cloud Run (Backend Hosting)
- Vertex AI (Gemini Multimodal Model)
- Firestore (Persistence)
- Cloud Storage (Optional Media)
- IAM (Service Accounts)

Hackathon requirements checklist:

- ✔ Uses Gemini Model
- ✔ Uses Google GenAI SDK
- ✔ Uses Vertex AI
- ✔ Hosted on Cloud Run
- ✔ Uses at least one Google Cloud service
- ✔ Multimodal screenshot interpretation
- ✔ Structured executable UI output
- ✔ Deployment proof provided

Future roadmap:
- Real-time collaboration
- Semantic clustering
- Continuous live observation mode
- Plugin architecture
- Education-focused templates
- Accessibility enhancements
Stun redefines human–AI interaction by transforming AI from a text-based assistant into a spatial navigator that reshapes knowledge directly inside an infinite canvas.
AI does not respond.
AI navigates.