```
        .-~~~-.
       /       \
      | GESTURE |
      |  CODE   |
       \       /
        `-----'
          ||
          ||    🤖 AI Assistant
          ||
 👋 Wave • Pinch • Point 👉
  Create • Scale • Select
```
A revolutionary AI-powered development environment where you build 3D visualizations and code through natural hand gestures and voice commands. Using real-time computer vision, MediaPipe hand tracking, and Gemini AI, this tool lets you create code by moving your hands in space.
Features:

- Gesture-Based Coding: Move your hands to create 3D objects and visualizations instantly
- Voice Collaboration: Talk to your AI sidekick for help and complex operations
- Real-Time Feedback: See your hand movements tracked live on camera
- Sandbox Execution: Code runs safely in an isolated environment
- Permission-Based Building: AI asks before completing complex tasks
- Real-time hand detection with 21 landmark tracking per hand
- Visual feedback showing hand positions and connections
- Gesture interpretation for code generation (see the pinch-detection sketch after this list)
- Support for both hands simultaneously
- Gemini Realtime for voice interaction and vision analysis
- Code execution in sandboxed Python environment
- 3D visualization with matplotlib
- Collaborative building with explicit user permission
- Code Builder: Main hand-controlled coding interface
- Posture Analyzer: Ergonomic monitoring for desk work
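For a sense of how gesture interpretation works, here is a minimal pinch-detection sketch using MediaPipe's 21-landmark hand model (landmark 4 is the thumb tip, landmark 8 the index fingertip). It illustrates the idea rather than reproducing this project's actual logic:

```python
# Minimal pinch-detection sketch, not the project's actual logic: MediaPipe
# reports 21 normalized landmarks per hand; landmark 4 is the thumb tip and
# landmark 8 the index fingertip, so a small distance between them is a pinch.
import math

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

def is_pinching(hand_landmarks, threshold=0.05):
    thumb, index = hand_landmarks.landmark[4], hand_landmarks.landmark[8]
    return math.dist((thumb.x, thumb.y), (index.x, index.y)) < threshold

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            # Draw the landmark/connection overlay for visual feedback
            mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
            if is_pinching(hand):
                print("pinch detected")  # e.g. trigger a CREATE action
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
```

The same distance test extends naturally to two hands for the create/scale gestures described below.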
Prerequisites:

- Python 3.11+ with the uv package manager
- API keys:
  - Gemini API Key - for AI vision and voice
  - Stream API Credentials - for video infrastructure
- Clone and install dependencies:

```bash
git clone <repository-url>
cd gesturecode
uv sync
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your API keys
```

- Run any example:

```bash
# Main gesture-controlled code builder
uv run python -m gesturecode.hand_code_builder

# Lightweight focus tracker
uv run python -m gesturecode.focus_tracker

# Desk posture analyzer
uv run python -m gesturecode.desk_posture_analyzer
```

Gesture Controls:
```
    _____
   /     \
  | PINCH |    Move hands close → CREATE
   \_____/
      ||
      ||       Move apart → SCALE
      ||
    /\_/\      Point → SELECT
   ( o.o )
    > ^ <      Drag to 🗑️ → DELETE
```
- Start the application - Your browser will open with the video interface
- Grant camera/microphone permissions when prompted
- Move your hands - You'll see real-time tracking with landmarks and connections
- Gesture to build - Pinch with both hands close together to create objects
- Talk to AI - Ask "Can you help me build a sphere?" for complex operations
- Pinch both hands together → Create a new object instantly
- Move hands apart → Scale the object up or down
- Point with index finger → Select objects
- Drag to the red bin → Delete objects
- Voice commands → Complex operations and AI assistance
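A rough sketch of how those two-hand gestures could map to the actions above; `scene` and its methods are hypothetical stand-ins, not the project's real API:

```python
# Hypothetical mapping from two tracked hands to CREATE/SCALE; `scene` and
# its methods are illustrative stand-ins, not the project's real API.
import math

PINCH_CREATE_DIST = 0.12  # normalized distance that counts as "hands together"

def hand_center(hand_landmarks):
    """Average position of one hand's 21 normalized landmarks."""
    pts = hand_landmarks.landmark
    return (sum(lm.x for lm in pts) / len(pts),
            sum(lm.y for lm in pts) / len(pts))

def interpret_two_hands(left, right, scene):
    spread = math.dist(hand_center(left), hand_center(right))
    if spread < PINCH_CREATE_DIST:
        scene.create_object()         # hands pinched together -> CREATE
    elif scene.has_selection():
        scene.scale_selected(spread)  # hands moving apart -> SCALE
```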
The AI sidekick will:
- Ask permission before completing complex tasks
- Understand the context of what you're building
- Generate code based on your gestures and requests
- Execute code safely in a sandboxed environment
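As a stand-in illustration of that permission flow (the real project routes execution through the CodeBox API, whose interface is not shown here), a minimal sketch that gates AI-generated code behind user confirmation and a time-limited subprocess:

```python
# Stand-in illustration of the permission flow: confirm with the user before
# running AI-generated code, then execute it in a separate interpreter with a
# timeout. The real project uses the CodeBox API for sandboxing instead, and
# a subprocess with a timeout is not a true sandbox.
import subprocess
import sys

def run_with_permission(generated_code: str, task: str) -> str:
    answer = input(f"AI wants to run code to '{task}'. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        return "Declined by user."
    result = subprocess.run(
        [sys.executable, "-c", generated_code],
        capture_output=True, text=True, timeout=10,  # keep runaway code bounded
    )
    return result.stdout or result.stderr

print(run_with_permission("print(2 + 2)", "demonstrate the sandbox"))
```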
Architecture:

```
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Hand Tracking  │     │  AI Processing   │     │  Code Execution  │
│    MediaPipe    │────▶│ Gemini Realtime  │────▶│     CodeBox      │
│ (21 landmarks)  │     │     + Vision     │     │     Sandbox      │
└─────────────────┘     └──────────────────┘     └──────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Video Overlay  │     │  Voice Commands  │     │ 3D Visualization │
│   Stream Edge   │     │ Permission Flow  │     │    Matplotlib    │
└─────────────────┘     └──────────────────┘     └──────────────────┘
```
- Vision Agents: Real-time AI framework with video processing
- MediaPipe Hands: 21-landmark hand tracking at 30 FPS
- Gemini Realtime: Multimodal AI with vision and voice
- Stream Edge: Low-latency video infrastructure
- CodeBox API: Sandboxed Python code execution
- Matplotlib 3D: Real-time visualization rendering
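For example, here is a minimal sketch of the kind of matplotlib 3D output the builder produces when you ask for a sphere:

```python
# Minimal example of the kind of matplotlib 3D output the builder generates,
# e.g. in response to "Can you help me build a sphere?".
import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(0, 2 * np.pi, 60)   # longitude
v = np.linspace(0, np.pi, 30)       # latitude
x = np.outer(np.cos(u), np.sin(v))
y = np.outer(np.sin(u), np.sin(v))
z = np.outer(np.ones_like(u), np.cos(v))

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(x, y, z, color="steelblue")
ax.set_box_aspect((1, 1, 1))  # keep the sphere round
plt.show()
```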
Change processing FPS (higher = more responsive, more expensive):

```python
# In src/hand_code_builder.py
llm = gemini.Realtime(fps=5)   # Default: 5 FPS for responsive gesture tracking
# or
llm = gemini.Realtime(fps=10)  # Higher FPS for very fast interactions
```

Adjust hand tracking FPS:

```python
# In src/hand_code_builder.py
MediaPipeHandsProcessor(fps=30)  # Default: 30 FPS tracking
```

Switch to OpenAI instead of Gemini:

```python
from vision_agents.plugins import openai

llm = openai.Realtime(fps=5)
```

Edit the AI instructions in docs/ai/builder_sidekick.md to change:
- Personality and communication style
- Permission requirements and safety boundaries
- Code generation patterns and preferences
- Gesture interpretation rules
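The instructions file is plain markdown, so customizing it is just an edit. A small sketch of loading it at startup; exactly where the string is passed in depends on the agent setup code:

```python
# Sketch of loading the customized instructions at startup; where the string
# is actually handed over depends on the agent setup in src/hand_code_builder.py.
from pathlib import Path

instructions = Path("docs/ai/builder_sidekick.md").read_text()
# Pass `instructions` to the agent alongside gemini.Realtime(fps=5) and
# MediaPipeHandsProcessor(fps=30); the exact constructor parameter is not
# shown here to avoid guessing the vision_agents API.
```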
Monitor ergonomics and posture with voice feedback:
```bash
uv run python -m gesturecode.desk_posture_analyzer
```

Features:
- Real-time posture analysis with YOLO pose detection
- Ergonomic coaching and feedback
- Workstation setup recommendations
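A rough sketch of the posture idea using Ultralytics YOLO pose estimation; the keypoint indices follow the 17-point COCO convention, and the threshold is an illustrative guess, not the analyzer's real calibration:

```python
# Rough posture-check sketch (not the analyzer's exact logic) with Ultralytics
# YOLO pose estimation: flag slouching when the nose drops close to shoulder
# height, as seen from a front-facing webcam. Assumes one person is visible.
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")        # pretrained 17-keypoint COCO model
results = model("webcam_frame.jpg")    # or a numpy frame from cv2

kpts = results[0].keypoints.xy[0]      # first person: (17, 2) pixel coords
nose_y = kpts[0][1]                    # COCO keypoint 0 = nose
shoulder_y = (kpts[5][1] + kpts[6][1]) / 2  # 5/6 = left/right shoulder

# A smaller nose-to-shoulder gap suggests a hunched posture; this threshold
# is arbitrary and would need per-user calibration.
if (shoulder_y - nose_y) < 0.15 * shoulder_y:
    print("Consider sitting up straighter.")
```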
Monitor presence and work patterns with gentle productivity coaching:
```bash
uv run python -m gesturecode.focus_tracker
```

Features:
- Pose detection for presence monitoring at your desk
- Work session tracking and break pattern analysis
- Gentle productivity reminders and insights
- Supportive coaching focused on healthy work habits
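The session-tracking logic reduces to a small state machine; an illustrative sketch, assuming a `person_present` flag derived from pose detection on the camera feed:

```python
# Illustrative session-tracking sketch: accumulate continuous presence time
# and suggest a break after a fixed work interval. The real tracker derives
# `person_present` from pose detection on the camera feed.
import time

BREAK_AFTER = 50 * 60  # suggest a break after 50 minutes of presence
session_start = None

def update(person_present: bool, now: float | None = None):
    global session_start
    now = now or time.time()
    if not person_present:
        session_start = None   # leaving the desk resets the session
    elif session_start is None:
        session_start = now    # a new work session begins
    elif now - session_start > BREAK_AFTER:
        print("You've been focused for a while - time for a short break?")
        session_start = now    # restart the timer after reminding
```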
We welcome contributions! Please see our Contributing Guide for details.
This project is open source and available under the MIT License.
- Special thanks to GetStream/Vision-Agents - This project is built upon their incredible open-source framework for real-time AI vision applications
- Built with Vision Agents framework
- Hand tracking powered by MediaPipe
- AI vision from Google Gemini
- Video infrastructure by Stream
- Code execution via CodeBox API