```
        .-~~~-.
       /       \
      | GESTURE |
      |  CODE   |
       \       /
        `-----'
          ||
          ||    🤖 AI Assistant
          ||
 👋 Wave • Pinch • Point 👉
  Create • Scale • Select
```
A revolutionary AI-powered development environment where you build 3D visualizations and code through natural hand gestures and voice commands. Using real-time computer vision, MediaPipe hand tracking, and Gemini AI, this tool lets you create code by moving your hands in space.
Features:

- Gesture-Based Coding: Move your hands to create 3D objects and visualizations instantly
- Voice Collaboration: Talk to your AI sidekick for help and complex operations
- Real-Time Feedback: See your hand movements tracked live on camera
- Sandbox Execution: Code runs safely in an isolated environment
- Permission-Based Building: AI asks before completing complex tasks
- Real-time hand detection with 21 landmark tracking per hand
- Visual feedback showing hand positions and connections
- Gesture interpretation for code generation (see the pinch-detection sketch after this list)
- Support for both hands simultaneously
- Gemini Realtime for voice interaction and vision analysis
- Code execution in sandboxed Python environment
- 3D visualization with matplotlib
- Collaborative building with explicit user permission
- Code Builder: Main hand-controlled coding interface
- Posture Analyzer: Ergonomic monitoring for desk work
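For a sense of how gesture interpretation works, here is a minimal pinch-detection sketch using MediaPipe's 21-landmark hand model (landmark 4 is the thumb tip, landmark 8 the index fingertip). It illustrates the idea rather than reproducing this project's actual logic:

```python
# Minimal pinch-detection sketch, not the project's actual logic: MediaPipe
# reports 21 normalized landmarks per hand; landmark 4 is the thumb tip and
# landmark 8 the index fingertip, so a small distance between them is a pinch.
import math

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

def is_pinching(hand_landmarks, threshold=0.05):
    thumb, index = hand_landmarks.landmark[4], hand_landmarks.landmark[8]
    return math.dist((thumb.x, thumb.y), (index.x, index.y)) < threshold

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            # Draw the landmark/connection overlay for visual feedback
            mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
            if is_pinching(hand):
                print("pinch detected")  # e.g. trigger a CREATE action
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
```

The same distance test extends naturally to two hands for the create/scale gestures described below.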
Prerequisites:

- Python 3.11+ with the uv package manager
- API keys:
  - Gemini API Key - for AI vision and voice
  - Stream API Credentials - for video infrastructure
- Clone and install dependencies:

```bash
git clone <repository-url>
cd gesturecode
uv sync
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your API keys
```

- Run any example:

```bash
# Main gesture-controlled code builder
uv run python -m gesturecode.hand_code_builder

# Lightweight focus tracker
uv run python -m gesturecode.focus_tracker

# Desk posture analyzer
uv run python -m gesturecode.desk_posture_analyzer
```

Gesture Controls:
```
    _____
   /     \
  | PINCH |    Move hands close → CREATE
   \_____/
      ||
      ||       Move apart → SCALE
      ||
    /\_/\      Point → SELECT
   ( o.o )
    > ^ <      Drag to 🗑️ → DELETE
```
- Start the application - Your browser will open with the video interface
- Grant camera/microphone permissions when prompted
- Move your hands - You'll see real-time tracking with landmarks and connections
- Gesture to build - Pinch with both hands close together to create objects
- Talk to AI - Ask "Can you help me build a sphere?" for complex operations
- Pinch both hands together → Create a new object instantly
- Move hands apart → Scale the object up or down
- Point with index finger → Select objects
- Drag to the red bin → Delete objects
- Voice commands → Complex operations and AI assistance
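A rough sketch of how those two-hand gestures could map to the actions above; `scene` and its methods are hypothetical stand-ins, not the project's real API:

```python
# Hypothetical mapping from two tracked hands to CREATE/SCALE; `scene` and
# its methods are illustrative stand-ins, not the project's real API.
import math

PINCH_CREATE_DIST = 0.12  # normalized distance that counts as "hands together"

def hand_center(hand_landmarks):
    """Average position of one hand's 21 normalized landmarks."""
    pts = hand_landmarks.landmark
    return (sum(lm.x for lm in pts) / len(pts),
            sum(lm.y for lm in pts) / len(pts))

def interpret_two_hands(left, right, scene):
    spread = math.dist(hand_center(left), hand_center(right))
    if spread < PINCH_CREATE_DIST:
        scene.create_object()         # hands pinched together -> CREATE
    elif scene.has_selection():
        scene.scale_selected(spread)  # hands moving apart -> SCALE
```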
The AI sidekick will:
- Ask permission before completing complex tasks
- Understand the context of what you're building
- Generate code based on your gestures and requests
- Execute code safely in a sandboxed environment
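As a stand-in illustration of that permission flow (the real project routes execution through the CodeBox API, whose interface is not shown here), a minimal sketch that gates AI-generated code behind user confirmation and a time-limited subprocess:

```python
# Stand-in illustration of the permission flow: confirm with the user before
# running AI-generated code, then execute it in a separate interpreter with a
# timeout. The real project uses the CodeBox API for sandboxing instead, and
# a subprocess with a timeout is not a true sandbox.
import subprocess
import sys

def run_with_permission(generated_code: str, task: str) -> str:
    answer = input(f"AI wants to run code to '{task}'. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        return "Declined by user."
    result = subprocess.run(
        [sys.executable, "-c", generated_code],
        capture_output=True, text=True, timeout=10,  # keep runaway code bounded
    )
    return result.stdout or result.stderr

print(run_with_permission("print(2 + 2)", "demonstrate the sandbox"))
```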
Architecture:

```
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Hand Tracking  │     │  AI Processing   │     │  Code Execution  │
│    MediaPipe    │────▶│ Gemini Realtime  │────▶│     CodeBox      │
│ (21 landmarks)  │     │     + Vision     │     │     Sandbox      │
└─────────────────┘     └──────────────────┘     └──────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Video Overlay  │     │  Voice Commands  │     │ 3D Visualization │
│   Stream Edge   │     │ Permission Flow  │     │    Matplotlib    │
└─────────────────┘     └──────────────────┘     └──────────────────┘
```
- Vision Agents: Real-time AI framework with video processing
- MediaPipe Hands: 21-landmark hand tracking at 30 FPS
- Gemini Realtime: Multimodal AI with vision and voice
- Stream Edge: Low-latency video infrastructure
- CodeBox API: Sandboxed Python code execution
- Matplotlib 3D: Real-time visualization rendering
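For example, here is a minimal sketch of the kind of matplotlib 3D output the builder produces when you ask for a sphere:

```python
# Minimal example of the kind of matplotlib 3D output the builder generates,
# e.g. in response to "Can you help me build a sphere?".
import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(0, 2 * np.pi, 60)   # longitude
v = np.linspace(0, np.pi, 30)       # latitude
x = np.outer(np.cos(u), np.sin(v))
y = np.outer(np.sin(u), np.sin(v))
z = np.outer(np.ones_like(u), np.cos(v))

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(x, y, z, color="steelblue")
ax.set_box_aspect((1, 1, 1))  # keep the sphere round
plt.show()
```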
Change processing FPS (higher = more responsive, more expensive):

```python
# In src/hand_code_builder.py
llm = gemini.Realtime(fps=5)   # Default: 5 FPS for responsive gesture tracking
# or
llm = gemini.Realtime(fps=10)  # Higher FPS for very fast interactions
```

Adjust hand tracking FPS:

```python
# In src/hand_code_builder.py
MediaPipeHandsProcessor(fps=30)  # Default: 30 FPS tracking
```

Switch to OpenAI instead of Gemini:

```python
from vision_agents.plugins import openai

llm = openai.Realtime(fps=5)
```

Edit the AI instructions in docs/ai/builder_sidekick.md to change:
- Personality and communication style
- Permission requirements and safety boundaries
- Code generation patterns and preferences
- Gesture interpretation rules
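The instructions file is plain markdown, so customizing it is just an edit. A small sketch of loading it at startup; exactly where the string is passed in depends on the agent setup code:

```python
# Sketch of loading the customized instructions at startup; where the string
# is actually handed over depends on the agent setup in src/hand_code_builder.py.
from pathlib import Path

instructions = Path("docs/ai/builder_sidekick.md").read_text()
# Pass `instructions` to the agent alongside gemini.Realtime(fps=5) and
# MediaPipeHandsProcessor(fps=30); the exact constructor parameter is not
# shown here to avoid guessing the vision_agents API.
```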
Monitor ergonomics and posture with voice feedback:
```bash
uv run python -m gesturecode.desk_posture_analyzer
```

Features:
- Real-time posture analysis with YOLO pose detection
- Ergonomic coaching and feedback
- Workstation setup recommendations
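A rough sketch of the posture idea using Ultralytics YOLO pose estimation; the keypoint indices follow the 17-point COCO convention, and the threshold is an illustrative guess, not the analyzer's real calibration:

```python
# Rough posture-check sketch (not the analyzer's exact logic) with Ultralytics
# YOLO pose estimation: flag slouching when the nose drops close to shoulder
# height, as seen from a front-facing webcam. Assumes one person is visible.
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")        # pretrained 17-keypoint COCO model
results = model("webcam_frame.jpg")    # or a numpy frame from cv2

kpts = results[0].keypoints.xy[0]      # first person: (17, 2) pixel coords
nose_y = kpts[0][1]                    # COCO keypoint 0 = nose
shoulder_y = (kpts[5][1] + kpts[6][1]) / 2  # 5/6 = left/right shoulder

# A smaller nose-to-shoulder gap suggests a hunched posture; this threshold
# is arbitrary and would need per-user calibration.
if (shoulder_y - nose_y) < 0.15 * shoulder_y:
    print("Consider sitting up straighter.")
```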
Monitor presence and work patterns with gentle productivity coaching:
```bash
uv run python -m gesturecode.focus_tracker
```

Features:
- Pose detection for presence monitoring at your desk
- Work session tracking and break pattern analysis
- Gentle productivity reminders and insights
- Supportive coaching focused on healthy work habits
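The session-tracking logic reduces to a small state machine; an illustrative sketch, assuming a `person_present` flag derived from pose detection on the camera feed:

```python
# Illustrative session-tracking sketch: accumulate continuous presence time
# and suggest a break after a fixed work interval. The real tracker derives
# `person_present` from pose detection on the camera feed.
import time

BREAK_AFTER = 50 * 60  # suggest a break after 50 minutes of presence
session_start = None

def update(person_present: bool, now: float | None = None):
    global session_start
    now = now or time.time()
    if not person_present:
        session_start = None   # leaving the desk resets the session
    elif session_start is None:
        session_start = now    # a new work session begins
    elif now - session_start > BREAK_AFTER:
        print("You've been focused for a while - time for a short break?")
        session_start = now    # restart the timer after reminding
```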
We welcome contributions! Please see our Contributing Guide for details.
This project is open source and available under the MIT License.
- Special thanks to GetStream/Vision-Agents - This project is built upon their incredible open-source framework for real-time AI vision applications
- Built with Vision Agents framework
- Hand tracking powered by MediaPipe
- AI vision from Google Gemini
- Video infrastructure by Stream
- Code execution via CodeBox API