A sidecar service that streams live screenshots and viewer state from Neuroglancer to a browser panel, featuring AI narration, natural language querying of organelle data, interactive plotting and analysis, and movie recording capabilities powered by Gemini, Claude, or local Ollama.
- Live Screenshot Streaming: Debounced 0.1-5 fps JPEG streaming
- State Tracking: Position, zoom, orientation, layer visibility, and segment selection
- WebSocket Updates: Real-time updates to browser panel
- AI Narration: Context-aware descriptions using cloud (Gemini/Claude) or local (Ollama) AI
- Natural Language Query: Ask questions about organelles in plain English
- Agent-Driven Visualization: AI interprets queries to show/hide segments intelligently
- AI-Powered Analysis Mode: Generate and execute Python code for data analysis via natural language
- Voice Synthesis: Browser-based TTS or edge-tts with multiple voices
- Movie Recording: Record navigation sessions with synchronized narration
- Multiple Transition Modes: Direct cuts, crossfade, or smooth state interpolation
- Responsive UI: Clean dark theme with status indicators and narration history
- Explore Mode with Verbose Logging: Real-time progress tracking shows screenshot capture, AI narration generation, and audio synthesis status
```bash
# Install dependencies with pixi
pixi install

# Start the server
pixi run start

# Or with custom settings
pixi run python server/main.py --ng-port 9999 --web-port 8090 --fps 2
```

Or, with pip:

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r server/requirements.txt

# Start the server
python server/main.py
```

Just open one URL: http://localhost:8090/
The web panel now includes:
- Embedded Neuroglancer viewer (left) with sample EM data pre-loaded
- Explore Mode (default, right panel):
  - Screenshots tab: Live screenshots with AI narrations as you navigate
  - Verbose Log tab: Real-time progress tracking (📸 Screenshot captured → 🤖 Sent to AI → ⏳ Waiting → ✅ Narration received → 🔊 Audio generated)
- Query Mode: Natural language questions about organelles with AI-driven visualization
- State tracking: Position, zoom, layers, selections
- Recording controls: Capture and compile narrated tours with multiple transition modes
Navigate in the embedded viewer and watch the live stream update automatically!
Ask questions about organelles in plain English:
Examples:
- "show the largest mitochondrion"
- "how many nuclei are there?"
- "take me to the smallest peroxisome"
- "show mitochondria larger than 1e11 nmΒ³"
- "also show nucleus 5" (adds to current selection)
- "hide all mitochondria" (removes from view)
The AI agent:
- Converts your question to SQL
- Queries the organelle database
- Interprets the results based on query semantics
- Updates the visualization intelligently
- Provides a natural language answer with voice narration
See AGENT_DRIVEN_VISUALIZATION.md for technical details.
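The final step, updating the visualization, amounts to editing the segment set of a Neuroglancer segmentation layer. Below is a minimal sketch of that step, assuming an illustrative command schema and layer name; the real logic lives in server/query_agent.py and server/ng.py:

```python
# Minimal sketch of applying an agent-interpreted visualization command to the
# embedded Neuroglancer viewer. The layer name and command schema are
# illustrative, not the exact ones used by query_agent.py.
import neuroglancer

def apply_visualization_command(viewer: neuroglancer.Viewer, command: dict) -> None:
    """command example: {"action": "show", "segment_ids": [5, 12], "layer": "organelles"}"""
    with viewer.txn() as state:
        layer = state.layers[command["layer"]]
        ids = set(command["segment_ids"])
        if command["action"] == "show":       # replace the current selection
            layer.segments = ids
        elif command["action"] == "add":      # "also show X"
            layer.segments = set(layer.segments) | ids
        elif command["action"] == "hide":     # "hide X"
            layer.segments = set(layer.segments) - ids
```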
Switch to Analysis Mode to generate and execute Python code for data analysis using natural language:
Examples:
- "Plot the volume distribution of mitochondria"
- "Show me a histogram of nucleus sizes"
- "Create a scatter plot comparing mitochondria volume vs surface area"
The AI analysis agent:
- Converts your question to Python code
- Executes the code in a sandboxed container (Docker or Apptainer)
- Displays generated plots and statistics
- Tracks session metadata and timing information
Container Support:
- Docker: Default for most systems
- Apptainer: Automatic fallback for HPC/cluster environments
See ANALYSIS_MODE.md for technical details and API documentation.
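For a sense of what the agent produces, here is an illustrative example of generated code for "Plot the volume distribution of mitochondria"; the CSV filename, column name, and the `/results` output path are assumptions about the sandbox layout, not the project's actual schema:

```python
# Illustrative example of code the analysis agent might generate. Column and
# file names are assumptions; the real data lives under organelle_data/, and
# /results is assumed to be the directory mounted into the sandbox container.
import pandas as pd
import matplotlib
matplotlib.use("Agg")          # headless rendering inside the container
import matplotlib.pyplot as plt

df = pd.read_csv("organelle_data/mitochondria.csv")
plt.figure(figsize=(8, 5))
plt.hist(df["volume_nm3"], bins=50, log=True)
plt.xlabel("Volume (nm³)")
plt.ylabel("Count")
plt.title("Mitochondria volume distribution")
plt.savefig("/results/volume_distribution.png", dpi=150)
print(df["volume_nm3"].describe())
```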
- Start Recording: Click "Start Recording" to begin capturing frames
- Navigate: Explore the dataset - narration triggers automatically on significant view changes
- Stop Recording: Click "Stop Recording" when done
- Create Movie: Choose transition style and click "Create Movie"
- Direct Cuts: Instant transitions with 2-second silent pauses
- Crossfade: Smooth dissolve transitions between views
- State Interpolation: Neuroglancer renders smooth camera movements
Movies are saved to recordings/<session_id>/output/movie.mp4 with:
- 960x540 resolution
- Frame duration matches audio narration length
- 2-second silent transitions between narrations
- Synchronized audio track
See QUICKSTART.md for detailed usage guide.
- Neuroglancer viewer with state change callbacks
- Summarizes position, zoom, orientation, layers, and selections
- Filters meaningful changes to avoid spam
- Background thread captures screenshots when viewer state is "dirty"
- Converts PNG to JPEG for bandwidth efficiency
- Debounced to max 2 fps (configurable)
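The capture loop sketched below shows the debounce and JPEG re-encoding idea; `capture_png` and `publish` are hypothetical stand-ins for the actual Neuroglancer screenshot call and the WebSocket broadcaster:

```python
# Minimal sketch of the debounced screenshot loop: a background thread waits
# for the viewer state to be marked "dirty", rate-limits captures to max_fps,
# and re-encodes the PNG screenshot as JPEG to save bandwidth.
import io
import threading
import time
from typing import Callable
from PIL import Image

def screenshot_loop(capture_png: Callable[[], bytes],
                    publish: Callable[[bytes], None],
                    dirty: threading.Event,
                    max_fps: float = 2.0) -> None:
    min_interval = 1.0 / max_fps
    last = 0.0
    while True:
        dirty.wait()                      # set by the viewer state-change callback
        now = time.monotonic()
        if now - last < min_interval:     # debounce: cap the capture rate
            time.sleep(min_interval - (now - last))
        dirty.clear()
        png_bytes = capture_png()
        img = Image.open(io.BytesIO(png_bytes)).convert("RGB")
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=80)
        publish(buf.getvalue())           # hand off to the WebSocket broadcaster
        last = time.monotonic()
```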
- FastAPI server with WebSocket endpoint
- Sends `{type: "frame", jpeg_b64: "...", state: {...}}` messages
- Browser displays live frames and state summary
- Triggers narration on meaningful state changes
- Uses Gemini, Claude, or local Ollama to describe current view
- Context-aware prompts for EM/neuroanatomy
- Real-time WebSocket broadcasting to all clients
- Configurable thresholds and intervals
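A hedged sketch of a single narration call through the google-generativeai SDK; the model name, prompt wording, and state summary format are illustrative rather than the exact values used by narrator.py:

```python
# Sketch of one vision-grounded narration request: the latest JPEG frame plus a
# short state summary go to Gemini, and the returned text becomes the narration.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def narrate(jpeg_bytes: bytes, state_summary: dict) -> str:
    prompt = (
        "You are narrating a live tour of an electron microscopy volume. "
        f"Current view: {state_summary}. "
        "Describe what is visible in 2-3 sentences for a neuroscience audience."
    )
    response = model.generate_content(
        [prompt, {"mime_type": "image/jpeg", "data": jpeg_bytes}]
    )
    return response.text
```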
- Browser-based TTS or edge-tts with multiple voices
- Automatic audio playback in browser
- Audio synchronized with narration display
- Saved to recordings for movie compilation
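A minimal edge-tts sketch that renders one narration to an MP3 for later muxing; the voice and file names are just examples:

```python
# Synthesize a narration string to an MP3 file with edge-tts.
import asyncio
import edge_tts

async def synthesize(text: str, out_path: str, voice: str = "en-US-AriaNeural") -> None:
    await edge_tts.Communicate(text, voice).save(out_path)

asyncio.run(synthesize("A mitochondrion comes into view near the nucleus.",
                       "narration_000.mp3"))
```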
- Record navigation sessions with frame capture
- Three transition modes: cuts, crossfade, interpolation
- Frame duration matches narration audio length
- 2-second silent transitions between narrations
- FFmpeg-based video compilation with audio sync
- Neuroglancer video_tool integration for smooth camera movements
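A rough sketch of the direct-cuts compilation path: each frame becomes a clip whose length matches its narration audio, and the clips are joined with FFmpeg's concat demuxer. The file layout and encoder settings are illustrative, not recording.py's exact parameters:

```python
# Build per-narration clips from still frames, then concatenate them.
import subprocess
from pathlib import Path

def make_clip(frame_jpg: Path, narration_mp3: Path, out_mp4: Path) -> None:
    subprocess.run([
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(frame_jpg),      # still image as the video stream
        "-i", str(narration_mp3),                # narration audio
        "-vf", "scale=960:540", "-pix_fmt", "yuv420p",
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac", "-shortest",              # clip length = audio length
        str(out_mp4),
    ], check=True)

def concat_clips(clips: list[Path], out_mp4: Path) -> None:
    listing = out_mp4.with_suffix(".txt")
    listing.write_text("".join(f"file '{c.resolve()}'\n" for c in clips))
    subprocess.run([
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-i", str(listing), "-c", "copy", str(out_mp4),
    ], check=True)
```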
- SQLite database for organelle metadata (volume, position, etc.)
- AI-powered natural language to SQL conversion
- Multi-query support with automatic splitting
- Intent classification: navigation, visualization, or informational
- Agent-driven visualization state updates
- Semantic understanding: "show X" vs "also show X" vs "hide X"
- Context-aware command generation using current viewer state
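The database step reduces to executing the generated SQL against SQLite. The sketch below assumes a hypothetical table and column layout (`organelles`, `volume_nm3`, centroid columns); the real schema is defined in server/organelle_db.py:

```python
# Run agent-generated SQL for a question like "show the largest mitochondrion"
# against the organelle SQLite database (paths and schema are assumptions).
import sqlite3

generated_sql = """
    SELECT id, volume_nm3, com_x, com_y, com_z
    FROM organelles
    WHERE type = 'mitochondrion'
    ORDER BY volume_nm3 DESC
    LIMIT 1
"""

with sqlite3.connect("organelle_data/organelles.db") as conn:
    conn.row_factory = sqlite3.Row
    row = conn.execute(generated_sql).fetchone()

print(dict(row))   # e.g. feed the segment id and centroid to the visualization agent
```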
- Natural language to Python code generation
- Sandboxed code execution (Docker/Apptainer)
- Interactive plot generation and visualization
- Session metadata tracking with timing breakdown
- Comprehensive results management with REST API
- Automatic container detection for HPC environments
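A simplified sketch of the Docker sandbox idea: write the generated script to a temporary directory, mount it read-only, and run it with networking disabled. The image name, mount points, and resource limits are assumptions rather than the project's exact configuration:

```python
# Execute agent-generated analysis code inside a locked-down Docker container.
import subprocess
import tempfile
from pathlib import Path

def run_in_sandbox(code: str, results_dir: Path, image: str = "python:3.11-slim") -> str:
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "analysis.py"
        script.write_text(code)
        proc = subprocess.run([
            "docker", "run", "--rm",
            "--network", "none",                        # no network inside the sandbox
            "--memory", "2g", "--cpus", "2",
            "-v", f"{tmp}:/code:ro",                    # generated code, read-only
            "-v", f"{results_dir.resolve()}:/results",  # plots and stats written here
            image, "python", "/code/analysis.py",
        ], capture_output=True, text=True, timeout=300)
    return proc.stdout + proc.stderr
```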
```
tourguide/
├── server/
│   ├── main.py                   # Entry point
│   ├── ng.py                     # Neuroglancer viewer + state tracking
│   ├── stream.py                 # FastAPI WebSocket server + query/analysis endpoints
│   ├── narrator.py               # AI narration engine
│   ├── query_agent.py            # Natural language query agent
│   ├── analysis_agent.py         # Natural language to Python code agent
│   ├── docker_sandbox.py         # Docker container sandbox
│   ├── apptainer_sandbox.py      # Apptainer container sandbox
│   ├── analysis_results.py       # Analysis session metadata manager
│   ├── organelle_db.py           # SQLite database for organelle metadata
│   ├── recording.py              # Movie recording and compilation
│   └── requirements.txt          # Legacy pip requirements
├── web/
│   ├── index.html                # Web UI with recording and analysis controls
│   ├── app.js                    # WebSocket client + recording + analysis logic
│   ├── style.css                 # Styling with spinner animations
│   └── ng-screenshot-handler.js  # Neuroglancer screenshot capture
├── organelle_data/               # Organelle CSV files and database (gitignored)
├── analysis_results/             # Analysis session outputs (gitignored)
├── containers/                   # Container images (gitignored)
├── recordings/                   # Recorded sessions (auto-created)
├── pixi.toml                     # Pixi environment config
├── AGENT_DRIVEN_VISUALIZATION.md # Agent visualization docs
├── ANALYSIS_MODE.md              # Analysis mode documentation
└── README.md
```
```
--ng-host HOST     Neuroglancer bind address (default: 127.0.0.1)
--ng-port PORT     Neuroglancer port (default: 9999)
--web-host HOST    Web server bind address (default: 0.0.0.0)
--web-port PORT    Web server port (default: 8090)
--fps FPS          Maximum screenshot frame rate (default: 2)
```
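These flags map onto a straightforward argparse setup; a minimal sketch (server/main.py may define additional options):

```python
# Parse the server's command-line options with the defaults listed above.
import argparse

parser = argparse.ArgumentParser(description="Neuroglancer tour guide server")
parser.add_argument("--ng-host", default="127.0.0.1", help="Neuroglancer bind address")
parser.add_argument("--ng-port", type=int, default=9999, help="Neuroglancer port")
parser.add_argument("--web-host", default="0.0.0.0", help="Web server bind address")
parser.add_argument("--web-port", type=int, default=8090, help="Web server port")
parser.add_argument("--fps", type=float, default=2, help="Maximum screenshot frame rate")
args = parser.parse_args()
```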
- Stage 0: Repository structure
- Stage 1: Neuroglancer state capture
- Stage 2: Screenshot loop
- Stage 3: WebSocket streaming
- Stage 4: AI narrator
- Stage 5: Voice/TTS
- Stage 6: Movie recording and compilation
- Stage 7: Natural language query system with agent-driven visualization
- Stage 8: Analysis mode with AI code generation and sandboxed execution
- Stage 9: Quality upgrades (ROI crop, advanced UI controls)
- Get a free API key from https://aistudio.google.com/app/apikey
- Create a `.env` file: `cp .env.example .env`
- Add your API key to `.env`: `GOOGLE_API_KEY=your_api_key_here`
- Start the server: `pixi run start`
For completely local, private, and free AI narration with voice:
- Install Ollama from ollama.com
- Download the vision model: `ollama pull llama3.2-vision`
- Install TTS (optional): `pixi run pip install kokoro soundfile sounddevice`
- Enable local mode in `.env`: `USE_LOCAL=true`
- Start the server: `pixi run start`
See LOCAL_SETUP.md for detailed local setup instructions.
Use `ANTHROPIC_API_KEY` in `.env` instead of `GOOGLE_API_KEY`.
Navigate in Neuroglancer and watch the AI narrate your exploration in real-time!
To run on a GPU cluster node, use `mode=shared` when requesting GPUs:

```bash
bsub -P cellmap -n 12 -gpu "num=1:mode=shared" -q gpu_h100 -Is /bin/bash
```

Important: The `mode=shared` parameter is required! Without it, the GPU will be in exclusive mode, preventing both PyTorch (Chatterbox) and Ollama from using the GPU simultaneously.

Once on the node, run the application normally:

```bash
pixi run start
```

See CLUSTER_TROUBLESHOOTING.md for detailed cluster setup and troubleshooting.
- Python 3.10+
- FastAPI & Uvicorn
- Pillow
- Neuroglancer
- FFmpeg (for movie compilation)
- edge-tts (for voice synthesis, optional)
BSD 3-Clause License - see LICENSE file for details.