AI-powered video surveillance system that detects weapons and suspicious motion, analyzes events with Gemini, and enables semantic search of incidents.

## Features

- Dual Detection: YOLO for weapons, VideoMAE for suspicious motion
- AI Analysis: Gemini provides detailed event summaries
- Semantic Search: Query incidents using natural language via Qdrant
- Real-time Monitoring: Live detection with SMS alerts
- Batch Processing: Analyze uploaded videos and extract clips
## Project Structure

```
anomaly-detection-system/
├── pyproject.toml          # uv project configuration
├── .env                    # Environment variables (create from .env.example)
├── src/
│   └── video_surveillance_system/
│       ├── config.py           # All configuration constants
│       ├── anomaly_models.py   # YOLO + VideoMAE wrapper
│       ├── gemini_service.py   # Gemini video analysis
│       ├── vector_db.py        # Qdrant operations
│       └── app.py              # Main Flask application
├── uploads/                # Uploaded videos
├── clips/                  # Extracted anomaly clips
├── temp/                   # Temporary processing files
└── models/                 # Anomaly models
```
## Setup
### 1. Install uv (if not already installed)
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Install dependencies

```bash
# Install from the lockfile
uv sync

# Or add dependencies manually
uv add flask flask-cors opencv-python torch torchvision transformers ultralytics moviepy twilio qdrant-client google-genai python-dotenv pillow numpy tqdm
```

### 3. Download models

Place these files in the models/ directory:

- Guns-100-11m.pt: YOLO weapon detection model
- videomae_model_A_binary.pth: VideoMAE motion detection model
### 4. Configure environment

```bash
cp .env.example .env
# Edit .env with your API keys
```

Required API keys:
- Twilio: Account SID, Auth Token, Phone Number
- Qdrant: Cloud URL and API key
- Gemini: Google AI API key
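A hedged sketch of what .env might contain (the variable names here are assumptions for illustration; match them to whatever config.py actually reads):

```bash
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+15551234567
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your_qdrant_key
GEMINI_API_KEY=your_gemini_key
```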
### 5. Run the server

```bash
uv run src/video_surveillance_system/app.py
```

The server will start on http://0.0.0.0:5001.
## API Endpoints

POST /upload
- Upload a video for batch analysis
- Form data: video (file)

POST /start_simulation
- Start real-time monitoring
- Form data: demo_video (file), phone_number (string)

POST /stop_realtime
- Stop real-time monitoring

GET /video_feed
- Live video stream (MJPEG)

GET /status
- Current system status

GET /results
- List all extracted clips

GET /results/
- Download a specific clip

POST /chat
- Query detected events
- JSON body: {"question": "your question here"}
- Returns: {"answer": "..."}
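The /chat contract can be exercised with a small client sketch. This is illustrative, not part of the project; only the endpoint path and JSON shapes come from this README:

```python
import json
import urllib.request


def build_chat_payload(question: str) -> bytes:
    """Serialize the JSON body expected by POST /chat."""
    return json.dumps({"question": question}).encode("utf-8")


def ask(base_url: str, question: str) -> str:
    """POST a question to /chat and return the 'answer' field."""
    req = urllib.request.Request(
        base_url + "/chat",
        data=build_chat_payload(question),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["answer"]
```

With the server running, `ask("http://127.0.0.1:5001", "Were any weapons detected?")` would return the generated answer string.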
## How It Works

### Batch Processing

1. Upload Video → video saved to uploads/
2. Motion Preprocessing → static frames removed
3. Anomaly Detection → YOLO + VideoMAE run on frames
4. Event Merging → overlapping detections combined
5. Clip Extraction → anomaly segments saved to clips/
6. Gemini Analysis → detailed event summaries generated
7. Vector Storage → embeddings stored in Qdrant
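The event-merging step is essentially interval merging. A minimal sketch (a hypothetical helper, not the project's actual code; events are (start, end) times in seconds):

```python
def merge_events(events, gap=1.0):
    """Merge detections whose time ranges overlap or sit within `gap` seconds.

    events: list of (start, end) tuples; returns a sorted, merged list.
    """
    merged = []
    for start, end in sorted(events):
        if merged and start <= merged[-1][1] + gap:
            # Overlaps (or nearly touches) the previous event: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two overlapping detections collapse into one event
print(merge_events([(0, 2), (1.5, 4), (10, 12)]))  # → [(0, 4), (10, 12)]
```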
### Real-Time Monitoring

1. Frame Capture → frames read from the video source
2. Detection → each frame checked for anomalies
3. Debouncing → anomaly confirmed across multiple frames
4. Recording → clip saved with pre/post-roll
5. SMS Alert → notification sent via Twilio
6. Streaming → live feed available at /video_feed
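The debouncing step can be sketched as a sliding window over per-frame detections, so a single noisy frame does not trigger an alert. This is an illustrative stand-in, not the project's implementation (which is governed by DEBOUNCE_COUNT in config.py):

```python
from collections import deque


class Debouncer:
    """Confirm an anomaly only after it appears in `required` of the last `window` frames."""

    def __init__(self, window=5, required=3):
        self.hits = deque(maxlen=window)  # rolling record of recent detections
        self.required = required

    def update(self, detected: bool) -> bool:
        self.hits.append(detected)
        return sum(self.hits) >= self.required
```

For example, with `window=3, required=2`, two isolated positive frames do nothing, but two positives inside the same three-frame window confirm the anomaly.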
### Chat (Semantic Search)

1. User asks a question via /chat
2. Question embedded using Gemini embeddings
3. Semantic search in the Qdrant vector DB
4. Relevant events retrieved
5. Gemini generates an answer from the retrieved context
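The retrieval step boils down to ranking stored event embeddings by similarity to the query embedding. Qdrant does this server-side; the sketch below shows the idea in plain Python with cosine similarity (names and shapes are illustrative):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query_vec, event_vecs, k=3):
    """Indices of the k stored event embeddings most similar to the query."""
    order = sorted(range(len(event_vecs)),
                   key=lambda i: cosine(query_vec, event_vecs[i]),
                   reverse=True)
    return order[:k]
```

The indices returned by `top_k` would map to stored event summaries, which are then passed to Gemini as context for answer generation.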
## Configuration

Edit config.py to adjust:

- Detection thresholds: CONFIDENCE_THRESHOLD_MOTION, CONFIDENCE_THRESHOLD_OBJECT
- Real-time parameters: DEBOUNCE_COUNT, BUFFER_SECONDS, POST_ROLL_FRAMES
- Processing settings: NUM_FRAMES, REALTIME_FRAME_SKIP, TARGET_CLIP_SIZE_MB
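A hedged sketch of what these constants might look like in config.py. The names come from this README; every value below is an illustrative assumption to tune for your cameras and models:

```python
# config.py (illustrative values, not the project's defaults)

# Detection thresholds (scores in [0, 1])
CONFIDENCE_THRESHOLD_MOTION = 0.85  # VideoMAE suspicious-motion cutoff
CONFIDENCE_THRESHOLD_OBJECT = 0.50  # YOLO weapon-detection cutoff

# Real-time parameters
DEBOUNCE_COUNT = 3       # consecutive positive frames needed to confirm an anomaly
BUFFER_SECONDS = 5       # pre-roll kept in memory before a confirmed event
POST_ROLL_FRAMES = 60    # frames recorded after the event ends

# Processing settings
NUM_FRAMES = 16          # frames sampled per VideoMAE clip
REALTIME_FRAME_SKIP = 2  # analyze every Nth frame in live mode
TARGET_CLIP_SIZE_MB = 10 # re-encode extracted clips toward this size
```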
## Notes

- GPU recommended for real-time performance
- Clips are saved as H.264 MP4 files
- Vector DB persists across sessions
- SMS alerts have a 60-second cooldown by default