Computer Vision Engineer building production ML systems across detection, segmentation, tracking, and geospatial applications. I specialize in deploying optimized CV pipelines (YOLO, RT-DETR, SAM) for sports analytics, satellite imagery, and industrial inspection with real-time inference constraints.
Core Expertise:
- Multi-object tracking (ByteTrack, DeepSORT)
- Model optimization (ONNX, INT8/INT4 quantization, TensorRT)
- Real-time inference on edge devices
- Scalable deployment (FastAPI + Docker + Prometheus)
- Data annotation (1000+ images via Roboflow, CVAT, QGIS)
Languages & Frameworks
Computer Vision & Deep Learning
MLOps & Deployment
Data Annotation & Geospatial
Production-ready tracking and analytics across volleyball, football, and basketball
Key Achievements:
- 100 FPS ball detection on CPU (Intel i7) using custom ONNX seq-9 model
- 87.3% MOTA player tracking with ByteTrack + Kalman filtering
- Zero-shot team classification via SigLIP embeddings + KMeans
- Real-time inference (30-100 FPS) with hybrid CPU/GPU architecture
Tech: YOLO, RT-DETR, ByteTrack, SigLIP, ONNX, Homography, FastAPI
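
The team-classification idea (cluster per-player embeddings into two groups) can be sketched as follows. This is a minimal illustration, not the project's code: it assumes plain Lloyd's k-means written in pure Python, and the 3-dim vectors are hypothetical stand-ins for SigLIP embeddings of player crops.

```python
import math

def kmeans(points, k=2, iters=20):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy 3-dim "embeddings" (hypothetical values) standing in for player-crop embeddings.
player_embeddings = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.85, 0.15, 0.05],  # one kit colour
    [0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.15, 0.85, 0.15],  # the other kit colour
]
team_ids = kmeans(player_embeddings, k=2)
```

No labelled team data is needed here, which is what makes the approach zero-shot: the cluster index itself serves as the team ID.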
Semantic fashion search across 100K DeepFashion images
Key Achievements:
- <100ms P95 latency on RTX 3050 GPU
- FAISS vector index for 512-dim CLIP embeddings
- 60% memory efficiency via PyTorch optimizations
- Dynamic batching (8-32 images) with async FastAPI backend
Tech: CLIP, FAISS, FastAPI, Streamlit, Docker, PyTorch
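
The retrieval step behind embedding search can be illustrated with a brute-force cosine-similarity lookup. This is only a sketch under stated assumptions: the 4-dim vectors and item names are hypothetical stand-ins for 512-dim CLIP embeddings, and the exhaustive scan stands in for a FAISS index.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query, index, k=2):
    """Return the k (score, item_id) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), item_id)
        for item_id, vec in index.items()
    ]
    return sorted(scored, reverse=True)[:k]

# Hypothetical catalog: item id -> unit-normalized embedding.
catalog = {
    "red_dress": normalize([0.9, 0.1, 0.0, 0.1]),
    "blue_jeans": normalize([0.0, 0.9, 0.2, 0.1]),
    "crimson_skirt": normalize([0.8, 0.2, 0.1, 0.0]),
}
results = top_k([1.0, 0.1, 0.0, 0.0], catalog, k=2)
result_ids = [item_id for _, item_id in results]
```

A dedicated vector index becomes worthwhile at the 100K-image scale because it avoids this per-query Python loop over every item.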
Multi-agent GeoAI system for natural language satellite imagery analysis
Key Features:
- Multi-agent orchestration using LangGraph (Phi-3-mini, Moondream VLM, SAM3)
- Google Earth Engine integration (10K+ datasets) with 24-hour caching
- ChromaDB vector search for similar region discovery
- Target: <5s latency on RTX 3050 (1.5GB VRAM)
Tech: LangGraph, Moondream, SAM3, ChromaDB, Google Earth Engine, Folium
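
A 24-hour cache in front of a remote data source might look like the sketch below. It is an assumption-laden illustration, not the project's implementation: `TTLCache`, `fetch_tile`, and the region key are hypothetical names, and the store is a plain in-memory dict.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be exercised without waiting
        self._store = {}     # key -> (expiry_time, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the remote call
        value = fetch(key)   # expired or missing: fetch and re-cache
        self._store[key] = (now + self.ttl, value)
        return value

# Usage with a fake clock and a hypothetical fetch function standing in for a remote query.
fake_now = [0.0]
calls = []

def fetch_tile(key):
    calls.append(key)
    return f"tile-for-{key}"

cache = TTLCache(clock=lambda: fake_now[0])
cache.get_or_fetch("region-a", fetch_tile)  # miss: fetches
cache.get_or_fetch("region-a", fetch_tile)  # hit: served from cache
fake_now[0] = 24 * 60 * 60 + 1              # just over 24 hours later
cache.get_or_fetch("region-a", fetch_tile)  # expired: fetches again
```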
Real-time anomaly detection for surveillance systems
Key Achievements:
- 92.47% precision | 83.78% recall | 0.7438 AUC (UCSD Ped2)
- Non-blocking pipeline with FastAPI + ThreadPoolExecutor
- Prometheus monitoring for GPU utilization and inference latency
- Deployed on Render with Streamlit dashboard
Tech: Autoencoder, FastAPI, Prometheus, Docker, Streamlit
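
The non-blocking pattern (offloading blocking inference to a thread pool so the async server stays responsive) can be sketched with the standard library alone. This is a minimal sketch: `score_frame` is a hypothetical stand-in for model inference, and `handle_request` plays the role an async FastAPI endpoint would.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def score_frame(frame_id):
    """Stand-in for blocking model inference (hypothetical); returns a fake anomaly score."""
    time.sleep(0.05)
    return frame_id, frame_id * 0.1

async def handle_request(frame_id):
    # Offload the blocking call so the event loop stays free for other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, score_frame, frame_id)

async def main():
    # Several requests in flight at once; gather preserves input order.
    return await asyncio.gather(*(handle_request(i) for i in range(4)))

scores = asyncio.run(main())
executor.shutdown()
```

With two workers, the four simulated requests are processed two at a time while the event loop remains free to accept new work.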
Speech AI & LLM Optimization
- Built real-time ASR pipeline with Voice Activity Detection, reducing false triggers by 40%
- Deployed an async FastAPI speaker diarization service handling 50+ concurrent audio streams
- Implemented prompt caching strategies cutting LLM inference costs by $0.02/minute
Industrial Computer Vision
- Developed YOLOv8 + PaddleOCR pipeline achieving 15% accuracy improvement and 61% CER reduction (18% → 7%)
- Benchmarked Custom CNNs, R-CNN, and VLMs, selecting YOLO+OCR hybrid for <100ms latency
- Optimized inference for resource-constrained CCTV hardware via ONNX export
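
For reference, the 61% figure is the relative drop from 18% to 7% character error rate: (18 - 7) / 18 ≈ 0.61. The sketch below shows how CER is conventionally computed (edit distance divided by reference length); the example strings are hypothetical and not taken from the project's data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution, or free match
            ))
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def relative_reduction(before, after):
    """Fraction by which a metric dropped relative to its starting value."""
    return (before - after) / before

plate_cer = cer("MH12AB1234", "MH12A81234")  # hypothetical strings, one wrong character
reduction = relative_reduction(0.18, 0.07)   # the 18% -> 7% change quoted above
```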
MIT World Peace University | B.Tech, Electronics & Communication Engineering - AI/ML | 2021 - 2025
Certifications:
- AI Agents Fundamentals - HuggingFace
- Google Cloud Computing Foundations - NPTEL
- Computer Vision Bootcamp - OpenCV
Active contributor to:
- Roboflow - YOLO implementations and optimizations
- HuggingFace - Model documentation and transformers
- Computer vision libraries and tools

