> A full-stack, AI-powered application that lets users upload a video and ask context-aware questions about it, using CLIP, FAISS, and LLaVA via the Segmind API.
---
## How It Works
1. **Upload a Video** via a React + Tailwind frontend.
2. **Frame Extraction**: Key frames are extracted using OpenCV.
3. **Embedding**: Frames are embedded using Hugging Face’s CLIP model.
4. **Indexing**: Embeddings are stored and queried with FAISS.
5. **Questioning**: User questions are semantically matched to the most relevant frames.
6. **Answering**: Segmind's LLaVA API generates answers using the retrieved context.
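
The frame-sampling and retrieval steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `sample_frame_indices` and `top_k_frames` are hypothetical, fixed-interval sampling stands in for key-frame extraction, short toy vectors stand in for CLIP embeddings, and a brute-force cosine-similarity scan stands in for the FAISS index (cosine similarity is itself an assumption; the project may use a different metric).

```python
import math


def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Pick one frame index per `every_seconds` of video.

    Assumption: fixed-interval sampling as a stand-in for key-frame extraction.
    """
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def top_k_frames(query_vec, frame_vecs, k=3):
    """Return (frame_position, cosine_similarity) pairs, best match first.

    Assumption: a brute-force scan, standing in for a FAISS nearest-neighbour
    search over the stored frame embeddings.
    """
    q = normalize(query_vec)
    scored = []
    for position, vec in enumerate(frame_vecs):
        f = normalize(vec)
        similarity = sum(a * b for a, b in zip(q, f))
        scored.append((position, similarity))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # A 4-second clip at 25 fps, sampled once per second.
    print(sample_frame_indices(total_frames=100, fps=25.0))

    # Toy 2-D "embeddings"; real CLIP vectors have hundreds of dimensions.
    frames = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    question = [1.0, 0.0]
    print(top_k_frames(question, frames, k=2))
```

In a real pipeline, the frames selected this way would be embedded with CLIP, the question would be embedded with the same model's text encoder, and the top-ranked frames would be passed to LLaVA as context.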
---
## Tech Stack
### Backend (FastAPI)
- CLIP (Hugging Face `openai/clip-vit-large-patch14`)