trishadabral/chronoview-ai


ChronoView 🎯

"Skip to the Good Part."

AI-powered semantic search engine for video content: find any moment, instantly.



What is ChronoView?

ChronoView transforms any video (a lecture, meeting, tutorial, or conference talk) into a fully searchable knowledge base using multimodal AI. Type a natural language query and jump to the exact timestamp where that moment occurs. No scrubbing. No guessing. No re-watching hours of content.


The Problem

  • 500+ hours of video are uploaded every minute globally
  • There is no "Ctrl+F" for video content
  • Students waste hours scrubbing lecture recordings
  • Enterprises lose an estimated $37B/year to unsearchable meeting recordings
  • Existing tools match keywords, not meaning

The Solution

ChronoView processes three parallel data streams from every video:

| Stream | Model | Output |
| --- | --- | --- |
| 🎙️ Speech | OpenAI Whisper | Timestamped transcripts |
| 👁️ Visual | Vision Transformer (ViT) | Scene embeddings |
| 📄 On-screen text | Tesseract OCR | Slide & code text |

All three streams are fused using a CLIP-inspired contrastive learning model into a unified semantic embedding, stored in a FAISS vector database for millisecond-speed retrieval.
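Before fusion, the three streams have to be aligned in time. A minimal sketch of one way to do this, bucketing OCR hits into the ASR segment whose window contains them; the function name and data shapes are illustrative, not taken from the ChronoView source:

```python
def align_streams(asr_segments, ocr_hits):
    """Merge timestamped OCR text into the ASR segment covering it.

    asr_segments: list of (start_sec, end_sec, transcript) tuples
    ocr_hits:     list of (timestamp_sec, text) tuples
    Returns one dict per segment, ready to be embedded by the fusion model.
    """
    fused = []
    for start, end, transcript in asr_segments:
        # Collect every OCR hit whose timestamp falls inside this segment.
        on_screen = [text for t, text in ocr_hits if start <= t < end]
        fused.append({
            "start": start,
            "end": end,
            "speech": transcript,
            "on_screen_text": " ".join(on_screen),
        })
    return fused

segments = [(0.0, 30.0, "welcome to the lecture"),
            (30.0, 60.0, "gradient descent minimizes loss")]
ocr = [(35.2, "J(theta) = loss"), (41.0, "alpha = 0.01")]
print(align_streams(segments, ocr)[1]["on_screen_text"])
# J(theta) = loss alpha = 0.01
```

Scene embeddings from the ViT stream could be attached per segment in the same way, keyed by keyframe timestamp.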


Key Features

  • πŸ” Semantic Search β€” natural language query β†’ exact timestamp
  • 🧠 Direct Q&A β€” AI-generated answers extracted from the video
  • πŸ“š Auto-Chapters β€” AI-generated titled navigation segments
  • 🌍 Multilingual Search β€” query in any language
  • πŸ”— Shareable Timestamp Links β€” share exact video moments
  • πŸ“Š Engagement Heatmaps β€” analytics on which segments were searched most
  • βœ‚οΈ Highlight Reel Export β€” compile relevant segments into a short clip

Architecture

Video Input
    │
    ▼
FFmpeg (segment splitting · keyframe extraction · audio strip)
    │
    ├──────────────────────────────────────┬─────────────────────┐
    ▼                                      ▼                     ▼
Whisper ASR                         Tesseract OCR           ViT Model
(Speech → text)                  (Slides & code text)   (Scene encoding)
    │                                      │                     │
    └──────────────────────────────────────┴─────────────────────┘
                                           │
                                           ▼
                              CLIP Fusion Model (PyTorch)
                           Unified semantic embedding space
                                           │
                                           ▼
                                  FAISS / ChromaDB
                               Vector similarity index
                                           │
                                           ▼
                              FastAPI Backend (REST API)
                                           │
                                           ▼
                          React Dashboard + Video Player
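The vector-similarity stage can be sketched as a brute-force inner-product search in NumPy; FAISS's IndexFlatIP performs the same search at scale. The 512-dimension size and the random placeholder vectors are assumptions; in the real pipeline the vectors would come from the CLIP fusion model:

```python
import numpy as np

# Placeholder segment embeddings, L2-normalized so that inner product
# equals cosine similarity (this is also how FAISS IndexFlatIP is
# typically used for cosine search).
rng = np.random.default_rng(0)
dim = 512
segment_vecs = rng.standard_normal((100, dim)).astype("float32")
segment_vecs /= np.linalg.norm(segment_vecs, axis=1, keepdims=True)

def search(query_vec, vecs, top_k=5):
    """Return (indices, scores) of the top_k most similar segments."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = vecs @ q                     # cosine similarity on unit vectors
    idx = np.argsort(-scores)[:top_k]     # highest similarity first
    return idx, scores[idx]

# A query embedding close to segment 42 should retrieve segment 42 first.
query = segment_vecs[42] + 0.01 * rng.standard_normal(dim).astype("float32")
idx, scores = search(query, segment_vecs)
print(idx[0])  # 42 (the segment closest to the query)
```

Swapping the brute-force scan for a FAISS index changes only the storage and lookup calls, not the normalization or scoring logic.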

Tech Stack

| Layer | Technology |
| --- | --- |
| Video Processing | FFmpeg |
| Speech Recognition | OpenAI Whisper |
| Scene Understanding | Vision Transformer (ViT) |
| OCR | Tesseract |
| Semantic Fusion | CLIP (PyTorch + HuggingFace) |
| Vector Search | FAISS / ChromaDB |
| Backend API | FastAPI |
| Frontend | React / Streamlit |
| Analytics | Plotly |
| Storage | AWS S3 / Google Cloud Storage |
| Containerization | Docker |
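Since the project ships a docker-compose.yml, a minimal sketch of what the two-service layout might look like; the service names, build contexts, and ports here are illustrative assumptions, not the repo's actual file:

```yaml
# Illustrative compose file: backend API plus frontend, sharing the
# models/ checkpoint directory with the host.
services:
  api:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - api
```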

Getting Started

Prerequisites

  • Python 3.10+
  • NVIDIA GPU (RTX 3060 or higher recommended)
  • CUDA 11.8+
  • Node.js 18+ (for the React frontend)
  • FFmpeg installed on the system

Installation

# Clone the repository
git clone https://github.com/yourusername/chronoview.git
cd chronoview

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install frontend dependencies
cd frontend
npm install
cd ..

Environment Setup

# Copy environment template
cp .env.example .env

# Add your keys in .env
OPENAI_WHISPER_MODEL=base
HUGGINGFACE_TOKEN=your_token_here
AWS_ACCESS_KEY=your_key_here
AWS_SECRET_KEY=your_secret_here

Running the App

# Step 1: Index a video
python pipeline/index_video.py --input your_video.mp4

# Step 2: Start the backend
uvicorn app.main:app --reload --port 8000

# Step 3: Start the frontend
cd frontend && npm run dev

Open http://localhost:3000 in your browser.


Project Structure

chronoview/
│
├── pipeline/
│   ├── ingest.py          # FFmpeg video segmentation
│   ├── transcribe.py      # Whisper ASR
│   ├── ocr.py             # Tesseract OCR
│   ├── vision.py          # ViT scene encoding
│   ├── fuse.py            # CLIP fusion model
│   └── index.py           # FAISS vector indexing
│
├── app/
│   ├── main.py            # FastAPI entry point
│   ├── search.py          # Query embedding + retrieval
│   ├── qa.py              # Direct Q&A generation
│   └── analytics.py       # Heatmap + usage analytics
│
├── frontend/
│   ├── src/
│   │   ├── pages/         # Home, Results, Library, Analytics
│   │   └── components/    # SearchBar, ResultCard, VideoPlayer
│   └── package.json
│
├── models/                # Saved model checkpoints
├── tests/                 # Unit and integration tests
├── docker-compose.yml
├── requirements.txt
└── README.md

API Reference

Search endpoint

POST /api/search
Content-Type: application/json

{
  "query": "explain gradient descent",
  "video_id": "cs229_lecture4",
  "top_k": 5
}

Response:

{
  "query": "explain gradient descent",
  "ai_answer": "Gradient descent minimizes loss by...",
  "results": [
    {
      "timestamp": "14:32",
      "title": "Gradient descent intuition",
      "snippet": "...learning rate alpha controls step size...",
      "confidence": 0.97,
      "sources": ["audio", "slide"]
    }
  ]
}
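Each result's "MM:SS" timestamp can be turned into one of the shareable links mentioned under Key Features. A minimal client-side sketch; the /watch/{video_id}?t= URL scheme is an assumption, not taken from the repo:

```python
def timestamp_to_seconds(ts: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" into total seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def share_link(video_id: str, ts: str) -> str:
    """Build a deep link that seeks the player to the matched moment."""
    return f"http://localhost:3000/watch/{video_id}?t={timestamp_to_seconds(ts)}"

print(timestamp_to_seconds("14:32"))          # 872
print(share_link("cs229_lecture4", "14:32"))
# http://localhost:3000/watch/cs229_lecture4?t=872
```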

Research References

| # | Paper | Venue |
| --- | --- | --- |
| 1 | Radford et al., CLIP (2021) | ICML 2021 |
| 2 | Radford et al., Whisper (2022) | arXiv:2212.04356 |
| 3 | Dosovitskiy et al., ViT (2020) | ICLR 2021 |
| 4 | Johnson et al., FAISS (2019) | IEEE Trans. Big Data |
| 5 | Liu et al., Video Moment Localization (2023) | ACM Computing Surveys |

Use Cases

| Sector | Use Case |
| --- | --- |
| 🎓 Education | Students search lecture recordings by concept |
| 🏢 Enterprise | Teams retrieve decisions from meeting archives |
| 🔬 Research | Scientists index conference talks and webinars |
| 🧑‍💻 Developers | Search coding tutorials for exact implementations |
| ♿ Accessibility | Semantic index for hearing-impaired users |

Roadmap

  • Multimodal pipeline (Whisper + ViT + OCR)
  • CLIP-based semantic fusion
  • FAISS vector indexing
  • FastAPI search endpoint
  • React dashboard
  • Cross-video search across entire libraries
  • Highlight reel export
  • Mobile app
  • Enterprise SSO integration
  • Fine-tuned domain-specific embedding model

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

# Run tests
pytest tests/

# Format code
black pipeline/ app/

License

MIT License; see LICENSE for details.


Author

Tanmay. Built for a hackathon under the motto "Skip to the Good Part."

ChronoView aims to do for video what Google did for the web.
