# Snip.AI

> Transcribe, summarize, and chat with any YouTube video — powered by AI.

Snip.AI is a cross-platform Flutter application that lets you paste a YouTube link and instantly get a full transcription, an AI-generated summary, and an interactive chat interface for asking questions about the video's content. Under the hood, it combines faster-whisper for speech recognition, Google Gemini for summarization and chat, and a vector database (Qdrant) for Retrieval-Augmented Generation (RAG).
## Table of Contents

- Features
- Architecture
- Tech Stack
- Prerequisites
- Installation
- Running the App
- API Reference
- Project Structure
- How It Works
- Configuration
- Known Limitations
- Contributing
## Features

| Feature | Description |
|---|---|
| 🎙️ Transcription | Downloads audio via yt-dlp and transcribes it with faster-whisper (Tiny model, INT8 quantized for speed) |
| 📝 AI Summarization | Sends the transcript through Google Gemini 2.0 Flash to produce structured, detailed summaries with headings and bullet points |
| 💬 RAG Chat | Ask any question about the video; the system retrieves relevant transcript chunks from Qdrant and feeds them to Gemini for grounded answers |
| 📌 Source Citations | Chat responses include timestamped source segments from the original transcript so you can verify answers |
| 🖥️ Cross-Platform UI | Built with Flutter — runs on Android, iOS, macOS, Linux, and Windows |
| ⚡ Async Processing | Vector DB storage happens in background tasks so the UI is never blocked |
## Architecture

```
┌──────────────────────────────────────────────┐
│               Flutter Frontend               │
│  ┌──────────┐          ┌───────────────────┐ │
│  │ Home Page│          │   Results Page    │ │
│  │  (URL    │─────────▶│  ┌─────────────┐  │ │
│  │  Input)  │          │  │   Summary   │  │ │
│  └──────────┘          │  ├─────────────┤  │ │
│                        │  │  RAG Chat   │  │ │
│                        │  └─────────────┘  │ │
│                        └───────────────────┘ │
└────────────────────┬─────────────────────────┘
                     │ HTTP (REST)
                     ▼
┌──────────────────────────────────────────────┐
│          FastAPI Backend (Python)            │
│                                              │
│  POST /transcribe        POST /chat          │
│        │                     │               │
│        ▼                     ▼               │
│  yt-dlp → MP3         Query Embedding        │
│        │                     │               │
│        ▼                     ▼               │
│  faster-whisper       Qdrant Vector DB       │
│  (Whisper Tiny)              │               │
│        │                     ▼               │
│        ▼                Gemini API           │
│   Gemini API            (Answer)             │
│   (Summary)                                  │
│        │                                     │
│        ▼                                     │
│   Qdrant (Store chunks + summary)            │
└──────────────────────────────────────────────┘
```
### Data Flow

1. User pastes a YouTube URL → Flutter sends `POST /transcribe`
2. Backend downloads audio with `yt-dlp` and transcribes it with `faster-whisper`
3. The transcript is chunked into ~30-second segments
4. Long transcripts are split and summarized in parallel chunks via Gemini
5. Chunks and summary are embedded with `sentence-transformers` and stored in Qdrant
6. Flutter renders the summary using `flutter_markdown`
7. User asks a question → Flutter sends `POST /chat`
8. Backend embeds the query, searches Qdrant for relevant chunks, and calls Gemini with the retrieved context
9. Gemini returns a grounded answer with source references
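The chunking in step 3 can be sketched as a small helper. This is an illustrative sketch, not the actual code in `server/main.py`; it assumes Whisper segments arrive as `(start, end, text)` tuples:

```python
def chunk_transcript(segments, window=30.0):
    """Group timed Whisper segments into ~window-second chunks for RAG."""
    chunks, texts, chunk_start, chunk_end = [], [], None, None
    for start, end, text in segments:
        if chunk_start is None:
            chunk_start = start
        texts.append(text)
        chunk_end = end
        # Close the chunk once it spans at least `window` seconds
        if chunk_end - chunk_start >= window:
            chunks.append({"text": " ".join(texts),
                           "start_time": chunk_start,
                           "end_time": chunk_end})
            texts, chunk_start, chunk_end = [], None, None
    if texts:  # flush any trailing partial chunk
        chunks.append({"text": " ".join(texts),
                       "start_time": chunk_start,
                       "end_time": chunk_end})
    return chunks

# Seven 10-second segments produce three chunks: 0-30 s, 30-60 s, 60-70 s
segments = [(i * 10.0, (i + 1) * 10.0, f"segment {i}") for i in range(7)]
print(len(chunk_transcript(segments)))  # → 3
```

Keeping `start_time`/`end_time` on each chunk is what later lets chat answers cite timestamped source segments.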
## Tech Stack

### Frontend (Flutter)

| Library | Version | Purpose |
|---|---|---|
| Flutter | SDK | Cross-platform UI framework |
| `google_fonts` | `^6.2.1` | Montserrat & Poppins typography |
| `flutter_markdown` | `^0.6.18` | Rendering AI-generated markdown summaries |
| `http` | `^1.4.0` | HTTP client for API calls |
| `cupertino_icons` | `^1.0.8` | iOS-style icons |

### Backend (Python)

| Library | Purpose |
|---|---|
| FastAPI | Async REST API framework |
| `faster-whisper` | Optimized Whisper transcription (INT8) |
| `yt-dlp` | YouTube audio/video downloading |
| `sentence-transformers` | Text embedding (all-MiniLM-L6-v2, 384-dim) |
| `qdrant-client` | Vector database client (in-memory mode) |
| `httpx` | Async HTTP client for Gemini API calls |
| Google Gemini 2.0 Flash | Summarization and RAG-powered chat |
| `numpy` | Fallback random embeddings |
## Prerequisites

- Flutter SDK ≥ 3.27.0
- Python 3.8+ with `pip`
- Node.js (optional, for tooling)
- yt-dlp — installed separately or via pip: `pip install yt-dlp`
- ffmpeg — required by `yt-dlp` for audio conversion:

  ```bash
  # macOS
  brew install ffmpeg
  # Ubuntu/Debian
  sudo apt install ffmpeg
  # Windows
  winget install ffmpeg
  ```

- A valid Google Gemini API key (free tier available at Google AI Studio)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/shivansh00011/Snip.AI.git
   cd Snip.AI
   ```

2. Install Flutter dependencies:

   ```bash
   flutter pub get
   ```

3. Install Python dependencies:

   ```bash
   pip install -r server/requirements.txt
   ```

   > Tip: It is recommended to use a virtual environment:
   >
   > ```bash
   > python -m venv venv
   > source venv/bin/activate   # Windows: venv\Scripts\activate
   > pip install -r server/requirements.txt
   > ```

4. Add your Gemini API key. Open `server/main.py` and replace the placeholder with your key:

   ```python
   GEMINI_API_KEY = "YOUR_GEMINI_API_KEY_HERE"
   ```

   > Security note: For production use, set this as an environment variable and load it with `os.environ.get("GEMINI_API_KEY")` instead of hardcoding it.
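The security note above could be implemented with a small helper along these lines. This is a sketch, not the current code in `server/main.py`, and the function name is illustrative:

```python
import os

def load_gemini_key() -> str:
    """Read the Gemini API key from the environment instead of source code."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the server"
        )
    return key
```

With this in place, the key never lands in version control; you would start the server with `GEMINI_API_KEY` exported in the shell.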
## Running the App

### 1. Start the backend

```bash
uvicorn server.main:app --reload --host 127.0.0.1 --port 8000
```

You should see startup logs confirming that the embedding model, Whisper model, and Qdrant are initialized:

```
INFO: Loading embedding model...
INFO: Embedding model loaded successfully
INFO: Loading Whisper model...
INFO: Whisper model loaded successfully
INFO: Connecting to Qdrant...
INFO: Qdrant initialized successfully
```

Visit http://127.0.0.1:8000 to verify the server is running. The health endpoint returns service status.

### 2. Start the Flutter app

```bash
flutter run
```

Select your target platform when prompted (Chrome, macOS, Android emulator, etc.). For a specific platform:

```bash
flutter run -d macos     # macOS desktop
flutter run -d chrome    # Web browser
flutter run -d linux     # Linux desktop
flutter run -d windows   # Windows desktop
```

## API Reference

### `GET /`

Health check — returns service availability status.
Response:

```json
{
  "message": "Server is up and running 🔥",
  "services": {
    "embedding_model": "Available",
    "whisper_model": "Available",
    "vector_db": "Available"
  }
}
```

### `POST /transcribe`

Downloads audio from a YouTube URL, transcribes it, generates a summary, and stores everything in the vector DB.
Request body:

```json
{
  "youtube_url": "https://www.youtube.com/watch?v=..."
}
```

Response:

```json
{
  "transcript": "Full transcript text...",
  "summary": "## AI Summary\n\n...",
  "session_id": "uuid-v4",
  "chunks_count": 24,
  "metadata": {
    "title": "Video Title",
    "duration": 3600.0,
    "video_url": "https://...",
    "transcript_id": "uuid-v4"
  }
}
```

> Note: The `session_id`/`transcript_id` must be saved on the client to enable chat for this video.
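A client carries the `session_id` from this response into subsequent chat calls. A minimal stdlib sketch of that contract (the helper names are illustrative, not part of the app):

```python
import json
import urllib.request

def build_chat_payload(transcribe_response: dict, query: str) -> bytes:
    """Link a chat query to its stored transcript via session_id."""
    body = {"query": query, "session_id": transcribe_response["session_id"]}
    return json.dumps(body).encode("utf-8")

def ask(base_url: str, payload: bytes) -> dict:
    """POST the payload to /chat and decode the JSON answer."""
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `ask("http://127.0.0.1:8000", build_chat_payload(resp_json, "What is the video about?"))`, where `resp_json` is the parsed `/transcribe` response.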
### `POST /chat`

Queries the transcribed content using RAG (Retrieval-Augmented Generation).
Request body:
{
"query": "What did the speaker say about climate change?",
"session_id": "uuid-v4"
}Response:
{
"answer": "According to the transcript, the speaker discussed...",
"context": {
"chunks": [
{
"text": "...relevant segment...",
"start_time": 142.5,
"end_time": 172.5,
"score": 0.89,
"chunk_index": 4,
"type": "chunk"
}
],
"summary": {
"text": "## Summary...",
"type": "summary",
"score": 1.0
}
}
}Check whether transcript chunks have been stored in the vector DB for a given session.
Response:

```json
{
  "status": "completed",
  "message": "Transcript processing complete",
  "chunks_available": true,
  "chunks_count": 48
}
```

## Project Structure

```
Snip.AI/
├── lib/
│   ├── main.dart             # App entry point & routing
│   └── pages and logic/
│       ├── home.dart         # URL input & feature showcase UI
│       └── results.dart      # Summary view + RAG chat UI
│
├── server/
│   └── main.py               # FastAPI backend (all logic)
│
├── android/                  # Android platform files
├── ios/                      # iOS platform files
├── macos/                    # macOS platform files
├── linux/                    # Linux platform files
├── windows/                  # Windows platform files
├── web/                      # Web platform files
│
├── pubspec.yaml              # Flutter dependencies
└── README.md
```
## How It Works

### 1. Transcription

- `yt-dlp` downloads the YouTube video audio as MP3 (max 5-minute timeout)
- `faster-whisper` (Tiny model with INT8 quantization) transcribes the audio into timed segments
- Segments are chunked into ~30-second windows to preserve temporal context

### 2. Summarization

- Transcripts under ~4,000 words are sent to Gemini in a single request
- Longer transcripts are split into 3,000-word chunks, each summarized independently, then a final synthesis pass combines them
- The system retries up to 3 times per chunk on failure, with a fallback using the first and last segments

### 3. RAG Chat

- User queries are embedded using `sentence-transformers` (all-MiniLM-L6-v2)
- Qdrant performs cosine similarity search filtered by `session_id` to retrieve the top 5 relevant chunks
- The video summary (stored as a special point) is always prepended to the context
- The assembled context + user query is sent to Gemini with an instruction prompt to provide grounded, citation-backed answers

### 4. Vector Storage

- Each ~30-second transcript chunk becomes a Qdrant point with metadata: `session_id`, `start_time`, `end_time`, `chunk_index`
- The full summary is stored as a separate point with `type: "summary"` for fast retrieval
- Storage happens asynchronously via FastAPI `BackgroundTasks` so the API response is not delayed
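The similarity search in the RAG step boils down to cosine scoring over stored embedding vectors. Qdrant does this internally; the following pure-Python sketch (with an illustrative `top_k` helper, not the app's code) shows the underlying idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, points, k=5):
    """Return the k stored points most similar to the query embedding."""
    scored = [(cosine(query_vec, vec), payload) for vec, payload in points]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Toy 2-dimensional example (the real embeddings are 384-dimensional)
points = [([1.0, 0.0], "chunk A"), ([0.0, 1.0], "chunk B"), ([0.7, 0.7], "chunk C")]
best = top_k([1.0, 0.0], points, k=2)
print([payload for _, payload in best])  # → ['chunk A', 'chunk C']
```

In the actual pipeline the candidate points are additionally filtered by `session_id` before scoring, so chat answers only draw on the current video's transcript.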
## Configuration

| Setting | Location | Default | Description |
|---|---|---|---|
| Gemini API key | `server/main.py` | `""` | **Required** — get from Google AI Studio |
| Whisper model size | `server/main.py` | `"tiny"` | Options: `tiny`, `base`, `small`, `medium`, `large` |
| Whisper compute type | `server/main.py` | `"int8"` | Options: `int8`, `float16`, `float32` |
| Chunk size (seconds) | `server/main.py` | `30.0` | Length of transcript segments for RAG |
| Embedding model | `server/main.py` | `all-MiniLM-L6-v2` | 384-dim sentence transformer |
| Embedding dimensions | `server/main.py` | `384` | Must match the embedding model's output |
| Qdrant mode | `server/main.py` | In-memory | Change to `QdrantClient(host="localhost", port=6333)` for persistence |
| Backend URL | `lib/pages and logic/home.dart` | `http://127.0.0.1:8000` | Update for production deployments |
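Switching Qdrant from the default in-memory mode to a persistent instance changes only the client construction in `server/main.py`, along these lines (a config sketch; assumes a Qdrant server is running on `localhost:6333`):

```python
from qdrant_client import QdrantClient

# Default: in-memory mode, all stored transcripts lost on restart
client = QdrantClient(":memory:")

# Persistent: connect to a running Qdrant server instead
client = QdrantClient(host="localhost", port=6333)
```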
## Known Limitations

- **In-memory vector DB**: Qdrant runs in-memory by default, meaning all stored transcripts are lost when the server restarts. Switch to a persistent Qdrant instance for production use.
- **Whisper Tiny accuracy**: The `tiny` model is fast but may struggle with heavy accents, technical jargon, or low-quality audio. Use `small` or `medium` for better accuracy at the cost of speed.
- **Gemini API rate limits**: Free-tier Gemini accounts have rate and quota limits that may cause failures on very long videos.
- **API key exposure**: The Gemini API key is currently hardcoded in `server/main.py`. Move it to an environment variable before deploying.
- **No authentication**: The backend has no user authentication. Any client with network access can submit URLs and use your Gemini quota.
- **Backend URL hardcoded**: The Flutter app points to `http://127.0.0.1:8000`. This needs to be configurable for multi-device or production use.
- **Private/restricted videos**: `yt-dlp` cannot download private, age-restricted, or DRM-protected content.
## Contributing

Contributions are welcome! Here's how to get started:

1. Fork this repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Make your changes and test them
4. Commit with a clear message: `git commit -m "Add: my feature description"`
5. Push to your fork: `git push origin feature/my-feature`
6. Open a Pull Request against `main`
Ideas for contributions:

- Persistent Qdrant storage with session management
- Support for non-YouTube video sources (direct MP4 URLs, Vimeo, etc.)
- Export transcript/summary as PDF or Markdown
- Dark/light theme toggle in the Flutter UI
- Environment variable support for backend configuration
- Docker Compose setup for one-command deployment
- User authentication and per-user transcript history