This repository is a Streamlit app that uses LLMs and Retrieval Augmented Generation (RAG) to summarize and chat with YouTube videos, plus a Library page to save outputs.
- Languages: Python
- Frameworks / libraries: Streamlit, LangChain, YouTube Transcript API, peewee
- Databases: ChromaDB (vector store), SQLite (library persistence)
Main entrypoint is main.py. Streamlit pages live in pages/. Shared logic is in
modules/. Prompts are in prompts/. Tests are in tests/.
High-level architecture: The app consists of three main components: Summary, Chat and Library. Summary generates summaries of YouTube videos, Chat leverages RAG to answer user questions based on video content, and Library manages saved summaries and answers from Q&A sessions.
.
├── main.py # Streamlit entrypoint and app shell
├── pages/ # Streamlit pages
│ ├── chat.py # RAG-based Q&A UI flow
│ ├── library.py # Saved summaries/answers UI
│ └── summary.py # Summary UI flow
├── modules/ # Core logic (summarization, RAG, persistence, UI helpers)
│ ├── helpers.py # Shared helpers (config, tokens, providers)
│ ├── persistance.py # SQLite models and library CRUD
│ ├── rag.py # Chunking, embeddings, and RAG response
│ ├── summary.py # Summary prompt + model invocation
│ ├── transcription.py # Whisper-based transcription
│ ├── ui.py # Streamlit UI helpers/settings
│ └── youtube.py # YouTube metadata and transcript retrieval
├── prompts/ # LLM system/user prompt templates
├── tests/ # pytest suites
├── data/ # Local SQLite database and saved outputs
├── config.json # Default models and UI/config settings
├── docker-compose.yml # ChromaDB + app services for local Docker setup
├── requirements.txt # Python dependencies

- Summary: pages/summary.py -> modules/summary.py -> prompt files in prompts/.
- Chat (RAG): pages/chat.py -> modules/rag.py -> ChromaDB.
- Library: pages/library.py -> modules/persistance.py (SQLite).
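
The Chat data flow (question in, relevant transcript chunks retrieved, prompt assembled for the model) can be illustrated with a toy sketch. This is not the code in modules/rag.py: the function names are hypothetical, and a bag-of-words cosine similarity stands in for the real embeddings and the ChromaDB query so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Return the cosine similarity of two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question.

    In the real app this step would be a vector store query; here it is
    an in-memory ranking (assumption for illustration).
    """
    query = bag_of_words(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that grounds the answer in the retrieved chunks."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


chunks = [
    "The video explains how vector stores index embeddings.",
    "The host thanks the sponsors.",
    "Embeddings map text to vectors.",
]
question = "How do vector stores index embeddings?"
prompt = build_prompt(question, retrieve(question, chunks, top_k=1))
```

The point of the sketch is the separation of steps: retrieval and prompt assembly are independent functions, which mirrors the split between the page (UI flow) and the module (RAG logic) described above.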
- Follow SOLID principles.
- Code should be simple, readable, and maintainable.
- Format with Black.
- Use snake_case for variables/functions, PascalCase for classes.
- Provide Google-style docstrings for all classes and functions.
- Assume features will evolve over time; write code that is easy to extend and modify.
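
As an illustration of the conventions above, here is a minimal sketch of a class and a function written in that style (snake_case and PascalCase naming, Google-style docstrings, Black-compatible layout). `TranscriptChunk` and `split_into_chunks` are hypothetical names chosen for the example; they are not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptChunk:
    """A contiguous slice of a transcript.

    Attributes:
        index: Zero-based position of the chunk in the sequence.
        text: The chunk's text content.
    """

    index: int
    text: str


def split_into_chunks(
    text: str, chunk_size: int, overlap: int = 0
) -> list[TranscriptChunk]:
    """Split text into fixed-size character chunks with optional overlap.

    Args:
        text: The full transcript text.
        chunk_size: Maximum number of characters per chunk.
        overlap: Number of characters shared by consecutive chunks.

    Returns:
        The chunks in order of appearance. Empty if ``text`` is empty.

    Raises:
        ValueError: If ``chunk_size`` is not positive or ``overlap`` is
            outside the range ``[0, chunk_size)``.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Each chunk starts `step` characters after the previous one.
    step = chunk_size - overlap
    return [
        TranscriptChunk(index=i, text=text[start : start + chunk_size])
        for i, start in enumerate(range(0, len(text), step))
    ]
```

Keeping validation at the top and the core logic in a single expression makes the function easy to extend later, for example with a different splitting strategy.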
- Use the Conventional Commits format: type(scope): description.
- Use these types: feat, fix, docs, style, refactor, perf, test, chore, ci.
- Use imperative mood: "add" not "added".
- Keep the subject <= 72 characters.
- Include scope when clear (e.g., chat, summary, library, rag).
- If a commit contains multiple logical changes, summarize the most important one.
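
The subject-line rules above can be checked mechanically. The following is a rough sketch, not a tool shipped with this repository: `is_valid_subject` is a hypothetical helper, and the set of characters allowed in a scope is an assumption. Imperative mood cannot be verified by a pattern and is left to the author.

```python
import re

ALLOWED_TYPES = (
    "feat",
    "fix",
    "docs",
    "style",
    "refactor",
    "perf",
    "test",
    "chore",
    "ci",
)
MAX_SUBJECT_LENGTH = 72

# type, optional (scope), then ": " and a non-empty description.
# Assumption: scopes are lowercase letters, digits, "_" or "-".
SUBJECT_PATTERN = re.compile(
    r"^(?P<type>"
    + "|".join(ALLOWED_TYPES)
    + r")(?:\((?P<scope>[a-z0-9_-]+)\))?: (?P<description>\S.*)$"
)


def is_valid_subject(subject: str) -> bool:
    """Check a commit subject against the Conventional Commits rules above.

    Args:
        subject: The first line of the commit message.

    Returns:
        True if the subject has an allowed type, a well-formed optional
        scope, a description, and at most 72 characters.
    """
    if len(subject) > MAX_SUBJECT_LENGTH:
        return False
    return SUBJECT_PATTERN.match(subject) is not None
```

For example, `feat(chat): add streaming responses` and `fix: handle missing transcript` pass, while `added new feature` and `feat(chat) add streaming` do not.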
- Use pytest; tests are in tests/.
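
A test file following this convention might look like the sketch below. Both the helper `extract_video_id` and the file name tests/test_youtube.py are hypothetical, used only to show the shape of a pytest suite; in a real test the helper would be imported from a module rather than defined inline.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or short URL.

    Hypothetical helper for illustration; not taken from modules/youtube.py.

    Args:
        url: A full URL such as ``https://www.youtube.com/watch?v=<id>``.

    Returns:
        The video ID, or None if the URL is not a recognized YouTube URL.
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    if parsed.hostname in ("www.youtube.com", "youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


# pytest collects functions named test_* and reports failing asserts.
def test_extract_video_id_from_watch_url():
    """A standard watch URL yields the value of the v parameter."""
    assert extract_video_id("https://www.youtube.com/watch?v=abc123") == "abc123"


def test_extract_video_id_from_short_url():
    """A youtu.be short URL yields the path segment."""
    assert extract_video_id("https://youtu.be/abc123") == "abc123"


def test_extract_video_id_rejects_other_hosts():
    """URLs on other hosts are not treated as YouTube videos."""
    assert extract_video_id("https://example.com/watch?v=abc123") is None
```

Each test covers one behavior and has a descriptive name, so a failure points directly at the broken case.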
uv sync
docker-compose up -d chromadb
uv run streamlit run main.py