Simplified And Automated Research Amplification and Learning
Transform research papers into educational videos, podcasts, mind maps, and visual stories using AI.
Quick Links: Live Demo | Chrome Extension | WhatsApp Bot | API Reference | Contributing
| Document | Description |
|---|---|
| README.md | This file - Project overview and setup |
| Backend README | Backend API documentation and setup |
| Frontend README | React frontend documentation |
| Extension README | Chrome extension installation |
| Podcast Backend | Standalone podcast server |
| API Reference | Complete API endpoint documentation |
| Contributing Guide | How to contribute to the project |
| Security Policy | Security practices and guidelines |
| Import Fixes | Common import error solutions |
- Overview
- Key Features
- Use Cases
- System Requirements
- Project Structure
- Installation
- Configuration
- Running the Application
- Features Workflow
- Chrome Extension (SARALify)
- WhatsApp Bot
- API Documentation
- Troubleshooting
- Development
- Contributing
- License
- Acknowledgements
- Contact
Research Paper → AI Processing → Video | Podcast | Mindmap | Visual Story
Chrome Extension: Process papers from any research website!
WhatsApp Bot: 24/7 AI research assistant
SARAL AI democratizes research by transforming complex academic papers into accessible multimedia formats. Whether you're a student trying to understand a paper, an educator creating content, or a researcher sharing findings, SARAL AI makes it simple.
Key Capabilities:
- Educational Videos - Auto-generated scripts, professional slides, multi-language narration
- Podcasts - Natural two-voice conversations explaining research
- Mind Maps - Visual concept hierarchies with Mermaid diagrams
- Visual Stories - Cinematic scene-by-scene narratives with AI imagery
- Browser Extension - One-click processing from arXiv, bioRxiv, and more
- WhatsApp Bot - Chat-based research Q&A, anywhere, anytime
- Multi-language - English, Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, and more
| Feature | Description | Docs |
|---|---|---|
| Video Generation | AI-powered scripts, LaTeX/Beamer slides, multi-language TTS narration | Backend API |
| Podcast Creation | Student-teacher dialogue generation with customizable voices | Backend API |
| Mind Mapping | Hierarchical concept extraction with Mermaid SVG export | Backend API |
| Visual Storytelling | Scene-based narratives with AI-generated imagery | Backend API |
| Chrome Extension | One-click video/podcast from arXiv, bioRxiv, medRxiv, chemRxiv | Extension Docs |
| WhatsApp Bot | 24/7 semantic search, Q&A, and paper summaries | Bot Repo |
| Google OAuth | Secure authentication with Google accounts | Backend API |
| Complexity Levels | Easy/Medium/Advanced content adaptation | Built-in |
| User | Use Case |
|---|---|
| Students | Exam prep, quick paper understanding, visual learning aids |
| Educators | Lecture content creation, teaching materials, multi-format resources |
| Researchers | Conference presentations, research outreach, accessible findings |
| Institutions | Content libraries, online courses, research accessibility programs |
| Mobile Users | WhatsApp bot for on-the-go research assistance |
| Browser Users | Chrome extension for instant paper processing |
- Python 3.11+ (see .python-version)
- LaTeX - pdflatex via MiKTeX (Windows) or TeX Live (Linux/macOS)
- Poppler - PDF to image conversion
- FFmpeg - Audio/video processing
- 4GB+ RAM recommended
- Node.js 16+
- npm 8+
- Modern browser (Chrome, Firefox, Safari, Edge)
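The command-line prerequisites above can be checked before installation. The following is a minimal sketch, not part of the project: the executable names (pdflatex for LaTeX, pdftoppm for Poppler, ffmpeg, node, npm) are assumptions about how each tool appears on PATH, and the helper names are hypothetical.

```python
import shutil

# Assumed executable names: pdflatex (LaTeX), pdftoppm (Poppler),
# ffmpeg (FFmpeg), node and npm (frontend tooling).
REQUIRED_TOOLS = ["pdflatex", "pdftoppm", "ffmpeg", "node", "npm"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that could not be found on PATH, in input order."""
    found = find_tools(names)
    return [name for name in names if found[name] is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only confirms that the executables resolve; it does not check versions such as Python 3.11+ or Node.js 16+.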
| API | Required | Free Tier | Get Key |
|---|---|---|---|
| Google Gemini | Required | 200 req/day | aistudio.google.com |
| Sarvam AI | Optional | Limited | sarvam.ai |
| Hugging Face | Optional | Free | huggingface.co |
| Google OAuth | Optional | Free | console.cloud.google.com |
GGW_Megathon_Saral/
├── README.md                          # This file - Main documentation
├── LICENSE                            # MIT License
├── IMPORT_FIX.md                      # Import error fixes reference
│
├── backend/                           # FastAPI backend server
│   ├── README.md                      # Backend-specific documentation
│   ├── requirements.txt               # Python dependencies
│   └── app/
│       ├── main.py                    # FastAPI application entry
│       ├── auth/                      # Authentication (Google OAuth, JWT)
│       │   ├── dependencies.py
│       │   ├── decorators.py
│       │   └── google_auth.py
│       ├── models/                    # Pydantic request/response models
│       │   └── request_models.py
│       ├── routes/                    # API endpoints
│       │   ├── api_keys.py            # API key management
│       │   ├── auth.py                # Authentication routes
│       │   ├── images.py              # AI image generation
│       │   ├── media.py               # Audio/video generation
│       │   ├── mindmap.py             # Mind map generation
│       │   ├── papers.py              # Paper upload/processing
│       │   ├── podcast.py             # Podcast generation
│       │   ├── scripts.py             # Script generation
│       │   ├── slides.py              # Slide generation
│       │   └── visual_storytelling.py
│       ├── services/                  # Business logic
│       │   ├── ai_image_generator.py
│       │   ├── arxiv_fetcher.py
│       │   ├── arxiv_scraper.py
│       │   ├── auth_service.py
│       │   ├── beamer_generator.py
│       │   ├── bhashini_service.py
│       │   ├── cinematic_video_service.py
│       │   ├── gemini_mindmap_processor.py
│       │   ├── hindi_service.py
│       │   ├── language_service.py
│       │   ├── latex_processor.py
│       │   ├── mermaid_generator.py
│       │   ├── pdf_processor.py
│       │   ├── podcast_generator.py
│       │   ├── sarvam_sdk.py
│       │   ├── script_generator.py
│       │   ├── storage_manager.py
│       │   ├── tts_service.py
│       │   ├── video_service.py
│       │   └── visual_storytelling_service.py
│       └── utils/
│           └── latex_to_images.py
│
├── frontend/                          # React frontend application
│   ├── README.md                      # Frontend documentation (Create React App)
│   ├── package.json                   # Node.js dependencies
│   ├── tailwind.config.js             # Tailwind CSS configuration
│   ├── public/                        # Static assets
│   └── src/
│       ├── App.js                     # Main React component
│       ├── index.js                   # Entry point
│       ├── components/                # Reusable UI components
│       │   ├── auth/                  # Authentication components
│       │   ├── common/                # Shared components
│       │   ├── forms/                 # Form components
│       │   ├── navigation/            # Navigation components
│       │   ├── ui/                    # UI primitives
│       │   └── workflow/              # Workflow step components
│       ├── contexts/                  # React context providers
│       │   ├── ApiContext.jsx
│       │   ├── AuthContext.jsx
│       │   ├── ComplexityContext.jsx
│       │   ├── ThemeContext.jsx
│       │   └── WorkflowContext.jsx
│       ├── hooks/                     # Custom React hooks
│       ├── pages/                     # Page components
│       │   ├── LandingPage.jsx
│       │   ├── ApiSetup.jsx
│       │   ├── PaperProcessing.jsx
│       │   ├── ScriptGeneration.jsx
│       │   ├── SlideCreation.jsx
│       │   ├── MediaGeneration.jsx
│       │   ├── PodcastGeneration.jsx
│       │   ├── MindmapGeneration.jsx
│       │   ├── VisualStorytellingPage.jsx
│       │   └── Results.jsx
│       ├── services/                  # API client
│       │   └── api.js
│       └── styles/                    # CSS styles
│
├── arxiv-plugin/                      # Chrome Extension (SARALify)
│   ├── manifest.json                  # Extension manifest (MV3)
│   ├── content_script.js              # Page injection scripts
│   ├── service_worker.js              # Background service worker
│   ├── styles.css                     # Extension styles
│   ├── saral-extension-readme.md      # Extension documentation
│   └── podcast_backend/               # Standalone podcast server
│       ├── README.md                  # Podcast backend docs
│       ├── server.py                  # Flask podcast server
│       ├── requirements.txt           # Python dependencies
│       └── env_example.txt            # Environment template
│
└── poppler_temp/                      # Poppler binaries (Windows)
Related Repository:
- Research-Paper-Chatbot - WhatsApp bot companion
Windows:
- Python 3.11+ - Add to PATH during install
- Node.js 16+ - LTS version recommended
- MiKTeX - LaTeX distribution with pdflatex
- Poppler - Add bin folder to PATH
- FFmpeg - Add to PATH
macOS:
brew install python@3.11 node poppler ffmpeg
brew install --cask mactex
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install python3.11 python3.11-venv nodejs npm poppler-utils ffmpeg texlive-full
# 1. Clone repository
git clone https://github.com/N1KH1LT0X1N/GGW_Megathon_Saral.git
cd GGW_Megathon_Saral
# 2. Backend setup
cd backend
python -m venv .venv
# Activate virtual environment
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# Windows CMD:
.venv\Scripts\activate.bat
# macOS/Linux:
source .venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# 3. Frontend setup
cd ../frontend
npm install
Create a .env file in the backend/ directory:
# Required - Google Gemini AI
GEMINI_API_KEY_1=AIzaSy... # Primary key
GEMINI_API_KEY_2=AIzaSy... # Optional: rotation key
GEMINI_API_KEY_3=AIzaSy... # Optional: additional keys
# Optional - Text-to-Speech (Hindi and regional languages)
SARVAM_API_KEY=your_sarvam_key # Get from https://www.sarvam.ai/
# Optional - AI Image Generation
HUGGINGFACE_API_KEY=hf_... # Get from https://huggingface.co/settings/tokens
# Optional - Google OAuth (for user authentication)
GOOGLE_CLIENT_ID=your_client_id # Get from Google Cloud Console
# Optional - Windows-specific paths
POPPLER_PATH=C:/path/to/poppler/bin # If not in PATH
Create a .env file in the frontend/ directory:
# Backend API URL (for production deployment)
REACT_APP_API_URL=http://localhost:8000
# Google OAuth Client ID (must match backend)
REACT_APP_GOOGLE_CLIENT_ID=your_client_id
Add multiple Gemini keys (GEMINI_API_KEY_1, GEMINI_API_KEY_2, etc.) to enable automatic rotation: when one key hits its quota limit, the system cycles to the next available key.
Alternatively, configure API keys through the web interface at /api-setup after launching the application.
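The key rotation described above can be pictured as a round-robin over the numbered environment variables. This is a minimal sketch under stated assumptions, not the backend's actual implementation: load_gemini_keys and KeyRotator are hypothetical names, and the sketch assumes keys are numbered consecutively from 1.

```python
import os
from itertools import cycle


def load_gemini_keys(env=None):
    """Collect GEMINI_API_KEY_1, GEMINI_API_KEY_2, ... until a number is missing."""
    env = os.environ if env is None else env
    keys = []
    index = 1
    while env.get(f"GEMINI_API_KEY_{index}"):
        keys.append(env[f"GEMINI_API_KEY_{index}"])
        index += 1
    return keys


class KeyRotator:
    """Hypothetical round-robin rotator; not the project's actual class."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = cycle(keys)
        self.current = next(self._cycle)

    def rotate(self):
        """Advance to the next key (e.g. after a quota error) and return it."""
        self.current = next(self._cycle)
        return self.current
```

A caller would use rotator.current for each request and call rotator.rotate() when the API reports that the quota is exhausted. With a single key, rotate() simply returns the same key again.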
Terminal 1 - Backend:
cd backend
source .venv/bin/activate # Windows: .venv\Scripts\activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Terminal 2 - Frontend:
cd frontend
npm start
Access Points:
| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend API | http://localhost:8000 |
| Swagger Docs | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |
See Backend README for production deployment instructions.
Upload Paper → Generate Script → Edit Content → Assign Images → Generate Audio → Create Video
- Supports PDF and arXiv URL input
- AI-powered script generation with complexity levels (Easy/Medium/Advanced)
- Multi-language narration (English, Hindi, and 9+ regional languages)
- Professional Beamer/LaTeX slides
Upload Paper → Generate Dialogue → Customize Voices → Create Audio
- Natural student-teacher conversation format
- Complexity-adapted explanations
- Multiple voice options per language
Enter arXiv URL → AI Extracts Concepts → Generate Mermaid Diagram → Download SVG
- Hierarchical concept visualization
- Interactive Mermaid.js diagrams
- SVG export for presentations
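To illustrate the "Generate Mermaid Diagram" step, here is a minimal sketch that turns a concept hierarchy into Mermaid mindmap text. It assumes the hierarchy is available as a nested dict and that the mindmap diagram type is used; the project's mermaid_generator.py may use a different data shape or diagram type. The function name and sample concepts are hypothetical.

```python
def to_mermaid_mindmap(title, tree):
    """Render a nested dict of concepts as Mermaid mindmap text."""
    lines = ["mindmap", f"  root(({title}))"]

    def walk(children, depth):
        # Mermaid mindmaps express hierarchy through indentation.
        for name, sub in children.items():
            lines.append("  " * depth + name)
            walk(sub, depth + 1)

    walk(tree, 2)
    return "\n".join(lines)


# Illustrative concepts only; a real run would use concepts extracted from a paper.
concepts = {
    "Attention": {"Self-attention": {}, "Multi-head": {}},
    "Training": {},
}
print(to_mermaid_mindmap("Transformer", concepts))
```

The sketch does not escape special characters, so concept names containing parentheses or brackets would need sanitizing before rendering.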
Upload Paper → Generate Scenes → Create AI Images → Add Narration → Produce Video
- Cinematic scene-by-scene narratives
- AI-generated imagery (Hugging Face/Placeholder)
- Text overlays and transitions
The SARALify browser extension enables one-click processing of research papers directly from supported websites.
- arXiv.org - Physics, Math, CS, and more
- bioRxiv.org - Biology preprints
- medRxiv.org - Medical preprints
- chemRxiv.org - Chemistry preprints
- EarthArXiv.org - Earth sciences
- OSF Preprints - Social sciences
- Preprints.org - Multidisciplinary
From Source (Developer Mode):
- Open chrome://extensions/ in Chrome/Edge
- Enable Developer mode (top right toggle)
- Click Load unpacked
- Select the arxiv-plugin folder
See Extension README for detailed instructions.
- Navigate to any supported paper page (e.g., arxiv.org/abs/2301.12345)
- Click the SARALify button that appears on the page
- Choose format: Video or Podcast
- Select language: English or Hindi
- Wait for processing, then download your content
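Processing a paper page starts from its URL. As a rough illustration of how an arXiv identifier could be pulled out of a page address like the one in the first step, here is a minimal sketch. The extension itself is JavaScript and may work differently; the function name and pattern are assumptions, and only new-style identifiers such as 2301.12345 are handled.

```python
import re
from urllib.parse import urlparse

# Matches new-style arXiv identifiers such as 2301.12345 or 2301.12345v2
# in /abs/ and /pdf/ paths.
ARXIV_PATH = re.compile(r"^/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?$")


def extract_arxiv_id(url):
    """Return the arXiv identifier from an abs/pdf URL, or None if it does not match."""
    # Accept addresses written without a scheme, e.g. "arxiv.org/abs/2301.12345".
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.hostname not in ("arxiv.org", "www.arxiv.org"):
        return None
    match = ARXIV_PATH.match(parsed.path)
    return match.group(1) if match else None
```

Other supported sites (bioRxiv, medRxiv, and so on) use different URL schemes and would each need their own pattern.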
The extension can use either:
- Main Backend - Full SARAL AI backend at localhost:8000
- Podcast Backend - Lightweight Flask server in arxiv-plugin/podcast_backend/
See Podcast Backend README for standalone podcast generation.
Your 24/7 AI research assistant for semantic search, Q&A, and summarization.
Join the Bot: WhatsApp Link
Repository: Research-Paper-Chatbot
Live Demo: https://research-paper-chatbot-2.onrender.com
| Command | Description |
|---|---|
| transformer attention | Semantic search for papers |
| select 1 | Select a paper from results |
| ready for Q&A | Start Q&A session |
| Explain transformers | Get topic explanations |
| Activities machine learning | Generate educational activities |
| Platform | Best For |
|---|---|
| Web App | Comprehensive content generation (videos, podcasts, mindmaps) |
| WhatsApp Bot | Quick research queries, paper discovery, mobile access |
| Chrome Extension | Instant processing while browsing research sites |
# Clone bot repository
git clone https://github.com/N1KH1LT0X1N/Research-Paper-Chatbot.git
cd Research-Paper-Chatbot
# Install dependencies
pip install -r requirements.txt
# Configure environment (.env)
TWILIO_ACCOUNT_SID=your_sid
TWILIO_AUTH_TOKEN=your_token
GEMINI_API_KEY=your_key
# Run the bot
python research_bot.py
# Expose webhook (use ngrok or similar)
ngrok http 5000
Configure Twilio WhatsApp sandbox webhook: https://your-ngrok-url.ngrok.io/whatsapp
http://localhost:8000/api
Most endpoints require JWT authentication via Google OAuth. Include the token in the request header:
Authorization: Bearer <your_jwt_token>
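As a minimal sketch of attaching that header from Python, the snippet below builds a request with the standard library. The helper name is hypothetical, and whether a given endpoint actually requires the token should be checked in the Swagger docs; /keys/status is used here only as an example path.

```python
import urllib.request

API_BASE = "http://localhost:8000/api"


def authorized_request(path, token, method="GET", data=None):
    """Build (but do not send) a request carrying the JWT as a Bearer token."""
    request = urllib.request.Request(API_BASE + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    return request


req = authorized_request("/keys/status", "<your_jwt_token>")
# urllib.request.urlopen(req) would send it; that requires the backend to be running.
```

Building the request separately from sending it keeps the header logic easy to test without a live server.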
| Endpoint | Method | Description |
|---|---|---|
| /auth/google/login | POST | Authenticate with Google |
| /keys/setup | POST | Configure API keys |
| /keys/status | GET | Check API key status |
| /papers/upload-zip | POST | Upload LaTeX ZIP |
| /papers/scrape-arxiv | POST | Fetch from arXiv URL |
| /papers/upload-pdf | POST | Upload PDF file |
| /scripts/{paper_id}/generate | POST | Generate presentation script |
| /slides/{paper_id}/generate | POST | Generate Beamer slides |
| /media/{paper_id}/generate-audio | POST | Generate TTS audio |
| /media/{paper_id}/generate-video | POST | Create final video |
| /podcast/{paper_id}/generate-script | POST | Generate podcast dialogue |
| /podcast/{paper_id}/generate-audio | POST | Create podcast audio |
| /mindmap/generate-mindmap | POST | Generate mind map from arXiv |
| /visual-storytelling/{paper_id}/generate-storytelling-script | POST | Generate visual story script |
| /visual-storytelling/{paper_id}/generate-video | POST | Create visual story video |
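The per-paper endpoints share a {paper_id} placeholder. The sketch below shows one way to expand the video-related paths into full URLs. It assumes the paths are mounted under the base URL given above and that the steps run in the order script, slides, audio, video; the function name and the sample paper ID are hypothetical.

```python
API_BASE = "http://localhost:8000/api"

# Per-paper video steps, taken from the endpoint table; the order is an assumption.
VIDEO_STEPS = [
    "/scripts/{paper_id}/generate",
    "/slides/{paper_id}/generate",
    "/media/{paper_id}/generate-audio",
    "/media/{paper_id}/generate-video",
]


def video_workflow_urls(paper_id, base=API_BASE):
    """Expand the {paper_id} placeholder for each step of the video workflow."""
    return [base + step.format(paper_id=paper_id) for step in VIDEO_STEPS]


for url in video_workflow_urls("abc123"):
    print("POST", url)
```

The paper ID would normally come from the response of one of the /papers/ upload or scrape endpoints; the request and response bodies are documented in Swagger UI.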
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
For detailed API documentation, see Backend README.
| Issue | Solution |
|---|---|
| ImportError | Activate venv, run pip install -r requirements.txt |
| PDF/LaTeX errors | Install poppler and MiKTeX/TeX Live, add to PATH |
| FFmpeg not found | Install FFmpeg, add to PATH |
| API key invalid | Check .env format: KEY=value (no quotes) |
| Gemini quota exceeded | Add multiple keys: GEMINI_API_KEY_1, _2, etc. |
| Port in use | Kill process or change port |
| npm install fails | Delete node_modules and package-lock.json, reinstall |
| No audio in video | Verify Sarvam API key is valid |
| Extension not working | Reload from chrome://extensions/ |
| WhatsApp bot not responding | Check Twilio webhook and API keys |
| CORS errors | Ensure frontend URL is in backend CORS origins |
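For the "API key invalid" row, a quick check of the .env file can catch formatting slips. This is a minimal sketch that applies the KEY=value (no quotes) rule stated in the table; the function name is hypothetical and the check is deliberately simple.

```python
def check_env_lines(text):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if "=" not in line:
            problems.append((number, "expected KEY=value"))
            continue
        key, value = line.split("=", 1)
        value = value.split(" #", 1)[0].strip()  # drop a trailing inline comment
        if not key or key != key.strip():
            problems.append((number, "whitespace around key"))
        elif value[:1] in ("'", '"'):
            problems.append((number, "value is quoted"))
        elif not value:
            problems.append((number, "value is empty"))
    return problems


sample = 'GEMINI_API_KEY_1=abc\nSARVAM_API_KEY="xyz"\n# comment\nBROKEN\n'
print(check_env_lines(sample))
```

It reports the line number of each suspicious entry so the file can be fixed by hand; it does not verify that a key is accepted by the API.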
Backend:
uvicorn app.main:app --reload --log-level debug
Frontend:
REACT_APP_DEBUG=true npm start
- GitHub Issues: Report Bugs
- Email: democratise.research@gmail.com
- WhatsApp Bot: Join
Backend:
- FastAPI 0.115+ - Modern async Python web framework
- Google Gemini API - AI content generation
- Sarvam AI SDK - Indian language TTS
- MoviePy + FFmpeg - Video processing
- PyMuPDF - PDF processing
- Pydantic v2 - Data validation
Frontend:
- React 18.x - UI framework
- Tailwind CSS - Styling
- Framer Motion - Animations
- React Router - Navigation
- Axios - HTTP client
- Mermaid.js - Diagram rendering
Extension:
- Chrome Extension Manifest V3
- Service Worker architecture
- Content Scripts for page integration
- Python: PEP 8, type hints
- JavaScript: ESLint, Prettier
- Commits: Conventional Commits format
# Backend
cd backend
pytest
# Frontend
cd frontend
npm test
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'feat: add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
- Bug Reports: Include description, steps to reproduce, error logs
- Feature Requests: Describe use case and benefits
- Code Changes: Follow existing code style, add tests
- Documentation: Keep docs updated with changes
MIT License © 2025 SARAL AI Team
See LICENSE for full text.
AI & APIs:
- Google Gemini - AI content generation
- Sarvam AI - Indian language TTS
- Hugging Face - AI image generation
Frameworks & Libraries:
- FastAPI - Backend framework
- React - Frontend framework
- Tailwind CSS - Styling
- MoviePy - Video editing
- Mermaid.js - Diagram generation
Tools:
- arXiv - Research paper repository
- LaTeX - Document preparation
- FFmpeg - Media processing
- Poppler - PDF utilities
| Channel | Link |
|---|---|
| Email | democratise.research@gmail.com |
| WhatsApp Bot | Join Bot |
| GitHub Issues | Report Bugs |
| Bot Repository | Research-Paper-Chatbot |
⭐ Star this repository if you found it helpful!
Made with ❤️ by the GitGoneWild Team
Making Research Accessible to Everyone