N1KH1LT0X1N/Saral.AI

🎓 SARAL AI - Research Democratization Platform

Simplified And Automated Research Amplification and Learning

Transform research papers into educational videos, podcasts, mind maps, and visual stories using AI.

Quick Links: Live Demo | Chrome Extension | WhatsApp Bot | API Reference | Contributing


📚 Documentation

| Document | Description |
| --- | --- |
| README.md | This file - Project overview and setup |
| Backend README | Backend API documentation and setup |
| Frontend README | React frontend documentation |
| Extension README | Chrome extension installation |
| Podcast Backend | Standalone podcast server |
| API Reference | Complete API endpoint documentation |
| Contributing Guide | How to contribute to the project |
| Security Policy | Security practices and guidelines |
| Import Fixes | Common import error solutions |

✨ Overview

Research Paper → AI Processing → 📹 Video | 🎙️ Podcast | 🗺️ Mindmap | 📖 Visual Story
🔌 Chrome Extension: Process papers from any research website!
💬 WhatsApp Bot: 24/7 AI research assistant

SARAL AI democratizes research by transforming complex academic papers into accessible multimedia formats. Whether you're a student trying to understand a paper, an educator creating content, or a researcher sharing findings, SARAL AI makes it simple.

Key Capabilities:

  • 🎥 Educational Videos - Auto-generated scripts, professional slides, multi-language narration
  • 🎙️ Podcasts - Natural two-voice conversations explaining research
  • 🗺️ Mind Maps - Visual concept hierarchies with Mermaid diagrams
  • 📖 Visual Stories - Cinematic scene-by-scene narratives with AI imagery
  • 🔌 Browser Extension - One-click processing from arXiv, bioRxiv, and more
  • 💬 WhatsApp Bot - Chat-based research Q&A, anywhere, anytime
  • 🌐 Multi-language - English, Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, and more

🚀 Key Features

| Feature | Description | Docs |
| --- | --- | --- |
| Video Generation | AI-powered scripts, LaTeX/Beamer slides, multi-language TTS narration | Backend API |
| Podcast Creation | Student-teacher dialogue generation with customizable voices | Backend API |
| Mind Mapping | Hierarchical concept extraction with Mermaid SVG export | Backend API |
| Visual Storytelling | Scene-based narratives with AI-generated imagery | Backend API |
| Chrome Extension | One-click video/podcast from arXiv, bioRxiv, medRxiv, chemRxiv | Extension Docs |
| WhatsApp Bot | 24/7 semantic search, Q&A, and paper summaries | Bot Repo |
| Google OAuth | Secure authentication with Google accounts | Backend API |
| Complexity Levels | Easy/Medium/Advanced content adaptation | Built-in |

🎯 Use Cases

| User | Use Case |
| --- | --- |
| Students | Exam prep, quick paper understanding, visual learning aids |
| Educators | Lecture content creation, teaching materials, multi-format resources |
| Researchers | Conference presentations, research outreach, accessible findings |
| Institutions | Content libraries, online courses, research accessibility programs |
| Mobile Users | WhatsApp bot for on-the-go research assistance |
| Browser Users | Chrome extension for instant paper processing |

📦 System Requirements

Backend

  • Python 3.11+ (see .python-version)
  • LaTeX - pdflatex via MiKTeX (Windows) or TeX Live (Linux/macOS)
  • Poppler - PDF to image conversion
  • FFmpeg - Audio/video processing
  • 4GB+ RAM recommended

Frontend

  • Node.js 16+
  • npm 8+
  • Modern browser (Chrome, Firefox, Safari, Edge)

API Keys (Required/Optional)

| API | Required | Free Tier | Get Key |
| --- | --- | --- | --- |
| Google Gemini | ✅ Required | 200 req/day | aistudio.google.com |
| Sarvam AI | Optional | Limited | sarvam.ai |
| Hugging Face | Optional | Free | huggingface.co |
| Google OAuth | Optional | Free | console.cloud.google.com |

🏗️ Project Structure

GGW_Megathon_Saral/
├── README.md                    # This file - Main documentation
├── LICENSE                      # MIT License
├── IMPORT_FIX.md                # Import error fixes reference
│
├── backend/                     # FastAPI backend server
│   ├── README.md                # Backend-specific documentation
│   ├── requirements.txt         # Python dependencies
│   └── app/
│       ├── main.py              # FastAPI application entry
│       ├── auth/                # Authentication (Google OAuth, JWT)
│       │   ├── dependencies.py
│       │   ├── decorators.py
│       │   └── google_auth.py
│       ├── models/              # Pydantic request/response models
│       │   └── request_models.py
│       ├── routes/              # API endpoints
│       │   ├── api_keys.py      # API key management
│       │   ├── auth.py          # Authentication routes
│       │   ├── images.py        # AI image generation
│       │   ├── media.py         # Audio/video generation
│       │   ├── mindmap.py       # Mind map generation
│       │   ├── papers.py        # Paper upload/processing
│       │   ├── podcast.py       # Podcast generation
│       │   ├── scripts.py       # Script generation
│       │   ├── slides.py        # Slide generation
│       │   └── visual_storytelling.py
│       ├── services/            # Business logic
│       │   ├── ai_image_generator.py
│       │   ├── arxiv_fetcher.py
│       │   ├── arxiv_scraper.py
│       │   ├── auth_service.py
│       │   ├── beamer_generator.py
│       │   ├── bhashini_service.py
│       │   ├── cinematic_video_service.py
│       │   ├── gemini_mindmap_processor.py
│       │   ├── hindi_service.py
│       │   ├── language_service.py
│       │   ├── latex_processor.py
│       │   ├── mermaid_generator.py
│       │   ├── pdf_processor.py
│       │   ├── podcast_generator.py
│       │   ├── sarvam_sdk.py
│       │   ├── script_generator.py
│       │   ├── storage_manager.py
│       │   ├── tts_service.py
│       │   ├── video_service.py
│       │   └── visual_storytelling_service.py
│       └── utils/
│           └── latex_to_images.py
│
├── frontend/                    # React frontend application
│   ├── README.md                # Frontend documentation (Create React App)
│   ├── package.json             # Node.js dependencies
│   ├── tailwind.config.js       # Tailwind CSS configuration
│   ├── public/                  # Static assets
│   └── src/
│       ├── App.js               # Main React component
│       ├── index.js             # Entry point
│       ├── components/          # Reusable UI components
│       │   ├── auth/            # Authentication components
│       │   ├── common/          # Shared components
│       │   ├── forms/           # Form components
│       │   ├── navigation/      # Navigation components
│       │   ├── ui/              # UI primitives
│       │   └── workflow/        # Workflow step components
│       ├── contexts/            # React context providers
│       │   ├── ApiContext.jsx
│       │   ├── AuthContext.jsx
│       │   ├── ComplexityContext.jsx
│       │   ├── ThemeContext.jsx
│       │   └── WorkflowContext.jsx
│       ├── hooks/               # Custom React hooks
│       ├── pages/               # Page components
│       │   ├── LandingPage.jsx
│       │   ├── ApiSetup.jsx
│       │   ├── PaperProcessing.jsx
│       │   ├── ScriptGeneration.jsx
│       │   ├── SlideCreation.jsx
│       │   ├── MediaGeneration.jsx
│       │   ├── PodcastGeneration.jsx
│       │   ├── MindmapGeneration.jsx
│       │   ├── VisualStorytellingPage.jsx
│       │   └── Results.jsx
│       ├── services/            # API client
│       │   └── api.js
│       └── styles/              # CSS styles
│
├── arxiv-plugin/                # Chrome Extension (SARALify)
│   ├── manifest.json            # Extension manifest (MV3)
│   ├── content_script.js        # Page injection scripts
│   ├── service_worker.js        # Background service worker
│   ├── styles.css               # Extension styles
│   ├── saral-extension-readme.md  # Extension documentation
│   └── podcast_backend/         # Standalone podcast server
│       ├── README.md            # Podcast backend docs
│       ├── server.py            # Flask podcast server
│       ├── requirements.txt     # Python dependencies
│       └── env_example.txt      # Environment template
│
└── poppler_temp/                # Poppler binaries (Windows)

Related Repository: Research-Paper-Chatbot (WhatsApp bot, https://github.com/N1KH1LT0X1N/Research-Paper-Chatbot)


⚙️ Installation

Prerequisites

Windows:

  1. Python 3.11+ - Add to PATH during install
  2. Node.js 16+ - LTS version recommended
  3. MiKTeX - LaTeX distribution with pdflatex
  4. Poppler - Add bin folder to PATH
  5. FFmpeg - Add to PATH

macOS:

brew install python@3.11 node poppler ffmpeg
brew install --cask mactex

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install python3.11 python3.11-venv nodejs npm poppler-utils ffmpeg texlive-full
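
Before moving on to setup, it can help to confirm that the external tools are actually reachable on PATH. The following is a minimal sketch, not part of the project: the helper name `find_missing_tools` is hypothetical, and the command names are assumptions (`pdflatex` for LaTeX, `pdftoppm` as the Poppler command-line tool, `ffmpeg` for FFmpeg).

```python
import shutil

# Command names are assumptions: pdflatex (LaTeX), pdftoppm (ships with Poppler), ffmpeg.
REQUIRED_TOOLS = {
    "pdflatex": "LaTeX (MiKTeX or TeX Live)",
    "pdftoppm": "Poppler",
    "ffmpeg": "FFmpeg",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {command: description} for every tool that is not found on PATH."""
    return {cmd: desc for cmd, desc in tools.items() if which(cmd) is None}


if __name__ == "__main__":
    missing = find_missing_tools()
    for cmd, desc in missing.items():
        print(f"Missing {desc}: '{cmd}' is not on PATH")
    if not missing:
        print("All external tools found.")
```

If a tool is reported missing on Windows, the usual cause is that its bin folder was not added to PATH during installation.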

Setup

# 1. Clone repository
git clone https://github.com/N1KH1LT0X1N/GGW_Megathon_Saral.git
cd GGW_Megathon_Saral

# 2. Backend setup
cd backend
python -m venv .venv

# Activate virtual environment
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# Windows CMD:
.venv\Scripts\activate.bat
# macOS/Linux:
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 3. Frontend setup
cd ../frontend
npm install

🔧 Configuration

Backend Environment Variables

Create .env file in backend/ directory:

# Required - Google Gemini AI
GEMINI_API_KEY_1=AIzaSy...        # Primary key
GEMINI_API_KEY_2=AIzaSy...        # Optional: rotation key
GEMINI_API_KEY_3=AIzaSy...        # Optional: additional keys

# Optional - Text-to-Speech (Hindi and regional languages)
SARVAM_API_KEY=your_sarvam_key    # Get from https://www.sarvam.ai/

# Optional - AI Image Generation
HUGGINGFACE_API_KEY=hf_...        # Get from https://huggingface.co/settings/tokens

# Optional - Google OAuth (for user authentication)
GOOGLE_CLIENT_ID=your_client_id   # Get from Google Cloud Console

# Optional - Windows-specific paths
POPPLER_PATH=C:/path/to/poppler/bin  # If not in PATH

Frontend Environment Variables

Create .env file in frontend/ directory:

# Backend API URL (change this for production deployment)
REACT_APP_API_URL=http://localhost:8000

# Google OAuth Client ID (must match backend)
REACT_APP_GOOGLE_CLIENT_ID=your_client_id

API Key Rotation

Add multiple Gemini keys (GEMINI_API_KEY_1, GEMINI_API_KEY_2, etc.) for automatic rotation when quota limits are hit. The system will cycle through available keys automatically.
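
The backend's actual rotation code is not shown in this README, so the following is only an illustrative sketch of the idea: collect the numbered `GEMINI_API_KEY_n` variables and move to the next key when a quota error occurs. The names `load_gemini_keys` and `KeyRotator` are hypothetical, and stopping at the first missing number is an assumption.

```python
import os
from itertools import count


def load_gemini_keys(env=os.environ):
    """Collect GEMINI_API_KEY_1, _2, ... and stop at the first missing or empty one."""
    keys = []
    for i in count(1):
        key = env.get(f"GEMINI_API_KEY_{i}")
        if not key:
            break
        keys.append(key)
    return keys


class KeyRotator:
    """Round-robin over the configured keys; call rotate() after a quota error."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one Gemini API key is required")
        self._keys = list(keys)
        self._index = 0

    @property
    def current(self):
        return self._keys[self._index]

    def rotate(self):
        # Advance to the next key, wrapping around to the first one.
        self._index = (self._index + 1) % len(self._keys)
        return self.current
```

A caller would use `rotator.current` for each request and call `rotator.rotate()` when the API reports that the quota is exhausted. With a single key, rotation simply returns the same key.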

Web UI Setup

Alternatively, configure API keys through the web interface at /api-setup after launching the application.


▶️ Running the Application

Development Mode

Terminal 1 - Backend:

cd backend
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend:

cd frontend
npm start

Access Points:

| Service | URL |
| --- | --- |
| Frontend | http://localhost:3000 |
| Backend API | http://localhost:8000 |
| Swagger Docs | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |

Production Deployment

See Backend README for production deployment instructions.


📚 Features Workflow

1. Video Generation

Upload Paper → Generate Script → Edit Content → Assign Images → Generate Audio → Create Video
  • Supports PDF and arXiv URL input
  • AI-powered script generation with complexity levels (Easy/Medium/Advanced)
  • Multi-language narration (English, Hindi, and 9+ regional languages)
  • Professional Beamer/LaTeX slides
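
For the arXiv URL input, a client usually needs to pull the paper identifier out of the URL first. The sketch below shows one way to do that for new-style arXiv identifiers such as 2301.12345; the function name `parse_arxiv_id` is hypothetical and this is not necessarily how `arxiv_fetcher.py` or `arxiv_scraper.py` do it.

```python
import re

# Matches new-style arXiv identifiers (YYMM.NNNN or YYMM.NNNNN) with an optional version.
_ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def parse_arxiv_id(url):
    """Return (paper_id, version) from an arXiv URL, or None if it does not match."""
    match = _ARXIV_ID.search(url)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Old-style identifiers (for example `hep-th/9901001`) are deliberately not handled in this sketch.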

2. Podcast Creation

Upload Paper → Generate Dialogue → Customize Voices → Create Audio
  • Natural student-teacher conversation format
  • Complexity-adapted explanations
  • Multiple voice options per language

3. Mind Mapping

Enter arXiv URL → AI Extracts Concepts → Generate Mermaid Diagram → Download SVG
  • Hierarchical concept visualization
  • Interactive Mermaid.js diagrams
  • SVG export for presentations
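
To make the "concepts to Mermaid diagram" step concrete, here is a small sketch that renders a nested concept hierarchy as Mermaid `mindmap` text. It is an illustration only: the nested-dict input shape and the function name `to_mermaid_mindmap` are assumptions, and the project's own `mermaid_generator.py` may use a different diagram type or structure.

```python
def to_mermaid_mindmap(title, tree, indent="  "):
    """Render a nested dict {concept: {sub_concept: {...}}} as Mermaid mindmap text."""
    lines = ["mindmap", f"{indent}root(({title}))"]

    def walk(children, depth):
        # Mermaid mindmaps express hierarchy through indentation depth.
        for name, sub in children.items():
            lines.append(f"{indent * depth}{name}")
            walk(sub, depth + 1)

    walk(tree, 2)
    return "\n".join(lines)


if __name__ == "__main__":
    concepts = {"Attention": {"Self-attention": {}}, "Training": {}}
    print(to_mermaid_mindmap("Transformers", concepts))
```

Concept names containing brackets or parentheses would need escaping before being passed to Mermaid; the sketch does not handle that.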

4. Visual Storytelling

Upload Paper → Generate Scenes → Create AI Images → Add Narration → Produce Video
  • Cinematic scene-by-scene narratives
  • AI-generated imagery (Hugging Face/Placeholder)
  • Text overlays and transitions

🔌 Chrome Extension (SARALify)

The SARALify browser extension enables one-click processing of research papers directly from supported websites.

Supported Sites

  • arXiv.org - Physics, Math, CS, and more
  • bioRxiv.org - Biology preprints
  • medRxiv.org - Medical preprints
  • chemRxiv.org - Chemistry preprints
  • eartharXiv.org - Earth sciences
  • OSF Preprints - Social sciences
  • Preprints.org - Multidisciplinary

Installation

From Source (Developer Mode):

  1. Open chrome://extensions/ in Chrome/Edge
  2. Enable Developer mode (top right toggle)
  3. Click Load unpacked
  4. Select the arxiv-plugin folder

See Extension README for detailed instructions.

Usage

  1. Navigate to any supported paper page (e.g., arxiv.org/abs/2301.12345)
  2. Click the SARALify button that appears on the page
  3. Choose format: Video or Podcast
  4. Select language: English or Hindi
  5. Wait for processing, then download your content

Extension Backend

The extension can use either:

  • Main Backend - Full SARAL AI backend at localhost:8000
  • Podcast Backend - Lightweight Flask server in arxiv-plugin/podcast_backend/

See Podcast Backend README for standalone podcast generation.


💬 WhatsApp Bot

Your 24/7 AI research assistant for semantic search, Q&A, and summarization.

Quick Start

Join the Bot: WhatsApp Link
Repository: Research-Paper-Chatbot
Live Demo: https://research-paper-chatbot-2.onrender.com

Features

| Command | Description |
| --- | --- |
| transformer attention | Semantic search for papers |
| select 1 | Select a paper from results |
| ready for Q&A | Start Q&A session |
| Explain transformers | Get topic explanations |
| Activities machine learning | Generate educational activities |

Integration with SARAL AI

| Platform | Best For |
| --- | --- |
| Web App | Comprehensive content generation (videos, podcasts, mindmaps) |
| WhatsApp Bot | Quick research queries, paper discovery, mobile access |
| Chrome Extension | Instant processing while browsing research sites |

Self-Hosting

# Clone bot repository
git clone https://github.com/N1KH1LT0X1N/Research-Paper-Chatbot.git
cd Research-Paper-Chatbot

# Install dependencies
pip install -r requirements.txt

# Configure environment: add these lines to a .env file (they are not shell commands)
TWILIO_ACCOUNT_SID=your_sid
TWILIO_AUTH_TOKEN=your_token
GEMINI_API_KEY=your_key

# Run the bot
python research_bot.py

# Expose webhook (use ngrok or similar)
ngrok http 5000

Configure Twilio WhatsApp sandbox webhook: https://your-ngrok-url.ngrok.io/whatsapp
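
For orientation, Twilio calls the webhook with a form-encoded POST (the sender in `From`, the message text in `Body`) and expects a TwiML XML document in reply. The sketch below shows that exchange using only the standard library. It is not the code in `research_bot.py`, and the helper names `parse_incoming` and `twiml_reply` are hypothetical.

```python
from urllib.parse import parse_qs
from xml.etree.ElementTree import Element, SubElement, tostring


def parse_incoming(form_body):
    """Extract the sender and message text from Twilio's form-encoded webhook body."""
    fields = parse_qs(form_body)
    return {
        "from": fields.get("From", [""])[0],
        "text": fields.get("Body", [""])[0].strip(),
    }


def twiml_reply(text):
    """Build a TwiML document that sends `text` back to the user."""
    response = Element("Response")
    SubElement(response, "Message").text = text  # special characters are XML-escaped
    return tostring(response, encoding="unicode")
```

A web framework handler for `/whatsapp` would pass the raw request body to `parse_incoming`, decide on an answer, and return `twiml_reply(answer)` with an XML content type.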


📡 API Documentation

Base URL

http://localhost:8000/api

Authentication

Most endpoints require JWT authentication via Google OAuth. Include the token in the Authorization header of each request:

Authorization: Bearer <your_jwt_token>
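
As a rough illustration of attaching that header from Python, the sketch below builds a request without sending it. The helper name `authed_request` is hypothetical, and the `/keys/status` path in the usage note is taken from the endpoint list purely as an example; whether that particular endpoint requires authentication is an assumption.

```python
from urllib.request import Request

BASE_URL = "http://localhost:8000/api"


def authed_request(path, token, method="GET"):
    """Build (but do not send) a request that carries the JWT as a Bearer token."""
    request = Request(f"{BASE_URL}{path}", method=method)
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

To send it, pass the result to `urllib.request.urlopen`, for example `urlopen(authed_request("/keys/status", token))`, with the backend running locally.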

Key Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /auth/google/login | POST | Authenticate with Google |
| /keys/setup | POST | Configure API keys |
| /keys/status | GET | Check API key status |
| /papers/upload-zip | POST | Upload LaTeX ZIP |
| /papers/scrape-arxiv | POST | Fetch from arXiv URL |
| /papers/upload-pdf | POST | Upload PDF file |
| /scripts/{paper_id}/generate | POST | Generate presentation script |
| /slides/{paper_id}/generate | POST | Generate Beamer slides |
| /media/{paper_id}/generate-audio | POST | Generate TTS audio |
| /media/{paper_id}/generate-video | POST | Create final video |
| /podcast/{paper_id}/generate-script | POST | Generate podcast dialogue |
| /podcast/{paper_id}/generate-audio | POST | Create podcast audio |
| /mindmap/generate-mindmap | POST | Generate mind map from arXiv |
| /visual-storytelling/{paper_id}/generate-storytelling-script | POST | Generate visual story script |
| /visual-storytelling/{paper_id}/generate-video | POST | Create visual story video |
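
The `{paper_id}` endpoints are meant to be called in sequence once a paper has been uploaded. The sketch below expands the video-related path templates into full URLs. The call order shown is an assumption based on the video workflow described earlier, the name `video_pipeline_urls` is hypothetical, and request bodies are omitted because their schemas are not given in this README.

```python
BASE_URL = "http://localhost:8000/api"

# Path templates copied from the endpoint list; the ordering is an assumption.
VIDEO_PIPELINE = [
    "/scripts/{paper_id}/generate",
    "/slides/{paper_id}/generate",
    "/media/{paper_id}/generate-audio",
    "/media/{paper_id}/generate-video",
]


def video_pipeline_urls(paper_id, base_url=BASE_URL):
    """Return the full POST URLs for one paper, in the assumed call order."""
    return [base_url + template.format(paper_id=paper_id) for template in VIDEO_PIPELINE]
```

The Swagger page served by the running backend is the place to check the exact request and response models for each step.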

Interactive Documentation

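With the backend running, interactive Swagger docs are served at http://localhost:8000/docs and ReDoc at http://localhost:8000/redoc. For detailed API documentation, see Backend README.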


🔍 Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| ImportError | Activate venv, run pip install -r requirements.txt |
| PDF/LaTeX errors | Install poppler and MiKTeX/TeX Live, add to PATH |
| FFmpeg not found | Install FFmpeg, add to PATH |
| API key invalid | Check .env format: KEY=value (no quotes) |
| Gemini quota exceeded | Add multiple keys: GEMINI_API_KEY_1, _2, etc. |
| Port in use | Kill process or change port |
| npm install fails | Delete node_modules and package-lock.json, reinstall |
| No audio in video | Verify Sarvam API key is valid |
| Extension not working | Reload from chrome://extensions/ |
| WhatsApp bot not responding | Check Twilio webhook and API keys |
| CORS errors | Ensure frontend URL is in backend CORS origins |
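
For the "API key invalid" case, a quick scan of the .env file can catch formatting slips before the backend starts. This is a minimal sketch with a hypothetical helper name (`lint_env_lines`). It only checks the `KEY=value (no quotes)` convention stated above and treats ` #` as the start of an inline comment, which is an assumption about how the file is parsed.

```python
def lint_env_lines(lines):
    """Return (line_number, message) pairs for lines that break the KEY=value convention."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        key, sep, value = line.partition("=")
        if not sep:
            problems.append((number, "missing '='"))
            continue
        value = value.split(" #", 1)[0].strip()  # drop a trailing inline comment
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            problems.append((number, "value is wrapped in quotes"))
    return problems


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        for number, message in lint_env_lines(handle):
            print(f".env line {number}: {message}")
```

Run it from the `backend/` or `frontend/` directory, where the corresponding .env file lives.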

Debug Mode

Backend:

uvicorn app.main:app --reload --log-level debug

Frontend:

REACT_APP_DEBUG=true npm start

Getting Help


🛠️ Development

Tech Stack

Backend:

  • FastAPI 0.115+ - Modern async Python web framework
  • Google Gemini API - AI content generation
  • Sarvam AI SDK - Indian language TTS
  • MoviePy + FFmpeg - Video processing
  • PyMuPDF - PDF processing
  • Pydantic v2 - Data validation

Frontend:

  • React 18.x - UI framework
  • Tailwind CSS - Styling
  • Framer Motion - Animations
  • React Router - Navigation
  • Axios - HTTP client
  • Mermaid.js - Diagram rendering

Extension:

  • Chrome Extension Manifest V3
  • Service Worker architecture
  • Content Scripts for page integration

Code Style

  • Python: PEP 8, type hints
  • JavaScript: ESLint, Prettier
  • Commits: Conventional Commits format

Testing

# Backend
cd backend
pytest

# Frontend
cd frontend
npm test

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'feat: add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Contribution Guidelines

  • Bug Reports: Include description, steps to reproduce, error logs
  • Feature Requests: Describe use case and benefits
  • Code Changes: Follow existing code style, add tests
  • Documentation: Keep docs updated with changes

📄 License

MIT License © 2025 SARAL AI Team

See LICENSE for full text.


🙏 Acknowledgements

AI & APIs:

Frameworks & Libraries:

Tools:


📞 Contact

| Channel | Link |
| --- | --- |
| Email | democratise.research@gmail.com |
| WhatsApp Bot | Join Bot |
| GitHub Issues | Report Bugs |
| Bot Repository | Research-Paper-Chatbot |

⭐ Star this repository if you found it helpful!

Made with ❤️ by the GitGoneWild Team

Making Research Accessible to Everyone
