AskPDF is a Retrieval-Augmented Generation (RAG) system that processes large PDF documents and provides accurate, context-aware answers to user queries. By combining vector embeddings with Large Language Models (LLMs), it lets users query their document repository in natural language.
The system is composed of three main microservices:
- Client (SvelteKit): A responsive frontend for document management and chat interface.
- Server (FastAPI): The core API handling user requests, database interactions, and LLM communication.
- PDF Processor (Python/Celery): An asynchronous worker responsible for parsing PDFs, generating embeddings, and updating the vector store.
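The processor's side of this pipeline can be sketched as plain functions. This is a minimal illustration only: it assumes simple fixed-size character chunking with overlap, and stubs out the embedding and vector-store steps that the real Celery worker would perform.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split extracted PDF text into overlapping character windows.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append(piece)
    return chunks


def process_pdf(raw_text: str) -> list[dict]:
    """Worker-side pipeline sketch: chunk the text and attach metadata.

    In the real worker this would also embed each chunk with
    Sentence-Transformers and upsert it into ChromaDB.
    """
    return [
        {"id": i, "text": chunk}  # an embedding vector would be added here
        for i, chunk in enumerate(chunk_text(raw_text))
    ]
```

Overlapping windows are the simplest chunking strategy; production systems often chunk on sentence or section boundaries instead.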
When a user uploads a file, the client sends it directly to S3 via a secure pre-signed URL, which triggers an event-driven processing pipeline.
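The pre-signed upload idea can be illustrated with a simplified HMAC-based scheme. This is a stand-in sketch, not AWS Signature Version 4: in production the server would call boto3's `generate_presigned_url`, and the secret, bucket name, and URL format below are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical server-side secret; never sent to the client.
SECRET = b"server-side-secret"


def make_signed_upload_url(bucket: str, key: str, expires_in: int = 300) -> str:
    """Sign the object key and an expiry so the client can PUT directly
    to storage without ever holding credentials."""
    expires_at = int(time.time()) + expires_in
    payload = f"PUT:{bucket}:{key}:{expires_at}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"https://{bucket}.example-storage.com/{key}?{query}"


def verify_signed_url(bucket: str, key: str, expires_at: int, signature: str) -> bool:
    """Storage-side check: reject expired or tampered upload requests."""
    if time.time() > expires_at:
        return False  # link expired
    payload = f"PUT:{bucket}:{key}:{expires_at}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The key property is that the server signs once and steps out of the data path: the file bytes flow from the browser straight to object storage.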
User queries are embedded and matched against ChromaDB to retrieve the most relevant document chunks, which are then fed to the Gemini LLM to generate precise, citation-backed answers.
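The query flow can be sketched end to end in a few functions. This is a toy illustration: in AskPDF the query embedding would come from a Sentence-Transformers model and the nearest-neighbor search from ChromaDB, but here both are stood in by hand-rolled vectors and a cosine-similarity scan, and the prompt wording is an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], store: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(
        store,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble the grounded prompt for the LLM; each chunk keeps its
    source tag so the model can cite it in the answer."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return (
        "Answer strictly from the context below and cite sources in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Instructing the model to answer "strictly from the context" is what keeps generated answers grounded in the retrieved documents rather than the model's parametric knowledge.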
Key features:
- Document Ingestion: Secure upload and asynchronous processing of PDF documents.
- Semantic Search: Uses advanced embeddings (Sentence-Transformers) to find the most relevant document sections.
- Contextual Answers: Generates answers using Gemini LLM based strictly on the provided context.
- Citations: Every answer includes citations pointing back to the specific source document and text.
- Conversation History: Maintains chat sessions for continuous context (partially implemented).
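The citation feature above implies a post-processing step that maps source tags in the model's answer back to the retrieved chunks. A minimal sketch, assuming a hypothetical bracketed tag format such as `[report.pdf p.4]`:

```python
import re


def extract_citations(answer: str, chunks: list[dict]) -> list[dict]:
    """Map bracketed source tags in the LLM answer back to retrieved chunks,
    so the client can render each citation with its source text.

    Tags that do not match any retrieved chunk are ignored rather than
    guessed at, which avoids fabricating citations.
    """
    by_source = {c["source"]: c for c in chunks}
    cited = []
    for tag in re.findall(r"\[([^\]]+)\]", answer):
        chunk = by_source.get(tag)
        if chunk is not None and chunk not in cited:
            cited.append(chunk)
    return cited
```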
Tech stack:
- Frontend: SvelteKit, TypeScript, TailwindCSS
- Backend: FastAPI, Python, SQLAlchemy
- AI/ML: Sentence-Transformers, Gemini API, ChromaDB (Vector Store)
- Database: PostgreSQL, in-memory SQLite
- Async Processing: Celery, RabbitMQ/SQS
- Infrastructure: AWS Lambda, S3, Docker


