atharvaa45/Financial-Compliance-System-using-Evaluated-RAG

🏦 SEC Financial Data Lakehouse & RAG Pipeline

An end-to-end Big Data project that ingests unstructured SEC 10-K filings, processes them with Apache Spark, and powers an AI-driven financial analysis dashboard.

Status: ✅ Complete | Tech Stack: Docker, Spark, MinIO, DuckDB, Streamlit, Google Gemini


🚀 How It Works

This project implements a modern "Lakehouse" architecture to solve the problem of analyzing messy legal documents.

  1. Ingest: A Python script fetches raw 10-K HTML reports from the SEC EDGAR Archives.
  2. Store: Raw files are saved to MinIO (S3-compatible object storage).
  3. Process: Apache Spark reads the HTML, cleans tags, redacts PII (emails/phones), and chunks text for AI.
  4. Analyze: DuckDB queries the processed Parquet files in milliseconds.
  5. Intelligence (RAG): Google Gemini acts as a reasoning engine, answering user questions based only on the retrieved financial data.
Figure: RAG pipeline flow
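The Process step (tag cleaning, PII redaction, chunking) can be sketched in plain Python. This is a minimal illustration, not the project's Spark job: the function names, the regular expressions, the `[EMAIL]`/`[PHONE]` placeholders, and the word-based chunk size and overlap are all assumptions made for the example. In a PySpark pipeline, logic like this would typically run per document, for instance inside a UDF.

```python
import html
import re

# Illustrative patterns only; real filings may need stricter or broader rules.
TAG_RE = re.compile(r"<[^>]+>")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US-style numbers such as 212-555-0100 or (212) 555-0100.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")
WS_RE = re.compile(r"\s+")


def clean_html(raw):
    """Strip tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)  # decode after stripping so &lt; is not treated as a tag
    return WS_RE.sub(" ", text).strip()


def redact_pii(text):
    """Replace emails and phone numbers with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def chunk_words(text, size=200, overlap=20):
    """Split text into word windows of `size`, sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


raw = "<p>Contact <b>IR</b> at ir@example.com or (212) 555-0100 &amp; see Item 1A.</p>"
cleaned = redact_pii(clean_html(raw))
chunks = chunk_words(cleaned, size=5, overlap=1)
```

The overlap keeps a few words of shared context between neighbouring chunks, so a sentence that straddles a boundary is still partly visible in both.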

🛠️ Tech Stack

  • Infrastructure: Docker & Docker Compose
  • Storage: MinIO (Object Storage)
  • Processing: Apache Spark (PySpark)
  • Query Engine: DuckDB (OLAP)
  • UI/Visualization: Streamlit
  • GenAI Model: gemini-2.5-flash
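To show how the query engine and the GenAI model fit together, here is a minimal sketch of the grounding step in RAG: pick the most relevant chunks and build a prompt that restricts the model to them. The README does not say how retrieval is scored, so the keyword-overlap ranking below is a stand-in, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical names. In the project, the chunks would presumably come from the Parquet files queried via DuckDB, and the prompt would be sent to Gemini; both calls are left out here.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens as a set (crude, for illustration)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, k=2):
    """Return up to k chunks sharing the most tokens with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(c)), i) for i, c in enumerate(chunks)]
    # Drop chunks with no overlap; break ties by original order.
    top = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))[:k]
    return [chunks[i] for _, i in top]


def build_prompt(question, context):
    """Build a prompt that tells the model to answer only from the context."""
    numbered = "\n".join(f"[{n}] {c}" for n, c in enumerate(context, 1))
    return (
        "Answer the question using ONLY the numbered context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


sample_chunks = [
    "Revenue increased 8% driven by services.",
    "The company faces litigation risk in Europe.",
    "Total net sales were 383 billion dollars.",
]
question = "What litigation risk does the company face?"
prompt = build_prompt(question, retrieve(question, sample_chunks))
```

Numbering the context passages makes it easy to ask the model to cite which passage supports each statement, which helps when answers need to be backed by evidence.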

📸 Project Snapshots

Streamlit Dashboard: [screenshots]

MinIO Home Page: [screenshot]

Apache Spark: [screenshot]


⚡ Quick Start

1. Prerequisites

  • Docker Desktop installed
  • Python 3.9+ installed

2. Setup Infrastructure

git clone https://github.com/atharvaa45/Financial-Compliance-System-using-Evaluated-RAG.git
cd Financial-Compliance-System-using-Evaluated-RAG
docker-compose up -d

About

LLM-powered Retrieval-Augmented Generation (RAG) system for analyzing financial and regulatory documents with evidence-backed compliance insights.
