Skip to content

Autonomous-Drone-Target-Tracking-System/MediScan-AI_Smart-Report-Analyzer

Repository files navigation

🩺 MediScan AI

Smart Medical Report Analyzer

Upload any blood test or lab report → Get AI-powered health insights in seconds.

FastAPI Next.js Python Groq OCR.space SQLite License: MIT

⚠️ Demo purposes only. Not a substitute for professional medical advice.


📖 Table of Contents


⚡ How to Run

TL;DR — Two terminals, four commands, and you're live.

Step 1 — Clone & configure secrets

git clone https://github.com/Autonomous-Drone-Target-Tracking-System/MediScan-AI_Smart-Report-Analyzer.git
cd MediScan-AI_Smart-Report-Analyzer

# Copy the env template and fill in your API keys
copy .env.example .env

Open .env and set:

GROQ_API_KEY=<your Groq key>        # https://console.groq.com/keys
OCR_SPACE_API_KEY=<your OCR key>    # https://ocr.space/ocrapi  (free tier OK)
FRONTEND_URL=http://localhost:3000
NEXT_PUBLIC_API_URL=http://localhost:8000

Step 2 — Start the Backend (Terminal 1)

cd backend

# Create & activate virtual environment
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS / Linux

# Install dependencies
pip install -r requirements.txt

# Run the API server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

✅ Backend ready at http://localhost:8000 · Swagger docs at http://localhost:8000/docs


Step 3 — Start the Frontend (Terminal 2)

cd frontend

# Copy frontend env
copy .env.local.example .env.local

# Install dependencies
npm install

# Start the dev server
npm run dev

✅ Frontend ready at http://localhost:3000


Step 4 — Use the App

1. Open http://localhost:3000
2. Click "Analyze My Report"
3. Drag & drop a PDF or image of a blood/lab report
4. Click "Analyze Report"
5. Wait ~10–15 seconds → dashboard with your health score & AI insights

💡 No Tesseract? The app works fine without it — OCR.space handles scanned documents, and pdfplumber handles digital PDFs. Tesseract is only a local fallback.


🌟 Overview

MediScan AI is a full-stack web application built for a hackathon that transforms raw medical lab reports (PDFs or images) into clear, actionable health insights — powered by OCR and large language models.

A user uploads their blood test or lab report, and within seconds receives:

  • A health score (0–100)
  • Color-coded risk classification for each biomarker (Normal / Moderate / Critical)
  • Plain-English AI explanations for every marker
  • Personalized recommendations generated by Groq LLM
  • A persistent dashboard they can revisit at any time

No sign-up. No medical degree required. Just clarity.


🎬 Live Demo Flow

1. Visit http://localhost:3000
2. Click "Analyze My Report" → Upload page
3. Drag & drop a PDF or image of a lab report
4. Click "Analyze Report"
5. ⏱️  ~10–15 seconds later → redirected to your personal Dashboard
6. See your health score, biomarker table, AI summary, and recommendations

✨ Features

Feature Description
📄 Smart OCR Extraction pdfplumber (text PDFs) → OCR.space API (scanned) → Tesseract (local fallback)
🧠 AI Interpretation Groq LLM generates plain-English explanations for every biomarker
📊 Risk Dashboard Health score gauge, color-coded biomarker cards, risk badges
🛡️ Rule-Based Validation Clinical reference ranges engine — grounds AI output in facts
Results in Seconds Upload to full dashboard in under 15 seconds
🔬 30+ Biomarkers Hemoglobin, LDL, Blood Sugar, TSH, Vitamin D, Creatinine, and more
💾 Persistent Reports All analyses stored in SQLite — revisit any report via dashboard URL
📱 Fully Responsive Mobile-first design, works on phones, tablets, and desktops
🔒 No Login Required Zero friction — upload and analyze immediately

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        USER BROWSER                             │
│                    Next.js 16 Frontend                          │
│         Landing → Upload → Dashboard (per report_id)            │
└──────────────────────┬──────────────────────────────────────────┘
                       │ HTTP (axios)
                       ▼
┌────────────────────────────────────────────────────────────────┐
│                    FastAPI Backend :8000                       │
│                                                                │
│  POST /api/upload          POST /api/analyze/{id}              │
│  GET  /api/report/{id}     GET  /docs (Swagger)                │
│                                                                │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                  Analysis Pipeline                      │   │
│  │                                                         │   │
│  │  1. OCR Router ──────────────────────────────────────┐  │   │
│  │     ├── pdfplumber (text-based PDFs)                 │  │   │
│  │     ├── OCR.space API (scanned PDFs & images)        │  │   │
│  │     └── Tesseract + OpenCV (local fallback)          │  │   │
│  │                                                      │  │   │
│  │  2. Medical Parser ──── regex + keyword matching     │  │   │
│  │     └── Extracts: name, value, unit, ref range       │  │   │
│  │                                                      │  │   │
│  │  3. Risk Engine ───────────────────────────────────┐ │  │   │
│  │     ├── Classifies: Normal / Moderate / Critical   │ │  │   │
│  │     └── Calculates: Health Score (0–100)           │ │  │   │
│  │                                                    │ │  │   │
│  │  4. Groq LLM ──────────────────────────────────── ◄┘ │  │   │
│  │     ├── Per-biomarker plain-English explanations     │  │   │
│  │     ├── Overall health summary                       │  │   │
│  │     └── Personalized recommendations                 │  │   │
│  │                                                      │  │   │
│  │  5. SQLite Persistence ──────────────────────────────┘  │   │
│  └─────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────┘
                       │
                       ▼
              SQLite (medical.db)
         reports + biomarkers tables

🛠️ Tech Stack

Backend

Technology Version Purpose
FastAPI 0.136 REST API framework
Uvicorn 0.47 ASGI server
pdfplumber 0.11.9 Text extraction from digital PDFs
OCR.space API Cloud OCR for scanned documents
pytesseract 0.3.13 Local OCR fallback
OpenCV 4.13 Image preprocessing for OCR
Groq SDK 1.2.0 LLM inference (llama3 models)
SQLite3 built-in Lightweight persistence
Pydantic 2.13 Data validation & schemas
python-multipart 0.0.29 File upload handling
python-dotenv 1.2.2 Environment variable loading

Frontend

Technology Version Purpose
Next.js 16.2.6 (Turbopack) React framework with SSR/CSR
TypeScript 5 Type safety
framer-motion Animations & micro-interactions
Recharts Health score gauge & charts
axios HTTP client for API calls
Lucide React Icon library
Poppins + Inter Google Fonts Typography
Vanilla CSS Custom design system with CSS tokens

📁 Project Structure

Hackethon/
├── .env                          # Root env file (shared API keys)
├── README.md
│
├── backend/
│   ├── main.py                   # FastAPI app entry point, CORS, lifespan
│   ├── requirements.txt          # Python dependencies
│   ├── medical.db                # SQLite database (auto-created)
│   │
│   ├── routes/
│   │   ├── upload.py             # POST /api/upload
│   │   └── analyze.py            # POST /api/analyze/{id}, GET /api/report/{id}
│   │
│   ├── services/
│   │   ├── pipeline.py           # Main orchestrator: OCR→Parse→Classify→AI→DB
│   │   ├── ocr_router.py         # Routes to PDF or image extractor
│   │   ├── pdf_service.py        # pdfplumber + OCR.space PDF fallback
│   │   ├── ocr_service.py        # OCR.space API + Tesseract local fallback
│   │   ├── medical_parser.py     # Regex-based biomarker extraction
│   │   ├── risk_engine.py        # Clinical range classification + health score
│   │   └── ai_service.py        # Groq LLM: explanations, summary, recommendations
│   │
│   ├── db/
│   │   ├── database.py           # SQLite connection factory
│   │   ├── init_db.py            # Schema creation & migrations
│   │   └── crud.py               # All DB read/write operations
│   │
│   ├── models/
│   │   └── schemas.py            # Pydantic request/response models
│   │
│   └── uploads/                  # Uploaded files (gitignored)
│
└── frontend/
    ├── .env.local                 # NEXT_PUBLIC_API_URL
    ├── app/
    │   ├── layout.tsx             # Root layout (fonts, metadata)
    │   ├── globals.css            # Design system (CSS tokens, utilities)
    │   ├── page.tsx               # Landing page
    │   ├── upload/
    │   │   └── page.tsx           # Upload page (drag-and-drop)
    │   └── dashboard/
    │       └── [reportId]/
    │           └── page.tsx       # Results dashboard (dynamic route)
    │
    └── components/
        └── Navbar.tsx             # Sticky responsive navbar

🚀 Getting Started

Prerequisites

Requirement Notes
Python 3.11+ Backend runtime
Node.js 18+ Frontend runtime
Tesseract OCR Download for Windows — optional fallback
Groq API Key Free at console.groq.com
OCR.space API Key Free at ocr.space — optional

Backend Setup

# 1. Navigate to backend
cd backend

# 2. Create and activate virtual environment
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The API will be available at:


Frontend Setup

# 1. Navigate to frontend
cd frontend

# 2. Install dependencies
npm install

# 3. Start the development server
npm run dev

The frontend will be available at http://localhost:3000


Environment Variables

Create a .env file in the project root (Hackethon/.env):

# ── AI (Groq) ──────────────────────────────────────────────────────────────────
GROQ_API_KEY=your_groq_api_key_here

# ── OCR (OCR.space) ────────────────────────────────────────────────────────────
# Get a free key at https://ocr.space/OCRAPI
# Leave empty to use Tesseract-only fallback
OCR_SPACE_API_KEY=your_ocrspace_key_here

# ── CORS ───────────────────────────────────────────────────────────────────────
FRONTEND_URL=http://localhost:3000

Create a .env.local file inside frontend/:

# ── API URL ─────────────────────────────────────────────────────────────────────
NEXT_PUBLIC_API_URL=http://localhost:8000

💡 Groq Model Note: Ensure you use a currently supported model in ai_service.py.
As of 2026, use llama-3.3-70b-versatile or llama3-70b-8192. The llama3-8b-8192 model has been decommissioned.


📡 API Reference

POST /api/upload

Upload a medical report file.

Request: multipart/form-data

Field Type Description
file File PDF, JPG, or PNG — max 20 MB

Response:

{
  "report_id": 5,
  "document_url": "/uploads/abc123.pdf",
  "message": "File uploaded successfully"
}

POST /api/analyze/{report_id}

Run the full analysis pipeline on an uploaded report.

Response: AnalysisResult

{
  "report_id": 5,
  "health_score": 70,
  "biomarkers": [
    {
      "marker_id": 12,
      "marker_name": "Hemoglobin",
      "extracted_value": 10.2,
      "unit": "g/dL",
      "risk_category": "Moderate",
      "ai_explanation": "Your hemoglobin is slightly below the normal range..."
    }
  ],
  "ai_summary": "Your report shows mild anemia with otherwise normal metabolic markers...",
  "recommendations": [
    "Consult a hematologist about your hemoglobin levels",
    "Consider iron-rich foods like spinach and lentils"
  ]
}

GET /api/report/{report_id}

Retrieve a previously analyzed report from the database.

Response: Same AnalysisResult schema as above.

Error (404): Report not found or analysis not yet run.


🔬 Analysis Pipeline

The pipeline in services/pipeline.py runs 7 sequential steps:

Step 1: OCR Extraction
    ├── pdfplumber  → text-based PDFs (fastest, most accurate)
    ├── OCR.space   → scanned PDFs and images (cloud, handles tables well)
    └── Tesseract   → local fallback with OpenCV preprocessing

Step 2: Biomarker Parsing  (medical_parser.py)
    └── Regex + keyword matching to extract:
        name, numeric value, unit, reference range

Step 3: Risk Classification  (risk_engine.py)
    ├── Normal   → within reference range (or <10% deviation)
    ├── Moderate → 10–25% outside reference range
    └── Critical → >25% outside reference range

Step 4: Health Score Calculation  (risk_engine.py)
    └── Start at 100, deduct 15 per Critical, 7 per Moderate (min: 0)

Step 5: AI Explanations  (ai_service.py → Groq)
    └── Per-biomarker plain-English explanation

Step 6: AI Summary & Recommendations  (ai_service.py → Groq)
    ├── Overall health narrative
    └── Prioritized, actionable recommendations

Step 7: Persist to Database  (db/crud.py)
    └── health_score, biomarkers, ai_summary, recommendations → SQLite

🗄️ Database Schema

-- Users (placeholder, no auth currently)
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT    DEFAULT (datetime('now'))
);

-- Medical Reports
CREATE TABLE reports (
    report_id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id              INTEGER REFERENCES users(user_id),
    upload_timestamp     TEXT    DEFAULT (datetime('now')),
    document_url         TEXT    NOT NULL,
    overall_health_score INTEGER DEFAULT 100,
    ai_summary           TEXT,
    recommendations      TEXT    -- JSON array stored as string
);

-- Biomarker Results
CREATE TABLE biomarkers (
    marker_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    report_id       INTEGER NOT NULL REFERENCES reports(report_id) ON DELETE CASCADE,
    marker_name     TEXT    NOT NULL,
    extracted_value REAL,
    unit            TEXT,
    risk_category   TEXT    CHECK(risk_category IN ('Normal','Moderate','Critical')) DEFAULT 'Normal',
    ai_explanation  TEXT
);

📊 Health Scoring

The health score (0–100) is calculated deterministically from biomarker risk classifications:

Risk Level Deduction Trigger Condition
Critical −15 points Value >25% outside reference range
Moderate −7 points Value 10–25% outside reference range
Normal −0 points Value within range (or <10% deviation)

Example:

13 biomarkers found:
  → 3 Critical  = 3 × 15 = 45 points deducted
  → 2 Moderate  = 2 ×  7 = 14 points deducted
  → 8 Normal    = 0 deducted

Health Score = max(0, 100 - 59) = 41

⚠️ Known Issues & Limitations

Issue Impact Fix / Workaround
llama3-8b-8192 model decommissioned AI explanations fail silently Update model in ai_service.py to llama-3.3-70b-versatile
No user authentication All reports are public by report_id Add JWT auth for production
SQLite not suitable for production Single-writer lock, no concurrent writes Migrate to PostgreSQL for deployment
OCR.space free tier limits 500 API calls/month Use Tesseract fallback or upgrade plan
Report history not shown in UI Users must know their report_id Add a History page

🗺️ Roadmap

  • Fix Groq model → update to llama-3.3-70b-versatile
  • Report History page — list all past uploads with timestamps
  • Export to PDF — print-friendly dashboard report
  • Authentication — user accounts via NextAuth / Supabase
  • Trend Analysis — compare reports over time (same user)
  • Deploy — Vercel (frontend) + Railway/Render (backend)
  • PostgreSQL migration — replace SQLite for production
  • More biomarkers — expand the reference range database

👥 Team

Built for HackXcelarate 2K26 by the Optimus Devs.


📄 License

MIT License — see LICENSE for details.


Made with ❤️ and ☕ for the HackXcelarate 2K26

🚀 Upload Your Report · 📖 API Docs

About

AI-powered medical report analyzer that extracts, interprets, and summarizes clinical reports using OCR, NLP, and intelligent health insights. Built with FastAPI, modern frontend technologies, and automated report processing pipelines for smarter healthcare assistance.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors