Upload any blood test or lab report → Get AI-powered health insights in seconds.
⚠️ Demo purposes only. Not a substitute for professional medical advice.
- ⚡ How to Run
- Overview
- Live Demo Flow
- Features
- Architecture
- Tech Stack
- Project Structure
- Getting Started
- API Reference
- Analysis Pipeline
- Database Schema
- Health Scoring
- Known Issues & Limitations
- Roadmap
TL;DR — Two terminals, four commands, and you're live.
git clone https://github.com/Autonomous-Drone-Target-Tracking-System/MediScan-AI_Smart-Report-Analyzer.git
cd MediScan-AI_Smart-Report-Analyzer
# Copy the env template and fill in your API keys
copy .env.example .env # Windows
# cp .env.example .env # macOS / Linux
Open .env and set:
GROQ_API_KEY=<your Groq key> # https://console.groq.com/keys
OCR_SPACE_API_KEY=<your OCR key> # https://ocr.space/ocrapi (free tier OK)
FRONTEND_URL=http://localhost:3000
NEXT_PUBLIC_API_URL=http://localhost:8000
cd backend
# Create & activate virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS / Linux
# Install dependencies
pip install -r requirements.txt
# Run the API server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
✅ Backend ready at http://localhost:8000 · Swagger docs at http://localhost:8000/docs
cd frontend
# Copy frontend env
copy .env.local.example .env.local # Windows
# cp .env.local.example .env.local # macOS / Linux
# Install dependencies
npm install
# Start the dev server
npm run dev
✅ Frontend ready at http://localhost:3000
1. Open http://localhost:3000
2. Click "Analyze My Report"
3. Drag & drop a PDF or image of a blood/lab report
4. Click "Analyze Report"
5. Wait ~10–15 seconds → dashboard with your health score & AI insights
💡 No Tesseract? The app works fine without it — OCR.space handles scanned documents, and pdfplumber handles digital PDFs. Tesseract is only a local fallback.
MediScan AI is a full-stack web application, built for a hackathon, that transforms raw medical lab reports (PDFs or images) into clear, actionable health insights using OCR and large language models.
A user uploads their blood test or lab report, and within seconds receives:
- A health score (0–100)
- Color-coded risk classification for each biomarker (Normal / Moderate / Critical)
- Plain-English AI explanations for every marker
- Personalized recommendations generated by Groq LLM
- A persistent dashboard they can revisit at any time
No sign-up. No medical degree required. Just clarity.
1. Visit http://localhost:3000
2. Click "Analyze My Report" → Upload page
3. Drag & drop a PDF or image of a lab report
4. Click "Analyze Report"
5. ⏱️ ~10–15 seconds later → redirected to your personal Dashboard
6. See your health score, biomarker table, AI summary, and recommendations
| Feature | Description |
|---|---|
| 📄 Smart OCR Extraction | pdfplumber (text PDFs) → OCR.space API (scanned) → Tesseract (local fallback) |
| 🧠 AI Interpretation | Groq LLM generates plain-English explanations for every biomarker |
| 📊 Risk Dashboard | Health score gauge, color-coded biomarker cards, risk badges |
| 🛡️ Rule-Based Validation | Clinical reference ranges engine — grounds AI output in facts |
| ⚡ Results in Seconds | Upload to full dashboard in roughly 10–15 seconds |
| 🔬 30+ Biomarkers | Hemoglobin, LDL, Blood Sugar, TSH, Vitamin D, Creatinine, and more |
| 💾 Persistent Reports | All analyses stored in SQLite — revisit any report via dashboard URL |
| 📱 Fully Responsive | Mobile-first design, works on phones, tablets, and desktops |
| 🔒 No Login Required | Zero friction — upload and analyze immediately |
┌─────────────────────────────────────────────────────────────────┐
│ USER BROWSER │
│ Next.js 16 Frontend │
│ Landing → Upload → Dashboard (per report_id) │
└──────────────────────┬──────────────────────────────────────────┘
│ HTTP (axios)
▼
┌────────────────────────────────────────────────────────────────┐
│ FastAPI Backend :8000 │
│ │
│ POST /api/upload POST /api/analyze/{id} │
│ GET /api/report/{id} GET /docs (Swagger) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Analysis Pipeline │ │
│ │ │ │
│ │ 1. OCR Router ──────────────────────────────────────┐ │ │
│ │ ├── pdfplumber (text-based PDFs) │ │ │
│ │ ├── OCR.space API (scanned PDFs & images) │ │ │
│ │ └── Tesseract + OpenCV (local fallback) │ │ │
│ │ │ │ │
│ │ 2. Medical Parser ──── regex + keyword matching │ │ │
│ │ └── Extracts: name, value, unit, ref range │ │ │
│ │ │ │ │
│ │ 3. Risk Engine ───────────────────────────────────┐ │ │ │
│ │ ├── Classifies: Normal / Moderate / Critical │ │ │ │
│ │ └── Calculates: Health Score (0–100) │ │ │ │
│ │ │ │ │ │
│ │ 4. Groq LLM ──────────────────────────────────── ◄┘ │ │ │
│ │ ├── Per-biomarker plain-English explanations │ │ │
│ │ ├── Overall health summary │ │ │
│ │ └── Personalized recommendations │ │ │
│ │ │ │ │
│ │ 5. SQLite Persistence ──────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
│
▼
SQLite (medical.db)
reports + biomarkers tables
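The OCR Router's tiered fallback in the diagram above can be sketched as an ordered list of extractors that are tried until one returns usable text. This is a minimal illustration, not the code in ocr_router.py: `extract_text` and the three `fake_*` extractors are hypothetical stand-ins, and the rule that an extractor "fails" when it raises or returns only whitespace is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

# An extractor takes a file path and returns the text it found.
Extractor = Callable[[str], str]


def extract_text(
    path: str, extractors: Iterable[Tuple[str, Extractor]]
) -> Tuple[Optional[str], str]:
    """Try each extractor in order and return (name_of_winner, text).

    A tier counts as failed if it raises or returns only whitespace.
    Returns (None, "") when every tier fails.
    """
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception:
            continue  # tier unavailable or errored; fall through to the next one
        if text and text.strip():
            return name, text
    return None, ""


# Stand-in extractors for illustration only (not the project's real functions).
def fake_pdfplumber(path: str) -> str:
    return ""  # e.g. a scanned PDF with no embedded text layer


def fake_ocr_space(path: str) -> str:
    raise RuntimeError("no API key configured")


def fake_tesseract(path: str) -> str:
    return "Hemoglobin 10.2 g/dL"


winner, text = extract_text(
    "report.pdf",
    [
        ("pdfplumber", fake_pdfplumber),
        ("ocr_space", fake_ocr_space),
        ("tesseract", fake_tesseract),
    ],
)
print(winner, "->", text)
```

Passing the tiers as data keeps the ordering explicit and makes it easy to drop Tesseract on machines where it is not installed.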
| Technology | Version | Purpose |
|---|---|---|
| FastAPI | 0.136 | REST API framework |
| Uvicorn | 0.47 | ASGI server |
| pdfplumber | 0.11.9 | Text extraction from digital PDFs |
| OCR.space API | — | Cloud OCR for scanned documents |
| pytesseract | 0.3.13 | Local OCR fallback |
| OpenCV | 4.13 | Image preprocessing for OCR |
| Groq SDK | 1.2.0 | LLM inference (llama3 models) |
| SQLite3 | built-in | Lightweight persistence |
| Pydantic | 2.13 | Data validation & schemas |
| python-multipart | 0.0.29 | File upload handling |
| python-dotenv | 1.2.2 | Environment variable loading |
| Technology | Version | Purpose |
|---|---|---|
| Next.js | 16.2.6 (Turbopack) | React framework with SSR/CSR |
| TypeScript | 5 | Type safety |
| framer-motion | — | Animations & micro-interactions |
| Recharts | — | Health score gauge & charts |
| axios | — | HTTP client for API calls |
| Lucide React | — | Icon library |
| Poppins + Inter | Google Fonts | Typography |
| Vanilla CSS | — | Custom design system with CSS tokens |
Hackethon/
├── .env # Root env file (shared API keys)
├── README.md
│
├── backend/
│ ├── main.py # FastAPI app entry point, CORS, lifespan
│ ├── requirements.txt # Python dependencies
│ ├── medical.db # SQLite database (auto-created)
│ │
│ ├── routes/
│ │ ├── upload.py # POST /api/upload
│ │ └── analyze.py # POST /api/analyze/{id}, GET /api/report/{id}
│ │
│ ├── services/
│ │ ├── pipeline.py # Main orchestrator: OCR→Parse→Classify→AI→DB
│ │ ├── ocr_router.py # Routes to PDF or image extractor
│ │ ├── pdf_service.py # pdfplumber + OCR.space PDF fallback
│ │ ├── ocr_service.py # OCR.space API + Tesseract local fallback
│ │ ├── medical_parser.py # Regex-based biomarker extraction
│ │ ├── risk_engine.py # Clinical range classification + health score
│ │ └── ai_service.py # Groq LLM: explanations, summary, recommendations
│ │
│ ├── db/
│ │ ├── database.py # SQLite connection factory
│ │ ├── init_db.py # Schema creation & migrations
│ │ └── crud.py # All DB read/write operations
│ │
│ ├── models/
│ │ └── schemas.py # Pydantic request/response models
│ │
│ └── uploads/ # Uploaded files (gitignored)
│
└── frontend/
├── .env.local # NEXT_PUBLIC_API_URL
├── app/
│ ├── layout.tsx # Root layout (fonts, metadata)
│ ├── globals.css # Design system (CSS tokens, utilities)
│ ├── page.tsx # Landing page
│ ├── upload/
│ │ └── page.tsx # Upload page (drag-and-drop)
│ └── dashboard/
│ └── [reportId]/
│ └── page.tsx # Results dashboard (dynamic route)
│
└── components/
└── Navbar.tsx # Sticky responsive navbar
| Requirement | Notes |
|---|---|
| Python 3.11+ | Backend runtime |
| Node.js 18+ | Frontend runtime |
| Tesseract OCR | Download for Windows — optional fallback |
| Groq API Key | Free at console.groq.com |
| OCR.space API Key | Free at ocr.space — optional |
# 1. Navigate to backend
cd backend
# 2. Create and activate virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Start the development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
The API will be available at:
- Base URL: http://localhost:8000
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
# 1. Navigate to frontend
cd frontend
# 2. Install dependencies
npm install
# 3. Start the development server
npm run dev
The frontend will be available at http://localhost:3000
Create a .env file in the project root (Hackethon/.env):
# ── AI (Groq) ──────────────────────────────────────────────────────────────────
GROQ_API_KEY=your_groq_api_key_here
# ── OCR (OCR.space) ────────────────────────────────────────────────────────────
# Get a free key at https://ocr.space/OCRAPI
# Leave empty to use Tesseract-only fallback
OCR_SPACE_API_KEY=your_ocrspace_key_here
# ── CORS ───────────────────────────────────────────────────────────────────────
FRONTEND_URL=http://localhost:3000
Create a .env.local file inside frontend/:
# ── API URL ─────────────────────────────────────────────────────────────────────
NEXT_PUBLIC_API_URL=http://localhost:8000
💡 Groq Model Note: Ensure you use a currently supported model in ai_service.py. As of 2026, use llama-3.3-70b-versatile or llama3-70b-8192. The llama3-8b-8192 model has been decommissioned.
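A startup check for the backend variables listed above might look like the following sketch. It uses only the standard library and reads from a mapping so it can be tested without touching the real environment. The function name `load_settings` is hypothetical, and treating GROQ_API_KEY as required while the other two fall back to defaults is an assumption based on the prerequisites table, not the project's actual behavior.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ["GROQ_API_KEY"]  # assumption: AI features cannot run without it
OPTIONAL_DEFAULTS = {
    "OCR_SPACE_API_KEY": "",  # empty means "use the Tesseract-only fallback"
    "FRONTEND_URL": "http://localhost:3000",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return validated settings, raising early if a required key is missing."""
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED if not source.get(key, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {key: source[key].strip() for key in REQUIRED}
    for key, default in OPTIONAL_DEFAULTS.items():
        settings[key] = source.get(key, "").strip() or default
    return settings


# Example with an explicit mapping instead of the real environment.
print(load_settings({"GROQ_API_KEY": "example-key"}))
```

Failing fast at startup surfaces a missing key immediately rather than at the first analysis request.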
POST /api/upload
Upload a medical report file.
Request: multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | PDF, JPG, or PNG — max 20 MB |
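The constraints in the table (PDF, JPG, or PNG, at most 20 MB) could be enforced with a small check like the one below. This is a sketch under stated assumptions: `validate_upload` is a hypothetical helper, ".jpeg" is accepted alongside ".jpg", and "20 MB" is interpreted as 20 × 1024 × 1024 bytes. The real route in routes/upload.py may check content type or size differently.

```python
import os

ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the 20 MB limit")


validate_upload("cbc_report.PDF", 350_000)  # passes silently
```

An extension check alone does not prove the file's real format, so a production service would likely also inspect the file contents.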
Response:
{
"report_id": 5,
"document_url": "/uploads/abc123.pdf",
"message": "File uploaded successfully"
}
POST /api/analyze/{id}
Run the full analysis pipeline on an uploaded report.
Response: AnalysisResult
{
"report_id": 5,
"health_score": 70,
"biomarkers": [
{
"marker_id": 12,
"marker_name": "Hemoglobin",
"extracted_value": 10.2,
"unit": "g/dL",
"risk_category": "Moderate",
"ai_explanation": "Your hemoglobin is slightly below the normal range..."
}
],
"ai_summary": "Your report shows mild anemia with otherwise normal metabolic markers...",
"recommendations": [
"Consult a hematologist about your hemoglobin levels",
"Consider iron-rich foods like spinach and lentils"
]
}
GET /api/report/{id}
Retrieve a previously analyzed report from the database.
Response: Same AnalysisResult schema as above.
Error (404): Report not found or analysis not yet run.
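On the client side, the AnalysisResult payload shown above can be parsed into typed objects. The sketch below mirrors the field names from the sample response using standard-library dataclasses. The class layout and `parse_analysis_result` are illustrative assumptions, not the Pydantic models in models/schemas.py, and the parser would reject biomarker objects that carry extra fields.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Biomarker:
    marker_id: int
    marker_name: str
    extracted_value: Optional[float]
    unit: Optional[str]
    risk_category: str
    ai_explanation: Optional[str]


@dataclass
class AnalysisResult:
    report_id: int
    health_score: int
    biomarkers: List[Biomarker]
    ai_summary: Optional[str]
    recommendations: List[str]


def parse_analysis_result(payload: str) -> AnalysisResult:
    """Convert the JSON response body into typed objects."""
    data = json.loads(payload)
    return AnalysisResult(
        report_id=data["report_id"],
        health_score=data["health_score"],
        biomarkers=[Biomarker(**item) for item in data.get("biomarkers", [])],
        ai_summary=data.get("ai_summary"),
        recommendations=list(data.get("recommendations") or []),
    )


sample = """
{
  "report_id": 5,
  "health_score": 70,
  "biomarkers": [
    {
      "marker_id": 12,
      "marker_name": "Hemoglobin",
      "extracted_value": 10.2,
      "unit": "g/dL",
      "risk_category": "Moderate",
      "ai_explanation": "Your hemoglobin is slightly below the normal range..."
    }
  ],
  "ai_summary": "Your report shows mild anemia...",
  "recommendations": ["Consult a hematologist about your hemoglobin levels"]
}
"""

result = parse_analysis_result(sample)
flagged = [b.marker_name for b in result.biomarkers if b.risk_category != "Normal"]
print(result.health_score, flagged)
```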
The pipeline in services/pipeline.py runs 7 sequential steps:
Step 1: OCR Extraction
├── pdfplumber → text-based PDFs (fastest, most accurate)
├── OCR.space → scanned PDFs and images (cloud, handles tables well)
└── Tesseract → local fallback with OpenCV preprocessing
Step 2: Biomarker Parsing (medical_parser.py)
└── Regex + keyword matching to extract:
name, numeric value, unit, reference range
Step 3: Risk Classification (risk_engine.py)
├── Normal → within reference range (or <10% deviation)
├── Moderate → 10–25% outside reference range
└── Critical → >25% outside reference range
Step 4: Health Score Calculation (risk_engine.py)
└── Start at 100, deduct 15 per Critical, 7 per Moderate (min: 0)
Step 5: AI Explanations (ai_service.py → Groq)
└── Per-biomarker plain-English explanation
Step 6: AI Summary & Recommendations (ai_service.py → Groq)
├── Overall health narrative
└── Prioritized, actionable recommendations
Step 7: Persist to Database (db/crud.py)
└── health_score, biomarkers, ai_summary, recommendations → SQLite
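Step 2 (regex + keyword matching) can be illustrated with a single-line parser. This is a simplified sketch, not the patterns used in medical_parser.py: `LINE_RE` and `parse_line` are hypothetical, the pattern assumes one biomarker per line in the order name, value, unit, reference range, and it does not handle units that begin with a digit (such as 10^3/uL).

```python
import re
from typing import Dict, Optional

# name  value  [unit]  [low - high]
LINE_RE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9 ()/\-]*?)\s*[:\-]?\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-zµ%][A-Za-zµ%/^0-9]*)?\s*"
    r"(?:(?P<low>\d+(?:\.\d+)?)\s*(?:-|to)\s*(?P<high>\d+(?:\.\d+)?))?\s*$"
)


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Extract name, value, unit, and reference range from one report line.

    Returns None when the line does not look like a biomarker result.
    """
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    low, high = match.group("low"), match.group("high")
    return {
        "name": match.group("name").strip(),
        "value": float(match.group("value")),
        "unit": match.group("unit"),
        "ref_low": float(low) if low else None,
        "ref_high": float(high) if high else None,
    }


for raw in ["Hemoglobin 10.2 g/dL 13.0 - 17.0", "Patient Name: John"]:
    print(raw, "=>", parse_line(raw))
```

Real lab reports vary widely in layout, so a production parser would need many more patterns plus a keyword list of known marker names.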
-- Users (placeholder, no auth currently)
CREATE TABLE users (
user_id INTEGER PRIMARY KEY AUTOINCREMENT,
created_at TEXT DEFAULT (datetime('now'))
);
-- Medical Reports
CREATE TABLE reports (
report_id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER REFERENCES users(user_id),
upload_timestamp TEXT DEFAULT (datetime('now')),
document_url TEXT NOT NULL,
overall_health_score INTEGER DEFAULT 100,
ai_summary TEXT,
recommendations TEXT -- JSON array stored as string
);
-- Biomarker Results
CREATE TABLE biomarkers (
marker_id INTEGER PRIMARY KEY AUTOINCREMENT,
report_id INTEGER NOT NULL REFERENCES reports(report_id) ON DELETE CASCADE,
marker_name TEXT NOT NULL,
extracted_value REAL,
unit TEXT,
risk_category TEXT CHECK(risk_category IN ('Normal','Moderate','Critical')) DEFAULT 'Normal',
ai_explanation TEXT
);
The health score (0–100) is calculated deterministically from biomarker risk classifications:
| Risk Level | Deduction | Trigger Condition |
|---|---|---|
| Critical | −15 points | Value >25% outside reference range |
| Moderate | −7 points | Value 10–25% outside reference range |
| Normal | −0 points | Value within range (or <10% deviation) |
Example:
13 biomarkers found:
→ 3 Critical = 3 × 15 = 45 points deducted
→ 2 Moderate = 2 × 7 = 14 points deducted
→ 8 Normal = 0 deducted
Health Score = max(0, 100 - 59) = 41
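The classification thresholds and the scoring rule above can be expressed in a few lines. This sketch makes two assumptions that the README does not state: deviation is measured as a percentage of the nearest reference bound, and values at exactly 10% or 25% count as Moderate. The function names are hypothetical and risk_engine.py may compute deviation differently.

```python
from typing import Iterable

DEDUCTIONS = {"Critical": 15, "Moderate": 7, "Normal": 0}


def deviation_pct(value: float, low: float, high: float) -> float:
    """Percent distance from the nearest reference bound; 0.0 inside the range."""
    if low <= value <= high:
        return 0.0
    bound = low if value < low else high
    if bound == 0:
        return float("inf")  # avoid dividing by zero; treat as far out of range
    return abs(value - bound) / abs(bound) * 100.0


def classify(value: float, low: float, high: float) -> str:
    """Map a value to Normal / Moderate / Critical using the 10% and 25% cutoffs."""
    pct = deviation_pct(value, low, high)
    if pct < 10.0:
        return "Normal"
    if pct <= 25.0:
        return "Moderate"
    return "Critical"


def health_score(categories: Iterable[str]) -> int:
    """Start at 100, subtract 15 per Critical and 7 per Moderate, floor at 0."""
    return max(0, 100 - sum(DEDUCTIONS[c] for c in categories))


# The worked example above: 3 Critical, 2 Moderate, 8 Normal.
example = ["Critical"] * 3 + ["Moderate"] * 2 + ["Normal"] * 8
print(health_score(example))

# Hemoglobin 10.2 g/dL against an illustrative 13.0-17.0 range.
print(classify(10.2, 13.0, 17.0))
```

Because the score depends only on category counts, two reports with the same number of Critical and Moderate markers receive the same score regardless of which markers are affected.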
| Issue | Impact | Fix / Workaround |
|---|---|---|
| llama3-8b-8192 model decommissioned | AI explanations fail silently | Update model in ai_service.py to llama-3.3-70b-versatile |
| No user authentication | All reports are public by report_id | Add JWT auth for production |
| SQLite not suitable for production | Single-writer lock, no concurrent writes | Migrate to PostgreSQL for deployment |
| OCR.space free tier limits | 500 API calls/month | Use Tesseract fallback or upgrade plan |
| Report history not shown in UI | Users must know their report_id | Add a History page |
- Fix Groq model → update to llama-3.3-70b-versatile
- Report History page — list all past uploads with timestamps
- Export to PDF — print-friendly dashboard report
- Authentication — user accounts via NextAuth / Supabase
- Trend Analysis — compare reports over time (same user)
- Deploy — Vercel (frontend) + Railway/Render (backend)
- PostgreSQL migration — replace SQLite for production
- More biomarkers — expand the reference range database
Built for HackXcelarate 2K26 by the Optimus Devs.
MIT License — see LICENSE for details.
Made with ❤️ and ☕ for HackXcelarate 2K26