
AI Career Intelligence Engine


An intelligent tool to analyze, score, and rank resumes against job descriptions using FastAPI, NLP, and pgvector.

🚀 Features

  • Resume Parsing: Extract text from PDF and DOCX files.
  • Bias-Aware Privacy: Redact PII (names, emails, phones) before analysis to ensure fairness.
  • Skills Matching: Automatically extract skills and identify gaps.
  • Smart Scoring: Uses Sentence-Transformers for semantic similarity between resumes and job descriptions.
  • Portfolio Scaffold: A ready-to-use landing page for showcasing the project.
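
As a rough illustration of the redaction step (the real pipeline combines spaCy NER for names and addresses with regexes; the patterns below are simplified and cover only emails and phone numbers):

```python
import re

# Illustrative patterns only -- the project also uses spaCy NER
# for names and addresses, which regexes alone cannot catch.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)  # redact emails first
    text = PHONE_RE.sub("[PHONE]", text)  # then long digit runs
    return text
```

Running redaction before embedding means the similarity model never sees identifying details, which is what makes the scoring bias-aware.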

🛠 Tech Stack

  • Backend: FastAPI, SQLAlchemy, PostgreSQL (pgvector), spaCy, Sentence-Transformers.
  • Frontend: React, Vite, CSS Modules.
  • DevOps: Docker, Docker Compose, GitHub Actions.

📦 Getting Started

Prerequisites

  • Docker and Docker Compose
  • Node.js (optional, for local frontend development)
  • Python 3.11+ (optional, for local backend development)

Local Development

  1. Clone the repo:

    git clone https://github.com/Umoru98/ai-career-intelligence-engine
    cd ai-career-intelligence-engine
  2. Setup Environment:

    cp .env.example .env
  3. Start Services:

    docker-compose up -d
  4. Run Migrations:

    docker-compose exec api alembic upgrade head

The API will be available at http://localhost:8000/docs and the frontend at http://localhost:5173.

📈 Makefile Commands

We provide a Makefile for convenience:

  • make up: Start services.
  • make logs: View logs.
  • make test: Run backend tests.
  • make migrate: Run database migrations.
  • make lint: Run linters (Ruff/MyPy).

🛡 Security

See SECURITY.md for reporting vulnerabilities.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

A production-ready, API-first resume analysis platform powered by Sentence Transformers, spaCy, and FastAPI. Upload resumes, paste a job description, and get instant match scores, skill gap analysis, and actionable improvement suggestions.


Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    docker-compose                        │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │  Frontend    │  │   Backend    │  │  PostgreSQL  │  │
│  │  React+Vite  │→ │   FastAPI    │→ │  + pgvector  │  │
│  │  nginx:80    │  │  port 8000   │  │  port 5432   │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘

Pipeline

Upload (PDF/DOCX)
    → Text Extraction (pdfplumber / python-docx)
    → Text Cleaning (whitespace, bullets, page numbers)
    → PII Redaction (spaCy NER + regex: names, emails, phones, addresses)
    → Section Detection (regex heading rules)
    → Skills Extraction (dictionary/PhraseMatcher against skills.yml)
    → Embedding Generation (Sentence Transformers: all-MiniLM-L6-v2)
    → Cosine Similarity vs JD Embedding
    → Score Normalization: (cos_sim + 1) / 2 × 100
    → Skill Gap Analysis (intersection / difference)
    → Template-based Explanation + Suggestions
    → Store in PostgreSQL
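
The skills-extraction and gap-analysis steps above can be sketched with plain set operations. The actual service matches against skills.yml with spaCy's PhraseMatcher; this simplified version just scans a small hard-coded vocabulary (the skill list here is illustrative):

```python
# Illustrative vocabulary; the real pipeline loads this from skills.yml.
SKILLS = {"python", "fastapi", "docker", "postgresql", "aws", "react"}

def extract_skills(text: str, vocab: set[str] = SKILLS) -> set[str]:
    """Naive dictionary match: which known skills appear in the text?

    Note: substring matching can false-positive (e.g. 'react' inside
    'reaction'); PhraseMatcher avoids this by matching on token boundaries.
    """
    lowered = text.lower()
    return {skill for skill in vocab if skill in lowered}

def skill_gap(resume_text: str, jd_text: str) -> dict[str, set[str]]:
    """Compare resume skills against JD skills via set intersection/difference."""
    resume_skills = extract_skills(resume_text)
    jd_skills = extract_skills(jd_text)
    return {
        "matched": resume_skills & jd_skills,  # skills in both
        "missing": jd_skills - resume_skills,  # required but absent
    }
```

The "missing" set is what drives the gap report and the template-based improvement suggestions.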

Local Development

Prerequisites

  • Docker Desktop
  • Python 3.11+ (for local dev without Docker)
  • Node 20+ (for frontend dev)

Quick Start (Docker)

# 1. Clone and configure
cp .env.example .env

# 2. Start all services
docker-compose up -d

# 3. Run database migrations
docker-compose exec api alembic upgrade head

# 4. (Optional) Pre-download ML models
make download-models

# 5. Open the app
# Frontend: http://localhost:5173
# API docs: http://localhost:8000/docs

Local Backend Dev

cd backend
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm

# Set env vars (copy .env.example to .env and adjust DATABASE_URL)
uvicorn app.main:app --reload --port 8000

Local Frontend Dev

cd frontend
npm install
npm run dev
# Opens at http://localhost:5173

API Usage Examples

Upload a Resume

curl -X POST http://localhost:8000/v1/resumes/upload \
  -F "file=@resume.pdf"

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "original_filename": "resume.pdf",
  "extraction_status": "success",
  "sha256": "abc123...",
  "created_at": "2026-02-18T17:00:00Z"
}

Create a Job Description

curl -X POST http://localhost:8000/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Senior Python Engineer",
    "description": "We are looking for a Python engineer with FastAPI, PostgreSQL, Docker, and AWS experience..."
  }'

Analyze One Resume

curl -X POST http://localhost:8000/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "resume_id": "<resume-uuid>",
    "job_id": "<job-uuid>"
  }'

Rank Multiple Resumes

# Upload 3 resumes first, then:
curl -X POST http://localhost:8000/v1/jobs/<job-uuid>/rank \
  -H "Content-Type: application/json" \
  -d '{
    "resume_ids": ["<uuid1>", "<uuid2>", "<uuid3>"]
  }'
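
Conceptually, ranking is per-resume scoring followed by a descending sort. A minimal sketch (the result shape is assumed for illustration, not the endpoint's exact response schema):

```python
def rank_resumes(scored: list[dict]) -> list[dict]:
    """Sort resume results by score, highest first.

    Python's sort is stable, so tied scores keep their input order.
    """
    return sorted(scored, key=lambda r: r["score"], reverse=True)
```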

Sample Workflow (3 Resumes → Rank)

# Step 1: Upload 3 resumes
R1=$(curl -s -X POST http://localhost:8000/v1/resumes/upload -F "file=@alice.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
R2=$(curl -s -X POST http://localhost:8000/v1/resumes/upload -F "file=@bob.docx" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
R3=$(curl -s -X POST http://localhost:8000/v1/resumes/upload -F "file=@carol.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

# Step 2: Create JD
JOB=$(curl -s -X POST http://localhost:8000/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"title":"Python Dev","description":"Python, FastAPI, Docker, PostgreSQL, AWS, CI/CD experience required."}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

# Step 3: Rank
curl -X POST http://localhost:8000/v1/jobs/$JOB/rank \
  -H "Content-Type: application/json" \
  -d "{\"resume_ids\": [\"$R1\", \"$R2\", \"$R3\"]}"

# Step 4: View details
curl http://localhost:8000/v1/resumes/$R1

Score Interpretation

Score     Meaning
75–100%   Strong match
50–74%    Moderate match
0–49%     Weak match

Score formula: score = clamp((cosine_similarity + 1) / 2 × 100, 0, 100)

This is a linear normalization of cosine similarity from [-1, 1] to [0, 100]. Typical resume–JD similarities fall between 0.3 and 0.9, which maps to scores of 65–95%. A calibrated threshold model is a future improvement (see the TODO in embedder.py).
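
The formula translates directly into a small helper; here is a sketch using pure-Python cosine similarity (the service computes similarity on Sentence-Transformers embeddings, but the math is the same):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize_score(cos_sim: float) -> float:
    """Map cosine similarity from [-1, 1] to a 0-100 percentage, clamped."""
    score = (cos_sim + 1) / 2 * 100
    return max(0.0, min(100.0, score))
```

Identical vectors score 100; orthogonal vectors (similarity 0) land exactly at 50, which is why the practical floor for real resume–JD pairs sits well above zero.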


ML Models

Model                                    Purpose                 Size
sentence-transformers/all-MiniLM-L6-v2   Text embeddings         ~80 MB
en_core_web_sm                           NER for PII redaction   ~12 MB

Offline / Air-gapped Environments

Models are pre-downloaded during Docker build (Dockerfile). For fully offline use:

# Warm the cache in the running container (needs internet once), then copy the model_cache volume
docker-compose exec api python -c "
from sentence_transformers import SentenceTransformer
SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
"

The model cache is stored in the model_cache Docker volume.


Database Schema

Table        Purpose
resumes      Uploaded files + extracted/cleaned/redacted text + sections
jobs         Job descriptions
embeddings   Cached embedding vectors (JSONB; pgvector upgrade path documented)
analyses     Match results: score, skills, explanation, suggestions

pgvector Upgrade Path

Currently embeddings are stored as JSONB arrays. To upgrade to pgvector:

  1. Ensure pgvector/pgvector:pg16 image is used (already in docker-compose)
  2. The migration runs CREATE EXTENSION IF NOT EXISTS vector
  3. Add a new vector(384) column to embeddings table
  4. Migrate JSONB → vector column
  5. Create ivfflat or hnsw index for ANN search
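
For step 4, each JSONB array needs to be rewritten in pgvector's bracketed literal format. A hypothetical helper (the function name and migration approach are illustrative, not the project's actual migration code):

```python
def to_pgvector_literal(embedding: list[float]) -> str:
    """Serialize a Python float list to pgvector's input syntax, e.g. '[0.1,0.2]'.

    pgvector accepts a comma-separated, bracket-wrapped list of floats
    as the textual representation of a vector value.
    """
    return "[" + ",".join(repr(x) for x in embedding) + "]"

# Step 5 would then add an ANN index via a migration, e.g. (assumed column name):
# CREATE INDEX ON embeddings USING hnsw (vector vector_cosine_ops);
```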

Running Tests

cd backend
pip install aiosqlite  # for in-memory SQLite tests
pytest tests/ -v --tb=short
pytest tests/ --cov=app --cov-report=term-missing

Security Considerations

  • File validation: Content-type + size enforced before processing
  • Safe filenames: UUID-based, no user-provided paths
  • PII redaction: Names, emails, phones, addresses removed before embedding
  • No code execution: Uploaded files are never executed
  • Non-root Docker: API runs as appuser (UID 1000)
  • CORS: Configurable via CORS_ORIGINS env var
  • No auth (MVP): Structure supports adding OAuth2/JWT middleware to FastAPI
  • Secrets: Never logged; use .env file (not committed)
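
The file-validation rule can be sketched as a pure function. The size cap and accepted types below are illustrative assumptions, not the service's actual configuration:

```python
# Assumed limits for illustration; check the service config for real values.
ALLOWED_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}
MAX_BYTES = 5 * 1024 * 1024  # assumed 5 MB cap

def validate_upload(content_type: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for an upload based on content type and size."""
    if content_type not in ALLOWED_TYPES:
        return False, "unsupported content type"
    if size_bytes > MAX_BYTES:
        return False, "file too large"
    return True, "ok"
```

Rejecting files before parsing keeps malformed or oversized uploads from ever reaching the extraction pipeline.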

Advanced Features Implemented

  • ✅ Bias-aware scoring (PII redaction before embeddings)
  • ✅ Section detection (education, skills, experience, projects, certifications)
  • ✅ Resume improvement suggestions (grounded, template-based)
  • ✅ Multiple resume comparison (/v1/compare)
  • ✅ API-first design with versioned endpoints

TODOs / Future Work

  • Calibrate score thresholds with labeled data
  • OCR support for scanned PDFs (Tesseract integration, opt-in)
  • Celery/RQ for background embedding jobs
  • pgvector ANN indexing for large-scale ranking
  • Authentication (OAuth2 + JWT)
  • Resume version history
  • Export results as PDF/CSV
  • LLM-based suggestions (constrained, evidence-grounded)