Hybrid Recommendation Engine

Overview

A modular hybrid recommender system supporting:

Popularity baseline
Collaborative Filtering (user–user & item–item cosine)
Content-Based Ranking (TF-IDF over item metadata)
Neural Collaborative Filtering (PyTorch embeddings + MLP)
Hybrid Blending (tri-weight: CF, content, neural)
Alpha sweep & Optuna hyperparameter tuning
Cold-start evaluation (new / sparse users & items)
Ranking Metrics: Precision@K, Recall@K, NDCG@K, Coverage, Item Diversity
Reproducible pipeline (Makefile + scripts)

Why Hybrid?

Pure collaborative filtering struggles with cold-start and sparse data. Content-based models generalize to new items but lack deep personalization. Neural CF captures non-linear user–item interactions. The hybrid blends all three signals, aiming to raise NDCG and coverage while limiting the accuracy drop on cold-start users and items.

Features

Component	File	Description
Data download	`src/data_loading.py`	MovieLens 100K fetch & extract
Preprocessing	`src/preprocess.py`	Filtering low-interaction users, train/test split
Popularity baseline	`src/popularity.py`	Global item ranking
Collaborative Filtering	`src/cf_baseline.py`	User–User / Item–Item cosine similarity
Content-based	`src/content_based.py`	TF-IDF item embeddings & similarity
Neural CF	`src/neural_cf.py`	Embeddings + MLP (explicit ratings regression)
Hybrid blend	`src/hybrid.py`	Tri-weight combination of CF + content + neural
Metrics	`src/metrics.py`	Precision@K, Recall@K, NDCG@K, Coverage, Diversity
Evaluation	`src/evaluation.py`	Unified baseline evaluation
Alpha sweep	`scripts/alpha_sweep.py`	Evaluate multiple α values
Optuna tuning	`src/optuna_tune.py`	Optimize CF neighbor count / mode
Cold-start eval	`src/cold_start.py`	Segment sparse users/items
Logging	`src/logging_config.py`	Structured logging
Pipeline script	`scripts/run_pipeline.sh`	End-to-end automation
Make targets	`Makefile`	Reproducible commands
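
To make the collaborative-filtering row concrete, here is a minimal sketch of item–item cosine scoring on a toy rating dictionary. It is not the code in `src/cf_baseline.py`: the function names (`item_vectors`, `item_cosine`, `score_items`) and the data are hypothetical, and the real module may use matrix operations and a neighbor cutoff instead.

```python
import math
from collections import defaultdict

# Toy explicit ratings: user -> {item: rating}. Illustrative data only.
ratings = {
    "u1": {"A": 5.0, "B": 4.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 2.0},
    "u3": {"B": 1.0, "C": 5.0},
}

def item_vectors(ratings):
    """Transpose user->item ratings into item->user rating vectors."""
    vecs = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            vecs[item][user] = r
    return vecs

def item_cosine(a, b):
    """Cosine similarity between two sparse item vectors (dicts keyed by user)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def score_items(user, ratings):
    """Score unseen items by the similarity-weighted sum of the user's own ratings."""
    vecs = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for cand in vecs:
        if cand in seen:
            continue  # never recommend something the user already rated
        scores[cand] = sum(item_cosine(vecs[cand], vecs[i]) * r for i, r in seen.items())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(score_items("u1", ratings))
```

User–user CF follows the same pattern with the roles of users and items swapped.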

Metrics (Example / Placeholder)

Model	P@10	R@10	NDCG@10	Coverage	Diversity
Popularity	0.18	0.09	0.11	0.04	0.21
User-CF	0.27	0.14	0.21	0.33	0.37
Item-CF	0.26	0.13	0.20	0.29	0.35
Content (TF-IDF)	0.19	0.10	0.15	0.41	0.49
Neural CF	0.29	0.16	0.23	0.36	0.39
Hybrid (CF=0.6, Content=0.4)	0.31	0.17	0.24	0.44	0.46
Hybrid (CF=0.5, Content=0.3, Neural=0.2)	0.32	0.18	0.25	0.45	0.45

(Replace with real outputs after running.)
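
For readers unfamiliar with the column headers, the following sketch shows one common way to compute Precision@K, Recall@K, binary-relevance NDCG@K and catalog coverage. It is an assumption about the definitions, not a copy of `src/metrics.py`, which may differ (for example by using graded relevance). Diversity is left out because the README does not state how it is defined.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

def coverage(all_recommendations, catalog_size):
    """Share of the catalog that appears in at least one user's list."""
    distinct = set()
    for recs in all_recommendations:
        distinct.update(recs)
    return len(distinct) / catalog_size

recommended = ["A", "B", "C", "D"]
relevant = {"A", "C", "E"}
print(precision_at_k(recommended, relevant, 4))  # 2 hits out of 4 slots
print(recall_at_k(recommended, relevant, 4))     # 2 hits out of 3 relevant items
print(ndcg_at_k(recommended, relevant, 4))
```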

Cold-Start (Example)

Segment	P@10 (User-CF)	P@10 (Content)	P@10 (Hybrid)
New Users (≤4 ratings)	0.09	0.15	0.19
New Items (low exposure)	0.06	0.14	0.17
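
The segments in this table can be built by counting training interactions per user. The sketch below assumes the "≤4 ratings" threshold from the table and averages a per-user metric within each segment; `segment_users`, `mean_metric` and the toy numbers are hypothetical and do not come from `src/cold_start.py`.

```python
from collections import Counter

def segment_users(train_pairs, max_ratings=4):
    """Split users into sparse ("new") and established groups by training-set rating count."""
    counts = Counter(user for user, _item in train_pairs)
    sparse = {u for u, n in counts.items() if n <= max_ratings}
    established = set(counts) - sparse
    return sparse, established

def mean_metric(per_user_metric, users):
    """Average a per-user metric over one segment, skipping users without a score."""
    values = [per_user_metric[u] for u in users if u in per_user_metric]
    return sum(values) / len(values) if values else 0.0

# Toy (user, item) training interactions; illustrative only.
train_pairs = [("u1", i) for i in range(10)] + [("u2", 1), ("u2", 2)]
p_at_10 = {"u1": 0.3, "u2": 0.1}  # hypothetical per-user P@10 values

sparse, established = segment_users(train_pairs)
print(mean_metric(p_at_10, sparse), mean_metric(p_at_10, established))
```

Users with no training interactions at all never appear in `train_pairs`, so a real evaluation would need to add them to the sparse segment explicitly. The same counting approach applies to items with low exposure.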

Directory Structure

.
├── README.md
├── LICENSE
├── Makefile
├── requirements.txt
├── requirements-deploy.txt
├── api/
├── models/
├── experiments/
├── scripts/
├── src/
└── tests/

Quick Start

python -m venv .venv
source .venv/bin/activate         # Windows: .venv\Scripts\activate
pip install -r requirements.txt

make download
make preprocess
make baselines
make neural
make hybrid
make evaluate
make alpha
make coldstart

Example Hybrid Run

Tri-Weight Blending

# CF + Content blend (no neural)
python -m src.hybrid --w_cf 0.6 --w_content 0.4 --w_neural 0.0 \
  --train_path data/processed/train.csv \
  --test_path data/processed/test.csv \
  --items_path data/ml-100k/u.item

# CF + Content + Neural blend (neural weight ignored if model not found)
python -m src.hybrid --w_cf 0.5 --w_content 0.3 --w_neural 0.2 \
  --train_path data/processed/train.csv \
  --test_path data/processed/test.csv \
  --items_path data/ml-100k/u.item \
  --neural_model_path models/neural_cf.pt

Note: If the neural model is not trained/available, the neural weight will be automatically set to 0 and weights will be renormalized.
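
The fallback described in the note can be sketched as follows. This assumes that "renormalized" means rescaling the remaining weights to sum to 1 and that each model's scores are already on a comparable scale. The function names are hypothetical and are not taken from `src/hybrid.py`.

```python
def normalize_weights(w_cf, w_content, w_neural, neural_available=True):
    """Zero the neural weight if no model is available, then rescale to sum to 1."""
    if not neural_available:
        w_neural = 0.0
    total = w_cf + w_content + w_neural
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return w_cf / total, w_content / total, w_neural / total

def blend(cf_scores, content_scores, neural_scores, weights):
    """Weighted sum of per-item scores; an item missing from a model scores 0 there."""
    w_cf, w_content, w_neural = weights
    items = set(cf_scores) | set(content_scores) | set(neural_scores)
    return {
        item: w_cf * cf_scores.get(item, 0.0)
        + w_content * content_scores.get(item, 0.0)
        + w_neural * neural_scores.get(item, 0.0)
        for item in items
    }

# Requested 0.5 / 0.3 / 0.2, but no neural model on disk.
weights = normalize_weights(0.5, 0.3, 0.2, neural_available=False)
print(weights)  # neural weight dropped, CF and content rescaled
```

With the neural model missing, the requested 0.5 / 0.3 / 0.2 split becomes roughly 0.625 / 0.375 / 0, which keeps the CF-to-content ratio unchanged.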

API Usage

The system provides a FastAPI-based REST API for serving recommendations:

# Start the API server
uvicorn api.app:app --reload

# Get recommendations with custom weights
curl "http://localhost:8000/recommend?user_id=1&k=10&w_cf=0.6&w_content=0.4&w_neural=0.0"

# Check API health
curl "http://localhost:8000/health"

# Get metadata
curl "http://localhost:8000/meta"
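
The same `/recommend` request can be issued from Python with the standard library. This is a sketch under assumptions: the query parameters mirror the curl example above, but the response schema is not documented here, so `fetch_recommendations` simply assumes a JSON body. Both function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_recommend_url(base_url, user_id, k=10, w_cf=0.6, w_content=0.4, w_neural=0.0):
    """Build the /recommend URL with the query parameters shown in the curl example."""
    query = urlencode({
        "user_id": user_id, "k": k,
        "w_cf": w_cf, "w_content": w_content, "w_neural": w_neural,
    })
    return f"{base_url.rstrip('/')}/recommend?{query}"

def fetch_recommendations(base_url, user_id, **params):
    """Call the endpoint and decode the body, assuming it returns JSON."""
    with urlopen(build_recommend_url(base_url, user_id, **params), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Only builds the URL; call fetch_recommendations once the server is running.
print(build_recommend_url("http://localhost:8000", 1))
```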

Alpha Sweep

python scripts/alpha_sweep.py --alphas 0.3 0.5 0.7 0.9 --k 10
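
What the sweep does conceptually can be sketched as below. The assumption, which this README does not confirm, is that α weights the CF score and 1 − α weights the content score. The helper names and toy scores are hypothetical and unrelated to the actual `scripts/alpha_sweep.py`.

```python
def top_k(scores, k):
    """Return the k highest-scoring item ids, ties broken by item id."""
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def sweep_alpha(cf_scores, content_scores, relevant, alphas, k):
    """For each alpha, blend alpha*CF + (1-alpha)*content and report precision@k."""
    results = {}
    for alpha in alphas:
        items = set(cf_scores) | set(content_scores)
        blended = {
            i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * content_scores.get(i, 0.0)
            for i in items
        }
        hits = sum(1 for i in top_k(blended, k) if i in relevant)
        results[alpha] = hits / k
    return results

# Toy scores for a single user; illustrative only.
cf_scores = {"A": 0.9, "B": 0.2, "C": 0.1}
content_scores = {"A": 0.1, "B": 0.3, "C": 0.8}
relevant = {"C"}

results = sweep_alpha(cf_scores, content_scores, relevant, [0.3, 0.5, 0.7, 0.9], k=1)
for alpha, p in results.items():
    print(f"alpha={alpha:.1f}  P@1={p:.2f}")
```

In this toy case only the content-heavy setting (α = 0.3) ranks the relevant item first; a real sweep would average the metric over all test users.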

Optuna Hyperparameter Tuning

python -m src.optuna_tune --trials 30

Cold-Start Evaluation

python -m src.cold_start --train_path data/processed/train.csv \
  --test_path data/processed/test.csv --k 10

Scalability (Interview Talking Points)

Replace brute-force similarity with ANN (Faiss)
Offline batch refresh + incremental updates
Candidate generation → re-ranking pipeline
Feature store for user/item embeddings
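
As an illustration of the candidate generation → re-ranking point, here is a minimal two-stage sketch. A cheap score (popularity here, though an ANN lookup would play the same role) produces a shortlist, and only the shortlist is scored by the more expensive model. All names and numbers are hypothetical.

```python
import heapq

def generate_candidates(cheap_scores, n):
    """Stage 1: keep the n items with the highest cheap score (e.g. popularity)."""
    return heapq.nlargest(n, cheap_scores, key=cheap_scores.get)

def rerank(candidates, expensive_score, k):
    """Stage 2: apply the costlier scorer to the shortlist only and return the top k."""
    return sorted(candidates, key=expensive_score, reverse=True)[:k]

cheap_scores = {"A": 100, "B": 80, "C": 60, "D": 5}   # e.g. interaction counts
blended = {"A": 0.2, "B": 0.9, "C": 0.5, "D": 1.0}    # e.g. hybrid model scores

shortlist = generate_candidates(cheap_scores, n=3)
print(rerank(shortlist, blended.get, k=2))
```

Item `D` has the best blended score but never reaches stage 2, which shows the trade-off: the shortlist size bounds both the cost and the recall of the final ranking.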

https://gpt-website-builder-1-0.onrender.com/gpt/0acbab5a.html

Future Work

Transformer text embeddings
Implicit feedback (BPR / WARP)
Meta-learning blend weights
MLflow tracking
A/B simulation harness

License

MIT
