A full-stack application implementing a custom Recommender System with a modern web interface.
- Robust Recommendation Engine
- Matrix Factorization: Custom implementations trained with Numba-accelerated Stochastic Gradient Descent (SGD, Option 1) and sparse CSR-block Alternating Least Squares (MF-ALS, Option 4). Both use heavily parallelized training loops and dynamic early stopping for fast training on 27M+ ratings.
- Deep Neural CF: Hybrid deep learning model with Text CNN for title feature extraction.
- Matrix SVD: Closed-form SVD latent factors calibrated with Ridge/Lasso regression.
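The SGD-trained matrix factorization can be sketched in a few lines of NumPy. This is an illustrative toy, not the repo's implementation: the actual code parallelizes a loop like this with Numba, and the names and hyperparameter values here (`sgd_epoch`, `lr`, `reg`) are made up for the example.

```python
import numpy as np

def sgd_epoch(users, items, ratings, P, Q, lr=0.02, reg=0.05):
    """One plain SGD pass over observed (user, item, rating) triplets.

    P: (n_users, k) user factors, Q: (n_items, k) item factors.
    Illustrative only -- the repo wraps a loop like this in Numba.
    """
    for u, i, r in zip(users, items, ratings):
        err = r - P[u] @ Q[i]                      # prediction error
        P[u] += lr * (err * Q[i] - reg * P[u])     # regularized gradient step
        Q[i] += lr * (err * P[u] - reg * Q[i])

# Tiny synthetic problem: 4 users, 5 items, rank-2 factors
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(4, 2))
Q = rng.normal(scale=0.1, size=(5, 2))
users = np.array([0, 1, 2, 3])
items = np.array([1, 2, 3, 4])
ratings = np.array([4.0, 3.0, 5.0, 2.0])

for _ in range(500):
    sgd_epoch(users, items, ratings, P, Q)

preds = np.array([P[u] @ Q[i] for u, i in zip(users, items)])
rmse = float(np.sqrt(np.mean((preds - ratings) ** 2)))
```

Early stopping, as described above, would simply break out of the epoch loop once the validation RMSE stops improving.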
- Automated Data Processing
- Per-user random 80/20 data split for reliable training and testing.
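A per-user random split like the one above can be sketched with pandas; this is a hedged illustration, not the repo's actual split code, and the column names simply follow the MovieLens CSV schema.

```python
import pandas as pd

def per_user_split(ratings: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    """Hold out ~test_frac of each user's ratings for testing.

    Minimal sketch of a per-user 80/20 split (names are illustrative).
    """
    shuffled = ratings.sample(frac=1.0, random_state=seed)   # global shuffle
    pos = shuffled.groupby("userId").cumcount()              # rank within each user
    size = shuffled.groupby("userId")["userId"].transform("size")
    is_test = pos < (size * test_frac).round()               # first ~20% per user
    return shuffled[~is_test], shuffled[is_test]

# Toy example: 3 users with 5 ratings each -> 1 test rating per user
df = pd.DataFrame({
    "userId": [1] * 5 + [2] * 5 + [3] * 5,
    "movieId": list(range(5)) * 3,
    "rating": [4.0] * 15,
})
train, test = per_user_split(df)
```

Splitting per user (rather than globally) guarantees every user appears in both the training and the test set.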
- Comprehensive Evaluation Metrics
- Rating Prediction: MAE, RMSE
- Top-K Recommendations: Precision@10, Recall@10, F-measure@10, NDCG@10
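The Top-K metrics listed above can be computed per user as in this sketch (standard definitions, not the repo's exact evaluation code):

```python
import numpy as np

def topk_metrics(recommended, relevant, k=10):
    """Precision@K, Recall@K, F-measure@K, NDCG@K for one user.

    recommended: ranked list of item ids; relevant: set of held-out items.
    """
    hits = [1 if item in relevant else 0 for item in recommended[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    # Binary-relevance NDCG: discount hits by log2 of their rank
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / ideal if ideal else 0.0
    return precision, recall, f1, ndcg

# Example: items 0..9 recommended, 4 relevant items, 3 of them in the top 10
p, r, f1, ndcg = topk_metrics(list(range(10)), {0, 1, 2, 99}, k=10)
```

These per-user values are then averaged over all test users to produce the reported scores.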
- Structural Data Analysis
- Distribution profiling, feature influence analysis, latent factor interpretation, and synthetic data generation.
- Modern Web Interface (Next.js + Django)
- Browse library with multi-genre filtering and sorting.
- Detailed movie pages with metadata and similar movie suggestions.
- Personalized user profiles showcasing rating history and top recommendations.
- Dynamic TMDB API integration for rich image enrichment (posters and backdrops).
| Home Page | Top Picks |
|---|---|
| ![]() | ![]() |
| Library | Movie Detail | Community |
|---|---|---|
| ![]() | ![]() | ![]() |
| User Profile | Actor Detail | Settings |
|---|---|---|
| ![]() | ![]() | ![]() |
```
dataset/   # Raw MovieLens data (e.g. ml-latest/)
backend/   # Django REST API
frontend/  # Next.js web application
models/    # ML model code and generated artifacts (option1, option2, option3_ridge, option3_lasso, option4, splits)
scripts/   # Training, evaluation, enrichment, and report generation
analysis/  # Final report (final_report.md), figures, and JSON/CSV artifacts
```
Once the Python venv and dependencies are installed (Step 1 below) and the model is trained (Step 2), you can start both backend and frontend with one command:
- Windows (PowerShell): `.\start.ps1`
- macOS / Linux: `./start.sh`
This starts the Django API on port 8001 and the Next.js app on port 3001, and opens the app in your browser.
Requires Python 3.11 (or a compatible 3.x version). Create and activate a virtual environment, then install dependencies:
```
# macOS / Linux
python -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Train on MovieLens data under `dataset/ml-latest/` (or pass `--dataset-dir`).
Model files are written to models/artifacts/<model-type>/, and split metadata is shared in models/artifacts/.
Quick default run:
```
python -m scripts.train_and_evaluate --dataset-dir dataset/ml-latest --top-k 10
```

Model-specific quick runs:
```
python -m scripts.train_and_evaluate --model-type option1 --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option2 --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option3_ridge --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option3_lasso --dataset-dir dataset/ml-latest
python -m scripts.train_and_evaluate --model-type option4 --dataset-dir dataset/ml-latest
```

Detailed parameter presets and copy-ready commands are documented in @TRAINING_PARAMETERS.md.
GPU note: Option 2 is PyTorch-based and auto-selects `cuda`/`mps` when available. Training, deep-model dependencies, and plotting tools are all included in `pip install -r requirements.txt`.
The script caches a shared train/test split in `models/artifacts/splits/` so all models are evaluated on the same holdout split.
Use `--force-resplit` to regenerate.
Enrich movie records with posters, backdrops, overviews, and cast/director data from TMDB. The script reads movies.csv from dataset/ml-latest (or --dataset-dir) and writes movies_enriched.csv beside it. If the dataset copy is unavailable, it falls back to models/artifacts. Run after training (Step 2).
- Get a free API key at TMDB and create a `.env` in the project root containing `TMDB_API_KEY=your_api_key_here`.
- Run the scraper:

```
python -m scripts.scrape_tmdb
```
Start the Django development server:
```
cd backend
python manage.py runserver 8001
```

⚡ Performance Architecture — Lazy Loading: The backend uses a lazy model loading strategy optimized for the full MovieLens 27M dataset. On startup, the server only scans which model files exist on disk (instant) rather than loading all five model pickle files into memory (which would take 30+ seconds).
- First request after startup: The active model (~100–200 MB) is loaded on-demand when the first API request arrives. Expect a ~15–25 second wait on the very first request (or when switching to a new model via Settings).
- All subsequent requests: Served from memory in < 5 ms. Models stay cached until the server restarts.
- Switching models: When you switch the active model in the frontend Settings page, the newly selected model is loaded on-demand. This takes ~15–25 seconds for the first request with that model, then all subsequent requests are instant.
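The lazy-loading behavior described above boils down to a scan-on-startup, load-on-first-use cache. Here is a minimal, hedged sketch of that pattern; the class name, file layout (`<model-type>/model.pkl`), and locking are assumptions for the example, not the backend's actual code.

```python
import pickle
import tempfile
import threading
from pathlib import Path

class LazyModelRegistry:
    """Load model pickles only on first use; keep them cached afterwards."""

    def __init__(self, artifacts_dir):
        self.artifacts_dir = Path(artifacts_dir)
        self._cache = {}
        self._lock = threading.Lock()

    def available(self):
        # Startup only scans the disk -- no deserialization cost.
        return [p.name for p in self.artifacts_dir.iterdir() if p.is_dir()]

    def get(self, model_type):
        # First request pays the load cost; later requests hit the cache.
        with self._lock:
            if model_type not in self._cache:
                path = self.artifacts_dir / model_type / "model.pkl"
                with open(path, "rb") as f:
                    self._cache[model_type] = pickle.load(f)
            return self._cache[model_type]

# Demo with a throwaway artifact directory
tmp = Path(tempfile.mkdtemp())
(tmp / "option1").mkdir()
with open(tmp / "option1" / "model.pkl", "wb") as f:
    pickle.dump({"factors": 64}, f)

reg = LazyModelRegistry(tmp)
names = reg.available()
model = reg.get("option1")
cached = reg.get("option1") is model  # second call served from cache
```

Switching the active model in Settings corresponds to calling `get()` with a new key: the first call loads from disk, every later call returns the cached object.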
User history and rating statistics are computed dynamically via Pandas DataFrames rather than pre-built Python dictionaries, reducing memory usage from ~5 GB to ~500 MB for the full dataset.
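Computing user statistics on demand from a DataFrame, rather than pre-building a dictionary per user, looks roughly like this sketch (column names follow the MovieLens schema; the function is illustrative, not the backend's code):

```python
import pandas as pd

# Toy ratings table standing in for the full MovieLens DataFrame
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 2],
    "movieId": [10, 20, 10, 30, 40],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.0],
})

def user_stats(df: pd.DataFrame, user_id: int) -> dict:
    """Filter and aggregate at request time -- nothing is precomputed."""
    sub = df[df["userId"] == user_id]
    return {
        "count": int(len(sub)),
        "mean": float(sub["rating"].mean()),
        "history": sub[["movieId", "rating"]].to_dict("records"),
    }

stats = user_stats(ratings, 2)
```

Only one DataFrame lives in memory; per-user views are cheap boolean-mask filters, which is where the ~5 GB to ~500 MB saving comes from.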
Warnings you can safely ignore (they do not affect functionality):
```
Pandas requires version '2.10.2' or newer of 'numexpr' ...
nopython is set for njit and is ignored ...
You have 2 unapplied migration(s) ...
```
Key Endpoints:
- `GET /api/health` — API health check
- `GET /api/movies` — Paginated movies with search and genre filters
- `GET /api/movie/<id>` — Movie detail and metadata
- `GET /api/recommend/<user_id>` — Top-K recommendations for a user
- `GET /api/users` — User list
- `GET /api/user/<user_id>/history` — User rating history
- `GET /api/predict/<user_id>/<item_id>` — Predicted rating for a user–item pair
- `GET /api/search` — Full-text movie search
- `GET /api/stats` — Database statistics
- `GET /api/model-config` — Loaded model configuration
- TMDB and scrape endpoints for image enrichment (see `backend/api/urls.py` for the full list)
In a new terminal, start the Next.js application:
```
cd frontend
npm install
npm run dev -- -p 3001
```

(Optional) If you need to specify a custom backend URL:

```
NEXT_PUBLIC_API_BASE_URL="http://localhost:8001/api" npm run dev -- -p 3001
```

Access the application at: http://localhost:3001
This repository includes a render.yaml blueprint for a two-service deployment:
- `streamx-backend` (Django API, Python)
- `streamx-frontend` (Next.js UI, Node.js)
In Render, create a Blueprint service and point it to this repository. Render will detect render.yaml and propose both services.
Set frontend env var NEXT_PUBLIC_API_BASE_URL to your backend public URL:
https://<your-backend-service>.onrender.com/api
For the free-tier blueprint, the backend uses `STREAMX_DATA_DIR=/tmp/streamx`.

- On first boot, `backend/start_render.sh` seeds that directory from `models/artifacts/`.
- Runtime updates (for example `active_model.txt`, `movies_enriched.csv`, and `scrape_state.json`) are written there while the instance is alive.
- Note: `/tmp` is ephemeral on the free tier, so data may reset after a restart or redeploy.
- `SECRET_KEY` (generated in blueprint by default)
- `DEBUG=False`
- `ALLOWED_HOSTS=.onrender.com` (or your custom domain list)
- `TMDB_API_KEY` (optional, required for TMDB scraping endpoints)
- The data loader supports both `csv` and `dat` MovieLens formats.
- The recommender algorithm is built from scratch and does not rely on black-box recommendation libraries.
- The analysis pipeline is designed to support course-style interpretation questions, not only predictive metrics.
- The UI features a responsive design, glass-morphism effects, and dynamic filtering components.
Special thanks to the open-source projects and communities that made this possible:







