📘 Movie Recommendation System - Complete Project Guide

Comprehensive technical documentation for developers, covering installation, configuration, development, deployment, and troubleshooting.

🎯 Overview

This guide documents the Movie Recommendation System, a production-ready Django application that recommends movies using content-based filtering (TF-IDF features, optional SVD dimensionality reduction, and cosine similarity).

What This Guide Covers

✅ Installation: Step-by-step setup instructions
✅ Configuration: Environment variables and settings
✅ Model Training: Creating custom recommendation models
✅ Development: Working with the codebase
✅ Deployment: Production deployment guides
✅ API Reference: Complete endpoint documentation
✅ Troubleshooting: Common issues and solutions

✅ Prerequisites

System Requirements

Component	Minimum	Recommended
Python	3.10+	3.10+
RAM	4GB	8GB+
Storage	2GB	5GB+
OS	Windows/macOS/Linux	Any

Required Software

Python 3.10 or higher - Download here
pip - Python package manager (included with Python)
Git - Version control - Download here
Virtual environment tool - venv (included with Python)

Optional Software

PostgreSQL - For production database
Redis - For caching (production)
Docker - For containerized deployment

💻 Installation

Step 1: Clone the Repository

git clone https://github.com/yourusername/movie-recommendation-system.git
cd movie-recommendation-system

Step 2: Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate

# macOS/Linux:
source venv/bin/activate

Verification: Your terminal prompt should now show a (venv) prefix.

Step 3: Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install project dependencies
pip install -r requirements.txt

Expected output: All packages installed successfully without errors.

Step 4: Database Setup

# Run database migrations
python manage.py migrate

Expected output: each migration is listed as applied and marked OK.

Step 5: Start Development Server

python manage.py runserver

Output:

Starting development server at http://127.0.0.1:8000/

Step 6: Verify Installation

Open your browser and navigate to:

http://localhost:8000

You should see the Movie Recommendation System home page. ✅

✓ Quick Verification

Run these commands to verify everything is working:

# 1. Check Python version
python --version
# Expected: Python 3.10.x or higher

# 2. Check Django installation
python manage.py --version
# Expected: Django version number

# 3. Test health endpoint
curl http://localhost:8000/api/health/
# Expected: {"status": "healthy", ...}

# 4. Test search API
curl "http://localhost:8000/api/search/?q=matrix"
# Expected: {"movies": [...], "count": ...}

🏗️ Project Architecture

High-Level Architecture

Component Breakdown

Django Application (`movie_recommendation/`)

settings.py: Configuration and environment settings
urls.py: URL routing to apps
wsgi.py: WSGI application entry point

Recommender App (`recommender/`)

views.py: Core recommendation logic
- MovieRecommender class (model loading, recommendations)
- View functions (main, search_movies, health_check)
urls.py: URL patterns for the app
templates/: HTML templates with inline CSS

Model Files (`models/` or `static/`)

movie_metadata.parquet: Movie information (title, rating, genres, etc.)
similarity_matrix.npz (or similarity_matrix.npy): Precomputed similarity scores (.npz is the sparse format)
title_to_idx.json: Mapping from titles to indices
tfidf_vectorizer.pkl: TF-IDF model (for future retraining)
svd_model.pkl: SVD dimensionality reduction model

Training Scripts (`training/`)

train.py: Complete training pipeline
infer.py: Inference examples and usage
guide.md: Training documentation

🔍 How It Works

Recommendation Pipeline

1. User Input
   └─> "Inception"

2. Fuzzy Matching
   └─> Find closest title in database
       └─> "Inception" (exact match) ✓

3. Get Movie Index
   └─> title_to_idx["Inception"] = 42

4. Fetch Similarity Scores
   └─> similarity_matrix[42] = [0.95, 0.87, 0.82, ...]

5. Sort & Filter
   └─> Top 15 similar movies (excluding input)
   └─> Apply filters (rating, year, genre)

6. Format Response
   └─> Return movie details with metadata

7. Display Results
   └─> Render cards with ratings, genres, links
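
The pipeline above can be sketched in standard-library Python. This is an illustrative sketch, not the project's MovieRecommender implementation: the toy titles and similarity values, the `recommend` function name, and the use of `difflib` for fuzzy matching are all assumptions made for the example.

```python
from difflib import get_close_matches

# Toy data standing in for title_to_idx.json and the similarity matrix
# (hypothetical values, not taken from the real model files).
TITLE_TO_IDX = {"Inception": 0, "Interstellar": 1, "The Matrix": 2, "Toy Story": 3}
SIMILARITY = [
    [1.00, 0.87, 0.82, 0.10],
    [0.87, 1.00, 0.75, 0.12],
    [0.82, 0.75, 1.00, 0.08],
    [0.10, 0.12, 0.08, 1.00],
]
IDX_TO_TITLE = {idx: title for title, idx in TITLE_TO_IDX.items()}


def recommend(query, top_n=15, min_score=0.0):
    """Return (title, score) pairs most similar to the closest-matching title."""
    # Step 2: fuzzy-match the user input against known titles (case-sensitive here).
    matches = get_close_matches(query, list(TITLE_TO_IDX), n=1, cutoff=0.6)
    if not matches:
        return []
    # Step 3: look up the row index of the matched title.
    idx = TITLE_TO_IDX[matches[0]]
    # Step 4: fetch that movie's similarity scores.
    scores = SIMILARITY[idx]
    # Step 5: rank by score, skipping the input movie itself, then filter.
    ranked = sorted(
        ((i, s) for i, s in enumerate(scores) if i != idx and s >= min_score),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Step 6: format the response.
    return [(IDX_TO_TITLE[i], s) for i, s in ranked[:top_n]]


print(recommend("Incepton", top_n=2))
```

The misspelled query still resolves to "Inception" because the fuzzy matcher picks the closest title above the cutoff. A real implementation would also apply the rating, year, and genre filters in step 5.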

Content-Based Filtering

The system uses content-based filtering with these features:

TF-IDF Vectorization
- Converts movie features (genres, keywords, plot) into numerical vectors
- Captures importance of terms relative to corpus
SVD Dimensionality Reduction (optional)
- Reduces feature space from thousands to 300-600 dimensions
- Captures latent patterns and reduces noise
- Makes computation more efficient
Cosine Similarity
- Measures similarity between movie vectors
- Ranges from 0 (no shared terms) to 1 (identical direction) for non-negative TF-IDF vectors; after SVD reduction, scores can also be negative
Ranking & Filtering
- Ranks movies by similarity score
- Applies user-defined filters (year, rating, genre)
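
To make the TF-IDF and cosine similarity steps concrete, here is a minimal pure-Python sketch on three toy movies. It is not the project's training code; the feature strings, the function names, and the plain tf × log(N/df) weighting are assumptions, and the optional SVD step is left out.

```python
import math
from collections import Counter

# Toy "feature strings" (genres + keywords); hypothetical data.
DOCS = {
    "Inception": "sci-fi thriller dream heist",
    "Interstellar": "sci-fi drama space time",
    "Toy Story": "animation family comedy toys",
}


def tfidf_vectors(docs):
    """Map each title to a {term: weight} dict using tf * log(N / df)."""
    tokenized = {title: text.split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[title] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vecs = tfidf_vectors(DOCS)
print(cosine(vecs["Inception"], vecs["Interstellar"]))  # small but positive: shares "sci-fi"
print(cosine(vecs["Inception"], vecs["Toy Story"]))     # zero: no shared terms
```

The shared term "sci-fi" gets a low weight because it appears in two of the three documents, which is why the first score is small. Rarer shared terms contribute more to the similarity.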

For more details, see training/guide.md - How It Works

🎓 Model Training

Overview

The system supports two model sources:

Demo Model (included) - 2,000 popular movies, ready to use
Custom Model (train your own) - 10K to 1M+ movies

Using Demo Model

# Demo model is in the static/ directory (Unix/macOS shell syntax)
export MODEL_DIR=./static
# Windows PowerShell equivalent: $env:MODEL_DIR = ".\static"
python manage.py runserver

No training needed! Works out of the box.

Training Your Own Model

For complete training documentation, see training/guide.md

Quick Training Example:

from training.train import MovieRecommenderTrainer

# Initialize trainer
trainer = MovieRecommenderTrainer(
    output_dir='./models',
    use_dimensionality_reduction=True,
    n_components=500
)

# Train on your dataset
df, sim_matrix = trainer.train(
    'path/to/your/dataset.csv',
    quality_threshold='medium',  # low/medium/high
    max_movies=100000
)

Training Configurations

Configuration	Movies	Time	Memory	Model Size	Use Case
Small	10K	2 min	500MB	40MB	Testing
Medium ⭐	100K	15 min	2GB	180MB	Production
Large	1M+	60 min	6GB	800MB	Full dataset

Dataset Requirements

Your CSV must have these columns:

title (required)
genres (required)
keywords (required)
vote_average, vote_count (recommended)
release_date, imdb_id, poster_path (optional)
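
Before starting a long training run, it can help to check the CSV header against the column list above. The following is a small sketch using only the standard library; the `check_columns` helper is hypothetical and not part of the project's training scripts.

```python
import csv
import io

REQUIRED_COLUMNS = {"title", "genres", "keywords"}
RECOMMENDED_COLUMNS = {"vote_average", "vote_count"}


def check_columns(csv_file):
    """Return (missing_required, missing_recommended) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    present = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - present), sorted(RECOMMENDED_COLUMNS - present)


# Example with an in-memory CSV that lacks "keywords" and "vote_count".
sample = io.StringIO("title,genres,vote_average\nInception,Sci-Fi,8.4\n")
missing_required, missing_recommended = check_columns(sample)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```

For a real dataset, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.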

For detailed requirements and training guide, see training/guide.md

🔧 Using Different Models

Switching Models

The system uses the MODEL_DIR environment variable to locate models:

# Method 1: Environment variable
export MODEL_DIR=./models
python manage.py runserver

# Method 2: .env file
echo "MODEL_DIR=./models" >> .env
python manage.py runserver

# Method 3: Modify settings.py
# MODEL_DIR = os.path.join(BASE_DIR, 'models')
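
How settings.py actually resolves MODEL_DIR is not shown in this guide. The sketch below illustrates one plausible approach, with the environment variable taking precedence over a default under the project root. The `resolve_model_dir` helper and the `/srv/app` path are hypothetical.

```python
import os
from pathlib import Path


def resolve_model_dir(environ, base_dir):
    """Pick MODEL_DIR from the environment, else fall back to <base_dir>/models."""
    raw = environ.get("MODEL_DIR")
    path = Path(raw) if raw else Path(base_dir) / "models"
    # Relative values such as ./static are interpreted relative to base_dir.
    return path if path.is_absolute() else Path(base_dir) / path


print(resolve_model_dir({"MODEL_DIR": "./static"}, "/srv/app"))  # variable set
print(resolve_model_dir({}, "/srv/app"))                         # variable unset
print(resolve_model_dir(os.environ, Path.cwd()))                 # real environment
```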

Model Directory Structure

Your model directory must contain:

models/
├── movie_metadata.parquet    # Required
├── similarity_matrix.npy     # Required (or .npz)
├── title_to_idx.json         # Required
├── config.json               # Optional (for metadata)
├── tfidf_vectorizer.pkl      # Optional (for retraining)
└── svd_model.pkl            # Optional (for retraining)
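
A quick way to confirm that a directory matches this layout is to check for the required files before starting the server. This is a sketch under the assumptions of the listing above; the `missing_model_files` helper is hypothetical and is not the check the application itself performs.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ["movie_metadata.parquet", "title_to_idx.json"]
SIMILARITY_CANDIDATES = ["similarity_matrix.npy", "similarity_matrix.npz"]


def missing_model_files(model_dir):
    """Return the names of required model files absent from model_dir."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    # Either similarity_matrix.npy or similarity_matrix.npz is accepted, per the listing.
    if not any((model_dir / name).is_file() for name in SIMILARITY_CANDIDATES):
        missing.append("similarity_matrix.npy or similarity_matrix.npz")
    return missing


# Demonstrate on a temporary directory that contains only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "movie_metadata.parquet").touch()
    (Path(tmp) / "similarity_matrix.npz").touch()
    print(missing_model_files(tmp))
```

An empty list means all required files are present. The check only tests for existence, not that the files are valid.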

Verifying Model

# Check health endpoint
curl http://localhost:8000/api/health/

# Response shows model information
{
  "status": "healthy",
  "movies_loaded": 100000,
  "model_dir": "./models",
  "model_loaded": true
}

⚙️ Configuration

Environment Variables

Create a .env file in the project root:

# Django Core
SECRET_KEY=your-secret-key-here-minimum-50-characters
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1

# Model Configuration
MODEL_DIR=./models

# Database (optional - defaults to SQLite)
DATABASE_URL=postgresql://user:password@localhost:5432/dbname

# CORS (if using separate frontend)
CORS_ALLOWED_ORIGINS=http://localhost:3000,http://127.0.0.1:3000

# Admin Panel
ADMIN_ENABLED=False

# Deployment
RENDER_EXTERNAL_HOSTNAME=your-app.onrender.com
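
Environment variables always arrive as strings, so a value like DEBUG=False is truthy if used directly. The sketch below shows one common way to parse the boolean and comma-separated values listed above. The `env_bool` and `env_list` helpers are hypothetical, and the project's settings.py may use a different mechanism.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default="", environ=os.environ):
    """Split a comma-separated variable into a list, dropping empty items."""
    return [item.strip() for item in environ.get(name, default).split(",") if item.strip()]


# Example using a plain dict in place of the real environment.
sample_env = {"DEBUG": "False", "ALLOWED_HOSTS": "localhost, 127.0.0.1"}
DEBUG = env_bool("DEBUG", environ=sample_env)
ALLOWED_HOSTS = env_list("ALLOWED_HOSTS", environ=sample_env)
print(DEBUG, ALLOWED_HOSTS)
```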

Generating SECRET_KEY

python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"

Database Configuration

Development (SQLite - default):

# Already configured in settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}

Production (PostgreSQL - recommended):

# Install: pip install dj-database-url psycopg2-binary
import os

import dj_database_url

DATABASES = {
    'default': dj_database_url.config(
        default=os.environ.get('DATABASE_URL'),
        conn_max_age=600
    )
}

🔨 Development Guide

Project Structure

recommender/
├── views.py          # Core logic
├── urls.py           # URL patterns
├── models.py         # Database models (currently empty)
├── admin.py          # Admin configuration
├── apps.py           # App configuration
└── templates/        # HTML templates
    └── recommender/
        ├── index.html    # Home/search page
        ├── result.html   # Recommendations page
        └── error.html    # Error page

Development Workflow

# 1. Activate virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# 2. Run development server
python manage.py runserver

# 3. Access application
# http://localhost:8000

# 4. Make changes to code
# Files auto-reload on save

# 5. Run tests (when available)
python manage.py test

Django Management Commands

# Database
python manage.py migrate                    # Apply migrations
python manage.py makemigrations            # Create migrations
python manage.py showmigrations            # Show migration status

# Static files
python manage.py collectstatic --noinput   # Collect static files

# Development
python manage.py runserver                 # Run dev server
python manage.py runserver 8080           # Run on different port
python manage.py shell                     # Django shell

# Admin (if enabled)
python manage.py createsuperuser           # Create admin user

Viewing Logs

# Real-time logs (Unix/macOS)
tail -f logs/django.log

# Real-time logs (Windows PowerShell)
Get-Content logs\django.log -Wait

# Last 100 lines
tail -n 100 logs/django.log

🧪 Testing

Running Tests

# Run all tests
python manage.py test

# Run specific app tests
python manage.py test recommender

# Run with verbose output
python manage.py test --verbosity=2

# Keep test database
python manage.py test --keepdb

Writing Tests

from django.test import TestCase, Client
from django.urls import reverse

class RecommenderTests(TestCase):
    def setUp(self):
        self.client = Client()
    
    def test_home_page(self):
        """Test home page loads"""
        response = self.client.get(reverse('recommender:main'))
        self.assertEqual(response.status_code, 200)
    
    def test_search_api(self):
        """Test search API"""
        response = self.client.get('/api/search/?q=matrix')
        self.assertEqual(response.status_code, 200)
        data = response.json()
        self.assertIn('movies', data)

🚀 Deployment

Deployment Checklist

Before deploying to production:

Deploy to Render

Step 1: Prepare Repository

git add .
git commit -m "Prepare for deployment"
git push origin main

Step 2: Create Render Account

Go to render.com
Sign up (free tier available)

Step 3: Create New Web Service

Click "New +" → "Web Service"
Connect your GitHub repository
Render auto-detects render.yaml

Step 4: Configure Environment Variables

SECRET_KEY=<auto-generated>
DEBUG=False
ALLOWED_HOSTS=your-app.onrender.com
MODEL_DIR=./models

Step 5: Deploy

Click "Create Web Service"
Wait for build to complete
Access at https://your-app.onrender.com

Deploy to Heroku

Prerequisites:

# Install Heroku CLI
# https://devcenter.heroku.com/articles/heroku-cli

# Login
heroku login

Deployment:

# Create app
heroku create your-app-name

# Add PostgreSQL (available plan names change over time; check Heroku's current plan list if "mini" is rejected)
heroku addons:create heroku-postgresql:mini

# Set environment variables
heroku config:set SECRET_KEY="your-secret-key"
heroku config:set DEBUG=False
heroku config:set MODEL_DIR=./models

# Deploy
git push heroku main

# Run migrations
heroku run python manage.py migrate

# Open app
heroku open

Deploy with Docker

Dockerfile:

FROM python:3.10-slim

ENV PYTHONUNBUFFERED=1
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

RUN python manage.py collectstatic --noinput

EXPOSE 8000

CMD ["gunicorn", "movie_recommendation.wsgi:application", "--bind", "0.0.0.0:8000"]

Build and Run:

# Build
docker build -t movie-recommender .

# Run
docker run -p 8000:8000 -e DEBUG=False movie-recommender

# Access
http://localhost:8000

Deploy to AWS

# Install EB CLI
pip install awsebcli

# Initialize
eb init -p python-3.10 movie-recommender

# Create environment
eb create movie-recommender-env

# Deploy
eb deploy

# Open
eb open

📡 API Reference

Endpoints

1. Home Page / Search

Endpoint: GET /

Description: Display search interface

Response: HTML page

2. Submit Search

Endpoint: POST /

Content-Type: application/x-www-form-urlencoded

Parameters:

Parameter	Type	Required	Description
movie_name	string	Yes	Movie title to search
csrfmiddlewaretoken	string	Yes	CSRF token

Response: HTML page with recommendations or error

3. Search Movies (Autocomplete)

Endpoint: GET /api/search/

Parameters:

Parameter	Type	Required	Description
q	string	Yes	Search query (min 2 chars)

Example Request:

curl "http://localhost:8000/api/search/?q=matrix"

Example Response:

{
  "movies": [
    "The Matrix",
    "The Matrix Reloaded",
    "The Matrix Revolutions"
  ],
  "count": 3
}
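
The same request can be made from Python. The sketch below uses only the standard library and assumes the development server from the installation steps is running at localhost:8000. The helper names are hypothetical, and the client-side length check simply mirrors the documented 2-character minimum.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local dev server


def build_search_url(query, base_url=BASE_URL):
    """Build the autocomplete URL, enforcing the documented 2-character minimum."""
    query = query.strip()
    if len(query) < 2:
        raise ValueError("Search query must be at least 2 characters")
    return f"{base_url}/api/search/?{urlencode({'q': query})}"


def search_movies(query, base_url=BASE_URL, timeout=5):
    """Call the endpoint and return the list stored under the "movies" key."""
    with urlopen(build_search_url(query, base_url), timeout=timeout) as response:
        return json.load(response)["movies"]


print(build_search_url("the matrix"))
# search_movies("matrix") needs a running server, so it is not called here.
```

Using `urlencode` takes care of escaping spaces and special characters in titles.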

4. Health Check

Endpoint: GET /api/health/

Description: Check service health and model status

Example Request:

curl "http://localhost:8000/api/health/"

Example Response:

{
  "status": "healthy",
  "movies_loaded": 100000,
  "model_dir": "./models",
  "model_loaded": true
}

Status Codes:

200 OK - Service healthy
503 Service Unavailable - Service unhealthy
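
For monitoring scripts, the status code and the response body can be checked together. This is a sketch that assumes the response fields shown in the example above. The `interpret_health` and `check_health` helpers are hypothetical.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code, body):
    """Return True only for HTTP 200 with status "healthy" and a loaded model."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True


def check_health(url="http://localhost:8000/api/health/", timeout=5):
    """Fetch the health endpoint; HTTP errors and network failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret_health(response.status, response.read())
    except urllib.error.HTTPError as err:  # non-2xx responses such as 503
        return interpret_health(err.code, err.read())
    except OSError:  # connection refused, DNS failure, timeout
        return False


sample = '{"status": "healthy", "movies_loaded": 100000, "model_dir": "./models", "model_loaded": true}'
print(interpret_health(200, sample))
print(interpret_health(503, sample))
```

`check_health()` is not called here because it needs a running server. The parsing logic is kept in a separate function so it can be tested offline.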

💻 Command Reference

Virtual Environment

# Create
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Unix/macOS)
source venv/bin/activate

# Deactivate
deactivate

Django Commands

# Server
python manage.py runserver              # Start dev server
python manage.py runserver 8080        # Custom port

# Database
python manage.py migrate                # Apply migrations
python manage.py makemigrations        # Create migrations
python manage.py showmigrations        # Show status

# Static files
python manage.py collectstatic         # Collect static files

# Shell
python manage.py shell                 # Django shell

# Testing
python manage.py test                  # Run tests

Git Commands

# Clone
git clone <url>

# Status
git status

# Stage changes
git add .

# Commit
git commit -m "message"

# Push
git push origin main

# Pull
git pull origin main

🐛 Troubleshooting

Common Issues

Issue 1: Module Not Found

Problem: ModuleNotFoundError: No module named 'package'

Solution:

# Ensure virtual environment is activated
source venv/bin/activate  # or venv\Scripts\activate

# Reinstall dependencies
pip install -r requirements.txt --upgrade

Issue 2: Model Not Found

Problem: FileNotFoundError: Model files not found

Solution:

# Check MODEL_DIR
echo $MODEL_DIR

# Verify files exist
ls -la models/  # or dir models\ on Windows

# Use demo model
export MODEL_DIR=./static

# Or train new model
python training/train.py

Issue 3: Port Already in Use

Problem: Error: That port is already in use

Solution:

# Use different port
python manage.py runserver 8080

# Or kill process (Unix/macOS)
lsof -ti:8000 | xargs kill -9

# Windows
netstat -ano | findstr :8000
taskkill /PID <PID> /F

Issue 4: Static Files Not Loading

Problem: CSS/images not displaying in production

Solution:

# Collect static files
python manage.py collectstatic --noinput

# Verify settings
# STATIC_ROOT should be set
# WhiteNoise should be in MIDDLEWARE

Issue 5: Database Errors

Problem: OperationalError: no such table

Solution:

# Run migrations
python manage.py migrate

# If still failing, reset the development database
# WARNING: this deletes all data stored in db.sqlite3
rm db.sqlite3
python manage.py migrate

Issue 6: Memory Error During Training

Problem: System crashes or MemoryError during training

Solution:

# Reduce dataset size
trainer.train(data_path, max_movies=50000)

# Or reduce SVD components
trainer = MovieRecommenderTrainer(n_components=300)

# Or use higher quality threshold
trainer.train(data_path, quality_threshold='high')

See training/guide.md - Troubleshooting for training-specific issues.

⚡ Best Practices

Performance

Use Production Server

# Don't use runserver in production
gunicorn movie_recommendation.wsgi:application

Enable Caching

# Use Redis for production
# Requires: pip install django-redis
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/1',
    }
}

Optimize Database
- Use PostgreSQL in production
- Add database indexes
- Use connection pooling
Enable Compression
- Enable gzip compression
- Use CDN for static files
- Optimize model files

Security

Environment Variables
- Never commit secrets
- Use .env files
- Rotate keys regularly

Dependencies

# Check for outdated packages
pip list --outdated

# Update dependencies
pip install --upgrade package_name

Security Headers

# Already in settings.py for production
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True

Development

Code Quality

# Use linting
pip install flake8 black

# Check code
flake8 recommender/

# Format code
black recommender/

Version Control
- Commit frequently
- Write descriptive messages
- Use feature branches
- Review before merging
Documentation
- Add docstrings to functions
- Comment complex logic
- Update README when needed
- Keep documentation current

❓ FAQ

General

Q: Do I need to train a model to use the system?
A: No. The project includes a pre-trained demo model with 2,000 movies, so it works without any training.

Q: What's the difference between models/ and static/?
A: static/ contains the demo model (2K movies). models/ is for your custom trained models (created after training).

Q: How do I switch between models?
A: Set the MODEL_DIR environment variable:

export MODEL_DIR=./static     # Demo model
export MODEL_DIR=./models     # Your trained model

Training

Q: How long does training take?
A: Depends on dataset size:

10K movies: ~2 minutes
100K movies: ~15 minutes
1M+ movies: ~60 minutes

Q: How much memory do I need for training?
A:

10K movies: 500MB RAM
100K movies: 2GB RAM
1M+ movies: 6-8GB RAM

For detailed training FAQ, see training/guide.md - FAQ

Deployment

Q: Can I deploy for free?
A: Render offers a free tier suitable for this project. Heroku discontinued its free plans in November 2022, so deploying there requires a paid plan.

Q: Do I need a database for deployment?
A: SQLite works for development. Use PostgreSQL for production (most platforms provide it).

Q: How do I configure HTTPS?
A: Most cloud platforms (Render, Heroku) provide HTTPS automatically.

Development

Q: Can I modify the UI?
A: Yes! Edit the templates in recommender/templates/recommender/. All CSS is inline for easy modification.

Q: How do I add new features?
A:

Create a feature branch
Make changes in recommender/views.py or templates
Test locally
Update documentation
Submit pull request

📚 Additional Resources

Documentation

README.md - Quick start and overview
training/guide.md - Model training guide
CHANGELOG.md - Version history

Community

GitHub Issues - Bug reports and feature requests
GitHub Discussions - Questions and community support

Need more help? Check training/guide.md for training help or open an issue on GitHub.
