A hybrid recommender system combining User-Based and Item-Based Collaborative Filtering on the MovieLens 20M dataset to generate personalized movie recommendations.
Given a user's movie rating history, recommend new movies they are likely to enjoy using two complementary approaches:
- User-Based CF — Find similar users by taste and recommend what they liked
- Item-Based CF — Find movies similar to what the user already rated highly

Dataset:
- Source: MovieLens 20M Dataset
- Scale: 20 million ratings from 138,493 users across 27,278 movies
- After filtering: Movies with 1,000+ ratings retained (3,134 movies)
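
The popularity filter above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual code: the function name `filter_popular_movies` and the toy data are hypothetical, and the column names `userId`, `movieId`, and `rating` are assumed to match the MovieLens `ratings.csv` layout.

```python
import pandas as pd

def filter_popular_movies(ratings: pd.DataFrame, min_ratings: int = 1000) -> pd.DataFrame:
    """Keep only the rows whose movie has at least `min_ratings` ratings."""
    counts = ratings["movieId"].value_counts()
    popular_ids = counts[counts >= min_ratings].index
    return ratings[ratings["movieId"].isin(popular_ids)]

# Toy data standing in for ratings.csv (hypothetical values).
toy = pd.DataFrame({
    "userId":  [1, 2, 3, 1, 2, 1],
    "movieId": [10, 10, 10, 20, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5, 5.0],
})

# A threshold of 2 is used here only so the toy data shows the effect;
# the README's threshold is 1,000.
filtered = filter_popular_movies(toy, min_ratings=2)
print(sorted(filtered["movieId"].unique().tolist()))  # [10, 20]
```

Movie 30 has a single rating and is dropped, while movies 10 and 20 survive.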

User-Based CF:
- Build user-movie rating matrix (138K × 3K)
- Identify candidate users who have rated at least 60% of the movies the target user has rated
- Compute Pearson correlation between target user and candidates
- Filter to users with correlation ≥ 0.65
- Calculate weighted ratings (correlation × rating)
- Rank movies by weighted average score → Top 5 recommendations
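
The user-based steps above might look roughly like the following. This is a sketch under stated assumptions rather than the repository's implementation: `user_based_recommend` and its parameter names are hypothetical, the toy ratings are invented, and the ratings are assumed to be already joined with movie titles in a `title` column. The "weighted average score" is read here as the mean of correlation × rating over the similar users who rated each movie.

```python
import pandas as pd

def user_based_recommend(user_movie, target, min_overlap=0.60, min_corr=0.65, top_n=5):
    """Score movies the target has not rated, using similar users' ratings."""
    target_ratings = user_movie.loc[target].dropna()
    watched = target_ratings.index

    # Candidates: users who rated at least `min_overlap` of the target's movies.
    shared = user_movie[watched].notna().sum(axis=1).drop(target)
    candidates = shared[shared >= min_overlap * len(watched)].index

    # Pearson correlation with the target, computed over co-rated movies.
    corr = user_movie.loc[candidates, watched].apply(
        lambda row: row.corr(target_ratings), axis=1
    )
    similar = corr[corr >= min_corr]

    # Weighted rating = correlation x rating, averaged per unseen movie.
    unseen = user_movie.columns.difference(watched)
    weighted = user_movie.loc[similar.index, unseen].mul(similar, axis=0)
    scores = weighted.mean(axis=0).dropna()
    return scores.sort_values(ascending=False).head(top_n)

# Toy long-form ratings (hypothetical); user 1 is the target.
rows = [
    (1, "A", 5.0), (1, "B", 3.0), (1, "C", 1.0),
    (2, "A", 4.0), (2, "B", 3.0), (2, "C", 2.0), (2, "D", 5.0), (2, "E", 2.0),
    (3, "A", 1.0), (3, "B", 3.0), (3, "C", 5.0), (3, "D", 1.0), (3, "E", 5.0),
    (4, "A", 5.0), (4, "D", 4.0),
    (5, "A", 5.0), (5, "B", 4.0), (5, "C", 1.0), (5, "D", 4.0),
]
ratings = pd.DataFrame(rows, columns=["userId", "title", "rating"])
user_movie = ratings.pivot_table(index="userId", columns="title", values="rating")

recs = user_based_recommend(user_movie, target=1)
print(recs)
```

In this toy data, user 4 fails the overlap check (1 of 3 shared movies) and user 3 is negatively correlated with the target, so only users 2 and 5 contribute. Movie D ends up ranked above movie E.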

Item-Based CF:
- Find the movie the target user rated most recently among those they gave their highest rating
- Compute pairwise Pearson correlation between that movie and all others
- Rank by correlation strength → Top 5 recommendations
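
A possible sketch of the item-based steps, using `DataFrame.corrwith` for the pairwise Pearson correlation. The helper names `pick_seed_movie` and `item_based_recommend`, the `title` and `timestamp` columns, and the toy data are assumptions for illustration. "Highest-rated recent movie" is interpreted here as the most recently rated movie among the target's top-rated ones, which may differ from the script's exact rule.

```python
import pandas as pd

def pick_seed_movie(ratings, target):
    """Most recently rated movie among the target user's top-rated ones."""
    mine = ratings[ratings["userId"] == target]
    top = mine[mine["rating"] == mine["rating"].max()]
    return top.sort_values("timestamp", ascending=False)["title"].iloc[0]

def item_based_recommend(user_movie, seed_movie, top_n=5):
    """Rank other movies by Pearson correlation with the seed movie's ratings."""
    corr = user_movie.corrwith(user_movie[seed_movie])
    return corr.drop(seed_movie).dropna().sort_values(ascending=False).head(top_n)

# Toy long-form ratings (hypothetical): (userId, title, rating, timestamp).
rows = [
    (1, "A", 5.0, 300), (1, "B", 5.0, 100), (1, "C", 1.0, 200),
    (2, "A", 4.0, 110), (2, "B", 4.0, 120), (2, "C", 2.0, 130),
    (3, "A", 2.0, 140), (3, "B", 3.0, 150), (3, "C", 4.0, 160),
    (4, "A", 1.0, 170), (4, "B", 1.0, 180), (4, "C", 5.0, 190),
]
ratings = pd.DataFrame(rows, columns=["userId", "title", "rating", "timestamp"])
user_movie = ratings.pivot_table(index="userId", columns="title", values="rating")

seed = pick_seed_movie(ratings, target=1)
recs = item_based_recommend(user_movie, seed)
print(seed)
print(recs)
```

User 1 gave both A and B the top rating, and A has the later timestamp, so A becomes the seed. B correlates strongly with A and ranks first; C is perfectly anti-correlated and ranks last. A real pipeline would likely also drop negatively correlated movies before taking the top 5.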
The hybrid output combines both approaches: 5 user-based + 5 item-based = 10 recommendations.
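
One simple way to assemble the hybrid list is to concatenate the two top-5 lists and tag each entry with its origin. The sketch below assumes exactly that; `combine_recommendations` and the placeholder titles are hypothetical. The source does not say how a movie appearing in both lists is handled, so no de-duplication is attempted here.

```python
def combine_recommendations(user_based, item_based, n_each=5):
    """Take the first `n_each` titles from each list and tag their origin."""
    picks = [(title, "user-based") for title in user_based[:n_each]]
    picks += [(title, "item-based") for title in item_based[:n_each]]
    return picks

# Placeholder titles, not real results.
user_based = [f"user_pick_{i}" for i in range(1, 7)]
item_based = [f"item_pick_{i}" for i in range(1, 7)]

hybrid = combine_recommendations(user_based, item_based)
for title, source in hybrid:
    print(f"{source:>10}: {title}")
```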
User-Based Recommendations:
| Movie | Weighted Score |
|---|---|
| The Shawshank Redemption | 4.52 |
| Pulp Fiction | 4.38 |
| Forrest Gump | 4.31 |
| The Matrix | 4.27 |
| Fight Club | 4.15 |
Actual results vary by target user.
- Python 3.8+ — Core language
- Pandas — Data manipulation, pivot tables, correlation computation
- NumPy — Numerical operations

git clone https://github.com/eboekenh/Movie_Recommendation_Algorithm.git
cd Movie_Recommendation_Algorithm
pip install -r requirements.txt

Download the MovieLens 20M dataset and place movies.csv and ratings.csv in the project directory.

python Hybrid_movie_recommendation.py

- Minimum 1,000 ratings per movie — Ensures correlation calculations are statistically meaningful
- 60% overlap threshold — Users must share substantial viewing history for reliable similarity
- Pearson correlation ≥ 0.65 — Only strongly similar users influence recommendations
- Weighted ratings — Higher-correlated users have more influence on recommendations

License: MIT