INF 553 - User Based, Item Based & Model Based CF Movie Recommendation System

piyushumate/Movie-Recommendation-System

Movie-Recommendation-System

Versions

Spark - 2.2.1

Scala - 2.11

Python - 2.7

Task 1: Jaccard Based LSH

Python command

spark-submit Piyush_Umate_task1_Jaccard.py <rating_file_path>

Scala command

spark-submit --class JaccardLSH Piyush_Umate_hw3.jar <rating_file_path>

Python output

Time: 113.05479598 sec

Precision: 1.0

Recall: 0.815018491384

Scala output

Time: 102 sec

Description - I used 8 hash functions with b = 4 bands and r = 2 rows per band. Each hash function was of the form (7 * row_index) + (3 * i) % 671, where i ranges from 1 to 8. The coefficients 7 and 3 were chosen because they are prime; prime coefficients are a common heuristic for spreading hash values more evenly.
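The banding scheme described above can be sketched in plain Python. This is a minimal illustration, not the code in this repository. It assumes that the modulus applies to the whole sum, that rows of the characteristic matrix are users and columns are movies, and that candidate pairs are verified with exact Jaccard similarity against a hypothetical threshold of 0.5. All function names are made up for the example.

```python
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 8     # number of hash functions, from the description
BANDS = 4          # b
ROWS = 2           # r (rows per band)
NUM_BUCKETS = 671  # modulus, from the description


def hash_row(row_index, i):
    # Assumption: the modulus is applied to the whole sum.
    return (7 * row_index + 3 * i) % NUM_BUCKETS


def minhash_signature(user_rows):
    # user_rows: set of row indices (users) that rated the movie.
    return [min(hash_row(r, i) for r in user_rows)
            for i in range(1, NUM_HASHES + 1)]


def candidate_pairs(movie_to_users):
    # Movies whose signatures agree on every row of at least one band
    # land in the same bucket and become candidate pairs.
    buckets = defaultdict(set)
    for movie, users in movie_to_users.items():
        sig = minhash_signature(users)
        for band in range(BANDS):
            key = (band, tuple(sig[band * ROWS:(band + 1) * ROWS]))
            buckets[key].add(movie)
    pairs = set()
    for movies in buckets.values():
        pairs.update(combinations(sorted(movies), 2))
    return pairs


def jaccard(a, b):
    return len(a & b) / float(len(a | b))


def similar_pairs(movie_to_users, threshold=0.5):
    # threshold is a hypothetical value; the description does not state one.
    result = {}
    for a, b in candidate_pairs(movie_to_users):
        sim = jaccard(movie_to_users[a], movie_to_users[b])
        if sim >= threshold:
            result[(a, b)] = sim
    return result


# Hypothetical toy data: movie id -> set of user row indices.
toy = {1: {0, 1, 2, 3}, 2: {0, 1, 2, 3}, 3: {100, 200}}
pairs = similar_pairs(toy)
```

Verifying each candidate with exact Jaccard similarity removes false positives, which would be consistent with a precision of 1.0, while pairs that never share a band are missed and lower the recall.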

Task 2.1: Model Based CF

Python command

spark-submit --driver-memory 4g --executor-memory 4g Piyush_Umate_task2_ModelBasedCF.py <rating_file_path> <testing_file_path>

Scala command

spark-submit --driver-memory 4g --executor-memory 4g --class ModelBasedCF Piyush_Umate_hw3.jar <rating_file_path> <testing_file_path>

ML-Latest - Small data Python output

>=0 and <1: 13761
>=1 and <2: 4149
>=2 and <3: 714
>=3 and <4: 103
>=4: 6

RMSE: 0.95073628922

Time: 10.8795609474 sec

ML-Latest - Small data Scala output

>=0 and <1: 13826
>=1 and <2: 4091
>=2 and <3: 708
>=3 and <4: 102
>=4: 6

RMSE: 0.9479834931653777

Time: 7 sec

ML-20m - Big data Python output

>=0 and <1: 3232631
>=1 and <2: 723767
>=2 and <3: 82068
>=3 and <4: 7655
>=4: 210

RMSE: 0.817364633871

Time: 1286.95861316 sec

ML-20m - Big data Scala output

>=0 and <1: 3232743
>=1 and <2: 723742
>=2 and <3: 82029
>=3 and <4: 7611
>=4: 206

RMSE: 0.8172113310320842

Time: 545 sec

Description - For Model Based CF, I trained ALS with rank 10, 12 iterations, and a regularization factor of 0.1. I used ParamGridBuilder to try different parameter values and RegressionEvaluator to select the best ones.
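Each output in this README lists five counts followed by an RMSE. A minimal sketch of how such a summary could be computed is shown here. It assumes the five counts bucket the absolute difference between predicted and actual rating, which the README does not state explicitly. The function names and the sample pairs are hypothetical.

```python
import math


def summarize_errors(pairs):
    """pairs: non-empty list of (predicted, actual) ratings."""
    counts = [0] * 5  # buckets [0,1), [1,2), [2,3), [3,4), [4, inf)
    squared_sum = 0.0
    for predicted, actual in pairs:
        diff = abs(predicted - actual)
        counts[min(int(diff), 4)] += 1
        squared_sum += diff ** 2
    rmse = math.sqrt(squared_sum / len(pairs))
    return counts, rmse


def format_summary(counts, rmse):
    labels = [">=0 and <1", ">=1 and <2", ">=2 and <3", ">=3 and <4", ">=4"]
    lines = ["%s: %d" % (label, count) for label, count in zip(labels, counts)]
    lines.append("RMSE: %s" % rmse)
    return "\n".join(lines)


# Hypothetical (predicted, actual) pairs, only to exercise the functions.
sample = [(3.0, 3.5), (1.0, 5.0), (4.0, 2.0)]
counts, rmse = summarize_errors(sample)
report = format_summary(counts, rmse)
```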

Task 2.2: User Based CF

Python command

spark-submit Piyush_Umate_task2_UserBasedCF.py <rating_file_path> <testing_file_path>

Scala command

spark-submit --class UserBasedCF Piyush_Umate_hw3.jar <rating_file_path> <testing_file_path>

ML-Latest - Small data Python output

>=0 and <1: 14965
>=1 and <2: 4088
>=2 and <3: 1039
>=3 and <4: 153
>=4: 11

RMSE: 0.988432740607

Time: 30.9461770058 sec

ML-Latest - Small data Scala output

>=0 and <1: 15070
>=1 and <2: 4000
>=2 and <3: 1023
>=3 and <4: 154
>=4: 9

RMSE: 0.982381870797202

Time: 54 sec

Description - For User Based CF, I used Pearson correlation as the similarity metric. Predicted ratings that fell outside the 0 to 5 range (outliers) were brought back into that range by normalization. For predictions where Pearson correlation fails, I used imputation boosting to handle them.
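A user-based prediction with Pearson weights can be sketched as follows. This is an illustration under stated assumptions, not the code in this repository: out-of-range predictions are simply clamped to the range 0 to 5 as a stand-in for the normalization mentioned above, and the user's mean rating is used as a stand-in fallback when no usable Pearson weight exists (the README's "imputation boosting" step is not specified, so it is not reproduced). All names and the sample ratings are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / float(len(common))
    mean_b = sum(ratings_b[i] for i in common) / float(len(common))
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = (math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) *
           math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0


def predict(user, item, ratings, low=0.0, high=5.0):
    """ratings: dict user -> dict item -> rating."""
    user_ratings = ratings[user]
    user_mean = sum(user_ratings.values()) / float(len(user_ratings))
    num, den = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = pearson(user_ratings, other_ratings)
        # Neighbour's mean over items other than the target item.
        rest = [r for i, r in other_ratings.items() if i != item]
        if weight == 0.0 or not rest:
            continue
        other_mean = sum(rest) / float(len(rest))
        num += weight * (other_ratings[item] - other_mean)
        den += abs(weight)
    # Assumption: fall back to the user's mean when no neighbour is usable.
    prediction = user_mean if den == 0.0 else user_mean + num / den
    # Assumption: clamp into [low, high] instead of the README's normalization.
    return max(low, min(high, prediction))


# Hypothetical ratings, only to exercise the functions.
sample = {
    "u1": {"a": 4.0, "b": 5.0},
    "u2": {"a": 1.0, "b": 2.0, "d": 5.0},
}
clamped = predict("u1", "d", sample)   # raw prediction exceeds 5 and is clamped
fallback = predict("u1", "z", sample)  # nobody rated "z": user mean is returned
```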

Task 2.3: Item Based CF with LSH

Python command

spark-submit Piyush_Umate_task2_ItemBasedCF.py <rating_file_path> <testing_file_path>

Scala command

spark-submit --class ItemBasedCF Piyush_Umate_hw3.jar <rating_file_path> <testing_file_path>

ML-Latest - Small data Python output

>=0 and <1: 13944
>=1 and <2: 5133
>=2 and <3: 959
>=3 and <4: 208
>=4: 12

RMSE: 1.00286340067

Time: 8.94653177261 sec

ML-Latest - Small data Scala output

>=0 and <1: 13924
>=1 and <2: 5152
>=2 and <3: 955
>=3 and <4: 214
>=4: 11

RMSE: 1.0049722266060221

Time: 9 sec

Comparing the result with CF without LSH: how could LSH affect the recommendation system?

LSH judges the similarity between movies only by whether a user rated each movie, not by the weight, i.e. the rating the user gave, so it can severely affect the recommendation system. In effect, LSH discards the information carried by the rating values. For example, Pearson correlation uses how similar the users' ratings are when predicting a rating, whereas LSH considers only whether a user rated a movie, not how high the rating was.
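The loss of rating information can be made concrete with a small example. The two movies below are hypothetical: the same four users rated both, but with opposite opinions. Jaccard similarity, which is what LSH approximates, sees them as identical, while Pearson correlation sees them as opposites.

```python
import math


def jaccard(users_a, users_b):
    return len(users_a & users_b) / float(len(users_a | users_b))


def pearson(xs, ys):
    mean_x = sum(xs) / float(len(xs))
    mean_y = sum(ys) / float(len(ys))
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - mean_x) ** 2 for x in xs)) *
           math.sqrt(sum((y - mean_y) ** 2 for y in ys)))
    return num / den


# Hypothetical ratings: user -> rating for each movie.
movie_a = {"u1": 5.0, "u2": 4.0, "u3": 1.0, "u4": 2.0}
movie_b = {"u1": 1.0, "u2": 2.0, "u3": 5.0, "u4": 4.0}

# Jaccard looks only at who rated each movie.
jaccard_similarity = jaccard(set(movie_a), set(movie_b))  # 1.0

# Pearson looks at the rating values of the co-rating users.
users = sorted(set(movie_a) & set(movie_b))
pearson_similarity = pearson([movie_a[u] for u in users],
                             [movie_b[u] for u in users])  # approximately -1.0
```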
