Input Data: Poets & Quants Webscrape
Preprocessing Steps: Check Out Methods Here
Created one model for each school.
Heavy overfitting throughout: Cornell, Michigan, NYU, and INSEAD each reach an R2 of 1.0 with 10 or fewer samples. For now this is only a proof of concept.
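A perfect score on a handful of rows is what ordinary least squares produces whenever the encoded feature matrix has at least as many columns as rows, whether or not the features carry any signal. The sketch below illustrates this with scikit-learn on pure noise. The 9 x 12 shape and the random data are assumptions for illustration; the real feature count after encoding is not stated in these notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical shape: 9 samples and 12 features, i.e. more columns than rows.
n_samples, n_features = 9, 12
X = rng.normal(size=(n_samples, n_features))      # pure-noise features
y = rng.normal(loc=50, scale=10, size=n_samples)  # pure-noise target

model = LinearRegression().fit(X, y)
pred = model.predict(X)  # scored on the same rows that were used for fitting

in_sample_r2 = r2_score(y, pred)
in_sample_mae = mean_absolute_error(y, pred)
print(f"in-sample R2:  {in_sample_r2:.6f}")
print(f"in-sample MAE: {in_sample_mae:.3e}")
```

The fit interpolates the training rows, so the error is only floating-point noise. Scores like these say nothing about how the model would do on unseen applicants.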
Sample counts per school are tiny (6 to 49). Currently experimenting with ways to reduce the dimensionality of categorical features such as majors.
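One simple way to shrink a categorical feature is to merge rare levels into a single bucket before one-hot encoding. This is a minimal sketch of that idea, not the method used in this project. The `collapse_rare` helper, the `min_count` threshold, and the example majors are all hypothetical.

```python
from collections import Counter


def collapse_rare(values, min_count=3, other="Other"):
    """Replace categories seen fewer than min_count times with one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Made-up majors for illustration only (not taken from the scrape).
majors = [
    "Economics", "Economics", "Economics",
    "Finance", "Finance", "Finance",
    "Physics", "Classics", "Music",
]

collapsed = collapse_rare(majors, min_count=3)

# One-hot encode the reduced set of levels: one column per remaining level.
levels = sorted(set(collapsed))
one_hot = [[int(v == level) for level in levels] for v in collapsed]

print(levels)
print(f"columns before: {len(set(majors))}, after: {len(levels)}")
```

The threshold trades detail for stability: a higher `min_count` leaves fewer columns per school but lumps more distinct majors together.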
Data quality is a work in progress. The webscrape parsing piece leaves much to be desired.
Next step: try techniques other than linear regression.
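One possible direction is a regularized linear model scored with leave-one-out cross-validation, so that each prediction comes from a model that never saw that row. This is a hedged sketch using scikit-learn. The synthetic data, the `loo_mae` helper, and `alpha=1.0` are assumptions for illustration, and Ridge is named only as one candidate, not a recommendation drawn from the results below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def loo_mae(model, X, y):
    """MAE where each row is predicted by a model fit on all the other rows."""
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return mean_absolute_error(y, pred)


# Synthetic stand-in for one school's data: 20 rows, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
y = 60 + 5 * X[:, 0] + rng.normal(scale=5, size=20)

candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge(alpha=1.0)": Ridge(alpha=1.0),  # alpha is an untuned placeholder
}
scores = {name: loo_mae(model, X, y) for name, model in candidates.items()}

for name, score in scores.items():
    print(f"{name}: leave-one-out MAE = {score:.2f}")
```

Leave-one-out is affordable at these sample sizes and gives an error estimate that can be compared across models, unlike scores computed on the training rows.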
Number of Samples for Duke: 22
Classifier: LinearRegression for Duke
Mean Absolute Error: 3.58485224787
Mean Squared Error: 22.0921411102
R2 Coefficient of Determination: 0.795259046485
Number of Samples for Columbia: 29
Classifier: LinearRegression for Columbia
Mean Absolute Error: 3.46662790023
Mean Squared Error: 18.7949446535
R2 Coefficient of Determination: 0.790745676603
Number of Samples for Stanford: 47
Classifier: LinearRegression for Stanford
Mean Absolute Error: 4.60340588495
Mean Squared Error: 33.0629974013
R2 Coefficient of Determination: 0.462376435337
Number of Samples for Cornell: 9
Classifier: LinearRegression for Cornell
Mean Absolute Error: 7.83570739158e-14
Mean Squared Error: 1.19461589438e-26
R2 Coefficient of Determination: 1.0
Number of Samples for Berkeley: 26
Classifier: LinearRegression for Berkeley
Mean Absolute Error: 4.3932741025
Mean Squared Error: 36.3744266231
R2 Coefficient of Determination: 0.733180197792
Number of Samples for Michigan: 10
Classifier: LinearRegression for Michigan
Mean Absolute Error: 1.98951966013e-14
Mean Squared Error: 7.27014210252e-28
R2 Coefficient of Determination: 1.0
Number of Samples for Tuck: 25
Classifier: LinearRegression for Tuck
Mean Absolute Error: 4.57402835951
Mean Squared Error: 39.4828064477
R2 Coefficient of Determination: 0.605171935523
Number of Samples for UCLA: 17
Classifier: LinearRegression for UCLA
Mean Absolute Error: 2.0019628512
Mean Squared Error: 15.5555829657
R2 Coefficient of Determination: 0.886868487522
Number of Samples for NYU: 8
Classifier: LinearRegression for NYU
Mean Absolute Error: 2.62012633812e-14
Mean Squared Error: 1.06180677843e-27
R2 Coefficient of Determination: 1.0
Number of Samples for Wharton: 40
Classifier: LinearRegression for Wharton
Mean Absolute Error: 5.31598176412
Mean Squared Error: 50.9461001814
R2 Coefficient of Determination: 0.402145237385
Number of Samples for Kellogg: 43
Classifier: LinearRegression for Kellogg
Mean Absolute Error: 5.13166042293
Mean Squared Error: 40.8422677543
R2 Coefficient of Determination: 0.486276509675
Number of Samples for Harvard: 49
Classifier: LinearRegression for Harvard
Mean Absolute Error: 5.92251808645
Mean Squared Error: 59.1383837851
R2 Coefficient of Determination: 0.350563103457
Number of Samples for Booth: 29
Classifier: LinearRegression for Booth
Mean Absolute Error: 4.71861838448
Mean Squared Error: 34.9298834845
R2 Coefficient of Determination: 0.659999629509
Number of Samples for Yale: 18
Classifier: LinearRegression for Yale
Mean Absolute Error: 1.78430864078
Mean Squared Error: 9.84162846404
R2 Coefficient of Determination: 0.945117252627
Number of Samples for Sloan: 21
Classifier: LinearRegression for Sloan
Mean Absolute Error: 1.7679722085
Mean Squared Error: 8.05797198748
R2 Coefficient of Determination: 0.947393550755
Number of Samples for INSEAD: 6
Classifier: LinearRegression for INSEAD
Mean Absolute Error: 3.07901852163e-14
Mean Squared Error: 1.31266454629e-27
R2 Coefficient of Determination: 1.0