This project uses a dataset of house sale prices for King County. Missing values are first handled by imputation. Preprocessing then keeps only the relevant columns, combining a few columns to retain as much information as possible and dropping the irrelevant ones. The zip codes are binned into groups. The target variable, sale price, is assigned to Y, and the remaining independent variables to X. The data are split 70:30 into training and test sets. Features are standardised with StandardScaler from the sklearn library to bring them onto a common scale. Multicollinearity is removed using VIF (variance inflation factor) from the statsmodels library. Training a linear regression model yields predictions with an accuracy score of 0.76. The assumptions of linear regression are checked with a residual plot and a plot of the error distribution, which show that the errors mostly lie near 0. The model coefficients indicate which features the house price depends on: it depends most on grade, followed by Zipcode_Group_Zipcode_Group5, Zipcode_Group_Zipcode_Group4 and Transformed_sqft_above. The dataset is from Kaggle: https://www.kaggle.com/harlfoxem/housesalesprediction
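The split, scaling, VIF filtering and fitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook. It runs on synthetic data rather than the Kaggle file. The column `sqft_copy`, the helpers `vif` and `drop_high_vif`, and the VIF threshold of 10 are all assumptions made for the example. VIF is computed directly as 1 / (1 - R²) with sklearn instead of calling the statsmodels function, to keep the sketch dependency-light.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing features (NOT the Kaggle data).
n = 500
sqft_above = rng.normal(1800, 400, n)
grade = rng.integers(3, 13, n).astype(float)
sqft_copy = sqft_above + rng.normal(0, 5, n)  # hypothetical, nearly collinear column
X = np.column_stack([sqft_above, grade, sqft_copy])
names = ["sqft_above", "grade", "sqft_copy"]
y = 150 * sqft_above + 40000 * grade + rng.normal(0, 30000, n)


def vif(X):
    """Return VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)


def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all are <= threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# 70:30 train-test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Remove multicollinear columns based on the training data.
X_train_r, kept = drop_high_vif(X_train_s, names)
cols = [names.index(k) for k in kept]
X_test_r = X_test_s[:, cols]

# Fit the linear model; score() returns R^2 on the held-out data.
model = LinearRegression().fit(X_train_r, y_train)
r2 = model.score(X_test_r, y_test)

# Residuals for the assumption checks (residual plot, error distribution).
residuals = y_test - model.predict(X_test_r)
print("kept features:", kept)
print("test R^2:", round(r2, 3))
```

Fitting the scaler and the VIF filter on the training portion only is a design choice of this sketch, made so that no information from the test set leaks into preprocessing. The text does not say in which order the project applies these steps.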
The dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It's a great dataset for evaluating simple regression models. In this project, we used simple LinearRegression to predict the price and the features it depends on. The data have been split into training and testin…
akashjborah97/House-Sale-Price-Predictor