This repo is home to three projects I completed as a student in General Assembly's Data Science Immersive program.
- My first project explores Kaggle's Ames Housing Dataset, to which I applied principal component analysis and lasso regression to predict the sale prices of homes.
- In my second project, I scraped job postings and attempted to (a) classify whether a posting is for a data science job or a job in a related field and (b) distinguish high-wage jobs from lower-wage jobs. I also characterized what differentiates these jobs, using both regression coefficients and term weights.
- My most comprehensive project explores an anonymized dataset of Instacart customer orders, from which I derive features to fit a model that predicts which previously ordered items will appear in each customer's next order. I then analyze the model's performance and outline next steps for improving it.
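
For readers unfamiliar with the approach named in the first project, here is a minimal sketch of principal component analysis followed by lasso regression using scikit-learn. It is not the code from the Notebook: the data is a synthetic stand-in generated with `make_regression`, and the number of components and the `alpha` value are arbitrary assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for housing features and sale prices (NOT the Ames data).
X, y = make_regression(
    n_samples=500, n_features=30, n_informative=10, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize, project onto 15 principal components, then fit a lasso regression.
# n_components=15 and alpha=1.0 are illustrative assumptions, not tuned values.
model = make_pipeline(StandardScaler(), PCA(n_components=15), Lasso(alpha=1.0))
model.fit(X_train, y_train)

# R^2 on the held-out split.
r2 = model.score(X_test, y_test)
print(f"Held-out R^2: {r2:.3f}")
```

Wrapping the steps in a pipeline keeps the scaler and PCA fitted on the training split only, so nothing from the held-out data leaks into the transformation.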
I wrote these projects in Jupyter Notebooks. You can view any Notebook in this repo by clicking on its project. To view or edit the Notebooks locally, you'll need to do one of the following:
- Install Jupyter, or
- Install and run Docker, then:
  - Start a container by running the command `docker run -it --rm -p 8888:8888 jupyter/scipy-notebook`
  - Visit the URL the command generates to access the Notebook server
  - Upload a Notebook file

The source data is available on Kaggle:
- Source data for the Ames Housing project: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
- Source data for the Instacart project: https://www.kaggle.com/c/instacart-market-basket-analysis/data