The aim of this project is to provide a decision-making tool for job seekers in analytics fields. The project consists of two parts: visualizations and predictive modeling.
The visualizations offer key insights from the job posting data on the following:
- Locations
- Sectors & Industries
- Qualification & Skills
- Salary
The predictive tool lets users estimate annual salary from feature variables (e.g., analytics-specific skills such as SQL and visualization, and company information such as industry and number of employees).
Below are the files created for this project:
- Business_Analyst_cleaning.ipynb: This script performs cleaning of Business analyst job posting data
- Data_Analyst_cleaning.ipynb: This script performs cleaning of Data analyst job posting data
- Data_Scientist_cleaning.ipynb: This script performs cleaning of Data scientist job posting data
- Business_Analyst_cleaned.xlsx: This is the cleaned data for Business analyst job posting data
- Data_Analyst_cleaned.xlsx: This is the cleaned data for Data analyst job posting data
- Data_Scientist_cleaned.xlsx: This is the cleaned data for Data scientist job posting data
- Business_Analyst.ipynb: This script performs EDA on the Business analyst job posting data
- Data_Analyst.ipynb: This script performs EDA on the Data analyst job posting data
- Data_Scientist.ipynb: This script performs EDA on the Data scientist job posting data
- Predictive_Modeling.ipynb: This script constructs predictive models for annual salary, runs validations, and compares results between the models.
  - Models: Linear, Ridge, and Lasso regression models
  - Validation: Cross validation
  - Measure of goodness: RMSE
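To illustrate the comparison described for Predictive_Modeling.ipynb, here is a minimal, dependency-free sketch of scoring linear, ridge, and lasso fits by cross-validated RMSE. It is not the notebook's code, which may well use a library such as scikit-learn. The single feature (`years`), the synthetic salary data, the penalty value `alpha=10.0`, and the helper names `fit_line`, `rmse`, and `cv_rmse` are all assumptions made for illustration.

```python
import math
import random


def fit_line(xs, ys, model="linear", alpha=0.0):
    """Fit y = a + b*x on one feature; alpha penalizes the slope b only."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if model == "linear":    # ordinary least squares
        slope = sxy / sxx
    elif model == "ridge":   # minimizes RSS + alpha * b^2
        slope = sxy / (sxx + alpha)
    elif model == "lasso":   # minimizes RSS/2 + alpha * |b| (soft-thresholding)
        slope = math.copysign(max(abs(sxy) - alpha, 0.0), sxy) / sxx
    else:
        raise ValueError(f"unknown model: {model}")
    return y_mean - slope * x_mean, slope


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cv_rmse(xs, ys, model, alpha=0.0, k=5):
    """Mean RMSE over k folds; fold i holds out every k-th row starting at row i."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train], model, alpha)
        preds = [a + b * xs[i] for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return sum(scores) / k


# Synthetic stand-in data (hypothetical): salary grows with years of experience.
rng = random.Random(0)
years = [rng.uniform(0, 15) for _ in range(100)]
salary = [50_000 + 4_000 * x + rng.gauss(0, 5_000) for x in years]

results = {
    name: cv_rmse(years, salary, name, alpha)
    for name, alpha in [("linear", 0.0), ("ridge", 10.0), ("lasso", 10.0)]
}
for name, score in results.items():
    print(f"{name}: CV RMSE = {score:.0f}")
```

A lower cross-validated RMSE indicates better out-of-sample predictions, so the model with the smallest score would be preferred under this measure.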
For the overall summarized results, please see Presentation.pdf.