This is Almabetter's capstone project on an ML regression problem. In this project we work with Yes Bank's stock price dataset. Yes Bank is a well-known bank in the Indian financial sector. Since 2018, it has been in the news because of the fraud case involving Rana Kapoor. The dataset contains the bank's monthly stock prices since its inception, including the opening, closing, highest and lowest price for each month. The main objective is to predict the stock's closing price for the month.
## Objective
#### To achieve the objective of the project, we need to perform exploratory data analysis, hypothesis testing, data manipulation and feature engineering, data preprocessing, model implementation and the other steps listed below.
Main libraries to be used:
- Pandas for data manipulation and aggregation.
- Matplotlib and Seaborn for visualization.
- NumPy for computationally efficient operations.
- Scikit-learn for model training, model optimization and metrics calculation.
- Load Dataset
- Dataset first look
- Rows & Columns count
- Check for duplicate and null values
- Variables description
- Check Unique Values for each variable
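The first-look checks listed above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual notebook code: the column names (`Date`, `Open`, `High`, `Low`, `Close`) and all values are made-up assumptions, and the real data would normally be loaded from a file with `pd.read_csv` instead of being built inline.

```python
import pandas as pd

# Made-up stand-in for the real dataset; column names and values are assumptions.
df = pd.DataFrame({
    "Date": ["Jan-20", "Feb-20", "Mar-20", "Mar-20"],
    "Open": [10.0, 11.5, 12.0, 12.0],
    "High": [12.0, 13.0, 14.5, 14.5],
    "Low": [9.5, 10.5, 11.0, 11.0],
    "Close": [11.5, 12.0, 13.0, 13.0],
})

# Dataset first look
print(df.head())

# Rows and columns count
n_rows, n_cols = df.shape

# Duplicate rows and null values per column
n_duplicates = int(df.duplicated().sum())
null_counts = df.isnull().sum()

# Variable description: dtypes and summary statistics
df.info()
print(df.describe())

# Number of unique values for each variable
unique_counts = df.nunique()
print(unique_counts)
```

Here the last row is deliberately a copy of the third, so the duplicate check has something to find.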
Data cleaning means fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. A dataset can contain missing data, numbers stored as strings, and other irregularities. Cleaning them makes the subsequent analysis easier.
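A minimal cleaning sketch with pandas might look like the following. The frame, its column names, and the `%b-%y` date format are assumptions chosen for illustration; the real dataset may need different handling.

```python
import pandas as pd

# Made-up raw data with a duplicate row and a numeric column stored as strings.
raw = pd.DataFrame({
    "Date": ["Jan-20", "Feb-20", "Feb-20", "Mar-20"],
    "Open": ["10.0", "11.5", "11.5", "n/a"],
    "Close": [11.0, 12.0, 12.0, 9.5],
})

# Remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Convert numeric strings to floats; unparseable entries become NaN.
clean["Open"] = pd.to_numeric(clean["Open"], errors="coerce")

# Parse month-year strings into proper timestamps (format is an assumption).
clean["Date"] = pd.to_datetime(clean["Date"], format="%b-%y")

# Drop rows that are still incomplete after conversion.
clean = clean.dropna().reset_index(drop=True)
print(clean)
```

Dropping incomplete rows is only one option; imputing the missing values is an equally valid choice depending on how much data would be lost.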
- Data wrangling
- Visualization
- Storytelling and experimenting with charts
- Understand the relationships between variables
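One common way to look at relationships between variables is a correlation matrix. The sketch below uses synthetic prices, and the column names are assumptions, so the numbers say nothing about the real Yes Bank data.

```python
import numpy as np
import pandas as pd

# Synthetic prices: High and Low are fixed multiples of Open,
# Close is Open plus noise. All values are made up.
rng = np.random.default_rng(0)
open_ = rng.uniform(10, 100, size=50)
df = pd.DataFrame({
    "Open": open_,
    "High": open_ * 1.10,
    "Low": open_ * 0.90,
    "Close": open_ + rng.normal(0, 2, size=50),
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(3))
```

The resulting matrix can be passed to `seaborn.heatmap(corr, annot=True)` for a visual version. Strongly correlated price columns like these are a hint that multicollinearity may need attention before fitting linear models.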
Based on our EDA, we will define three hypothetical statements about the dataset and perform hypothesis testing to reach a final conclusion about each statement.
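As an illustration of the procedure, the sketch below runs a one-sample t-test with SciPy. The hypothesis itself (that the mean monthly closing price equals 100), the significance level, and the synthetic sample are all assumptions made for this example; they are not the statements tested in the project.

```python
import numpy as np
from scipy import stats

# Synthetic "closing prices"; the real test would use the dataset's column.
rng = np.random.default_rng(7)
close = rng.normal(loc=105.0, scale=5.0, size=60)

# H0: the mean monthly closing price is 100.
# H1: the mean monthly closing price differs from 100.
null_mean = 100.0
alpha = 0.05  # assumed significance level

t_stat, p_value = stats.ttest_1samp(close, popmean=null_mean)

# Reject H0 when the p-value falls below the significance level.
reject_null = bool(p_value < alpha)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}, reject H0: {reject_null}")
```

The same pattern (state H0 and H1, pick a test, compare the p-value with the chosen significance level) applies to each of the three statements, with the test chosen to match the data involved.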
- Handling missing values
- Handling Outliers
- Categorical encoding
- Textual data preprocessing
- Feature manipulation and selection
- Data transformation
- Data scaling
- Dimensionality reduction
- Data splitting
- Handling imbalanced data
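Three of the steps above (data transformation, data splitting and data scaling) can be sketched with NumPy and scikit-learn. The log transform, the 80/20 chronological split and the synthetic features are assumptions for illustration, not choices confirmed by the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for three price features and the closing price.
rng = np.random.default_rng(42)
X = rng.uniform(10, 300, size=(100, 3))
y = X.mean(axis=1) + rng.normal(0, 1, size=100)

# Data transformation: log1p reduces the right skew typical of price data.
X_log = np.log1p(X)
y_log = np.log1p(y)

# Data splitting: shuffle=False keeps time order, so the test set
# contains only the most recent observations.
X_train, X_test, y_train, y_test = train_test_split(
    X_log, y_log, test_size=0.2, shuffle=False
)

# Data scaling: fit on the training set only to avoid leaking
# information from the test set into the scaler.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

Predictions made on the log scale can be mapped back to prices with `np.expm1`.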
- Implementation
- Explain the ML model used and its performance using an evaluation metric score chart.
- Cross-validation & hyperparameter tuning
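Cross-validation and hyperparameter tuning can be combined in scikit-learn with `GridSearchCV`. The sketch below tunes the `alpha` of a Ridge model on synthetic data; the parameter grid, the scoring metric and the use of `TimeSeriesSplit` (so that each fold validates on later observations than it trains on) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic regression data with a known linear signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, size=120)

search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},  # assumed grid
    scoring="neg_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
# best_score_ is a negated MSE, so flip the sign before taking the root.
best_rmse = float(np.sqrt(-search.best_score_))
print(f"best alpha: {best_alpha}, cross-validated RMSE: {best_rmse:.4f}")
```

The same search object works for the other estimators by swapping `estimator` and `param_grid`.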
ML models we are going to use:
- Linear regression
- Ridge, Lasso and ElasticNet for Regularization
- Random Forest Regressor
- XGBoost Regressor
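A minimal comparison loop for these models could look like the sketch below. It uses synthetic data and arbitrary, untuned hyperparameters, so the scores are not the project's results. XGBoost is left out only to keep the example dependent on scikit-learn alone; `xgboost.XGBRegressor` follows the same `fit`/`predict` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic features and target; the weights are made up.
rng = np.random.default_rng(1)
X = rng.uniform(10, 100, size=(200, 3))
y = 0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 1, size=200)

# Chronological-style split: first 160 rows for training, last 40 for testing.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

# Hyperparameters are illustrative defaults, not tuned values.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elasticnet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

for name, s in scores.items():
    print(f"{name:>14}: RMSE = {s['rmse']:.3f}, R2 = {s['r2']:.3f}")
```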
SHAP (SHapley Additive exPlanations) is a model explainability tool for explaining the predictions of machine learning models. It is based on Shapley values from cooperative game theory and can be applied to any machine learning model: for a given prediction, it calculates how much each feature contributed to that prediction.
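To make the game-theory idea concrete, the sketch below computes exact Shapley values from scratch for a tiny model. It is not the `shap` library's API: the function `shapley_values`, the toy linear `predict` function, its weights, and the choice of replacing "absent" features with a single baseline value are all assumptions made for illustration. The exact computation enumerates every feature subset, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    A feature that is "absent" from a coalition is replaced by its
    baseline value (a simplifying assumption of this sketch).
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def predict(features):
    # Hypothetical linear model with made-up weights.
    open_, high, low = features
    return 0.2 * open_ + 0.5 * high + 0.3 * low


x = [100.0, 110.0, 90.0]        # observation to explain
baseline = [50.0, 55.0, 45.0]   # reference values, e.g. feature means
phi = shapley_values(predict, x, baseline)
print(phi)
```

For a linear model, each contribution reduces to the feature's weight times its distance from the baseline, and the contributions always sum to `predict(x) - predict(baseline)`. That additivity property is what lets SHAP attribute a prediction to individual features.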