Data Science Handbook

A repository containing a portfolio of Data Science and Machine Learning projects.

The projects are presented as IPython notebooks and PDF reports.

Notes

Fundamentals

| No | Notebook | Description |
|---|---|---|
| 1 | NumPy Overview | Overview of how to use NumPy |
| 2 | Pandas Overview | Overview of how to use pandas |
| 3 | Matplotlib Overview | Overview of how to use Matplotlib for data visualization |
| 4 | Seaborn Overview | Overview of how to use seaborn for data visualization |

EDA - Data Preparation and Preprocessing

| No | Notebook | Description |
|---|---|---|
| 1 | Feature Engineering: Variable Types & Characteristics | Collection of variable types and characteristics, such as missing data mechanisms (MCAR, MAR, MNAR), cardinality, distributions, linear model assumptions, outliers, and variable magnitude |
| 2 | Feature Engineering: Univariate Missing Data Imputation | Collection of univariate missing data imputation techniques, such as mean/median/mode, arbitrary value, end of distribution, random sample, and more |
| 3 | Feature Engineering: Multivariate Missing Data Imputation | Multivariate missing data imputation with KNN and MICE |
| 4 | Feature Engineering: Categorical Encoding | Collection of categorical encoding techniques, such as rare label encoding, one-hot encoding, weight of evidence (WoE) encoding, and other monotonic-relationship encodings |
| 5 | Feature Engineering: Variable Transformation | Collection of variable transformation techniques for reshaping non-Gaussian distributions for linear models, such as the log, Box-Cox, and Yeo-Johnson transformers |
| 6 | Feature Engineering: Discretization | Collection of discretization methods, such as equal-width, equal-frequency, and K-means discretization, and more |
| 7 | Feature Selection: Filter Methods | Collection of filter methods for feature selection, such as removing constant, quasi-constant, and duplicated features, checking multicollinearity, mutual information, ANOVA, and more |
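
As a rough illustration of the univariate imputation techniques listed above, the following minimal sketch fills missing values with the mean, median, mode, an arbitrary value, or an end-of-distribution value. It is written in plain Python and is not taken from the notebooks, which may well use pandas or other libraries instead. The function name `impute`, the sample `ages` list, the default arbitrary value `-999`, and the "mean plus three standard deviations" rule for the end of the distribution are all assumptions made for this example.

```python
import statistics


def impute(values, strategy="median", arbitrary=-999):
    """Return a copy of `values` with None replaced according to `strategy`.

    Hypothetical helper for illustration only; not part of the notebooks.
    """
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    elif strategy == "arbitrary":
        # A value far outside the observed range flags "was missing".
        fill = arbitrary
    elif strategy == "end_of_distribution":
        # Assumed rule: mean + 3 sample standard deviations (right tail).
        fill = statistics.mean(observed) + 3 * statistics.stdev(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Made-up sample data with two missing entries.
ages = [22, 35, None, 35, 58, None, 41]

median_filled = impute(ages, "median")
mean_filled = impute(ages, "mean")
tail_filled = impute(ages, "end_of_distribution")
```

Mean and median imputation keep the filled values inside the observed range, while the arbitrary and end-of-distribution strategies deliberately place them outside it so that a model can pick up on the fact that the value was missing.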

Modelling and Analysis

| No | Notebook | Report | Dashboard | Description |
|---|---|---|---|---|
| 1 | E-Commerce Sales Performance and Customer RFM Behavior Analysis | PDF | Tableau Dashboard Story | E-commerce companies want to understand their sales performance and customer behavior. The goals of this analysis are to understand customer behavior and to identify recommendations that can increase sales and customer satisfaction |
| 2 | Credit Default Risk_Home Credit_Light GBM | PDF | - | Credit default risk classification and debtor grading using LightGBM, with SHAP for model explainability |
| 3 | Book Recommendation System_Content and Item-based Collaborative Filtering | PDF | - | Build a book recommendation system that helps users choose books based on the books they have purchased |
| 4 | Article Topic Classification_Kumparan_Light GBM | PDF | - | Build a model that classifies article topics based on their content, using TF-IDF vectorization |
| 5 | Airplane Passengers_SARIMA Forecasting | - | - | Seasonal forecasting of the number of airplane passengers, evaluated with walk-forward validation |
| 6 | Sales Advertising_Linear Regression | - | - | Sales prediction based on advertising amount |
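
The walk-forward validation mentioned for the airplane passengers forecast can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's code: a seasonal naive forecast (repeat the value from one season ago) stands in for the SARIMA model so that the example runs without third-party packages, and the function names, the synthetic `series`, and the choice of mean absolute error are all hypothetical.

```python
def seasonal_naive_forecast(history, season_length):
    """Stand-in model (NOT SARIMA): predict the value seen one season ago."""
    return history[-season_length]


def walk_forward_validation(series, n_test, season_length):
    """Make one-step-ahead forecasts over the last `n_test` points.

    After each forecast the true observation is appended to the history,
    so every prediction only uses data that precedes it in time.
    """
    history = list(series[:-n_test])
    test = list(series[-n_test:])
    predictions = []
    for actual in test:
        predictions.append(seasonal_naive_forecast(history, season_length))
        history.append(actual)  # reveal the true value before the next step
    mae = sum(abs(p - a) for p, a in zip(predictions, test)) / len(test)
    return predictions, mae


# Synthetic series with a season of length 4 and a slow upward drift.
series = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]
predictions, mae = walk_forward_validation(series, n_test=4, season_length=4)
```

With a real model, the call to `seasonal_naive_forecast` would be replaced by refitting (or updating) the model on `history` and forecasting one step ahead, which is what makes walk-forward validation more expensive than a single train/test split.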