This repository was archived by the owner on Jun 20, 2023. It is now read-only.

Commit 36f9a12

🎉 Added RoadMaps
1 parent 9d94535 commit 36f9a12

3 files changed: 144 additions and 0 deletions
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
# *****************************
# ** Data Science - Road Map **
# *****************************
#
# - Model Project: https://www.kaggle.com/code/dsfelix/spaceship-titanic-competition
#

---- 0 - Documentation ----

\ Problem Description (context and main goal)
\ File Descriptions (train/validation/test datasets, submissions, how the datasets were extracted, ...)
\ Variables (name and description)
\ Target Variable (name, description and example values)
\ Model Evaluation Metric (description, goal, equation and example)
\ Dataset Limitations
\ Goals
\ Setup (tools, packages and commands to install the packages)
\ Acknowledgements



---- 1 - Descriptive Analysis ----

\ Import and Set up Libraries, Create Constants
\ Read Dataset, Parse Dates and Set the Character Encoding if Needed
\ Check out Dataset Shape
\ Split Dataset into Features and Target
\ Check out Missing Values (numbers and plots)
\ Check out Data Types and Make Conversions if Needed
\ Check out String Features (convert them to lower case, merge near-duplicate sibling values and fix typos)
\ Convert String Features into Categorical Features
\ Check out Data Leakage (Target Leakage, Train-Test Contamination and Stratification) and Drop Features
\ Treat Time Series Features
\ Treat GeoSpatial Features
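
The first steps of this section can be sketched with pandas. This is only a minimal illustration, not the model project's code: the inline CSV, the column names and the constants TARGET, DATE_COLS and STRING_COLS are all hypothetical stand-ins for a real training file.

```python
import io

import pandas as pd

# Hypothetical constants and inline data standing in for a real training file.
TARGET = "transported"
DATE_COLS = ["boarded_at"]
STRING_COLS = ["home_planet"]
RAW_CSV = """passenger_id,home_planet,age,boarded_at,transported
1,Earth,39,2023-01-05,False
2,earth ,24,2023-01-06,True
3,Mars,,2023-01-06,False
4,,58,2023-01-07,True
"""

# Read the dataset and parse the date columns while reading.
df = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=DATE_COLS)
print("shape (rows, columns):", df.shape)

# Split into features and target.
X = df.drop(columns=[TARGET])
y = df[TARGET]

# Count missing values per feature and inspect the data types.
missing_counts = X.isna().sum()
print(missing_counts)
print(X.dtypes)

# Normalise string features (trim, lower case) so that variants such as
# "Earth" and "earth " collapse into one value, then convert to categorical.
for col in STRING_COLS:
    X[col] = X[col].str.strip().str.lower().astype("category")

print(X["home_planet"].cat.categories.tolist())
```

With a real file, `io.StringIO(RAW_CSV)` would be replaced by the file path, and `read_csv` also accepts an `encoding` argument when the default UTF-8 does not fit.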



---- 2 - Statistical Analysis ----

\ Use the describe() Function for Numerical and Categorical Features (add further Descriptive Statistics to it, such as Standard Error, Variance, Median, Median Absolute Deviation, Skewness, Kurtosis and Range)

\ Check out the Numerical Features' Histograms to see which Distribution they follow

\ Check out the Categorical Features' value_counts() to see which Distribution they follow

\ Create Frequency and Cross Tables for Categorical Features

\ Check out the Correlation between the Numerical Features (use the corr() function in pandas and create Heatmap Plots; check whether NA/Missing Values need to be excluded)

\ Run Hypothesis Tests at a 95% Confidence Level (T-Test, Z-Test, ANOVA and Chi-Squared Test)

\ Use Regression Analysis to model the relationship between numerical columns and a categorical or numerical target column (Linear Regression, Logistic Regression and K-Means Clusters)

\ Generate Summary Reports (AutoViz and Pandas Report Libraries)
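
Several of the checks above can be sketched in a few lines, assuming pandas, NumPy and SciPy are installed. The data is randomly generated and the column names are hypothetical; the chi-squared test is shown as one example of the listed hypothesis tests, using a 0.05 significance level to match the 95% confidence level.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical random data: two numerical and two categorical features.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(30, 8, n),
    "spend": rng.exponential(100, n),
    "deck": rng.choice(["a", "b", "c"], n),
    "vip": rng.choice([True, False], n),
})
num = df[["age", "spend"]]

# describe() extended with the extra descriptive statistics.
summary = num.describe().T
summary["median"] = num.median()
summary["variance"] = num.var()
summary["sem"] = num.sem()  # standard error of the mean
summary["mad"] = (num - num.median()).abs().median()  # median absolute deviation
summary["skewness"] = num.skew()
summary["kurtosis"] = num.kurt()
summary["range"] = num.max() - num.min()

# Frequency table and cross table for the categorical features.
freq = df["deck"].value_counts()
cross = pd.crosstab(df["deck"], df["vip"])

# Pairwise correlation; pandas skips missing values pair by pair.
corr = num.corr()

# Chi-squared test of independence on the cross table.
chi2, p_value, dof, expected = stats.chi2_contingency(cross)
reject_null = p_value < 0.05

print(summary.round(2))
print(cross)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}, reject H0: {reject_null}")
```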



---- 3 - Data Transformations ----

\ Split Dataset into Training and Validation
\ Filter Good and Bad Labels
\ Check the Need to use Imputers, Encoders, Label Encoders and Standardizations
\ Check for Outliers
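
A minimal sketch of the split and the outlier check, assuming scikit-learn is available. The data is synthetic, and the 1.5 x IQR rule is one common outlier heuristic chosen here for illustration; the roadmap does not name a specific method. The helper iqr_outlier_mask is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 99 ordinary values plus one extreme value at the end.
rng = np.random.default_rng(0)
X = pd.DataFrame({"spend": np.append(rng.normal(100, 10, 99), 1000.0)})
y = pd.Series(rng.integers(0, 2, 100), name="target")

# Stratified 80/20 split into training and validation sets.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def iqr_outlier_mask(values: pd.Series) -> pd.Series:
    """Flag values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)


# In a real workflow the thresholds would be derived from the training set only.
train_outliers = iqr_outlier_mask(X_train["spend"])
print("training rows:", len(X_train), "validation rows:", len(X_valid))
print("outliers flagged in training set:", int(train_outliers.sum()))
```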



---- 4 - Feature Engineering ----

\ Mutual Information (MI)
\ K-Means Clustering and Elbow Method (biggest drop in the plot)
\ Principal Component Analysis (PCA)
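
The three techniques can be sketched with scikit-learn on synthetic data. The dataset and the range of cluster counts are assumptions made for illustration. The elbow choice follows the roadmap's "biggest drop" heuristic literally; in practice the elbow is usually also judged by eye on the inertia plot.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic classification data.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Mutual information between each feature and the target.
mi_scores = mutual_info_classif(X_scaled, y, random_state=0)

# K-Means inertia for k = 1..6, then the k reached by the biggest drop.
ks = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_
    for k in ks
]
drops = -np.diff(inertias)  # drops[i] is the decrease from ks[i] to ks[i + 1]
elbow_k = ks[int(np.argmax(drops)) + 1]

# PCA down to two components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("MI scores:", np.round(mi_scores, 3))
print("elbow k (biggest drop):", elbow_k)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```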



---- 5 - Base Models ----

\ Pipelines (Imputers, Encoders, Label Encoders and Standardizations)
\ Create Simple Models
\ Create XGBoost Models
\ Create Deep Learning Models
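
A pipeline with imputers, an encoder and standardization in front of a simple model could look like the sketch below, assuming scikit-learn. The tiny dataset and its column names are hypothetical, and logistic regression stands in for "a simple model"; an XGBoost or deep learning estimator with a scikit-learn compatible interface could replace the final step.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset with one numerical and one categorical feature.
X = pd.DataFrame({
    "age": [39.0, 24.0, np.nan, 58.0, 33.0, 16.0, 44.0, 27.0],
    "home_planet": pd.Series(
        ["earth", "mars", "earth", np.nan, "mars", "earth", "mars", "earth"],
        dtype=object,
    ),
})
y = pd.Series([0, 1, 0, 1, 1, 0, 1, 0])

# Numerical columns: impute with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: impute with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age"]),
    ("cat", categorical_steps, ["home_planet"]),
])

# Preprocessing and model bundled in one estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
preds = model.predict(X)
print(preds)
```

Keeping the preprocessing inside the pipeline means the imputers and the scaler are fitted on the training data only, which helps avoid train-test contamination.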



---- 6 - Evaluating Models ----

\ Cross-Validation
\ Evaluation Metric
\ Overfitting and Underfitting
\ Choose the Best Model
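
The evaluation steps can be sketched with scikit-learn's cross_validate. The synthetic data, the two candidate models and the use of accuracy as the metric are assumptions for illustration. A large gap between training and validation scores hints at overfitting, while low scores on both hint at underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data and candidate models.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, estimator in candidates.items():
    # 5-fold cross-validation, keeping the training scores for comparison.
    scores = cross_validate(
        estimator, X, y, cv=5, scoring="accuracy", return_train_score=True
    )
    train_score = scores["train_score"].mean()
    valid_score = scores["test_score"].mean()
    results[name] = {
        "train": train_score,
        "valid": valid_score,
        "gap": train_score - valid_score,  # large gap suggests overfitting
    }

# Choose the model with the best mean validation score.
best_name = max(results, key=lambda name: results[name]["valid"])

for name, row in results.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
print("best model:", best_name)
```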



---- 7 - Model Explainability ----

\ Permutation Importance
\ Summary Plots
\ Partial Plots
\ Contribution/Dependence Plots
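
Permutation importance, the first item above, can be sketched with scikit-learn alone. The synthetic data and the random forest are assumptions for illustration; with shuffle=False the two informative features are columns 0 and 1. The plot-based items need additional plotting libraries and are not covered here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data: columns 0 and 1 are informative, columns 2 to 4 are noise.
X, y = make_classification(
    n_samples=400,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the score drop.
result = permutation_importance(
    model, X_valid, y_valid, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]

for idx in ranking:
    print(
        f"feature {idx}: {result.importances_mean[idx]:.3f} "
        f"+/- {result.importances_std[idx]:.3f}"
    )
```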



---- 8 - Making Predictions ----

\ Make Predictions with Test Dataset
\ Export Model in Pickle Format
\ Load Model in Pickle Format
\ Create a Simple Kernel to use the model and make Predictions
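
The export and load steps can be sketched with the standard pickle module. The model, the synthetic data and the file name model.pkl are hypothetical; a temporary directory is used so the example leaves no files behind.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical data: the first 150 rows for training, the rest as the test set.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_predictions = model.predict(X_test)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"  # hypothetical file name

    # Export the fitted model in pickle format.
    with model_path.open("wb") as file:
        pickle.dump(model, file)

    # Load it back, as a separate kernel or script would.
    with model_path.open("rb") as file:
        loaded_model = pickle.load(file)

loaded_predictions = loaded_model.predict(X_test)
print("predictions match:", bool((loaded_predictions == test_predictions).all()))
```

Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be unpickled.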



---- 9 - Reach Me Section ----

\ E-mail
\ LinkedIn
\ Portfolio
\ GitHub
\ Kaggle
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# **************************
# ** Dashboard - Road Map **
# **************************
#


---- 0 - Setting Up ----

\ Load Data
\ Filter and Transform Data
\ Prepare the Front-End (add explanations, filter fields and plots)
\ Add Metric Cards (dynamic or static)
\ Add Plots (dynamic or static)
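
The data side of these steps can be sketched with pandas alone. The roadmap does not name a dashboard framework, so this example only prepares the values a front-end would display; the inline CSV, the column names and the helper functions load_data, filter_data, metric_cards and plot_series are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for the dashboard's data source.
RAW_CSV = """region,month,sales
north,2023-01,120
north,2023-02,150
south,2023-01,90
south,2023-02,60
"""


def load_data() -> pd.DataFrame:
    """Load the dashboard data (here from an inline CSV)."""
    return pd.read_csv(io.StringIO(RAW_CSV))


def filter_data(df: pd.DataFrame, region: str) -> pd.DataFrame:
    """Apply the value chosen in a front-end filter field."""
    return df[df["region"] == region]


def metric_cards(df: pd.DataFrame) -> dict:
    """Compute the numbers shown on the metric cards."""
    return {
        "total_sales": int(df["sales"].sum()),
        "average_sales": float(df["sales"].mean()),
        "rows": len(df),
    }


def plot_series(df: pd.DataFrame) -> pd.Series:
    """Aggregate sales per month, ready to hand to a plotting component."""
    return df.groupby("month")["sales"].sum()


data = load_data()
north = filter_data(data, "north")
cards = metric_cards(north)
series = plot_series(north)
print(cards)
print(series)
```

Recomputing the cards and the series whenever the filter value changes gives the dynamic variant; computing them once on the full dataset gives the static one.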
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# ********************************
# ** Web Application - Road Map **
# ********************************
#



---- 0 - Setup ----

\ Create Front-End Application
\ Create Back-End Application
\ Load Model in Pickle
\ Consume the Model in Pickle
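
The back-end side of the last two steps might look like the sketch below, which uses only the standard library. The roadmap does not name a web framework, so the handler is a plain function that takes a JSON request body and returns a JSON response. The pickled "model" is a hypothetical dictionary of linear weights standing in for a real trained model, and load_model, predict and handle_predict are illustrative names.

```python
import json
import pickle

# Hypothetical stand-in for a trained model: weights of a linear classifier.
MODEL_BYTES = pickle.dumps({"weights": [0.5, -0.25], "bias": 0.1})


def load_model(blob: bytes) -> dict:
    """Load the pickled model once, for example at application start-up."""
    return pickle.loads(blob)


def predict(model: dict, rows: list) -> list:
    """Score each row and return 1 when the linear score is positive."""
    predictions = []
    for row in rows:
        score = model["bias"] + sum(w * x for w, x in zip(model["weights"], row))
        predictions.append(int(score > 0))
    return predictions


def handle_predict(request_body: str, model: dict) -> str:
    """Back-end handler: JSON request in, JSON response out."""
    payload = json.loads(request_body)
    return json.dumps({"predictions": predict(model, payload["features"])})


model = load_model(MODEL_BYTES)
response = handle_predict('{"features": [[1.0, 1.0], [0.0, 2.0]]}', model)
print(response)
```

In a real application the handler would be registered as a route in the chosen framework and the front-end would send the feature values in the request body.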
