Energy Data Engineering-2-Project2

Machine learning - Supervised Learning

Fossil Hardcoal Generation Analysis and Classification

Problem Statement: Fossil hard coal generation is an important input for forecasting. Whether generation is low or high affects the other energy sources in the energy market, so this project predicts the generation class and analyses the relationship between the different energy sources.

Features:

Exploratory Data Analysis:

Create a binary classification setting with labels "High" and "Low" using a threshold of 3000 MW.
Generate three meaningful visualizations exploring the relationship between features and target variables.
Include two heatmaps (x: day of year, y: time of day) for the target variable:
- Continuous variable
- Binary variable
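
The labelling and heatmap steps above could be sketched roughly as follows. This is a minimal illustration on synthetic hourly data, not the notebook's code: the column names (`hard_coal_mw`, `is_high`, `day_of_year`, `hour`), the helper `plot_heatmap`, and the choice to treat values strictly above the threshold as "High" are all assumptions.

```python
import numpy as np
import pandas as pd

THRESHOLD_MW = 3000  # threshold stated in the README

# Synthetic stand-in for one year of hourly hard coal generation (assumption).
rng = np.random.default_rng(0)
index = pd.date_range("2023-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))
hours = index.hour.to_numpy()
generation = (
    3000
    + 1500 * np.sin(2 * np.pi * (hours - 6) / 24)
    + rng.normal(0, 500, len(index))
)
df = pd.DataFrame({"hard_coal_mw": generation.clip(min=0)}, index=index)

# Binary target: "High" above the threshold, "Low" otherwise (assumed convention).
df["label"] = np.where(df["hard_coal_mw"] > THRESHOLD_MW, "High", "Low")
df["is_high"] = (df["label"] == "High").astype(int)

# Heatmap grids: rows are time of day (y), columns are day of year (x).
df["day_of_year"] = df.index.dayofyear
df["hour"] = df.index.hour
continuous_grid = df.pivot_table(
    index="hour", columns="day_of_year", values="hard_coal_mw", aggfunc="mean"
)
binary_grid = df.pivot_table(
    index="hour", columns="day_of_year", values="is_high", aggfunc="mean"
)


def plot_heatmap(grid, title):
    """Draw one grid as a heatmap (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(10, 4))
    mesh = ax.pcolormesh(grid.columns, grid.index, grid.to_numpy(), shading="auto")
    ax.set_xlabel("Day of year")
    ax.set_ylabel("Time of day (hour)")
    ax.set_title(title)
    fig.colorbar(mesh, ax=ax)
    return fig
```

Calling `plot_heatmap(continuous_grid, "Hard coal generation (MW)")` and `plot_heatmap(binary_grid, "High generation (1) vs Low (0)")` would then give the two heatmaps.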

Logistic Regression and k-Nearest Neighbors (kNN):

Implement and train a logistic regression model, identifying optimum parameters.
Implement and train a kNN model, identifying optimum hyperparameters.
Plot the train and test scores.
Evaluate and compare kNN and logistic regression models:
- Compare all relevant scores (Accuracy, Recall, Precision, F1-Score, ROC_AUC, MCC).
- Visualize the confusion matrix and the ROC plot.
- Explain the choice of default scoring metric (ROC_AUC is preferred because the training set is imbalanced).
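
One possible way to tune and compare the two models is sketched below with scikit-learn. The synthetic imbalanced dataset, the parameter grids, and the use of `GridSearchCV` with a scaling pipeline are assumptions for illustration; the README does not say how the notebook searches for its optimum parameters.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real feature matrix (assumption).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grids; ROC AUC is the selection metric, as in the README.
searches = {
    "logistic_regression": GridSearchCV(
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
        scoring="roc_auc",
        cv=5,
        return_train_score=True,
    ),
    "knn": GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 11, 21]},
        scoring="roc_auc",
        cv=5,
        return_train_score=True,
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)
    y_score = search.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "mcc": matthews_corrcoef(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        # Per-candidate mean scores, usable for a train-vs-validation plot.
        "train_scores": search.cv_results_["mean_train_score"],
        "validation_scores": search.cv_results_["mean_test_score"],
    }
```

Accuracy can look good on an imbalanced set simply by favouring the majority class, which is why a threshold-independent metric such as ROC AUC is used for selection here.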

Decision Tree:

Model evaluation with cross-validation.
Visualize the decision tree.
Visualize feature importance.
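
A minimal sketch of the three decision tree steps, assuming scikit-learn and synthetic data. The feature names are hypothetical placeholders, and the depth limit is chosen only to keep the printed tree readable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature names; the real notebook's columns are not listed here.
feature_names = ["wind_onshore", "solar", "lignite", "gas"]
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# Model evaluation with 5-fold cross-validation, scored by ROC AUC.
cv_scores = cross_val_score(tree, X, y, cv=5, scoring="roc_auc")

# Fit on all data to inspect the structure and the feature importances.
tree.fit(X, y)
tree_text = export_text(tree, feature_names=feature_names)
importances = dict(zip(feature_names, tree.feature_importances_))

print(tree_text)
for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

For a graphical rendering, `sklearn.tree.plot_tree(tree, feature_names=feature_names)` draws the same tree with matplotlib, and the `importances` dictionary can be passed to a bar chart.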

Comparison of Multiple Classification Models:

Apply and compare 5 different classification models:
- Logistic Regression
- kNN
- Random Forests
- Gradient Boosting
- AdaBoost
Perform cross-validation and hyperparameter optimization for all models.
Select the best performing model based on ROC_AUC score and explain the choice.
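
The comparison could look roughly like the sketch below, which tunes each of the five models with a small grid and ranks them by cross-validated ROC AUC. The data is synthetic and the grids are deliberately tiny assumptions; a real search would cover more parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, weights=[0.7, 0.3], random_state=0
)

# (estimator, illustrative parameter grid) for each of the five models.
candidates = {
    "Logistic Regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0]},
    ),
    "kNN": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [5, 15]},
    ),
    "Random Forests": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [3, None]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(n_estimators=50, random_state=0),
        {"learning_rate": [0.05, 0.1]},
    ),
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [25, 50]},
    ),
}

# The same stratified folds for every model keep the comparison fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
best_params = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    scores[name] = search.best_score_  # mean cross-validated ROC AUC
    best_params[name] = search.best_params_

best_name = max(scores, key=scores.get)
print(f"Best model by ROC AUC: {best_name} ({scores[best_name]:.3f})")
```

Which model wins on this synthetic data says nothing about the real dataset; the README reports Random Forests as the best performer there.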

Deployment of Best Performing Model:

Train and fit the best performing model (Random Forests), named "best_model".
Deploy the best model on 2024 data (not used for training).
Evaluate and visualize predictions on the test dataset.
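
The deployment step could be sketched as below: fit `best_model` on one year and score it on a later year that was never seen in training. Both years are generated synthetically here, because the column layout of the CSV files is not described in this README; `make_year`, the feature names, and the coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

FEATURES = ["wind_mw", "solar_mw", "gas_mw"]  # hypothetical feature columns
THRESHOLD_MW = 3000  # threshold stated in the README


def make_year(year, seed):
    """Generate a synthetic stand-in for one year of hourly data (assumption)."""
    rng = np.random.default_rng(seed)
    index = pd.date_range(
        f"{year}-01-01", periods=24 * 60, freq=pd.Timedelta(hours=1)
    )
    frame = pd.DataFrame(
        {
            "wind_mw": rng.gamma(2.0, 4000, len(index)),
            "solar_mw": rng.uniform(0, 20000, len(index)),
            "gas_mw": rng.normal(6000, 1500, len(index)),
        },
        index=index,
    )
    # Hard coal is modelled as falling when wind and solar output rise.
    coal = (
        6000
        - 0.25 * frame["wind_mw"]
        - 0.10 * frame["solar_mw"]
        + 0.10 * frame["gas_mw"]
        + rng.normal(0, 500, len(index))
    )
    frame["is_high"] = (coal > THRESHOLD_MW).astype(int)
    return frame


train_2023 = make_year(2023, seed=0)
test_2024 = make_year(2024, seed=1)  # held out, never used for fitting

best_model = RandomForestClassifier(n_estimators=100, random_state=0)
best_model.fit(train_2023[FEATURES], train_2023["is_high"])

# Predictions and scores on the unseen year.
test_2024["predicted"] = best_model.predict(test_2024[FEATURES])
test_scores = best_model.predict_proba(test_2024[FEATURES])[:, 1]
test_auc = roc_auc_score(test_2024["is_high"], test_scores)
test_cm = confusion_matrix(test_2024["is_high"], test_2024["predicted"])

print(f"ROC AUC on 2024 data: {test_auc:.3f}")
print(test_cm)
```

With the real data, `train_2023` and `test_2024` would instead be loaded from the two CSV files, and the `predicted` column can be plotted against the actual labels over time.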

Repository files:
- Actual Generation per Production Type_202301010000-202401010000.csv
- Actual Generation per Production Type_202401010000-202501010000.csv
- EDE2_Project2_SauravB_NooreldinA (1).ipynb
- EDE2_Project2_SauravB_NooreldinA.html
- README.md

SauravBhowmick/Fossil-Hardcoal-Generation-Analysis-and-Classification
