Machine Learning - Supervised Learning
Problem Statement: Electricity generation from fossil hard coal has to be predicted for future periods. Knowing whether hard coal generation will be low or high matters for the other energy sources in the energy market, so the relationship between the different energy sources has to be analysed.
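The low/high framing reduces to a threshold rule on the generation values. The following is a minimal stdlib sketch. It assumes hourly hard coal generation values in MW and assumes that values at or above the 3000 MW threshold from the task list count as "High". The task does not say which side of the boundary 3000 MW itself falls on. The helper names `label_generation` and `class_shares` are hypothetical. The class shares are where the training-set imbalance behind the ROC_AUC choice would become visible.

```python
from collections import Counter

# Threshold from the task; treating ">= 3000 MW" as "High" is an assumption.
THRESHOLD_MW = 3000


def label_generation(values_mw, threshold=THRESHOLD_MW):
    """Map each generation value (MW) to "High" or "Low" (hypothetical helper)."""
    return ["High" if v >= threshold else "Low" for v in values_mw]


def class_shares(labels):
    """Return the fraction of samples per class, to check for class imbalance."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Hypothetical hourly values, for illustration only (not real market data).
sample = [1800.0, 2950.5, 3000.0, 4200.0, 2500.0]
labels = label_generation(sample)
print(labels)                # ['Low', 'Low', 'High', 'High', 'Low']
print(class_shares(labels))  # {'High': 0.4, 'Low': 0.6}
```

With pandas and NumPy, the same rule is usually written as a vectorised `np.where(values >= 3000, "High", "Low")`. The list comprehension above only makes the boundary handling explicit.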
- Create a binary classification setting for hard coal generation with labels "High" and "Low", using a threshold of 3000 MW.
- Generate three meaningful visualizations exploring the relationship between the features and the target variable.
- Include two heatmaps (x: day of year, y: time of day) of the target variable:
  - Continuous variable
  - Binary variable
- Implement and train a logistic regression model, identifying the optimal hyperparameters (e.g. the regularization strength C).
- Implement and train a kNN model, identifying the optimal hyperparameters (e.g. the number of neighbors k).
- Plot the train and test scores.
- Evaluate and compare the kNN and logistic regression models:
  - Compare all relevant scores (Accuracy, Recall, Precision, F1-Score, ROC_AUC, MCC).
  - Visualize the confusion matrix and the ROC plot.
  - Explain the choice of default scoring metric (ROC_AUC is preferred because the training set is imbalanced).
- Evaluate the models with cross-validation.
- Visualize the decision tree.
- Visualize feature importance.
- Apply and compare five different classification models:
  - Logistic Regression
  - kNN
  - Random Forests
  - Gradient Boosting
  - AdaBoost
- Perform cross-validation and hyperparameter optimization for all models.
- Select the best-performing model based on the ROC_AUC score and explain the choice.
- Train and fit the best-performing model (Random Forests) and name it "best_model".
- Deploy the best model on 2024 data (not used for training).
- Evaluate and visualize predictions on the test dataset.
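The evaluation scores named in the task list (Accuracy, Recall, Precision, F1-Score, ROC_AUC, MCC) can all be computed from true labels, hard predictions, and predicted probabilities. The following stdlib-only sketch rests on three assumptions. First, "High" is encoded as the positive class 1. Second, the scores are predicted probabilities for class 1. Third, ROC AUC is computed in its rank form, as the probability that a random positive outranks a random negative, with ties counted as one half. The function names are hypothetical. In practice `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `matthews_corrcoef` and `roc_auc_score` for the same quantities.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 ("High") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


def classification_scores(y_true, y_pred, scores):
    """Compute accuracy, precision, recall, F1, MCC and ROC AUC in one pass."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
        "roc_auc": roc_auc(y_true, scores),
    }


# Toy example with made-up labels and probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
print(classification_scores(y_true, y_pred, y_score))
```

ROC AUC uses the probabilities rather than the thresholded predictions. For that reason it is less sensitive to the class imbalance than accuracy. MCC also uses all four confusion-matrix cells, which makes it a useful companion score when one class dominates.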