Commit 9e70394

FEAT: Adding the basic structure of the rfe.py
1 parent 58a6828 commit 9e70394

1 file changed: +109 -0 lines changed

rfe.py

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
"""
================================================================================
Recursive Feature Elimination (RFE) Automation and Feature Analysis Tool
================================================================================
Author : Breno Farias da Silva
Created : 2025-10-07
Description :
    This script automates the process of performing Recursive Feature Elimination (RFE)
    on structured datasets to identify the most relevant features for classification tasks.
    It provides a fully integrated pipeline, from dataset loading and preprocessing
    to feature ranking, visualization, and export of analysis reports.

    Core functionalities include:
        - Dataset validation and safe file handling
        - Standardization of numeric features using z-score normalization
        - Recursive Feature Elimination (RFE) with Random Forest as the base estimator
        - Generation of ranked feature lists with visual and statistical summaries
        - Boxplot-based visualization of top features by class distribution
        - Cross-platform sound notification upon completion

Usage:
    1. Set the `csv_file` variable inside the `main()` function to the dataset path.
    2. Run the script using:
        $ make main
    3. The program will automatically:
        - Load and clean the dataset
        - Run RFE to select the most relevant features
        - Save results and visualizations to the `Feature_Analysis/` directory
        - Optionally play a notification sound when finished

Output:
    - Text report (`RFE_results_<Model>.txt`) summarizing feature rankings.
    - CSV summary of top features with mean and standard deviation per class.
    - Boxplot visualizations for each selected feature stored in `Feature_Analysis/`.

TODOs:
    - Add support for additional estimators (e.g., SVM, Gradient Boosting).
    - Integrate evaluation metrics (F1-score, accuracy, precision, recall, FPR, FNR)
      directly after feature selection.
    - Incorporate correlation analysis to remove redundant features.
    - Extend preprocessing to handle categorical and missing data automatically.
    - Implement CLI argument parsing for dataset paths and configuration options.
    - Add parallel RFE runs with different feature subset sizes (1, 2, 5, 10, 15, 20, 25).

Dependencies:
    - Python >= 3.9
    - pandas, numpy, seaborn, matplotlib, scikit-learn, colorama

Notes:
    - The last column of the dataset is assumed to be the target variable.
    - Only numeric columns are considered for RFE processing.
    - Sound playback is skipped on Windows platforms by default.
"""

import atexit # For playing a sound when the program finishes
import os # For file and directory operations
import numpy as np # For numerical operations
import pandas as pd # For data manipulation
import matplotlib.pyplot as plt # For plotting
import re # For regular expressions
import seaborn as sns # For advanced plots
import platform # For getting the operating system name
from colorama import Style # For coloring the terminal
from sklearn.model_selection import train_test_split # For splitting the data
from sklearn.preprocessing import StandardScaler # For scaling the data (standardization)
from sklearn.feature_selection import RFE # For Recursive Feature Elimination
from sklearn.ensemble import RandomForestClassifier # For the Random Forest model

# Macros:
class BackgroundColors: # Colors for the terminal
    CYAN = "\033[96m" # Cyan
    GREEN = "\033[92m" # Green
    YELLOW = "\033[93m" # Yellow
    RED = "\033[91m" # Red
    BOLD = "\033[1m" # Bold
    UNDERLINE = "\033[4m" # Underline
    CLEAR_TERMINAL = "\033[H\033[J" # Clear the terminal

# Execution Constants:
VERBOSE = False # Set to True to output verbose messages

# Sound Constants:
SOUND_COMMANDS = {"Darwin": "afplay", "Linux": "aplay", "Windows": "start"} # The commands to play a sound for each operating system
SOUND_FILE = "./.assets/Sounds/NotificationSound.wav" # The path to the sound file

# RUN_FUNCTIONS:
RUN_FUNCTIONS = {
    "Play Sound": True, # Set to True to play a sound when the program finishes
}

# Functions Definitions:

def main():
    """
    Main function.

    :return: None
    """

    pass

if __name__ == "__main__":
    """
    This is the standard boilerplate that calls the main() function.

    :return: None
    """

    main() # Call the main function
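
The commit leaves `main()` as a stub, so the pipeline described in the module docstring is not implemented yet. As a rough illustration only, the z-score standardization and RFE-with-Random-Forest steps could look something like the sketch below. The helper name `rank_features`, its parameters, the 80/20 split, the 50-tree forest, and the synthetic demo data are all assumptions made for this example and do not appear in the commit; only the "last column is the target" and "numeric columns only" conventions come from the docstring notes.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def rank_features(df, n_select=5, random_state=42):
    """Hypothetical helper: rank numeric features with RFE and a Random Forest."""
    # Per the module notes: the last column is treated as the target
    y = df.iloc[:, -1]
    # Per the module notes: only numeric columns are considered
    X = df.iloc[:, :-1].select_dtypes(include=[np.number])

    # Assumed 80/20 split so scaling and selection only see training rows
    X_train, _, y_train, _ = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )

    # z-score standardization (zero mean, unit variance per feature)
    X_train_scaled = StandardScaler().fit_transform(X_train)

    # RFE repeatedly drops the least important feature until n_select remain
    estimator = RandomForestClassifier(n_estimators=50, random_state=random_state)
    selector = RFE(estimator, n_features_to_select=n_select)
    selector.fit(X_train_scaled, y_train)

    # rank 1 marks a selected feature; larger ranks were eliminated earlier
    ranking = pd.DataFrame(
        {
            "feature": X.columns,
            "rank": selector.ranking_,
            "selected": selector.support_,
        }
    )
    return ranking.sort_values("rank").reset_index(drop=True)


# Synthetic demo data (an assumption; the script expects a CSV path instead)
features, labels = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)
demo = pd.DataFrame(features, columns=[f"f{i}" for i in range(8)])
demo["label"] = labels

ranking = rank_features(demo, n_select=5)
print(ranking)
```

Fitting the scaler on the training split only is a design choice of this sketch, made to avoid leaking test-set statistics into feature selection; the commit does not say how the author intends to handle this.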

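
The file imports `atexit` and defines `SOUND_COMMANDS`, `SOUND_FILE`, and `RUN_FUNCTIONS`, but no playback function exists in this commit. One possible way to wire these together is sketched below. The function names `build_sound_command` and `play_sound` are hypothetical, and skipping Windows follows the docstring note rather than any code in the commit.

```python
import atexit
import os
import platform

# Constants copied from the commit
SOUND_COMMANDS = {"Darwin": "afplay", "Linux": "aplay", "Windows": "start"}
SOUND_FILE = "./.assets/Sounds/NotificationSound.wav"
RUN_FUNCTIONS = {"Play Sound": True}


def build_sound_command(system, sound_file):
    """Hypothetical helper: return the shell command to run, or None to skip playback."""
    if system == "Windows":
        return None  # Module notes say playback is skipped on Windows by default
    player = SOUND_COMMANDS.get(system)
    if player is None or not os.path.exists(sound_file):
        return None  # Unknown platform or missing sound file
    return f'{player} "{sound_file}"'


def play_sound():
    """Hypothetical helper: play the notification sound if a command is available."""
    command = build_sound_command(platform.system(), SOUND_FILE)
    if command is not None:
        os.system(command)


if RUN_FUNCTIONS["Play Sound"]:
    atexit.register(play_sound)  # Runs when the interpreter exits normally
```

Keeping the command construction separate from the `os.system` call makes the platform logic easy to check without actually playing audio.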