Closed
64 changes: 64 additions & 0 deletions machine_learning/best_random_state_in_random_forest.py
@@ -0,0 +1,64 @@
import pandas as pd
import warnings
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore")

Check failure on line 8 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (I001)

machine_learning/best_random_state_in_random_forest.py:1:1: I001 Import block is un-sorted or un-formatted


def find_best_random_state(
    data: pd.DataFrame, target_column: str, iterations: int = 200
) -> int:
    """
    Find the best random state for the Random Forest Classifier that maximizes accuracy.

    Args:
        data (pd.DataFrame): The dataset containing features and target variable.
        target_column (str): The name of the target column in the dataset.
        iterations (int): Number of random states to test. Default is 200.

    Returns:
        int: The random state that provides the best accuracy.
    """
    # Split dataset into predictors and target
    predictors = data.drop(target_column, axis=1)
    target = data[target_column]

    # Split dataset into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(

Check failure on line 30 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:30:5: N806 Variable `X_train` in function should be lowercase

Check failure on line 30 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:30:14: N806 Variable `X_test` in function should be lowercase
        predictors, target, test_size=0.20, random_state=0
    )

    # Scale features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)

Check failure on line 36 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:36:5: N806 Variable `X_train_scaled` in function should be lowercase

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train_scaled

    X_test_scaled = scaler.transform(X_test)

Check failure on line 37 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:37:5: N806 Variable `X_test_scaled` in function should be lowercase

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test_scaled
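
The scaling step in this diff fits the scaler on the training split only and reuses those statistics for the test split. A minimal standard-library sketch of that idea, written with the lowercase names the N806 check asks for, might look like the following. The helpers `fit_scaler` and `apply_scaler` and the sample values are hypothetical stand-ins for illustration, not scikit-learn's API.

```python
from statistics import fmean, pstdev


def fit_scaler(column: list[float]) -> tuple[float, float]:
    """Return the mean and population standard deviation of one feature column."""
    mean = fmean(column)
    std = pstdev(column)
    # Assumption of this sketch: fall back to 1.0 so a constant column
    # does not cause a division by zero.
    return mean, std or 1.0


def apply_scaler(column: list[float], mean: float, std: float) -> list[float]:
    """Standardize a column using statistics that were computed elsewhere."""
    return [(value - mean) / std for value in column]


# Hypothetical one-feature data, for illustration only.
x_train = [1.0, 2.0, 3.0, 4.0]
x_test = [2.5, 5.0]

# Fit on the training split only, then reuse the same statistics on the test split.
train_mean, train_std = fit_scaler(x_train)
x_train_scaled = apply_scaler(x_train, train_mean, train_std)
x_test_scaled = apply_scaler(x_test, train_mean, train_std)
```

Only the variable names need to change to satisfy the check; the fit-on-train, transform-on-test order stays the same.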


    max_accuracy_rf = 0
    best_random_state = 0

    # Loop through specified random states
    for random_state in range(iterations):
        rf = RandomForestClassifier(random_state=random_state)
        rf.fit(X_train_scaled, y_train)
        y_pred_rf = rf.predict(X_test_scaled)

        current_accuracy = round(accuracy_score(y_test, y_pred_rf) * 100, 2)
        if current_accuracy > max_accuracy_rf:
            max_accuracy_rf = current_accuracy
            best_random_state = random_state

    print(
        f"The best random state is: {best_random_state} with an accuracy score of: {max_accuracy_rf} %"

Check failure on line 54 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/best_random_state_in_random_forest.py:54:89: E501 Line too long (103 > 88)
    )
    return best_random_state
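
One possible way to address the E501 annotation on the `print` call is to rely on implicit concatenation of adjacent f-strings inside parentheses, so each source line stays under the 88-character limit. This is only a sketch: the two values assigned below are hypothetical placeholders, not results from this PR.

```python
# Hypothetical values, for illustration only.
best_random_state = 42
max_accuracy_rf = 85.25

# Adjacent string literals inside parentheses are joined into one string,
# so the long message can be split across two shorter source lines.
message = (
    f"The best random state is: {best_random_state} "
    f"with an accuracy score of: {max_accuracy_rf} %"
)
print(message)
```

The printed text is identical to the single-line version; only the source layout changes.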


if __name__ == "__main__":
    # Load dataset
    dataset = pd.read_csv("heart.csv")

    # Find the best random state
    best_state = find_best_random_state(dataset, target_column="target")
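
The search loop in `find_best_random_state` keeps a seed only when its rounded accuracy is strictly greater than the best seen so far, so ties go to the lowest seed. The sketch below isolates that selection logic using only the standard library. `find_best_seed` and `toy_score` are hypothetical names, and the scores are made up; in the PR the score comes from fitting a `RandomForestClassifier` and calling `accuracy_score`.

```python
from typing import Callable, Tuple


def find_best_seed(
    score: Callable[[int], float], iterations: int = 200
) -> Tuple[int, float]:
    """Return the first seed with the highest rounded percentage score."""
    best_seed = 0
    best_score = 0.0
    for seed in range(iterations):
        # Same rounding as the PR: percentage with two decimal places.
        current = round(score(seed) * 100, 2)
        # Strict comparison: a later seed with an equal score does not replace
        # the earlier one.
        if current > best_score:
            best_seed, best_score = seed, current
    return best_seed, best_score


def toy_score(seed: int) -> float:
    """Hypothetical accuracies: seeds 3 and 7 tie for the maximum."""
    return {3: 0.9, 7: 0.9}.get(seed, 0.8)


best_seed, best_score = find_best_seed(toy_score, iterations=10)
print(best_seed, best_score)
```

With these made-up scores, seed 3 is selected even though seed 7 scores the same, which mirrors how the loop in the diff resolves ties.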