Closed
Changes from 2 commits
58 changes: 58 additions & 0 deletions machine_learning/best_random_state_in_random_forest.py
@@ -0,0 +1,58 @@
import pandas as pd
import warnings
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings('ignore')

Check failure on line 8 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (I001)

machine_learning/best_random_state_in_random_forest.py:1:1: I001 Import block is un-sorted or un-formatted
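One way the import block could be ordered to clear I001, assuming Ruff's default isort-style settings (standard library first, then third-party, with plain `import` statements ahead of `from` imports in each group). This is a sketch of the ordering only, not a change the author has pushed:

```python
import warnings

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same call as in the submitted file, placed after the import block.
warnings.filterwarnings("ignore")
```

Running `ruff check --select I --fix` on the file should produce an equivalent ordering automatically.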


def find_best_random_state(data: pd.DataFrame, target_column: str, iterations: int = 200) -> int:

Check failure on line 11 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/best_random_state_in_random_forest.py:11:89: E501 Line too long (97 > 88)

As there is no test file in this pull request nor any test function or class in the file machine_learning/best_random_state_in_random_forest.py, please provide doctest for the function find_best_random_state
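A doctest along the following lines might satisfy the request above. It is a minimal sketch with several assumptions: the toy DataFrame is invented for the test (it is not `heart.csv`), the `print` call is dropped so the doctest output stays stable, and the assertion checks only the type and range of the result, since the winning seed can vary across scikit-learn versions.

```python
import doctest

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def find_best_random_state(
    data: pd.DataFrame, target_column: str, iterations: int = 200
) -> int:
    """
    Return the random state in range(iterations) with the highest test accuracy.

    The toy frame below is made up for this doctest.

    >>> toy = pd.DataFrame(
    ...     {"feature": list(range(10)) + list(range(100, 110)),
    ...      "target": [0] * 10 + [1] * 10}
    ... )
    >>> best = find_best_random_state(toy, target_column="target", iterations=3)
    >>> isinstance(best, int) and 0 <= best < 3
    True
    """
    predictors = data.drop(target_column, axis=1)
    target = data[target_column]
    x_train, x_test, y_train, y_test = train_test_split(
        predictors, target, test_size=0.20, random_state=0
    )

    # Fit the scaler on the training split only, then reuse it for the test split.
    scaler = StandardScaler()
    x_train_scaled = scaler.fit_transform(x_train)
    x_test_scaled = scaler.transform(x_test)

    max_accuracy = 0.0
    best_random_state = 0
    for random_state in range(iterations):
        rf = RandomForestClassifier(random_state=random_state)
        rf.fit(x_train_scaled, y_train)
        accuracy = accuracy_score(y_test, rf.predict(x_test_scaled))
        if accuracy > max_accuracy:  # strict ">" keeps the earliest best seed
            max_accuracy = accuracy
            best_random_state = random_state
    return best_random_state


if __name__ == "__main__":
    doctest.testmod()
```

Running the file directly executes the doctest; `python -m doctest -v` on the file does the same with verbose output.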

    """
    Find the best random state for the Random Forest Classifier that maximizes accuracy.

    Args:
        data (pd.DataFrame): The dataset containing features and target variable.
        target_column (str): The name of the target column in the dataset.
        iterations (int): Number of random states to test. Default is 200.

    Returns:
        int: The random state that provides the best accuracy.
    """
    # Split dataset into predictors and target
    predictors = data.drop(target_column, axis=1)
    target = data[target_column]

    # Split dataset into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.20, random_state=0)

Check failure on line 28 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:28:5: N806 Variable `X_train` in function should be lowercase

Check failure on line 28 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:28:14: N806 Variable `X_test` in function should be lowercase

Check failure on line 28 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/best_random_state_in_random_forest.py:28:89: E501 Line too long (107 > 88)

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test


    # Scale features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)

Check failure on line 32 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:32:5: N806 Variable `X_train_scaled` in function should be lowercase

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train_scaled

    X_test_scaled = scaler.transform(X_test)

Check failure on line 33 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (N806)

machine_learning/best_random_state_in_random_forest.py:33:5: N806 Variable `X_test_scaled` in function should be lowercase

Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test_scaled
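The four N806 findings could be cleared with lowercase names, roughly as sketched here. The `predictors` and `target` values are made-up stand-ins for the objects built earlier in the function, and the `x_*` names are just one possible choice:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up stand-ins for the predictors/target pair built earlier in the function.
predictors = pd.DataFrame({"age": [29, 41, 35, 52, 47, 60, 38, 55, 44, 63]})
target = pd.Series([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], name="target")

# Lowercase names satisfy N806; wrapping the call also keeps it under 88 columns.
x_train, x_test, y_train, y_test = train_test_split(
    predictors, target, test_size=0.20, random_state=0
)

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # fit on the training split only
x_test_scaled = scaler.transform(x_test)  # reuse the training mean and variance

print(x_train_scaled.shape, x_test_scaled.shape)
```

The later `rf.fit(...)` and `rf.predict(...)` calls would need the same renames.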


    max_accuracy_rf = 0
    best_random_state = 0

    # Loop through specified random states
    for random_state in range(iterations):
        rf = RandomForestClassifier(random_state=random_state)
        rf.fit(X_train_scaled, y_train)
        y_pred_rf = rf.predict(X_test_scaled)

Check failure on line 43 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (W293)

machine_learning/best_random_state_in_random_forest.py:43:1: W293 Blank line contains whitespace
        current_accuracy = round(accuracy_score(y_test, y_pred_rf) * 100, 2)
        if current_accuracy > max_accuracy_rf:
            max_accuracy_rf = current_accuracy
            best_random_state = random_state

    print(f"The best random state is: {best_random_state} with an accuracy score of: {max_accuracy_rf} %")

Check failure on line 49 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/best_random_state_in_random_forest.py:49:89: E501 Line too long (106 > 88)
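The E501 finding on the `print` line could be resolved by splitting the f-string across adjacent literals, which Python concatenates at compile time. A small sketch, with placeholder values standing in for the results of the search loop:

```python
# Placeholder values for illustration; in the function these come from the loop.
best_random_state = 42
max_accuracy_rf = 88.52

# Adjacent string literals are joined, so the printed text is unchanged.
message = (
    f"The best random state is: {best_random_state} "
    f"with an accuracy score of: {max_accuracy_rf} %"
)
print(message)
```

Each source line now stays within the 88-column limit while the output matches the original single-line f-string.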
    return best_random_state


if __name__ == "__main__":
    # Load dataset
    dataset = pd.read_csv("heart.csv")

Check failure on line 56 in machine_learning/best_random_state_in_random_forest.py

GitHub Actions / ruff

Ruff (W293)

machine_learning/best_random_state_in_random_forest.py:56:1: W293 Blank line contains whitespace
    # Find the best random state
    best_state = find_best_random_state(dataset, target_column="target")