Added best_random_state_in_random_forest.py file an algorithm to find best random state of random forest classifier #12277
The code is designed to identify the optimal random state for a Random Forest Classifier to maximize its prediction accuracy on a given dataset. It uses the heart.csv dataset. The logic is encapsulated within the function find_best_random_state, which accepts a DataFrame and the name of the target column as arguments.
Create best_random_state_in_random_forest.py
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper
commands and options
algorithms-keeper actions can be triggered by commenting on this PR:
@algorithms-keeper review to trigger the checks for only the added pull request files.
@algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines that are not part of the diff, this command will post all the messages in one comment.
NOTE: Commands are in beta, so this feature is restricted to members or owners of the organization.
warnings.filterwarnings('ignore')


def find_best_random_state(data: pd.DataFrame, target_column: str, iterations: int = 200) -> int:
As there is no test file in this pull request, nor any test function or class in the file machine_learning/best_random_state_in_random_forest.py, please provide a doctest for the function find_best_random_state.
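One way the requested doctest could look is sketched below. This is not the code from the pull request: only the signature, the split call, and the scaling lines are quoted on this page, so the search loop, the use of accuracy_score, the tie-breaking rule, and n_estimators=10 are assumptions made to keep the example small and fast. The doctest uses a tiny synthetic frame instead of heart.csv and checks only the type and range of the result, since the winning seed depends on the data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def find_best_random_state(
    data: pd.DataFrame, target_column: str, iterations: int = 200
) -> int:
    """
    Return the seed in range(iterations) whose forest scores highest on the
    held-out split (ties go to the lowest seed).

    >>> frame = pd.DataFrame(
    ...     {"a": range(20), "b": [i % 3 for i in range(20)], "label": [0, 1] * 10}
    ... )
    >>> seed = find_best_random_state(frame, "label", iterations=3)
    >>> isinstance(seed, int) and 0 <= seed < 3
    True
    """
    predictors = data.drop(columns=[target_column])
    target = data[target_column]

    # Same split settings as the lines quoted in the review.
    x_train, x_test, y_train, y_test = train_test_split(
        predictors, target, test_size=0.20, random_state=0
    )

    # Fit the scaler on the training rows only, then reuse it for the test rows.
    scaler = StandardScaler()
    x_train_scaled = scaler.fit_transform(x_train)
    x_test_scaled = scaler.transform(x_test)

    best_seed, best_accuracy = 0, -1.0
    for seed in range(iterations):
        # Assumed search loop: train one forest per candidate seed.
        # n_estimators=10 is an assumption chosen to keep the doctest quick.
        model = RandomForestClassifier(n_estimators=10, random_state=seed)
        model.fit(x_train_scaled, y_train)
        accuracy = accuracy_score(y_test, model.predict(x_test_scaled))
        if accuracy > best_accuracy:
            best_seed, best_accuracy = seed, accuracy
    return best_seed


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Running the file directly executes the doctest; passing a small iterations value keeps the check fast.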
target = data[target_column]

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.20, random_state=0)
Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train
Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_train_scaled
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Variable and function names should follow the snake_case naming convention. Please update the following name accordingly: X_test_scaled
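The four flagged names could be renamed as in the sketch below, which reproduces the quoted split and scaling lines with lowercase identifiers. The small DataFrame and its column names (age, chol, target) are hypothetical stand-ins for heart.csv, included only so the snippet runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for heart.csv: two numeric features and a binary target.
data = pd.DataFrame(
    {
        "age": [29, 41, 35, 62, 57, 48, 53, 66, 44, 38],
        "chol": [180, 240, 210, 290, 260, 230, 250, 300, 220, 200],
        "target": [0, 1, 0, 1, 1, 0, 1, 1, 0, 0],
    }
)
predictors = data.drop(columns=["target"])
target = data["target"]

# Split dataset into train and test sets (snake_case replaces X_train / X_test).
x_train, x_test, y_train, y_test = train_test_split(
    predictors, target, test_size=0.20, random_state=0
)

# Scale features: fit on the training rows, apply the same transform to the test rows.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.shape, x_test_scaled.shape)
```

Only the identifiers change; the calls and their arguments are the ones quoted in the diff.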
for more information, see https://pre-commit.ci
Closing require_tests PRs to prepare for Hacktoberfest
Description of Change
The code is designed to identify the optimal random state for a Random Forest Classifier to maximize its prediction accuracy on a given dataset. It uses the heart.csv dataset. The logic is encapsulated within the function find_best_random_state, which accepts a DataFrame and the name of the target column as arguments.
Describe your change:
Added an algorithm to find the best random state for a Random Forest Classifier.
Checklist: