K-Nearest Neighbors Project

Project Overview

This project demonstrates the implementation of the K-Nearest Neighbors (KNN) algorithm on a synthetic dataset. The primary goal is to classify data points into predefined target classes based on their features. The notebook follows a structured approach: data preprocessing, exploratory data analysis (EDA), model training, evaluation, and optimization.

Project Structure

The notebook is divided into several sections:

  1. Import Libraries: Importing essential libraries such as pandas, numpy, matplotlib, seaborn, and scikit-learn.
  2. Get the Data: Loading the dataset into a pandas DataFrame.
  3. Exploratory Data Analysis (EDA): Visualizing the data using seaborn to understand the relationships between features.
  4. Data Preprocessing: Standardizing the features using StandardScaler to ensure all features contribute equally to the distance calculations in the KNN algorithm.
  5. Train-Test Split: Splitting the data into training and testing sets to evaluate the model's performance.
  6. Model Training: Training a KNN classifier with the training data.
  7. Model Evaluation: Evaluating the model's performance using a confusion matrix and classification report.
  8. Hyperparameter Tuning: Using the elbow method to determine the optimal number of neighbors (K) for the KNN classifier.
  9. Retraining and Final Evaluation: Retraining the model with the optimal K value and evaluating its performance.

How to Use This Notebook

  1. Clone or Download the Repository: Ensure you have the notebook file (03-K Nearest Neighbors Project - Solutions.ipynb) and the dataset file (KNN_Project_Data.csv).
  2. Install Necessary Libraries: Make sure you have the required libraries installed. You can install them using pip:
    pip install pandas numpy matplotlib seaborn scikit-learn
  3. Run the Notebook: Open the notebook in Jupyter Notebook or any compatible environment and run the cells sequentially to reproduce the results (an example launch command is shown below).
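
For example, from the repository directory (assuming Jupyter is installed; the quotes are needed because the filename contains spaces):

jupyter notebook "03-K Nearest Neighbors Project - Solutions.ipynb"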

Key Steps and Code Snippets

Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

Loading the Data

df = pd.read_csv('KNN_Project_Data.csv')
df.head()

Exploratory Data Analysis

sns.pairplot(df, hue='TARGET CLASS', palette='coolwarm')

Data Preprocessing

# Fit the scaler on the feature columns only, leaving out the target
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS', axis=1))
# Rescale every feature to zero mean and unit variance
scaled_features = scaler.transform(df.drop('TARGET CLASS', axis=1))
# Rebuild a DataFrame with the original feature names
df_feat = pd.DataFrame(scaled_features, columns=df.columns[:-1])
df_feat.head()
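
Note that the scaler above is fit on the full dataset before the train-test split, so test-set statistics influence the scaling. A stricter workflow fits the scaler on the training portion only; a minimal sketch using scikit-learn's Pipeline (an alternative shown for illustration, not the notebook's approach):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# The pipeline fits the scaler only on the data passed to fit(),
# so the test set never influences the scaling parameters
pipe = Pipeline([('scaler', StandardScaler()),
                 ('knn', KNeighborsClassifier(n_neighbors=1))])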

Train-Test Split

X_train, X_test, y_train, y_test = train_test_split(scaled_features, df['TARGET CLASS'], test_size=0.30)
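
Because train_test_split shuffles the data randomly, the exact metrics below will vary from run to run. Passing a fixed random_state makes the split reproducible (the value 101 here is an arbitrary illustrative choice, not taken from the notebook):

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df['TARGET CLASS'], test_size=0.30, random_state=101)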

Model Training and Evaluation

# Start with K=1 as a baseline classifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
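
A single train-test split gives a somewhat noisy performance estimate. For a steadier number, k-fold cross-validation averages the score over several splits; a minimal sketch using cross_val_score (an addition for illustration, not part of the notebook):

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Mean accuracy of the K=1 baseline over 5 folds
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                         scaled_features, df['TARGET CLASS'], cv=5)
print(scores.mean())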

Hyperparameter Tuning

# Record the test error rate for each K from 1 to 39
error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    # Fraction of test points predicted incorrectly
    error_rate.append(np.mean(pred_i != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
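
You can also pick K programmatically instead of reading it off the plot. np.argmin returns the 0-based index of the smallest error, so adding 1 converts it back to a K value (on a different random split the winning K may differ from the 30 used below):

# Index of the lowest error rate, shifted back to a K value
optimal_k = np.argmin(error_rate) + 1
print('Lowest error rate at K =', optimal_k)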

Retraining with the Optimal K Value

# Retrain using the K value chosen from the elbow plot
knn = KNeighborsClassifier(n_neighbors=30)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
print('WITH K=30')
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
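
To summarize the improvement over the K=1 baseline in a single number, you can compare accuracies directly; accuracy_score is a standard sklearn.metrics function, though it is an addition here rather than part of the notebook:

from sklearn.metrics import accuracy_score

print('Accuracy with K=30:', accuracy_score(y_test, pred))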

Conclusion

This project demonstrates the implementation and evaluation of the K-Nearest Neighbors algorithm. By following the steps in this notebook, you will gain hands-on practice with data preprocessing, model training, and hyperparameter tuning for KNN.
