This project uses the K-Nearest Neighbors (KNN) algorithm to classify iris flowers into one of three species based on four features:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
The objective is to build a classification model that can predict the species of iris flowers using these features and evaluate its performance.
- Python 3.7+
- Libraries:
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn
Install dependencies using pip:
    pip install numpy pandas matplotlib seaborn scikit-learn

The dataset used is the popular Iris dataset, which is available in sklearn.datasets. It contains 150 samples, with 50 samples each for three species of iris flowers: Setosa, Versicolor, and Virginica.
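As a quick sanity check on the dataset description above, the following sketch loads the data and prints its shape and the per-species sample counts. The variable names (`iris`, `counts`) are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the bundled Iris dataset (no download required).
iris = load_iris()

# Feature matrix: one row per flower, one column per measurement.
print("Samples, features:", iris.data.shape)
print("Feature names:", iris.feature_names)

# Count how many samples belong to each species label (0, 1, 2).
counts = np.bincount(iris.target)
for species, count in zip(iris.target_names, counts):
    print(f"{species}: {count} samples")
```

Because every species has the same number of samples, the classes are balanced, which is why plain accuracy is a reasonable headline metric here.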
Load the Dataset: The dataset is loaded using load_iris from sklearn.datasets.
Split the Dataset: Split the data into training and testing sets using train_test_split.
Train the Model: Train the K-Nearest Neighbors (KNN) classifier on the training data.
Evaluate the Model: Evaluate the classifier on the test set using metrics like accuracy, precision, recall, and F1 score.
Visualize the Results: Visualize the model performance with a confusion matrix and scatter plots of feature distributions.
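The load, split, train, and evaluate steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the project's exact code: the 80/20 split, `random_state=42`, stratified sampling, and `n_neighbors=5` are all assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Load the dataset.
iris = load_iris()
X, y = iris.data, iris.target

# 2. Split into training and testing sets.
#    Assumed values: 20% test size, fixed seed, stratified by species.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3. Train the KNN classifier (k=5 is an assumption, not a tuned value).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# 4. Evaluate on the held-out test set.
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")

# Per-class precision, recall, and F1 score in one report.
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

KNN is distance-based, so on datasets whose features have very different scales it is common to standardize the inputs first. The four Iris measurements are all in centimetres, so the sketch skips that step.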
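Accuracy: Measures overall correctness, the fraction of all predictions that are correct.
Precision: Measures the accuracy of positive predictions, the fraction of samples predicted as a class that truly belong to it.
Recall: Measures the ability to find all positive instances, the fraction of samples of a class that the model identifies.
F1 Score: Harmonic mean of precision and recall, useful for imbalanced datasets.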
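To make these definitions concrete, here is a small worked example on hypothetical toy labels (not output from the iris model), small enough to verify by hand. It uses scikit-learn's `precision_recall_fscore_support` with `average=None` to get one value per class.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical toy labels for three classes (0, 1, 2); not real model output.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Accuracy: 4 of the 6 predictions match the true label.
accuracy = accuracy_score(y_true, y_pred)

# average=None returns one precision/recall/F1 value per class.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None
)

# Class 1 by hand: it is predicted three times and two of those are right
# (precision 2/3); both true class-1 samples are found (recall 1.0);
# F1 = 2 * (2/3 * 1) / (2/3 + 1) = 0.8.
print(f"Accuracy:  {accuracy:.3f}")
for label in range(3):
    print(
        f"class {label}: precision={precision[label]:.3f} "
        f"recall={recall[label]:.3f} f1={f1[label]:.3f}"
    )
```

In a multi-class problem such as iris, each class is treated in turn as the "positive" class, and the per-class values can then be averaged (for example with `average="macro"`).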
Confusion Matrix: Displays true vs. predicted labels, helping identify misclassifications.
Feature Scatter Plot: Provides a 2D visualization of sepal length and sepal width across species.
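Both visualizations can be produced along these lines. This is a sketch under assumptions: it reuses an assumed 80/20 stratified split with `random_state=42` and `n_neighbors=5`, and it draws with matplotlib and scikit-learn's `ConfusionMatrixDisplay` rather than seaborn, which may differ from the project's own plotting code.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Assumed split and hyperparameters, for illustration only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
y_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

# Rows are true species, columns are predicted species.
cm = confusion_matrix(y_test, y_pred)

fig, (ax_cm, ax_scatter) = plt.subplots(1, 2, figsize=(11, 4))

# Left panel: confusion matrix of the test-set predictions.
ConfusionMatrixDisplay(
    confusion_matrix=cm, display_labels=iris.target_names
).plot(ax=ax_cm, cmap="Blues")
ax_cm.set_title("Confusion matrix (test set)")

# Right panel: sepal length vs. sepal width, one colour per species.
for class_index, species in enumerate(iris.target_names):
    mask = y == class_index
    ax_scatter.scatter(X[mask, 0], X[mask, 1], label=species, alpha=0.8)
ax_scatter.set_xlabel(iris.feature_names[0])
ax_scatter.set_ylabel(iris.feature_names[1])
ax_scatter.set_title("Sepal measurements by species")
ax_scatter.legend(title="Species")

fig.tight_layout()
```

Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` to write it to a file. Off-diagonal cells in the confusion matrix are the misclassifications.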
The model achieves high accuracy and performs well across all metrics for this balanced dataset.

