k-nearest neighbors algorithm (k-NN)

What is k-NN?

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric classification method developed by Evelyn Fix and Joseph Hodges in 1951 and later expanded by Thomas Cover. Used for classification and regression. In both cases, the input includes training models close to the data set. Output depends on whether k-NN is used for classification or regression:

In the k-NN classification, the output of the class membership. An item is divided by a majority vote of its neighbors, the item is assigned the most common category among its closest neighbors and k. If k = 1, then the object is given to the class of that one nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the average value of k for nearby neighbors.

Dataset

Dataset used for the model is “Fish.csv”. Dataset consists of 159 rows and 7 columns.

Dataset Description

The attributes used in dataset are given bellow:

Species
Weight
Length1
Length2
Length3
Height
Width

Independent Attributes

Independent attributes in dataset are:

Length1
Length2
Length3
Height
Width

Dependent Attribute

Dependent attribute in dataset is:

Weight

Target Attribute

Target attribute in dataset is:

Weight

We will predict the weight of fish by using other attributes to train the model.

Dataset Head

Here is the head of dataset used in the model.

Dataset Preprocessing

As we don't need Species attribute to predict the weight of fish. So, we will drop Species attribute. The final dataset is given below:

Model Training

After preprocessing the dataset, we used preprocessed data for model training. For this purpose, we split up the data and select 30% of data for test purposes and 70% of data for model training. We test our model for k up to 20 to get minimum Root Mean Square Error (RMSE). The RMSE for different values of k are given below:

As it is clear from the above figure that RMSE is minimum for k = 3.

RMSE value for k = 3 is: 47.21824893415052

Prediction Results

So, we used k = 3 for the prediction of the weight of fishes. As we get minimum RMSE on 3 which is approximately 47. The prediction results are given below:

References

Dataset Link: View Fish Dataset
Download Link: Download Fish Dataset

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Fish.csv		Fish.csv
README.md		README.md
k-nearest neighbors algorithm.py		k-nearest neighbors algorithm.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

k-nearest neighbors algorithm (k-NN)

What is k-NN?

Dataset

Dataset Description

Independent Attributes

Dependent Attribute

Target Attribute

Dataset Head

Dataset Preprocessing

Model Training

Prediction Results

References

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

k-nearest neighbors algorithm (k-NN)

What is k-NN?

Dataset

Dataset Description

Independent Attributes

Dependent Attribute

Target Attribute

Dataset Head

Dataset Preprocessing

Model Training

Prediction Results

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors 1

Languages

Packages