-
-
Notifications
You must be signed in to change notification settings - Fork 48.7k
Implementation of DBSCAN algorithm in Machine Learning #11632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
e107d6d
da81c07
b526b4d
49e9f61
d618090
12ac966
a393075
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
import pandas as pd | ||
import math | ||
import matplotlib.pyplot as plt | ||
from typing import dict, list | ||
class DbScan: | ||
''' | ||
DBSCAN Algorithm : | ||
Density-Based Spatial Clustering Of Applications With Noise | ||
Refer this website for more details : https://en.wikipedia.org/wiki/DBSCAN | ||
|
||
Functions: | ||
---------- | ||
__init__() : Constructor that sets minPts, radius and file | ||
perform_dbscan() : Invoked by constructor and calculates the core and noise points and returns a dictionary. | ||
print_dbscan() : Prints the core and noise points along with stating if the noise are border points or not. | ||
plot_dbscan() : Plots the points to show the core and noise point. | ||
|
||
To create a object | ||
------------------ | ||
import dbscan | ||
obj = dbscan.DbScan(minpts, radius, file) | ||
obj.print_dbscan() | ||
obj.plot_dbscan() | ||
''' | ||
def __init__(self, minpts : int, radius : int, file : str) -> None: | ||
''' | ||
Constructor | ||
|
||
Attributes: | ||
----------- | ||
minpts (int) : Minimum number of points needed to be | ||
within the radius to considered as core | ||
radius (int) : The radius from a given core point where | ||
other core points can be considered as core | ||
file (csv) : CSV file location. Should contain x and y | ||
coordinate value for each point. | ||
|
||
Example : | ||
minPts = 4 | ||
radius = 1.9 | ||
file = 'data_dbscan.csv' | ||
|
||
File Structure of CSV Data: | ||
--------------------------- | ||
_____ | ||
x | y | ||
----- | ||
3 | 7 | ||
4 | 6 | ||
5 | 5 | ||
6 | 4 | ||
7 | 3 | ||
----- | ||
''' | ||
self.minpts = minpts | ||
self.radius = radius | ||
self.file = file | ||
self.dict1 = self.perform_dbscan() | ||
def perform_dbscan(self) -> dict[int, list[int]]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file |
||
''' | ||
Parameters: | ||
----------- | ||
None | ||
|
||
Return: | ||
-------- | ||
Dictionary with points and the list of points | ||
that lie in its radius | ||
''' | ||
data = pd.read_csv(self.file) | ||
e = self.radius | ||
dict1 = {} | ||
for i in range(len(data)): | ||
for j in range(len(data)): | ||
dist = math.sqrt(pow(data['x'][j] - data['x'][i],2) + pow(data['y'][j] - data['y'][i],2)) | ||
if dist < e: | ||
if i+1 in dict1: | ||
dict1[i+1].append(j+1) | ||
else: | ||
dict1[i+1] = [j+1,] | ||
return dict1 | ||
def print_dbscan(self) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file |
||
''' | ||
Outputs: | ||
-------- | ||
Prints each point and if it is a core or a noise (w/ border) | ||
''' | ||
for i in self.dict1: | ||
print(i," ",self.dict1[i], end=' ---> ') | ||
if len(self.dict1[i]) >= self.minpts: | ||
print("Core") | ||
else: | ||
for j in self.dict1: | ||
if i != j and len(self.dict1[j]) >= self.minpts and i in self.dict1[j]: | ||
print("Noise ---> Border") | ||
break | ||
else: | ||
print("Noise") | ||
def plot_dbscan(self) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As there is no test file in this pull request nor any test function or class in the file |
||
''' | ||
Output: | ||
------- | ||
A matplotlib plot that show points as core and noise along | ||
with the circle that lie within it. | ||
''' | ||
data = pd.read_csv(self.file) | ||
e = self.radius | ||
for i in self.dict1: | ||
if len(self.dict1[i]) >= self.minpts: | ||
plt.scatter(data['x'][i-1], data['y'][i-1], color='red') | ||
circle = plt.Circle((data['x'][i-1], data['y'][i-1]), e, color='blue', fill=False) | ||
plt.gca().add_artist(circle) | ||
plt.text(data['x'][i-1], data['y'][i-1], 'P'+str(i), ha='center', va='bottom') | ||
else: | ||
plt.scatter(data['x'][i-1], data['y'][i-1], color='green') | ||
plt.text(data['x'][i-1], data['y'][i-1], 'P'+str(i), ha='center', va='bottom') | ||
plt.xlabel('X') | ||
plt.ylabel('Y') | ||
plt.title('DBSCAN Clustering') | ||
plt.legend(['Core','Noise']) | ||
plt.show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As there is no test file in this pull request nor any test function or class in the file
machine_learning/dbscan.py
, please provide doctest for the functionperform_dbscan