Closed
108 changes: 108 additions & 0 deletions machine_learning/DBSCAN.py
@@ -0,0 +1,108 @@
import pandas as pd

Check failure on line 1 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (N999)
machine_learning/DBSCAN.py:1:1: N999 Invalid module name: 'DBSCAN'
import numpy as np

Check failure on line 2 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (F401)
machine_learning/DBSCAN.py:2:17: F401 `numpy` imported but unused
import math
import matplotlib.pyplot as plt

class DBSCAN:

Check failure on line 6 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (I001)
machine_learning/DBSCAN.py:1:1: I001 Import block is un-sorted or un-formatted
    '''
    Author : Gowtham Kamalasekar
    LinkedIn : https://www.linkedin.com/in/gowtham-kamalasekar/

    DBSCAN Algorithm :
    Density-Based Spatial Clustering Of Applications With Noise
    Refer to this website for more details : https://en.wikipedia.org/wiki/DBSCAN

    Attributes:
    -----------
    minPts (int) : Minimum number of points needed within the radius for a point to be considered as core

Check failure on line 17 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (E501)
machine_learning/DBSCAN.py:17:89: E501 Line too long (100 > 88)
    radius (float) : Distance from a point within which other points are counted as its neighbours

Check failure on line 18 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (E501)
machine_learning/DBSCAN.py:18:89: E501 Line too long (107 > 88)
    file (csv) : CSV file location. Should contain the x and y coordinate values of each point.

Check failure on line 19 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (E501)
machine_learning/DBSCAN.py:19:89: E501 Line too long (97 > 88)

    Example :
    minPts = 4
    radius = 1.9
    file = 'data_dbscan.csv'

    File Structure of CSV Data:
    ---------------------------
    _____
    x | y
    -----
    3 | 7
    4 | 6
    5 | 5
    6 | 4
    7 | 3
    -----

    Functions:
    ----------
    __init__() : Constructor that sets minPts, radius and file
    perform_dbscan() : Invoked by the constructor; returns a dictionary mapping each point to the points within its radius.

Check failure on line 41 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (E501)
machine_learning/DBSCAN.py:41:89: E501 Line too long (116 > 88)
    print_dbscan() : Prints each point as core or noise and states whether a noise point is a border point.

Check failure on line 42 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (E501)
machine_learning/DBSCAN.py:42:89: E501 Line too long (117 > 88)
    plot_dbscan() : Plots the points to show the core and noise points.

    To create an object
    -------------------
    import DBSCAN
    obj = DBSCAN.DBSCAN(minPts, radius, file)
    obj.print_dbscan()
    obj.plot_dbscan()
    '''

Check failure on line 52 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (W293)
machine_learning/DBSCAN.py:52:1: W293 Blank line contains whitespace
    def __init__(self, minPts, radius, file):

Check failure on line 53 in machine_learning/DBSCAN.py
GitHub Actions / ruff
Ruff (N803)
machine_learning/DBSCAN.py:53:24: N803 Argument name `minPts` should be lowercase
        self.minPts = minPts
        self.radius = radius
        self.file = file
        self.dict1 = self.perform_dbscan()

    def perform_dbscan(self):
        data = pd.read_csv(self.file)

        minPts = self.minPts
        e = self.radius

        # Map each point (1-based index) to every point strictly closer than e.
        # A point is at distance 0 from itself, so it appears in its own list.
        dict1 = {}
        for i in range(len(data)):
            for j in range(len(data)):
                dist = math.sqrt(pow(data['x'][j] - data['x'][i],2) + pow(data['y'][j] - data['y'][i],2))
                if dist < e:
                    if i+1 in dict1:
                        dict1[i+1].append(j+1)
                    else:
                        dict1[i+1] = [j+1,]

        return dict1

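    def print_dbscan(self):
        for i in self.dict1:
            print(i," ",self.dict1[i], end=' ---> ')
            if len(self.dict1[i]) >= self.minPts:
                print("Core")
            else:
                # A non-core point that lies within the radius of some core
                # point is reported as a border point.
                for j in self.dict1:
                    if i != j and len(self.dict1[j]) >= self.minPts and i in self.dict1[j]:
                        print("Noise ---> Border")
                        break
                else:
                    # for-else: runs only when no core neighbour was found.
                    print("Noise")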

    def plot_dbscan(self):
        data = pd.read_csv(self.file)
        e = self.radius
        for i in self.dict1:
            if len(self.dict1[i]) >= self.minPts:
                # Core point: red marker plus a circle of radius e around it.
                plt.scatter(data['x'][i-1], data['y'][i-1], color='red')
                circle = plt.Circle((data['x'][i-1], data['y'][i-1]), e, color='blue', fill=False)
                plt.gca().add_artist(circle)
                plt.text(data['x'][i-1], data['y'][i-1], 'P'+str(i), ha='center', va='bottom')
            else:
                # Non-core point (noise or border): green marker.
                plt.scatter(data['x'][i-1], data['y'][i-1], color='green')
                plt.text(data['x'][i-1], data['y'][i-1], 'P'+str(i), ha='center', va='bottom')

        plt.xlabel('X')
        plt.ylabel('Y')
        plt.title('DBSCAN Clustering')

        plt.legend(['Core','Noise'])
        plt.show()
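
The classification that `perform_dbscan()` and `print_dbscan()` carry out together can be sketched without the CSV file or plotting, which may make the logic easier to check against the docstring example. This is only an illustration, not the PR author's code: the names `region_query` and `classify_points` are hypothetical, the arguments use lowercase names in the spirit of the N803 check, and points are indexed from 0 here rather than from 1. It keeps the diff's strict `dist < radius` comparison and counts a point as its own neighbour, as the diff does.

```python
import math


def region_query(points, index, radius):
    """Return indices of all points strictly closer than radius to points[index].

    Hypothetical helper; the point itself is included (distance 0).
    """
    x0, y0 = points[index]
    return [
        j
        for j, (x, y) in enumerate(points)
        if math.hypot(x - x0, y - y0) < radius
    ]


def classify_points(points, min_pts, radius):
    """Label each point 'core', 'border' or 'noise' (hypothetical helper)."""
    neighbours = [region_query(points, i, radius) for i in range(len(points))]
    # A point is core when its neighbourhood holds at least min_pts points.
    core = {i for i, near in enumerate(neighbours) if len(near) >= min_pts}
    labels = []
    for i, near in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in near):
            # Not core itself, but within the radius of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# The five points from the docstring's CSV example.
points = [(3, 7), (4, 6), (5, 5), (6, 4), (7, 3)]
print(classify_points(points, min_pts=4, radius=1.9))
# ['noise', 'noise', 'noise', 'noise', 'noise']
print(classify_points(points, min_pts=3, radius=1.9))
# ['border', 'core', 'core', 'core', 'border']
```

With the docstring's parameters (`minPts = 4`, `radius = 1.9`) no point has four neighbours, so every point comes out as noise. Lowering the threshold to 3 makes the three middle points core and the two end points border.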