@@ -19,27 +19,27 @@ points. Then, ``q`` is considered to be *density reachable* by ``p`` if
there exists a sequence ``p_1, p_2, \ldots, p_n`` such that ``p_1 = p``
and ``p_{i+1}`` is directly density reachable from ``p_i``.
- A cluster, which is a subset of the given set of points, satisfies two
- properties:
- 1. All points within the cluster are mutually *density-connected*,
+ The points within DBSCAN clusters are categorized into *core* (or *seeds*)
+ and *boundary*:
+ 1. All points of the cluster *core* are mutually *density-connected*,
meaning that for any two distinct points ``p`` and ``q`` in a
- cluster, there exists a point ``o`` such that both ``p`` and ``q``
- are density reachable from ``o``.
- 2. If a point is density-connected to any point of a cluster, it is
- also part of that cluster.
+ core, there exists a point ``o`` such that both ``p`` and ``q``
+ are *density reachable* from ``o``.
+ 2. If a point is *density-connected* to any point of a cluster core, it is
+ also part of the core.
+ 3. All points within the ``\epsilon``-neighborhood of any core point, but
+ not belonging to that core (i.e. not *density reachable* from the core),
+ are considered cluster *boundary*.
## Interface
- There are two implementations of *DBSCAN* algorithm in this package
- (both provided by [`dbscan`](@ref) function):
- - Distance (adjacency) matrix-based. It requires ``O(N^2)`` memory to run.
- Boundary points cannot be shared between the clusters.
- - Adjacency list-based. The input is the ``d \times n`` matrix of point
- coordinates. The adjacency list is built on the fly. The performance is much
- better both in terms of running time and memory usage. Returns a vector of
- [`DbscanCluster`](@ref) objects that contain the indices of the *core* and
- *boundary* points, making it possible to share the boundary points between
- multiple clusters.
+ The implementation of *DBSCAN* algorithm provided by [`dbscan`](@ref) function
+ supports the two ways of specifying clustering data:
+ - The ``d \times n`` matrix of point coordinates. This is the preferred method
+ as it uses memory- and time-efficient neighboring points queries via
+ [NearestNeighbors.jl](https://github.com/KristofferC/NearestNeighbors.jl) package.
+ - The ``n\times n`` matrix of precalculated pairwise point distances.
+ It requires ``O(n^2)`` memory and time to run.
```@docs
dbscan