Description
Hey guys, hope this is the right place to ask about this:

I am working on a data pipeline that handles sales data from stores and displays some analytics, which includes clustering order totals (1D). I've implemented K-means using Scholar and it is working great! I wanted to try other algorithms as well and picked DBSCAN. However, it is very slow, to the point that a dataset of ~600 orders takes forever to complete.

I'm working on a Mac, using EXLA as the backend. The slow execution happens both in Livebook and with a Homebrew install of Elixir.
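For reference, the K-means path that works fine looks roughly like this (a sketch, not my exact code; `num_clusters: 3` is a placeholder):

```elixir
# Sketch of the working K-means fit (num_clusters is a placeholder value)
x =
  order_values["discounted_amount"]
  |> Nx.to_tensor()
  |> Nx.reshape({:auto, 1})

model = Scholar.Cluster.KMeans.fit(x, num_clusters: 3)
labels = Scholar.Cluster.KMeans.predict(model, x)
```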
This is my code for timing it with slices of the original data :
```elixir
for it <- [10, 50, 100, 200, 300, 400, 500, 600] do
  {time, _result} =
    :timer.tc(fn ->
      # Full 1D column of order totals as a tensor
      x = order_values["discounted_amount"] |> Nx.to_tensor()

      # Take the first `it` values and reshape to {it, 1}
      y =
        x
        |> Nx.slice([0], [it])
        |> Nx.reshape({:auto, 1})

      Scholar.Cluster.DBSCAN.fit(y, eps: 100, min_samples: 2)
    end)

  IO.puts("time: #{time / 1_000_000} seconds")
end
```
Output:

```
time: 0.010366 seconds
time: 0.026122 seconds
time: 0.421917 seconds
time: 18.469798 seconds
time: 195.325409 seconds
time: 2069.163107 seconds
```
I aborted it once the number of records reached 400. K-means runs much faster on much bigger datasets. Any help is appreciated, and apologies if I've missed any relevant details!
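In case it helps, I'd expect the slowdown to show up without my dataset too. Here is a self-contained repro sketch with synthetic 1D data (the value range and size are arbitrary, not my real data):

```elixir
# Synthetic repro sketch: 400 random 1D points (range/size are arbitrary)
key = Nx.Random.key(42)
{x, _key} = Nx.Random.uniform(key, 0.0, 1000.0, shape: {400, 1})

{time, _model} =
  :timer.tc(fn ->
    Scholar.Cluster.DBSCAN.fit(x, eps: 100, min_samples: 2)
  end)

IO.puts("time: #{time / 1_000_000} seconds")
```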