Slow DBSCAN runtimes #324

@srinivass99

Hey, I hope this is the right place to ask about this:

I am working on a data pipeline that handles sales data from stores and displays some analytics, including clustering of order totals (1D). I've implemented K-means using Scholar and it works great! I wanted to try other algorithms as well and picked DBSCAN. However, it is so slow that a dataset of ~600 orders takes forever to complete.
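For reference, the K-means call that runs fine looks roughly like this (a minimal sketch; num_clusters: 3 is an illustrative value, not the one from my real pipeline):

# same column as in the DBSCAN timing below; num_clusters is illustrative
x =
  order_values["discounted_amount"]
  |> Nx.to_tensor()
  |> Nx.reshape({:auto, 1})

model = Scholar.Cluster.KMeans.fit(x, num_clusters: 3)
model.labels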

I'm on a Mac, using EXLA as the backend; the slow execution happens both in Livebook and with a Homebrew-installed Elixir.
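My setup is roughly the following (a sketch; the version requirements are from memory and may be slightly off):

Mix.install([
  # versions approximate
  {:nx, "~> 0.7"},
  {:exla, "~> 0.7"},
  {:scholar, "~> 0.3"},
  {:explorer, "~> 0.8"}
])

# run tensor operations through EXLA
Nx.global_default_backend(EXLA.Backend)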

This is my code for timing it on slices of the original data:

for n <- [10, 50, 100, 200, 300, 400, 500, 600] do
  {time, _result} =
    :timer.tc(fn ->
      # first n order values, shaped as an {n, 1} matrix
      x = order_values["discounted_amount"] |> Nx.to_tensor()

      y =
        x
        |> Nx.slice([0], [n])
        |> Nx.reshape({:auto, 1})

      Scholar.Cluster.DBSCAN.fit(y, eps: 100, min_samples: 2)
    end)

  IO.puts("time: #{time / 1_000_000} seconds")
end

Output

time: 0.010366 seconds
time: 0.026122 seconds
time: 0.421917 seconds
time: 18.469798 seconds
time: 195.325409 seconds
time: 2069.163107 seconds

I aborted the run after the 400-record slice. K-means runs much faster on much bigger datasets. Any help is appreciated, and apologies if I've missed any relevant details!
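In case the tensor prep inside the timed closure skews anything, here is an equivalent loop that hoists it out so :timer.tc measures only the Scholar.Cluster.DBSCAN.fit call (a variant sketch; the numbers above came from the loop shown earlier):

# build the full {n, 1} matrix once, outside the timing
x =
  order_values["discounted_amount"]
  |> Nx.to_tensor()
  |> Nx.reshape({:auto, 1})

for n <- [10, 50, 100, 200, 300, 400, 500, 600] do
  # take the first n rows of the column matrix
  y = Nx.slice(x, [0, 0], [n, 1])

  {time, _model} =
    :timer.tc(fn -> Scholar.Cluster.DBSCAN.fit(y, eps: 100, min_samples: 2) end)

  IO.puts("n=#{n}: #{time / 1_000_000} seconds")
end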
