Description
Hey guys, hope this is the right place to ask about this:

I am working on a data pipeline that handles sales data from stores and displays some analytics, which includes clustering order totals (1D). I've implemented K-means using Scholar and it is working great! I wanted to try other algorithms as well and picked DBSCAN. However, it is very slow, to the point that a dataset of ~600 orders takes forever to complete.

I'm working on a Mac, using EXLA as the backend. The slow execution happens both in Livebook and with a Homebrew install of Elixir.
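For reference, the K-means path that works fine looks roughly like this (a sketch, not my exact code; `num_clusters: 3` is a placeholder):

```elixir
# Sketch of the working K-means fit (num_clusters is a placeholder value)
x =
  order_values["discounted_amount"]
  |> Nx.to_tensor()
  |> Nx.reshape({:auto, 1})

model = Scholar.Cluster.KMeans.fit(x, num_clusters: 3)
labels = Scholar.Cluster.KMeans.predict(model, x)
```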
This is my code for timing it with slices of the original data :
```elixir
for it <- [10, 50, 100, 200, 300, 400, 500, 600] do
  {time, _result} =
    :timer.tc(fn ->
      # Full 1D column of order totals as a tensor
      x = order_values["discounted_amount"] |> Nx.to_tensor()

      # Take the first `it` values and reshape to {it, 1}
      y =
        x
        |> Nx.slice([0], [it])
        |> Nx.reshape({:auto, 1})

      Scholar.Cluster.DBSCAN.fit(y, eps: 100, min_samples: 2)
    end)

  IO.puts("time: #{time / 1_000_000} seconds")
end
```
Output:

```
time: 0.010366 seconds
time: 0.026122 seconds
time: 0.421917 seconds
time: 18.469798 seconds
time: 195.325409 seconds
time: 2069.163107 seconds
```
I aborted it once the number of records reached 400. K-means runs much faster on much bigger datasets. Any help is appreciated, and apologies if I've missed any relevant details!
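In case it helps, I'd expect the slowdown to show up without my dataset too. Here is a self-contained repro sketch with synthetic 1D data (the value range and size are arbitrary, not my real data):

```elixir
# Synthetic repro sketch: 400 random 1D points (range/size are arbitrary)
key = Nx.Random.key(42)
{x, _key} = Nx.Random.uniform(key, 0.0, 1000.0, shape: {400, 1})

{time, _model} =
  :timer.tc(fn ->
    Scholar.Cluster.DBSCAN.fit(x, eps: 100, min_samples: 2)
  end)

IO.puts("time: #{time / 1_000_000} seconds")
```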