Improve DBSCAN performance #325
Conversation
    # During cluster expansion, we store points to be visited on
    # the stack. Each point can be on the stack at most once, so
    # the number of points is an upper bound on the stack size.
    stack = Nx.broadcast(-1, {Nx.axis_size(indices, 0)})
The max stack size was heavily overestimated. The stack is effectively a set of points and should never exceed N points. We just need to make sure we never add duplicates to the stack, which the other changed line ensures.
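For intuition, here is a minimal plain-Elixir sketch of why the expansion stack stays bounded by N: a point is pushed only the first time it is seen, so each of the N points can occupy a slot at most once. This is not the Nx `defn` code from this PR; `neighbors_fun`, `StackBoundSketch`, and the sample data are made up for illustration.

```elixir
defmodule StackBoundSketch do
  # Illustrative only: expand a cluster from a seed point, pushing each
  # neighbor onto the stack at most once. Because a point is pushed only
  # when first seen, the stack can never hold more than n distinct points.
  def expand(seed, neighbors_fun, n) do
    do_expand([seed], MapSet.new([seed]), neighbors_fun, n, [])
  end

  defp do_expand([], _seen, _neighbors_fun, _n, acc), do: Enum.reverse(acc)

  defp do_expand([point | stack], seen, neighbors_fun, n, acc) do
    # Only keep neighbors that have never been seen (and thus never pushed).
    new_points =
      point
      |> neighbors_fun.()
      |> Enum.reject(&MapSet.member?(seen, &1))

    seen = Enum.reduce(new_points, seen, &MapSet.put(&2, &1))
    stack = new_points ++ stack

    # Invariant: the stack never exceeds n, since every point enters at most once.
    true = length(stack) <= n

    do_expand(stack, seen, neighbors_fun, n, [point | acc])
  end
end

# Usage: points 0..4 in a chain; a point's neighbors are the adjacent indices.
neighbors = fn i -> Enum.filter(0..4, fn j -> abs(j - i) == 1 end) end
StackBoundSketch.expand(0, neighbors, 5)
#=> [0, 1, 2, 3, 4]
```

In the Nx `defn` version touched by this PR, the same guarantee is what allows preallocating the stack tensor with only `Nx.axis_size(indices, 0)` slots, as in the diff above.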
My first attempt was to remove the innermost while loop (putting all relevant indices onto the stack at once), and I managed to do that with a trick. But I started to think it might break in an edge case where the stack gets close to full. Then, analyzing the stack size, I realised it should be way smaller, and that was the actual fix :p
Looks good to me.
One thing we might want is more unit tests for these algorithms. For example, ten points at one location (duplicates) and ten points at another one far away. I could do this as I expect to have some time at the end of the year.
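A rough sketch of what such a test could look like, assuming the `Scholar.Cluster.DBSCAN.fit/2` API with `:eps` and `:min_samples` options and a `labels` field on the returned struct (treat those names and values as assumptions, not the final test):

```elixir
defmodule Scholar.Cluster.DBSCANDuplicatesTest do
  use ExUnit.Case, async: true

  test "two tight groups of duplicate points form two separate clusters" do
    # Ten copies of a point near the origin and ten copies of a point far away.
    group_a = Nx.broadcast(Nx.tensor([0.0, 0.0]), {10, 2})
    group_b = Nx.broadcast(Nx.tensor([100.0, 100.0]), {10, 2})
    x = Nx.concatenate([group_a, group_b], axis: 0)

    model = Scholar.Cluster.DBSCAN.fit(x, eps: 1.0, min_samples: 5)

    labels = Nx.to_flat_list(model.labels)
    {labels_a, labels_b} = Enum.split(labels, 10)

    # Each group should share a single label, and the two labels should differ.
    assert length(Enum.uniq(labels_a)) == 1
    assert length(Enum.uniq(labels_b)) == 1
    assert Enum.uniq(labels_a) != Enum.uniq(labels_b)
  end
end
```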
josevalim left a comment:
Awesome job!
I came up with a rewrite that parallelizes much better. I am merging this for reference, but will submit another PR soon :)
Definitely, PRs for that would be great!
Closes #324.
For the reproduction from #324 (comment):