deduplicate function is stuck for a long time when applied to large dataset #839
kailashsp started this conversation in Show and tell
Replies: 1 comment
Hey @kailashsp, thank you for opening this discussion! In the long term, we want to address this using blocking methods or probabilistic data structures. Before that, could you send a snippet of the code you used and a sample of the data? That might help us debug this.
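For readers unfamiliar with the term, here is a minimal sketch of what "blocking" could look like. It is not the library's implementation: `blocking_key`, `dedupe_with_blocking`, and the 0.8 threshold are hypothetical names and values chosen for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher`. The idea is to compare a value only against values that share a cheap key, rather than against everything.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(value: str) -> str:
    """Hypothetical cheap key: the lowercased first character."""
    return value.strip().lower()[:1]


def dedupe_with_blocking(values, threshold=0.8):
    """Map each value to a representative, comparing only within its block."""
    representatives = defaultdict(list)  # blocking key -> representatives so far
    canonical = {}                       # value -> chosen representative
    for value in values:
        if value in canonical:
            continue  # exact repeat, already resolved
        reps = representatives[blocking_key(value)]
        for rep in reps:
            similarity = SequenceMatcher(None, value.lower(), rep.lower()).ratio()
            if similarity >= threshold:
                canonical[value] = rep
                break
        else:
            # No close match in this block: the value becomes a representative.
            reps.append(value)
            canonical[value] = value
    return [canonical[v] for v in values]


cities = ["london", "londn", "paris", "pariss", "london", "berlin"]
print(dedupe_with_blocking(cities))
```

The trade-off is that two spellings landing in different blocks (for example, a typo in the first character) are never compared, so the choice of key matters.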
I have been trying out the deduplicate feature. It seems to work fine when I apply it to a subset of the dataset with 100 items. But when I apply it to the whole dataset of 32,000 items, it gets stuck. I have tried changing n_jobs, still with no success.
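As a rough illustration of why the two sizes might behave so differently: if the function compares every pair of items (an assumption, the thread does not confirm how it works internally), the number of comparisons grows quadratically with the dataset size. The helper name `n_pairs` is made up for this sketch.

```python
from math import comb


def n_pairs(n_items: int) -> int:
    """Number of unordered pairs among n_items, i.e. n * (n - 1) / 2."""
    return comb(n_items, 2)


print(n_pairs(100))     # 4950
print(n_pairs(32_000))  # 511984000
```

Under that assumption, going from 100 to 32,000 items multiplies the pairwise work by roughly 100,000, which extra workers via n_jobs could only partly offset.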